diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 00000000000..0646aaff553 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,89 @@ +# How to contribute to Open MPI + +First off, thank you for taking the time to prepare a contribution to +Open MPI! + +![You're awesome!](https://www.open-mpi.org/images/youre-awesome.jpg) + +General information about contributing to the Open MPI project can be found at the [Contributing to Open MPI webpage](https://www.open-mpi.org/community/contribute/). +The instructions below are specifically for opening issues and pull requests against Open MPI. + +## Content + +We love getting contributions from anyone. But keep in mind that Open +MPI is used in production environments all around the world. + +If you're contributing a small bug fix, awesome! + +If you're contributing a large new piece of functionality, it will +be best received if you (or someone else) also step up to +help maintain that functionality over time. We love new ideas and new +features, but we do need to be realistic about what we can reliably test +and deliver to our users. + +## Contributor's Declaration + +To ensure that we can keep distributing Open MPI under our +[open source license](LICENSE), we need to make sure that all +contributions are compatible with that license. + +To that end, we require that all Git commits contributed to Open MPI +have a "Signed-off-by" token indicating that the commit author agrees +with [Open MPI's Contributor's +Declaration](https://github.com/open-mpi/ompi/wiki/Admistrative-rules#contributors-declaration). + +If you have not already done so, please ensure that: + +1. Every commit contains the exact "Signed-off-by" token. You can +add this token via `git commit -s`. +1. The email address after "Signed-off-by" matches the Git commit +email address.
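The two checks above can be sketched in C. This is a minimal, illustrative sketch only: `has_matching_signoff()` is a hypothetical helper, not part of Open MPI or its CI tooling, and it assumes the trailer takes the `Signed-off-by: Name <email>` form that `git commit -s` produces. It also follows the C style rules listed under Style (constants on the left of `==`, braces on every block, 4-space indentation).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of Open MPI): returns true if any line
 * of the commit message starts with "Signed-off-by:" and the email
 * between '<' and '>' on that same line equals author_email exactly. */
static bool has_matching_signoff(const char *msg, const char *author_email)
{
    static const char token[] = "Signed-off-by:";
    const size_t token_len = sizeof(token) - 1;
    const size_t email_len = strlen(author_email);
    const char *line = msg;

    while (NULL != line && '\0' != *line) {
        /* Find the end of the current line (newline or end of string). */
        const char *eol = strchr(line, '\n');
        const char *end = (NULL == eol) ? line + strlen(line) : eol;

        if ((size_t)(end - line) >= token_len &&
            0 == strncmp(line, token, token_len)) {
            /* Look for "<email>" within this line only. */
            const char *lt = memchr(line, '<', (size_t)(end - line));
            if (NULL != lt) {
                const char *gt = memchr(lt, '>', (size_t)(end - lt));
                if (NULL != gt && (size_t)(gt - lt - 1) == email_len &&
                    0 == strncmp(lt + 1, author_email, email_len)) {
                    return true;
                }
            }
        }

        /* Advance to the next line, or stop after the last one. */
        line = (NULL == eol) ? NULL : eol + 1;
    }
    return false;
}
```

For example, a message ending in `Signed-off-by: Jane Doe <jane@example.com>` passes for the author email `jane@example.com` but fails for `jane@other.org`. The comparison is exact and case-sensitive.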
+ + ## **Did you find a bug?** + +* **Ensure the bug was not already reported** by searching on GitHub under [Issues](https://github.com/open-mpi/ompi/issues). + +* If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/open-mpi/ompi/issues/new). + +* For more detailed information on submitting a bug report and creating an issue, visit our [Bug Tracking webpage](https://www.open-mpi.org/community/help/bugs.php). + +## **Did you write a patch that fixes a bug?** + +* Open a new GitHub pull request with the patch. + +* Ensure the PR description clearly describes the problem and solution. If there is an existing open GitHub issue describing this bug, please reference it in the description so we can close it. + +* Before submitting, please read the [Contributing to the Open MPI Project FAQ](https://www.open-mpi.org/faq/?category=contributing) web page and the [SubmittingPullRequests](https://github.com/open-mpi/ompi/wiki/SubmittingPullRequests) wiki. In particular, note that all Git commits contributed to Open MPI require a "Signed-off-by" line. + +## **Do you intend to add a new feature or change an existing one?** + +* Suggest your change on the [devel mail list](https://www.open-mpi.org/community/lists/ompi.php) and start writing code. The [developer-level technical information on the internals of Open MPI](https://www.open-mpi.org/faq/?category=developers) may also be useful for large-scale features. + +* Do not open an issue on GitHub until you have collected positive feedback about the change. GitHub issues are primarily intended for bug reports and fixes. + +## **Do you have questions about the source code?** + +* First, check out the [developer-level technical information on the internals of Open MPI](https://www.open-mpi.org/faq/?category=developers). A paper describing the [multi-component architecture](https://www.open-mpi.org/papers/ics-2004/ics-2004.pdf) of Open MPI may also be helpful.
The [devel mail list](https://www.open-mpi.org/community/lists/ompi.php) is a good place to post questions about the source code as well. + +## Style + +Open MPI has a small number of style rules: + +1. For all code: + * Indent with 4 spaces. No more, no less. + * No tab characters *at all*. Two levels of indentation are 8 spaces, not a tab. + * m4 code is a bit weird in terms of indentation: we don't have a + good, consistent indentation style in our existing code. But + still: no tab characters at all. +1. For C code: + * We prefer that all blocks be enclosed in `{}` (even 1-line + blocks). + * When testing equality against a constant, we prefer that you put + the constant on the *left* of the `==`. E.g., `if (NULL == + ptr)`. + * If a C function takes no parameters, declare it with + `(void)` (vs. `()`). + +That's about it. Thank you! + +- The Open MPI Team diff --git a/.github/issue_template.md b/.github/issue_template.md new file mode 100644 index 00000000000..5f11ebf8c9d --- /dev/null +++ b/.github/issue_template.md @@ -0,0 +1,29 @@ +Thank you for taking the time to submit an issue! + +## Background information + +### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.) + + + +### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.) + + + +### Please describe the system on which you are running + +* Operating system/version: +* Computer hardware: +* Network type: + +----------------------------- + +## Details of the problem + +Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.
+ +**Note**: If you include verbatim output (or a code block), please use a [GitHub Markdown](https://help.github.com/articles/creating-and-highlighting-code-blocks/) code block like below: +```shell +shell$ mpirun -np 2 ./hello_world +``` + diff --git a/.gitignore b/.gitignore index b45ab10f922..944a82c91f7 100644 --- a/.gitignore +++ b/.gitignore @@ -67,6 +67,7 @@ vc70.pdb .hg .hgignore_local stamp-h? +AUTHORS ar-lib ylwrap @@ -128,6 +129,7 @@ examples/ring_oshmem examples/hello_oshmem examples/ring_oshmemfh examples/hello_oshmemfh +examples/hello_oshmemcxx examples/oshmem_circular_shift examples/oshmem_max_reduction examples/oshmem_shmalloc @@ -179,6 +181,8 @@ ompi/mca/io/romio314/romio/test/pfcoll_test.f ompi/mca/io/romio314/romio/test/runtests ompi/mca/io/romio314/romio/util/romioinstall +ompi/mca/osc/monitoring/osc_monitoring_template_gen.h + ompi/mca/pml/v/autogen.vprotocols ompi/mca/pml/v/mca_vprotocol_config_output @@ -244,6 +248,7 @@ ompi/mpiext/cuda/c/mpiext_cuda_c.h ompi/tools/mpisync/mpisync ompi/tools/mpisync/mpirun_prof ompi/tools/mpisync/ompi_timing_post +ompi/tools/mpisync/mpisync.1 ompi/tools/ompi_info/ompi_info ompi/tools/ompi_info/ompi_info.1 @@ -255,6 +260,7 @@ ompi/tools/wrappers/mpicc.1 ompi/tools/wrappers/mpic++.1 ompi/tools/wrappers/mpicxx.1 ompi/tools/wrappers/mpifort.1 +ompi/tools/wrappers/mpijavac.1 ompi/tools/wrappers/ompi_wrapper_script ompi/tools/wrappers/ompi.pc ompi/tools/wrappers/ompi-c.pc @@ -301,6 +307,8 @@ opal/mca/event/libevent*/libevent/include/event2/event-config.h opal/mca/hwloc/hwloc*/hwloc/include/hwloc/autogen/config.h opal/mca/hwloc/hwloc*/hwloc/include/private/autogen/config.h +opal/mca/hwloc/base/static-components.h.new.extern +opal/mca/hwloc/base/static-components.h.new.struct opal/mca/installdirs/config/install_dirs.h @@ -308,17 +316,34 @@ opal/mca/pmix/pmix*/pmix/include/pmix/autogen/config.h opal/mca/pmix/pmix*/pmix/include/pmix/autogen/config.h.in opal/mca/pmix/pmix*/pmix/src/include/private/autogen/config.h.in 
opal/mca/pmix/pmix*/pmix/src/include/private/autogen/config.h -opal/mca/hwloc/base/static-components.h.new.extern -opal/mca/hwloc/base/static-components.h.new.struct -opal/mca/pmix/pmix2x/pmix/src/include/frameworks.h -opal/mca/pmix/pmix2x/pmix/src/mca/pinstalldirs/config/pinstall_dirs.h -opal/mca/pmix/pmix2x/pmix/config/autogen_found_items.m4 -opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h -opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in -opal/mca/pmix/pmix2x/pmix/include/pmix_rename.h -opal/mca/pmix/pmix2x/pmix/include/pmix_version.h -opal/mca/pmix/pmix2x/pmix/src/util/keyval/keyval_lex.c -opal/mca/pmix/pmix2x/pmix/src/util/show_help_lex.c +opal/mca/pmix/pmix*/pmix/src/include/frameworks.h +opal/mca/pmix/pmix*/pmix/src/mca/pinstalldirs/config/pinstall_dirs.h +opal/mca/pmix/pmix*/pmix/config/autogen_found_items.m4 +opal/mca/pmix/pmix*/pmix/src/include/pmix_config.h +opal/mca/pmix/pmix*/pmix/src/include/pmix_config.h.in +opal/mca/pmix/pmix*/pmix/include/pmix_common.h +opal/mca/pmix/pmix*/pmix/include/pmix_rename.h +opal/mca/pmix/pmix*/pmix/include/pmix_version.h +opal/mca/pmix/pmix*/pmix/src/util/keyval/keyval_lex.c +opal/mca/pmix/pmix*/pmix/src/util/show_help_lex.c +opal/mca/pmix/pmix*/pmix/examples/alloc +opal/mca/pmix/pmix*/pmix/examples/client +opal/mca/pmix/pmix*/pmix/examples/debugger +opal/mca/pmix/pmix*/pmix/examples/debuggerd +opal/mca/pmix/pmix*/pmix/examples/dmodex +opal/mca/pmix/pmix*/pmix/examples/dynamic +opal/mca/pmix/pmix*/pmix/examples/fault +opal/mca/pmix/pmix*/pmix/examples/jctrl +opal/mca/pmix/pmix*/pmix/examples/pub +opal/mca/pmix/pmix*/pmix/examples/server +opal/mca/pmix/pmix*/pmix/examples/tool + +opal/mca/pmix/ext3x/ext3x.c +opal/mca/pmix/ext3x/ext3x.h +opal/mca/pmix/ext3x/ext3x_client.c +opal/mca/pmix/ext3x/ext3x_component.c +opal/mca/pmix/ext3x/ext3x_server_north.c +opal/mca/pmix/ext3x/ext3x_server_south.c opal/tools/opal-checkpoint/opal-checkpoint opal/tools/opal-checkpoint/opal-checkpoint.1 @@ -357,6 +382,7 @@ 
orte/test/mpi/accept orte/test/mpi/attach orte/test/mpi/bad_exit orte/test/mpi/bcast_loop +orte/test/mpi/binding orte/test/mpi/concurrent_spawn orte/test/mpi/connect orte/test/mpi/crisscross @@ -367,6 +393,7 @@ orte/test/mpi/hello_output orte/test/mpi/hello_show_help orte/test/mpi/hello orte/test/mpi/hello++ +orte/test/mpi/interlib orte/test/mpi/loop_child orte/test/mpi/loop_spawn orte/test/mpi/mpi_barrier @@ -377,6 +404,7 @@ orte/test/mpi/parallel_r8 orte/test/mpi/parallel_r64 orte/test/mpi/parallel_w8 orte/test/mpi/parallel_w64 +orte/test/mpi/pinterlib orte/test/mpi/pmix orte/test/mpi/pubsub orte/test/mpi/read_write @@ -386,6 +414,7 @@ orte/test/mpi/segv orte/test/mpi/simple_spawn orte/test/mpi/slave orte/test/mpi/spawn_multiple +orte/test/mpi/xlib orte/test/mpi/ziaprobe orte/test/mpi/ziatest orte/test/mpi/*.dwarf @@ -413,6 +442,9 @@ orte/test/mpi/memcached-dummy orte/test/mpi/coll_test orte/test/mpi/badcoll orte/test/mpi/iof +orte/test/mpi/no-disconnect +orte/test/mpi/nonzero +orte/test/mpi/add_host orte/test/system/radix orte/test/system/sigusr_trap @@ -452,6 +484,7 @@ orte/test/system/orte_sensor orte/test/system/event-threads orte/test/system/test-time orte/test/system/psm_keygen +orte/test/system/pspawn orte/test/system/regex orte/test/system/orte_errors orte/test/system/evthread-test @@ -470,6 +503,7 @@ orte/test/system/opal_db orte/test/system/ulfm orte/test/system/pmixtool orte/test/system/orte_notify +orte/test/system/threads orte/tools/orte-checkpoint/orte-checkpoint orte/tools/orte-checkpoint/orte-checkpoint.1 @@ -500,6 +534,8 @@ orte/tools/orted/orted orte/tools/orted/orted.1 orte/tools/orterun/orterun orte/tools/orterun/orterun.1 +orte/tools/prun/prun +orte/tools/prun/prun.1 orte/tools/wrappers/ortecc-wrapper-data.txt orte/tools/wrappers/ortec++-wrapper-data.txt orte/tools/wrappers/ortecc.1 @@ -560,6 +596,13 @@ oshmem/tools/wrappers/shmemfort.1 oshmem/tools/wrappers/shmemrun.1 oshmem/tools/wrappers/shmemcc-wrapper-data.txt 
oshmem/tools/wrappers/shmemfort-wrapper-data.txt +oshmem/tools/wrappers/oshCC.1 +oshmem/tools/wrappers/oshc++.1 +oshmem/tools/wrappers/oshcxx.1 +oshmem/tools/wrappers/shmemCC.1 +oshmem/tools/wrappers/shmemc++-wrapper-data.txt +oshmem/tools/wrappers/shmemc++.1 +oshmem/tools/wrappers/shmemcxx.1 test/asm/atomic_math_noinline test/asm/atomic_barrier @@ -600,6 +643,7 @@ test/datatype/ddt_raw test/datatype/opal_datatype_test test/datatype/position_noncontig test/datatype/unpack_ooo +test/datatype/unpack_hetero test/dss/dss_buffer test/dss/dss_copy @@ -615,6 +659,11 @@ test/event/event-test test/event/time-test test/monitoring/monitoring_test +test/monitoring/check_monitoring +test/monitoring/example_reduce_count +test/monitoring/test_overhead +test/monitoring/test_pvar_access + test/mpi/environment/chello @@ -647,3 +696,7 @@ test/util/opal_sos test/util/opal_path_nfs test/util/opal_path_nfs.out test/util/opal_bit_ops +test/util/bipartite_graph + +opal/test/reachable/reachable_netlink +opal/test/reachable/reachable_weighted diff --git a/.mailmap b/.mailmap index 12fb6064507..e8e71435ca9 100644 --- a/.mailmap +++ b/.mailmap @@ -107,3 +107,7 @@ Alex Mikheev Thomas Naughton Geoffrey Paulsen + +Anandhi S Jayakumar + +Mohan Gandhi diff --git a/.travis.yml b/.travis.yml index 1a2463543b5..2ed21afd01b 100644 --- a/.travis.yml +++ b/.travis.yml @@ -14,10 +14,9 @@ compiler: - gcc - clang -# Iterate over 2 different OSs +# Test only linux now os: - linux - - osx addons: # For Linux, make sure we have some extra packages that we like to diff --git a/AUTHORS b/AUTHORS deleted file mode 100644 index 5f48fce071b..00000000000 --- a/AUTHORS +++ /dev/null @@ -1,363 +0,0 @@ -Open MPI Authors -================ - -The following cumulative list contains the names and email addresses -of all individuals who have committed code to the Open MPI repository -(either directly or through a third party, such as through a -Github.com pull request). 
Note that these email addresses are not -guaranteed to be current; they are simply a unique indicator of the -individual who committed them. - ------ - -Abhishek Joshi, Broadcom - abhishek.joshi@broadcom.com -Abhishek Kulkarni, Indiana University - adkulkar@cs.indiana.edu -Aboorva Devarajan, IBM - abodevar@in.ibm.com -Adrian Knoth, Friedrich-Schiller-Universitat Jena - adi@minet.uni-jena.de -Adrian Reber, Hochschule Esslingen - adrian@lisas.de -Alejandro Vilches, Intel - alejandro.vilches@intel.com -Aleksey Senin, Mellanox - alekseys@mellanox.com -Alex Margolin, Mellanox - alex.margolin@mail.huji.ac.il -Alex Mikheev, Mellanox - alexm@mellanox.com -Alina Sklarevich, Mellanox - alinas@mellanox.com -Anandhi S Jayakumar, Intel - anandhi.s.jayakumar@intel.com -Andreas Knüpfer, Technische Universitaet Dresden - andreas.knuepfer@tu-dresden.de -Andrew Friedley, Indiana University, Sandia National Laboratory, Intel - afriedle@osl.iu.edu - andrew.friedley@intel.com -Andrew Lumsdaine, Indiana University - lums@cs.indiana.edu -Annapurna Dasari, Intel - annapurna.dasari@intel.com -Anya Tatashina, Sun - anya.tatashina@sun.com -Artem Polyakov, Individual, Mellanox - artpol84@gmail.com -Aurelien Bouteiller, University of Tennessee-Knoxville - bouteill@icl.utk.edu - darter4.nics.utk.edu -Avneesh Pant, QLogic - avneesh.pant@qlogic.com -Bert Wesarg, Technische Universitaet Dresden - bert.wesarg@tu-dresden.de -Bill D'Amico, Cisco - bdamico@cisco.com -Boris Karasev, Mellanox - karasev.b@gmail.com -Brad Benton, IBM, AMD - brad.benton@us.ibm.com -Brad Penoff, University of British Columbia - penoff@cs.ubc.ca -Brian Barrett, Indiana University, Los Alamos National Laboratory, Sandia National Laboratory - brbarret@open-mpi.org -Brice Goglin, INRIA - brice.goglin@inria.fr -Camille Coti, University of Tennessee-Knoxville, INRIA - ccoti@icl.utk.edu -Carlos Bederián, Individual - bc@famaf.unc.edu.ar -Christian Bell, QLogic - christian.bell@qlogic.com -Christoph Niethammer, High Performance 
Computing Center, Stuttgart - niethammer@hlrs.de -Christopher Yeoh, IBM - cyeoh@au1.ibm.com -Clement Foyer, INRIA - clement.foyer@inria.fr -Craig E Rasmussen, Los Alamos National Laboratory, University of Oregon - rasmus@cas.uoregon.edu -Dan Lacher, Sun - dan.lacher@sun.com -Dave Goodell, Cisco - davidjgoodell@gmail.com - dgoodell@cisco.com -David Daniel, Los Alamos National Laboratory - ddd@lanl.gov -Denis Dimick, Los Alamos National Laboratory - dgdimick@lnal.gov -Devendar Bureddy, Mellanox - devendar@mellanox.com -Dimitar Pashov, Individual - d.pashov@gmail.com -Donald Kerr, Sun, Oracle - donald.kerr@oracle.com -Doron Shoham, Mellanox - dorons@mellanox.com -Edgar Gabriel, High Performance Computing Center, Stuttgart, University of Tennessee-Knoxville, University of Houston - gabriel@cs.uh.edu -Elena Elkina, Mellanox - elena.elkina89@gmail.com - elena.elkina@itseez.com -Ethan Mallove, Sun, Oracle - ethan.mallove@oracle.com -Eugene Loh, Sun, Oracle - eugene.loh@oracle.com -Federico Reghenzani, Individual - federico1.reghenzani@mail.polimi.it -Francois WELLENREITER, Individual - francois.wellenreiter@atos.net - wellen@free.fr -Gabriel Pichot, Individual - gabriel.pichot@gmail.com -Galen Shipman, Los Alamos National Laboratory - gshipman@lanl.gov -Geoffrey Paulsen, IBM - gpaulsen@us.ibm.com -George Bosilca, University of Tennessee-Knoxville - bosilca@eecs.utk.edu - bosilca@icl.utk.edu -Gilles Gouaillardet, Research Organization for Information Science and Technology - gilles.gouaillardet@iferc.org - gilles@rist.or.jp -Ginger Young, Los Alamos National Laboratory - gingery@lanl.gov -Gleb Natapov, Voltaire - gleb@voltaire.com -Gopal Santhanaraman, The Ohio State University - santhana@osu.edu -Graham Fagg, University of Tennessee-Knoxville - gef@icl.utk.edu -Greg Koenig, Oak Ridge National Laboratory - koenig@acm.org -Greg Watson, Los Alamos National Laboratory - gwatson@lanl.gov -Gregory M. 
Kurtzer, Lawrence Berkeley National Laboratory - gmkurtzer@lbl.gov -Guillaume Papauré, Bull - guillaume.papaure@bull.net -Hadi Montakhabi, University of Houston - hmontakhabi@uh.edu -Howard Pritchard, Los Alamos National Laboratory - howardp@lanl.gov - hppritcha@gmail.com -Iain Bason, Sun, Oracle - iain.bason@oracle.com -Igor Ivanov, Mellanox - igor.ivanov.va@gmail.com - igor.ivanov@itseez.com -Igor Usarov, Mellanox - igoru@mellanox.com -Jeff Squyres, University of Indiana, Cisco - jeff@squyres.com - jsquyres@cisco.com -Jelena Pjesivac-Grbovic, University of Tennessee-Knoxville - pjesa@icl.iu.edu -Jijo Varghese, Individual - jijo733@gmail.com -Jithin Jose, Intel - jithin.jose@intel.com -John Westlund, Intel - john.a.westlund@intel.com -Jon Mason, OpenGrid Computing - jdmason@opengridcomputing.com -Jose Roman, Universitat Politecnica de Valencia - jroman@dsic.upv.es -Josh Hursey, Indiana University, Oak Ridge National Laboratory, Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, University of Wisconsin-La Crosse, IBM - jhursey@us.ibm.com - jjhursey@open-mpi.org -Joshua Gerrard, Individual - enquiries@joshuagerrard.com - joshuagerrard+ompi-commit@protonmail.com -Joshua Ladd, Mellanox - jladd.mlnx@gmail.com - joshual@mellanox.com -KAWASHIMA Takahiro, Fujistu - t-kawashima@jp.fujitsu.com -Karen Norteman, Sun - karen.norteman@sun.com -Karol Mroz, University of British Columbia - mroz.karol@gmail.com -Kenneth Matney, Oak Ridge National Laboratory - matneykdsr@ornl.gov -L. R. 
Rajeshnarayanan, Intel - l.r.rajeshnarayanan@intel.com -LANL OMPI Bot, Los Alamos National Laboratory - openmpihpp@gmail.com -Laura Casswell, Los Alamos National Laboratory - lcasswell@lanl.gov -Lenny Verkhovsky, Voltaire - lennyb@voltaire.com -Leobardo Ruiz Rountree, Individual - lruizrountree@gmail.com -Li-Ta Lo, Los Alamos National Laboratory - ollie@lanl.gov -MPI Team (bot), self - mpiteam@open-mpi.org -Mangala Jyothi Bhaskar, University of Houston - mjbhaskar@crill.cs.uh.edu - mjbhaskar@salmon.cs.uh.edu - mjbhaskar@uh.edu -Manjunath Gorentla Venkata, Oak Ridge National Laboratory - manjugv@ornl.gov -Mark Allen, IBM - markalle@us.ibm.com -Mark Santcroos, Rutgers University - m.a.santcroos@amc.uva.nl - mas781@scarletmail.rutgers.edu -Mark Taylor, Los Alamos National Laboratory - mt@lanl.gov -Martin Kontsek, Cisco - mkontsek@cisco.com -Matias A Cabral, Intel - matias.a.cabral@intel.com -Matthias Jurenz, Technische Universitaet Dresden - matthias.jurenz@tu-dresden.de -Maximilien Levesque, Individual - maximilien.levesque@gmail.com -Mike Dubman, Mellanox - miked@mellanox.com -Mitch Sukalski, Sandia National Laboratory - mwsukal@ca.sandia.gov -Mohamad Chaarawi, University of Houston - mschaara@cs.uh.edu -Nadezhda Kogteva, Mellanox - nadezhda.kogteva@itseez.com - nadezhda@mngx-orion-01.dmz.e2e.mlnx -Nadia Derbey, Bull - nadia.derbey@bull.net -Nathan Hjelm, Los Alamos National Laboratory - hjelmn@cs.unm.edu - hjelmn@lanl.gov - hjelmn@me.com -Nathaniel Graham, Los Alamos National Laboratory - ngraham@lanl.gov - nrgraham23@gmail.com -Nick Papior Andersen, Individual - nickpapior@gmail.com -Nicolas Chevalier, Bull - nicolas.chevalier@bull.net -Nysal Jan K A, IBM - jnysal@gmail.com - jnysal@in.ibm.com -Omri Mor - omri50@gmail.com -Orion Poplawski, Individual - orion@cora.nwra.com -Oscar Vega-Gisbert, Universitat Politecnica de Valencia - ovega@dsic.upv.es -Pak Lui, Sun - pak.lui@sun.com -Pascal Deveze, Bull - pascal.deveze@atos.net -Patrick Geoffray, Myricom - 
patrick@myri.com -Pavel Shamis, Mellanox, Oak Ridge National Laboratory - shamisp@ornl.gov -Pierre Lemarinier, University of Tennessee-Knoxville - lemarini@icl.utk.edu -Piotr Lesnicki, Bull - piotr.lesnicki@ext.bull.net -Potnuri Bharat Teja, Chelsio - bharat@chelsio.com -Prabhanjan Kambadur, Indiana University - pkambadu@osl.iu.edu -Raghavendra Pendyala, Intel - raghavendra.p.pendyala@intel.com -Rainer Keller, High Performance Computing Center, Stuttgart, Oak Ridge National Laboratory, Hochschule fuer Technik Stuttgart - rainer.keller@hft-stuttgart.de - rainer.keller@hlrs.de -Ralph Castain, Los Alamos National Laboratory, Cisco, Greenplum, Intel - rhc@open-mpi.org -Reese Faucette, Cisco - rfaucett@cisco.com -Rich Graham, Los Alamos National Laboratory, Oak Ridge National Laboratory, Mellanox - richardg@mellanox.com -Rob Awles, Los Alamos National Laboratory - rta@lanl.gov -Rob Latham, Argonne National Laboratory - robl@mcs.anl.gov -Rolf vandeVaart, Sun, Oracle, NVIDIA - rvandevaart@nvidia.com -Ron Brightwell, Sandia National Laboratory - rbbrigh@sandia.gov -Ryan Grant, Sandia National Laboratory - regrant233@gmail.com - regrant@sandia.gov -Sameh S. 
Sharkawi, IBM - sssharka@us.ibm.com -Sami Ayyorgun, Los Alamos National Laboratory - sami@lanl.gov -Samuel Gutierrez, Los Alamos National Laboratory - samuel@lanl.gov -Sayantan Sur, The Ohio State University - surs@osu.edu -Sharon Melamed, Voltaire - sharonm@voltaire.com -Shiqing Fan, High Performance Computing Center, Stuttgart - shiqing@hlrs.de -Steve Wise, OpenGrid Computing - swise@opengridcomputing.com -Sushant Sharma, Los Alamos National Laboratory - sushant@lanl.gov -Sven Stork, High Performance Computing Center, Stuttgart - stork@hlrs.de -Swen Boehm, Oak Ridge National Laboratory - sboehm@ornl.gov -Sylvain Jeaugey, Bull, NVIDIA - sjeaugey@nvidia.com - sylvain.jeaugey@bull.net -Teng Lin, Individual - teng.lin@gmail.com -Terry Dontje, Sun, Oracle - terry.dontje@oracle.com -Thananon Patinyasakdikul, Cisco, University of Tennessee-Knoxville - apatinya@cisco.com - tpatinya@utk.edu -Thara Angskun, University of Tennessee-Knoxville - angskun@cs.utk.edu -Thomas Herault, University of Tennessee-Knoxville - herault@icl.utk.edu -Thomas Naughton, Oak Ridge National Laboratory - naughtont@ornl.gov -Tim Mattox, Indiana University, Cisco, Individual - timothy.mattox@engilitycorp.com - tmattox@gmail.com -Tim Prins, Indiana University, Los Alamos National Laboratory - tprins@lanl.gov -Tim Woodall, Los Alamos National Laboratory - twoodall@lanl.gov -Todd Kordenbrock, Sandia National Laboratory - thkgcode@gmail.com - thkorde@sandia.gov -Tomislav Janjusic, Mellanox - tomislavj@mngx-apl-01.mtl.labs.mlnx -Torsten Hoefler, Indiana University, Technische Universtaet Chemnitz - htor@osl.iu.edu -Valentin Petrov, Mellanox - valentinp@mellanox.com -Vasily Filipov, Mellanox - vasily@mellanox.com -Vishal Sahay, Indiana University - vsahay@osl.iu.edu -Vishwanath Venkatesan, University of Houston, Intel - vvenkates@gmail.com -Weikuan Yu, Los Alamos National Laboratory - yuw@lanl.gov -Wesley Bland, University of Tennessee-Knoxville - wbland@icl.utk.edu -William Throwe, Individual - 
wtt6@cornell.edu -Xin Zhao, Mellanox - xinz@mellanox.com -Yael Dayan, Mellanox - yaeld@mellanox.com -Yevgeny Kliteynik, Mellanox - kliteyn@mellanox.co.il -Yohann Burette, Intel - yohann.burette@intel.com -Yossi Itigin, Mellanox - yosefe@mellanox.com -Zhi Ming Wang, IBM - wangzm@cn.ibm.com diff --git a/LICENSE b/LICENSE index 0599a587acb..c835765b580 100644 --- a/LICENSE +++ b/LICENSE @@ -8,24 +8,24 @@ corresponding files. Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana University Research and Technology Corporation. All rights reserved. -Copyright (c) 2004-2010 The University of Tennessee and The University +Copyright (c) 2004-2017 The University of Tennessee and The University of Tennessee Research Foundation. All rights reserved. Copyright (c) 2004-2010 High Performance Computing Center Stuttgart, University of Stuttgart. All rights reserved. Copyright (c) 2004-2008 The Regents of the University of California. All rights reserved. -Copyright (c) 2006-2010 Los Alamos National Security, LLC. All rights +Copyright (c) 2006-2017 Los Alamos National Security, LLC. All rights reserved. -Copyright (c) 2006-2010 Cisco Systems, Inc. All rights reserved. +Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved. Copyright (c) 2006-2010 Voltaire, Inc. All rights reserved. -Copyright (c) 2006-2011 Sandia National Laboratories. All rights reserved. +Copyright (c) 2006-2017 Sandia National Laboratories. All rights reserved. Copyright (c) 2006-2010 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. -Copyright (c) 2006-2010 The University of Houston. All rights reserved. +Copyright (c) 2006-2017 The University of Houston. All rights reserved. Copyright (c) 2006-2009 Myricom, Inc. All rights reserved. -Copyright (c) 2007-2008 UT-Battelle, LLC. All rights reserved. -Copyright (c) 2007-2010 IBM Corporation. All rights reserved. +Copyright (c) 2007-2017 UT-Battelle, LLC. All rights reserved. 
+Copyright (c) 2007-2017 IBM Corporation. All rights reserved. Copyright (c) 1998-2005 Forschungszentrum Juelich, Juelich Supercomputing Centre, Federal Republic of Germany Copyright (c) 2005-2008 ZIH, TU Dresden, Federal Republic of Germany @@ -35,17 +35,26 @@ Copyright (c) 2008-2009 Institut National de Recherche en Informatique. All rights reserved. Copyright (c) 2007 Lawrence Livermore National Security, LLC. All rights reserved. -Copyright (c) 2007-2009 Mellanox Technologies. All rights reserved. +Copyright (c) 2007-2017 Mellanox Technologies. All rights reserved. Copyright (c) 2006-2010 QLogic Corporation. All rights reserved. -Copyright (c) 2008-2010 Oak Ridge National Labs. All rights reserved. -Copyright (c) 2006-2010 Oracle and/or its affiliates. All rights reserved. -Copyright (c) 2009 Bull SAS. All rights reserved. +Copyright (c) 2008-2017 Oak Ridge National Labs. All rights reserved. +Copyright (c) 2006-2012 Oracle and/or its affiliates. All rights reserved. +Copyright (c) 2009-2015 Bull SAS. All rights reserved. Copyright (c) 2010 ARM ltd. All rights reserved. +Copyright (c) 2016 ARM, Inc. All rights reserved. Copyright (c) 2010-2011 Alex Brick . All rights reserved. Copyright (c) 2012 The University of Wisconsin-La Crosse. All rights reserved. Copyright (c) 2013-2016 Intel, Inc. All rights reserved. -Copyright (c) 2011-2014 NVIDIA Corporation. All rights reserved. +Copyright (c) 2011-2017 NVIDIA Corporation. All rights reserved. +Copyright (c) 2016 Broadcom Limited. All rights reserved. +Copyright (c) 2011-2017 Fujitsu Limited. All rights reserved. +Copyright (c) 2014-2015 Hewlett-Packard Development Company, LP. All + rights reserved. +Copyright (c) 2013-2017 Research Organization for Information Science (RIST). + All rights reserved. +Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights + reserved. 
$COPYRIGHT$ diff --git a/Makefile.am b/Makefile.am index a4eba0a207f..99316916f74 100644 --- a/Makefile.am +++ b/Makefile.am @@ -12,6 +12,8 @@ # Copyright (c) 2006-2016 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2012-2015 Los Alamos National Security, Inc. All rights reserved. # Copyright (c) 2014 Intel, Inc. All rights reserved. +# Copyright (c) 2017-2018 Amazon.com, Inc. or its affiliates. +# All Rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -20,12 +22,17 @@ # SUBDIRS = config contrib $(MCA_PROJECT_SUBDIRS) test -EXTRA_DIST = README INSTALL VERSION Doxyfile LICENSE autogen.pl README.JAVA.txt +DIST_SUBDIRS = config contrib $(MCA_PROJECT_DIST_SUBDIRS) test +EXTRA_DIST = README INSTALL VERSION Doxyfile LICENSE autogen.pl README.JAVA.txt AUTHORS include examples/Makefile.include dist-hook: env LS_COLORS= sh "$(top_srcdir)/config/distscript.sh" "$(top_srcdir)" "$(distdir)" "$(OMPI_REPO_REV)" + @if test ! -s $(distdir)/AUTHORS ; then \ + echo "AUTHORS file is empty; aborting distribution"; \ + exit 1; \ + fi # Check for common symbols. Use a "-hook" to increase the odds that a # developer will see it at the end of their installation process. @@ -39,3 +46,15 @@ install-exec-hook: fi ACLOCAL_AMFLAGS = -I config + +# Use EXTRA_DIST and an explicit target (with a FORCE hack so that +# it's always run) rather than a dist-hook because there's some magic +# extra logic in Automake that will add AUTHORS to EXTRA_DIST if the +# file exists when Automake is run. Once we're explicit (to avoid odd +# copy behavior), it's easier to always build AUTHORS here, rather +# than trying to handle the EXTRA_DIST dependency from a clean repo +# (no AUTHORS file present) and use dist-hook to run every time. 
+AUTHORS: FORCE + $(PERL) "$(top_srcdir)/contrib/dist/make-authors.pl" --skip-ok --quiet --srcdir="$(top_srcdir)" + +FORCE: diff --git a/NEWS b/NEWS index 25238502017..3c6ec8f7fc8 100644 --- a/NEWS +++ b/NEWS @@ -8,18 +8,20 @@ Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, University of Stuttgart. All rights reserved. Copyright (c) 2004-2006 The Regents of the University of California. All rights reserved. -Copyright (c) 2006-2016 Cisco Systems, Inc. All rights reserved. +Copyright (c) 2006-2018 Cisco Systems, Inc. All rights reserved. Copyright (c) 2006 Voltaire, Inc. All rights reserved. Copyright (c) 2006 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. -Copyright (c) 2006-2016 Los Alamos National Security, LLC. All rights +Copyright (c) 2006-2017 Los Alamos National Security, LLC. All rights reserved. -Copyright (c) 2010-2012 IBM Corporation. All rights reserved. +Copyright (c) 2010-2017 IBM Corporation. All rights reserved. Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. Copyright (c) 2012 Sandia National Laboratories. All rights reserved. Copyright (c) 2012 University of Houston. All rights reserved. Copyright (c) 2013 NVIDIA Corporation. All rights reserved. -Copyright (c) 2013-2016 Intel, Inc. All rights reserved. +Copyright (c) 2013-2018 Intel, Inc. All rights reserved. +Copyright (c) 2018 Amazon.com, Inc. or its affiliates. All Rights + reserved. $COPYRIGHT$ Additional copyrights may follow @@ -56,80 +58,817 @@ included in the vX.Y.Z section and be denoted as: Master (not on release branches yet) ------------------------------------ +- Fix rank-by algorithms to properly rank by object and span +- Do not build the Open SHMEM layer when there are no SPMLs available. + Currently, this means the Open SHMEM layer will only build if + an MXM or UCX library is found. + +3.1.0 -- May, 2018 +------------------ + +- Various OpenSHMEM bug fixes.
+- Properly handle the array_of_commands argument to the Fortran version of + MPI_COMM_SPAWN_MULTIPLE. +- Fix a bug with MODE_SEQUENTIAL and the sharedfp MPI-IO component. +- Use "javac -h" instead of "javah" when building the Java bindings + with a recent version of Java. +- Fix mis-handling of jobstepid under SLURM that could cause problems + with PathScale/OmniPath NICs. +- Disable the POWER 7/BE block in configure. Note that POWER 7/BE is + still not a supported platform, but it is no longer automatically + disabled. See + https://github.com/open-mpi/ompi/issues/4349#issuecomment-374970982 + for more information. +- The output-filename option for mpirun is now converted to an + absolute path before being passed to other nodes. +- Add a monitoring component for PML, OSC, and COLL to track data + movement of MPI applications. See + ompi/mca/common/monitoring/HowTo_pml_monitoring.tex for more + information about the monitoring framework. +- Add support for communicator assertions: mpi_assert_no_any_tag, + mpi_assert_no_any_source, mpi_assert_exact_length, and + mpi_assert_allow_overtaking. +- Update PMIx to version 2.1.1. +- Update hwloc to 1.11.7. +- Many one-sided behavior fixes. +- Improved performance for Reduce and Allreduce using Rabenseifner's algorithm. +- Revamped mpirun --help output to make it a bit more manageable. +- Portals4 MTL improvements: Fix a race condition in the rendezvous protocol and + retry logic. +- UCX OSC: initial implementation. +- UCX PML improvements: add multi-threading support. +- Yalla PML improvements: Fix error with irregular contiguous datatypes. +- OpenIB BTL: disable XRC support by default. +- TCP BTL: Add check to detect and ignore connections from processes + that aren't MPI (such as IDS probes) and verify that source and + destination are using the same version of Open MPI; fix issue with very + large message transfers. - ompi_info parsable output now escapes double quotes in values, and also quotes values can contains colons.
Thanks to Lev Givon for the suggestion. - CUDA-aware support can now handle GPUs within a node that do not support CUDA IPC. Earlier versions would get error and abort. -- Do not build the MPI C++ bindings by default. They must be enabled - via --enable-mpi-cxx. -- Removed embedded VampirTrace. It is in maintenance mode since 2013. - Please consider Score-P (score-p.org) as an external replacement. +- Add an MCA parameter ras_base_launch_orted_on_hn to allow for launching + MPI processes on the same node where mpirun is executing using a separate + orte daemon, rather than the mpirun process. This may be useful to set to + true when using SLURM, as it improves interoperability with SLURM's signal + propagation tools. By default it is set to false, except for Cray XC systems. +- Remove LoadLeveler RAS support. +- Remove IB XRC support from the OpenIB BTL due to lack of support. +- Add functionality for IBM s390 platforms. Note that regular + regression testing does not occur on the s390 and it is not + considered a supported platform. +- Remove support for big endian PowerPC. +- Remove support for XL compilers older than v13.1. +- Remove support for atomic operations using the MacOS atomics library. + +3.0.1 -- March, 2018 +---------------------- + +- Fix ability to attach parallel debuggers to MPI processes. +- Fix a number of issues in MPI I/O found by the HDF5 test suite. +- Fix (extremely) large message transfers with shared memory. +- Fix out-of-sequence bug in multi-NIC configurations. +- Fix stdin redirection bug that could result in lost input. +- Disable the LSF launcher if CSM is detected. +- Plug a memory leak in MPI_Free_mem(). Thanks to Philip Blakely for reporting. +- Fix the tree spawn operation when the number of nodes is larger than the radix. + Thanks to Carlos Eduardo de Andrade for reporting. +- Fix Fortran 2008 macro in MPI extensions. Thanks to Nathan T. Weeks for + reporting. +- Add UCX to the list of interfaces that OpenSHMEM will use by default.
+- Add --{enable|disable}-show-load-errors-by-default to control + default behavior of the load errors option. +- OFI MTL improvements: handle empty completion queues properly, fix + incorrect error message around fi_getinfo(), use default progress + option for provider by default, and add support for reading multiple + CQ events in ofi_progress. +- PSM2 MTL improvements: Allow use of GPU buffers, thread fixes. +- Numerous corrections to memchecker behavior. +- Add an MCA parameter ras_base_launch_orted_on_hn to allow for launching + MPI processes on the same node where mpirun is executing using a separate + orte daemon, rather than the mpirun process. This may be useful to set to + true when using SLURM, as it improves interoperability with SLURM's signal + propagation tools. By default it is set to false, except for Cray XC systems. +- Fix a problem reported separately on the mailing list by Kevin McGrattan and Stephen + Guzik about consistency issues on NFS file systems when using OMPIO. This fix + also introduces a new MCA parameter fs_ufs_lock_algorithm which controls + the locking algorithm used by ompio for read/write operations. By + default, ompio does not perform locking on local UNIX file systems, locks the + entire file per operation on NFS file systems, and uses selective byte-range + locking on other distributed file systems. +- Add an MCA parameter pmix_server_usock_connections to allow mpirun to + support applications statically built against the Open MPI v2.x release, + or installed in a container along with the Open MPI v2.x libraries. It is + set to false by default. + +3.0.0 -- September, 2017 +------------------------ + +Major new features: + +- Use UCX allocator for OSHMEM symmetric heap allocations to optimize intra-node + data transfers. UCX SPML only. +- Use UCX multi-threaded API in the UCX PML. Requires UCX 1.0 or later.
+- Added support for Flux PMI. +- Update embedded PMIx to version 2.1.0. +- Update embedded hwloc to version 1.11.7. + +Changes in behavior compared to prior versions: + +- Per Open MPI's versioning scheme (see the README), increasing the + major version number to 3 indicates that this version is not + ABI-compatible with prior versions of Open MPI. In addition, there may + be differences in MCA parameter names and defaults from previous releases. + Command line options for mpirun and other commands may also differ from + previous versions. You will need to recompile MPI and OpenSHMEM applications + to work with this version of Open MPI. +- With this release, Open MPI supports MPI_THREAD_MULTIPLE by default. +- New configure options have been added to specify the locations of libnl + and zlib. +- A new configure option has been added to request Flux PMI support. +- The help menu for mpirun and related commands is now context based. + "mpirun --help compatibility" generates the help menu in the same format + as previous releases. + +Removed legacy support: +- AIX is no longer supported. +- LoadLeveler is no longer supported. +- OpenSHMEM currently supports the UCX and MXM transports via the ucx and ikrit + SPMLs respectively. +- Remove IB XRC support from the OpenIB BTL due to lack of support. +- Remove support for big endian PowerPC. +- Remove support for XL compilers older than v13.1. + +Known issues: + +- MPI_Comm_connect/MPI_Comm_accept between applications started by different mpirun + commands will fail, even if ompi-server is running. + +2.1.3 -- March, 2018 +-------------------- + +Bug fixes/minor improvements: +- Update internal PMIx version to 1.2.5. +- Fix a problem with ompi_info reporting when using the --param option. + Thanks to Alexander Pozdneev for reporting. +- Correct PMPI_Aint_{add|diff} to be functions (not subroutines) + in the Fortran mpi_f08 module. +- Fix a problem when doing MPI I/O using data types with large + extents in conjunction with MPI_TYPE_CREATE_SUBARRAY.
Thanks to + Christopher Brady for reporting. +- Fix a problem when opening many files using MPI_FILE_OPEN. + Thanks to William Dawson for reporting. +- Fix a problem with debuggers failing to attach to a running job. + Thanks to Dirk Schubert for reporting. +- Fix a problem when using madvise and the OpenIB BTL. Thanks to + Timo Bingmann for reporting. +- Fix a problem in the Vader BTL that resulted in failures of + IMB under certain circumstances. Thanks to + Nicolas Morey-Chaisemartin for reporting. +- Fix a problem preventing Open MPI from working under Cygwin. + Thanks to Marco Atzeri for reporting. +- Reduce some verbosity being emitted by the USNIC BTL under certain + circumstances. Thanks to Peter Forai for reporting. +- Fix a problem with misdirection of SIGKILL. Thanks to Michael Fern + for reporting. +- Replace use of posix_memalign with malloc for small allocations. Thanks + to Ben Menaude for reporting. +- Fix a problem with Open MPI's out-of-band TCP network for file descriptors + greater than 32767. Thanks to Wojtek Wasko for reporting and fixing. +- Plug a memory leak in MPI_Free_mem(). Thanks to Philip Blakely for reporting. + +2.1.2 -- September, 2017 +------------------------ + +Bug fixes/minor improvements: +- Update internal PMIx version to 1.2.3. +- Fix some problems when using the NAG Fortran compiler to build Open MPI + and when using the compiler wrappers. Thanks to Neil Carlson for reporting. +- Fix a compilation problem with the SM BTL. Thanks to Paul Hargrove for + reporting. +- Fix a problem with MPI_IALLTOALLW when using zero-length messages. + Thanks to Dahai Guo for reporting. +- Fix a problem with C11 generic type interface for SHMEM_G. Thanks + to Nick Park for reporting. +- Switch to using the lustreapi.h include file when building Open MPI + with Lustre support. +- Fix a problem in the OB1 PML that led to hangs with OSU collective tests. +- Fix a progression issue with MPI_WIN_FLUSH_LOCAL.
Thanks to + Joseph Schuchart for reporting. +- Fix an issue with recent versions of PBS Pro requiring libcrypto. + Thanks to Petr Hanousek for reporting. +- Fix a problem when using MPI_ANY_SOURCE with MPI_SENDRECV. +- Fix an issue that prevented signals from being propagated to ORTE + daemons. +- Ensure that signals are forwarded from ORTE daemons to all processes + in the process group created by the daemons. Thanks to Ted Sussman + for reporting. +- Fix a problem with launching a job under a debugger. Thanks to + Greg Lee for reporting. +- Fix a problem with Open MPI's native I/O MPI_FILE_OPEN when using + a communicator having an associated topology. Thanks to + Wei-keng Liao for reporting. +- Fix an issue when using MPI_ACCUMULATE with derived datatypes. +- Fix a problem with Fortran bindings that led to compilation errors + for user-defined reduction operations. Thanks to Nathan Weeks for + reporting. +- Fix ROMIO issues with large writes/reads when using NFS file systems. +- Fix definition of Fortran MPI_ARGV_NULL and MPI_ARGVS_NULL. +- Enable use of the head node of a SLURM allocation on Cray XC systems. +- Fix a problem with synchronous sends when using the UCX PML. +- Use default socket buffer size to improve TCP BTL performance. +- Add an MCA parameter ras_base_launch_orted_on_hn to allow for launching + MPI processes on the same node where mpirun is executing using a separate + orte daemon, rather than the mpirun process. This may be useful to set to + true when using SLURM, as it improves interoperability with SLURM's signal + propagation tools. By default it is set to false, except for Cray XC systems. +- Fix --without-lsf when lsf is installed in the default search path. +- Remove support for big endian PowerPC. +- Remove support for XL compilers older than v13.1. +- Remove IB XRC support from the OpenIB BTL due to loss of maintainer.
+ +2.1.1 -- April, 2017 +-------------------- + +Bug fixes/minor improvements: + +- Fix a problem with one of Open MPI's fifo data structures which led to + hangs in a make check test. Thanks to Nicolas Morey-Chaisemartin for + reporting. +- Add missing MPI_AINT_ADD/MPI_AINT_DIFF function definitions to mpif.h. + Thanks to Aboorva Devarajan for reporting. +- Fix the error return from MPI_WIN_LOCK when the rank argument is invalid. + Thanks to Jeff Hammond for reporting and fixing this issue. +- Fix a problem with mpirun/orterun when started under a debugger. Thanks + to Gregory Leff for reporting. +- Add configury option to disable use of CMA by the vader BTL. Thanks + to Sascha Hunold for reporting. +- Add configury check for MPI_DOUBLE_COMPLEX datatype support. + Thanks to Alexander Klein for reporting. +- Fix memory allocated by MPI_WIN_ALLOCATE_SHARED to + be 64 bit aligned. Thanks to Joseph Schuchart for + reporting. +- Update MPI_WTICK man page to reflect possibly higher + resolution than 1e-6. Thanks to Mark Dixon for + reporting. +- Add missing MPI_T_PVAR_SESSION_NULL definition to mpi.h + include file. Thanks to Omri Mor for this contribution. +- Enhance the Open MPI spec file to install modulefile in /opt + if installed in a non-default location. Thanks to Kevin + Buckley for reporting and supplying a fix. +- Fix a problem with conflicting PMI symbols when linking statically. + Thanks to Kilian Cavalotti for reporting. + +Known issues (to be addressed in v2.1.2): + +- See the list of fixes slated for v2.1.2 here: + https://github.com/open-mpi/ompi/milestone/28 + +2.1.0 -- March, 2017 +-------------------- + +Major new features: + +- The main focus of the Open MPI v2.1.0 release was to update to PMIx + v1.2.1.
When using PMIx (e.g., via mpirun-based launches, or via + direct launches with recent versions of popular resource managers), + launch time scalability is improved, and the run time memory + footprint is greatly decreased when launching large numbers of MPI / + OpenSHMEM processes. +- Update OpenSHMEM API conformance to v1.3. +- The usnic BTL now supports MPI_THREAD_MULTIPLE. +- General/overall performance improvements to MPI_THREAD_MULTIPLE. +- Add a summary message at the bottom of configure that tells you many + of the configuration options specified and/or discovered by Open + MPI. + +Changes in behavior compared to prior versions: + +- None. + +Removed legacy support: + +- The ptmalloc2 hooks have been removed from the Open MPI code base. + This is not really a user-noticeable change; it is only mentioned + here because there was much rejoicing in the Open MPI developer + community. + +Bug fixes/minor improvements: + +- New MCA parameters: + - iof_base_redirect_app_stderr_to_stdout: as its name implies, it + combines MPI / OpenSHMEM applications' stderr into their stdout + stream. + - opal_event_include: allow the user to specify which FD selection + mechanism is used by the underlying event engine. + - opal_stacktrace_output: indicate where stacktraces should be sent + upon MPI / OpenSHMEM process crashes ("none", "stdout", "stderr", + "file:filename"). + - orte_timeout_for_stack_trace: number of seconds to wait for stack + traces to be reported (or <=0 to wait forever). + - mtl_ofi_control_prog_type/mtl_ofi_data_prog_type: specify libfabric + progress model to be used for control and data. +- Fix MPI_WTICK regression where the time reported may be inaccurate + on systems with processor frequency scaling enabled. +- Fix regression that lowered the memory maximum message bandwidth for + large messages on some BTL network transports, such as openib, sm, + and vader. +- Fix a name collision in the shared file pointer MPI IO file locking + scheme.
Thanks to Nicolas Joly for reporting the issue. +- Fix datatype extent/offset errors in MPI_PUT and MPI_RACCUMULATE + when using the Portals 4 one-sided component. +- Add support for non-contiguous datatypes to the Portals 4 one-sided + component. +- Various updates for the UCX PML. +- Updates to the following man pages: + - mpirun(1) + - MPI_COMM_CONNECT(3) + - MPI_WIN_GET_NAME(3). Thanks to Nicolas Joly for reporting the + typo. + - MPI_INFO_GET_[NKEYS|NTHKEY](3). Thanks to Nicolas Joly for + reporting the typo. +- Fixed a problem in the TCP BTL when using MPI_THREAD_MULTIPLE. + Thanks to Evgueni Petrov for reporting. +- Fixed external32 representation in the romio314 module. Note that + for now, external32 representation is not correctly supported by the + ompio module. Thanks to Thomas Gastine for bringing this to our + attention. +- Add a note on how to disable the warning message emitted when a high-speed + MPI transport is not found. Thanks to Susan Schwarz for reporting + the issue. +- Ensure that sending SIGINT when using the rsh/ssh launcher does not + orphan child nodes in the launch tree. +- Fix the help message when showing deprecated MCA param names to show + the correct (i.e., deprecated) name. +- Enable support for the openib BTL to use multiple different + InfiniBand subnets. +- Fix a minor error in MPI_AINT_DIFF. +- Fix bugs with MPI_IN_PLACE handling in: + - MPI_ALLGATHER[V] + - MPI_[I][GATHER|SCATTER][V] + - MPI_IREDUCE[_SCATTER] + - Thanks to all the users who helped diagnose these issues. +- Allow qrsh to tree spawn (if the back-end system supports it). +- Fix MPI_T_PVAR_GET_INDEX to return the correct index. +- Correctly position the shared file pointer in append mode in the + OMPIO component. +- Add some deprecated names into shmem.h for backwards compatibility + with legacy codes. +- Fix MPI_MODE_NOCHECK support. +- Fix a regression in PowerPC atomics support. Thanks to Orion + Poplawski for reporting the issue.
+- Fixes for assembly code with aggressively-optimized compilers on + x86_64/AMD64 platforms. +- Fix one more place where configure was mangling custom CFLAGS. + Thanks to Phil Tooley (@Telemin) for reporting the issue. +- Better handle builds with external installations of hwloc. +- Fixed a hang with MPI_PUT and MPI_WIN_LOCK_ALL. +- Fixed a bug when using MPI_GET on non-contiguous datatypes and + MPI_WIN_LOCK/MPI_WIN_UNLOCK. +- Fixed a bug when using POST/START/COMPLETE/WAIT after a fence. +- Fix configure portability by cleaning up a few uses of "==" with + "test". Thanks to Kevin Buckley for pointing out the issue. +- Fix a bug when using darrays with lb and extent of darray datatypes. +- Updates to make Open MPI binary builds more bit-for-bit + reproducible. Thanks to Alastair McKinstry for the suggestion. +- Fix issues regarding persistent request handling. +- Ensure that shmemx.h is a standalone OpenSHMEM header file. Thanks + to Nick Park (@nspark) for the report. +- Ensure that we always send SIGTERM prior to SIGKILL. Thanks to Noel + Rycroft for the report. +- Added ConnectX-5 and Chelsio T6 device defaults for the openib BTL. +- OpenSHMEM no longer supports MXM older than v2.0. +- Plug a memory leak in ompi_osc_sm_free. Thanks to Joseph Schuchart + for the report. +- The "self" BTL now uses less memory. +- The vader BTL is now more efficient in terms of memory usage when + using XPMEM. +- Removed the --enable-openib-failover configure option. This is not + considered backwards-incompatible because this option was stale and + had long since stopped working, anyway. +- Allow jobs launched under Cray aprun to use hyperthreads if the + opal_hwloc_base_hwthreads_as_cpus MCA parameter is set. +- Add support for 32-bit and floating point Cray Aries atomic + operations. +- Add support for network AMOs for MPI_ACCUMULATE, MPI_FETCH_AND_OP, + and MPI_COMPARE_AND_SWAP if the "ompi_single_intrinsic" info key is + set on the window or the "acc_single_intrinsic" MCA param is set.
+- Automatically disqualify RDMA CM support in the openib BTL if + MPI_THREAD_MULTIPLE is used. +- Make configure smarter/better about auto-detecting Linux CMA + support. +- Improve the scalability of MPI_COMM_SPLIT_TYPE. +- Fix the mixing of C99 and C++ header files with the MPI C++ + bindings. Thanks to Alastair McKinstry for the bug report. +- Add support for ARM v8. +- Several MCA parameters now directly support MPI_T enumerator + semantics (i.e., they accept a limited set of values -- e.g., MCA + parameters that accept boolean values). +- Added --with-libmpi-name=STRING configure option for vendor releases + of Open MPI. See the README for more detail. +- Fix a problem with Open MPI's internal memory checker. Thanks to Yvan + Fournier for reporting. +- Fix a multi-threaded issue with MPI_WAIT. Thanks to Pascal Deveze for + reporting. + +Known issues (to be addressed in v2.1.1): + +- See the list of fixes slated for v2.1.1 here: + https://github.com/open-mpi/ompi/milestone/26 + +2.0.4 -- November, 2017 +----------------------- + +Bug fixes/minor improvements: +- Fix an issue with visibility of functions defined in the built-in PMIx. + Thanks to Siegmar Gross for reporting this issue. +- Add a configure check to prevent trying to build this release of + Open MPI with an external hwloc 2.0 or newer release. +- Add ability to specify layered providers for the OFI MTL. +- Fix a correctness issue with Open MPI's memory manager code + that could result in corrupted message data. Thanks to + Valentin Petrov for reporting. +- Fix issues encountered when using newer versions of PBS Pro. + Thanks to Petr Hanousek for reporting. +- Fix a problem with MPI_GET when using the vader BTL. Thanks + to Dahai Guo for reporting. +- Fix a problem when using MPI_ANY_SOURCE with MPI_SENDRECV_REPLACE. + Thanks to Dahai Guo for reporting. +- Fix a problem using MPI_FILE_OPEN with a communicator with an + attached Cartesian topology. Thanks to Wei-keng Liao for reporting.
+- Remove IB XRC support from the OpenIB BTL due to lack of support. +- Remove support for big endian PowerPC. +- Remove support for XL compilers older than v13.1. + +2.0.3 -- June 2017 +------------------ -2.0.0 -- DATE ------- +Bug fixes/minor improvements: + + - Fix a problem with MPI_IALLTOALLW when zero-size messages are present. + Thanks to @mathbird for reporting. + - Add missing MPI_USER_FUNCTION definition to the mpi_f08 module. + Thanks to Nathan Weeks for reporting this issue. + - Fix a problem with MPI_WIN_LOCK not returning an error code when + a negative rank is supplied. Thanks to Jeff Hammond for reporting and + providing a fix. + - Fix a problem with make check that could lead to hangs. Thanks to + Nicolas Morey-Chaisemartin for reporting. + - Resolve a symbol conflict problem with PMI-1 and PMI-2 PMIx components. + Thanks to Kilian Cavalotti for reporting this issue. + - Ensure that memory allocations returned from MPI_WIN_ALLOCATE_SHARED are + 64 byte aligned. Thanks to Joseph Schuchart for reporting this issue. + - Make use of DOUBLE_COMPLEX, if available, for Fortran bindings. Thanks + to Alexander Klein for reporting this issue. + - Add missing MPI_T_PVAR_SESSION_NULL definition to Open MPI mpi.h include + file. Thanks to Omri Mor for reporting and fixing. + - Fix a problem with use of MPI shared file pointers when accessing + a file from independent jobs. Thanks to Nicolas Joly for reporting + this issue. + - Optimize zero-size MPI_IALLTOALL{V,W} with MPI_IN_PLACE. Thanks to + Lisandro Dalcin for the report. + - Fix a ROMIO buffer overflow problem for large transfers when using NFS + filesystems. + - Fix type of MPI_ARGV[S]_NULL which prevented it from being used + properly with MPI_COMM_SPAWN[_MULTIPLE] in the mpi_f08 module. + - Add proper linker flags to the wrapper compilers for + dynamic libraries on platforms that need them (e.g., RHEL 7.3 and + later).
+ - Get better performance on TCP-based networks of 10Gbps and higher by + using OS defaults for buffer sizing. + - Fix a bug with MPI_[R][GET_]ACCUMULATE when using DARRAY datatypes. + - Fix handling of --with-lustre configure command line argument. + Thanks to Prentice Bisbal and Tim Mattox for reporting the issue. + - Added MPI_AINT_ADD and MPI_AINT_DIFF declarations to mpif.h. Thanks + to Aboorva Devarajan (@AboorvaDevarajan) for the bug report. + - Fix a problem in the TCP BTL when Open MPI is initialized with + MPI_THREAD_MULTIPLE support. Thanks to Evgueni Petrov for analyzing and + reporting this issue. + - Fix yalla PML to properly handle underflow errors, and fix a + memory leak with blocking non-contiguous sends. + - Restored ability to run autogen.pl on official distribution tarballs + (although this is still not recommended for most users!). + - Fix accuracy problems with MPI_WTIME on some systems by always using + either clock_gettime(3) or gettimeofday(3). + - Fix a problem where MPI_WTICK was not returning a higher time resolution + when available. Thanks to Mark Dixon for reporting this issue. + - Restore SGE functionality. Thanks to Kevin Buckley for the initial + report. + - Fix external hwloc compilation issues, and extend support to allow + using external hwloc installations as far back as v1.5.0. Thanks to + Orion Poplawski for raising the issue. + - Added latest Mellanox Connect-X and Chelsio T-6 adapter part IDs to + the openib list of default values. + - Do a better job of cleaning up session directories (e.g., in /tmp). + - Update a help message to indicate how to suppress a warning about + no high performance networks being detected by Open MPI. Thanks to + Susan Schwarz for reporting this issue. + - Fix a problem with mangling of custom CFLAGS when configuring Open MPI. + Thanks to Phil Tooley for reporting. + - Fix some minor memory leaks and remove some unused variables. + Thanks to Joshua Gerrard for reporting.
+ - Fix MPI_ALLGATHERV bug with MPI_IN_PLACE. + +Known issues (to be addressed in v2.0.4): + +- See the list of fixes slated for v2.0.4 here: + https://github.com/open-mpi/ompi/milestone/29 + +2.0.2 -- 26 January 2017 +------------------------- + +Bug fixes/minor improvements: + +- Fix a problem with MPI_FILE_WRITE_SHARED when using MPI_MODE_APPEND and + Open MPI's native MPI-IO implementation. Thanks to Nicolas Joly for + reporting. +- Fix a typo in the MPI_WIN_GET_NAME man page. Thanks to Nicolas Joly + for reporting. +- Fix a race condition with ORTE's session directory setup. Thanks to + @tbj900 for reporting this issue. +- Fix a deadlock issue arising from Open MPI's approach to catching calls to + munmap. Thanks to Paul Hargrove for reporting and helping to analyze this + problem. +- Fix a problem with PPC atomics which caused make check to fail unless the builtin + atomics configure option was enabled. Thanks to Orion Poplawski for reporting. +- Fix a problem with use of x86_64 cpuid instruction which led to segmentation + faults when Open MPI was configured with -O3 optimization. Thanks to Mark + Santcroos for reporting this problem. +- Fix a problem when using built-in atomics configure options on PPC platforms + when building 32-bit applications. Thanks to Paul Hargrove for reporting. +- Fix a problem with building Open MPI against an external hwloc installation. + Thanks to Orion Poplawski for reporting this issue. +- Remove use of DATE in the message queue version string reported to debuggers to + ensure bit-wise reproducibility of binaries. Thanks to Alastair McKinstry + for help in fixing this problem. +- Fix a problem with early exit of an MPI process without calling MPI_FINALIZE + or MPI_ABORT that could lead to job hangs. Thanks to Christof Koehler for + reporting. +- Fix a problem with forwarding of SIGTERM signal from mpirun to MPI processes + in a job.
Thanks to Noel Rycroft for reporting this problem. +- Plug some memory leaks in MPI_WIN_FREE discovered using Valgrind. Thanks + to Joseph Schuchart for reporting. +- Fix a problem with MPI_NEIGHBOR_ALLTOALL when using a communicator with an empty topology + graph. Thanks to Daniel Ibanez for reporting. +- Fix a typo in a PMIx component help file. Thanks to @njoly for reporting this. +- Fix a problem with Valgrind false positives when using Open MPI's internal memchecker. + Thanks to Yvan Fournier for reporting. +- Fix a problem with MPI_FILE_DELETE returning MPI_SUCCESS when + deleting a non-existent file. Thanks to Wei-keng Liao for reporting. +- Fix a problem with MPI_IMPROBE that could lead to hangs in subsequent MPI + point-to-point or collective calls. Thanks to Chris Pattison for reporting. +- Fix a problem when configuring Open MPI for powerpc with --enable-mpi-cxx + enabled. Thanks to Alastair McKinstry for reporting. +- Fix a problem using MPI_IALLTOALL with the MPI_IN_PLACE argument. Thanks to + Chris Ward for reporting. +- Fix a problem using MPI_RACCUMULATE with the Portals4 transport. Thanks to + @PDeveze for reporting. +- Fix an issue with static linking and duplicate symbols arising from PMIx + Slurm components. Thanks to Limin Gu for reporting. +- Fix a problem when using MPI dynamic memory windows. Thanks to + Christoph Niethammer for reporting. +- Fix a problem with Open MPI's pkgconfig files. Thanks to Alastair McKinstry + for reporting. +- Fix a problem with MPI_IREDUCE when the same buffer is supplied for the + send and recv buffer arguments. Thanks to Valentin Petrov for reporting. +- Fix a problem with atomic operations on PowerPC. Thanks to Paul + Hargrove for reporting.
+ +Known issues (to be addressed in v2.0.3): + +- See the list of fixes slated for v2.0.3 here: + https://github.com/open-mpi/ompi/milestone/23 + +2.0.1 -- 2 September 2016 +------------------------- + +Bug fixes/minor improvements: + +- Short message latency and message rate performance improvements for + all transports. +- Fix shared memory performance when using RDMA-capable networks. + Thanks to Tetsuya Mishima and Christoph Niethammer for reporting. +- Fix bandwidth performance degradation in the yalla (MXM) PML. Thanks + to Andreas Kempf for reporting the issue. +- Fix OpenSHMEM crash when running on non-Mellanox MXM-based networks. + Thanks to Debendra Das for reporting the issue. +- Fix a crash occurring after repeated calls to MPI_FILE_SET_VIEW with + predefined datatypes. Thanks to Eric Chamberland and Matthew + Knepley for reporting and helping chase down this issue. +- Fix stdin propagation to MPI processes. Thanks to Jingchao Zhang + for reporting the issue. +- Fix various runtime and portability issues by updating the PMIx + internal component to v1.1.5. +- Fix process startup failures on Intel MIC platforms due to very + large entries in /proc/mounts. +- Fix a problem with use of relative paths for specifying executables to + mpirun/oshrun. Thanks to David Schneider for reporting. +- Various improvements when running over portals-based networks. +- Fix thread-based race conditions with GNI-based networks. +- Fix a problem with MPI_FILE_CLOSE and MPI_FILE_SET_SIZE. Thanks + to Cihan Altinay for reporting. +- Remove all use of rand(3) from within Open MPI so as not to perturb + applications' use of it. Thanks to Matias Cabral and Noel Rycroft + for reporting. +- Fix crash in MPI_COMM_SPAWN. +- Fix types for MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY. Thanks to + Lisandro Dalcin for reporting. +- Correctly report the name of MPI_INTEGER16. +- Add some missing MPI constants to the Fortran bindings.
+- Fixed a compile error when configuring Open MPI with --enable-timing. +- Correctly set the shared library version of libompitrace.so. Thanks + to Alastair McKinstry for reporting. +- Fix errors in the MPI_RPUT, MPI_RGET, MPI_RACCUMULATE, and + MPI_RGET_ACCUMULATE Fortran bindings. Thanks to Alfio Lazzaro and + Joost VandeVondele for tracking this down. +- Fix problems with use of derived datatypes in non-blocking + collectives. Thanks to Yuki Matsumoto for reporting. +- Fix problems with OpenSHMEM header files when using CMake. Thanks to + Paul Kapinos for reporting the issue. +- Fix a problem with use of non-zero lower bound datatypes in + collectives. Thanks to Hristo Iliev for reporting. +- Fix a problem with memory allocation within MPI_GROUP_INTERSECTION. + Thanks to Lisandro Dalcin for reporting. +- Fix an issue with MPI_ALLGATHER for communicators that don't consist + of two ranks. Thanks to David Love for reporting. +- Various fixes for collectives when used with esoteric MPI datatypes. +- Fixed corner cases of handling DARRAY and HINDEXED_BLOCK datatypes. +- Fix a problem with filesystem type check for OpenBSD. + Thanks to Paul Hargrove for reporting. +- Fix some debug input within Open MPI internal functions. Thanks to + Durga Choudhury for reporting. +- Fix a typo in a configury help message. Thanks to Paul Hargrove for + reporting. +- Correctly support MPI_IN_PLACE in MPI_[I]ALLTOALL[V|W] and + MPI_[I]EXSCAN. +- Fix alignment issues on SPARC platforms.
+ +Known issues (to be addressed in v2.0.2): + +- See the list of fixes slated for v2.0.2 here: + https://github.com/open-mpi/ompi/milestone/20, and + https://github.com/open-mpi/ompi-release/milestone/19 + (note that the "ompi-release" GitHub repo will be folded/absorbed + into the "ompi" GitHub repo at some point in the future). + + +2.0.0 -- 12 July 2016 +--------------------- ********************************************************************** * Open MPI is now fully MPI-3.1 compliant ********************************************************************** -- Enhancements to reduce the memory footprint for jobs at scale. A - new MCA parameter, "mpi_add_procs_cutoff", is available to set the - threshold for using this feature. +Major new features: + - Many enhancements to MPI RMA. Open MPI now maps MPI RMA operations on to native RMA operations for those networks which support this capability. -- The MPI C++ bindings -- which were removed from the MPI standard in - v3.0 -- are no longer built by default and will be removed in some - future version of Open MPI. Use the --enable-mpi-cxx-bindings - configure option to build the deprecated/removed MPI C++ bindings. -- ompi_info now shows all components, even if they do not have MCA - parameters. The prettyprint output now separates groups with a - dashed line. +- Greatly improved support for MPI_THREAD_MULTIPLE (when configured + with --enable-mpi-thread-multiple). +- Enhancements to reduce the memory footprint for jobs at scale. A + new MCA parameter, "mpi_add_procs_cutoff", is available to set the + threshold for using this feature. +- Completely revamped support for memory registration hooks when using + OS-bypass network transports. +- Significant OMPIO performance improvements and many bug fixes. - Add support for PMIx - Process Management Interface for Exascale. Version 1.1.2 of PMIx is included internally in this release. - Add support for PLFS file systems in Open MPI I/O. - Add support for UCX transport.
-- Improved support for MPI_THREAD_MULTIPLE (when configured with - --enable-mpi-thread-multiple). - Simplify build process for Cray XC systems. Add support for using native SLURM. -- Updated internal/embedded copies of third-part software: - - Update the internal copy of ROMIO to that which shipped in MPICH - 3.1.4. - - Update internal copy of libevent to v2.0.22. - - Update internal copy of hwloc to v1.11.2. - Add a --tune mpirun command line option to simplify setting many environment variables and MCA parameters. -- Add a new MCA parameter - - "opal_common_verbs_want_fork_support". This replaces the - "btl_openib_want_fork_support" parameter. - Add a new MCA parameter "orte_default_dash_host" to offer an analogue to the existing "orte_default_hostfile" MCA parameter. -- Add --with-platform-patches-dir configure option. -- Add --with-pmi-libdir configure option for environments that install - PMI libs in a non-default location. - Add the ability to specify the number of desired slots in the mpirun --host option. + +Changes in behavior compared to prior versions: + - In environments where mpirun cannot automatically determine the number of slots available (e.g., when using a hostfile that does not specify "slots", or when using --host without specifying a ":N" suffix to hostnames), mpirun now requires the use of "-np N" to specify how many MPI processes to launch. -- Removed some legacy support: - - Removed support for OS X Leopard. - - Removed support for Cray XT systems. - - Removed VampirTrace. - - Removed support for Myrinet/MX. - - Removed legacy collective module:ML. - - Removed support for Alpha processors. - - Removed --enable-mpi-profiling configure option. +- The MPI C++ bindings -- which were removed from the MPI standard in + v3.0 -- are no longer built by default and will be removed in some + future version of Open MPI. Use the --enable-mpi-cxx-bindings + configure option to build the deprecated/removed MPI C++ bindings. 
+- ompi_info now shows all components, even if they do not have MCA + parameters. The prettyprint output now separates groups with a + dashed line. +- OMPIO is now the default implementation of parallel I/O, with the + exception of Lustre parallel filesystems (where ROMIO is still the + default). The default selection of OMPIO vs. ROMIO can be controlled + via the "--mca io ompio|romio" command line switch to mpirun. +- Per Open MPI's versioning scheme (see the README), increasing the + major version number to 2 indicates that this version is not + ABI-compatible with prior versions of Open MPI. You will need to + recompile MPI and OpenSHMEM applications to work with this version + of Open MPI. - Removed checkpoint/restart code due to loss of maintainer. :-( +- Change the behavior for handling certain signals when using PSM and + PSM2 libraries. Previously, the PSM and PSM2 libraries would trap + certain signals in order to generate tracebacks. The mechanism was + found to cause issues with Open MPI's own error reporting mechanism. + If not already set, Open MPI now sets the IPATH_NO_BACKTRACE and + HFI_NO_BACKTRACE environment variables to disable PSM/PSM2's + handling of these signals. + +Removed legacy support: + +- Removed support for OS X Leopard. +- Removed support for Cray XT systems. +- Removed VampirTrace. +- Removed support for Myrinet/MX. +- Removed legacy collective module:ML. +- Removed support for Alpha processors. +- Removed --enable-mpi-profiling configure option. 
+ +Known issues (to be addressed in v2.0.1): + +- See the list of fixes slated for v2.0.1 here: + https://github.com/open-mpi/ompi/milestone/16, and + https://github.com/open-mpi/ompi-release/milestone/16 + (note that the "ompi-release" Github repo will be folded/absorbed + into the "ompi" Github repo at some point in the future) + +- ompi-release#986: Fix data size counter for large ops with fcoll/static +- ompi-release#987: Fix OMPIO performance on Lustre +- ompi-release#1013: Fix potential inconsistency in btl/openib default settings +- ompi-release#1014: Do not return MPI_ERR_PENDING from collectives +- ompi-release#1056: Remove dead profile code from oshmem +- ompi-release#1081: Fix MPI_IN_PLACE checking for IALLTOALL{V|W} +- ompi-release#1081: Fix memchecker in MPI_IALLTOALLW +- ompi-release#1081: Support MPI_IN_PLACE in MPI_(I)ALLTOALLW and MPI_(I)EXSCAN +- ompi-release#1107: Allow future PMIx support for RM spawn limits +- ompi-release#1108: Fix sparse group process reference counting +- ompi-release#1109: If specified to be oversubscribed, disable binding +- ompi-release#1122: Allow NULL arrays for empty datatypes +- ompi-release#1123: Fix signed vs. 
unsigned compiler warnings +- ompi-release#1123: Make max hostname length uniform across code base +- ompi-release#1127: Fix MPI_Compare_and_swap +- ompi-release#1127: Fix MPI_Win_lock when used with MPI_Win_fence +- ompi-release#1132: Fix typo in help message for --enable-mca-no-build +- ompi-release#1154: Ensure pairwise coll algorithms disqualify themselves properly +- ompi-release#1165: Fix typos in debugging/verbose message output +- ompi-release#1178: Fix ROMIO filesystem check on OpenBSD 5.7 +- ompi-release#1197: Fix Fortran pthread configure check +- ompi-release#1205: Allow using external PMIx 1.1.4 and 2.0 +- ompi-release#1215: Fix configure to support the NAG Fortran compiler +- ompi-release#1220: Fix combiner args for MPI_HINDEXED_BLOCK +- ompi-release#1225: Fix combiner args for MPI_DARRAY +- ompi-release#1226: Disable old memory hooks with recent gcc versions +- ompi-release#1231: Fix new "patcher" support for some XLC platforms +- ompi-release#1244: Fix Java error handling +- ompi-release#1250: Ensure TCP is not selected for RDMA operations +- ompi-release#1252: Fix verbose output in coll selection +- ompi-release#1253: Set a default name for user-defined MPI_Op +- ompi-release#1254: Add count==0 checks in some non-blocking colls +- ompi-release#1258: Fix "make distclean" when using external pmix/hwloc/libevent +- ompi-release#1260: Clean up/uniform mca/coll/base memory management +- ompi-release#1261: Remove "patcher" warning message for static builds +- ompi-release#1263: Fix IO MPI_Request for 0-size read/write +- ompi-release#1264: Add blocking fence for SLURM operations + +Bug fixes / minor enhancements: + +- Updated internal/embedded copies of third-party software: + - Update the internal copy of ROMIO to that which shipped in MPICH + 3.1.4. + - Update internal copy of libevent to v2.0.22. + - Update internal copy of hwloc to v1.11.2. 
+- Notable new MCA parameters: + - opal_progress_lp_call_ratio: Control how often low-priority + callbacks are made during Open MPI's main progress loop. + - opal_common_verbs_want_fork_support: This replaces the + btl_openib_want_fork_support parameter. +- Add --with-platform-patches-dir configure option. +- Add --with-pmi-libdir configure option for environments that install + PMI libs in a non-default location. +- Various configure-related compatibility updates for newer versions + of libibverbs and OFED. - Numerous fixes/improvements to orte-dvm. Special thanks to Mark Santcroos for his help. +- Fix a problem with timer code on ia32 platforms. Thanks to + Paul Hargrove for reporting this and providing a patch. +- Fix a problem with use of a 64 bit atomic counter. Thanks to + Paul Hargrove for reporting. +- Fix a problem with singleton job launching. Thanks to Lisandro + Dalcin for reporting. +- Fix a problem with use of MPI_UNDEFINED with MPI_COMM_SPLIT_TYPE. + Thanks to Lisandro Dalcin for reporting. - Silence a compiler warning in PSM MTL. Thanks to Adrian Reber for reporting this. +- Properly detect Intel TrueScale and OmniPath devices in the ACTIVE + state. Thanks to Durga Choudhury for reporting the issue. +- Fix detection and use of Solaris Studio 12.5 (beta) compilers. + Thanks to Paul Hargrove for reporting and debugging. +- Fix various small memory leaks. +- Allow NULL arrays when creating empty MPI datatypes. - Replace use of alloca with malloc for certain datatype creation functions. Thanks to Bogdan Sataric for reporting this. - Fix use of MPI_LB and MPI_UB in creation of of certain MPI datatypes. @@ -138,6 +877,8 @@ Master (not on release branches yet) Schnetter for reporting and fixing. - Improve hcoll library detection in configure. Thanks to David Shrader and Ake Sandgren for reporting this. +- Miscellaneous minor bug fixes in the hcoll component. +- Miscellaneous minor bug fixes in the ugni component. 
- Fix problems with XRC detection in OFED 3.12 and older releases. Thanks to Paul Hargrove for his analysis of this problem. - Update (non-standard/experimental) Java MPI interfaces to support @@ -165,6 +906,8 @@ Master (not on release branches yet) reporting this. - Fix a problem in neighborhood collectives. Thanks to Lisandro Dalcin for reporting. +- Fix MPI_IREDUCE_SCATTER_BLOCK for a one-process communicator. Thanks + to Lisandro Dalcin for reporting. - Add (Open MPI-specific) additional flavors to MPI_COMM_SPLIT_TYPE. See MPI_Comm_split_type(3) for details. Thanks to Nick Andersen for supplying this enhancement. @@ -216,6 +959,52 @@ Master (not on release branches yet) Alastair McKinstry for reporting. +1.10.7 - 16 May 2017 +------ +- Fix bug in TCP BTL that impacted performance on 10GbE (and faster) + networks by not adjusting the TCP send/recv buffer sizes and using + system default values +- Add missing MPI_AINT_ADD and MPI_AINT_DIFF function declarations in + mpif.h +- Fixed time reported by MPI_WTIME; it was previously reported as + dependent upon the CPU frequency. 
+- Fix platform detection on FreeBSD +- Fix a bug in the handling of MPI_TYPE_CREATE_DARRAY in + MPI_(R)(GET_)ACCUMULATE +- Fix openib memory registration limit calculation +- Add missing MPI_T_PVAR_SESSION_NULL in mpi.h +- Fix "make distcheck" when using external hwloc and/or libevent packages +- Add latest ConnectX-5 vendor part id to OpenIB device params +- Fix race condition in the UCX PML +- Fix signal handling for rsh launcher +- Fix Fortran compilation errors by removing MPI_SIZEOF in the Fortran + interfaces when the compiler does not support it +- Fixes for the pre-ignore-TKR "mpi" Fortran module implementation + (i.e., for older Fortran compilers -- these problems did not exist + in the "mpi" module implementation for modern Fortran compilers): + - Add PMPI_* interfaces + - Fix typo in MPI_FILE_WRITE_AT_ALL_BEGIN interface name + - Fix typo in MPI_FILE_READ_ORDERED_BEGIN interface name +- Fixed the type of MPI_DISPLACEMENT_CURRENT in all Fortran interfaces + to be an INTEGER(KIND=MPI_OFFSET_KIND). +- Fixed typos in MPI_INFO_GET_* man pages. Thanks to Nicolas Joly for + the patch +- Fix typo bugs in wrapper compiler script + + +1.10.6 - 17 Feb 2017 +------ +- Fix bug in timer code that caused problems at optimization settings + greater than 2 +- OSHMEM: make mmap allocator the default instead of sysv or verbs +- Support MPI_Dims_create with dimension zero +- Update USNIC support +- Prevent 64-bit overflow on timer counter +- Add support for forwarding signals +- Fix bug that caused truncated messages on large sends over TCP BTL +- Fix potential infinite loop when printing a stacktrace + + 1.10.5 - 19 Dec 2016 ------ - Update UCX APIs @@ -260,6 +1049,8 @@ Master (not on release branches yet) 1.10.3 - 15 June 2016 ------ +- Fix zero-length datatypes. Thanks to Wei-keng Liao for reporting + the issue. - Minor manpage cleanups - Implement atomic support in OSHMEM/UCX - Fix support of MPI_COMBINER_RESIZED. 
Thanks to James Ramsey @@ -310,6 +1101,23 @@ Master (not on release branches yet) - Fix affinity for MPMD jobs running under LSF - Fix many Fortran binding bugs - Fix `MPI_IN_PLACE`-related bugs +- Fix PSM/PSM2 support for singleton operations +- Ensure MPI transports continue to progress during RTE barriers +- Update HWLOC to 1.9.1 end-of-series +- Fix a bug in the Java command line parser when the + -Djava.library.path option was given by the user +- Update the MTL/OFI provider selection behavior +- Add support for clock_gettime on Linux. +- Correctly detect and configure for Solaris Studio 12.5 + beta compilers +- Correctly compute #slots when -host is used for MPMD case +- Fix a bug in the hcoll collectives due to an uninitialized field +- Do not set a binding policy when oversubscribing a node +- Fix hang in intercommunicator operations when oversubscribed +- Speed up process termination during MPI_Abort +- Disable backtrace support by default in the PSM/PSM2 libraries to + prevent unintentional conflicting behavior. + 1.10.2: 26 Jan 2016 diff --git a/README b/README index 08e98ff41f2..4085f8144dd 100644 --- a/README +++ b/README @@ -12,13 +12,15 @@ Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved. Copyright (c) 2006-2011 Mellanox Technologies. All rights reserved. Copyright (c) 2006-2012 Oracle and/or its affiliates. All rights reserved. Copyright (c) 2007 Myricom, Inc. All rights reserved. -Copyright (c) 2008-2016 IBM Corporation. All rights reserved. +Copyright (c) 2008-2017 IBM Corporation. All rights reserved. Copyright (c) 2010 Oak Ridge National Labs. All rights reserved. Copyright (c) 2011 University of Houston. All rights reserved. -Copyright (c) 2013-2015 Intel, Inc. All rights reserved +Copyright (c) 2013-2017 Intel, Inc. All rights reserved. Copyright (c) 2015 NVIDIA Corporation. All rights reserved. Copyright (c) 2017 Los Alamos National Security, LLC. All rights reserved. 
+Copyright (c) 2017 Research Organization for Information Science + and Technology (RIST). All rights reserved. $COPYRIGHT$ @@ -112,7 +114,6 @@ General notes - The run-time systems that are currently supported are: - rsh / ssh - - LoadLeveler - PBS Pro, Torque - Platform LSF (v7.0.2 and later) - SLURM @@ -120,23 +121,32 @@ General notes - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine - Systems that have been tested are: - - Linux (various flavors/distros), 32 bit, with gcc - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft, Intel, and Portland (*) - - OS X (10.8, 10.9, 10.10, 10.11), 32 and 64 bit (x86_64), with - XCode and Absoft compilers (*) - - MacOS (10.12), 64 bit (x85_64) with XCode and Absoft compilers (*) - - OpenBSD. Requires configure options --enable-mca-no-build=patcher - and --disable-slopen with this release. + - macOS (10.12), 64 bit (x86_64) with XCode compilers (*) Be sure to read the Compiler Notes, below. - Other systems have been lightly (but not fully tested): + - Linux (various flavors/distros), 32 bit, with gcc - Cygwin 32 & 64 bit with gcc - - ARMv4, ARMv5, ARMv6, ARMv7, ARMv8 + - ARMv6, ARMv7, ARMv8 (aarch64) - Other 64 bit platforms (e.g., Linux on PPC64) - Oracle Solaris 10 and 11, 32 and 64 bit (SPARC, i386, x86_64), with Oracle Solaris Studio 12.5 + - OpenBSD. Requires configure options --enable-mca-no-build=patcher + and --disable-dlopen with this release. + - Problems have been reported when building Open MPI on FreeBSD 11.1 + using the clang-4.0 system compiler. A workaround is to build + Open MPI using the GNU compiler. + +Platform Notes +-------------- + +- ARM and POWER users may experience intermittent hangs when Open MPI + is compiled with low optimization settings, due to an issue with our + atomic list implementation. We recommend compiling with -O3 + optimization, both for performance reasons and to avoid this hang. 
Compiler Notes -------------- @@ -180,8 +190,30 @@ Compiler Notes source directory path names that was resolved in 9.0-4 (9.0-3 is known to be broken in this regard). -- IBM's xlf compilers: NO known good version that can build/link - the MPI f08 bindings or build/link the OpenSHMEM Fortran bindings. +- Open MPI does not support the PGI compiler suite on OS X or MacOS. + See issues below for more details: + https://github.com/open-mpi/ompi/issues/2604 + https://github.com/open-mpi/ompi/issues/2605 + +- OpenSHMEM Fortran bindings do not support the `no underscore` Fortran + symbol convention. IBM's xlf compilers build in that mode by default. + As such, IBM's xlf compilers cannot build/link the OpenSHMEM Fortran + bindings by default. A workaround is to pass FC="xlf -qextname" at + configure time to force a trailing underscore. See the issue below + for more details: + https://github.com/open-mpi/ompi/issues/3612 + +- MPI applications that use the mpi_f08 module on PowerPC platforms + (tested ppc64le) will likely experience runtime failures if: + - they are using a GNU linker (ld) version after v2.25.1 and before v2.28, + -and- + - they compiled with PGI (tested 17.5) or XL (tested v15.1.5) compilers. + This was noticed on Ubuntu 16.04 which uses the 2.26.1 version of ld by + default. However, this issue impacts any OS using a version of ld noted + above. This GNU linker regression will be fixed in version 2.28. + Below is a link to the GNU bug on this issue: + https://sourceware.org/bugzilla/show_bug.cgi?id=21306 + The XL compiler will include a fix for this issue in a future release. - On NetBSD-6 (at least AMD64 and i386), and possibly on OpenBSD, libtool misidentifies properties of f95/g95, leading to obscure @@ -255,9 +287,6 @@ Compiler Notes version of the Intel 12.1 Linux compiler suite, the problem will go away. -- It has been reported that Pathscale 5.0.5 and 6.0.527 compilers - give an internal compiler error when trying to Open MPI. 
- - Early versions of the Portland Group 6.0 compiler have problems creating the C++ MPI bindings as a shared library (e.g., v6.0-1). Tests with later versions show that this has been fixed (e.g., @@ -278,6 +307,9 @@ Compiler Notes also automatically add "-Msignextend" when the C and C++ MPI wrapper compilers are used to compile user MPI applications. +- It has been reported that Pathscale 5.0.5 and 6.0.527 compilers + give an internal compiler error when trying to build Open MPI. + - Using the MPI C++ bindings with older versions of the Pathscale compiler on some platforms is an old issue that seems to be a problem when Pathscale uses a back-end GCC 3.x compiler. Here's a @@ -296,6 +328,12 @@ Compiler Notes Note the MPI C++ bindings have been deprecated by the MPI Forum and may not be supported in future releases. +- As of July 2017, the Pathscale compiler suite apparently has no + further commercial support, and it does not look like there will be + further releases. Any issues discovered regarding building / + running Open MPI with the Pathscale compiler suite therefore may not + be able to be resolved. + - Using the Absoft compiler to build the MPI Fortran bindings on Suse 9.3 is known to fail due to a Libtool compatibility issue. @@ -531,24 +569,6 @@ OpenSHMEM Functionality and Features MPI Collectives --------------- -- The "hierarch" coll component (i.e., an implementation of MPI - collective operations) attempts to discover network layers of - latency in order to segregate individual "local" and "global" - operations as part of the overall collective operation. In this - way, network traffic can be reduced -- or possibly even minimized - (similar to MagPIe). The current "hierarch" component only - separates MPI processes into on- and off-node groups. - - Hierarch has had sufficient correctness testing, but has not - received much performance tuning. 
As such, hierarch is not - activated by default -- it must be enabled manually by setting its - priority level to 100: - - mpirun --mca coll_hierarch_priority 100 ... - - We would appreciate feedback from the user community about how well - hierarch works for your applications. - - The "fca" coll component: the Mellanox Fabric Collective Accelerator (FCA) is a solution for offloading collective operations from the MPI process onto Mellanox QDR InfiniBand switch CPUs and HCAs. @@ -578,7 +598,7 @@ Network Support - There are four main MPI network models available: "ob1", "cm", "yalla", and "ucx". "ob1" uses BTL ("Byte Transfer Layer") components for each supported network. "cm" uses MTL ("Matching - Tranport Layer") components for each supported network. "yalla" + Transport Layer") components for each supported network. "yalla" uses the Mellanox MXM transport. "ucx" uses the OpenUCX transport. - "ob1" supports a variety of networks that can be used in @@ -614,19 +634,21 @@ Network Support or shell$ mpirun --mca pml cm ... -- Similarly, there are two OpenSHMEM network models available: "yoda", - and "ikrit". "yoda" also uses the BTL components for supported - networks. "ikrit" interfaces directly with Mellanox MXM. - - - "yoda" supports a variety of networks that can be used: - - - OpenFabrics: InfiniBand, iWARP, and RoCE - - Loopback (send-to-self) - - Shared memory - - TCP - - usNIC - - - "ikrit" only supports Mellanox MXM. +- Similarly, there are two OpenSHMEM network models available: "ucx", + and "ikrit": + - "ucx" interfaces directly with UCX; + - "ikrit" interfaces directly with Mellanox MXM. + +- UCX is the Unified Communication X (UCX) communication library + (http://www.openucx.org/). + This is an open-source project developed in collaboration between + industry, laboratories, and academia to create an open-source + production grade communication framework for data centric and + high-performance applications. 
+ UCX currently supports: + - OFA Verbs; + - Cray's uGNI; + - NVIDIA CUDA drivers. - MXM is the Mellanox Messaging Accelerator library utilizing a full range of IB transports to provide the following messaging services @@ -716,6 +738,11 @@ Open MPI Extensions Building Open MPI ----------------- +If you have checked out a DEVELOPER'S COPY of Open MPI (i.e., you +cloned from Git), you really need to read the HACKING file before +attempting to build Open MPI. Really. + +If you have downloaded a tarball, then things are much simpler. Open MPI uses a traditional configure script paired with "make" to build. Typical installs can be of the pattern: @@ -799,6 +826,10 @@ INSTALLATION OPTIONS This rpath/runpath behavior can be disabled via --disable-wrapper-rpath. + If you would like to keep the rpath option, but not enable runpath, + a different configure option is available: + --disable-wrapper-runpath. + --enable-dlopen Build all of Open MPI's components as standalone Dynamic Shared Objects (DSO's) that are loaded at run-time (this is the default). @@ -815,6 +846,33 @@ INSTALLATION OPTIONS are build as static or dynamic via --enable|disable-static and --enable|disable-shared. +--disable-show-load-errors-by-default + Set the default value of the mca_base_component_show_load_errors MCA + variable: the --enable form of this option sets the MCA variable to + true, the --disable form sets the MCA variable to false. The MCA + mca_base_component_show_load_errors variable can still be overridden + at run time via the usual MCA-variable-setting mechanisms; this + configure option simply sets the default value. + + The --disable form of this option is intended for Open MPI packagers + who tend to enable support for many different types of networks and + systems in their packages. For example, consider a packager who + includes support for both the FOO and BAR networks in their Open MPI + package, both of which require support libraries (libFOO.so and + libBAR.so). 
If an end user only has BAR hardware, they likely only + have libBAR.so available on their systems -- not libFOO.so. + Disabling load errors by default will prevent the user from seeing + potentially confusing warnings about the FOO components failing to + load because libFOO.so is not available on their systems. + + Conversely, system administrators tend to build an Open MPI that is + targeted at their specific environment, and contains few (if any) + components that are not needed. In such cases, they might want + their users to be warned that the FOO network components failed to + load (e.g., if libFOO.so was mistakenly unavailable), because Open + MPI may otherwise silently failover to a slower network path for MPI + traffic. + --with-platform=FILE Load configure options for the build from FILE. Options on the command line that are not in FILE are also used. Options on the @@ -991,10 +1049,6 @@ RUN-TIME SYSTEM SUPPORT Force the building of for the Cray Alps run-time environment. If Alps support cannot be found, configure will abort. ---with-loadleveler - Force the building of LoadLeveler scheduler support. If LoadLeveler - support cannot be found, configure will abort. - --with-lsf= Specify the directory where the LSF libraries and header files are located. This option is generally only necessary if the LSF headers @@ -1254,7 +1308,7 @@ MPI FUNCTIONALITY --disable-io-ompio Disable the ompio MPI-IO component - + --enable-sparse-groups Enable the usage of sparse groups. This would save memory significantly especially if you are creating large @@ -1423,7 +1477,7 @@ NOTE: The version numbering conventions were changed with the release release schedule to indicate feature development vs. stable releases. See the README in releases prior to v1.10.0 for more information (e.g., - https://github.com/open-mpi/ompi-release/blob/v1.8/README#L1392-L1475). + https://github.com/open-mpi/ompi/blob/v1.8/README#L1392-L1475). 
Backwards Compatibility ----------------------- diff --git a/README.JAVA.txt b/README.JAVA.txt index ca0c70abf2f..804003e57e5 100644 --- a/README.JAVA.txt +++ b/README.JAVA.txt @@ -257,7 +257,7 @@ Specifying offsets in buffers In a C program, it is common to specify an offset in a array with "&array[i]" or "array+i", for instance to send data starting from -a given positon in the array. The equivalent form in the Java bindings +a given position in the array. The equivalent form in the Java bindings is to "slice()" the buffer to start at an offset. Making a "slice()" on a buffer is only necessary, when the offset is not zero. Slices work for both arrays and direct buffers. diff --git a/VERSION b/VERSION index fb771b4c6eb..f92b41d35e2 100644 --- a/VERSION +++ b/VERSION @@ -4,6 +4,8 @@ # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. # This is the VERSION file for Open MPI, describing the precise # version of Open MPI in this distribution. The various components of @@ -13,7 +15,7 @@ # major, minor, and release are generally combined in the form # ... -major=3 +major=4 minor=0 release=0 @@ -50,7 +52,7 @@ date="Unreleased developer copy" # The shared library version of each of Open MPI's public libraries. # These versions are maintained in accordance with the "Library # Interface Versions" chapter from the GNU Libtool documentation. The -# first Open MPI release to programatically specify these versions was +# first Open MPI release to programmatically specify these versions was # v1.3.4 (note that Libtool defaulted all prior releases to 0:0:0). # All changes in these version numbers are dictated by the Open MPI # release managers (not individual developers). 
Notes: @@ -101,13 +103,14 @@ libompitrace_so_version=0:0:0 # OMPI layer libmca_ompi_common_ompio_so_version=0:0:0 +libmca_ompi_common_monitoring_so_version=0:0:0 # ORTE layer libmca_orte_common_alps_so_version=0:0:0 # OPAL layer libmca_opal_common_cuda_so_version=0:0:0 -libmca_opal_common_libfabric_so_version=0:0:0 +libmca_opal_common_ofi_so_version=0:0:0 libmca_opal_common_sm_so_version=0:0:0 libmca_opal_common_ugni_so_version=0:0:0 libmca_opal_common_verbs_so_version=0:0:0 diff --git a/autogen.pl b/autogen.pl index 5293337e85c..924c4c6d68f 100755 --- a/autogen.pl +++ b/autogen.pl @@ -316,7 +316,7 @@ sub mca_process_framework { $mca_found->{$pname}->{$framework}->{found} = 1; opendir(DIR, $dir) || my_die "Can't open $dir directory"; - foreach my $d (readdir(DIR)) { + foreach my $d (sort(readdir(DIR))) { # Skip any non-directory, "base", or any dir that # begins with "." next @@ -628,7 +628,7 @@ sub mpiext_run_global { my $dir = "$topdir/$ext_prefix"; opendir(DIR, $dir) || my_die "Can't open $dir directory"; - foreach my $d (readdir(DIR)) { + foreach my $d (sort(readdir(DIR))) { # Skip any non-directory, "base", or any dir that begins with "." next if (! -d "$dir/$d" || $d eq "base" || substr($d, 0, 1) eq "."); @@ -715,7 +715,7 @@ sub mpicontrib_run_global { my $dir = "$topdir/$contrib_prefix"; opendir(DIR, $dir) || my_die "Can't open $dir directory"; - foreach my $d (readdir(DIR)) { + foreach my $d (sort(readdir(DIR))) { # Skip any non-directory, "base", or any dir that begins with "." next if (! -d "$dir/$d" || $d eq "base" || substr($d, 0, 1) eq "."); @@ -915,7 +915,7 @@ sub patch_autotools_output { # Patch ltmain.sh error for PGI version numbers. Redirect stderr to # /dev/null because this patch is only necessary for some versions of # Libtool (e.g., 2.2.6b); it'll [rightfully] fail if you have a new - # enough Libtool that dosn't need this patch. But don't alarm the + # enough Libtool that doesn't need this patch. 
But don't alarm the # user and make them think that autogen failed if this patch fails -- # make the errors be silent. # Also patch ltmain.sh for NAG compiler @@ -935,7 +935,7 @@ sub patch_autotools_output { my @verbose_out; # Total ugh. We have to patch the configure script itself. See below - # for explainations why. + # for explanations why. open(IN, "configure") || my_die "Can't open configure"; my $c; $c .= $_ @@ -975,6 +975,28 @@ sub patch_autotools_output { # Below is essentially an upstream patch for Libtool which we want # made available to Open MPI users running older versions of Libtool + foreach my $tag (("", "_FC")) { + + # We have to change the search pattern and substitution on each + # iteration to take into account the tag changing + my $search_string = '# icc used to be incompatible with GCC.\n\s+' . + '# ICC 10 doesn\047t accept -KPIC any more.\n.*\n\s+' . + "lt_prog_compiler_wl${tag}="; + my $replace_string = "# Flang compiler + *flang) + lt_prog_compiler_wl${tag}='-Wl,' + lt_prog_compiler_pic${tag}='-fPIC -DPIC' + lt_prog_compiler_static${tag}='-static' + ;; + # icc used to be incompatible with GCC. + # ICC 10 doesn't accept -KPIC any more. + icc* | ifort*) + lt_prog_compiler_wl${tag}="; + + push(@verbose_out, $indent_str . 
"Patching configure for flang Fortran ($tag)\n"); + $c =~ s/$search_string/$replace_string/; + } + foreach my $tag (("", "_FC")) { # We have to change the search pattern and substitution on each diff --git a/config/ltmain_nag_pthread.diff b/config/ltmain_nag_pthread.diff index 87c27810096..927b671f9ae 100644 --- a/config/ltmain_nag_pthread.diff +++ b/config/ltmain_nag_pthread.diff @@ -8,7 +8,7 @@ if test -n "$inherited_linker_flags"; then - tmp_inherited_linker_flags=`$ECHO "$inherited_linker_flags" | $SED 's/-framework \([^ $]*\)/\1.ltframework/g'` + case "$CC" in -+ nagfor*) ++ *nagfor*) + tmp_inherited_linker_flags=`$ECHO "$inherited_linker_flags" | $SED 's/-framework \([^ $]*\)/\1.ltframework/g' | $SED 's/-pthread/-Wl,-pthread/g'`;; + *) + tmp_inherited_linker_flags=`$ECHO "$inherited_linker_flags" | $SED 's/-framework \([^ $]*\)/\1.ltframework/g'`;; diff --git a/config/ompi_check_lustre.m4 b/config/ompi_check_lustre.m4 index d27fe3bf390..8c385bfe8fa 100644 --- a/config/ompi_check_lustre.m4 +++ b/config/ompi_check_lustre.m4 @@ -10,8 +10,8 @@ dnl Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2006 The Regents of the University of California. dnl All rights reserved. -dnl Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2008-2012 University of Houston. All rights reserved. +dnl Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved +dnl Copyright (c) 2008-2018 University of Houston. All rights reserved. dnl Copyright (c) 2015 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. 
dnl $COPYRIGHT$ @@ -39,31 +39,33 @@ AC_DEFUN([OMPI_CHECK_LUSTRE],[ check_lustre_configuration="none" ompi_check_lustre_happy="yes" - # Get some configuration information AC_ARG_WITH([lustre], [AC_HELP_STRING([--with-lustre(=DIR)], [Build Lustre support, optionally adding DIR/include, DIR/lib, and DIR/lib64 to the search path for headers and libraries])]) - OPAL_CHECK_WITHDIR([lustre], [$with_lustre], [include/lustre/liblustreapi.h]) - - AS_IF([test -z "$with_lustre"], - [ompi_check_lustre_dir="/usr"], - [ompi_check_lustre_dir="$with_lustre"]) - - if test -e "$ompi_check_lustre_dir/lib64" ; then - ompi_check_lustre_libdir="$ompi_check_lustre_dir/lib64" - else - ompi_check_lustre_libdir="$ompi_check_lustre_dir/lib" - fi + OPAL_CHECK_WITHDIR([lustre], [$with_lustre], [include/lustre/lustreapi.h]) - # Add correct -I and -L flags - OPAL_CHECK_PACKAGE([$1], [lustre/liblustreapi.h], [lustreapi], [llapi_file_create], [], - [$ompi_check_lustre_dir], [$ompi_check_lustre_libdir], [ompi_check_lustre_happy="yes"], - [ompi_check_lustre_happy="no"]) - - AC_MSG_CHECKING([for required lustre data structures]) - cat > conftest.c < conftest.c <]],[[]])], + [ompi_check_ucx_happy="yes"], + [ompi_check_ucx_happy="no"]) - ompi_check_ucx_extra_libs="-L$ompi_check_ucx_libdir" + AC_MSG_RESULT([$ompi_check_ucx_happy])]) + AS_IF([test "$ompi_check_ucx_happy" = "no"], + [ompi_check_ucx_dir=/opt/ucx])]) + AS_IF([test "$ompi_check_ucx_happy" != yes], + [AS_IF([test -n "$with_ucx_libdir"], + [ompi_check_ucx_libdir="$with_ucx_libdir"], + [files=`ls $ompi_check_ucx_dir/lib64/libucp.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt 0], + [ompi_check_ucx_libdir=$ompi_check_ucx_dir/lib64], + [ompi_check_ucx_libdir=$ompi_check_ucx_dir/lib])]) - OPAL_CHECK_PACKAGE([ompi_check_ucx], - [ucp/api/ucp.h], - [ucp], - [ucp_cleanup], - [$ompi_check_ucx_extra_libs], - [$ompi_check_ucx_dir], - [$ompi_check_ucx_libdir], - [ompi_check_ucx_happy="yes"], - [ompi_check_ucx_happy="no"])], - 
[ompi_check_ucx_happy="no"]) + ompi_check_ucx_$1_save_CPPFLAGS="$CPPFLAGS" + ompi_check_ucx_$1_save_LDFLAGS="$LDFLAGS" + ompi_check_ucx_$1_save_LIBS="$LIBS" + OPAL_CHECK_PACKAGE([ompi_check_ucx], + [ucp/api/ucp.h], + [ucp], + [ucp_cleanup], + [], + [$ompi_check_ucx_dir], + [$ompi_check_ucx_libdir], + [ompi_check_ucx_happy="yes"], + [ompi_check_ucx_happy="no"]) + CPPFLAGS="$ompi_check_ucx_$1_save_CPPFLAGS" + LDFLAGS="$ompi_check_ucx_$1_save_LDFLAGS" + LIBS="$ompi_check_ucx_$1_save_LIBS" + AS_IF([test "$ompi_check_ucx_happy" = yes], + [AC_MSG_CHECKING(for UCX version compatibility) + AC_REQUIRE_CPP + old_CPPFLAGS="$CPPFLAGS" + CPPFLAGS="$CPPFLAGS -I$ompi_check_ucx_dir/include" + AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM([[#include ]],[[]])], + [ompi_check_ucx_happy="yes"], + [ompi_check_ucx_happy="no"]) - CPPFLAGS="$ompi_check_ucx_$1_save_CPPFLAGS" - LDFLAGS="$ompi_check_ucx_$1_save_LDFLAGS" - LIBS="$ompi_check_ucx_$1_save_LIBS" + AC_MSG_RESULT([$ompi_check_ucx_happy]) + CPPFLAGS=$old_CPPFLAGS])]) - AC_MSG_CHECKING(for UCX version compatibility) - AC_REQUIRE_CPP - old_CPPFLAGS="$CPPFLAGS" - CPPFLAGS="$CPPFLAGS -I$ompi_check_ucx_dir/include" - AC_COMPILE_IFELSE( - [AC_LANG_PROGRAM([[#include ]], - [[ - ]])], - [ompi_ucx_version_ok="yes"], - [ompi_ucx_version_ok="no"]) - - AC_MSG_RESULT([$ompi_ucx_version_ok]) - CPPFLAGS=$old_CPPFLAGS - - AS_IF([test "$ompi_ucx_version_ok" = "no"], [ompi_check_ucx_happy="no"]) - - OPAL_SUMMARY_ADD([[Transports]],[[Open UCX]],[$1],[$ompi_check_ucx_happy]) - fi + old_CPPFLAGS="$CPPFLAGS" + AS_IF([test -n "$ompi_check_ucx_dir"], + [CPPFLAGS="$CPPFLAGS -I$ompi_check_ucx_dir/include"]) + AC_CHECK_DECLS([ucp_tag_send_nbr], + [AC_DEFINE([HAVE_UCP_TAG_SEND_NBR],[1], + [have ucp_tag_send_nbr()])], [], + [#include ]) + CPPFLAGS=$old_CPPFLAGS + OPAL_SUMMARY_ADD([[Transports]],[[Open UCX]],[$1],[$ompi_check_ucx_happy])])]) AS_IF([test "$ompi_check_ucx_happy" = "yes"], [$1_CPPFLAGS="[$]$1_CPPFLAGS $ompi_check_ucx_CPPFLAGS" - $1_LDFLAGS="[$]$1_LDFLAGS 
$ompi_check_ucx_LDFLAGS" - $1_LIBS="[$]$1_LIBS $ompi_check_ucx_LIBS" - $2], + $1_LDFLAGS="[$]$1_LDFLAGS $ompi_check_ucx_LDFLAGS" + $1_LIBS="[$]$1_LIBS $ompi_check_ucx_LIBS" + $2], [AS_IF([test ! -z "$with_ucx" && test "$with_ucx" != "no"], [AC_MSG_ERROR([UCX support requested but not found. Aborting])]) $3]) + + OPAL_VAR_SCOPE_POP ]) diff --git a/config/ompi_config_files.m4 b/config/ompi_config_files.m4 index b20ca13400e..bed371325fc 100644 --- a/config/ompi_config_files.m4 +++ b/config/ompi_config_files.m4 @@ -1,6 +1,8 @@ # -*- shell-script -*- # -# Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -32,7 +34,7 @@ AC_DEFUN([OMPI_CONFIG_FILES],[ ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-interfaces.h ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-file-interfaces.h ompi/mpi/fortran/use-mpi-f08/Makefile - ompi/mpi/fortran/use-mpi-f08-desc/Makefile + ompi/mpi/fortran/use-mpi-f08/mod/Makefile ompi/mpi/fortran/mpiext/Makefile ompi/mpi/tool/Makefile ompi/mpi/tool/profile/Makefile diff --git a/config/ompi_configure_options.m4 b/config/ompi_configure_options.m4 index 3301df03c1b..eea5f4cabbb 100644 --- a/config/ompi_configure_options.m4 +++ b/config/ompi_configure_options.m4 @@ -10,14 +10,14 @@ dnl Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. -dnl Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved dnl Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. dnl Copyright (c) 2009 IBM Corporation. All rights reserved. 
dnl Copyright (c) 2009 Los Alamos National Security, LLC. All rights dnl reserved. dnl Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. dnl Copyright (c) 2013 Intel, Inc. All rights reserved. -dnl Copyright (c) 2015-2016 Research Organization for Information Science +dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. dnl dnl $COPYRIGHT$ @@ -75,7 +75,7 @@ else GROUP_SPARSE=0 fi AC_DEFINE_UNQUOTED([OMPI_GROUP_SPARSE],$GROUP_SPARSE, - [Wether we want sparse process groups]) + [Whether we want sparse process groups]) # @@ -231,38 +231,6 @@ AC_DEFINE_UNQUOTED(MPI_PARAM_CHECK, $mpi_param_check, AC_DEFINE_UNQUOTED(OMPI_PARAM_CHECK, $ompi_param_check, [Whether we want to check MPI parameters never or possible (an integer constant)]) -# -# Do we want the prototype "use mpi_f08" implementation that uses -# Fortran descriptors? -# - -AC_MSG_CHECKING([which 'use mpi_f08' implementation to use]) -AC_ARG_ENABLE(mpi-f08-subarray-prototype, - AC_HELP_STRING([--enable-mpi-f08-subarray-prototype], - [Use the PROTOTYPE and SEVERLY FUNCTIONALITY-LIMITED Fortran 08 'use mpi_f08' implementation that supports subarrrays (via Fortran descriptors). 
This option will disable the normal 'use mpi_f08' implementation and *only* build the prototype implementation.])) -OMPI_BUILD_FORTRAN_F08_SUBARRAYS=0 -AS_IF([test $OMPI_TRY_FORTRAN_BINDINGS -lt $OMPI_FORTRAN_USEMPIF08_BINDINGS], - [AC_MSG_RESULT([none (use mpi_f08 disabled)])], - [AS_IF([test "$enable_mpi_f08_subarray_prototype" = "yes"], - [OMPI_BUILD_FORTRAN_F08_SUBARRAYS=1 - AC_MSG_RESULT([extra crispy (subarray prototype)]) - AC_MSG_WARN([Sorry, the subarray prototype is no longer available]) - AC_MSG_WARN([Contact your favorite OMPI developer and ask for it to be re-enabled]) - AC_MSG_ERROR([Cannot continue])], - [AC_MSG_RESULT([regular (no subarray support)])]) - ]) -AC_DEFINE_UNQUOTED([OMPI_BUILD_FORTRAN_F08_SUBARRAYS], - [$OMPI_BUILD_FORTRAN_F08_SUBARRAYS], - [Whether we built the 'use mpi_f08' prototype subarray-based implementation or not (i.e., whether to build the use-mpi-f08-desc prototype or the regular use-mpi-f08 implementation)]) - -AC_ARG_ENABLE([mpi-io], - [AC_HELP_STRING([--disable-mpi-io], - [Disable built-in support for MPI-2 I/O, likely because - an externally-provided MPI I/O package will be used. 
- Default is to use the internal framework system that uses - the ompio component and a specially modified version of ROMIO - that fits inside the romio314 component])]) - AC_ARG_ENABLE([io-ompio], [AC_HELP_STRING([--disable-io-ompio], [Disable the ompio MPI-IO component])]) diff --git a/config/ompi_cxx_have_exceptions.m4 b/config/ompi_cxx_have_exceptions.m4 index 2bd886e675f..d102683a727 100644 --- a/config/ompi_cxx_have_exceptions.m4 +++ b/config/ompi_cxx_have_exceptions.m4 @@ -22,7 +22,7 @@ AC_DEFUN([OMPI_CXX_HAVE_EXCEPTIONS],[ # # Arguments: None # -# Depdencies: None +# Dependencies: None # # Check to see if the C++ compiler can handle exceptions # diff --git a/config/ompi_ext.m4 b/config/ompi_ext.m4 index 40be85af98c..53984650986 100644 --- a/config/ompi_ext.m4 +++ b/config/ompi_ext.m4 @@ -3,10 +3,13 @@ dnl dnl Copyright (c) 2004-2009 The Trustees of Indiana University and Indiana dnl University Research and Technology dnl Corporation. All rights reserved. -dnl Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved dnl Copyright (c) 2011-2012 Oak Ridge National Labs. All rights reserved. -dnl Copyright (c) 2015 Research Organization for Information Science +dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. +dnl Copyright (c) 2017 The University of Tennessee and The University +dnl of Tennessee Research Foundation. All rights +dnl reserved. dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -212,9 +215,8 @@ module mpi_f08_ext EOF # Only build this mpi_f08_ext module if we're building the "use - # mpi_f08" module *and* it's the non-descriptor one. 
- AS_IF([test $OMPI_BUILD_FORTRAN_BINDINGS -ge $OMPI_FORTRAN_USEMPIF08_BINDINGS && \ - test $OMPI_BUILD_FORTRAN_F08_SUBARRAYS -eq 0], + # mpi_f08" module + AS_IF([test $OMPI_BUILD_FORTRAN_BINDINGS -ge $OMPI_FORTRAN_USEMPIF08_BINDINGS], [OMPI_BUILD_FORTRAN_USEMPIF08_EXT=1], [OMPI_BUILD_FORTRAN_USEMPIF08_EXT=0]) AM_CONDITIONAL(OMPI_BUILD_FORTRAN_USEMPIF08_EXT, @@ -554,17 +556,17 @@ EOF # # Include the mpif.h header if it is available. Cannot do # this from inside the usempi.h since, for VPATH builds, the - # top_ompi_srcdir is needed to find the header. + # srcdir is needed to find the header. # if test "$enabled_mpifh" = 1; then mpifh_component_header="mpiext_${component}_mpifh.h" cat >> $mpiusempi_ext_h <> $mpiusempi_ext_h <> $mpiusempif08_ext_h <> $mpiusempif08_ext_h <=LONG_ANNOYING_DIRECTORY. + AS_IF([test -z "$with_jdk_bindir"], + [ # OS X/macOS + ompi_java_found=0 + # The following logic was deliberately decided upon in + # https://github.com/open-mpi/ompi/pull/5015 specifically + # to prevent this script and the rest of Open MPI's build + # system from getting confused by the somewhat unorthodox + # Java toolchain layout present on OS X/macOS systems, + # described in depth by + # https://github.com/open-mpi/ompi/pull/5015#issuecomment-379324639, + # and mishandling OS X/macOS Java toolchain path detection + # as a result. 
+ AS_IF([test -x /usr/libexec/java_home], + [ompi_java_dir=`/usr/libexec/java_home`], + [ompi_java_dir=/System/Library/Frameworks/JavaVM.framework/Versions/Current]) + AC_MSG_CHECKING([for Java in OS X/macOS locations]) + AS_IF([test -d $ompi_java_dir], + [AC_MSG_RESULT([found ($ompi_java_dir)]) + ompi_java_found=1 + if test -d "$ompi_java_dir/Headers" && test -d "$ompi_java_dir/Commands"; then + with_jdk_headers=$ompi_java_dir/Headers + with_jdk_bindir=$ompi_java_dir/Commands + elif test -d "$ompi_java_dir/include" && test -d "$ompi_java_dir/bin"; then + with_jdk_headers=$ompi_java_dir/include + with_jdk_bindir=$ompi_java_dir/bin + else + AC_MSG_WARN([No recognized OS X/macOS JDK directory structure found under $ompi_java_dir]) + ompi_java_found=0 + fi], + [AC_MSG_RESULT([not found])]) + + if test "$ompi_java_found" = "0"; then + # Various Linux + if test -z "$JAVA_HOME"; then + ompi_java_dir='/usr/lib/jvm/java-*-openjdk-*/include/' + else + ompi_java_dir=$JAVA_HOME/include + fi + ompi_java_jnih=`ls $ompi_java_dir/jni.h 2>/dev/null | head -n 1` + AC_MSG_CHECKING([for Java in Linux locations]) + AS_IF([test -r "$ompi_java_jnih"], + [with_jdk_headers=`dirname $ompi_java_jnih` + OPAL_WHICH([javac], [with_jdk_bindir]) + AS_IF([test -n "$with_jdk_bindir"], + [AC_MSG_RESULT([found ($with_jdk_headers)]) + ompi_java_found=1 + with_jdk_bindir=`dirname $with_jdk_bindir`], + [with_jdk_headers=])], + [ompi_java_dir='/usr/lib/jvm/default-java/include/' + ompi_java_jnih=`ls $ompi_java_dir/jni.h 2>/dev/null | head -n 1` + AS_IF([test -r "$ompi_java_jnih"], + [with_jdk_headers=`dirname $ompi_java_jnih` + OPAL_WHICH([javac], [with_jdk_bindir]) + AS_IF([test -n "$with_jdk_bindir"], + [AC_MSG_RESULT([found ($with_jdk_headers)]) + ompi_java_found=1 + with_jdk_bindir=`dirname $with_jdk_bindir`], + [with_jdk_headers=])], + [AC_MSG_RESULT([not found])])]) + fi + + if test "$ompi_java_found" = "0"; then + # Solaris + ompi_java_dir=/usr/java + AC_MSG_CHECKING([for Java in Solaris 
locations]) + AS_IF([test -d $ompi_java_dir && test -r "$ompi_java_dir/include/jni.h"], + [AC_MSG_RESULT([found ($ompi_java_dir)]) + with_jdk_headers=$ompi_java_dir/include + with_jdk_bindir=$ompi_java_dir/bin + ompi_java_found=1], + [AC_MSG_RESULT([not found])]) + fi + ], + [ompi_java_found=1]) + + if test "$ompi_java_found" = "1"; then + OPAL_CHECK_WITHDIR([jdk-bindir], [$with_jdk_bindir], [javac]) + OPAL_CHECK_WITHDIR([jdk-headers], [$with_jdk_headers], [jni.h]) + + # Look for various Java-related programs + ompi_java_happy=no + ompi_java_PATH_save=$PATH + AS_IF([test -n "$with_jdk_bindir" && test "$with_jdk_bindir" != "yes" && test "$with_jdk_bindir" != "no"], + [PATH="$with_jdk_bindir:$PATH"]) + AC_PATH_PROG(JAVAC, javac) + AC_PATH_PROG(JAR, jar) + AC_PATH_PROG(JAVADOC, javadoc) + AC_PATH_PROG(JAVAH, javah) + PATH=$ompi_java_PATH_save + + # Check to see if we have all 3 programs. + AS_IF([test -z "$JAVAC" || test -z "$JAR" || test -z "$JAVADOC"], + [ompi_java_happy=no], + [ompi_java_happy=yes]) - # Only build the Java bindings if requested - if test "$opal_java_happy" = "yes" && test "$enable_mpi_java" = "yes"; then - AC_MSG_RESULT([yes]) - WANT_MPI_JAVA_SUPPORT=1 - AC_MSG_CHECKING([if shared libraries are enabled]) - AS_IF([test "$enable_shared" != "yes"], - [AC_MSG_RESULT([no]) - AC_MSG_WARN([Java bindings cannot be built without shared libraries]) - AC_MSG_WARN([Please reconfigure with --enable-shared]) - AC_MSG_ERROR([Cannot continue])], - [AC_MSG_RESULT([yes])]) - # must have Java support - AC_MSG_CHECKING([if Java support was found]) - AS_IF([test "$opal_java_happy" = "yes"], - [AC_MSG_RESULT([yes])], - [AC_MSG_WARN([Java MPI bindings requested, but Java support was not found]) - AC_MSG_WARN([Please reconfigure the --with-jdk options to where Java]) - AC_MSG_WARN([support can be found]) - AC_MSG_ERROR([Cannot continue])]) - - # Mac Java requires this file (i.e., some other Java-related - # header file needs this file, so we need to check for - # 
it/include it in our sources when compiling on Mac). - AC_CHECK_HEADERS([TargetConditionals.h]) - - # dladdr and Dl_info are required to build the full path to libmpi on OS X 10.11 aka El Capitan - AC_CHECK_TYPES([Dl_info], [], [], [[#include ]]) + # Look for jni.h + AS_IF([test "$ompi_java_happy" = "yes"], + [ompi_java_CPPFLAGS_save=$CPPFLAGS + # silence a stupid Mac warning + CPPFLAGS="$CPPFLAGS -DTARGET_RT_MAC_CFM=0" + AC_MSG_CHECKING([javac -h]) + cat > Conftest.java << EOF +public final class Conftest { + public native void conftest(); +} +EOF + AS_IF([$JAVAC -d . -h . Conftest.java > /dev/null 2>&1], + [AC_MSG_RESULT([yes])], + [AC_MSG_RESULT([no]) + AS_IF([test -n "$JAVAH"], + [ompi_javah_happy=yes], + [ompi_java_happy=no])]) + rm -f Conftest.java Conftest.class Conftest.h + + AS_IF([test -n "$with_jdk_headers" && test "$with_jdk_headers" != "yes" && test "$with_jdk_headers" != "no"], + [OMPI_JDK_CPPFLAGS="-I$with_jdk_headers" + # Some flavors of JDK also require -I/linux. + # See if that's there, and if so, add a -I for that, + # too. Ugh. + AS_IF([test -d "$with_jdk_headers/linux"], + [OMPI_JDK_CPPFLAGS="$OMPI_JDK_CPPFLAGS -I$with_jdk_headers/linux"]) + # Solaris JDK also require -I/solaris. + # See if that's there, and if so, add a -I for that, + # too. Ugh. + AS_IF([test -d "$with_jdk_headers/solaris"], + [OMPI_JDK_CPPFLAGS="$OMPI_JDK_CPPFLAGS -I$with_jdk_headers/solaris"]) + # Darwin JDK also require -I/darwin. + # See if that's there, and if so, add a -I for that, + # too. Ugh. 
+ AS_IF([test -d "$with_jdk_headers/darwin"], + [OMPI_JDK_CPPFLAGS="$OMPI_JDK_CPPFLAGS -I$with_jdk_headers/darwin"]) + + CPPFLAGS="$CPPFLAGS $OMPI_JDK_CPPFLAGS"]) + AC_CHECK_HEADER([jni.h], [], + [ompi_java_happy=no]) + CPPFLAGS=$ompi_java_CPPFLAGS_save + ]) else - AC_MSG_RESULT([no]) - WANT_MPI_JAVA_SUPPORT=0 + ompi_java_happy=no fi - AC_DEFINE_UNQUOTED([OMPI_WANT_JAVA_BINDINGS], [$WANT_MPI_JAVA_SUPPORT], - [do we want java mpi bindings]) - AM_CONDITIONAL(OMPI_WANT_JAVA_BINDINGS, test "$WANT_MPI_JAVA_SUPPORT" = "1") - - # Are we happy? - AS_IF([test "$WANT_MPI_JAVA_SUPPORT" = "1"], - [AC_MSG_WARN([******************************************************]) - AC_MSG_WARN([*** Java MPI bindings are provided on a provisional]) - AC_MSG_WARN([*** basis. They are NOT part of the current or]) - AC_MSG_WARN([*** proposed MPI standard. Continued inclusion of]) - AC_MSG_WARN([*** the Java MPI bindings in Open MPI is contingent]) - AC_MSG_WARN([*** upon user interest and developer support.]) - AC_MSG_WARN([******************************************************]) - ]) - - AC_CONFIG_FILES([ - ompi/mpi/java/Makefile - ompi/mpi/java/java/Makefile - ompi/mpi/java/c/Makefile - ]) + AC_SUBST(OMPI_JDK_CPPFLAGS) + + # Are we happy? + AC_MSG_CHECKING([if Java support available]) + AS_IF([test "$ompi_java_happy" = "yes"], + [AC_MSG_RESULT([yes])], + [AC_MSG_RESULT([no]) + AC_MSG_WARN([Java support requested but not found.]) + AC_MSG_ERROR([Cannot continue])]) + + OPAL_VAR_SCOPE_POP +]) + +dnl OMPI_SETUP_JAVA() +dnl ---------------- +dnl Do everything required to setup the Java compiler. +AC_DEFUN([OMPI_SETUP_JAVA],[ + OPAL_VAR_SCOPE_PUSH([ompi_java_happy ompi_javah_happy]) + + ompi_java_happy=no + ompi_javah_happy=no + + AC_ARG_WITH(jdk-dir, + AC_HELP_STRING([--with-jdk-dir(=DIR)], + [Location of the JDK header directory. 
If you use this option, do not specify --with-jdk-bindir or --with-jdk-headers.])) + AC_ARG_WITH(jdk-bindir, + AC_HELP_STRING([--with-jdk-bindir(=DIR)], + [Location of the JDK bin directory. If you use this option, you must also use --with-jdk-headers (and you must NOT use --with-jdk-dir)])) + AC_ARG_WITH(jdk-headers, + AC_HELP_STRING([--with-jdk-headers(=DIR)], + [Location of the JDK header directory. If you use this option, you must also use --with-jdk-bindir (and you must NOT use --with-jdk-dir)])) + + # Only setup the compiler if we were requested to + AS_IF([test "$1" = "yes"], + [_OMPI_SETUP_JAVA]) + + AM_CONDITIONAL(OMPI_HAVE_JAVAH_SUPPORT, test "$ompi_javah_happy" = "yes") + + OPAL_VAR_SCOPE_POP ]) diff --git a/config/ompi_setup_mpi_fortran.m4 b/config/ompi_setup_mpi_fortran.m4 index 089f7c5b934..04ae4e0a8f9 100644 --- a/config/ompi_setup_mpi_fortran.m4 +++ b/config/ompi_setup_mpi_fortran.m4 @@ -10,12 +10,12 @@ dnl Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. -dnl Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved dnl Copyright (c) 2006-2008 Sun Microsystems, Inc. All rights reserved. dnl Copyright (c) 2006-2007 Los Alamos National Security, LLC. All rights dnl reserved. dnl Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. -dnl Copyright (c) 2014-2016 Research Organization for Information Science +dnl Copyright (c) 2014-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. dnl Copyright (c) 2016 IBM Corporation. All rights reserved. 
dnl $COPYRIGHT$ @@ -34,7 +34,6 @@ AC_DEFUN([OMPI_SETUP_MPI_FORTRAN],[ OMPI_FORTRAN_USEMPI_DIR= OMPI_FORTRAN_USEMPI_LIB= - OMPI_FORTRAN_USEMPIF08_DIR= OMPI_FORTRAN_USEMPIF08_LIB= OMPI_FORTRAN_MAX_ARRAY_RANK=0 @@ -51,7 +50,6 @@ AC_DEFUN([OMPI_SETUP_MPI_FORTRAN],[ OMPI_FORTRAN_HAVE_BIND_C_TYPE_NAME=0 OMPI_FORTRAN_HAVE_F08_ASSUMED_RANK=0 OMPI_FORTRAN_HAVE_PRIVATE=0 - OMPI_FORTRAN_SUBARRAYS_SUPPORTED=.FALSE. # These macros control symbol names for Fortran/C interoperability # @@ -532,33 +530,20 @@ end type test_mpi_handle], OMPI_FORTRAN_HAVE_F08_ASSUMED_RANK=1]) # Which mpi_f08 implementation are we using? - # a) partial, proof-of-concept that supports array - # subsections (Intel compiler only) - # b) compiler supports BIND(C) and optional arguments + # a) compiler supports BIND(C) and optional arguments # ("good" compilers) - # c) compiler that does not support the items listed + # b) compiler that does not support the items listed # in b) ("bad" compilers) AC_MSG_CHECKING([which mpi_f08 implementation to build]) - AS_IF([test $OMPI_BUILD_FORTRAN_F08_SUBARRAYS -eq 1], - [ # Case a) partial/prototype implementation - OMPI_FORTRAN_USEMPIF08_DIR=mpi/fortran/use-mpi-f08-desc - OMPI_FORTRAN_SUBARRAYS_SUPPORTED=.TRUE. - OMPI_FORTRAN_NEED_WRAPPER_ROUTINES=0 - AC_MSG_RESULT([array subsections (partial/experimental)]) + AS_IF([test $OMPI_FORTRAN_HAVE_OPTIONAL_ARGS -eq 1], + [ # Case a) "good compiler" + OMPI_FORTRAN_NEED_WRAPPER_ROUTINES=0 + AC_MSG_RESULT(["good" compiler, no array subsections]) ], - [ # Both cases b) and c) - OMPI_FORTRAN_USEMPIF08_DIR=mpi/fortran/use-mpi-f08 - OMPI_FORTRAN_SUBARRAYS_SUPPORTED=.FALSE. 
- AS_IF([test $OMPI_FORTRAN_HAVE_OPTIONAL_ARGS -eq 1], - [ # Case b) "good compiler" - OMPI_FORTRAN_NEED_WRAPPER_ROUTINES=0 - AC_MSG_RESULT(["good" compiler, no array subsections]) - ], - [ # Case c) "bad compiler" - OMPI_FORTRAN_NEED_WRAPPER_ROUTINES=1 - AC_MSG_RESULT(["bad" compiler, no array subsections]) - ]) + [ # Case b) "bad compiler" + OMPI_FORTRAN_NEED_WRAPPER_ROUTINES=1 + AC_MSG_RESULT(["bad" compiler, no array subsections]) ]) ]) @@ -699,8 +684,6 @@ end type test_mpi_handle], # use mpi_f08 final setup # ------------------- - # This goes into ompi/Makefile.am - AC_SUBST(OMPI_FORTRAN_USEMPIF08_DIR) # This goes into mpifort-wrapper-data.txt AC_SUBST(OMPI_FORTRAN_USEMPIF08_LIB) @@ -714,12 +697,6 @@ end type test_mpi_handle], AC_SUBST(OMPI_F08_SUFFIX) AC_SUBST(OMPI_F_SUFFIX) - # This goes into ompi/mpi/fortran/configure-fortran-output.h - AC_SUBST(OMPI_FORTRAN_SUBARRAYS_SUPPORTED) - AC_DEFINE_UNQUOTED(OMPI_FORTRAN_SUBARRAYS_SUPPORTED, - [$OMPI_FORTRAN_SUBARRAYS_SUPPORTED], - [Value to load to the MPI_SUBARRAYS_SUPPORTED compile-time constant]) - # This is used to generate weak symbols (or not) in # ompi/mpi/fortran/mpif-h/_f.c, and # ompi/mpi/fortran/configure-fortran-output.h. diff --git a/config/ompi_setup_mpi_java.m4 b/config/ompi_setup_mpi_java.m4 new file mode 100644 index 00000000000..4d14cd4fbbc --- /dev/null +++ b/config/ompi_setup_mpi_java.m4 @@ -0,0 +1,85 @@ +dnl -*- shell-script -*- +dnl +dnl Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana +dnl University Research and Technology +dnl Corporation. All rights reserved. +dnl Copyright (c) 2004-2006 The University of Tennessee and The University +dnl of Tennessee Research Foundation. All rights +dnl reserved. +dnl Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, +dnl University of Stuttgart. All rights reserved. +dnl Copyright (c) 2004-2006 The Regents of the University of California. +dnl All rights reserved. 
+dnl Copyright (c) 2006-2012 Los Alamos National Security, LLC. All rights +dnl reserved. +dnl Copyright (c) 2007-2012 Oracle and/or its affiliates. All rights reserved. +dnl Copyright (c) 2008-2018 Cisco Systems, Inc. All rights reserved +dnl Copyright (c) 2015 Research Organization for Information Science +dnl and Technology (RIST). All rights reserved. +dnl $COPYRIGHT$ +dnl +dnl Additional copyrights may follow +dnl +dnl $HEADER$ +dnl + +dnl OMPI_SETUP_JAVA_BINDINGS() +dnl ---------------- +dnl Do everything required to setup the Java MPI bindings. +AC_DEFUN([OMPI_SETUP_JAVA_BINDINGS],[ + opal_show_subtitle "Java MPI bindings" + + AC_ARG_ENABLE(mpi-java, + AC_HELP_STRING([--enable-mpi-java], + [enable Java MPI bindings (default: disabled)])) + + # Find the Java compiler and whatnot. + # It knows to do very little if $enable_mpi_java!="yes". + OMPI_SETUP_JAVA([$enable_mpi_java]) + + # Only build the Java bindings if requested + AC_MSG_CHECKING([if want Java bindings]) + if test "$enable_mpi_java" = "yes"; then + AC_MSG_RESULT([yes]) + WANT_MPI_JAVA_BINDINGS=1 + AC_MSG_CHECKING([if shared libraries are enabled]) + AS_IF([test "$enable_shared" != "yes"], + [AC_MSG_RESULT([no]) + AC_MSG_WARN([Java bindings cannot be built without shared libraries]) + AC_MSG_WARN([Please reconfigure with --enable-shared]) + AC_MSG_ERROR([Cannot continue])], + [AC_MSG_RESULT([yes])]) + + # Mac Java requires this file (i.e., some other Java-related + # header file needs this file, so we need to check for + # it/include it in our sources when compiling on Mac). + AC_CHECK_HEADERS([TargetConditionals.h]) + + # dladdr and Dl_info are required to build the full path to + # libmpi on OS X 10.11 (a.k.a. 
El Capitan) + AC_CHECK_TYPES([Dl_info], [], [], [[#include ]]) + else + AC_MSG_RESULT([no]) + WANT_MPI_JAVA_BINDINGS=0 + fi + AC_DEFINE_UNQUOTED([OMPI_WANT_JAVA_BINDINGS], [$WANT_MPI_JAVA_BINDINGS], + [do we want java mpi bindings]) + AM_CONDITIONAL(OMPI_WANT_JAVA_BINDINGS, test "$WANT_MPI_JAVA_BINDINGS" = "1") + + # Are we happy? + AS_IF([test $WANT_MPI_JAVA_BINDINGS -eq 1], + [AC_MSG_WARN([******************************************************]) + AC_MSG_WARN([*** Java MPI bindings are provided on a provisional]) + AC_MSG_WARN([*** basis. They are NOT part of the current or]) + AC_MSG_WARN([*** proposed MPI standard. Continued inclusion of]) + AC_MSG_WARN([*** the Java MPI bindings in Open MPI is contingent]) + AC_MSG_WARN([*** upon user interest and developer support.]) + AC_MSG_WARN([******************************************************]) + ]) + + AC_CONFIG_FILES([ + ompi/mpi/java/Makefile + ompi/mpi/java/java/Makefile + ompi/mpi/java/c/Makefile + ]) +]) diff --git a/config/opal_check_attributes.m4 b/config/opal_check_attributes.m4 index 064a59aea6d..53fa38eb0d9 100644 --- a/config/opal_check_attributes.m4 +++ b/config/opal_check_attributes.m4 @@ -11,11 +11,12 @@ dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. dnl Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. -dnl Copyright (c) 2010-2013 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved dnl Copyright (c) 2013 Mellanox Technologies, Inc. dnl All rights reserved. dnl Copyright (c) 2015 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. +dnl Copyright (c) 2017 Intel, Inc. All rights reserved. 
dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -221,6 +222,7 @@ AC_DEFUN([OPAL_CHECK_ATTRIBUTES], [ opal_cv___attribute__warn_unused_result=0 opal_cv___attribute__weak_alias=0 opal_cv___attribute__destructor=0 + opal_cv___attribute__optnone=0 else AC_MSG_RESULT([yes]) @@ -556,6 +558,21 @@ AC_DEFUN([OPAL_CHECK_ATTRIBUTES], [ ], [], []) + + _OPAL_CHECK_SPECIFIC_ATTRIBUTE([optnone], + [ + void __attribute__ ((__optnone__)) foo(void); + void foo(void) { return ; } + ], + [], + []) + + _OPAL_CHECK_SPECIFIC_ATTRIBUTE([extension], + [ + int i = __extension__ 3; + ], + [], + []) fi # Now that all the values are set, define them @@ -608,4 +625,8 @@ AC_DEFUN([OPAL_CHECK_ATTRIBUTES], [ [Whether your compiler has __attribute__ weak alias or not]) AC_DEFINE_UNQUOTED(OPAL_HAVE_ATTRIBUTE_DESTRUCTOR, [$opal_cv___attribute__destructor], [Whether your compiler has __attribute__ destructor or not]) + AC_DEFINE_UNQUOTED(OPAL_HAVE_ATTRIBUTE_OPTNONE, [$opal_cv___attribute__optnone], + [Whether your compiler has __attribute__ optnone or not]) + AC_DEFINE_UNQUOTED(OPAL_HAVE_ATTRIBUTE_EXTENSION, [$opal_cv___attribute__extension], + [Whether your compiler has __attribute__ extension or not]) ]) diff --git a/config/opal_check_cma.m4 b/config/opal_check_cma.m4 index 2930debf911..013f39477d1 100644 --- a/config/opal_check_cma.m4 +++ b/config/opal_check_cma.m4 @@ -22,6 +22,10 @@ AC_DEFUN([OPAL_CHECK_CMA],[ [AC_HELP_STRING([--with-cma], [Build Cross Memory Attach support (default: autodetect)])]) + if test "x$with_cma" = "xno" ; then + opal_check_cma_happy=0 + fi + # We only need to do the back-end test once if test -z "$opal_check_cma_happy" ; then OPAL_CHECK_CMA_BACKEND diff --git a/config/opal_check_cray_pmi.m4 b/config/opal_check_cray_pmi.m4 index 8e3dfee58f3..8f210f5dbe4 100644 --- a/config/opal_check_cray_pmi.m4 +++ b/config/opal_check_cray_pmi.m4 @@ -52,11 +52,11 @@ AC_DEFUN([OPAL_CHECK_CRAY_PMI_EXPLICIT],[ # AS_IF([test "$enable_static" = "yes"], [AS_IF([test -d 
/usr/lib/alps], - [AC_MSG_RESULT([Detected presense of /usr/lib/alps]) + [AC_MSG_RESULT([Detected presence of /usr/lib/alps]) CRAY_PMI_LDFLAGS="$CRAY_PMI_LDFLAGS -L/usr/lib/alps -lalpslli -lalpsutil" CRAY_PMI_LIBS="$CRAY_PMI_LIBS -L/usr/lib/alps -lalpslli -lalpsutil"], [AS_IF([test -d /opt/cray/xe-sysroot/default/usr/lib/alps], - [AC_MSG_RESULT([Detected presense of /opt/cray/xe-sysroot/default/usr/lib/alps]) + [AC_MSG_RESULT([Detected presence of /opt/cray/xe-sysroot/default/usr/lib/alps]) CRAY_PMI_LDFLAGS="$CRAY_PMI_LDFLAGS -L/opt/cray/xe-sysroot/default/usr/lib/alps -lalpslli -lalpsutil" CRAY_PMI_LIBS="$CRAY_PMI_LIBS -L/opt/cray/xe-sysroot/default/usr/lib/alps -lalpslli -lalpsutil"], [AC_MSG_ERROR([Requested enabling static linking but unable to local libalpslli and libalpsutil])]) diff --git a/config/opal_check_libfabric.m4 b/config/opal_check_libfabric.m4 deleted file mode 100644 index 142c7c61008..00000000000 --- a/config/opal_check_libfabric.m4 +++ /dev/null @@ -1,95 +0,0 @@ -dnl -*- shell-script -*- -dnl -dnl Copyright (c) 2015-2016 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2016 Los Alamos National Security, LLC. All rights -dnl reserved. -dnl $COPYRIGHT$ -dnl -dnl Additional copyrights may follow -dnl -dnl $HEADER$ -dnl - - -# OPAL_CHECK_LIBFABRIC(prefix, [action-if-found], [action-if-not-found] -# -------------------------------------------------------- -# Check if libfabric support can be found. -# -# Sets prefix_{CPPFLAGS, LDFLAGs, LIBS} as needed and runs -# action-if-found if there is support; otherwise executes -# action-if-not-found. 
-# -AC_DEFUN([OPAL_CHECK_LIBFABRIC],[ - if test -z "$opal_check_libfabric_happy" ; then - OPAL_VAR_SCOPE_PUSH([opal_check_libfabric_$1_save_CPPFLAGS opal_check_libfabric_$1_save_LDFLAGS opal_check_libfabric_$1_save_LIBS]) - - # Add --with options - AC_ARG_WITH([libfabric], - [AC_HELP_STRING([--with-libfabric=DIR], - [Specify location of libfabric installation, adding DIR/include to the default search location for libfabric headers, and DIR/lib or DIR/lib64 to the default search location for libfabric libraries. Error if libfabric support cannot be found.])]) - AC_ARG_WITH([libfabric-libdir], - [AC_HELP_STRING([--with-libfabric-libdir=DIR], - [Search for libfabric libraries in DIR])]) - - # Sanity check the --with values - OPAL_CHECK_WITHDIR([libfabric], [$with_libfabric], - [include/rdma/fabric.h]) - OPAL_CHECK_WITHDIR([libfabric-libdir], [$with_libfabric_libdir], - [libfabric.*]) - - opal_check_libfabric_$1_save_CPPFLAGS=$CPPFLAGS - opal_check_libfabric_$1_save_LDFLAGS=$LDFLAGS - opal_check_libfabric_$1_save_LIBS=$LIBS - - opal_check_libfabric_happy=yes - AS_IF([test "$with_libfabric" = "no"], - [opal_check_libfabric_happy=no]) - - AS_IF([test $opal_check_libfabric_happy = yes], - [AC_MSG_CHECKING([looking for libfabric in]) - AS_IF([test "$with_libfabric" != "yes"], - [opal_libfabric_dir=$with_libfabric - AC_MSG_RESULT([($opal_libfabric_dir)])], - [AC_MSG_RESULT([(default search paths)])]) - AS_IF([test ! 
-z "$with_libfabric_libdir" && \ - test "$with_libfabric_libdir" != "yes"], - [opal_libfabric_libdir=$with_libfabric_libdir]) - ]) - - AS_IF([test $opal_check_libfabric_happy = yes], - [OPAL_CHECK_PACKAGE([opal_check_libfabric], - [rdma/fabric.h], - [fabric], - [fi_getinfo], - [], - [$opal_libfabric_dir], - [$opal_libfabric_libdir], - [], - [opal_check_libfabric_happy=no])]) - - CPPFLAGS=$opal_check_libfabric_$1_save_CPPFLAGS - LDFLAGS=$opal_check_libfabric_$1_save_LDFLAGS - LIBS=$opal_check_libfabric_$1_save_LIBS - - OPAL_SUMMARY_ADD([[Transports]],[[OpenFabrics Libfabric]],[$1],[$opal_check_libfabric_happy]) - - OPAL_VAR_SCOPE_POP - fi - - if test $opal_check_libfabric_happy = yes ; then - $1_CPPFLAGS="[$]$1_CPPFLAGS $opal_check_libfabric_CPPFLAGS" - $1_LIBS="[$]$1_LIBS $opal_check_libfabric_LIBS" - $1_LDFLAGS="[$]$1_LDFLAGS $opal_check_libfabric_LDFLAGS" - - AC_SUBST($1_CPPFLAGS) - AC_SUBST($1_LDFLAGS) - AC_SUBST($1_LIBS) - fi - - AS_IF([test $opal_check_libfabric_happy = yes], - [$2], - [AS_IF([test -n "$with_libfabric" && test "$with_libfabric" != "no"], - [AC_MSG_WARN([libfabric support requested (via --with-libfabric), but not found.]) - AC_MSG_ERROR([Cannot continue.])]) - $3]) -])dnl diff --git a/config/opal_check_libnl.m4 b/config/opal_check_libnl.m4 index 075ce6ed822..68c13c06cbe 100644 --- a/config/opal_check_libnl.m4 +++ b/config/opal_check_libnl.m4 @@ -1,7 +1,8 @@ dnl -*- shell-script -*- dnl -dnl Copyright (c) 2015-2016 Research Organization for Information Science +dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. +dnl Copyright (c) 2017 Cisco Systems, Inc. All rights reserved. dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -18,7 +19,7 @@ dnl libnl-3.so.200 and friends, so if libnl3-devel packages are not dnl installed, but libnl-devel are, Open MPI should not try to use libnl. 
dnl dnl GROSS: libnl wants us to either use pkg-config (which we -dnl can't assume is always present) or we need to look in a +dnl cannot assume is always present) or we need to look in a dnl particular directory for the right libnl3 include files. For dnl now, just hard code the special path into this logic. dnl @@ -67,46 +68,128 @@ AC_DEFUN([OPAL_LIBNL_SANITY_INIT], [ esac ]) -dnl OPAL_LIBNL_SANITY_CHECK(lib, function, LIBS) +dnl OPAL_LIBNL_SANITY_FAIL_MSG(lib) +dnl +dnl Helper to print a big warning message when we detect a libnl conflict. +dnl +dnl -------------------------------------------------------------------- +AC_DEFUN([OPAL_LIBNL_SANITY_FAIL_MSG], [ + AC_MSG_WARN([This is a configuration that is *known* to cause run-time crashes.]) + AC_MSG_WARN([This is an error in lib$1 (not Open MPI).]) + AC_MSG_WARN([Open MPI will therefore skip using lib$1.]) +]) + +dnl OPAL_LIBNL_SANITY_CHECK(lib, function, LIBS, libnl_check_ok) +dnl +dnl This macro is invoked from OPAL_CHECK_PACKAGE to make sure that +dnl new libraries that are added to LIBS do not pull in conflicting +dnl versions of libnl. E.g., if a library already in LIBS pulls in +dnl libnl v3 and OPAL_CHECK_PACKAGE is later called for a library +dnl that pulls in libnl v1, this macro will detect the conflict and +dnl report failure by setting libnl_check_ok to 0. +dnl +dnl We report failure rather than abort because we are now multiple +dnl levels deep in the M4 "call stack", and this layer does not know +dnl the intent of the user. Hence, all we can do is print a hopefully +dnl helpful warning (that Open MPI would otherwise have been built in +dnl a configuration that is known to crash) and let the caller treat +dnl the library as unusable.
+dnl dnl -------------------------------------------------------------------- AC_DEFUN([OPAL_LIBNL_SANITY_CHECK], [ + OPAL_VAR_SCOPE_PUSH([opal_libnl_sane]) + opal_libnl_sane=1 case $host in *linux*) - OPAL_VAR_SCOPE_PUSH([ldd_output libnl_version]) - AC_LANG_PUSH(C) - cat > conftest_c.$ac_ext << EOF + OPAL_LIBNL_SANITY_CHECK_LINUX($1, $2, $3, opal_libnl_sane) + ;; + esac + + $4=$opal_libnl_sane + OPAL_VAR_SCOPE_POP([opal_libnl_sane]) +]) + +dnl +dnl Simple helper for OPAL_LIBNL_SANITY_CHECK +dnl $1: library name +dnl $2: function +dnl $3: LIBS +dnl $4: output variable (1=ok, 0=not ok) +dnl +AC_DEFUN([OPAL_LIBNL_SANITY_CHECK_LINUX], [ + OPAL_VAR_SCOPE_PUSH([this_requires_v1 libnl_sane this_requires_v3 ldd_output result_msg]) + + AC_LANG_PUSH(C) + + AC_MSG_CHECKING([if lib$1 requires libnl v1 or v3]) + cat > conftest_c.$ac_ext << EOF extern void $2 (void); int main(int argc, char *argv[[]]) { $2 (); return 0; } EOF - OPAL_LOG_COMMAND([$CC -o conftest $CFLAGS $CPPFLAGS conftest_c.$ac_ext $LDFLAGS -l$1 $LIBS $3], - [ldd_output=`ldd conftest` - libnl_version=0 - AS_IF([echo $ldd_output | grep -q libnl.so], - [AS_IF([test $opal_libnl_version -eq 3], - [AC_MSG_WARN([lib nl version conflict: $opal_libnlv3_libs requires libnl-3 whereas $1 requires libnl])], - [opal_libnlv1_libs="$opal_libnlv1_libs $1" - OPAL_UNIQ([opal_libnlv1_libs]) - opal_libnl_version=1]) - libnl_version=1]) - AS_IF([echo $ldd_output | grep -q libnl-3.so], - [AS_IF([test $libnl_version -eq 1], - [AC_MSG_WARN([lib $1 requires both libnl v1 and libnl v3 -- yoinks!]) - AC_MSG_WARN([This is a configuration that is known to cause run-time crashes]) - AC_MSG_ERROR([Cannot continue])]) - AS_IF([test $opal_libnl_version -eq 1], - [AC_MSG_WARN([lib nl version conflict: $opal_libnlv1_libs requires libnl whereas $1 requires libnl-3])], - [opal_libnlv3_libs="$opal_libnlv3_libs $1" - OPAL_UNIQ([opal_libnlv3_libs]) - opal_libnl_version=3])]) - rm -f conftest conftest_c.$ac_ext], - [AC_MSG_WARN([Could not link a 
simple program with lib $1])]) - AC_LANG_POP(C) - OPAL_VAR_SCOPE_POP([ldd_output libnl_version]) - ;; - esac + + this_requires_v1=0 + this_requires_v3=0 + result_msg= + OPAL_LOG_COMMAND([$CC -o conftest $CFLAGS $CPPFLAGS conftest_c.$ac_ext $LDFLAGS -l$1 $LIBS $3], + [ldd_output=`ldd conftest` + AS_IF([echo $ldd_output | grep -q libnl-3.so], + [this_requires_v3=1 + result_msg="v3"]) + AS_IF([echo $ldd_output | grep -q libnl.so], + [this_requires_v1=1 + result_msg="v1 $result_msg"]) + AC_MSG_RESULT([$result_msg]) + ], + [AC_MSG_WARN([Could not link a simple program with lib$1]) + ]) + + # Assume that our configuration is sane; this may get reset below + libnl_sane=1 + + # Note: in all the checks below, only add this library to the list + # of libraries (for v1 or v3 as relevant) if we do not fail. + # I.e., assume that a higher level will refuse to use this library + # if we return failure. + + # Does this library require both v1 and v3? If so, fail. + AS_IF([test $this_requires_v1 -eq 1 && test $this_requires_v3 -eq 1], + [AC_MSG_WARN([Unfortunately, lib$1 links to both libnl and libnl-3.]) + OPAL_LIBNL_SANITY_FAIL_MSG($1) + libnl_sane=0]) + + # Does this library require v1, but some prior library required + # v3? If so, fail. + AS_IF([test $libnl_sane -eq 1 && test $this_requires_v1 -eq 1], + [AS_IF([test $opal_libnl_version -eq 3], + [AC_MSG_WARN([libnl version conflict: $opal_libnlv3_libs requires libnl-3 whereas lib$1 requires libnl]) + OPAL_LIBNL_SANITY_FAIL_MSG($1) + libnl_sane=0], + [opal_libnlv1_libs="$opal_libnlv1_libs $1" + OPAL_UNIQ([opal_libnlv1_libs]) + opal_libnl_version=1]) + ]) + + # Does this library require v3, but some prior library required + # v1? If so, fail.
+ AS_IF([test $libnl_sane -eq 1 && test $this_requires_v3 -eq 1], + [AS_IF([test $opal_libnl_version -eq 1], + [AC_MSG_WARN([libnl version conflict: $opal_libnlv1_libs requires libnl whereas lib$1 requires libnl-3]) + OPAL_LIBNL_SANITY_FAIL_MSG($1) + libnl_sane=0], + [opal_libnlv3_libs="$opal_libnlv3_libs $1" + OPAL_UNIQ([opal_libnlv3_libs]) + opal_libnl_version=3]) + ]) + + AC_LANG_POP(C) + rm -f conftest conftest_c.$ac_ext + + $4=$libnl_sane + + OPAL_VAR_SCOPE_POP([ldd_output libnl_sane this_requires_v1 this_requires_v3 result_msg]) ]) dnl @@ -217,7 +300,12 @@ AC_DEFUN([OPAL_CHECK_LIBNL_V3],[ # If we found everything AS_IF([test $opal_libnlv3_happy -eq 1], [$2_LIBS="-lnl-3 -lnl-route-3" - OPAL_HAVE_LIBNL3=1]) + OPAL_HAVE_LIBNL3=1], + [# OPAL_CHECK_PACKAGE(...,nl_recvmsgs_report,...) might have set the variables below + # so reset them if libnl v3 cannot be used + $2_CPPFLAGS="" + $2_LDFLAGS="" + $2_LIBS=""]) OPAL_VAR_SCOPE_POP ]) @@ -261,7 +349,7 @@ dnl dnl Summarize libnl and libnl3 usage, dnl and abort if conflict is found dnl -dnl Print the list of libraries that use libnl, +dnl Print the list of libraries that use libnl, dnl the list of libraries that use libnl3, dnl and aborts if both libnl and libnl3 are used. dnl diff --git a/config/opal_check_ofi.m4 b/config/opal_check_ofi.m4 new file mode 100644 index 00000000000..f57cfae4e62 --- /dev/null +++ b/config/opal_check_ofi.m4 @@ -0,0 +1,111 @@ +dnl -*- shell-script -*- +dnl +dnl Copyright (c) 2015-2016 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights +dnl reserved. +dnl $COPYRIGHT$ +dnl +dnl Additional copyrights may follow +dnl +dnl $HEADER$ +dnl + + +# OPAL_CHECK_OFI(prefix, [action-if-found], [action-if-not-found]) +# -------------------------------------------------------- +# Check if libfabric support can be found.
+# +# Sets prefix_{CPPFLAGS, LDFLAGS, LIBS} as needed and runs +# action-if-found if there is support; otherwise executes +# action-if-not-found. +# +AC_DEFUN([OPAL_CHECK_OFI],[ + if test -z "$opal_check_ofi_happy" ; then + OPAL_VAR_SCOPE_PUSH([opal_check_ofi_$1_save_CPPFLAGS opal_check_ofi_$1_save_LDFLAGS opal_check_ofi_$1_save_LIBS]) + + # Add --with options + AC_ARG_WITH([libfabric], + [AC_HELP_STRING([--with-libfabric=DIR], + [Deprecated synonym for --with-ofi])]) + AC_ARG_WITH([libfabric-libdir], + [AC_HELP_STRING([--with-libfabric-libdir=DIR], + [Deprecated synonym for --with-ofi-libdir])]) + + AC_ARG_WITH([ofi], + [AC_HELP_STRING([--with-ofi=DIR], + [Specify location of OFI libfabric installation, adding DIR/include to the default search location for libfabric headers, and DIR/lib or DIR/lib64 to the default search location for libfabric libraries. Error if libfabric support cannot be found.])]) + + AC_ARG_WITH([ofi-libdir], + [AC_HELP_STRING([--with-ofi-libdir=DIR], + [Search for OFI libfabric libraries in DIR])]) + + if test "$with_ofi" = ""; then + with_ofi=$with_libfabric + fi + + if test "$with_ofi_libdir" = ""; then + with_ofi_libdir=$with_libfabric_libdir + fi + + # Sanity check the --with values + OPAL_CHECK_WITHDIR([ofi], [$with_ofi], + [include/rdma/fabric.h]) + OPAL_CHECK_WITHDIR([ofi-libdir], [$with_ofi_libdir], + [libfabric.*]) + + opal_check_ofi_$1_save_CPPFLAGS=$CPPFLAGS + opal_check_ofi_$1_save_LDFLAGS=$LDFLAGS + opal_check_ofi_$1_save_LIBS=$LIBS + + opal_check_ofi_happy=yes + AS_IF([test "$with_ofi" = "no"], + [opal_check_ofi_happy=no]) + + AS_IF([test $opal_check_ofi_happy = yes], + [AC_MSG_CHECKING([looking for OFI libfabric in]) + AS_IF([test "$with_ofi" != "yes"], + [opal_ofi_dir=$with_ofi + AC_MSG_RESULT([($opal_ofi_dir)])], + [AC_MSG_RESULT([(default search paths)])]) + AS_IF([test !
-z "$with_ofi_libdir" && \ + test "$with_ofi_libdir" != "yes"], + [opal_ofi_libdir=$with_ofi_libdir]) + ]) + + AS_IF([test $opal_check_ofi_happy = yes], + [OPAL_CHECK_PACKAGE([opal_check_ofi], + [rdma/fabric.h], + [fabric], + [fi_getinfo], + [], + [$opal_ofi_dir], + [$opal_ofi_libdir], + [], + [opal_check_ofi_happy=no])]) + + CPPFLAGS=$opal_check_ofi_$1_save_CPPFLAGS + LDFLAGS=$opal_check_ofi_$1_save_LDFLAGS + LIBS=$opal_check_ofi_$1_save_LIBS + + OPAL_SUMMARY_ADD([[Transports]],[[OpenFabrics Libfabric]],[$1],[$opal_check_ofi_happy]) + + OPAL_VAR_SCOPE_POP + fi + + if test $opal_check_ofi_happy = yes ; then + $1_CPPFLAGS="[$]$1_CPPFLAGS $opal_check_ofi_CPPFLAGS" + $1_LIBS="[$]$1_LIBS $opal_check_ofi_LIBS" + $1_LDFLAGS="[$]$1_LDFLAGS $opal_check_ofi_LDFLAGS" + + AC_SUBST($1_CPPFLAGS) + AC_SUBST($1_LDFLAGS) + AC_SUBST($1_LIBS) + fi + + AS_IF([test $opal_check_ofi_happy = yes], + [$2], + [AS_IF([test -n "$with_ofi" && test "$with_ofi" != "no"], + [AC_MSG_WARN([OFI libfabric support requested (via --with-ofi or --with-libfabric), but not found.]) + AC_MSG_ERROR([Cannot continue.])]) + $3]) +])dnl diff --git a/config/opal_check_openfabrics.m4 b/config/opal_check_openfabrics.m4 index 6f6e4be764c..ef1e50bf744 100644 --- a/config/opal_check_openfabrics.m4 +++ b/config/opal_check_openfabrics.m4 @@ -11,7 +11,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2006-2016 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2006-2016 Los Alamos National Security, LLC. All rights +# Copyright (c) 2006-2017 Los Alamos National Security, LLC. All rights # reserved. # Copyright (c) 2006-2009 Mellanox Technologies. All rights reserved. # Copyright (c) 2010-2012 Oracle and/or its affiliates. All rights reserved. @@ -57,7 +57,7 @@ AC_DEFUN([OPAL_CHECK_OPENFABRICS],[ # Enable padding for SPARC platforms by default because the # btl will segv otherwise. 
Keep padding disabled for other # platforms since there are some performance implications with - # padding on for those plaforms. + # padding on for those platforms. # case "${host}" in sparc*) @@ -301,12 +301,13 @@ AC_DEFUN([OPAL_CHECK_OPENFABRICS],[ AC_DEFUN([OPAL_CHECK_OPENFABRICS_CM_ARGS],[ # - # ConnectX XRC support + # ConnectX XRC support - disabled see issue #3890 # - AC_ARG_ENABLE([openib-connectx-xrc], - [AC_HELP_STRING([--enable-openib-connectx-xrc], - [Enable ConnectX XRC support in the openib BTL. If you do not have InfiniBand ConnectX adapters, you may disable the ConnectX XRC support. If you do not know which InfiniBand adapter is installed on your cluster, leave this option enabled (default: enabled)])], - [enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"]) +dnl AC_ARG_ENABLE([openib-connectx-xrc], +dnl [AC_HELP_STRING([--enable-openib-connectx-xrc], +dnl [Enable ConnectX XRC support in the openib BTL. (default: disabled)])], +dnl [enable_connectx_xrc="$enableval"], [enable_connectx_xrc="no"]) + enable_connectx_xrc="no" # # Unconnect Datagram (UD) based connection manager # diff --git a/config/opal_check_os_flavors.m4 b/config/opal_check_os_flavors.m4 index e8eaba112e9..f055d949b06 100644 --- a/config/opal_check_os_flavors.m4 +++ b/config/opal_check_os_flavors.m4 @@ -22,7 +22,7 @@ AC_DEFUN([OPAL_CHECK_OS_FLAVOR_SPECIFIC], AC_MSG_CHECKING([$1]) AC_COMPILE_IFELSE([AC_LANG_PROGRAM( [[#ifndef $1 - error: this isnt $1 + error: this is not $1 #endif ]])], [opal_found_$2=yes], diff --git a/config/opal_check_package.m4 b/config/opal_check_package.m4 index 5caec7c063b..197c8b73172 100644 --- a/config/opal_check_package.m4 +++ b/config/opal_check_package.m4 @@ -10,7 +10,7 @@ dnl Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. -dnl Copyright (c) 2012-2015 Cisco Systems, Inc. 
All rights reserved. +dnl Copyright (c) 2012-2017 Cisco Systems, Inc. All rights reserved. dnl Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. dnl Copyright (c) 2014 Intel, Inc. All rights reserved. dnl Copyright (c) 2015-2016 Research Organization for Information Science @@ -132,16 +132,22 @@ AC_DEFUN([_OPAL_CHECK_PACKAGE_LIB], [ $1_LDFLAGS="$opal_check_package_$1_orig_LDFLAGS" unset opal_Lib])])])]) + AS_IF([test "$opal_check_package_lib_happy" = "yes"], + [ # libnl v1 and libnl3 are known to *not* coexist + # harmoniously in the same process. Check to see if this + # new package will introduce such a conflict. + OPAL_LIBNL_SANITY_CHECK([$2], [$3], [$$1_LIBS], + [opal_check_package_libnl_check_ok]) + AS_IF([test $opal_check_package_libnl_check_ok -eq 0], + [opal_check_package_lib_happy=no]) + ]) + AS_IF([test "$opal_check_package_lib_happy" = "yes"], [ # The result of AC SEARCH_LIBS is cached in $ac_cv_search_[function] AS_IF([test "$ac_cv_search_$3" != "no" && test "$ac_cv_search_$3" != "none required"], [$1_LIBS="$ac_cv_search_$3 $4"], [$1_LIBS="$4"]) - # libnl v1 and libnl3 are known *not* to coexist - # for each library, figure out whether it depends on libnl or libnl3 or none - # so conflicts can be reported and/or prevented - OPAL_LIBNL_SANITY_CHECK([$2], [$3], [$$1_LIBS]) $7], [$8]) @@ -180,7 +186,7 @@ dnl * library_name / function_name: check for function function_name in dnl -llibrary_name. Specifically, for library_name, use the "foo" form, dnl as opposed to "libfoo". dnl * extra_libraries: if the library_name you are checking for requires -dnl additonal -l arguments to link successfully, list them here. +dnl additional -l arguments to link successfully, list them here. 
dnl * dir_prefix: if the header/library is located in a non-standard dnl location (e.g., /opt/foo as opposed to /usr), list it here dnl * libdir_prefix: if the library is not under $dir_prefix/lib or diff --git a/config/opal_check_pmi.m4 b/config/opal_check_pmi.m4 index 7c25ab14f11..8877d672f53 100644 --- a/config/opal_check_pmi.m4 +++ b/config/opal_check_pmi.m4 @@ -13,9 +13,9 @@ # Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2011-2014 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2014-2016 Intel, Inc. All rights reserved. -# Copyright (c) 2014-2016 Research Organization for Information Science -# and Technology (RIST). All rights reserved. +# Copyright (c) 2014-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2014-2018 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. # $COPYRIGHT$ # @@ -39,18 +39,35 @@ AC_DEFUN([OPAL_CHECK_PMI_LIB], opal_check_$3_mycppflags= # check for the header - AC_MSG_CHECKING([for $3.h in $1/include]) - AS_IF([test -f $1/include/$3.h], - [AC_MSG_RESULT([found]) - opal_check_$3_mycppflags="-I$1/include"], - [AC_MSG_RESULT([not found]) - AC_MSG_CHECKING([for $3.h in $1/include/slurm]) - AS_IF([test -f $1/include/slurm/$3.h], + AS_IF([test -n "$1"], + [AC_MSG_CHECKING([for $3.h in $1]) + AS_IF([test -f $1/$3.h && test -r $1/$3.h], [AC_MSG_RESULT([found]) - opal_check_$3_mycppflags="-I$1/include/slurm" - $5], + opal_check_$3_mycppflags="-I$1"], [AC_MSG_RESULT([not found]) - opal_check_$3_hdr_happy=no])]) + AC_MSG_CHECKING([for $3.h in $1/include]) + AS_IF([test -f $1/include/$3.h && test -r $1/include/$3.h], + [AC_MSG_RESULT([found]) + opal_check_$3_mycppflags="-I$1/include"], + [AC_MSG_RESULT([not found]) + AC_MSG_CHECKING([for $3.h in $1/include/slurm]) + AS_IF([test -f $1/include/slurm/$3.h && test -r $1/include/slurm/$3.h], + [AC_MSG_RESULT([found]) + 
opal_check_$3_mycppflags="-I$1/include/slurm" + $5], + [AC_MSG_RESULT([not found]) + opal_check_$3_hdr_happy=no])])])], + [AC_MSG_CHECKING([for $3.h in /usr/include]) + AS_IF([test -f /usr/include/$3.h && test -r /usr/include/$3.h], + [AC_MSG_RESULT([found])], + [AC_MSG_RESULT([not found]) + AC_MSG_CHECKING([for $3.h in /usr/include/slurm]) + AS_IF([test -f /usr/include/slurm/$3.h && test -r /usr/include/slurm/$3.h], + [AC_MSG_RESULT([found]) + opal_check_$3_mycppflags="-I/usr/include/slurm" + $5], + [AC_MSG_RESULT([not found]) + opal_check_$3_hdr_happy=no])])]) AS_IF([test "$opal_check_$3_hdr_happy" != "no"], [CPPFLAGS="$CPPFLAGS $opal_check_$3_mycppflags" @@ -65,51 +82,47 @@ AC_DEFUN([OPAL_CHECK_PMI_LIB], # check for the library in the given location in case # an exact path was given - AC_MSG_CHECKING([for lib$3 in $2]) - files=`ls $2/lib$3.* 2> /dev/null | wc -l` - AS_IF([test "$files" -gt "0"], - [AC_MSG_RESULT([found]) - LDFLAGS="$LDFLAGS -L$2" - AC_CHECK_LIB([$3], [$4], - [opal_check_$3_lib_happy=yes - $3_LDFLAGS=-L$2 - $3_rpath=$2], + AS_IF([test -z "$1" && test -z "$2"], + [AC_CHECK_LIB([$3], [$4], + [opal_check_$3_lib_happy=yes], [opal_check_$3_lib_happy=no])], - [opal_check_$3_lib_happy=no - AC_MSG_RESULT([not found])]) - - # check for presence of lib64 directory - if found, see if the - # desired library is present and matches our build requirements - files=`ls $2/lib64/lib$3.* 2> /dev/null | wc -l` - AS_IF([test "$opal_check_$3_lib_happy" != "yes"], - [AC_MSG_CHECKING([for lib$3 in $2/lib64]) - AS_IF([test "$files" -gt "0"], - [AC_MSG_RESULT([found]) - LDFLAGS="$LDFLAGS -L$2/lib64" - AC_CHECK_LIB([$3], [$4], - [opal_check_$3_lib_happy=yes - $3_LDFLAGS=-L$2/lib64 - $3_rpath=$2/lib64], - [opal_check_$3_lib_happy=no])], - [opal_check_$3_lib_happy=no - AC_MSG_RESULT([not found])])]) - - - # if we didn't find lib64, or the library wasn't present or correct, - # then try a lib directory if present - files=`ls $2/lib/lib$3.* 2> /dev/null | wc -l` - AS_IF([test 
"$opal_check_$3_lib_happy" != "yes"], - [AC_MSG_CHECKING([for lib$3 in $2/lib]) - AS_IF([test "$files" -gt "0"], - [AC_MSG_RESULT([found]) - LDFLAGS="$LDFLAGS -L$2/lib" - AC_CHECK_LIB([$3], [$4], - [opal_check_$3_lib_happy=yes - $3_LDFLAGS=-L$2/lib - $3_rpath=$2/lib], - [opal_check_$3_lib_happy=no])], - [opal_check_$3_lib_happy=no - AC_MSG_RESULT([not found])])]) + [AS_IF([test -n "$2"], + [AC_MSG_CHECKING([for lib$3 in $2]) + files=`ls $2/lib$3.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt "0"], + [AC_MSG_RESULT([found]) + LDFLAGS="$LDFLAGS -L$2" + AC_CHECK_LIB([$3], [$4], + [opal_check_$3_lib_happy=yes + $3_LDFLAGS=-L$2 + $3_rpath=$2], + [opal_check_$3_lib_happy=no])], + [opal_check_$3_lib_happy=no + AC_MSG_RESULT([not found])])], + [AC_MSG_CHECKING([for lib$3 in $1/lib]) + files=`ls $1/lib/lib$3.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt "0"], + [AC_MSG_RESULT([found]) + LDFLAGS="$LDFLAGS -L$1/lib" + AC_CHECK_LIB([$3], [$4], + [opal_check_$3_lib_happy=yes + $3_LDFLAGS=-L$1/lib + $3_rpath=$1/lib], + [opal_check_$3_lib_happy=no])], + [# check for presence of lib64 directory - if found, see if the + # desired library is present and matches our build requirements + AC_MSG_CHECKING([for lib$3 in $1/lib64]) + files=`ls $1/lib64/lib$3.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt "0"], + [AC_MSG_RESULT([found]) + LDFLAGS="$LDFLAGS -L$1/lib64" + AC_CHECK_LIB([$3], [$4], + [opal_check_$3_lib_happy=yes + $3_LDFLAGS=-L$1/lib64 + $3_rpath=$1/lib64], + [opal_check_$3_lib_happy=no])], + [opal_check_$3_lib_happy=no + AC_MSG_RESULT([not found])])])])]) # restore flags CPPFLAGS=$opal_check_$3_save_CPPFLAGS @@ -124,7 +137,7 @@ AC_DEFUN([OPAL_CHECK_PMI_LIB], # OPAL_CHECK_PMI() # -------------------------------------------------------- AC_DEFUN([OPAL_CHECK_PMI],[ - OPAL_VAR_SCOPE_PUSH([check_pmi_install_dir check_pmi_lib_dir default_pmi_loc default_pmi_libloc slurm_pmi_found]) + OPAL_VAR_SCOPE_PUSH([check_pmi_install_dir check_pmi_lib_dir default_pmi_libloc 
slurm_pmi_found]) AC_ARG_WITH([pmi], [AC_HELP_STRING([--with-pmi(=DIR)], @@ -132,12 +145,11 @@ AC_DEFUN([OPAL_CHECK_PMI],[ [], with_pmi=no) AC_ARG_WITH([pmi-libdir], - [AC_HELP_STRING([--with-pmi-libdir(=DIR)], - [Look for libpmi or libpmi2 in the given directory, DIR/lib or DIR/lib64])]) + [AC_HELP_STRING([--with-pmi-libdir=DIR], + [Look for libpmi or libpmi2 in the given directory DIR, DIR/lib or DIR/lib64])]) check_pmi_install_dir= check_pmi_lib_dir= - default_pmi_loc= default_pmi_libloc= slurm_pmi_found= @@ -149,18 +161,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[ # cannot use OPAL_CHECK_PACKAGE as its backend header # support appends "include" to the path, which won't # work with slurm :-( - AS_IF([test ! -z "$with_pmi" && test "$with_pmi" != "yes"], - [check_pmi_install_dir=$with_pmi - default_pmi_loc=no], - [check_pmi_install_dir=/usr - default_pmi_loc=yes]) - AS_IF([test ! -z "$with_pmi_libdir"], - [check_pmi_lib_dir=$with_pmi_libdir - default_pmi_libloc=no], - [check_pmi_lib_dir=$check_pmi_install_dir - AS_IF([test "$default_pmi_loc" = "no"], - [default_pmi_libloc=no], - [default_pmi_libloc=yes])]) + AS_IF([test -n "$with_pmi" && test "$with_pmi" != "yes"], + [check_pmi_install_dir=$with_pmi]) + AS_IF([test -n "$with_pmi_libdir"], + [check_pmi_lib_dir=$with_pmi_libdir]) # check for pmi-1 lib */ slurm_pmi_found=no @@ -174,10 +178,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[ [opal_enable_pmi1=no]) AS_IF([test "$opal_enable_pmi1" = "yes"], - [AS_IF([test "$default_pmi_loc" = "no" || test "$slurm_pmi_found" = "yes"], + [AS_IF([test "$slurm_pmi_found" = "yes"], [opal_pmi1_CPPFLAGS="$pmi_CPPFLAGS" AC_SUBST(opal_pmi1_CPPFLAGS)]) - AS_IF([test "$default_pmi_libloc" = "no" || test "$slurm_pmi_found" = "yes"], + AS_IF([test "$slurm_pmi_found" = "yes"], [opal_pmi1_LDFLAGS="$pmi_LDFLAGS" AC_SUBST(opal_pmi1_LDFLAGS) opal_pmi1_rpath="$pmi_rpath" @@ -195,10 +199,10 @@ AC_DEFUN([OPAL_CHECK_PMI],[ [opal_enable_pmi2=no]) AS_IF([test "$opal_enable_pmi2" = "yes"], - [AS_IF([test "$default_pmi_loc" 
= "no" || test "$slurm_pmi_found" = "yes"], + [AS_IF([test "$slurm_pmi_found" = "yes"], [opal_pmi2_CPPFLAGS="$pmi2_CPPFLAGS" AC_SUBST(opal_pmi2_CPPFLAGS)]) - AS_IF([test "$default_pmi_libloc" = "no" || test "$slurm_pmi_found" = "yes"], + AS_IF([test "$slurm_pmi_found" = "yes"], [opal_pmi2_LDFLAGS="$pmi2_LDFLAGS" AC_SUBST(opal_pmi2_LDFLAGS) opal_pmi2_rpath="$pmi2_rpath" @@ -233,6 +237,10 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ [AC_HELP_STRING([--with-pmix(=DIR)], [Build PMIx support. DIR can take one of three values: "internal", "external", or a valid directory name. "internal" (or no DIR value) forces Open MPI to use its internal copy of PMIx. "external" forces Open MPI to use an external installation of PMIx. Supplying a valid directory name also forces Open MPI to use an external installation of PMIx, and adds DIR/include, DIR/lib, and DIR/lib64 to the search path for headers and libraries. Note that Open MPI does not support --without-pmix.])]) + AC_ARG_WITH([pmix-libdir], + [AC_HELP_STRING([--with-pmix-libdir=DIR], + [Look for libpmix in the given directory DIR, DIR/lib or DIR/lib64])]) + AS_IF([test "$with_pmix" = "no"], [AC_MSG_WARN([Open MPI requires PMIx support.
It can be built]) AC_MSG_WARN([with either its own internal copy of PMIx, or with]) @@ -240,9 +248,12 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ AC_MSG_ERROR([Cannot continue])]) AC_MSG_CHECKING([if user requested external PMIx support($with_pmix)]) + opal_prun_happy=no + opal_external_have_pmix1=0 AS_IF([test -z "$with_pmix" || test "$with_pmix" = "yes" || test "$with_pmix" = "internal"], [AC_MSG_RESULT([no]) - opal_external_pmix_happy=no], + opal_external_pmix_happy=no + opal_prun_happy=yes], [AC_MSG_RESULT([yes]) # check for external pmix lib */ @@ -252,7 +263,34 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ # Make sure we have the headers and libs in the correct location OPAL_CHECK_WITHDIR([external-pmix], [$pmix_ext_install_dir/include], [pmix.h]) - OPAL_CHECK_WITHDIR([external-libpmix], [$pmix_ext_install_dir/lib], [libpmix.*]) + + AS_IF([test -n "$with_pmix_libdir"], + [AC_MSG_CHECKING([libpmix.* in $with_pmix_libdir]) + files=`ls $with_pmix_libdir/libpmix.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt 0], + [pmix_ext_install_libdir=$with_pmix_libdir], + [AC_MSG_CHECKING([libpmix.* in $with_pmix_libdir/lib64]) + files=`ls $with_pmix_libdir/lib64/libpmix.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt 0], + [pmix_ext_install_libdir=$with_pmix_libdir/lib64], + [AC_MSG_CHECKING([libpmix.* in $with_pmix_libdir/lib]) + files=`ls $with_pmix_libdir/lib/libpmix.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt 0], + [pmix_ext_install_libdir=$with_pmix_libdir/lib], + [AC_MSG_RESULT([not found]) + AC_MSG_ERROR([Cannot continue])])])])], + [# check for presence of lib64 directory - if found, see if the + # desired library is present and matches our build requirements + AC_MSG_CHECKING([libpmix.* in $pmix_ext_install_dir/lib64]) + files=`ls $pmix_ext_install_dir/lib64/libpmix.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt 0], + [pmix_ext_install_libdir=$pmix_ext_install_dir/lib64], + [AC_MSG_CHECKING([libpmix.* in $pmix_ext_install_dir/lib]) + files=`ls 
$pmix_ext_install_dir/lib/libpmix.* 2> /dev/null | wc -l` + AS_IF([test "$files" -gt 0], + [pmix_ext_install_libdir=$pmix_ext_install_dir/lib], + [AC_MSG_RESULT([not found]) + AC_MSG_ERROR([Cannot continue])])])]) # check the version opal_external_pmix_save_CPPFLAGS=$CPPFLAGS @@ -266,7 +304,8 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ AS_IF([test "x`ls $pmix_ext_install_dir/include/pmix_version.h 2> /dev/null`" = "x"], [AC_MSG_RESULT([version file not found - assuming v1.1.4]) opal_external_pmix_version_found=1 - opal_external_pmix_version=114], + opal_external_pmix_version=114 + opal_external_have_pmix1=1], [AC_MSG_RESULT([version file found]) opal_external_pmix_version_found=0]) @@ -295,7 +334,8 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ ], [])], [AC_MSG_RESULT([found]) opal_external_pmix_version=2x - opal_external_pmix_version_found=1], + opal_external_pmix_version_found=1 + opal_prun_happy=yes], [AC_MSG_RESULT([not found])])]) AS_IF([test "$opal_external_pmix_version_found" = "0"], @@ -308,7 +348,8 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ ], [])], [AC_MSG_RESULT([found]) opal_external_pmix_version=1x - opal_external_pmix_version_found=1], + opal_external_pmix_version_found=1 + opal_external_have_pmix1=1], [AC_MSG_RESULT([not found])])]) AS_IF([test "x$opal_external_pmix_version" = "x"], @@ -321,10 +362,14 @@ AC_DEFUN([OPAL_CHECK_PMIX],[ LDFLAGS=$opal_external_pmix_save_LDFLAGS LIBS=$opal_external_pmix_save_LIBS - opal_external_pmix_CPPFLAGS="-I$pmix_ext_install_dir/include" - opal_external_pmix_LDFLAGS=-L$pmix_ext_install_dir/lib + AS_IF([test "$pmix_ext_install_dir" != "/usr"], + [opal_external_pmix_CPPFLAGS="-I$pmix_ext_install_dir/include" + opal_external_pmix_LDFLAGS=-L$pmix_ext_install_libdir]) opal_external_pmix_LIBS=-lpmix opal_external_pmix_happy=yes]) + AC_DEFINE_UNQUOTED([OPAL_PMIX_V1],[$opal_external_have_pmix1], + [Whether the external PMIx library is v1]) + AM_CONDITIONAL([OPAL_WANT_PRUN], [test "$opal_prun_happy" = "yes"]) OPAL_VAR_SCOPE_POP ]) diff --git 
a/config/opal_check_pthread_pids.m4 b/config/opal_check_pthread_pids.m4 index cb3b20a85e5..1cdf5b9c437 100644 --- a/config/opal_check_pthread_pids.m4 +++ b/config/opal_check_pthread_pids.m4 @@ -69,7 +69,7 @@ void *checkpid(void *arg) { [MSG=no OPAL_THREADS_HAVE_DIFFERENT_PIDS=0], [MSG=yes OPAL_THREADS_HAVE_DIFFERENT_PIDS=1], [ - # If we're cross compiling, we can't do another AC_* function here beause + # If we're cross compiling, we can't do another AC_* function here because # it we haven't displayed the result from the last one yet. So defer # another test until below. OPAL_THREADS_HAVE_DIFFERENT_PIDS= diff --git a/config/opal_check_vendor.m4 b/config/opal_check_vendor.m4 index c227c2a347f..056d9397592 100644 --- a/config/opal_check_vendor.m4 +++ b/config/opal_check_vendor.m4 @@ -12,6 +12,7 @@ dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. dnl Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. dnl Copyright (c) 2014 Intel, Inc. All rights reserved +dnl Copyright (c) 2017 IBM Corporation. All rights reserved. dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -114,6 +115,18 @@ AC_DEFUN([_OPAL_CHECK_COMPILER_VENDOR], [ [OPAL_IF_IFELSE([defined(__FUJITSU)], [opal_check_compiler_vendor_result="fujitsu"])]) + # IBM XL C/C++ + AS_IF([test "$opal_check_compiler_vendor_result" = "unknown"], + [OPAL_IF_IFELSE([defined(__xlC__) || defined(__IBMC__) || defined(__IBMCPP__)], + [opal_check_compiler_vendor_result="ibm" + xlc_major_version=`$CC -qversion 2>&1 | tail -n 1 | cut -d ' ' -f 2 | cut -d '.' -f 1` + xlc_minor_version=`$CC -qversion 2>&1 | tail -n 1 | cut -d ' ' -f 2 | cut -d '.' -f 2` + AS_IF([ (test "$xlc_major_version" -lt "13" ) || (test "$xlc_major_version" -eq "13" && test "$xlc_minor_version" -lt "1" )], + [AC_MSG_ERROR(["XL compiler versions earlier than 13.1 are not supported.
Detected $xlc_major_version.$xlc_minor_version"])]) + ], + [OPAL_IF_IFELSE([defined(_AIX) && !defined(__GNUC__)], + [opal_check_compiler_vendor_result="ibm"])])]) + # GNU AS_IF([test "$opal_check_compiler_vendor_result" = "unknown"], [OPAL_IFDEF_IFELSE([__GNUC__], @@ -131,7 +144,7 @@ AC_DEFUN([_OPAL_CHECK_COMPILER_VENDOR], [ AC_MSG_WARN([Detected gccfss being used to compile Open MPI.]) AC_MSG_WARN([Because of several issues Open MPI does not support]) AC_MSG_WARN([the gccfss compiler. Please use a different compiler.]) - AC_MSG_WARN([If you didn't think you used gccfss you may want to]) + AC_MSG_WARN([If you did not think you used gccfss you may want to]) AC_MSG_WARN([check to see if the compiler you think you used is]) AC_MSG_WARN([actually a link to gccfss.]) AC_MSG_ERROR([Cannot continue]) @@ -181,13 +194,6 @@ AC_DEFUN([_OPAL_CHECK_COMPILER_VENDOR], [ [OPAL_IF_IFELSE([defined(__HP_cc) || defined(__HP_aCC)], [opal_check_compiler_vendor_result="hp"])]) - # IBM XL C/C++ - AS_IF([test "$opal_check_compiler_vendor_result" = "unknown"], - [OPAL_IF_IFELSE([defined(__xlC__) || defined(__IBMC__) || defined(__IBMCPP__)], - [opal_check_compiler_vendor_result="ibm"], - [OPAL_IF_IFELSE([defined(_AIX) && !defined(__GNUC__)], - [opal_check_compiler_vendor_result="ibm"])])]) - # KAI C++ (rest in peace) AS_IF([test "$opal_check_compiler_vendor_result" = "unknown"], [OPAL_IFDEF_IFELSE([__KCC], diff --git a/config/opal_check_withdir.m4 b/config/opal_check_withdir.m4 index 7c0ffa84ffd..8ef026eb19c 100644 --- a/config/opal_check_withdir.m4 +++ b/config/opal_check_withdir.m4 @@ -8,6 +8,7 @@ dnl reserved. dnl Copyright (c) 2008-2009 Cisco Systems, Inc. All rights reserved. dnl Copyright (c) 2015 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. +dnl Copyright (c) 2017 IBM Corporation. All rights reserved. 
dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -19,17 +20,21 @@ dnl # ---------------------------------------------------- AC_DEFUN([OPAL_CHECK_WITHDIR],[ AC_MSG_CHECKING([--with-$1 value]) - AS_IF([test "$2" = "yes" || test "$2" = "no" || test "x$2" = "x"], - [AC_MSG_RESULT([simple ok (unspecified)])], - [AS_IF([test ! -d "$2"], - [AC_MSG_RESULT([not found]) - AC_MSG_WARN([Directory $2 not found]) - AC_MSG_ERROR([Cannot continue])], - [AS_IF([test "x`ls $2/$3 2> /dev/null`" = "x"], + AS_IF([test "$2" = "no" ], + [AC_MSG_RESULT([simple no (specified --without-$1)])], + [AS_IF([test "$2" = "yes" || test "x$2" = "x"], + [AC_MSG_RESULT([simple ok (unspecified value)])], + [AS_IF([test ! -d "$2"], [AC_MSG_RESULT([not found]) - AC_MSG_WARN([Expected file $2/$3 not found]) + AC_MSG_WARN([Directory $2 not found]) AC_MSG_ERROR([Cannot continue])], - [AC_MSG_RESULT([sanity check ok ($2)])] + [AS_IF([test "x`ls $2/$3 2> /dev/null`" = "x"], + [AC_MSG_RESULT([not found]) + AC_MSG_WARN([Expected file $2/$3 not found]) + AC_MSG_ERROR([Cannot continue])], + [AC_MSG_RESULT([sanity check ok ($2)])] + ) + ] ) ] ) diff --git a/config/opal_config_asm.m4 b/config/opal_config_asm.m4 index a406c816cca..db120d409e7 100644 --- a/config/opal_config_asm.m4 +++ b/config/opal_config_asm.m4 @@ -13,7 +13,9 @@ dnl Copyright (c) 2008-2015 Cisco Systems, Inc. All rights reserved. dnl Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved. dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. -dnl Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights +dnl Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights +dnl reserved. +dnl Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights dnl reserved. 
dnl $COPYRIGHT$ dnl @@ -191,9 +193,15 @@ AC_DEFUN([OPAL_CHECK_GCC_BUILTIN_CSWAP_INT128], [ AC_DEFUN([OPAL_CHECK_GCC_ATOMIC_BUILTINS], [ AC_MSG_CHECKING([for __atomic builtin atomics]) - AC_TRY_LINK([long tmp, old = 0;], [__atomic_thread_fence(__ATOMIC_SEQ_CST); + AC_TRY_LINK([ +#include +uint32_t tmp, old = 0; +uint64_t tmp64, old64 = 0;], [ +__atomic_thread_fence(__ATOMIC_SEQ_CST); __atomic_compare_exchange_n(&tmp, &old, 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED); -__atomic_add_fetch(&tmp, 1, __ATOMIC_RELAXED);], +__atomic_add_fetch(&tmp, 1, __ATOMIC_RELAXED); +__atomic_compare_exchange_n(&tmp64, &old64, 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED); +__atomic_add_fetch(&tmp64, 1, __ATOMIC_RELAXED);], [AC_MSG_RESULT([yes]) $1], [AC_MSG_RESULT([no]) @@ -524,7 +532,7 @@ dnl OPAL_CHECK_ASM_TYPE dnl dnl Sets OPAL_ASM_TYPE to the prefix for the function type to dnl set a symbol's type as function (needed on ELF for shared -dnl libaries). If no .type directive is needed, sets OPAL_ASM_TYPE +dnl libraries). If no .type directive is needed, sets OPAL_ASM_TYPE dnl to an empty string dnl dnl We look for @ \# % @@ -823,7 +831,7 @@ dnl dnl Check if the compiler is capable of doing GCC-style inline dnl assembly. Some compilers emit a warning and ignore the inline dnl assembly (xlc on OS X) and compile without error. Therefore, -dnl the test attempts to run the emited code to check that the +dnl the test attempts to run the emitted code to check that the dnl assembly is actually run. To run this test, one argument to dnl the macro must be an assembly instruction in gcc format to move dnl the value 0 into the register containing the variable ret. 
@@ -876,6 +884,7 @@ return ret; if test "$asm_result" = "yes" ; then OPAL_C_GCC_INLINE_ASSEMBLY=1 + opal_cv_asm_inline_supported="yes" else OPAL_C_GCC_INLINE_ASSEMBLY=0 fi @@ -887,70 +896,6 @@ return ret; unset OPAL_C_GCC_INLINE_ASSEMBLY assembly asm_result ])dnl - -dnl ################################################################# -dnl -dnl OPAL_CHECK_INLINE_DEC -dnl -dnl DEFINE OPAL_DEC to 0 or 1 depending on DEC -dnl support -dnl -dnl ################################################################# -AC_DEFUN([OPAL_CHECK_INLINE_C_DEC],[ - - AC_MSG_CHECKING([if $CC supports DEC inline assembly]) - - AC_LINK_IFELSE([AC_LANG_PROGRAM([ -AC_INCLUDES_DEFAULT -#include ], -[[asm(""); -return 0;]])], - [asm_result="yes"], [asm_result="no"]) - - AC_MSG_RESULT([$asm_result]) - - if test "$asm_result" = "yes" ; then - OPAL_C_DEC_INLINE_ASSEMBLY=1 - else - OPAL_C_DEC_INLINE_ASSEMBLY=0 - fi - - AC_DEFINE_UNQUOTED([OPAL_C_DEC_INLINE_ASSEMBLY], - [$OPAL_C_DEC_INLINE_ASSEMBLY], - [Whether C compiler supports DEC style inline assembly]) - - unset OPAL_C_DEC_INLINE_ASSEMBLY asm_result -])dnl - - -dnl ################################################################# -dnl -dnl OPAL_CHECK_INLINE_XLC -dnl -dnl DEFINE OPAL_XLC to 0 or 1 depending on XLC -dnl support -dnl -dnl ################################################################# -AC_DEFUN([OPAL_CHECK_INLINE_C_XLC],[ - - AC_MSG_CHECKING([if $CC supports XLC inline assembly]) - - OPAL_C_XLC_INLINE_ASSEMBLY=0 - asm_result="no" - if test "$CC" = "xlc" ; then - OPAL_XLC_INLINE_ASSEMBLY=1 - asm_result="yes" - fi - - AC_MSG_RESULT([$asm_result]) - AC_DEFINE_UNQUOTED([OPAL_C_XLC_INLINE_ASSEMBLY], - [$OPAL_C_XLC_INLINE_ASSEMBLY], - [Whether C compiler supports XLC style inline assembly]) - - unset OPAL_C_XLC_INLINE_ASSEMBLY -])dnl - - dnl ################################################################# dnl dnl OPAL_CONFIG_ASM @@ -968,24 +913,15 @@ AC_DEFUN([OPAL_CONFIG_ASM],[ AC_ARG_ENABLE([builtin-atomics], 
[AC_HELP_STRING([--enable-builtin-atomics], - [Enable use of __sync builtin atomics (default: enabled)])], - [], [enable_builtin_atomics="yes"]) - AC_ARG_ENABLE([osx-builtin-atomics], - [AC_HELP_STRING([--enable-osx-builtin-atomics], - [Enable use of OSX builtin atomics (default: enabled)])], - [], [enable_osx_builtin_atomics="yes"]) + [Enable use of __sync builtin atomics (default: enabled)])]) opal_cv_asm_builtin="BUILTIN_NO" - if test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" = "yes" ; then - OPAL_CHECK_GCC_ATOMIC_BUILTINS([opal_cv_asm_builtin="BUILTIN_GCC"], []) - fi - if test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" = "yes" ; then - OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], []) - fi - if test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_osx_builtin_atomics" = "yes" ; then - AC_CHECK_HEADER([libkern/OSAtomic.h], - [opal_cv_asm_builtin="BUILTIN_OSX"]) - fi + AS_IF([test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" != "no"], + [OPAL_CHECK_GCC_ATOMIC_BUILTINS([opal_cv_asm_builtin="BUILTIN_GCC"], [])]) + AS_IF([test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" != "no"], + [OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], [])]) + AS_IF([test "$opal_cv_asm_builtin" = "BUILTIN_NO" && test "$enable_builtin_atomics" = "yes"], + [AC_MSG_ERROR([__sync builtin atomics requested but not found.])]) OPAL_CHECK_ASM_PROC OPAL_CHECK_ASM_TEXT @@ -1008,7 +944,7 @@ AC_DEFUN([OPAL_CONFIG_ASM],[ OPAL_ASM_SUPPORT_64BIT=1 OPAL_GCC_INLINE_ASSIGN='"xaddl %1,%0" : "=m"(ret), "+r"(negone) : "m"(ret)' ;; - i?86-*|x86_64*) + i?86-*|x86_64*|amd64*) if test "$ac_cv_sizeof_long" = "4" ; then opal_cv_asm_arch="IA32" else @@ -1021,8 +957,8 @@ AC_DEFUN([OPAL_CONFIG_ASM],[ ia64-*) opal_cv_asm_arch="IA64" - OPAL_ASM_SUPPORT_64BIT=1 - OPAL_GCC_INLINE_ASSIGN='"mov %0=r0\n;;\n" : "=&r"(ret)' + 
OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], + [AC_MSG_ERROR([No atomic primitives available for $host])]) ;; aarch64*) opal_cv_asm_arch="ARM64" @@ -1055,20 +991,16 @@ AC_DEFUN([OPAL_CONFIG_ASM],[ armv5*linux*|armv4*linux*|arm-*-linux-gnueabi) # uses Linux kernel helpers for some atomic operations opal_cv_asm_arch="ARM" - OPAL_ASM_SUPPORT_64BIT=0 - OPAL_ASM_ARM_VERSION=5 - CCASFLAGS="$CCASFLAGS -march=armv7-a" - AC_DEFINE_UNQUOTED([OPAL_ASM_ARM_VERSION], [$OPAL_ASM_ARM_VERSION], - [What ARM assembly version to use]) - OPAL_GCC_INLINE_ASSIGN='"mov %0, #0" : "=&r"(ret)' + OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], + [AC_MSG_ERROR([No atomic primitives available for $host])]) ;; mips-*|mips64*) # Should really find some way to make sure that we are on # a MIPS III machine (r4000 and later) opal_cv_asm_arch="MIPS" - OPAL_ASM_SUPPORT_64BIT=1 - OPAL_GCC_INLINE_ASSIGN='"or %0,[$]0,[$]0" : "=&r"(ret)' + OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], + [AC_MSG_ERROR([No atomic primitives available for $host])]) ;; powerpc-*|powerpc64-*|powerpcle-*|powerpc64le-*|rs6000-*|ppc-*) @@ -1089,7 +1021,19 @@ AC_DEFUN([OPAL_CONFIG_ASM],[ fi OPAL_GCC_INLINE_ASSIGN='"1: li %0,0" : "=&r"(ret)' ;; - + # There is currently no difference between s390 and s390x, + # but use two different defines in case differences arise later, + # as s390 is 31-bit while s390x is 64-bit + s390-*) + opal_cv_asm_arch="S390" + OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], + [AC_MSG_ERROR([No atomic primitives available for $host])]) + ;; + s390x-*) + opal_cv_asm_arch="S390X" + OPAL_CHECK_SYNC_BUILTINS([opal_cv_asm_builtin="BUILTIN_SYNC"], + [AC_MSG_ERROR([No atomic primitives available for $host])]) + ;; sparc*-*) # SPARC v9 (and above) are the only ones with 64bit support # if compiling 32 bit, see if we are v9 (aka v8plus) or @@ -1151,10 +1095,9 @@ AC_MSG_ERROR([Can not continue.]) ;; esac + opal_cv_asm_inline_supported="no" # now that we know our
architecture, try to inline assemble OPAL_CHECK_INLINE_C_GCC([$OPAL_GCC_INLINE_ASSIGN]) - OPAL_CHECK_INLINE_C_DEC - OPAL_CHECK_INLINE_C_XLC # format: # config_file-text-global-label_suffix-gsym-lsym-type-size-align_log-ppc_r_reg-64_bit-gnu_stack @@ -1239,64 +1182,10 @@ AC_DEFUN([OPAL_ASM_FIND_FILE], [ AC_REQUIRE([AC_PROG_GREP]) AC_REQUIRE([AC_PROG_FGREP]) -if test "$opal_cv_asm_arch" != "WINDOWS" && test "$opal_cv_asm_builtin" != "BUILTIN_SYNC" && test "$opal_cv_asm_builtin" != "BUILTIN_GCC" && test "$opal_cv_asm_builtin" != "BUILTIN_OSX" ; then - # see if we have a pre-built one already - AC_MSG_CHECKING([for pre-built assembly file]) - opal_cv_asm_file="" - if $GREP "$opal_cv_asm_arch" "${OPAL_TOP_SRCDIR}/opal/asm/asm-data.txt" | $FGREP "$opal_cv_asm_format" >conftest.out 2>&1 ; then - opal_cv_asm_file="`cut -f3 conftest.out`" - if test ! "$opal_cv_asm_file" = "" ; then - opal_cv_asm_file="atomic-${opal_cv_asm_file}.s" - if test -f "${OPAL_TOP_SRCDIR}/opal/asm/generated/${opal_cv_asm_file}" ; then - AC_MSG_RESULT([yes ($opal_cv_asm_file)]) - else - AC_MSG_RESULT([no ($opal_cv_asm_file not found)]) - opal_cv_asm_file="" - fi - fi - else - AC_MSG_RESULT([no (not in asm-data)]) - fi - rm -rf conftest.* - - if test "$opal_cv_asm_file" = "" ; then - # Can we generate a file? 
- AC_MSG_CHECKING([whether possible to generate assembly file]) - mkdir -p opal/asm/generated - opal_cv_asm_file="atomic-local.s" - opal_try='$PERL $OPAL_TOP_SRCDIR/opal/asm/generate-asm.pl $opal_cv_asm_arch "$opal_cv_asm_format" $OPAL_TOP_SRCDIR/opal/asm/base $OPAL_TOP_BUILDDIR/opal/asm/generated/$opal_cv_asm_file >conftest.out 2>&1' - if AC_TRY_EVAL(opal_try) ; then - # save the warnings - cat conftest.out >&AC_FD_CC - AC_MSG_RESULT([yes]) - else - # save output - cat conftest.out >&AC_FD_CC - opal_cv_asm_file="" - AC_MSG_RESULT([failed]) - AC_MSG_WARN([Could not build atomic operations assembly file.]) - AC_MSG_WARN([There will be no atomic operations for this build.]) - fi - fi - rm -rf conftest.* +if test "$opal_cv_asm_arch" != "WINDOWS" && test "$opal_cv_asm_builtin" != "BUILTIN_SYNC" && test "$opal_cv_asm_builtin" != "BUILTIN_GCC" && test "$opal_cv_asm_builtin" != "BUILTIN_OSX" && test "$opal_cv_asm_inline_supported" = "no" ; then + AC_MSG_ERROR([no atomic support available. exiting]) else # On windows with VC++, atomics are done with compiler primitives opal_cv_asm_file="" fi - - AC_MSG_CHECKING([for atomic assembly filename]) - if test "$opal_cv_asm_file" = "" ; then - AC_MSG_RESULT([none]) - result=0 - else - AC_MSG_RESULT([$opal_cv_asm_file]) - result=1 - fi - - AC_DEFINE_UNQUOTED([OPAL_HAVE_ASM_FILE], [$result], - [Whether there is an atomic assembly file available]) - AM_CONDITIONAL([OPAL_HAVE_ASM_FILE], [test "$result" = "1"]) - - OPAL_ASM_FILE=$opal_cv_asm_file - AC_SUBST(OPAL_ASM_FILE) ])dnl diff --git a/config/opal_config_files.m4 b/config/opal_config_files.m4 index 14aec99bbab..0978977be46 100644 --- a/config/opal_config_files.m4 +++ b/config/opal_config_files.m4 @@ -15,7 +15,6 @@ AC_DEFUN([OPAL_CONFIG_FILES],[ opal/Makefile opal/etc/Makefile opal/include/Makefile - opal/asm/Makefile opal/datatype/Makefile opal/util/Makefile opal/util/keyval/Makefile diff --git a/config/opal_configure_options.m4 b/config/opal_configure_options.m4 index
c7f6e7b4288..43fcaf3469d 100644 --- a/config/opal_configure_options.m4 +++ b/config/opal_configure_options.m4 @@ -10,7 +10,7 @@ dnl Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. -dnl Copyright (c) 2006-2016 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved. dnl Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. dnl Copyright (c) 2009 IBM Corporation. All rights reserved. dnl Copyright (c) 2009 Los Alamos National Security, LLC. All rights @@ -275,8 +275,8 @@ AC_ARG_ENABLE([dlopen], Disabling dlopen implies --disable-mca-dso. (default: enabled)])]) if test "$enable_dlopen" = "no" ; then - enable_mca_dso="no" - enable_mca_static="yes" + enable_mca_dso=no + enable_mca_static=yes OPAL_ENABLE_DLOPEN_SUPPORT=0 AC_MSG_RESULT([no]) else @@ -286,6 +286,34 @@ fi AC_DEFINE_UNQUOTED(OPAL_ENABLE_DLOPEN_SUPPORT, $OPAL_ENABLE_DLOPEN_SUPPORT, [Whether we want to enable dlopen support]) + +# +# Do we want to show component load error messages by default? +# + +AC_MSG_CHECKING([for default value of mca_base_component_show_load_errors]) +AC_ARG_ENABLE([show-load-errors-by-default], + [AC_HELP_STRING([--enable-show-load-errors-by-default], + [Set the default value for the MCA parameter + mca_base_component_show_load_errors (which can be + overridden at run time by the usual + MCA-variable-setting mechanism). This MCA variable + controls whether warnings are displayed when an MCA + component fails to load at run time due to an error.
+ (default: enabled, meaning that + mca_base_component_show_load_errors is enabled + by default)])]) +if test "$enable_show_load_errors_by_default" = "no" ; then + OPAL_SHOW_LOAD_ERRORS_DEFAULT=0 + AC_MSG_RESULT([disabled by default]) +else + OPAL_SHOW_LOAD_ERRORS_DEFAULT=1 + AC_MSG_RESULT([enabled by default]) +fi +AC_DEFINE_UNQUOTED(OPAL_SHOW_LOAD_ERRORS_DEFAULT, $OPAL_SHOW_LOAD_ERRORS_DEFAULT, + [Default value for mca_base_component_show_load_errors MCA variable]) + + # # Heterogeneous support # @@ -374,7 +402,7 @@ AM_CONDITIONAL([OPAL_WANT_SCRIPT_WRAPPER_COMPILERS], [test "$enable_script_wrapp # AC_ARG_ENABLE([per-user-config-files], [AC_HELP_STRING([--enable-per-user-config-files], - [Disable per-user configuration files, to save disk accesses during job start-up. This is likely desirable for large jobs. Note that this can also be acheived by environment variables at run-time. (default: enabled)])]) + [Disable per-user configuration files, to save disk accesses during job start-up. This is likely desirable for large jobs. Note that this can also be achieved by environment variables at run-time. (default: enabled)])]) if test "$enable_per_user_config_files" = "no" ; then result=0 else diff --git a/config/opal_functions.m4 b/config/opal_functions.m4 index 62c8c6102c5..34c965df31f 100644 --- a/config/opal_functions.m4 +++ b/config/opal_functions.m4 @@ -14,7 +14,7 @@ dnl Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. dnl Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. dnl Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. dnl Copyright (c) 2014 Intel, Inc. All rights reserved. -dnl Copyright (c) 2015-2016 Research Organization for Information Science +dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. dnl dnl $COPYRIGHT$ @@ -42,7 +42,7 @@ dnl AC_DEFUN([OPAL_CONFIGURE_SETUP],[ # Some helper script functions.
Unfortunately, we cannot use $1 kinds -# of arugments here because of the m4 substitution. So we have to set +# of arguments here because of the m4 substitution. So we have to set # special variable names before invoking the function. :-\ opal_show_title() { @@ -95,7 +95,7 @@ EOF # OPAL_CONFIGURE_USER="`whoami`" -OPAL_CONFIGURE_HOST="`hostname | head -n 1`" +OPAL_CONFIGURE_HOST="`(hostname || uname -n) 2> /dev/null | sed 1q`" OPAL_CONFIGURE_DATE="`date`" OPAL_LIBNL_SANITY_INIT @@ -117,7 +117,7 @@ AC_DEFUN([OPAL_BASIC_SETUP],[ # OPAL_CONFIGURE_USER="`whoami`" -OPAL_CONFIGURE_HOST="`hostname | head -n 1`" +OPAL_CONFIGURE_HOST="`(hostname || uname -n) 2> /dev/null | sed 1q`" OPAL_CONFIGURE_DATE="`date`" # @@ -322,7 +322,7 @@ dnl ####################################################################### # OPAL_APPEND_UNIQ(variable, new_argument) # ---------------------------------------- # Append new_argument to variable if not already in variable. This assumes a -# space seperated list. +# space separated list. # # This could probably be made more efficient :(. AC_DEFUN([OPAL_APPEND_UNIQ], [ @@ -453,7 +453,7 @@ dnl ####################################################################### # - the argument does not begin with -I, -L, or -l, or # - the argument begins with -I, -L, or -l, and it's not already in variable # -# This macro assumes a space seperated list. +# This macro assumes a space separated list. AC_DEFUN([OPAL_FLAGS_APPEND_UNIQ], [ OPAL_VAR_SCOPE_PUSH([opal_tmp opal_append]) @@ -649,7 +649,7 @@ AC_DEFUN([OPAL_COMPUTE_MAX_VALUE], [ overflow=1 fi else - # stil negative. Time to give up. + # still negative. Time to give up. overflow=1 fi opal_num_bits=0 diff --git a/config/opal_mca.m4 b/config/opal_mca.m4 index a1f94bce404..1e84bb3e4b8 100644 --- a/config/opal_mca.m4 +++ b/config/opal_mca.m4 @@ -11,7 +11,9 @@ dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. 
dnl Copyright (c) 2010-2016 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2013-2014 Intel, Inc. All rights reserved. +dnl Copyright (c) 2013-2017 Intel, Inc. All rights reserved. +dnl Copyright (c) 2018 Amazon.com, Inc. or its affiliates. +dnl All Rights reserved. dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -173,6 +175,7 @@ AC_DEFUN([OPAL_MCA],[ elif test "$enable_mca_dso" = "no"; then DSO_all=0 msg=none + enable_dlopen=no else DSO_all=0 ifs_save="$IFS" @@ -224,16 +227,19 @@ AC_DEFUN([OPAL_MCA],[ # now configure all the projects, frameworks, and components. Most # of the hard stuff is in here MCA_PROJECT_SUBDIRS= + MCA_PROJECT_DIST_SUBDIRS= m4_foreach(mca_project, [mca_project_list], - [# BWB: Until projects have seperate configure scripts + [# BWB: Until projects have separate configure scripts # and can skip running all of ORTE, just avoid recursing # into orte sub directory if orte disabled if (test "mca_project" = "ompi" && test "$enable_mpi" != "no") || test "mca_project" = "opal" || test "mca_project" = "orte" || test "mca_project" = "oshmem"; then MCA_PROJECT_SUBDIRS="$MCA_PROJECT_SUBDIRS mca_project" + MCA_PROJECT_DIST_SUBDIRS="$MCA_PROJECT_DIST_SUBDIRS mca_project" fi MCA_CONFIGURE_PROJECT(mca_project)]) AC_SUBST(MCA_PROJECT_SUBDIRS) + AC_SUBST(MCA_PROJECT_DIST_SUBDIRS) m4_undefine([mca_component_configure_active]) ]) diff --git a/config/opal_setup_cc.m4 b/config/opal_setup_cc.m4 index 14e29265ba2..0d6102685a2 100644 --- a/config/opal_setup_cc.m4 +++ b/config/opal_setup_cc.m4 @@ -12,7 +12,7 @@ dnl Copyright (c) 2004-2006 The Regents of the University of California. dnl All rights reserved. dnl Copyright (c) 2007-2009 Sun Microsystems, Inc. All rights reserved. dnl Copyright (c) 2008-2015 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2012 Los Alamos National Security, LLC. All rights +dnl Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights dnl reserved. 
dnl Copyright (c) 2015 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. @@ -23,6 +23,105 @@ dnl dnl $HEADER$ dnl +AC_DEFUN([OPAL_CC_HELPER],[ + OPAL_VAR_SCOPE_PUSH([opal_prog_cc_c11_helper_tmp]) + AC_MSG_CHECKING([$1]) + + opal_prog_cc_c11_helper_tmp=0 + + AC_LINK_IFELSE([AC_LANG_PROGRAM([$3],[$4])],[ + $2=yes + opal_prog_cc_c11_helper_tmp=1], [$2=no]) + + AC_DEFINE_UNQUOTED([$5], [$opal_prog_cc_c11_helper_tmp], [$6]) + + AC_MSG_RESULT([$$2]) + OPAL_VAR_SCOPE_POP +]) + + +AC_DEFUN([OPAL_PROG_CC_C11_HELPER],[ + OPAL_VAR_SCOPE_PUSH([opal_prog_cc_c11_helper_CFLAGS_save opal_prog_cc_c11_helper__Thread_local_available opal_prog_cc_c11_helper_atomic_var_available opal_prog_cc_c11_helper__Atomic_available opal_prog_cc_c11_helper__static_assert_available opal_prog_cc_c11_helper__Generic_available]) + + opal_prog_cc_c11_helper_CFLAGS_save=$CFLAGS + CFLAGS="$CFLAGS $1" + + OPAL_CC_HELPER([if $CC $1 supports C11 thread local storage], [opal_prog_cc_c11_helper__Thread_local_available], + [],[[static _Thread_local int foo = 1;++foo;]], [OPAL_C_HAVE__THREAD_LOCAL], + [Whether C compiler supports _Thread_local]) + + OPAL_CC_HELPER([if $CC $1 supports C11 atomic variables], [opal_prog_cc_c11_helper_atomic_var_available], + [[#include ]], [[static atomic_long foo = 1;++foo;]], [OPAL_C_HAVE_ATOMIC_CONV_VAR], + [Whether C compiler supports atomic convenience variables in stdatomic.h]) + + OPAL_CC_HELPER([if $CC $1 supports C11 _Atomic keyword], [opal_prog_cc_c11_helper__Atomic_available], + [[#include ]],[[static _Atomic long foo = 1;++foo;]], [OPAL_C_HAVE__ATOMIC], + [Whether C compiler supports _Atomic keyword]) + + OPAL_CC_HELPER([if $CC $1 supports C11 _Generic keyword], [opal_prog_cc_c11_helper__Generic_available], + [[#define FOO(x) (_Generic (x, int: 1))]], [[static int x, y; y = FOO(x);]], [OPAL_C_HAVE__GENERIC], + [Whether C compiler supports _Generic keyword]) + + OPAL_CC_HELPER([if $CC $1 supports C11 _Static_assert],
[opal_prog_cc_c11_helper__static_assert_available], + [[#include ]],[[_Static_assert(sizeof(int64_t) == 8, "WTH");]], [OPAL_C_HAVE__STATIC_ASSERT], + [Whether C compiler supports _Static_assert keyword]) + + dnl At this time Open MPI only needs thread local storage and the atomic convenience types for C11 support. The other + dnl C11 features checked above will likely be required in the future. + AS_IF([test "x$opal_prog_cc_c11_helper__Thread_local_available" = "xyes" && test "x$opal_prog_cc_c11_helper_atomic_var_available" = "xyes"], + [$2], [$3]) + + CFLAGS=$opal_prog_cc_c11_helper_CFLAGS_save + + OPAL_VAR_SCOPE_POP +]) + +AC_DEFUN([OPAL_PROG_CC_C11],[ + OPAL_VAR_SCOPE_PUSH([opal_prog_cc_c11_flags]) + if test -z "$opal_cv_c11_supported" ; then + opal_cv_c11_supported=no + opal_cv_c11_flag_required=yes + + AC_MSG_CHECKING([if $CC requires a flag for C11]) + + AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ +#if __STDC_VERSION__ < 201112L +#error "Without any CLI flags, this compiler does not support C11" +#endif + ]],[])], + [opal_cv_c11_flag_required=no]) + + AC_MSG_RESULT([$opal_cv_c11_flag_required]) + + if test "x$opal_cv_c11_flag_required" = "xno" ; then + AC_MSG_NOTICE([verifying $CC supports C11 without a flag]) + OPAL_PROG_CC_C11_HELPER([], [], [opal_cv_c11_flag_required=yes]) + fi + + if test "x$opal_cv_c11_flag_required" = "xyes" ; then + opal_prog_cc_c11_flags="-std=gnu11 -std=c11 -c11" + + AC_MSG_NOTICE([checking if $CC supports C11 with a flag]) + opal_cv_c11_flag= + for flag in $(echo $opal_prog_cc_c11_flags | tr ' ' '\n') ; do + OPAL_PROG_CC_C11_HELPER([$flag],[opal_cv_c11_flag=$flag],[]) + if test "x$opal_cv_c11_flag" != "x" ; then + CFLAGS="$CFLAGS $opal_cv_c11_flag" + AC_MSG_NOTICE([using $flag to enable C11 support]) + opal_cv_c11_supported=yes + break + fi + done + else + AC_MSG_NOTICE([no flag required for C11 support]) + opal_cv_c11_supported=yes + fi + fi + + OPAL_VAR_SCOPE_POP +]) + + # OPAL_SETUP_CC() # --------------- # Do everything required to setup the C compiler.
Safe to AC_REQUIRE @@ -41,14 +140,25 @@ AC_DEFUN([OPAL_SETUP_CC],[ WRAPPER_CC="$CC" AC_SUBST([WRAPPER_CC]) - # From Open MPI 1.7 on we require a C99 compiant compiler - AC_PROG_CC_C99 - # The result of AC_PROG_CC_C99 is stored in ac_cv_prog_cc_c99 - if test "x$ac_cv_prog_cc_c99" = xno ; then - AC_MSG_WARN([Open MPI requires a C99 compiler]) - AC_MSG_ERROR([Aborting.]) - fi + OPAL_PROG_CC_C11 + + if test "$opal_cv_c11_supported" = no ; then + # It is not currently an error if C11 support is not available. Uncomment the + # following lines and update the warning when we require a C11 compiler. + # AC_MSG_WARN([Open MPI requires a C11 (or newer) compiler]) + # AC_MSG_ERROR([Aborting.]) + # From Open MPI 1.7 on we require a C99-compliant compiler + AC_PROG_CC_C99 + # The result of AC_PROG_CC_C99 is stored in ac_cv_prog_cc_c99 + if test "x$ac_cv_prog_cc_c99" = xno ; then + AC_MSG_WARN([Open MPI requires a C99 (or newer) compiler. C11 is recommended.]) + AC_MSG_ERROR([Aborting.]) + fi + # Get the correct result for C11 support flags now that the compiler flags have + # changed + OPAL_PROG_CC_C11_HELPER([],[],[]) + fi OPAL_C_COMPILER_VENDOR([opal_c_vendor]) diff --git a/config/opal_setup_java.m4 b/config/opal_setup_java.m4 deleted file mode 100644 index 699ae780241..00000000000 --- a/config/opal_setup_java.m4 +++ /dev/null @@ -1,218 +0,0 @@ -dnl -*- shell-script -*- -dnl -dnl Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -dnl University Research and Technology -dnl Corporation. All rights reserved. -dnl Copyright (c) 2004-2006 The University of Tennessee and The University -dnl of Tennessee Research Foundation. All rights -dnl reserved. -dnl Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, -dnl University of Stuttgart. All rights reserved. -dnl Copyright (c) 2004-2006 The Regents of the University of California. -dnl All rights reserved. -dnl Copyright (c) 2006-2012 Los Alamos National Security, LLC. All rights -dnl reserved.
-dnl Copyright (c) 2007-2012 Oracle and/or its affiliates. All rights reserved. -dnl Copyright (c) 2008-2013 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2013 Intel, Inc. All rights reserved. -dnl Copyright (c) 2015 Research Organization for Information Science -dnl and Technology (RIST). All rights reserved. -dnl $COPYRIGHT$ -dnl -dnl Additional copyrights may follow -dnl -dnl $HEADER$ -dnl - -# This macro is necessary to get the title to be displayed first. :-) -AC_DEFUN([OPAL_SETUP_JAVA_BANNER],[ - opal_show_subtitle "Java compiler" -]) - -# OPAL_SETUP_JAVA() -# ---------------- -# Do everything required to setup the Java compiler. Safe to AC_REQUIRE -# this macro. -AC_DEFUN([OPAL_SETUP_JAVA],[ - AC_REQUIRE([OPAL_SETUP_JAVA_BANNER]) - - OPAL_VAR_SCOPE_PUSH([opal_java_bad opal_java_found opal_java_dir opal_java_jnih opal_java_PATH_save opal_java_CPPFLAGS_save]) - AC_ARG_ENABLE(java, - AC_HELP_STRING([--enable-java], - [Enable Java-based support in the system - use this option to disable all Java-based compiler tests (default: enabled)])) - - AC_ARG_WITH(jdk-dir, - AC_HELP_STRING([--with-jdk-dir(=DIR)], - [Location of the JDK header directory. If you use this option, do not specify --with-jdk-bindir or --with-jdk-headers.])) - AC_ARG_WITH(jdk-bindir, - AC_HELP_STRING([--with-jdk-bindir(=DIR)], - [Location of the JDK bin directory. If you use this option, you must also use --with-jdk-headers (and you must NOT use --with-jdk-dir)])) - AC_ARG_WITH(jdk-headers, - AC_HELP_STRING([--with-jdk-headers(=DIR)], - [Location of the JDK header directory. 
If you use this option, you must also use --with-jdk-bindir (and you must NOT use --with-jdk-dir)])) - - if test "$enable_java" = "no"; then - HAVE_JAVA_SUPPORT=0 - opal_java_happy=no - else - # Check for bozo case: ensure a directory was specified - AS_IF([test "$with_jdk_dir" = "yes" || test "$with_jdk_dir" = "no"], - [AC_MSG_WARN([Must specify a directory name for --with-jdk-dir]) - AC_MSG_ERROR([Cannot continue])]) - AS_IF([test "$with_jdk_bindir" = "yes" || test "$with_jdk_bindir" = "no"], - [AC_MSG_WARN([Must specify a directory name for --with-jdk-bindir]) - AC_MSG_ERROR([Cannot continue])]) - AS_IF([test "$with_jdk_headers" = "yes" || test "$with_jdk_headers" = "no"], - [AC_MSG_WARN([Must specify a directory name for --with-jdk-headers]) - AC_MSG_ERROR([Cannot continue])]) - - # Check for bozo case: either specify --with-jdk-dir or - # (--with-jdk-bindir, --with-jdk-headers) -- not both. - opal_java_bad=0 - AS_IF([test -n "$with_jdk_dir" && \ - (test -n "$with_jdk_bindir" || test -n "$with_jdk_headers")], - [opal_java_bad=1]) - AS_IF([(test -z "$with_jdk_bindir" && test -n "$with_jdk_headers") || \ - (test -n "$with_jdk_bindir" && test -z "$with_jdk_headers")], - [opal_java_bad=1]) - AS_IF([test "$opal_java_bad" = "1"], - [AC_MSG_WARN([Either specify --with-jdk-dir or both of (--with-jdk_bindir, --with-jdk-headers) -- not both.]) - AC_MSG_ERROR([Cannot continue])]) - - AS_IF([test -n "$with_jdk_dir"], - [with_jdk_bindir=$with_jdk_dir/bin - with_jdk_headers=$with_jdk_dir/include]) - - ################################################################## - # with_jdk_dir can now be ignored; with_jdk_bindir and - # with_jdk_headers will be either empty or have valid values. - ################################################################## - - # Some java installations are in obscure places. So let's - # hard-code a few of the common ones so that users don't have to - # specify --with-java-=LONG_ANNOYING_DIRECTORY. 
- AS_IF([test -z "$with_jdk_bindir"], - [ # OS X Snow Leopard and Lion (10.6 and 10.7 -- did not - # check prior versions) - opal_java_found=0 - AS_IF([test -x /usr/libexec/java_home], - [opal_java_dir=`/usr/libexec/java_home`/include], - [opal_java_dir=/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers]) - AC_MSG_CHECKING([OSX locations]) - AS_IF([test -d $opal_java_dir], - [AC_MSG_RESULT([found ($opal_java_dir)]) - opal_java_found=1 - with_jdk_headers=$opal_java_dir - with_jdk_bindir=/usr/bin], - [AC_MSG_RESULT([not found])]) - - if test "$opal_java_found" = "0"; then - # Various Linux - if test -z "$JAVA_HOME"; then - opal_java_dir='/usr/lib/jvm/java-*-openjdk-*/include/' - else - opal_java_dir=$JAVA_HOME/include - fi - opal_java_jnih=`ls $opal_java_dir/jni.h 2>/dev/null | head -n 1` - AC_MSG_CHECKING([Linux locations]) - AS_IF([test -r "$opal_java_jnih"], - [with_jdk_headers=`dirname $opal_java_jnih` - OPAL_WHICH([javac], [with_jdk_bindir]) - AS_IF([test -n "$with_jdk_bindir"], - [AC_MSG_RESULT([found ($with_jdk_headers)]) - opal_java_found=1 - with_jdk_bindir=`dirname $with_jdk_bindir`], - [with_jdk_headers=])], - [opal_java_dir='/usr/lib/jvm/default-java/include/' - opal_java_jnih=`ls $opal_java_dir/jni.h 2>/dev/null | head -n 1` - AS_IF([test -r "$opal_java_jnih"], - [with_jdk_headers=`dirname $opal_java_jnih` - OPAL_WHICH([javac], [with_jdk_bindir]) - AS_IF([test -n "$with_jdk_bindir"], - [AC_MSG_RESULT([found ($with_jdk_headers)]) - opal_java_found=1 - with_jdk_bindir=`dirname $with_jdk_bindir`], - [with_jdk_headers=])], - [AC_MSG_RESULT([not found])])]) - fi - - if test "$opal_java_found" = "0"; then - # Solaris - opal_java_dir=/usr/java - AC_MSG_CHECKING([Solaris locations]) - AS_IF([test -d $opal_java_dir && test -r "$opal_java_dir/include/jni.h"], - [AC_MSG_RESULT([found ($opal_java_dir)]) - with_jdk_headers=$opal_java_dir/include - with_jdk_bindir=$opal_java_dir/bin - opal_java_found=1], - [AC_MSG_RESULT([not found])]) - fi - ], - 
[opal_java_found=1]) - - if test "$opal_java_found" = "1"; then - OPAL_CHECK_WITHDIR([jdk-bindir], [$with_jdk_bindir], [javac]) - OPAL_CHECK_WITHDIR([jdk-headers], [$with_jdk_headers], [jni.h]) - - # Look for various Java-related programs - opal_java_happy=no - opal_java_PATH_save=$PATH - AS_IF([test -n "$with_jdk_bindir" && test "$with_jdk_bindir" != "yes" && test "$with_jdk_bindir" != "no"], - [PATH="$with_jdk_bindir:$PATH"]) - AC_PATH_PROG(JAVAC, javac) - AC_PATH_PROG(JAVAH, javah) - AC_PATH_PROG(JAR, jar) - PATH=$opal_java_PATH_save - - # Check to see if we have all 3 programs. - AS_IF([test -z "$JAVAC" || test -z "$JAVAH" || test -z "$JAR"], - [opal_java_happy=no - HAVE_JAVA_SUPPORT=0], - [opal_java_happy=yes - HAVE_JAVA_SUPPORT=1]) - - # Look for jni.h - AS_IF([test "$opal_java_happy" = "yes"], - [opal_java_CPPFLAGS_save=$CPPFLAGS - # silence a stupid Mac warning - CPPFLAGS="$CPPFLAGS -DTARGET_RT_MAC_CFM=0" - AS_IF([test -n "$with_jdk_headers" && test "$with_jdk_headers" != "yes" && test "$with_jdk_headers" != "no"], - [OPAL_JDK_CPPFLAGS="-I$with_jdk_headers" - # Some flavors of JDK also require -I/linux. - # See if that's there, and if so, add a -I for that, - # too. Ugh. - AS_IF([test -d "$with_jdk_headers/linux"], - [OPAL_JDK_CPPFLAGS="$OPAL_JDK_CPPFLAGS -I$with_jdk_headers/linux"]) - # Solaris JDK also require -I/solaris. - # See if that's there, and if so, add a -I for that, - # too. Ugh. - AS_IF([test -d "$with_jdk_headers/solaris"], - [OPAL_JDK_CPPFLAGS="$OPAL_JDK_CPPFLAGS -I$with_jdk_headers/solaris"]) - # Darwin JDK also require -I/darwin. - # See if that's there, and if so, add a -I for that, - # too. Ugh. 
- AS_IF([test -d "$with_jdk_headers/darwin"], - [OPAL_JDK_CPPFLAGS="$OPAL_JDK_CPPFLAGS -I$with_jdk_headers/darwin"]) - - CPPFLAGS="$CPPFLAGS $OPAL_JDK_CPPFLAGS"]) - AC_CHECK_HEADER([jni.h], [], - [opal_java_happy=no]) - CPPFLAGS=$opal_java_CPPFLAGS_save - ]) - else - opal_java_happy=no; - HAVE_JAVA_SUPPORT=no; - fi - AC_SUBST(OPAL_JDK_CPPFLAGS) - fi - - # Are we happy? - AC_MSG_CHECKING([Java support available]) - AS_IF([test "$opal_java_happy" = "no"], - [AC_MSG_RESULT([no])], - [AC_MSG_RESULT([yes])]) - - AC_DEFINE_UNQUOTED([OPAL_HAVE_JAVA_SUPPORT], [$HAVE_JAVA_SUPPORT], [do we have Java support]) - AM_CONDITIONAL(OPAL_HAVE_JAVA_SUPPORT, test "$opal_java_happy" = "yes") - OPAL_VAR_SCOPE_POP -]) diff --git a/config/opal_setup_wrappers.m4 b/config/opal_setup_wrappers.m4 index 6c3300856f6..397e1eca37c 100644 --- a/config/opal_setup_wrappers.m4 +++ b/config/opal_setup_wrappers.m4 @@ -12,7 +12,7 @@ dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. dnl Copyright (c) 2006-2010 Oracle and/or its affiliates. All rights reserved. dnl Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2015-2016 Research Organization for Information Science +dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. dnl Copyright (c) 2016 IBM Corporation. All rights reserved. dnl $COPYRIGHT$ @@ -68,7 +68,7 @@ AC_DEFUN([OPAL_WRAPPER_FLAGS_ADD], [ # That is, a component may not influence CFLAGS, CXXFLAGS, or FCFLAGS. 
# # Notes: -# * Keep user flags seperate as 1) they should have no influence +# * Keep user flags separate as 1) they should have no influence # over build and 2) they don't go through the uniqification we do # with the other wrapper compiler options # * While the user (the person who runs configure) is allowed to set @@ -130,15 +130,24 @@ AC_DEFUN([OPAL_SETUP_WRAPPER_INIT],[ [enable rpath/runpath support in the wrapper compilers (default=yes)])]) AS_IF([test "$enable_wrapper_rpath" != "no"], [enable_wrapper_rpath=yes]) AC_MSG_RESULT([$enable_wrapper_rpath]) + + AC_MSG_CHECKING([if want wrapper compiler runpath support]) + AC_ARG_ENABLE([wrapper-runpath], + [AS_HELP_STRING([--enable-wrapper-runpath], + [enable runpath in the wrapper compilers if linker supports it (default: enabled, unless wrapper-rpath is disabled).])]) + AS_IF([test "$enable_wrapper_runpath" != "no"], [enable_wrapper_runpath=yes]) + AC_MSG_RESULT([$enable_wrapper_runpath]) + + AS_IF([test "$enable_wrapper_rpath" = "no" && test "$enable_wrapper_runpath" = "yes"], + [AC_MSG_ERROR([--enable-wrapper-runpath cannot be selected with --disable-wrapper-rpath])]) ]) -# Check to see whether the linker supports DT_RPATH. We'll need to -# use config.rpath to find the flags that it needs, if it does (see -# comments in config.rpath for an explanation of where it came from). -AC_DEFUN([OPAL_SETUP_RPATH],[ - OPAL_VAR_SCOPE_PUSH([rpath_libdir_save rpath_script rpath_outfile]) - AC_MSG_CHECKING([if linker supports RPATH]) - # Output goes into globally-visible $rpath_args. Run this in a +# OPAL_LIBTOOL_CONFIG(libtool-variable, result-variable, +# libtool-tag, extra-code) +# Retrieve information from the generated libtool +AC_DEFUN([OPAL_LIBTOOL_CONFIG],[ + OPAL_VAR_SCOPE_PUSH([rpath_script rpath_outfile]) + # Output goes into globally-visible variable. Run this in a # sub-process so that we don't pollute the current process # environment. 
rpath_script=conftest.$$.sh @@ -153,52 +162,37 @@ AC_DEFUN([OPAL_SETUP_RPATH],[ # (because if script A sources script B, and B calls "exit", then both # B and A will exit). Instead, we have to send the output to a file # and then source that. -$OPAL_TOP_BUILDDIR/libtool --config > $rpath_outfile +$OPAL_TOP_BUILDDIR/libtool $3 --config > $rpath_outfile chmod +x $rpath_outfile . ./$rpath_outfile rm -f $rpath_outfile -# Evaluate \$hardcode_libdir_flag_spec, and substitute in LIBDIR for \$libdir -libdir=LIBDIR -flags="\`eval echo \$hardcode_libdir_flag_spec\`" +# Evaluate \$$1, and substitute in LIBDIR for \$libdir +$4 +flags="\`eval echo \$$1\`" echo \$flags # Done exit 0 EOF chmod +x $rpath_script - rpath_args=`./$rpath_script` + $2=`./$rpath_script` rm -f $rpath_script + OPAL_VAR_SCOPE_POP +]) + +# Check to see whether the linker supports DT_RPATH. We'll need to +# use config.rpath to find the flags that it needs, if it does (see +# comments in config.rpath for an explanation of where it came from). +AC_DEFUN([OPAL_SETUP_RPATH],[ + OPAL_VAR_SCOPE_PUSH([rpath_libdir_save]) + AC_MSG_CHECKING([if linker supports RPATH]) + OPAL_LIBTOOL_CONFIG([hardcode_libdir_flag_spec],[rpath_args],[],[libdir=LIBDIR]) AS_IF([test -n "$rpath_args"], [WRAPPER_RPATH_SUPPORT=rpath - cat > $rpath_script < $rpath_outfile - -chmod +x $rpath_outfile -. 
./$rpath_outfile -rm -f $rpath_outfile - -# Evaluate \$hardcode_libdir_flag_spec, and substitute in LIBDIR for \$libdir -libdir=LIBDIR -flags="\`eval echo \$hardcode_libdir_flag_spec\`" -echo \$flags - -# Done -exit 0 -EOF - chmod +x $rpath_script - rpath_fc_args=`./$rpath_script` - rm -f $rpath_script + OPAL_LIBTOOL_CONFIG([hardcode_libdir_flag_spec],[rpath_fc_args],[--tag=FC],[libdir=LIBDIR]) AC_MSG_RESULT([yes ($rpath_args + $rpath_fc_args)])], [WRAPPER_RPATH_SUPPORT=unnecessary AC_MSG_RESULT([yes (no extra flags needed)])]) @@ -218,59 +212,31 @@ EOF # If DT_RUNPATH is supported, then we'll use *both* the RPATH and # RUNPATH flags in the LDFLAGS. AC_DEFUN([OPAL_SETUP_RUNPATH],[ - OPAL_VAR_SCOPE_PUSH([LDFLAGS_save rpath_script rpath_outfile wl_fc]) + OPAL_VAR_SCOPE_PUSH([LDFLAGS_save wl_fc]) - AC_MSG_CHECKING([if linker supports RUNPATH]) # Set the output in $runpath_args runpath_args= LDFLAGS_save=$LDFLAGS LDFLAGS="$LDFLAGS -Wl,--enable-new-dtags" - AC_LANG_PUSH([C]) - AC_LINK_IFELSE([AC_LANG_PROGRAM([], [return 7;])], - [WRAPPER_RPATH_SUPPORT=runpath - runpath_args="-Wl,--enable-new-dtags" - AC_MSG_RESULT([yes (-Wl,--enable-new-dtags)])], - [AC_MSG_RESULT([no])]) - AC_LANG_POP([C]) -m4_ifdef([project_ompi],[ - # Output goes into globally-visible $rpath_args. Run this in a - # sub-process so that we don't pollute the current process - # environment. - rpath_script=conftest.$$.sh - rpath_outfile=conftest.$$.out - rm -f $rpath_script $rpath_outfile - cat > $rpath_script < $rpath_outfile - -chmod +x $rpath_outfile -. 
./$rpath_outfile -rm -f $rpath_outfile - -wl="\`eval echo \$wl\`" -echo \$wl - -# Done -exit 0 -EOF - chmod +x $rpath_script - wl_fc=`./$rpath_script` - rm -f $rpath_script - - LDFLAGS="$LDFLAGS_save ${wl_fc}--enable-new-dtags" - AC_LANG_PUSH([Fortran]) - AC_LINK_IFELSE([AC_LANG_SOURCE([[program test + AS_IF([test x"$enable_wrapper_runpath" = x"yes"], + [AC_LANG_PUSH([C]) + AC_MSG_CHECKING([if linker supports RUNPATH]) + AC_LINK_IFELSE([AC_LANG_PROGRAM([], [return 7;])], + [WRAPPER_RPATH_SUPPORT=runpath + runpath_args="-Wl,--enable-new-dtags" + AC_MSG_RESULT([yes (-Wl,--enable-new-dtags)])], + [AC_MSG_RESULT([no])]) + AC_LANG_POP([C])]) + m4_ifdef([project_ompi],[ + OPAL_LIBTOOL_CONFIG([wl],[wl_fc],[--tag=FC],[]) + + LDFLAGS="$LDFLAGS_save ${wl_fc}--enable-new-dtags" + AC_LANG_PUSH([Fortran]) + AC_LINK_IFELSE([AC_LANG_SOURCE([[program test end program]])], - [runpath_fc_args="${wl_fc}--enable-new-dtags"], - [runpath_fc_args=""]) - AC_LANG_POP([Fortran])]) + [runpath_fc_args="${wl_fc}--enable-new-dtags"], + [runpath_fc_args=""]) + AC_LANG_POP([Fortran])]) LDFLAGS=$LDFLAGS_save OPAL_VAR_SCOPE_POP diff --git a/config/opal_setup_zlib.m4 b/config/opal_setup_zlib.m4 index 76b1f97a39e..55fc55d54bf 100644 --- a/config/opal_setup_zlib.m4 +++ b/config/opal_setup_zlib.m4 @@ -3,6 +3,8 @@ # Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights reserved. # Copyright (c) 2013-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -13,7 +15,7 @@ # MCA_zlib_CONFIG([action-if-found], [action-if-not-found]) # -------------------------------------------------------------------- AC_DEFUN([OPAL_ZLIB_CONFIG],[ - OPAL_VAR_SCOPE_PUSH([opal_zlib_dir opal_zlib_libdir]) + OPAL_VAR_SCOPE_PUSH([opal_zlib_dir opal_zlib_libdir opal_zlib_standard_header_location opal_zlib_standard_lib_location]) AC_ARG_WITH([zlib], [AC_HELP_STRING([--with-zlib=DIR], @@ -29,23 +31,26 @@ AC_DEFUN([OPAL_ZLIB_CONFIG],[ if test ! -z "$with_zlib" && test "$with_zlib" != "yes"; then opal_zlib_dir=$with_zlib opal_zlib_standard_header_location=no - if test -d $with_zlib/lib; then - opal_zlib_libdir=$with_zlib/lib - elif test -d $with_zlib/lib64; then - opal_zlib_libdir=$with_zlib/lib64 - else - AC_MSG_RESULT([Could not find $with_zlib/lib or $with_zlib/lib64]) - AC_MSG_ERROR([Can not continue]) - fi - AC_MSG_RESULT([$opal_zlib_dir and $opal_zlib_libdir]) + opal_zlib_standard_lib_location=no + AS_IF([test -z "$with_zlib_libdir" || test "$with_zlib_libdir" = "yes"], + [if test -d $with_zlib/lib; then + opal_zlib_libdir=$with_zlib/lib + elif test -d $with_zlib/lib64; then + opal_zlib_libdir=$with_zlib/lib64 + else + AC_MSG_RESULT([Could not find $with_zlib/lib or $with_zlib/lib64]) + AC_MSG_ERROR([Can not continue]) + fi + AC_MSG_RESULT([$opal_zlib_dir and $opal_zlib_libdir])], + [AC_MSG_RESULT([$with_zlib_libdir])]) else AC_MSG_RESULT([(default search paths)]) opal_zlib_standard_header_location=yes + opal_zlib_standard_lib_location=yes fi AS_IF([test ! 
-z "$with_zlib_libdir" && test "$with_zlib_libdir" != "yes"], [opal_zlib_libdir="$with_zlib_libdir" - opal_zlib_standard_lib_location=no], - [opal_zlib_standard_lib_location=yes]) + opal_zlib_standard_lib_location=no]) OPAL_CHECK_PACKAGE([opal_zlib], [zlib.h], @@ -56,7 +61,7 @@ AC_DEFUN([OPAL_ZLIB_CONFIG],[ [$opal_zlib_libdir], [opal_zlib_support=1], [opal_zlib_support=0]) - if test $opal_zlib_support == "1"; then + if test $opal_zlib_support = "1"; then LIBS="$LIBS -lz" if test "$opal_zlib_standard_header_location" != "yes"; then CPPFLAGS="$CPPFLAGS $opal_zlib_CPPFLAGS" diff --git a/config/opal_summary.m4 b/config/opal_summary.m4 index 084896df125..95ba540fb36 100644 --- a/config/opal_summary.m4 +++ b/config/opal_summary.m4 @@ -2,7 +2,7 @@ dnl -*- shell-script -*- dnl dnl Copyright (c) 2016 Los Alamos National Security, LLC. All rights dnl reserved. -dnl Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. +dnl Copyright (c) 2016-2018 Cisco Systems, Inc. All rights reserved dnl Copyright (c) 2016 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. 
dnl $COPYRIGHT$ @@ -67,7 +67,7 @@ EOF echo "Build MPI Fortran bindings: no" fi - if test x$WANT_MPI_JAVA_SUPPORT = x1 ; then + if test $WANT_MPI_JAVA_BINDINGS -eq 1 ; then echo "Build MPI Java bindings (experimental): yes" else echo "MPI Build Java bindings (experimental): no" @@ -76,8 +76,10 @@ EOF if test "$project_oshmem_amc" = "true" ; then echo "Build Open SHMEM support: yes" - else + elif test -z "$project_oshmem_amc" ; then echo "Build Open SHMEM support: no" + else + echo "Build Open SHMEM support: $project_oshmem_amc" fi if test $WANT_DEBUG = 0 ; then diff --git a/config/orte_check_loadleveler.m4 b/config/orte_check_loadleveler.m4 deleted file mode 100644 index a8d609981b2..00000000000 --- a/config/orte_check_loadleveler.m4 +++ /dev/null @@ -1,53 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2006-2009 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2011 IBM Corporation. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# 1. if --with-loadleveler is given, always build -# 2. if --without-loadleveler is given, never build -# 3. 
if neither is given, build if-and-only-if the OS is Linux or AIX - -# ORTE_CHECK_LOADLEVELER(prefix, [action-if-found], [action-if-not-found]) -# -------------------------------------------------------- -AC_DEFUN([ORTE_CHECK_LOADLEVELER],[ - AC_ARG_WITH([loadleveler], - [AC_HELP_STRING([--with-loadleveler], - [Build LoadLeveler scheduler component (default: yes)])]) - - if test "$with_loadleveler" = "no" ; then - orte_check_loadleveler_happy="no" - elif test "$with_loadleveler" = "" ; then - # unless user asked, only build LoadLeveler component on Linux - # and AIX (these are the platforms that LoadLeveler supports) - case $host in - *-linux*|*-aix*) - orte_check_loadleveler_happy="yes" - ;; - *) - orte_check_loadleveler_happy="no" - ;; - esac - else - orte_check_loadleveler_happy="yes" - fi - - AS_IF([test "$orte_check_loadleveler_happy" = "yes"], - [$2], - [$3]) -]) diff --git a/config/orte_check_lsf.m4 b/config/orte_check_lsf.m4 index c32c1aaa556..0de332ca566 100644 --- a/config/orte_check_lsf.m4 +++ b/config/orte_check_lsf.m4 @@ -15,6 +15,7 @@ dnl Copyright (c) 2015 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. dnl Copyright (c) 2016 Los Alamos National Security, LLC. All rights dnl reserved. +dnl Copyright (c) 2017 IBM Corporation. All rights reserved. 
dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -26,113 +27,116 @@ dnl # ORTE_CHECK_LSF(prefix, [action-if-found], [action-if-not-found]) # -------------------------------------------------------- AC_DEFUN([ORTE_CHECK_LSF],[ - if test -z "$orte_check_lsf_happy" ; then - AC_ARG_WITH([lsf], - [AC_HELP_STRING([--with-lsf(=DIR)], - [Build LSF support])]) - OPAL_CHECK_WITHDIR([lsf], [$with_lsf], [include/lsf/lsbatch.h]) - AC_ARG_WITH([lsf-libdir], - [AC_HELP_STRING([--with-lsf-libdir=DIR], - [Search for LSF libraries in DIR])]) - OPAL_CHECK_WITHDIR([lsf-libdir], [$with_lsf_libdir], [libbat.*]) + AS_IF([test -z "$orte_check_lsf_happy"],[ + AC_ARG_WITH([lsf], + [AC_HELP_STRING([--with-lsf(=DIR)], + [Build LSF support])]) + OPAL_CHECK_WITHDIR([lsf], [$with_lsf], [include/lsf/lsbatch.h]) + AC_ARG_WITH([lsf-libdir], + [AC_HELP_STRING([--with-lsf-libdir=DIR], + [Search for LSF libraries in DIR])]) + OPAL_CHECK_WITHDIR([lsf-libdir], [$with_lsf_libdir], [libbat.*]) - # Defaults - orte_check_lsf_dir_msg="compiler default" - orte_check_lsf_libdir_msg="linker default" + AS_IF([test "$with_lsf" != "no"],[ + # Defaults + orte_check_lsf_dir_msg="compiler default" + orte_check_lsf_libdir_msg="linker default" - # Save directory names if supplied - AS_IF([test ! -z "$with_lsf" && test "$with_lsf" != "yes"], - [orte_check_lsf_dir="$with_lsf" - orte_check_lsf_dir_msg="$orte_check_lsf_dir (from --with-lsf)"]) - AS_IF([test ! -z "$with_lsf_libdir" && test "$with_lsf_libdir" != "yes"], - [orte_check_lsf_libdir="$with_lsf_libdir" - orte_check_lsf_libdir_msg="$orte_check_lsf_libdir (from --with-lsf-libdir)"]) + # Save directory names if supplied + AS_IF([test ! -z "$with_lsf" && test "$with_lsf" != "yes"], + [orte_check_lsf_dir="$with_lsf" + orte_check_lsf_dir_msg="$orte_check_lsf_dir (from --with-lsf)"]) + AS_IF([test ! 
-z "$with_lsf_libdir" && test "$with_lsf_libdir" != "yes"], + [orte_check_lsf_libdir="$with_lsf_libdir" + orte_check_lsf_libdir_msg="$orte_check_lsf_libdir (from --with-lsf-libdir)"]) - # If no directories were specified, look for LSF_LIBDIR, - # LSF_INCLUDEDIR, and/or LSF_ENVDIR. - AS_IF([test -z "$orte_check_lsf_dir" && test -z "$orte_check_lsf_libdir"], - [AS_IF([test ! -z "$LSF_ENVDIR" && test -z "$LSF_LIBDIR" && test -f "$LSF_ENVDIR/lsf.conf"], - [LSF_LIBDIR=`egrep ^LSF_LIBDIR= $LSF_ENVDIR/lsf.conf | cut -d= -f2-`]) - AS_IF([test ! -z "$LSF_ENVDIR" && test -z "$LSF_INCLUDEDIR" && test -f "$LSF_ENVDIR/lsf.conf"], - [LSF_INCLUDEDIR=`egrep ^LSF_INCLUDEDIR= $LSF_ENVDIR/lsf.conf | cut -d= -f2-`]) - AS_IF([test ! -z "$LSF_LIBDIR"], - [orte_check_lsf_libdir=$LSF_LIBDIR - orte_check_lsf_libdir_msg="$LSF_LIBDIR (from \$LSF_LIBDIR)"]) - AS_IF([test ! -z "$LSF_INCLUDEDIR"], - [orte_check_lsf_dir=`dirname $LSF_INCLUDEDIR` - orte_check_lsf_dir_msg="$orte_check_lsf_dir (from \$LSF_INCLUDEDIR)"])]) + # If no directories were specified, look for LSF_LIBDIR, + # LSF_INCLUDEDIR, and/or LSF_ENVDIR. + AS_IF([test -z "$orte_check_lsf_dir" && test -z "$orte_check_lsf_libdir"], + [AS_IF([test ! -z "$LSF_ENVDIR" && test -z "$LSF_LIBDIR" && test -f "$LSF_ENVDIR/lsf.conf"], + [LSF_LIBDIR=`egrep ^LSF_LIBDIR= $LSF_ENVDIR/lsf.conf | cut -d= -f2-`]) + AS_IF([test ! -z "$LSF_ENVDIR" && test -z "$LSF_INCLUDEDIR" && test -f "$LSF_ENVDIR/lsf.conf"], + [LSF_INCLUDEDIR=`egrep ^LSF_INCLUDEDIR= $LSF_ENVDIR/lsf.conf | cut -d= -f2-`]) + AS_IF([test ! -z "$LSF_LIBDIR"], + [orte_check_lsf_libdir=$LSF_LIBDIR + orte_check_lsf_libdir_msg="$LSF_LIBDIR (from \$LSF_LIBDIR)"]) + AS_IF([test ! 
-z "$LSF_INCLUDEDIR"], + [orte_check_lsf_dir=`dirname $LSF_INCLUDEDIR` + orte_check_lsf_dir_msg="$orte_check_lsf_dir (from \$LSF_INCLUDEDIR)"])]) - AS_IF([test "$with_lsf" = "no"], - [orte_check_lsf_happy="no"], - [orte_check_lsf_happy="yes"]) + AS_IF([test "$with_lsf" = "no"], + [orte_check_lsf_happy="no"], + [orte_check_lsf_happy="yes"]) - orte_check_lsf_$1_save_CPPFLAGS="$CPPFLAGS" - orte_check_lsf_$1_save_LDFLAGS="$LDFLAGS" - orte_check_lsf_$1_save_LIBS="$LIBS" + orte_check_lsf_$1_save_CPPFLAGS="$CPPFLAGS" + orte_check_lsf_$1_save_LDFLAGS="$LDFLAGS" + orte_check_lsf_$1_save_LIBS="$LIBS" - # liblsf requires yp_all, yp_get_default_domain, and ypprot_err - # on Linux, Solaris, NEC, and Sony NEWSs these are found in libnsl - # on AIX it should be in libbsd - # on HP-UX it should be in libBSD - # on IRIX < 6 it should be in libsun (IRIX 6 and later it is in libc) - OPAL_SEARCH_LIBS_COMPONENT([yp_all_nsl], [yp_all], [nsl bsd BSD sun], - [yp_all_nsl_happy="yes"], - [yp_all_nsl_happy="no"]) + # liblsf requires yp_all, yp_get_default_domain, and ypprot_err + # on Linux, Solaris, NEC, and Sony NEWSs these are found in libnsl + # on AIX it should be in libbsd + # on HP-UX it should be in libBSD + # on IRIX < 6 it should be in libsun (IRIX 6 and later it is in libc) + OPAL_SEARCH_LIBS_COMPONENT([yp_all_nsl], [yp_all], [nsl bsd BSD sun], + [yp_all_nsl_happy="yes"], + [yp_all_nsl_happy="no"]) - AS_IF([test "$yp_all_nsl_happy" = "no"], - [orte_check_lsf_happy="no"], - [orte_check_lsf_happy="yes"]) + AS_IF([test "$yp_all_nsl_happy" = "no"], + [orte_check_lsf_happy="no"], + [orte_check_lsf_happy="yes"]) - # liblsb requires liblsf - using ls_info as a test for liblsf presence - OPAL_CHECK_PACKAGE([ls_info_lsf], - [lsf/lsf.h], - [lsf], - [ls_info], - [$yp_all_nsl_LIBS], - [$orte_check_lsf_dir], - [$orte_check_lsf_libdir], - [ls_info_lsf_happy="yes"], - [ls_info_lsf_happy="no"]) + # liblsb requires liblsf - using ls_info as a test for liblsf presence + 
OPAL_CHECK_PACKAGE([ls_info_lsf], + [lsf/lsf.h], + [lsf], + [ls_info], + [$yp_all_nsl_LIBS], + [$orte_check_lsf_dir], + [$orte_check_lsf_libdir], + [ls_info_lsf_happy="yes"], + [ls_info_lsf_happy="no"]) - AS_IF([test "$ls_info_lsf_happy" = "no"], - [orte_check_lsf_happy="no"], - [orte_check_lsf_happy="yes"]) + AS_IF([test "$ls_info_lsf_happy" = "no"], + [orte_check_lsf_happy="no"], + [orte_check_lsf_happy="yes"]) - # test function of liblsb LSF package - AS_IF([test "$orte_check_lsf_happy" = "yes"], - [AC_MSG_CHECKING([for LSF dir]) - AC_MSG_RESULT([$orte_check_lsf_dir_msg]) - AC_MSG_CHECKING([for LSF library dir]) - AC_MSG_RESULT([$orte_check_lsf_libdir_msg]) - AC_MSG_CHECKING([for liblsf function]) - AC_MSG_RESULT([$ls_info_lsf_happy]) - AC_MSG_CHECKING([for liblsf yp requirements]) - AC_MSG_RESULT([$yp_all_nsl_happy]) - OPAL_CHECK_PACKAGE([orte_check_lsf], - [lsf/lsbatch.h], - [bat], - [lsb_launch], - [$ls_info_lsf_LIBS $yp_all_nsl_LIBS], - [$orte_check_lsf_dir], - [$orte_check_lsf_libdir], - [orte_check_lsf_happy="yes"], - [orte_check_lsf_happy="no"])]) + # test function of liblsb LSF package + AS_IF([test "$orte_check_lsf_happy" = "yes"], + [AC_MSG_CHECKING([for LSF dir]) + AC_MSG_RESULT([$orte_check_lsf_dir_msg]) + AC_MSG_CHECKING([for LSF library dir]) + AC_MSG_RESULT([$orte_check_lsf_libdir_msg]) + AC_MSG_CHECKING([for liblsf function]) + AC_MSG_RESULT([$ls_info_lsf_happy]) + AC_MSG_CHECKING([for liblsf yp requirements]) + AC_MSG_RESULT([$yp_all_nsl_happy]) + OPAL_CHECK_PACKAGE([orte_check_lsf], + [lsf/lsbatch.h], + [bat], + [lsb_launch], + [$ls_info_lsf_LIBS $yp_all_nsl_LIBS], + [$orte_check_lsf_dir], + [$orte_check_lsf_libdir], + [orte_check_lsf_happy="yes"], + [orte_check_lsf_happy="no"])]) - CPPFLAGS="$orte_check_lsf_$1_save_CPPFLAGS" - LDFLAGS="$orte_check_lsf_$1_save_LDFLAGS" - LIBS="$orte_check_lsf_$1_save_LIBS" + CPPFLAGS="$orte_check_lsf_$1_save_CPPFLAGS" + LDFLAGS="$orte_check_lsf_$1_save_LDFLAGS" + LIBS="$orte_check_lsf_$1_save_LIBS" - 
OPAL_SUMMARY_ADD([[Resource Managers]],[[LSF]],[$1],[$orte_check_lsf_happy]) - fi + ],[orte_check_lsf_happy=no]) + + OPAL_SUMMARY_ADD([[Resource Managers]],[[LSF]],[$1],[$orte_check_lsf_happy]) + ]) AS_IF([test "$orte_check_lsf_happy" = "yes"], [$1_LIBS="[$]$1_LIBS $orte_check_lsf_LIBS" - $1_LDFLAGS="[$]$1_LDFLAGS $orte_check_lsf_LDFLAGS" - $1_CPPFLAGS="[$]$1_CPPFLAGS $orte_check_lsf_CPPFLAGS" - # add the LSF libraries to static builds as they are required - $1_WRAPPER_EXTRA_LDFLAGS=[$]$1_LDFLAGS - $1_WRAPPER_EXTRA_LIBS=[$]$1_LIBS - $2], + $1_LDFLAGS="[$]$1_LDFLAGS $orte_check_lsf_LDFLAGS" + $1_CPPFLAGS="[$]$1_CPPFLAGS $orte_check_lsf_CPPFLAGS" + # add the LSF libraries to static builds as they are required + $1_WRAPPER_EXTRA_LDFLAGS=[$]$1_LDFLAGS + $1_WRAPPER_EXTRA_LIBS=[$]$1_LIBS + $2], [AS_IF([test ! -z "$with_lsf" && test "$with_lsf" != "no"], [AC_MSG_WARN([LSF support requested (via --with-lsf) but not found.]) AC_MSG_ERROR([Aborting.])]) diff --git a/config/orte_check_slurm.m4 b/config/orte_check_slurm.m4 index b59e5f5804b..ee5cd02cce7 100644 --- a/config/orte_check_slurm.m4 +++ b/config/orte_check_slurm.m4 @@ -13,6 +13,7 @@ # Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2016 Los Alamos National Security, LLC. All rights # reserved. +# Copyright (c) 2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -68,6 +69,15 @@ AC_DEFUN([ORTE_CHECK_SLURM],[ [orte_check_slurm_happy="yes"], [orte_check_slurm_happy="no"])]) + # check to see if this is a Cray nativized slurm env. 
+ + slurm_cray_env=0 + OPAL_CHECK_ALPS([orte_slurm_cray], + [slurm_cray_env=1]) + + AC_DEFINE_UNQUOTED([SLURM_CRAY_ENV],[$slurm_cray_env], + [defined to 1 if slurm cray env, 0 otherwise]) + OPAL_SUMMARY_ADD([[Resource Managers]],[[Slurm]],[$1],[$orte_check_slurm_happy]) fi diff --git a/config/orte_check_tm.m4 b/config/orte_check_tm.m4 index 3fa9ac69b75..285874857c2 100644 --- a/config/orte_check_tm.m4 +++ b/config/orte_check_tm.m4 @@ -11,7 +11,7 @@ dnl University of Stuttgart. All rights reserved. dnl Copyright (c) 2004-2005 The Regents of the University of California. dnl All rights reserved. dnl Copyright (c) 2006-2016 Cisco Systems, Inc. All rights reserved. -dnl Copyright (c) 2015 Research Organization for Information Science +dnl Copyright (c) 2015-2017 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. dnl Copyright (c) 2016 Los Alamos National Security, LLC. All rights dnl reserved. @@ -128,14 +128,21 @@ AC_DEFUN([ORTE_CHECK_TM],[ [$orte_check_tm_dir], [$orte_check_tm_libdir], [orte_check_tm_found="yes"], - [_OPAL_CHECK_PACKAGE_LIB([orte_check_tm], - [torque], - [tm_init], - [], - [$orte_check_tm_dir], - [$orte_check_tm_libdir], - [orte_check_tm_found="yes"], - [orte_check_tm_found="no"])])])]) + [_OPAL_CHECK_PACKAGE_LIB([orte_check_tm], + [pbs], + [tm_init], + [-lcrypto], + [$orte_check_tm_dir], + [$orte_check_tm_libdir], + [orte_check_tm_found="yes"], + [_OPAL_CHECK_PACKAGE_LIB([orte_check_tm], + [torque], + [tm_init], + [], + [$orte_check_tm_dir], + [$orte_check_tm_libdir], + [orte_check_tm_found="yes"], + [orte_check_tm_found="no"])])])])]) CPPFLAGS="$orte_check_package_$1_save_CPPFLAGS" LDFLAGS="$orte_check_package_$1_save_LDFLAGS" diff --git a/config/orte_config_files.m4 b/config/orte_config_files.m4 index 564ce0ca80b..90f69808c93 100644 --- a/config/orte_config_files.m4 +++ b/config/orte_config_files.m4 @@ -6,7 +6,7 @@ # Corporation. All rights reserved. 
# Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2015-2016 Intel, Inc. All rights reserved +# Copyright (c) 2015-2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -25,14 +25,12 @@ AC_DEFUN([ORTE_CONFIG_FILES],[ orte/tools/wrappers/Makefile orte/tools/wrappers/ortecc-wrapper-data.txt orte/tools/wrappers/orte.pc - orte/tools/orte-checkpoint/Makefile - orte/tools/orte-restart/Makefile orte/tools/orte-ps/Makefile orte/tools/orte-clean/Makefile orte/tools/orte-top/Makefile - orte/tools/orte-migrate/Makefile orte/tools/orte-info/Makefile orte/tools/orte-server/Makefile orte/tools/orte-dvm/Makefile + orte/tools/prun/Makefile ]) ]) diff --git a/config/oshmem_config_files.m4 b/config/oshmem_config_files.m4 index 74d80cd1637..779757dfdde 100644 --- a/config/oshmem_config_files.m4 +++ b/config/oshmem_config_files.m4 @@ -2,7 +2,9 @@ # # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. -# Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -23,6 +25,7 @@ AC_DEFUN([OSHMEM_CONFIG_FILES],[ oshmem/tools/oshmem_info/Makefile oshmem/tools/wrappers/Makefile oshmem/tools/wrappers/shmemcc-wrapper-data.txt + oshmem/tools/wrappers/shmemc++-wrapper-data.txt oshmem/tools/wrappers/shmemfort-wrapper-data.txt ]) ]) diff --git a/config/oshmem_configure_options.m4 b/config/oshmem_configure_options.m4 index 48ede73a544..886ab2bb072 100644 --- a/config/oshmem_configure_options.m4 +++ b/config/oshmem_configure_options.m4 @@ -6,6 +6,8 @@ dnl Copyright (c) 2013-2014 Cisco Systems, Inc. All rights reserved. dnl Copyright (c) 2014 Intel, Inc. 
All rights reserved dnl Copyright (c) 2014-2015 Research Organization for Information Science dnl and Technology (RIST). All rights reserved. +dnl Copyright (c) 2018 Amazon.com, Inc. or its affiliates. +dnl All Rights reserved. dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -25,28 +27,23 @@ AC_SUBST(OSHMEM_LIBSHMEM_EXTRA_LDFLAGS) AC_MSG_CHECKING([if want oshmem]) AC_ARG_ENABLE([oshmem], [AC_HELP_STRING([--enable-oshmem], - [Enable building the OpenSHMEM interface (available on Linux only, where it is enabled by default)])], - [oshmem_arg_given=yes], - [oshmem_arg_given=no]) -if test "$oshmem_arg_given" = "yes"; then - if test "$enable_oshmem" = "yes"; then - AC_MSG_RESULT([yes]) - if test "$opal_found_linux" != "yes"; then - AC_MSG_WARN([OpenSHMEM support was requested, but currently]) - AC_MSG_WARN([only supports Linux.]) - AC_MSG_ERROR([Cannot continue]) - fi - else - AC_MSG_RESULT([no]) - fi -else + [Enable building the OpenSHMEM interface (available on Linux only, where it is enabled by default)])]) +if test "$enable_oshmem" = "no"; then + AC_MSG_RESULT([no]) +elif test "$enable_oshmem" = ""; then if test "$opal_found_linux" = "yes"; then - enable_oshmem=yes AC_MSG_RESULT([yes]) else enable_oshmem=no AC_MSG_RESULT([not supported on this platform]) fi +else + AC_MSG_RESULT([yes]) + if test "$opal_found_linux" != "yes"; then + AC_MSG_WARN([OpenSHMEM support was requested, but currently]) + AC_MSG_WARN([only supports Linux.]) + AC_MSG_ERROR([Cannot continue]) + fi fi # @@ -56,7 +53,7 @@ AC_MSG_CHECKING([if want SGI/Quadrics compatibility mode]) AC_ARG_ENABLE(oshmem-compat, AC_HELP_STRING([--enable-oshmem-compat], [enable compatibility mode (default: enabled)])) -if test "$enable_oshmem" != "no" && test "$enable_oshmem_compat" != "no"; then +if test "$enable_oshmem_compat" != "no"; then AC_MSG_RESULT([yes]) OSHMEM_SPEC_COMPAT=1 else @@ -75,26 +72,21 @@ AC_MSG_CHECKING([if want OSHMEM API parameter checking]) AC_ARG_WITH(oshmem-param-check, 
AC_HELP_STRING([--with-oshmem-param-check(=VALUE)], [behavior of OSHMEM API function parameter checking. Valid values are: always, never. If --with-oshmem-param-check is specified with no VALUE argument, it is equivalent to a VALUE of "always"; --without-oshmem-param-check is equivalent to "never" (default: always).])) -if test "$enable_oshmem" != "no"; then - if test "$with_oshmem_param_check" = "no" || \ - test "$with_oshmem_param_check" = "never"; then - shmem_param_check=0 - AC_MSG_RESULT([never]) - elif test "$with_oshmem_param_check" = "yes" || \ - test "$with_oshmem_param_check" = "always" || \ - test -z "$with_oshmem_param_check"; then - shmem_param_check=1 - AC_MSG_RESULT([always]) - else - shmem_param_check=1 - AC_MSG_RESULT([unknown]) - AC_MSG_WARN([*** Unrecognized --with-oshmem-param-check value]) - AC_MSG_WARN([*** See "configure --help" output]) - AC_MSG_WARN([*** Defaulting to "always"]) - fi -else +if test "$with_oshmem_param_check" = "no" || \ + test "$with_oshmem_param_check" = "never"; then shmem_param_check=0 - AC_MSG_RESULT([no]) + AC_MSG_RESULT([never]) +elif test "$with_oshmem_param_check" = "yes" || \ + test "$with_oshmem_param_check" = "always" || \ + test -z "$with_oshmem_param_check"; then + shmem_param_check=1 + AC_MSG_RESULT([always]) +else + shmem_param_check=1 + AC_MSG_RESULT([unknown]) + AC_MSG_WARN([*** Unrecognized --with-oshmem-param-check value]) + AC_MSG_WARN([*** See "configure --help" output]) + AC_MSG_WARN([*** Defaulting to "always"]) fi AC_DEFINE_UNQUOTED(OSHMEM_PARAM_CHECK, $shmem_param_check, [Whether we want to check OSHMEM parameters always or never]) @@ -132,7 +124,7 @@ AC_MSG_CHECKING([if want to build OSHMEM fortran bindings]) AC_ARG_ENABLE(oshmem-fortran, AC_HELP_STRING([--enable-oshmem-fortran], [enable OSHMEM Fortran bindings (default: enabled if Fortran compiler found)])) -if test "$enable_oshmem" != "no" && test "$enable_oshmem_fortran" != "no"; then +if test "$enable_oshmem_fortran" != "no"; then # If no OMPI 
FORTRAN, bail AS_IF([test $OMPI_TRY_FORTRAN_BINDINGS -eq $OMPI_FORTRAN_NO_BINDINGS && \ test "$enable_oshmem_fortran" = "yes"], @@ -153,7 +145,7 @@ fi # # We can't set am_conditional here since it's yet unknown if there is -# valid Fortran compiler avaliable +# valid Fortran compiler available # ]) dnl diff --git a/config/pkg.m4 b/config/pkg.m4 index b0bab42dfa9..91582341c13 100644 --- a/config/pkg.m4 +++ b/config/pkg.m4 @@ -53,7 +53,7 @@ fi[]dnl # to PKG_CHECK_MODULES(), but does not set variables or print errors. # # Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG]) -# only at the first occurence in configure.ac, so if the first place +# only at the first occurrence in configure.ac, so if the first place # it's called might be skipped (such as if it is within an "if", you # have to call PKG_CHECK_EXISTS manually # -------------------------------------------------------------- diff --git a/configure.ac b/configure.ac index 9667a3b3a84..511e0100a6c 100644 --- a/configure.ac +++ b/configure.ac @@ -10,9 +10,9 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2006-2016 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2006-2018 Cisco Systems, Inc. All rights reserved # Copyright (c) 2006-2008 Sun Microsystems, Inc. All rights reserved. -# Copyright (c) 2006-2011 Los Alamos National Security, LLC. All rights +# Copyright (c) 2006-2017 Los Alamos National Security, LLC. All rights # reserved. # Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. # Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved. @@ -20,9 +20,11 @@ # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. # Copyright (c) 2013-2017 Intel, Inc. All rights reserved. 
-# Copyright (c) 2014-2016 Research Organization for Information Science +# Copyright (c) 2014-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. -# Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2016-2017 IBM Corporation. All rights reserved. +# Copyright (c) 2018 Amazon.com, Inc. or its affiliates. +# All Rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -84,6 +86,7 @@ AS_IF([test "$host" != "$target"], [AC_MSG_WARN([Cross-compile detected]) AC_MSG_WARN([Cross-compiling is only partially supported]) AC_MSG_WARN([Proceed at your own risk!])]) + # AC_USE_SYSTEM_EXTENSIONS alters CFLAGS (e.g., adds -g -O2) OPAL_VAR_SCOPE_PUSH([CFLAGS_save]) CFLAGS_save=$CFLAGS @@ -151,13 +154,14 @@ AC_SUBST(libopen_pal_so_version) # transparently by adding some intelligence in autogen.pl # and/or opal_mca.m4, but I don't have the cycles to do this # right now. -AC_SUBST(libmca_opal_common_libfabric_so_version) +AC_SUBST(libmca_opal_common_ofi_so_version) AC_SUBST(libmca_opal_common_cuda_so_version) AC_SUBST(libmca_opal_common_sm_so_version) AC_SUBST(libmca_opal_common_ugni_so_version) AC_SUBST(libmca_opal_common_verbs_so_version) AC_SUBST(libmca_orte_common_alps_so_version) AC_SUBST(libmca_ompi_common_ompio_so_version) +AC_SUBST(libmca_ompi_common_monitoring_so_version) # # Get the versions of the autotools that were used to bootstrap us @@ -264,15 +268,12 @@ m4_ifdef([project_oshmem], [OSHMEM_CONFIGURE_OPTIONS]) # Set up project specific AM_CONDITIONALs AS_IF([test "$enable_ompi" != "no"], [project_ompi_amc=true], [project_ompi_amc=false]) m4_ifndef([project_ompi], [project_ompi_amc=false]) -AM_CONDITIONAL([PROJECT_OMPI], [test "$project_ompi_amc" = "true"]) AS_IF([test "$enable_orte" != "no"], [project_orte_amc=true], [project_orte_amc=false]) m4_ifndef([project_orte], [project_orte_amc=false]) -AM_CONDITIONAL([PROJECT_ORTE], [test "$project_orte_amc" = "true"]) -AS_IF([test 
"$enable_oshmem" != "no"], [project_oshmem_amc=true], [project_oshmem_amc=false]) -m4_ifndef([project_oshmem], [project_oshmem_amc=false]) -AM_CONDITIONAL([PROJECT_OSHMEM], [test "$project_oshmem_amc" = "true"]) +AS_IF([test "$enable_oshmem" != "no"], [project_oshmem_amc=true], [project_oshmem_amc="no (disabled)"]) +m4_ifndef([project_oshmem], [project_oshmem_amc="no (not available)"]) if test "$enable_binaries" = "no" && test "$enable_dist" = "yes"; then AC_MSG_WARN([--disable-binaries is incompatible with --enable dist]) @@ -420,11 +421,18 @@ if test "$ac_cv_type_ssize_t" = yes ; then fi if test "$ac_cv_type_ptrdiff_t" = yes; then AC_CHECK_SIZEOF(ptrdiff_t) +else + AC_MSG_ERROR([ptrdiff_t type is not available, this is required by C99 standard. Cannot continue]) fi AC_CHECK_SIZEOF(wchar_t) AC_CHECK_SIZEOF(pid_t) +# Check sizes of atomic types so we can define fixed-width types in OPAL +AC_CHECK_SIZEOF(atomic_short, [],[[#include ]]) +AC_CHECK_SIZEOF(atomic_int,[],[[#include ]]) +AC_CHECK_SIZEOF(atomic_long,[],[[#include ]]) +AC_CHECK_SIZEOF(atomic_llong,[],[[#include ]]) # # Check for type alignments @@ -586,7 +594,7 @@ AC_CACHE_SAVE opal_show_title "Header file tests" AC_CHECK_HEADERS([alloca.h aio.h arpa/inet.h dirent.h \ - dlfcn.h execinfo.h err.h fcntl.h grp.h libgen.h \ + dlfcn.h endian.h execinfo.h err.h fcntl.h grp.h libgen.h \ libutil.h memory.h netdb.h netinet/in.h netinet/tcp.h \ poll.h pthread.h pty.h pwd.h sched.h \ strings.h stropts.h linux/ethtool.h linux/sockios.h \ @@ -597,7 +605,7 @@ AC_CHECK_HEADERS([alloca.h aio.h arpa/inet.h dirent.h \ sys/types.h sys/uio.h sys/un.h net/uio.h sys/utsname.h sys/vfs.h sys/wait.h syslog.h \ termios.h ulimit.h unistd.h util.h utmp.h malloc.h \ ifaddrs.h crt_externs.h regex.h mntent.h paths.h \ - ioLib.h sockLib.h hostLib.h shlwapi.h sys/synch.h db.h ndbm.h zlib.h]) + ioLib.h sockLib.h hostLib.h shlwapi.h sys/synch.h db.h ndbm.h zlib.h ieee754.h]) AC_CHECK_HEADERS([sys/mount.h], [], [], [AC_INCLUDES_DEFAULT @@ 
-782,27 +790,6 @@ AC_INCLUDES_DEFAULT #endif ]) -# -# Check for ptrdiff type. Yes, there are platforms where -# sizeof(void*) != sizeof(long) (64 bit Windows, apparently). -# -AC_MSG_CHECKING([for pointer diff type]) -if test $ac_cv_type_ptrdiff_t = yes ; then - opal_ptrdiff_t="ptrdiff_t" - opal_ptrdiff_size=$ac_cv_sizeof_ptrdiff_t -elif test $ac_cv_sizeof_void_p -eq $ac_cv_sizeof_long ; then - opal_ptrdiff_t="long" - opal_ptrdiff_size=$ac_cv_sizeof_long -elif test $ac_cv_type_long_long = yes && test $ac_cv_sizeof_void_p -eq $ac_cv_sizeof_long_long ; then - opal_ptrdiff_t="long long" - opal_ptrdiff_size=$ac_cv_sizeof_long_long -else - AC_MSG_ERROR([Could not find datatype to emulate ptrdiff_t. Cannot continue]) -fi -AC_DEFINE_UNQUOTED([OPAL_PTRDIFF_TYPE], [$opal_ptrdiff_t], - [type to use for ptrdiff_t]) -AC_MSG_RESULT([$opal_ptrdiff_t (size: $opal_ptrdiff_size)]) - # # Find corresponding types for MPI_Aint, MPI_Count, and MPI_Offset. # And if relevant, find the corresponding MPI_ADDRESS_KIND, @@ -1096,7 +1083,7 @@ AC_CACHE_SAVE # visible again # ########################################################### -OPAL_SETUP_FT_OPTIONS +dnl OPAL_SETUP_FT_OPTIONS ########################################################### # The following line is always required as it contains the # AC_DEFINE and AM_CONDITIONAL calls that set variables used @@ -1119,6 +1106,23 @@ OPAL_MCA m4_ifdef([project_ompi], [OMPI_REQUIRE_ENDPOINT_TAG_FINI]) +# Last minute disable of OpenSHMEM if we didn't find any oshmem SPMLs +if test "$project_oshmem_amc" = "true" && test $OSHMEM_FOUND_WORKING_SPML -eq 0 ; then + # We don't have an spml that will work, so oshmem wouldn't be able + # to run an application. Therefore, don't build the oshmem layer. + if test "$enable_oshmem" != "no" && test -n "$enable_oshmem"; then + AC_MSG_WARN([No spml found, so OpenSHMEM layer will be non functional.]) + AC_MSG_ERROR([Aborting because OpenSHMEM requested, but can not build.]) + else + AC_MSG_WARN([No spml found. 
Will not build OpenSHMEM layer.]) + project_oshmem_amc="false (no spml)" + # now for the hard part, remove project from list that will + # run. This is a hack, but it works as long as the project + # remains named "oshmem". + MCA_PROJECT_SUBDIRS=`echo "$MCA_PROJECT_SUBDIRS" | sed -e 's/oshmem//'` + fi +fi + # checkpoint results AC_CACHE_SAVE @@ -1356,6 +1360,14 @@ m4_ifdef([project_ompi], # Party on ############################################################################ +# set projects good/no good AM_CONDITIONALS. This is at the end so +# that the OSHMEM/OMPI projects can be disabled, if needed, based on +# MCA tests. If a project is to be disabled, also remove it from +# MCA_PROJECT_SUBDIRS to actually disable building. +AM_CONDITIONAL([PROJECT_OMPI], [test "$project_ompi_amc" = "true"]) +AM_CONDITIONAL([PROJECT_ORTE], [test "$project_orte_amc" = "true"]) +AM_CONDITIONAL([PROJECT_OSHMEM], [test "$project_oshmem_amc" = "true"]) + AC_MSG_CHECKING([if libtool needs -no-undefined flag to build shared libraries]) case "`uname`" in CYGWIN*|MINGW*|AIX*) @@ -1423,10 +1435,12 @@ AC_CONFIG_FILES([ test/datatype/Makefile test/dss/Makefile test/class/Makefile + test/mpool/Makefile test/support/Makefile test/threads/Makefile test/util/Makefile ]) + m4_ifdef([project_ompi], [AC_CONFIG_FILES([test/monitoring/Makefile])]) AC_CONFIG_FILES([contrib/dist/mofed/debian/rules], diff --git a/contrib/Makefile.am b/contrib/Makefile.am index cd67ee608f7..bf78f975ad5 100644 --- a/contrib/Makefile.am +++ b/contrib/Makefile.am @@ -9,12 +9,14 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2010 IBM Corporation. All rights reserved. # Copyright (c) 2010-2011 Oak Ridge National Labs. All rights reserved. 
-# Copyright (c) 2013-2016 Los Alamos National Security, Inc. All rights +# Copyright (c) 2013-2018 Los Alamos National Security, Inc. All rights # reserved. # Copyright (c) 2013 Intel Corporation. All rights reserved. +# Copyright (c) 2017 Amazon.com, Inc. or its affiliates. +# All Rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -35,11 +37,10 @@ EXTRA_DIST = \ completion/mpirun.sh \ completion/mpirun.zsh \ dist/make_dist_tarball \ + dist/make-authors.pl \ dist/linux/openmpi.spec \ dist/mofed/compile_debian_mlnx_example.in \ dist/mofed/debian \ - dist/macosx-pkg/buildpackage.sh \ - dist/macosx-pkg/ReadMe.rtf \ platform/optimized \ platform/redstorm \ platform/cray_xt3 \ @@ -66,17 +67,21 @@ EXTRA_DIST = \ platform/lanl/cray_xc_cle5.2/optimized-common \ platform/lanl/cray_xc_cle5.2/optimized-lustre \ platform/lanl/cray_xc_cle5.2/optimized-lustre.conf \ - platform/lanl/toss/debug-common \ - platform/lanl/toss/debug \ - platform/lanl/toss/debug.conf \ - platform/lanl/toss/debug-mlx \ - platform/lanl/toss/debug-mlx.conf \ - platform/lanl/toss/optimized-common \ - platform/lanl/toss/optimized \ - platform/lanl/toss/optimized.conf \ - platform/lanl/toss/optimized-mlx \ - platform/lanl/toss/optimized-mlx.conf \ - platform/lanl/toss/toss-common \ + platform/lanl/toss/README \ + platform/lanl/toss/common \ + platform/lanl/toss/common-optimized \ + platform/lanl/toss/cray-lustre-optimized \ + platform/lanl/toss/cray-lustre-optimized.conf \ + platform/lanl/toss/toss2-mlx-optimized \ + platform/lanl/toss/toss2-mlx-optimized.conf \ + platform/lanl/toss/toss2-qib-optimized \ + platform/lanl/toss/toss2-qib-optimized.conf \ + platform/lanl/toss/toss3-hfi-optimized \ + platform/lanl/toss/toss3-hfi-optimized.conf \ + platform/lanl/toss/toss3-mlx-optimized \ + platform/lanl/toss/toss3-mlx-optimized.conf \ + platform/lanl/toss/toss3-wc-optimized \ + platform/lanl/toss/toss3-wc-optimized.conf \ platform/lanl/darwin/darwin-common \ platform/lanl/darwin/debug-common 
\ platform/lanl/darwin/optimized-common \ diff --git a/contrib/build-server/hwloc-nightly-coverity.pl b/contrib/build-server/hwloc-nightly-coverity.pl deleted file mode 100755 index f2e1f1c4b84..00000000000 --- a/contrib/build-server/hwloc-nightly-coverity.pl +++ /dev/null @@ -1,167 +0,0 @@ -#!/usr/bin/env perl - -use warnings; -use strict; - -use Getopt::Long; -use File::Temp qw/ tempfile tempdir /; -use File::Basename; - -my $coverity_project = "hwloc"; -# Coverity changes this URL periodically -my $coverity_tool_url = "https://scan.coverity.com/download/cxx/linux64"; - -my $filename_arg; -my $coverity_token_arg; -my $dry_run_arg = 0; -my $verbose_arg = 0; -my $debug_arg = 0; -my $logfile_dir_arg; -my $configure_args = ""; -my $make_args = "-j 32"; -my $help_arg = 0; - -&Getopt::Long::Configure("bundling"); -my $ok = Getopt::Long::GetOptions("filename=s" => \$filename_arg, - "coverity-token=s" => \$coverity_token_arg, - "logfile-dir=s" => \$logfile_dir_arg, - "configure-args=s" => \$configure_args, - "make-args=s" => \$make_args, - "dry-run!" => \$dry_run_arg, - "verbose!" => \$verbose_arg, - "debug!" => \$debug_arg, - "help|h" => \$help_arg); - -$ok = 0 - if (!defined($filename_arg)); -$ok = 0 - if (!defined($coverity_token_arg)); -if (!$ok || $help_arg) { - print "Usage: $0 --filename=FILENAME --coverity-token=TOKEN [--dry-run] [--verbose] [--help]\n"; - exit($ok); -} - -die "Cannot read $filename_arg" - if (! 
-r $filename_arg); - -$verbose_arg = 1 - if ($debug_arg); - -###################################################################### - -sub verbose { - print @_ - if ($verbose_arg); -} - -# run a command and save the stdout / stderr -sub safe_system { - my $allowed_to_fail = shift; - my $cmd = shift; - my $stdout_file = shift; - - # Redirect stdout if requested or not verbose - if (defined($stdout_file)) { - $stdout_file = "$logfile_dir_arg/$stdout_file"; - unlink($stdout_file); - $cmd .= " >$stdout_file"; - } elsif (!$debug_arg) { - $cmd .= " >/dev/null"; - } - $cmd .= " 2>&1"; - - my $rc = system($cmd); - if (0 != $rc && !$allowed_to_fail) { - # If we die/fail, ensure to change out of the temp tree so - # that it can be removed upon exit. - chdir("/"); - print "Command $cmd failed: exit status $rc\n"; - if (defined($stdout_file) && -f $stdout_file) { - print "Last command output:\n"; - system("cat $stdout_file"); - } - die "Cannot continue"; - } - system("cat $stdout_file") - if ($debug_arg && defined($stdout_file) && -f $stdout_file); -} - -###################################################################### - -# Make an area to work - -my $dir = tempdir(CLEANUP => 0); -chdir($dir); -verbose "*** Working in $dir\n"; - -###################################################################### - -# Get the coverity tool, put it in our path - -my $cdir = "$ENV{HOME}/coverity"; -safe_system(0, "mkdir $cdir") - if (! -d $cdir); - -# Optimization: the tool is pretty large. If our local copy is less -# than a day old, just use that without re-downloading. 
-my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, - $atime,$mtime,$ctime,$blksize,$blocks) = - stat("$cdir/coverity_tool.tgz"); -my $now = time(); -if (!defined($mtime) || $mtime < $now - 24*60*60) { - verbose "*** Downloading new copy of the coverity tool\n"; - safe_system(0, "wget $coverity_tool_url --post-data \"token=$coverity_token_arg&project=$coverity_project\" -O coverity_tool.tgz"); - safe_system(0, "cp coverity_tool.tgz $cdir"); -} - -verbose "*** Expanding coverity tool tarball\n"; -safe_system(0, "tar xf $cdir/coverity_tool.tgz"); -opendir(my $dh, ".") || - die "Can't opendir ."; -my @files = grep { /^cov/ && -d "./$_" } readdir($dh); -closedir($dh); - -my $cov_dir = "$dir/$files[0]/bin"; -$ENV{PATH} = "$cov_dir:$ENV{PATH}"; - -###################################################################### - -# Expand the HWLOC tarball, build it - -verbose "*** Extracting HWLOC tarball\n"; -safe_system(0, "tar xf $filename_arg"); -my $tarball_filename = basename($filename_arg); -$tarball_filename =~ m/^hwloc-(.+)\.tar.+$/; -my $hwloc_ver = $1; -chdir("hwloc-$hwloc_ver"); - -verbose "*** Configuring HWLOC tarball\n"; -safe_system(0, "./configure $configure_args", "configure"); - -verbose "*** Building HWLOC tarball\n"; -safe_system(0, "cov-build --dir cov-int make $make_args", "cov-build"); - -# Tar up the Coverity results -verbose "*** Tarring up results\n"; -safe_system(0, "tar jcf $hwloc_ver-analyzed.tar.bz2 cov-int"); - -# If not dry-run, submit to Coverity -if ($dry_run_arg) { - verbose "*** Would have submitted, but this is a dry run\n"; -} else { - verbose "*** Submitting results\n"; - safe_system(0, "curl --form token=$coverity_token_arg " . - "--form email=brice.goglin\@labri.fr " . - "--form file=\@$hwloc_ver-analyzed.tar.bz2 " . - "--form version=$hwloc_ver " . - "--form description=nightly-master " . 
- "https://scan.coverity.com/builds?project=hwloc", - "coverity-submit"); -} - -verbose("*** All done\n"); - -# Chdir out of the tempdir so that it can be removed -chdir("/"); - -exit(0); diff --git a/contrib/build-server/hwloc-nightly-tarball.sh b/contrib/build-server/hwloc-nightly-tarball.sh deleted file mode 100755 index 2d3d7891ad0..00000000000 --- a/contrib/build-server/hwloc-nightly-tarball.sh +++ /dev/null @@ -1,186 +0,0 @@ -#!/bin/sh - -##### -# -# Configuration options -# -##### - -# e-mail address to send results to -results_addr=hwloc-devel@lists.open-mpi.org -#results_addr=rhc@open-mpi.org - -# git repository URL -code_uri=https://github.com/open-mpi/hwloc.git -raw_uri=https://raw.github.com/open-mpi/hwloc - -# where to put built tarballs -outputroot=$HOME/hwloc/nightly - -# Target where to scp the final tarballs -output_ssh_target=ompiteam@192.185.39.252 - -# where to find the build script -script_uri=contrib/nightly/make_snapshot_tarball - -# helper scripts dir -script_dir=$HOME/ompi/contrib/build-server - -# Set this to any value for additional output; typically only when -# debugging -: ${debug:=} - -# The tarballs to make -if [ $# -eq 0 ] ; then - # Branches v1.6 and earlier were not updated to build nightly - # snapshots from git, so only check v1.7 and later - branches="master v1.11" -else - branches=$@ -fi - -# Build root - scratch space -build_root=$HOME/hwloc/nightly-tarball-build-root - -# Coverity stuff -coverity_token=`cat $HOME/coverity/hwloc-token.txt` - -export PATH=$HOME_PREFIX/bin:$PATH -export LD_LIBRARY_PATH=$HOME_PREFIX/lib:$LD_LIBRARY_PATH - -##### -# -# Actually do stuff -# -##### - -debug() { - if test -n "$debug"; then - echo "=== DEBUG: $*" - fi -} - -run_command() { - debug "Running command: $*" - debug "Running in pwd: `pwd`" - if test -n "$debug"; then - eval $* - else - eval $* > /dev/null 2>&1 - fi - - if test $? -ne 0; then - echo "=== Command failed: $*" - fi -} - -# load the modules configuration -. 
$MODULE_INIT -module use $AUTOTOOL_MODULE - -# get our nightly build script -mkdir -p $build_root -cd $build_root - -pending_coverity=$build_root/tarballs-to-run-through-coverity.txt -rm -f $pending_coverity -touch $pending_coverity - -# Loop making them -module unload autotools -for branch in $branches; do - echo "=== Branch: $branch" - # Get the last tarball version that was made - prev_snapshot=`cat $outputroot/$branch/latest_snapshot.txt` - prev_snapshot_hash=`echo $prev_snapshot | cut -d- -f3` - echo "=== Previous snapshot: $prev_snapshot (hash: $prev_snapshot_hash)" - - # Form a URL-specific script name - script=$branch-`basename $script_uri` - - echo "=== Getting script from: $raw_uri" - run_command wget --quiet --no-check-certificate --tries=10 $raw_uri/$branch/$script_uri -O $script - if test ! $? -eq 0 ; then - echo "wget of hwloc nightly tarball create script failed." - if test -f $script ; then - echo "Using older version of $script for this run." - else - echo "No build script available. Aborting." - exit 1 - fi - fi - chmod +x $script - - module load "autotools/hwloc-$branch" - # module load "tex-live/hwloc-$branch" - - echo "=== Running script..." - run_command ./$script \ - $build_root/$branch \ - $results_addr \ - $outputroot/$branch \ - $code_uri \ - $branch - - module unload autotools - echo "=== Done running script" - - # Did the script generate a new tarball? Ensure to compare the - # only the hash of the previous tarball and the hash of the new - # tarball (the filename also contains the date/timestamp, which - # will always be different). - - # If so, save it so that we can spawn the coverity checker on it - # afterwards. Only for this for the master (for now). 
- latest_snapshot=`cat $outputroot/$branch/latest_snapshot.txt` - latest_snapshot_hash=`echo $latest_snapshot | cut -d- -f3` - echo "=== Latest snapshot: $latest_snapshot (hash: $latest_snapshot_hash)" - if test "$prev_snapshot_hash" = "$latest_snapshot_hash"; then - echo "=== Hash has not changed; no need to upload/save the new tarball" - else - if test "$branch" = "master"; then - echo "=== Saving output for a Coverity run" - echo "$outputroot/$branch/hwloc-$latest_snapshot.tar.bz2" >> $pending_coverity - else - echo "=== NOT saving output for a Coverity run" - fi - echo "=== Posting tarball to open-mpi.org" - # tell the web server to cleanup old nightly tarballs - run_command ssh -p 2222 \ - $output_ssh_target \ - \"git/ompi/contrib/build-server/remove-old.pl 7 public_html/software/hwloc/nightly/$branch\" - # upload the new ones - run_command scp -P 2222 \ - $outputroot/$branch/hwloc-$latest_snapshot.tar.* \ - $output_ssh_target:public_html/software/hwloc/nightly/$branch/ - run_command scp -P 2222 \ - $outputroot/$branch/latest_snapshot.txt \ - $output_ssh_target:public_html/software/hwloc/nightly/$branch/ - # direct the web server to regenerate the checksums - run_command ssh -p 2222 \ - $output_ssh_target \ - \"cd public_html/software/hwloc/nightly/$branch \&\& md5sum hwloc\* \> md5sums.txt\" - run_command ssh -p 2222 \ - $output_ssh_target \ - \"cd public_html/software/hwloc/nightly/$branch \&\& sha1sum hwloc\* \> sha1sums.txt\" - fi - - # Failed builds are not removed. But if a human forgets to come - # in here and clean up the old failed builds, we can accumulate - # many over time. So remove any old failed builds that are over - # 4 weeks old. 
- run_command ${script_dir}/remove-old.pl 7 $build_root/$branch - -done - -# If we had any new snapshots to send to coverity, process them now - -for tarball in `cat $pending_coverity`; do - run_command ${script_dir}/hwloc-nightly-coverity.pl \ - --filename=$tarball \ - --coverity-token=$coverity_token \ - --verbose \ - --logfile-dir=$HOME/coverity \ - --make-args=\"-j8\" -done -rm -f $pending_coverity diff --git a/contrib/build-server/openmpi-nightly-coverity.pl b/contrib/build-server/openmpi-nightly-coverity.pl deleted file mode 100755 index 12390ab22d0..00000000000 --- a/contrib/build-server/openmpi-nightly-coverity.pl +++ /dev/null @@ -1,162 +0,0 @@ -#!/usr/bin/env perl - -use warnings; -use strict; - -use Getopt::Long; -use File::Temp qw/ tempfile tempdir /; -use File::Basename; - -my $coverity_project = "Open+MPI"; -# Coverity changes this URL periodically -my $coverity_tool_url = "https://scan.coverity.com/download/cxx/linux64"; - -my $filename_arg; -my $coverity_token_arg; -my $dry_run_arg = 0; -my $verbose_arg = 0; -my $debug_arg = 0; -my $logfile_dir_arg = "/tmp"; -my $configure_args = ""; -my $make_args = "-j 32"; -my $help_arg = 0; - -&Getopt::Long::Configure("bundling"); -my $ok = Getopt::Long::GetOptions("filename=s" => \$filename_arg, - "coverity-token=s" => \$coverity_token_arg, - "logfile-dir=s" => \$logfile_dir_arg, - "configure-args=s" => \$configure_args, - "make-args=s" => \$make_args, - "dry-run!" => \$dry_run_arg, - "verbose!" => \$verbose_arg, - "debug!" => \$debug_arg, - "help|h" => \$help_arg); - -$ok = 0 - if (!defined($filename_arg)); -$ok = 0 - if (!defined($coverity_token_arg)); -if (!$ok || $help_arg) { - print "Usage: $0 --filename=FILENAME --coverity-token=TOKEN [--dry-run] [--verbose] [--help]\n"; - exit($ok); -} - -die "Cannot read $filename_arg" - if (! 
-r $filename_arg); - -$verbose_arg = 1 - if ($debug_arg); - -###################################################################### - -sub verbose { - print @_ - if ($verbose_arg); -} - -# run a command and save the stdout / stderr -sub safe_system { - my $allowed_to_fail = shift; - my $cmd = shift; - my $stdout_file = shift; - - # Redirect stdout if requested or not verbose - if (defined($stdout_file)) { - $stdout_file = "$logfile_dir_arg/$stdout_file"; - unlink($stdout_file); - $cmd .= " >$stdout_file"; - } elsif (!$debug_arg) { - $cmd .= " >/dev/null"; - } - $cmd .= " 2>&1"; - - my $rc = system($cmd); - if (0 != $rc && !$allowed_to_fail) { - # If we die/fail, ensure to change out of the temp tree so - # that it can be removed upon exit. - chdir("/"); - die "Command $cmd failed: exit status $rc"; - } - system("cat $stdout_file") - if ($debug_arg && defined($stdout_file) && -f $stdout_file); -} - -###################################################################### - -# Make an area to work - -my $dir = tempdir(CLEANUP => 1); -chdir($dir); -verbose "*** Working in $dir\n"; - -###################################################################### - -# Get the coverity tool, put it in our path - -my $cdir = "$ENV{HOME}/coverity"; -safe_system(0, "mkdir $cdir") - if (! -d $cdir); - -# Optimization: the tool is pretty large. If our local copy is less -# than a day old, just use that without re-downloading. 
-my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, - $atime,$mtime,$ctime,$blksize,$blocks) = - stat("$cdir/coverity_tool.tgz"); -my $now = time(); -if (!defined($mtime) || $mtime < $now - 24*60*60) { - verbose "*** Downloading new copy of the coverity tool\n"; - safe_system(0, "wget $coverity_tool_url --post-data \"token=$coverity_token_arg&project=$coverity_project\" -O coverity_tool.tgz"); - safe_system(0, "cp coverity_tool.tgz $cdir"); -} - -verbose "*** Expanding coverity tool tarball\n"; -safe_system(0, "tar xf $cdir/coverity_tool.tgz"); -opendir(my $dh, ".") || - die "Can't opendir ."; -my @files = grep { /^cov/ && -d "./$_" } readdir($dh); -closedir($dh); - -my $cov_dir = "$dir/$files[0]/bin"; -$ENV{PATH} = "$cov_dir:$ENV{PATH}"; - -###################################################################### - -# Expand the OMPI tarball, build it - -verbose "*** Extracting OMPI tarball\n"; -safe_system(0, "tar xf $filename_arg"); -my $tarball_filename = basename($filename_arg); -$tarball_filename =~ m/^openmpi-(.+)\.tar.+$/; -my $ompi_ver = $1; -chdir("openmpi-$ompi_ver"); - -verbose "*** Configuring OMPI tarball\n"; -safe_system(0, "./configure $configure_args", "configure"); - -verbose "*** Building OMPI tarball\n"; -safe_system(0, "cov-build --dir cov-int make $make_args", "cov-build"); - -# Tar up the Coverity results -verbose "*** Tarring up results\n"; -safe_system(0, "tar jcf $ompi_ver-analyzed.tar.bz2 cov-int"); - -# If not dry-run, submit to Coverity -if ($dry_run_arg) { - verbose "*** Would have submitted, but this is a dry run\n"; -} else { - verbose "*** Submitting results\n"; - safe_system(0, "curl --form token=$coverity_token_arg " . - "--form email=jsquyres\@cisco.com " . - "--form file=\@$ompi_ver-analyzed.tar.bz2 " . - "--form version=$ompi_ver " . - "--form description=nightly-master " . 
- "https://scan.coverity.com/builds?project=$coverity_project", - "coverity-submit"); -} - -verbose("*** All done\n"); - -# Chdir out of the tempdir so that it can be removed -chdir("/"); - -exit(0); diff --git a/contrib/build-server/openmpi-nightly-tarball.sh b/contrib/build-server/openmpi-nightly-tarball.sh deleted file mode 100755 index 500d097d357..00000000000 --- a/contrib/build-server/openmpi-nightly-tarball.sh +++ /dev/null @@ -1,192 +0,0 @@ -#!/bin/sh - -##### -# -# Configuration options -# -##### - -# e-mail address to send results to -results_addr=testing@lists.open-mpi.org -#results_addr=rhc@open-mpi.org - -# Set this to any value for additional output; typically only when -# debugging -: ${debug:=} - -# svn repository uri -master_code_uri=https://github.com/open-mpi/ompi.git -master_raw_uri=https://raw.github.com/open-mpi/ompi - -# where to put built tarballs - needs to be -# adjusted to match your site! -outputroot=$HOME/openmpi/nightly - -# Target where to scp the final tarballs -output_ssh_target=ompiteam@192.185.39.252 - -# where to find the build script -script_uri=contrib/nightly/create_tarball.sh - -# helper scripts dir -script_dir=$HOME/ompi/contrib/build-server - -# The tarballs to make -if [ $# -eq 0 ] ; then - # We're no longer ever checking the 1.0 - 1.8 branches anymore - branches="master v1.10 v2.x v2.0.x" -else - branches=$@ -fi - -# Build root - scratch space -build_root=$HOME/openmpi/nightly-tarball-build-root - -# Coverity stuff -coverity_token=`cat $HOME/coverity/openmpi-token.txt` -coverity_configure_args="--enable-debug --enable-mpi-fortran --enable-mpi-cxx --enable-mpi-java --enable-oshmem --enable-oshmem-fortran --with-usnic --with-libfabric=/mnt/data/local-installs" - -export PATH=$HOME_PREFIX/bin:$PATH -export LD_LIBRARY_PATH=$HOME_PREFIX/lib:$LD_LIBRARY_PATH - -##### -# -# Actually do stuff -# -##### - -debug() { - if test -n "$debug"; then - echo "=== DEBUG: $*" - fi -} - -run_command() { - debug "Running command: $*" - debug 
"Running in pwd: `pwd`" - if test -n "$debug"; then - eval $* - else - eval $* > /dev/null 2>&1 - fi - - if test $? -ne 0; then - echo "=== Command failed: $*" - fi -} - -# load the modules configuration -. $MODULE_INIT -module use $AUTOTOOL_MODULE - -# get our nightly build script -mkdir -p $build_root -cd $build_root - -pending_coverity=$build_root/tarballs-to-run-through-coverity.txt -rm -f $pending_coverity -touch $pending_coverity - -# Loop making the tarballs -module unload autotools -for branch in $branches; do - echo "=== Branch: $branch" - # Get the last tarball version that was made - prev_snapshot=`cat $outputroot/$branch/latest_snapshot.txt` - prev_snapshot_hash=`echo $prev_snapshot | cut -d- -f3` - echo "=== Previous snapshot: $prev_snapshot (hash: $prev_snapshot_hash)" - - code_uri=$master_code_uri - raw_uri=$master_raw_uri - - # Form a URL-specific script name - script=$branch-`basename $script_uri` - - echo "=== Getting script from: $raw_uri" - run_command wget --quiet --no-check-certificate --tries=10 $raw_uri/$branch/$script_uri -O $script - if test ! $? -eq 0 ; then - echo "wget of OMPI nightly tarball create script failed." - if test -f $script ; then - echo "Using older version of $script for this run." - else - echo "No build script available. Aborting." - exit 1 - fi - fi - chmod +x $script - - module load "autotools/ompi-$branch" - - echo "=== Running script..." - run_command eval ./$script \ - $build_root/$branch \ - $results_addr \ - $outputroot/$branch \ - $code_uri \ - $branch - - module unload autotools - echo "=== Done running script" - - # Did the script generate a new tarball? Ensure to compare the - # only the hash of the previous tarball and the hash of the new - # tarball (the filename also contains the date/timestamp, which - # will always be different). - - # If so, save it so that we can spawn the coverity checker on it - # afterwards. Only for this for the master (for now). 
- latest_snapshot=`cat $outputroot/$branch/latest_snapshot.txt` - latest_snapshot_hash=`echo $latest_snapshot | cut -d- -f3` - echo "=== Latest snapshot: $latest_snapshot (hash: $latest_snapshot_hash)" - if test "$prev_snapshot_hash" = "$latest_snapshot_hash"; then - echo "=== Hash has not changed; no need to upload/save the new tarball" - else - if test "$branch" = "master"; then - echo "=== Saving output for a Coverity run" - echo "$outputroot/$branch/openmpi-$latest_snapshot.tar.bz2" >> $pending_coverity - else - echo "=== NOT saving output for a Coverity run" - fi - echo "=== Posting tarball to open-mpi.org" - # tell the web server to cleanup old nightly tarballs - run_command ssh -p 2222 \ - $output_ssh_target \ - \"git/ompi/contrib/build-server/remove-old.pl 7 public_html/nightly/$branch\" - # upload the new ones - run_command scp -P 2222 \ - $outputroot/$branch/openmpi-$latest_snapshot.tar.* \ - $output_ssh_target:public_html/nightly/$branch/ - run_command scp -P 2222 \ - $outputroot/$branch/latest_snapshot.txt \ - $output_ssh_target:public_html/nightly/$branch/ - # direct the web server to regenerate the checksums - run_command ssh -p 2222 \ - $output_ssh_target \ - \"cd public_html/nightly/$branch \&\& md5sum openmpi\* \> md5sums.txt\" - run_command ssh -p 2222 \ - $output_ssh_target \ - \"cd public_html/nightly/$branch \&\& sha1sum openmpi\* \> sha1sums.txt\" - fi - - # Failed builds are not removed. But if a human forgets to come - # in here and clean up the old failed builds, we can accumulate - # many over time. So remove any old failed builds that are over - # 4 weeks old. - run_command ${script_dir}/remove-old.pl 7 $build_root/$branch - -done - - -# If we had any new snapshots to send to coverity, process them now - -for tarball in `cat $pending_coverity`; do - echo "=== Submitting $tarball to Coverity..." 
- run_command ${script_dir}/openmpi-nightly-coverity.pl \ - --filename=$tarball \ - --coverity-token=$coverity_token \ - --verbose \ - --logfile-dir=$HOME/coverity \ - --make-args=-j4 \ - --configure-args=\"$coverity_configure_args\" -done -rm -f $pending_coverity diff --git a/contrib/build-server/pmix-nightly-coverity.pl b/contrib/build-server/pmix-nightly-coverity.pl deleted file mode 100755 index 329393fd3bf..00000000000 --- a/contrib/build-server/pmix-nightly-coverity.pl +++ /dev/null @@ -1,162 +0,0 @@ -#!/usr/bin/env perl - -use warnings; -use strict; - -use Getopt::Long; -use File::Temp qw/ tempfile tempdir /; -use File::Basename; - -my $coverity_project = "open-mpi%2Fpmix"; -# Coverity changes this URL periodically -my $coverity_tool_url = "https://scan.coverity.com/download/cxx/linux64"; - -my $filename_arg; -my $coverity_token_arg; -my $dry_run_arg = 0; -my $verbose_arg = 0; -my $debug_arg = 0; -my $logfile_dir_arg = "/tmp"; -my $configure_args = ""; -my $make_args = "-j 32"; -my $help_arg = 0; - -&Getopt::Long::Configure("bundling"); -my $ok = Getopt::Long::GetOptions("filename=s" => \$filename_arg, - "coverity-token=s" => \$coverity_token_arg, - "logfile-dir=s" => \$logfile_dir_arg, - "configure-args=s" => \$configure_args, - "make-args=s" => \$make_args, - "dry-run!" => \$dry_run_arg, - "verbose!" => \$verbose_arg, - "debug!" => \$debug_arg, - "help|h" => \$help_arg); - -$ok = 0 - if (!defined($filename_arg)); -$ok = 0 - if (!defined($coverity_token_arg)); -if (!$ok || $help_arg) { - print "Usage: $0 --filename=FILENAME --coverity-token=TOKEN [--dry-run] [--verbose] [--help]\n"; - exit($ok); -} - -die "Cannot read $filename_arg" - if (! 
-r $filename_arg); - -$verbose_arg = 1 - if ($debug_arg); - -###################################################################### - -sub verbose { - print @_ - if ($verbose_arg); -} - -# run a command and save the stdout / stderr -sub safe_system { - my $allowed_to_fail = shift; - my $cmd = shift; - my $stdout_file = shift; - - # Redirect stdout if requested or not verbose - if (defined($stdout_file)) { - $stdout_file = "$logfile_dir_arg/$stdout_file"; - unlink($stdout_file); - $cmd .= " >$stdout_file"; - } elsif (!$debug_arg) { - $cmd .= " >/dev/null"; - } - $cmd .= " 2>&1"; - - my $rc = system($cmd); - if (0 != $rc && !$allowed_to_fail) { - # If we die/fail, ensure to change out of the temp tree so - # that it can be removed upon exit. - chdir("/"); - die "Command $cmd failed: exit status $rc"; - } - system("cat $stdout_file") - if ($debug_arg && defined($stdout_file) && -f $stdout_file); -} - -###################################################################### - -# Make an area to work - -my $dir = tempdir(CLEANUP => 1); -chdir($dir); -verbose "*** Working in $dir\n"; - -###################################################################### - -# Get the coverity tool, put it in our path. - -my $cdir = "$ENV{HOME}/coverity"; -safe_system(0, "mkdir $cdir") - if (! -d $cdir); - -# Optimization: the tool is pretty large. If our local copy is less -# than a day old, just use that without re-downloading. 
-my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, - $atime,$mtime,$ctime,$blksize,$blocks) = - stat("$cdir/coverity_tool.tgz"); -my $now = time(); -if (!defined($mtime) || $mtime < $now - 24*60*60) { - verbose "*** Downloading new copy of the coverity tool\n"; - safe_system(0, "wget $coverity_tool_url --post-data \"token=$coverity_token_arg&project=$coverity_project\" -O coverity_tool.tgz"); - safe_system(0, "cp coverity_tool.tgz $cdir"); -} - -verbose "*** Expanding coverity tool tarball\n"; -safe_system(0, "tar xf $cdir/coverity_tool.tgz"); -opendir(my $dh, ".") || - die "Can't opendir ."; -my @files = grep { /^cov/ && -d "./$_" } readdir($dh); -closedir($dh); - -my $cov_dir = "$dir/$files[0]/bin"; -$ENV{PATH} = "$cov_dir:$ENV{PATH}"; - -###################################################################### - -# Expand the PMIX tarball, build it - -verbose "*** Extracting PMIX tarball\n"; -safe_system(0, "tar xf $filename_arg"); -my $tarball_filename = basename($filename_arg); -$tarball_filename =~ m/^pmix-(.+)\.tar.+$/; -my $pmix_ver = $1; -chdir("pmix-$pmix_ver"); - -verbose "*** Configuring PMIX tarball\n"; -safe_system(0, "./configure $configure_args", "configure"); - -verbose "*** Building PMIX tarball\n"; -safe_system(0, "cov-build --dir cov-int make $make_args", "cov-build"); - -# Tar up the Coverity results -verbose "*** Tarring up results\n"; -safe_system(0, "tar jcf $pmix_ver-analyzed.tar.bz2 cov-int"); - -# If not dry-run, submit to Coverity -if ($dry_run_arg) { - verbose "*** Would have submitted, but this is a dry run\n"; -} else { - verbose "*** Submitting results\n"; - safe_system(0, "curl --form token=$coverity_token_arg " . - "--form email=rhc\@open-mpi.org " . - "--form file=\@$pmix_ver-analyzed.tar.bz2 " . - "--form version=$pmix_ver " . - "--form description=nightly-master " . 
- "https://scan.coverity.com/builds?project=$coverity_project", - "coverity-submit"); -} - -verbose("*** All done\n"); - -# Chdir out of the tempdir so that it can be removed -chdir("/"); - -exit(0); diff --git a/contrib/build-server/pmix-nightly-tarball.sh b/contrib/build-server/pmix-nightly-tarball.sh deleted file mode 100755 index 6a248ef1e4a..00000000000 --- a/contrib/build-server/pmix-nightly-tarball.sh +++ /dev/null @@ -1,196 +0,0 @@ -#!/bin/sh - -##### -# -# Configuration options -# -##### - -# e-mail address to send results to -#results_addr=testing@lists.open-mpi.org -results_addr=rhc@open-mpi.org - -# Set this to any value for additional output; typically only when -# debugging -: ${debug:=} - -# svn repository uri -master_code_uri=https://github.com/pmix/master.git -master_raw_uri=https://raw.github.com/pmix/master -release_code_uri=https://github.com/pmix/releases.git -release_raw_uri=https://raw.github.com/pmix/releases - -# where to put built tarballs -outputroot=$HOME/pmix/nightly - -# Target where to scp the final tarballs -output_ssh_target=ompiteam@192.185.39.252 - -# where to find the build script -script_uri=contrib/nightly/create_tarball.sh - -# helper scripts dir -script_dir=$HOME/ompi/contrib/build-server - -# The tarballs to make -if [ $# -eq 0 ] ; then - branches="master" -else - branches=$@ -fi - -# Build root - scratch space -build_root=$HOME/pmix/nightly-tarball-build-root - -# Coverity stuff -coverity_token=`cat $HOME/coverity/pmix-token.txt` -coverity_configure_args="--with-libevent=$HOME_PREFIX" - -export PATH=$HOME_PREFIX/bin:$PATH -export LD_LIBRARY_PATH=$HOME_PREFIX/lib:$LD_LIBRARY_PATH - -##### -# -# Actually do stuff -# -##### - -debug() { - if test -n "$debug"; then - echo "=== DEBUG: $*" - fi -} - -run_command() { - debug "Running command: $*" - debug "Running in pwd: `pwd`" - if test -n "$debug"; then - eval $* - else - eval $* > /dev/null 2>&1 - fi - - if test $? 
-ne 0; then - echo "=== Command failed: $*" - fi -} - -# load the modules configuration -. $MODULE_INIT -module use $AUTOTOOL_MODULE - -# get our nightly build script -mkdir -p $build_root -cd $build_root - -pending_coverity=$build_root/tarballs-to-run-through-coverity.txt -rm -f $pending_coverity -touch $pending_coverity - -# Loop making the tarballs -module unload autotools -for branch in $branches; do - echo "=== Branch: $branch" - # Get the last tarball version that was made - prev_snapshot=`cat $outputroot/$branch/latest_snapshot.txt` - prev_snapshot_hash=`echo $prev_snapshot | cut -d- -f3` - echo "=== Previous snapshot: $prev_snapshot (hash: $prev_snapshot_hash)" - - if test "$branch" = "master"; then - code_uri=$master_code_uri - raw_uri=$master_raw_uri - else - code_uri=$release_code_uri - raw_uri=$release_raw_uri - fi - - # Form a URL-specific script name - script=$branch-`basename $script_uri` - - echo "=== Getting script from: $raw_uri" - run_command wget --quiet --no-check-certificate --tries=10 $raw_uri/$branch/$script_uri -O $script - if test ! $? -eq 0 ; then - echo "wget of PMIX nightly tarball create script failed." - if test -f $script ; then - echo "Using older version of $script for this run." - else - echo "No build script available. Aborting." - exit 1 - fi - fi - chmod +x $script - - module load "autotools/pmix-$branch" - # module load "libevent/pmix-$branch" - - echo "=== Running script..." - run_command ./$script \ - $build_root/$branch \ - $results_addr \ - $outputroot/$branch \ - $code_uri \ - $branch - - module unload autotools - echo "=== Done running script" - - # Did the script generate a new tarball? Ensure to compare the - # only the hash of the previous tarball and the hash of the new - # tarball (the filename also contains the date/timestamp, which - # will always be different). - - # If so, save it so that we can spawn the coverity checker on it - # afterwards. Only for this for the master (for now). 
- latest_snapshot=`cat $outputroot/$branch/latest_snapshot.txt` - latest_snapshot_hash=`echo $latest_snapshot | cut -d- -f3` - echo "=== Latest snapshot: $latest_snapshot (hash: $latest_snapshot_hash)" - if test "$prev_snapshot_hash" = "$latest_snapshot_hash"; then - echo "=== Hash has not changed; no need to upload/save the new tarball" - else - if test "$branch" = "master"; then - echo "=== Saving output for a Coverity run" - echo "$outputroot/$branch/pmix-$latest_snapshot.tar.bz2" >> $pending_coverity - else - echo "=== NOT saving output for a Coverity run" - fi - echo "=== Posting tarball to open-mpi.org" - # tell the web server to cleanup old nightly tarballs - run_command ssh -p 2222 \ - $output_ssh_target \ - \"git/ompi/contrib/build-server/remove-old.pl 7 public_html/software/pmix/nightly/$branch\" - # upload the new ones - run_command scp -P 2222 \ - $outputroot/$branch/pmix-$latest_snapshot.tar.* \ - $output_ssh_target:public_html/software/pmix/nightly/$branch/ - run_command scp -P 2222 \ - $outputroot/$branch/latest_snapshot.txt \ - $output_ssh_target:public_html/software/pmix/nightly/$branch/ - # direct the web server to regenerate the checksums - run_command ssh -p 2222 \ - $output_ssh_target \ - \"cd public_html/software/pmix/nightly/$branch \&\& md5sum pmix\* \> md5sums.txt\" - run_command ssh -p 2222 \ - $output_ssh_target \ - \"cd public_html/software/pmix/nightly/$branch \&\& sha1sum pmix\* \> sha1sums.txt\" - fi - - # Failed builds are not removed. But if a human forgets to come - # in here and clean up the old failed builds, we can accumulate - # many over time. So remove any old failed bbuilds that are over - # 4 weeks old. - run_command ${script_dir}/remove-old.pl 7 $build_root/$branch -done - -# If we had any new snapshots to send to coverity, process them now - -for tarball in `cat $pending_coverity`; do - echo "=== Submitting $tarball to Coverity..." 
- run_command ${script_dir}/pmix-nightly-coverity.pl \ - --filename=$tarball \ - --coverity-token=$coverity_token \ - --verbose \ - --logfile-dir=$HOME/coverity \ - --make-args=-j8 \ - --configure-args=\"$coverity_configure_args\" -done -rm -f $pending_coverity diff --git a/contrib/build-server/remove-old.pl b/contrib/build-server/remove-old.pl deleted file mode 100755 index b16533eda25..00000000000 --- a/contrib/build-server/remove-old.pl +++ /dev/null @@ -1,59 +0,0 @@ -#!/usr/bin/env perl - -use strict; -use warnings; - -use POSIX qw(strftime); - -my $happy = 1; - -my $savedays = $ARGV[0]; -my $dir = $ARGV[1]; - -$happy = 0 - if ($savedays <= 0 || ! -d $dir); -die "Must specify number of days and a directory" - if (!$happy); - -#------------------------------------------------------------------ - -# Read in all the dir entries -opendir(DIR, $dir) || die "Cannot open $dir"; -my @files = readdir(DIR); -closedir(DIR); - -# How many days to keep? -my $t = time() - ($savedays * 60 * 60 * 24); -print "Deleting anything in $dir before: " . strftime("%D", localtime($t)) . "\n"; -my $to_delete = ""; - -# Check everything in the dir; if is a dir, is not . or .., and is -# older than the save date, keep it for deleting later. -foreach my $file (sort(@files)) { - if ($file ne "index.php" && - $file ne "md5sums.txt" && - $file ne "sha1sums.txt" && - $file ne "latest_snapshot.txt" && - $file ne "." && - $file ne "..") { - my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size, - $atime,$mtime,$ctime,$blksize,$blocks) = stat("$dir/$file"); - my $str = "SAVE"; - if ($mtime < $t) { - $to_delete = "$to_delete $dir/$file"; - $str = "DELETE"; - } - print "Found $file: $str (mtime: " . strftime("%D", localtime($mtime)) . ")\n"; - } -} - -# If we found anything to delete, do so. 
-if ($to_delete ne "") { - print "Deleting: $to_delete\n"; - system("chmod -R u=rwx $to_delete"); - system("rm -rf $to_delete"); - } else { - print "Nothing to delete!\n"; -} - -exit(0); diff --git a/contrib/dist/linux/openmpi.spec b/contrib/dist/linux/openmpi.spec index 370e5dd5fc7..2a80af296b8 100644 --- a/contrib/dist/linux/openmpi.spec +++ b/contrib/dist/linux/openmpi.spec @@ -163,6 +163,8 @@ # bets are off. So feel free to install it anywhere in your tree. He # suggests $prefix/doc. %define _defaultdocdir /opt/%{name}/%{version}/doc +# Also put the modulefile in /opt. +%define modulefile_path /opt/%{name}/%{version}/share/openmpi/modulefiles %endif %if !%{build_debuginfo_rpm} @@ -767,6 +769,10 @@ test "x$RPM_BUILD_ROOT" != "x" && rm -rf $RPM_BUILD_ROOT # ############################################################################# %changelog +* Tue Mar 28 2017 Jeff Squyres +- Reverting a decision from a prior changelog entry: if + install_in_opt==1, then even put the modulefile under /opt. 
+ * Thu Nov 12 2015 Gilles Gouaillardet - Revamp packaging when prefix is /usr diff --git a/contrib/dist/macosx-pkg/ReadMe.rtf b/contrib/dist/macosx-pkg/ReadMe.rtf deleted file mode 100644 index 82969cc7528..00000000000 --- a/contrib/dist/macosx-pkg/ReadMe.rtf +++ /dev/null @@ -1,34 +0,0 @@ -{\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf410 -{\fonttbl\f0\fnil\fcharset77 Verdana;\f1\fswiss\fcharset77 Helvetica;} -{\colortbl;\red255\green255\blue255;\red0\green0\blue236;} -\margl1440\margr1440\vieww10580\viewh15280\viewkind0 -\pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\sa240\ql\qnatural - -\f0\fs24 \cf0 Open MPI is a project combining technologies and resources from several other projects ({\field{\*\fldinst{HYPERLINK "http://icl.cs.utk.edu/ftmpi/"}}{\fldrslt \cf2 \ul \ulc2 FT-MPI}}, {\field{\*\fldinst{HYPERLINK "http://public.lanl.gov/lampi/"}}{\fldrslt \cf2 \ul \ulc2 LA-MPI}}, {\field{\*\fldinst{HYPERLINK "http://www.lam-mpi.org/"}}{\fldrslt \cf2 \ul \ulc2 LAM/MPI}}, and {\field{\*\fldinst{HYPERLINK "http://www.hlrs.de/organization/pds/projects/pacx-mpi/"}}{\fldrslt \cf2 \ul \ulc2 PACX-MPI}}) in order to build the best MPI library available. A completely new MPI-2 compliant implementation, Open MPI offers advantages for system and software vendors, application developers and computer science researchers. 
-\f1 More information about Open MPI, including all the source code and documentation, is available from the main web site:\ -\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural -\cf0 \ -\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\qc -{\field{\*\fldinst{HYPERLINK "http://www.open-mpi.org/"}}{\fldrslt \cf0 http://www.open-mpi.org/}}\ -\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural -\cf0 \ -\ -\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural -\cf0 \ul \ulc0 Special OS X Package Notes\ulnone \ -\ -The binary package for Open MPI includes support for TCP and shared memory transports, with rsh/ssh and XGrid job launching. Sites requiring support for other networks or job launching mechanisms will need to rebuild from source.\ -\ -There is no Fortran support in this binary package, as Apple does not ship a Fortran compiler with the Developer Tools - if you have a Fortran compiler and need Fortran support from Open MPI, you will have to build it from source.\ -\ -Because HFS+ is case-preserving but not case-sensitive, the C++ wrapper compiler is named mpic++, not the traditional mpiCC (which would conflict with mpicc).\ -\ -\ -\ul Getting Help\ -\ulnone \ -Please see the Open MPI web page for help with Open MPI, especially the frequently asked questions.\ -\ -\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\qc -\cf0 http://www.open-mpi.org/faq/\ -\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural -\cf0 \ -If this does not answer your question, further help is available via our mailing list at users@open-mpi.org} diff --git a/contrib/dist/macosx-pkg/buildpackage.sh b/contrib/dist/macosx-pkg/buildpackage.sh deleted file mode 100755 index dc0f0b01bc7..00000000000 --- 
a/contrib/dist/macosx-pkg/buildpackage.sh +++ /dev/null @@ -1,550 +0,0 @@ -#!/bin/bash - -# Copyright (c) 2001-2006 The Trustees of Indiana University. -# All rights reserved. -# Copyright (c) 2006-2007 Los Alamos National Security, LLC. All rights -# reserved. -# -# This file is part of the Open MPI software package. For license -# information, see the LICENSE file in the top level directory of the -# Open MPI source distribution. -# -# - -# -# Build a Mac OS X package for use by Installer.app -# -# Usage: buildpackage.sh [prefix] -# -# Prefix defaults to /usr/local - - -######################################################################## -# -# Configuration Options -# -######################################################################## - -# -# User-configurable stuff -# -OMPI_PACKAGE="openmpi" -OMPI_PREFIX="/usr/local/" -OMPI_OPTIONS="--disable-mpi-f77 --without-cs-fs --enable-mca-no-build=ras-slurm,pls-slurm,gpr-null,sds-pipe,sds-slurm,pml-cm NM=\"nm -p\"" -OMPI_OSX_README="ReadMe.rtf" -# note - if want XGrid support, make sure that a cocoa-supported -# architecture appears first on the list. Otherwise, we won't -# lipo that component and it will be dropped -OPAL_ARCH_LIST="ppc ppc64 i386 x86_64" -OMPI_SDK="/Developer/SDKs/MacOSX10.4u.sdk" - -# -# Not so modifiable stuff -# -BUILD_TMP="/tmp/buildpackage-$$" -if test ! "$2" = ""; then - OMPI_PREFIX="$2" -fi - -OMPI_STARTDIR=`pwd` - -echo "--> Configuration options:" -echo " Package Name: $OMPI_PACKAGE" -echo " Prefix: $OMPI_PREFIX" -echo " Config Options: $OMPI_OPTIONS" -echo " Architectures: $OPAL_ARCH_LIST" -echo " Target SDK: $OMPI_SDK" -echo "" - -######################################################################## -# -# Start actual code that does stuff -# -######################################################################## - -# -# Sanity check -# -fulltarball="$1" -if test "$fulltarball" = ""; then - echo "Usage: buildpackage.sh [prefix]" - exit 1 -fi -if test ! 
-f $fulltarball; then - echo "*** Can't find $fulltarball!" - exit 1 -fi -echo "--> Found tarball: $fulltarball" - -# -# Find version info -# -tarball=`basename $fulltarball` -first="`echo $tarball | cut -d- -f2`" -version="`echo $first | sed -e 's/\.tar\.gz//'`" -unset first -echo "--> Found OMPI version: $version" - -OMPI_VER_PACKAGE="${OMPI_PACKAGE}-${version}" - -# -# Sanity check that we can continue -# -if test -d "/Volumes/${OMPI_VER_PACKAGE}"; then - echo "*** Already have disk image (/Volumes/${OMPI_VER_PACKAGE}) mounted." - echo "*** Unmount and try again" - exit 1 -fi - -if test ! -r "${OMPI_OSX_README}"; then - echo "*** Can not find ${OMPI_OSX_README} in `pwd`." - exit 1 -else - OMPI_OSX_README="`pwd`/${OMPI_OSX_README}" -fi - - -# -# Clean out the environment a bit -# -echo "--> Cleaning environment" -PATH=/bin:/sbin/:/usr/bin -LANGUAGE=C -LC_ALL=C -LC_MESSAGES= -LANG= -export PATH LANGUAGE LC_ALL LC_MESSAGES LANG -unset LD_LIBRARY_PATH CC CXX FC F77 OBJC - -# -# Make some play space -# -echo "--> Making play space: $BUILD_TMP" -if test -d $BUILD_TMP; then - echo "Build dir $BUILD_TMP exists - exiting" - exit 1 -fi -# -p is safe - will only run on OS X -mkdir -p $BUILD_TMP - - -######################################################################## -# -# Configure, Build, and Install Open MPI -# -######################################################################## - -# -# Put tarball in right place -# -echo "--> Copying tarball" -cp $fulltarball $BUILD_TMP/. - -cd $BUILD_TMP - -# -# Expand tarball -# - -# we know there can't be spaces in $tarball - filename only -cmd="tar xzf $tarball" -echo "--> Untarring source: $cmd" - -eval $cmd -srcdir="$BUILD_TMP/openmpi-$version" -if test ! 
-d "$srcdir"; then - echo "*** Didn't find $srcdir as expected - aborting" - exit 1 -fi - -build_arch=`uname -p`"-apple-darwin"`uname -r` - -real_install=1 -for arch in $OPAL_ARCH_LIST ; do - builddir="$BUILD_TMP/build-$arch" - mkdir "$builddir" - - case "$arch" in - ppc) - host_arch="powerpc-apple-darwin"`uname -r` - ;; - ppc64) - # lie, but makes building on G4 easier - host_arch="powerpc64-apple-darwin"`uname -r` - ;; - i386) - host_arch="i386-apple-darwin"`uname -r` - ;; - x86_64) - host_arch="x86_64-apple-darwin"`uname -r` - ;; - *) - echo "**** Could not find arch string for $arch ****" - exit 1 - ;; - esac - - # - # Run configure - # - cd $builddir - config="$srcdir/configure CFLAGS=\"-arch $arch -isysroot $OMPI_SDK\" CXXFLAGS=\"-arch $arch -isysroot $OMPI_SDK\" OBJCFLAGS=\"-arch $arch -isysroot $OMPI_SDK\" --prefix=$OMPI_PREFIX $OMPI_OPTIONS --build=$build_arch --host=$host_arch" - echo "--> Running configure: $config" - eval $config > "$BUILD_TMP/configure.out-$arch" 2>&1 - - if test $? != 0; then - echo "*** Problem running configure - aborting!" - echo "*** See $BUILD_TMP/configure.out-$arch for help." - exit 1 - fi - - # - # Build - # - cmd="make -j 4 all" - echo "--> Building: $cmd" - eval $cmd > "$BUILD_TMP/make.out-$arch" 2>&1 - - if test $? != 0; then - echo "*** Problem building - aborting!" - echo "*** See $BUILD_TMP/make.out-$arch for help." - exit 1 - fi - - # - # Install into tmp place - # - if test $real_install -eq 1 ; then - distdir="dist" - real_install=0 - else - distdir="dist-$arch" - fi - fulldistdir="$BUILD_TMP/$distdir" - cmd="make DESTDIR=$fulldistdir install" - echo "--> Installing:" - eval $cmd > "$BUILD_TMP/install.out-$arch" 2>&1 - - if test $? != 0; then - echo "*** Problem installing - aborting!" - echo "*** See $BUILD_TMP/install.out-$arch for help." 
- exit 1 - fi - - # - # Copy in special doc files - # - SPECIAL_FILES="README LICENSE" - echo "--> Copying in special files: $SPECIAL_FILES" - pushd $srcdir >/dev/null - mkdir -p "${fulldistdir}/${OMPI_PREFIX}/share/openmpi/doc" - cp $SPECIAL_FILES "${fulldistdir}/${OMPI_PREFIX}/share/openmpi/doc/." - if [ ! $? = 0 ]; then - echo "*** Problem copying files $SPECIAL_FILES. Aborting!" - exit 1 - fi - popd >/dev/null - - distdir= - fulldistdir= -done - - -######################################################################## -# -# Make the fat binary -# -######################################################################## -print_arch_if() { - case "$1" in - ppc) - echo "#ifdef __ppc__" >> mpi.h - ;; - ppc64) - echo "#ifdef __ppc64__" >> mpi.h - ;; - i386) - echo "#ifdef __i386__" >> mpi.h - ;; - x86_64) - echo "#ifdef __x86_64__" >> mpi.h - ;; - *) - echo "*** Could not find arch #ifdef for $1" - exit 1 - ;; - esac -} - -# Set arch to the first arch in the list. Go through the for loop, -# although we'll break out at the end of the first time through. Look -# at the other arches that were built by using ls. 
-for arch in $OPAL_ARCH_LIST ; do - cd $BUILD_TMP - other_archs=`ls -d dist-*` - fulldistdir="$BUILD_TMP/dist" - - echo "--> Creating fat binares and libraries" - for other_arch in $other_archs ; do - cd "$fulldistdir" - - # /bin - don't copy in 64 bit binaries - if echo $other_arch | grep -v 64 > /dev/null ; then - files=`find ./${OMPI_PREFIX}/bin -type f -print` - for file in $files ; do - other_file="$BUILD_TMP/${other_arch}/$file" - if test -r $other_file ; then - lipo -create $file $other_file -output $file - fi - done - fi - - # /lib - ignore .la files - files=`find ./${OMPI_PREFIX}/lib -type f -print | grep -v '\.la$'` - for file in $files ; do - other_file="$BUILD_TMP/${other_arch}/$file" - if test -r $other_file ; then - lipo -create $file $other_file -output $file - else - echo "Not lipoing missing file $other_file" - fi - done - - done - - cd $BUILD_TMP - - echo "--> Creating multi-architecture mpi.h" - # mpi.h - # get the top of mpi.h - mpih_top=`grep -n '@OMPI_BEGIN_CONFIGURE_SECTION@' $BUILD_TMP/dist/${OMPI_PREFIX}/include/mpi.h | cut -f1 -d:` - mpih_top=`echo "$mpih_top - 1" | bc` - head -n $mpih_top $BUILD_TMP/dist/${OMPI_PREFIX}/include/mpi.h > mpih_top.txt - - # now the bottom of mpi.h - mpih_bottom_top=`grep -n '@OMPI_END_CONFIGURE_SECTION@' $BUILD_TMP/dist/${OMPI_PREFIX}/include/mpi.h | cut -f1 -d:` - mpih_bottom_bottom=`wc -l $BUILD_TMP/dist/${OMPI_PREFIX}/include/mpi.h | cut -f1 -d/` - mpih_bottom=`echo "$mpih_bottom_bottom - $mpih_bottom_top" | bc` - tail -n $mpih_bottom $BUILD_TMP/dist/${OMPI_PREFIX}/include/mpi.h > mpih_bottom.txt - - # now get our little section of fun - mpih_top=`echo "$mpih_top + 1" | bc` - mpih_fun_len=`echo "$mpih_bottom_top - $mpih_top + 1" | bc` - head -n $mpih_bottom_top $BUILD_TMP/dist/${OMPI_PREFIX}/include/mpi.h | tail -n $mpih_fun_len > mpih_$arch.txt - - # start putting it back together - rm -f mpi.h - cat mpih_top.txt > mpi.h - - print_arch_if $arch - cat mpih_$arch.txt >> mpi.h - echo "#endif" >> mpi.h - - 
for other_arch_dir in $other_archs ; do - other_arch=`echo $other_arch_dir | cut -f2 -d-` - mpih_top=`grep -n '@OMPI_BEGIN_CONFIGURE_SECTION@' $BUILD_TMP/$other_arch_dir/${OMPI_PREFIX}/include/mpi.h | cut -f1 -d:` - mpih_bottom_top=`grep -n '@OMPI_END_CONFIGURE_SECTION@' $BUILD_TMP/$other_arch_dir/${OMPI_PREFIX}/include/mpi.h | cut -f1 -d:` - mpih_fun_len=`echo "$mpih_bottom_top - $mpih_top + 1" | bc` - head -n $mpih_bottom_top $BUILD_TMP/$other_arch_dir/${OMPI_PREFIX}/include/mpi.h | tail -n $mpih_fun_len > mpih_$other_arch.txt - - print_arch_if $other_arch - cat mpih_$other_arch.txt >> mpi.h - echo "#endif" >> mpi.h - done - - cat mpih_bottom.txt >> mpi.h - mv mpi.h $BUILD_TMP/dist/${OMPI_PREFIX}/include/. - rm mpih* - break -done - -# set component load errors to false, as we're almost always going to -# fail to load the XGrid components on 64 bit systems, and users don't -# need to see that. -echo "mca_component_show_load_errors = 0" >> $BUILD_TMP/dist/${OMPI_PREFIX}/etc/openmpi-mca-params.conf - -######################################################################## -# -# Do all the package mojo -# -######################################################################## - -# -# Prep package info -# -debug_file="${BUILD_TMP}/disk.out" -touch "$debug_file" -echo "--> Creating Package Info:" - -cd $BUILD_TMP - -pkdir="${BUILD_TMP}/${OMPI_PACKAGE}.pkg" -mkdir -p ${pkdir} -mkdir ${pkdir}/Contents -mkdir ${pkdir}/Contents/Resources -mkdir ${pkdir}/Contents/Resources/English.lproj -echo 'pmkrpkg1' > ${pkdir}/Contents/PkgInfo - -infofile=${pkdir}/Contents/Resources/English.lproj/${OMPI_PACKAGE}.info - -echo "Title Open MPI ${version}" > ${infofile} -echo "Version ${version}" >> ${infofile} -echo "Description Install Open MPI ${version}" >> ${infofile} -echo 'DefaultLocation /' >> ${infofile} -echo 'DeleteWarning' >> ${infofile} -echo 'NeedsAuthorization YES' >> ${infofile} -echo 'Required NO' >> ${infofile} -echo 'Relocatable NO' >> ${infofile} -echo 
'RequiresReboot NO' >> ${infofile} -echo 'UseUserMask NO' >> ${infofile} -echo 'OverwritePermissions NO' >> ${infofile} -echo 'InstallFat NO' >> ${infofile} - -echo "--> Copying OS X-specific ReadMe into package" -cp "${OMPI_OSX_README}" "${pkdir}/Contents/Resources/ReadMe.rtf" -if [ ! $? = 0 ]; then - echo "*** Could not copy in ReadMe.rtf. Aborting!" - exit 1 -fi - -echo "--> Creating pax file" -CWD=`pwd` -cd "$fulldistdir" -pax -w -f "${pkdir}/Contents/Resources/${OMPI_PACKAGE}.pax" . >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Failed building pax file. Aborting!" - echo "*** Check $debug_file for information" - cd "$CWD" - exit 1 -fi -cd "$CWD" -unset CWD - - -echo "--> Compressing pax file" -gzip "${pkdir}/Contents/Resources/${OMPI_PACKAGE}.pax" >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Failed compressing pax file. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -echo "--> Creating bom file" -mkbom "$fulldistdir" "${pkdir}/Contents/Resources/${OMPI_PACKAGE}.bom" >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Failed building bom file. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -echo "--> Generating sizes file:" -sizesfile="${pkdir}/Contents/Resources/${OMPI_PACKAGE}.sizes" - -numFiles=`du -a ${fulldistdir} | wc -l` -installedSize=`du -s ${fulldistdir} | cut -f1` -compressedSize=`du -s ${fulldistdir} | cut -f1` - -echo "NumFiles ${numFiles}" > ${sizesfile} -echo "InstalledSize ${installedSize}" >> ${sizesfile} -echo "CompressedSize ${compressedSize}" >> ${sizesfile} -cat ${sizesfile} - -# -# Make a disk image in read-write mode -# -echo "--> Creating Disc Image" -# Allocated about 2.5MB more than we need, just to be safe. If that -# number is less than about 5MB, make 5MB to keep disk utilities -# happy. 
-sectorsAlloced=`echo 2*${compressedSize}+50|bc` -if [ $sectorsAlloced -lt 10000 ]; then - sectorsAlloced=10000 -fi -hdiutil create -ov "${BUILD_TMP}/${OMPI_VER_PACKAGE}RW" -sectors ${sectorsAlloced} >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Failed hdiutil create. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -mountLoc=`hdid -nomount ${BUILD_TMP}/${OMPI_VER_PACKAGE}RW.dmg | grep HFS | cut -f1` -/sbin/newfs_hfs -v ${OMPI_VER_PACKAGE} ${mountLoc} >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Failed building HFS+ file system. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -hdiutil eject ${mountLoc} >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Could not unmount $mountLoc. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -# -# Copy above package into the disk image -# -echo "--> Copying Package to Disc Image" -hdid "${BUILD_TMP}/${OMPI_VER_PACKAGE}RW.dmg" >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Could not mount ${BUILD_TMP}/${OMPI_VER_PACKAGE}RW.dmg. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -if [ ! -d "/Volumes/${OMPI_VER_PACKAGE}" ]; then - echo "*** /Volumes/${OMPI_VER_PACKAGE} does not exist. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -cp -R "${pkdir}" "/Volumes/${OMPI_VER_PACKAGE}" -if [ ! $? = 0 ]; then - echo "*** Error copying ${OMPI_VER_PACKAGE}.pkg. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -# -# Converting Disk Image to read-only (and shrink to size needed) -# -cmd="hdiutil eject ${mountLoc}" -echo "--> Ejecting R/W disk: $cmd" -eval $cmd >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Error ejecting R/W disk. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -cmd="hdiutil resize \"${BUILD_TMP}/${OMPI_VER_PACKAGE}RW.dmg\" -sectors min" -echo "--> Resizing: $cmd" -eval $cmd >> "$debug_file" 2>&1 -if [ ! $? 
= 0 ]; then - echo "*** Error resizing disk. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -cmd="hdiutil convert \"${BUILD_TMP}/${OMPI_VER_PACKAGE}RW.dmg\" -format UDRO -o \"/tmp/${OMPI_VER_PACKAGE}.dmg\"" -echo "--> Converting to R-O: $cmd" -eval $cmd >> "$debug_file" 2>&1 -if [ ! $? = 0 ]; then - echo "*** Error converting disk to read-only. Aborting!" - echo "*** Check $debug_file for information" - exit 1 -fi - -echo "--> Compressing disk image" -gzip --best "/tmp/${OMPI_VER_PACKAGE}.dmg" - -echo "--> Cleaning up the staging directory" -rm -rf "${BUILD_TMP}" -if [ ! $? = 0 ]; then - echo "*** Could not clean up ${BUILD_TMP}." - echo "You may want to clean it up yourself." - exit 1 -fi - -echo "--> Done. Package is at: /tmp/${OMPI_VER_PACKAGE}.dmg.gz" diff --git a/contrib/dist/make-authors.pl b/contrib/dist/make-authors.pl index 925e83a9ca2..92df0a4b230 100755 --- a/contrib/dist/make-authors.pl +++ b/contrib/dist/make-authors.pl @@ -1,161 +1,114 @@ #!/usr/bin/env perl # # Copyright (c) 2008-2016 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Amazon.com, Inc. or its affiliates. +# All Rights reserved. # use strict; use Data::Dumper; +use Getopt::Long; +use Cwd; # Ensure that we're in the root of a writeable Git clone my $in_git_clone = 1; - -$in_git_clone = 0 - if (! -d ".git" || ! -f "AUTHORS"); +my $skip_ok = 0; +my $quiet = 0; +my $srcdir = "."; +my $destdir = getcwd(); + +GetOptions("skip-ok" => \$skip_ok, + "quiet" => \$quiet, + "srcdir=s" => \$srcdir, + "destdir=s" => \$destdir) + or die("Error in command line arguments\n"); + +# we still work with git old enough to not have the -C option, and the +# --git-dir option screws up .mailmap, so just jump into the source +# directory and make life easier. +chdir($srcdir); + +if (! 
-d ".git") { + if ($skip_ok == 0) { + print STDERR "I don't seem to be in a git repo :(\n"; + exit(1); + } else { + # called from make dist, just exit quietly (for case where + # user runs "make dist" from a dist tarball) + exit(0); + } +} ###################################################################### -my $header_sep = "-----"; -my $unknown_org = "********* NO ORGANIZATION SET ********"; - my $people; ###################################################################### # Run git log to get a list of committers -open (GIT, "git log --format=tformat:'%aN <%aE>'|") || die "Can't run 'git log'."; +open (GIT, "git log --no-merges --format=tformat:'%aN <%aE>'|") || die "Can't run 'git log'."; while () { chomp; m/^\s*(.+)\s+<(.+)>\s*$/; + my $email = lc($2); + + # special case from the SVN migration + if ($email eq 'no-author@open-mpi.org') { next; } + # skip the mpi bot... + if ($email eq 'mpiteam@open-mpi.org') { next; } + if (!exists($people->{$1})) { # The person doesn't exist, so save a new entry $people->{$1} = { name => $1, - org => $unknown_org, emails => { - lc($2) => 1, + $email => 1, } }; - - print "Found Git committer: $1 <$2>\n"; + if ($quiet == 0) { print STDOUT "Found Git committer: $1 <$email>\n"; } } else { # The person already exists, so just add (or overwrite) this # email address - $people->{$1}->{emails}->{$2} = 1; + $people->{$1}->{emails}->{$email} = 1; } } close(GIT); -###################################################################### - -# Read the existing AUTHORS file - -my $header; - -print "Matching Git emails to existing names/affiliations...\n"; - -sub save { - my $current = shift; - - print "Saving person from AUTHORS: $current->{name}\n"; - - # We may overwrite an entry written from the git log, but that's - # ok - $people->{$current->{name}} = $current; +if (scalar(keys(%{$people})) == 0) { + print STDERR "Found no author entries, assuming git broke. 
Aborting!\n"; + exit(1); } -open (AUTHORS, "AUTHORS") || die "Can't open AUTHORS file"; -my $in_header = 1; -my $current = undef; -while () { - chomp; - my $line = $_; - - # Slurp down header lines until we hit a line that begins with - # $header_sep - if ($in_header) { - $header .= "$line\n"; - - if ($_ =~ /^$header_sep/) { - $in_header = 0; - - # There should be a blank line after this, too - $header .= "\n"; - } - next; - } +###################################################################### - # Skip blank lines - next - if ($line =~ /^\s*$/); - - # Format of body: - # - # NAME, Affiliation 1[, Affiliation 2[...]] - # Email address 1 - # [Email address 2] - # [...] - # NAME, Affiliation 1[, Affiliation 2[...]] - # Email address 1 - # [Email address 2] - # [...] - - # Found a new email address for an existing person - if ($line =~ /^ /) { - m/^ (.+)$/; - $current->{emails}->{lc($1)} = 1; - - next; - } else { - # Found a new person; save the old entry - save($current) - if (defined($current)); - - $current = undef; - $current->{org} = $unknown_org; - if ($line =~ m/^(.+?),\s+(.+)$/) { - $current->{name} = $1; - $current->{org} = $2; - } else { - $current->{name} = $line; - } +# Output a new AUTHORS file - next; - } -} +open (AUTHORS, ">$destdir/AUTHORS") || die "Can't write to AUTHORS file"; -save($current) - if (defined($current)); +my $header = <<'END_HEADER'; +Open MPI Authors +================ -close(AUTHORS); +The following cumulative list contains the names and email addresses +of all individuals who have committed code to the Open MPI repository +(either directly or through a third party, such as through a +Github.com pull request). Note that these email addresses are not +guaranteed to be current; they are simply a unique indicator of the +individual who committed them. 
-###################################################################### - -# Output a new AUTHORS file - -open (AUTHORS, ">AUTHORS.new") || die "Can't write to AUTHORS file"; +END_HEADER print AUTHORS $header; -my @people_with_unknown_orgs; my $email_dups; my @sorted_people = sort(keys(%{$people})); foreach my $p (@sorted_people) { - print AUTHORS $p; - if (exists($people->{$p}->{org})) { - my $org = $people->{$p}->{org}; - if ($org ne $unknown_org) { - print AUTHORS ", $org"; - } else { - # Record this so that we can warn about it - push(@people_with_unknown_orgs, $p); - } - } - print AUTHORS "\n"; + print AUTHORS "$p\n"; foreach my $e (sort(keys(%{$people->{$p}->{emails}}))) { # Sanity check: make sure this email address does not show up @@ -191,38 +144,27 @@ sub save { } close(AUTHORS); -# We have a new AUTHORS file! Replace the old one. -unlink("AUTHORS"); -rename("AUTHORS.new", "AUTHORS"); - -print "New AUTHORS file written.\n"; +print STDOUT "New AUTHORS file written.\n"; ###################################################################### # Output any relevant warnings my $warned = 0; -if ($#people_with_unknown_orgs >= 0) { - $warned = 1; - print "\n*** WARNING: The following people have unspecified organiations:\n"; - foreach my $p (@people_with_unknown_orgs) { - print "*** $p\n"; - } -} my @k = sort(keys(%{$email_dups})); if ($#k >= 0) { $warned = 1; - print "\n*** WARNING: The following people had the same email address:\n"; + print STDERR "\n*** WARNING: The following people had the same email address:\n"; foreach my $p (@k) { - print "*** $p, $email_dups->{$p}\n"; + print STDERR "*** $p, $email_dups->{$p}\n"; } } if ($warned) { - print " + print STDERR " ******************************************************************************* -*** YOU SHOULD EDIT THE .mailmap AND/OR AUTHORS FILE TO RESOLVE THESE WARNINGS! +*** YOU SHOULD EDIT THE .mailmap FILE TO RESOLVE THESE WARNINGS! 
*******************************************************************************\n"; } diff --git a/contrib/dist/make-html-man-pages.pl b/contrib/dist/make-html-man-pages.pl index 31de66ed6a6..58f7679638c 100755 --- a/contrib/dist/make-html-man-pages.pl +++ b/contrib/dist/make-html-man-pages.pl @@ -76,7 +76,7 @@ sub doit { # Autogen if we don't have a configure script doit("./autogen.pl") if (! -x "configure"); -doit("./configure --prefix=$prefix --enable-mpi-ext=all"); +doit("./configure --prefix=$prefix --enable-mpi-ext=all --without-cs-fs"); # Find this OMPI's version my $version = `fgrep PACKAGE_VERSION opal/include/opal_config.h | cut -d\\\" -f2`; diff --git a/contrib/libadd_mca_comp_update.py b/contrib/libadd_mca_comp_update.py new file mode 100755 index 00000000000..2388cadaf4e --- /dev/null +++ b/contrib/libadd_mca_comp_update.py @@ -0,0 +1,230 @@ +#!/usr/bin/env python + +# Copyright (c) 2017 IBM Corporation. All rights reserved. +# $COPYRIGHT$ +# + +import glob, os, re, shutil + +projects= {'opal' : ["$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la"], + 'orte' : ["$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la"], + 'ompi' : ["$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la"], + 'oshmem' : ["$(top_builddir)/oshmem/liboshmem.la"], + } + +no_anchor_file = [] +missing_files = [] +skipped_files = [] +partly_files = [] +updated_files = [] + +# +# Check of all of the libadd fields are accounted for in the LIBADD +# Return a list indicating which are missing (positional) +# +def check_libadd(content, libadd_field, project): + global projects + + libadd_list = projects[project] + libadd_missing = [True] * len(libadd_list) + + on_libadd = False + for line in content: + # First libadd line + if re.search( r"^\s*"+libadd_field, line): + # If line continuation, then keep searching after this point + if line[-2] == '\\': + on_libadd = True + + for idx, lib in enumerate(libadd_list): + if True == libadd_missing[idx]: + if 0 <= line.find(lib): + 
libadd_missing[idx] = False + + # Line continuation + elif True == on_libadd: + for idx, lib in enumerate(libadd_list): + if True == libadd_missing[idx]: + if 0 <= line.find(lib): + libadd_missing[idx] = False + + # No more line continuations, so stop processing + if line[-2] != '\\': + on_libadd = False + break + + return libadd_missing + +# +# Update all of the Makefile.am's with the proper LIBADD additions +# +def update_makefile_ams(): + global projects + global no_anchor_file + global missing_files + global skipped_files + global partly_files + global updated_files + + for project, libadd_list in projects.items(): + libadd_str = " \\\n\t".join(libadd_list) + + print("="*40) + print("Project: "+project) + print("LIBADD:\n"+libadd_str) + print("="*40) + + # + # Walk the directory structure + # + for root, dirs, files in os.walk(project+"/mca"): + parts = root.split("/") + if len(parts) != 4: + continue + if parts[-1] == ".libs" or parts[-1] == ".deps" or parts[-1] == "base": + continue + if parts[2] == "common": + continue + + print("Processing: "+root) + + # + # Find Makefile.am + # + make_filename = os.path.join(root, "Makefile.am") + if False == os.path.isfile( make_filename ): + missing_files.append("Missing: "+make_filename) + print(" ---> Error: "+make_filename+" is not present in this directory") + continue + + # + # Searching for: mca_FRAMEWORK_COMPONENT_la_{LIBADD|LDFLAGS} + # First scan file to see if it has an LIBADD / LDFLAGS + # + libadd_field = "mca_"+parts[2]+"_"+parts[3]+"_la_LIBADD" + ldflags_field = "mca_"+parts[2]+"_"+parts[3]+"_la_LDFLAGS" + has_ldflags = False + has_libadd = False + + r_fd = open(make_filename, 'r') + orig_content = r_fd.readlines() + r_fd.close() + libadd_missing = [] + + for line in orig_content: + if re.search( r"^\s*"+ldflags_field, line): + has_ldflags = True + elif re.search( r"^\s*"+libadd_field, line): + has_libadd = True + + if True == has_libadd: + libadd_missing = check_libadd(orig_content, libadd_field, project)
+ + # + # Sanity Check: Was there an anchor field? + # If not, skip; we might need to manually update or it might be a + # static component. + # + if False == has_ldflags and False == has_libadd: + no_anchor_file.append("No anchor ("+ldflags_field+"): "+make_filename) + print(" ---> Error: Makefile.am does not contain necessary anchor") + continue + + # + # Sanity Check: This file does not need to be updated. + # + if True == has_libadd and all(False == v for v in libadd_missing): + skipped_files.append("Skip: "+make_filename) + print(" Skip: Already updated Makefile.am") + continue + + # + # Now go through and create a new version of the Makefile.am + # + r_fd = open(make_filename, 'r') + w_fd = open(make_filename+".mod", 'w') + + num_libadds=0 + for line in r_fd: + # LDFLAGS anchor + if re.search( r"^\s*"+ldflags_field, line): + w_fd.write(line) + # If there is no LIBADD, then put it after the LDFLAGS + if False == has_libadd: + w_fd.write(libadd_field+" = "+libadd_str+"\n") + # Existing LIBADD field to extend + elif 0 == num_libadds and re.search( r"^\s*"+libadd_field, line): + parts = line.partition("=") + num_libadds += 1 + + if parts[0][-1] == '+': + w_fd.write(libadd_field+" += ") + else: + w_fd.write(libadd_field+" = ") + + # If all libs are missing, then add the full string + # Otherwise only add the missing items + if all(True == v for v in libadd_missing): + w_fd.write(libadd_str) + # Only add a continuation if there is something to continue + if 0 != len(parts[2].strip()): + w_fd.write(" \\") + w_fd.write("\n") + else: + partly_files.append("Partly updated: "+make_filename) + for idx, lib in enumerate(libadd_list): + if True == libadd_missing[idx]: + w_fd.write(lib+" \\\n") + + # Original content (unless it's just a line continuation) + if 0 != len(parts[2].strip()) and parts[2].strip() != "\\": + w_fd.write("\t"+parts[2].lstrip()) + + # Non matching line, just echo + else: + w_fd.write(line) + + r_fd.close() + w_fd.close() + + # + # Replace the original
with the updated version + # + shutil.move(make_filename+".mod", make_filename) + updated_files.append(make_filename) + + +if __name__ == "__main__": + + update_makefile_ams() + + print("") + + print("="*40); + print("{:>3} : Files skipped".format(len(skipped_files))) + print("="*40); + + print("="*40); + print("{:>3} : Files updated, but had some libs already in place.".format(len(partly_files))) + print("="*40); + for fn in partly_files: + print(fn) + + print("="*40); + print("{:>3} : Files fully updated".format(len(updated_files))) + print("="*40); + for fn in updated_files: + print(fn) + + print("="*40); + print("{:>3} : Missing Makefile.am".format(len(missing_files))) + print("="*40); + for err in missing_files: + print(err) + + print("="*40); + print("{:>3} : Missing Anchor for parsing (might be static-only components)".format(len(no_anchor_file))) + print("="*40); + for err in no_anchor_file: + print(err) + diff --git a/contrib/nightly/create_tarball.sh b/contrib/nightly/create_tarball.sh deleted file mode 100644 index 3b595b4c0bb..00000000000 --- a/contrib/nightly/create_tarball.sh +++ /dev/null @@ -1,311 +0,0 @@ -#!/bin/sh -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# -# This script is used to create a nightly snapshot tarball of Open MPI. 
-# -# $1: scratch root -# $2: e-mail address for destination -# $3: dest dir -# $4: git URL -# $5: git branch -# - -scratch_root=$1 -email=$2 -destdir=$3 -giturl=$4 -gitbranch=$5 - -# Set this to any value for additional output; typically only when -# debugging -: ${debug:=} - -# do you want a success mail? -want_success_mail=1 - -# max length of logfile to send in an e-mail -max_log_len=50 - -# how many snapshots to keep in the destdir? -max_snapshots=5 - -############################################################################ -# Shouldn't need to change below this line -############################################################################ - -start_time="`date`" - -# Sanity checks -if test -z "$scratch_root" -o -z "$email" -o -z "$giturl" -o -z "$gitbranch" \ - -o -z "$destdir"; then - echo "$0 scratch_root email_addr dest_dir git_url git_branch" - exit 1 -fi - -# Use the branch name as the "version" string (for if there is an -# error). This version string will be replaced upon successful "make -# distcheck" with the real version. -version=$gitbranch - -# send a mail -# should only be called after logdir is set -send_error_mail() { - outfile="$scratch_root/output.txt" - rm -f "$outfile" - touch "$outfile" - for file in `/bin/ls $logdir/* | sort`; do - len="`wc -l $file | awk '{ print $1}'`" - if test "`expr $len \> $max_log_len`" = "1"; then - echo "[... previous lines snipped ...]" >> "$outfile" - tail -n $max_log_len "$file" >> "$outfile" - else - cat "$file" >> "$outfile" - fi - done - Mail -s "=== CREATE FAILURE ($version) ===" "$email" < "$outfile" - rm -f "$outfile" -} - -# send output error message -die() { - msg="$*" - cat > "$logdir/00_announce.txt" < "$logfile" 2>&1 - st=$? - echo "*** Command complete: exit status: $st" - else - eval $cmd > "$logfile" 2>&1 - st=$? 
- fi - if test "$st" != "0"; then - cat > "$logdir/15-error.txt" < "$logdir/25-error.txt" < VERSION.new -cp -f VERSION.new VERSION -rm -f VERSION.new - -# lie about our username in $USER so that autogen will skip all -# .ompi_ignore'ed directories (i.e., so that we won't get -# .ompi_unignore'ed) -USER="ompibuilder" -export USER - -# autogen is our friend -do_command "./autogen.pl --force" - -# do config -do_command "./configure" - -# Do make distcheck (which will invoke config/distscript.csh to set -# the right values in VERSION). distcheck does many things; we need -# to ensure it doesn't pick up any other installs via LD_LIBRARY_PATH. -# It may be a bit Draconian to totally clean LD_LIBRARY_PATH (i.e., we -# may need something in there), but at least in the current building -# setup, we don't. But be advised that this may need to change in the -# future... -save=$LD_LIBRARY_PATH -LD_LIBRARY_PATH= -do_command "make -j 8 distcheck" -LD_LIBRARY_PATH=$save -save= - -# chmod the whole directory, so that core files are accessible by others -chmod a+rX -R . - -# move the resulting tarballs to the destdir -gz="`/bin/ls openmpi*tar.gz`" -bz2="`/bin/ls openmpi*tar.bz2`" -mv $gz $bz2 $destdir -if test "$?" 
!= "0"; then - cat < latest_snapshot.txt - -# trim the destdir to $max_snapshots -for ext in gz bz2; do - count="`ls openmpi*.tar.$ext | wc -l | awk '{ print $1 }'`" - if test "`expr $count \> $max_snapshots`" = "1"; then - num_old="`expr $count - $max_snapshots`" - old="`ls -rt openmpi*.tar.$ext | head -n $num_old`" - rm -f $old - fi -done - -# generate md5 and sha1 sums -rm -f md5sums.txt sha1sums.txt -touch md5sums.txt sha1sums.txt -for file in `/bin/ls *gz *bz2 | grep -v latest`; do - md5sum $file >> md5sums.txt - sha1sum $file >> sha1sums.txt -done - -# remove temp dirs -cd "$scratch_root" -rm -rf "$root" - -# send success mail -if test "$want_success_mail" = "1"; then - Mail -s "Create success ($version)" "$email" < #include "mpi.h" #include "opal/mca/pmix/pmix.h" +#include "opal/util/argv.h" #include "orte/runtime/runtime.h" #include "orte/util/proc_info.h" #include "orte/util/name_fns.h" @@ -117,17 +118,19 @@ static void sample(void) free(tmp); OPAL_LIST_FOREACH(kv, &response, opal_value_t) { lt = (opal_list_t*)kv->data.ptr; - OPAL_LIST_FOREACH(ival, lt, opal_value_t) { - if (0 == strcmp(ival->key, OPAL_PMIX_DAEMON_MEMORY)) { - asprintf(&tmp, "\tDaemon: %f", ival->data.fval); - opal_argv_append_nosize(&answer, tmp); - free(tmp); - } else if (0 == strcmp(ival->key, OPAL_PMIX_CLIENT_AVG_MEMORY)) { - asprintf(&tmp, "\tClient: %f", ival->data.fval); - opal_argv_append_nosize(&answer, tmp); - free(tmp); - } else { - fprintf(stderr, "\tUnknown key: %s", ival->key); + if (NULL != lt) { + OPAL_LIST_FOREACH(ival, lt, opal_value_t) { + if (0 == strcmp(ival->key, OPAL_PMIX_DAEMON_MEMORY)) { + asprintf(&tmp, "\tDaemon: %f", ival->data.fval); + opal_argv_append_nosize(&answer, tmp); + free(tmp); + } else if (0 == strcmp(ival->key, OPAL_PMIX_CLIENT_AVG_MEMORY)) { + asprintf(&tmp, "\tClient: %f", ival->data.fval); + opal_argv_append_nosize(&answer, tmp); + free(tmp); + } else { + fprintf(stderr, "\tUnknown key: %s", ival->key); + } } } } @@ -149,7 +152,6 @@ static void 
sample(void) } OPAL_LIST_DESTRUCT(&response); - if (0 == rank) { /* send the notification to release the other procs */ wait_for_release = true; @@ -162,19 +164,15 @@ static void sample(void) active = -1; if (OPAL_SUCCESS != opal_pmix.notify_event(MEMPROBE_RELEASE, NULL, OPAL_PMIX_RANGE_GLOBAL, &response, - notifycbfunc, (void*)&active)) { + NULL, NULL)) { fprintf(stderr, "Notify event failed\n"); exit(1); } - while (-1 == active) { + } else { + /* now wait for notification */ + while (wait_for_release) { usleep(10); } - OPAL_LIST_DESTRUCT(&response); - } - - /* now wait for notification */ - while (wait_for_release) { - usleep(10); } } diff --git a/contrib/scaling/scaling.pl b/contrib/scaling/scaling.pl index 27f03ef623b..e676b3ce23f 100755 --- a/contrib/scaling/scaling.pl +++ b/contrib/scaling/scaling.pl @@ -3,6 +3,10 @@ # Copyright (c) 2012 Los Alamos National Security, Inc. # All rights reserved. # Copyright (c) 2015-2016 Intel, Inc. All rights reserved. +# Copyright (c) 2017-2018 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. 
+ use strict; use Getopt::Long; @@ -17,19 +21,21 @@ my $useaprun = 0; my $useaprun = 0; my $myapp; -my $runall = 0; +my $runall = 1; my $rawoutput = 0; my $myresults = "myresults"; my $ppn = 1; +my $npmin = 1; my @csvrow; +my $multiplier = 1; my @tests = qw(/bin/true ./orte_no_op ./mpi_no_op ./mpi_no_op ./mpi_no_op); -my @options = ("", "", "", "--fwd-mpirun-port -mca mpi_add_procs_cutoff 0 -mca pmix_base_async_modex 1", "--fwd-mpirun-port -mca mpi_add_procs_cutoff 0 -mca pmix_base_async_modex 1 -mca async_mpi_init 1 -mca async_mpi_finalize 1"); -my @starterlist = qw(mpirun orterun srun aprun); -my @starteroptionlist = ("--novm", - "--hnp file:dvm_uri", - "--distribution=cyclic -N", - "-N"); +my @options = ("", "", "", "-mca mpi_add_procs_cutoff 0 -mca pmix_base_async_modex 1 -mca pmix_base_collect_data 0", "-mca mpi_add_procs_cutoff 0 -mca pmix_base_async_modex 1 -mca async_mpi_init 1 -mca async_mpi_finalize 1 -mca pmix_base_collect_data 0"); +my @starterlist = qw(mpirun prun srun aprun); +my @starteroptionlist = (" --novm --timeout 600", + " --system-server-only", + " --distribution=cyclic", + ""); # Set to true if the script should merely print the cmds # it would run, but don't run them @@ -49,10 +55,11 @@ "aprun" => \$useaprun, "mpirun" => \$usempirun, "myapp=s" => \$myapp, - "all" => \$runall, "results=s" => \$myresults, "rawout" => \$rawoutput, "ppn=s" => \$ppn, + "multiplier=s" => \$multiplier, + "npmin=s" => \$npmin, ) or die "unable to parse options, stopped"; if ($HELP) { @@ -67,10 +74,11 @@ --srun Use srun (if available) to execute the test --arpun Use aprun (if available) to execute the test --myapp=s In addition to the standard tests, run this specific application (including any args) ---all Use all available start commands [default] ---results=file File where results are to stored in comma-separated value format +--results=file File where results are to be stored in comma-separated value format --rawout Provide raw timing output to the file --ppn=n 
Run n procs/node +--multiplier=n Run n daemons/node (only for DVM and mpirun) +--npmin=n Minimal number of nodes "; exit(0); } @@ -89,8 +97,15 @@ my $havedvm = 0; my @starters; my @starteroptions; +my $pid; + +# if they explicitly requested specific starters, then +# only use those +if ($useaprun || $usempirun || $usesrun || $usedvm) { + $runall = 0 +} -# if they asked for all, then set all starters to requested +# if they didn't specify something, then set all starters to requested if ($runall) { $useaprun = 1; $usempirun = 1; @@ -112,21 +127,24 @@ } } if ($exists) { - if ($usedvm && $starter eq "orterun") { + if ($usedvm && $starter eq "prun") { push @starters, $starter; $opt = $starteroptionlist[$idx] . " --npernode " . $ppn; push @starteroptions, $opt; } elsif ($usempirun && $starter eq "mpirun") { push @starters, $starter; $opt = $starteroptionlist[$idx] . " --npernode " . $ppn; + if ($multiplier gt 1) { + $opt = $opt . " --mca rtc ^hwloc --mca ras_base_multiplier " . $multiplier; + } push @starteroptions, $opt; } elsif ($useaprun && $starter eq "aprun") { push @starters, $starter; - $opt = $starteroptionlist[$idx] . " " . $ppn; + $opt = $starteroptionlist[$idx] . " -N " . $ppn; push @starteroptions, $opt; } elsif ($usesrun && $starter eq "srun") { push @starters, $starter; - $opt = $starteroptionlist[$idx] . " " . $ppn; + $opt = $starteroptionlist[$idx] . " --ntasks-per-node " . $ppn; push @starteroptions, $opt; } } @@ -180,10 +198,21 @@ sub runcmd() { + my $rc; for (1..$reps) { $output = `$cmd`; + # Check the error code of the command; if the error code is alright + # just add a 0 in front of the number to neutrally mark the success; + # If the code is not correct, add a ! in front of the number to mark + # it invalid. + if($? != 0) { + $rc = "0"; + } + else { + $rc = "!"; + } if ($myresults && $rawoutput) { - print FILE $n . " " . $output . "\n"; + print FILE $n . " " . $output .
" $rc\n"; } @lines = split(/\n/, $output); foreach $line (@lines) { @@ -205,14 +234,14 @@ () if (0 == $strloc) { if (0 == $idx) { # it must be in the next location - push @csvrow,$results[1]; + push @csvrow,join $rc,$results[1]; } else { # it must be in the prior location - push @csvrow,$results[$idx-1]; + push @csvrow,join $rc,$results[$idx-1]; } } else { # take the portion of the string up to the tag - push @csvrow,substr($res, 0, $strloc); + push @csvrow,join $rc,substr($res, 0, $strloc); } } else { $strloc = index($res, "elapsed"); @@ -223,14 +252,14 @@ () if (0 == $strloc) { if (0 == $idx) { # it must be in the next location - push @csvrow,$results[1]; + push @csvrow,join $rc,$results[1]; } else { # it must be in the prior location - push @csvrow,$results[$idx-1]; + push @csvrow,join $rc,$results[$idx-1]; } } else { # take the portion of the string up to the tag - push @csvrow,substr($res, 0, $strloc); + push @csvrow,join $rc,substr($res, 0, $strloc); } } } @@ -259,24 +288,34 @@ () } foreach $starter (@starters) { + my $dvmout; print "STARTER: $starter\n"; # if we are going to use the dvm, then we - if ($starter eq "orterun") { - # need to start it - if (-e "dvm_uri") { - system("rm -f dvm_uri"); + if ($starter eq "prun") { + my $dvm = "orte-dvm --system-server"; + if ($multiplier gt 1) { + $dvm = $dvm . " --mca rtc ^hwloc --mca ras_base_multiplier " . $multiplier; } - $cmd = "orte-dvm --report-uri dvm_uri 2>&1 &"; + # need to start it + print "##DVM: Launching $dvm\n"; if ($myresults) { - print FILE "\n\n$cmd\n"; + print FILE "\n\n$dvm\n"; } if (!$SHOWME) { - system($cmd); - # wait for the rendezvous file to appear - while (!
-e "dvm_uri") { - sleep(1); + $havedvm = open($dvmout, $dvm."|") or die "##DVM: Spawn error $!\n"; + print "##DVM: pid=$havedvm\n"; + # Wait that the dvm reports that it is ready + my $waitready = <$dvmout>; + if($waitready =~ /DVM ready/i) { + print "##DVM: $waitready\n"; + } + else { + die "##DVM: error: $waitready\n"; } - $havedvm = 1; + } + } else { + if ($myresults) { + print FILE "\n\n"; } } @@ -286,6 +325,13 @@ () my $testnum = 0; foreach $test (@tests) { $option = $options[$testnum]; + if ($starter eq "aprun") { + $option =~ s/-mca\s+(\S+)\s+(\S+)/-e OMPI_MCA_$1=$2/g; + } + if ($starter eq "srun") { + $option =~ s/-mca\s+(\S+)\s+(\S+)\s*/OMPI_MCA_$1=$2,/g; + $option =~ s/\s*(OMPI_MCA\S+)/ --export=$1ALL/g; + } if (-e $test) { if ($myresults) { print FILE "#nodes,$test,$option\n"; @@ -293,12 +339,25 @@ () if (!$SHOWME) { # pre-position the executable $cmd = $starter . $starteroptions[$index] . " $test 2>&1"; - system($cmd); + my $error; + $error = `$cmd`; + if (0 != $error) { + if ($myresults) { + print FILE "Command $cmd returned error $error\n"; + $testnum = $testnum + 1; + next; + } + } } - $n = 1; + $n = $npmin; while ($n <= $num_nodes) { push @csvrow,$n; - $cmd = "time " . $starter . " " . $starteroptions[$index] . " $option $test 2>&1"; + if ($starter eq "prun" or $starter eq "mpirun" or $starter eq "aprun") { + my $np = $n * $ppn; + $cmd = "time " . $starter . " " . $starteroptions[$index] . " $option -n $np $test 2>&1"; + } else { + $cmd = "time " . $starter . " " . $starteroptions[$index] . " $option -N $n $test 2>&1"; + } print $cmd . 
"\n"; if (!$SHOWME) { runcmd(); @@ -318,15 +377,19 @@ () print "\n--------------------------------------------------\n"; } $testnum = $testnum + 1; + if ($starter eq "srun" or $starter eq "aprun") { + if ($testnum ge 3) { + last; + } + } } if ($havedvm) { if (!$SHOWME) { - $cmd = "orterun --hnp file:dvm_uri --terminate"; + $cmd = "prun --system-server-only --terminate"; system($cmd); + waitpid($havedvm, 0); } - if (-e "dvm_uri") { - system("rm -f dvm_uri"); - } + $havedvm = 0; } $index = $index + 1; } diff --git a/contrib/update-my-copyright.pl b/contrib/update-my-copyright.pl index 934758c7718..9900ff0654c 100755 --- a/contrib/update-my-copyright.pl +++ b/contrib/update-my-copyright.pl @@ -2,6 +2,7 @@ # # Copyright (c) 2010-2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2016-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # @@ -66,12 +67,13 @@ # Defaults my $my_search_name = "Cisco"; my $my_formal_name = "Cisco Systems, Inc. 
All rights reserved."; +my $my_manual_list = ""; # Protected directories my @protected = qw( - opal\\/mca\\/pmi\\/pmix.+?\\/pmix\\/ + opal\\/mca\\/pmix\\/pmix.+?\\/pmix\\/ opal\\/mca\\/hwloc\\/hwloc.+?\\/hwloc\\/ - opal\\/mca\\/libevent\\/libevent.+?\\/libevent\\/ + opal\\/mca\\/event\\/libevent.+?\\/libevent\\/ contrib\\/update-my-copyright.pl ); @@ -80,6 +82,8 @@ if (defined($ENV{OMPI_COPYRIGHT_SEARCH_NAME})); $my_formal_name = $ENV{OMPI_COPYRIGHT_FORMAL_NAME} if (defined($ENV{OMPI_COPYRIGHT_FORMAL_NAME})); +$my_manual_list = $ENV{OMPI_COPYRIGHT_MANUAL_LIST} + if (defined($ENV{OMPI_COPYRIGHT_MANUAL_LIST})); GetOptions( "help" => \$HELP, @@ -87,6 +91,7 @@ "check-only" => \$CHECK_ONLY, "search-name=s" => \$my_search_name, "formal-name=s" => \$my_formal_name, + "manual-list=s" => \$my_manual_list, ) or die "unable to parse options, stopped"; if ($HELP) { @@ -98,6 +103,7 @@ --check-only exit(111) if there are files with copyrights to edit --search-name=NAME Set search name to NAME --formal-same=NAME Set formal name to NAME +--manual-list=FNAME Use specified file as list of files to mod copyright EOT exit(0); } @@ -143,6 +149,8 @@ sub quiet_print { if (-d "$top/.hg"); $vcs = "svn" if (-d "$top/.svn"); +$vcs = "manual" + if ("$my_manual_list" ne ""); my @files = find_modified_files($vcs); @@ -363,6 +371,9 @@ sub find_modified_files { } close(CMD); } + elsif ($vcs eq "manual") { + @files = split(/\n/, `cat $my_manual_list`); + } else { die "unknown VCS '$vcs', stopped"; } diff --git a/examples/Makefile b/examples/Makefile index 0a7e57c408f..86ce69b2b5c 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -13,6 +13,8 @@ # Copyright (c) 2011-2016 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2012 Los Alamos National Security, Inc. All rights reserved. # Copyright (c) 2013 Mellanox Technologies, Inc. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -20,16 +22,14 @@ # $HEADER$ # -# Use the Open MPI-provided wrapper compilers. Note that gmake -# requires the CXX macro, while other versions of make (such as Sun's -# make) require the CCC macro. +# Use the Open MPI-provided wrapper compilers. -CC = mpicc -CXX = mpic++ -CCC = mpic++ -FC = mpifort -JAVAC = mpijavac +MPICC = mpicc +MPICXX = mpic++ +MPIFC = mpifort +MPIJAVAC = mpijavac SHMEMCC = shmemcc +SHMEMCXX = shmemc++ SHMEMFC = shmemfort # Using -g is not necessary, but it is helpful for example programs, @@ -37,10 +37,10 @@ SHMEMFC = shmemfort # gmake requires the CXXFLAGS macro, while other versions of make # (such as Sun's make) require the CCFLAGS macro. -CFLAGS = -g -CXXFLAGS = -g -CCFLAGS = -g -FCFLAGS = -g +CFLAGS += -g +CXXFLAGS += -g +CCFLAGS += -g +FCFLAGS += -g # Example programs to build @@ -51,6 +51,7 @@ EXAMPLES = \ hello_usempi \ hello_usempif08 \ hello_oshmem \ + hello_oshmemcxx \ hello_oshmemfh \ Hello.class \ ring_c \ @@ -105,6 +106,7 @@ mpi: oshmem: @ if oshmem_info --parsable | grep oshmem:bindings:c:yes >/dev/null; then \ $(MAKE) hello_oshmem; \ + $(MAKE) hello_oshmemcxx; \ $(MAKE) ring_oshmem; \ $(MAKE) oshmem_shmalloc; \ $(MAKE) oshmem_circular_shift; \ @@ -124,47 +126,61 @@ clean: # Don't rely on default rules for the Fortran and Java examples +hello_c: hello_c.c + $(MPICC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ +ring_c: ring_c.c + $(MPICC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ +connectivity_c: connectivity_c.c + $(MPICC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ + +hello_cxx: hello_cxx.cc + $(MPICXX) $(CXXFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ +ring_cxx: ring_cxx.cc + $(MPICXX) $(CXXFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ + hello_mpifh: hello_mpifh.f - $(FC) $(FCFLAGS) $? -o $@ + $(MPIFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ ring_mpifh: ring_mpifh.f - $(FC) $(FCFLAGS) $? -o $@ + $(MPIFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ hello_usempi: hello_usempi.f90 - $(FC) $(FCFLAGS) $? 
-o $@ + $(MPIFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ ring_usempi: ring_usempi.f90 - $(FC) $(FCFLAGS) $? -o $@ + $(MPIFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ hello_usempif08: hello_usempif08.f90 - $(FC) $(FCFLAGS) $? -o $@ + $(MPIFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ ring_usempif08: ring_usempif08.f90 - $(FC) $(FCFLAGS) $? -o $@ + $(MPIFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ Hello.class: Hello.java - $(JAVAC) Hello.java + $(MPIJAVAC) Hello.java Ring.class: Ring.java - $(JAVAC) Ring.java + $(MPIJAVAC) Ring.java hello_oshmem: hello_oshmem_c.c - $(SHMEMCC) $(CFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ +hello_oshmemcxx: hello_oshmem_cxx.cc + $(SHMEMCXX) $(CXXFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ hello_oshmemfh: hello_oshmemfh.f90 - $(SHMEMFC) $(FCFLAGS) $? -o $@ + $(SHMEMFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ ring_oshmem: ring_oshmem_c.c - $(SHMEMCC) $(CFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ ring_oshmemfh: ring_oshmemfh.f90 - $(SHMEMFC) $(FCFLAGS) $? -o $@ + $(SHMEMFC) $(FCFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ oshmem_shmalloc: oshmem_shmalloc.c - $(SHMEMCC) $(CCFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ oshmem_circular_shift: oshmem_circular_shift.c - $(SHMEMCC) $(CFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ oshmem_max_reduction: oshmem_max_reduction.c - $(SHMEMCC) $(CFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ oshmem_strided_puts: oshmem_strided_puts.c - $(SHMEMCC) $(CFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ oshmem_symmetric_data: oshmem_symmetric_data.c - $(SHMEMCC) $(CFLAGS) $? -o $@ + $(SHMEMCC) $(CFLAGS) $(LDFLAGS) $? $(LDLIBS) -o $@ diff --git a/examples/Makefile.include b/examples/Makefile.include index 7707521c943..ebf0eb9d370 100644 --- a/examples/Makefile.include +++ b/examples/Makefile.include @@ -14,6 +14,8 @@ # Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. 
# Copyright (c) 2012 Los Alamos National Security, Inc. All rights reserved. # Copyright (c) 2013 Mellanox Technologies, Inc. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -39,6 +41,7 @@ EXTRA_DIST += \ examples/hello_usempi.f90 \ examples/hello_usempif08.f90 \ examples/hello_oshmem_c.c \ + examples/hello_oshmem_cxx.cc \ examples/hello_oshmemfh.f90 \ examples/ring_c.c \ examples/ring_cxx.cc \ diff --git a/examples/hello_oshmem_cxx.cc b/examples/hello_oshmem_cxx.cc new file mode 100644 index 00000000000..99d8565c8a5 --- /dev/null +++ b/examples/hello_oshmem_cxx.cc @@ -0,0 +1,39 @@ +/* + * Copyright (c) 2014 Mellanox Technologies, Inc. + * All rights reserved. + * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include <iostream> +#include "shmem.h" + +#if !defined(OSHMEM_SPEC_VERSION) || OSHMEM_SPEC_VERSION < 10200 +#error This application uses API 1.2 and up +#endif + +int main(int argc, char* argv[]) +{ + int proc, nproc; + char name[SHMEM_MAX_NAME_LEN]; + int major, minor; + + shmem_init(); + nproc = shmem_n_pes(); + proc = shmem_my_pe(); + shmem_info_get_name(name); + shmem_info_get_version(&major, &minor); + + std::cout << "Hello, world, I am " << proc << " of " << nproc << ": " << name + << " (version: " << major << "." << minor << ")" << std::endl; + + shmem_finalize(); + + return 0; +} diff --git a/ompi/Makefile.am b/ompi/Makefile.am index f8e9b802f15..3bbf52f5d2b 100644 --- a/ompi/Makefile.am +++ b/ompi/Makefile.am @@ -9,13 +9,13 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2008-2014 Cisco Systems, Inc.
All rights reserved. +# Copyright (c) 2008-2017 Cisco Systems, Inc. All rights reserved # Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. # Copyright (c) 2010-2011 Sandia National Laboratories. All rights reserved. # Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2015 Intel, Inc. All rights reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. # $COPYRIGHT$ @@ -90,8 +90,9 @@ SUBDIRS = \ mpi/fortran/mpif-h \ $(OMPI_MPIEXT_USEMPI_DIR) \ $(OMPI_FORTRAN_USEMPI_DIR) \ + mpi/fortran/use-mpi-f08/mod \ $(OMPI_MPIEXT_USEMPIF08_DIRS) \ - $(OMPI_FORTRAN_USEMPIF08_DIR) \ + mpi/fortran/use-mpi-f08 \ mpi/fortran/mpiext \ $(MCA_ompi_FRAMEWORK_COMPONENT_DSO_SUBDIRS) \ $(OMPI_CONTRIB_SUBDIRS) @@ -119,7 +120,7 @@ DIST_SUBDIRS = \ mpi/fortran/use-mpi-tkr \ mpi/fortran/use-mpi-ignore-tkr \ mpi/fortran/use-mpi-f08 \ - mpi/fortran/use-mpi-f08-desc \ + mpi/fortran/use-mpi-f08/mod \ mpi/fortran/mpiext \ mpi/java \ $(OMPI_MPIEXT_ALL_SUBDIRS) \ @@ -178,6 +179,7 @@ include errhandler/Makefile.am include file/Makefile.am include group/Makefile.am include info/Makefile.am +include interlib/Makefile.am include message/Makefile.am include op/Makefile.am include peruse/Makefile.am @@ -192,6 +194,7 @@ include mpiext/Makefile.am include patterns/net/Makefile.am include patterns/comm/Makefile.am include mca/Makefile.am +include util/Makefile.am # Ensure that the man page directory exists before we try to make man # page files (because ompi/mpi/man/man3 has no config.status-generated diff --git a/ompi/attribute/attribute.c b/ompi/attribute/attribute.c index 8a0a0e8d5b3..b3f5eda4568 100644 --- a/ompi/attribute/attribute.c +++ b/ompi/attribute/attribute.c @@ -12,6 +12,8 @@ * 
Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -30,12 +32,13 @@ * There are several places in the standard that should be read about * attributes: * - * MPI-1: Section 5.7 (pp 167-173) - * MPI-1: Section 7.1 (pp 191-192) predefined attributes in MPI-1 - * MPI-2: Section 4.12.7 (pp 57-59) interlanguage attribute - * clarifications - * MPI-2: Section 6.2.2 (pp 112) window predefined attributes - * MPI-2: Section 8.8 (pp 198-208) new attribute caching functions + * MPI-1: Section 5.7 (pp 167-173) + * MPI-1: Section 7.1 (pp 191-192) predefined attributes in MPI-1 + * MPI-2: Section 4.12.7 (pp 57-59) interlanguage attribute + * clarifications + * MPI-2: Section 6.2.2 (pp 112) window predefined attributes + * MPI-2: Section 8.8 (pp 198-208) new attribute caching functions + * MPI-3.1: Section 11.2.6 (pp 414-415) window attributes * * After reading all of this, note the following: * @@ -50,6 +53,8 @@ * means writing a pointer to an instance of something; changing the * value of that instance will make it visible to anyone who reads * that attribute value). + * - C also internally stores some int attributes of an MPI_Win by value, + * and these attributes are read-only (i.e., set once and for all). * - Fortran functions store values by value (i.e., writing an * attribute value means that anyone who reads that attribute value * will not be able to affect the value read by anyone else). @@ -60,10 +65,10 @@ * - MPI-2 4.12.7:Example 4.13 (p58) is wrong. The C->Fortran example * should have the Fortran "val" variable equal to &I.
* - * By the first two of these, there are 9 possible use cases -- 3 + * By the first two of these, there are 12 possible use cases -- 4 * possibilities for writing an attribute value, each of which has 3 * possibilities for reading that value back. The following lists - * each of the 9 cases, and what happens in each. + * each of the 12 cases, and what happens in each. * * Cases where C writes an attribute value: * ---------------------------------------- @@ -109,6 +114,38 @@ * CALL MPI_COMM_GET_ATTR(..., ret, ierr) * --> ret will equal &bar * + * Cases where C writes an int attribute: + * -------------------------------------- + * + * In all of these cases, an int is written by C. + * This is done internally when writing the attributes of an MPI_Win. + * + * Example: int foo = 7; + * ompi_attr_set_int(..., foo, ...) + * + * 4. C reads the attribute value. The value returned is a pointer + * that points to an int that has a value + * of 7. + * + * Example: int *ret; + * MPI_Attr_get(..., &ret); + * -> *ret will equal 7. + * + * 5. Fortran MPI-1 reads the attribute value. This is the unity + * case; the same value is returned. + * + * Example: INTEGER ret + * CALL MPI_ATTR_GET(..., ret, ierr) + * --> ret will equal 7 + * + * 6. Fortran MPI-2 reads the attribute value. The same value is + * returned, but potentially sign-extended if sizeof(INTEGER) < + * sizeof(INTEGER(KIND=MPI_ADDRESS_KIND)). + * + * Example: INTEGER(KIND=MPI_ADDRESS_KIND) ret + * CALL MPI_COMM_GET_ATTR(..., ret, ierr) + * --> ret will equal 7 + * * Cases where Fortran MPI-1 writes an attribute value: * ---------------------------------------------------- * @@ -117,7 +154,7 @@ * Example: INTEGER FOO = 7 * CALL MPI_ATTR_PUT(..., foo, ierr) * - * 4. C reads the attribute value. The value returned is a pointer + * 7. C reads the attribute value. The value returned is a pointer * that points to an INTEGER (i.e., an MPI_Fint) that has a value * of 7.
* --> NOTE: The external MPI interface does not distinguish between @@ -128,14 +165,14 @@ * MPI_Attr_get(..., &ret); * -> *ret will equal 7. * - * 5. Fortran MPI-1 reads the attribute value. This is the unity + * 8. Fortran MPI-1 reads the attribute value. This is the unity * case; the same value is returned. * * Example: INTEGER ret * CALL MPI_ATTR_GET(..., ret, ierr) * --> ret will equal 7 * - * 6. Fortran MPI-2 reads the attribute value. The same value is + * 9. Fortran MPI-2 reads the attribute value. The same value is * returned, but potentially sign-extended if sizeof(INTEGER) < * sizeof(INTEGER(KIND=MPI_ADDRESS_KIND)). * @@ -156,7 +193,7 @@ * INTEGER(KIND=MPI_ADDRESS_KIND) FOO = pow(2, 40) * CALL MPI_COMM_PUT_ATTR(..., foo, ierr) * - * 7. C reads the attribute value. The value returned is a pointer + * 10. C reads the attribute value. The value returned is a pointer * that points to an INTEGER(KIND=MPI_ADDRESS_KIND) (i.e., a void*) * that has a value of 12. * --> NOTE: The external MPI interface does not distinguish between @@ -170,7 +207,7 @@ * MPI_Attr_get(..., &ret); * -> *ret will equal 2^40 * - * 8. Fortran MPI-1 reads the attribute value. The same value is + * 11. Fortran MPI-1 reads the attribute value. The same value is * returned, but potentially truncated if sizeof(INTEGER) < * sizeof(INTEGER(KIND=MPI_ADDRESS_KIND)). * @@ -181,7 +218,7 @@ * CALL MPI_ATTR_GET(..., ret, ierr) * --> ret will equal 0 * - * 9. Fortran MPI-2 reads the attribute value. This is the unity + * 12. Fortran MPI-2 reads the attribute value. This is the unity * case; the same value is returned. * * Example A: INTEGER(KIND=MPI_ADDRESS_KIND) ret @@ -235,10 +272,10 @@ 1. MPI-1 Fortran-style: attribute and extra state arguments are of type (INTEGER). This is used if both the OMPI_KEYVAL_F77 and - OMPI_KEYVAL_F77_MPI1 flags are set. + OMPI_KEYVAL_F77_INT flags are set. 2. MPI-2 Fortran-style: attribute and extra state arguments are of type (INTEGER(KIND=MPI_ADDRESS_KIND)). 
This is used if the - OMPI_KEYVAL_F77 flag is set and the OMPI_KEYVAL_F77_MPI1 flag is + OMPI_KEYVAL_F77 flag is set and the OMPI_KEYVAL_F77_INT flag is *not* set. 3. C-style: attribute arguments are of type (void*). This is used if OMPI_KEYVAL_F77 is not set. @@ -252,11 +289,13 @@ do { \ if (0 != (keyval_obj->attr_flag & OMPI_KEYVAL_F77)) { \ MPI_Fint f_key = OMPI_INT_2_FINT(key); \ MPI_Fint f_err; \ + MPI_Fint attr_##type##_f; \ + attr_##type##_f = OMPI_INT_2_FINT(((ompi_##type##_t *)keyval_obj)->attr_##type##_f); \ /* MPI-1 Fortran-style */ \ - if (0 != (keyval_obj->attr_flag & OMPI_KEYVAL_F77_MPI1)) { \ - MPI_Fint attr_val = translate_to_fortran_mpi1(attribute); \ - (*((keyval_obj->delete_attr_fn).attr_mpi1_fortran_delete_fn)) \ - (&(((ompi_##type##_t *)object)->attr_##type##_f), \ + if (0 != (keyval_obj->attr_flag & OMPI_KEYVAL_F77_INT)) { \ + MPI_Fint attr_val = translate_to_fint(attribute); \ + (*((keyval_obj->delete_attr_fn).attr_fint_delete_fn)) \ + (&attr_##type##_f, \ &f_key, &attr_val, &keyval_obj->extra_state.f_integer, &f_err); \ if (MPI_SUCCESS != OMPI_FINT_2_INT(f_err)) { \ err = OMPI_FINT_2_INT(f_err); \ @@ -264,9 +303,9 @@ do { \ } \ /* MPI-2 Fortran-style */ \ else { \ - MPI_Aint attr_val = translate_to_fortran_mpi2(attribute); \ - (*((keyval_obj->delete_attr_fn).attr_mpi2_fortran_delete_fn)) \ - (&(((ompi_##type##_t *)object)->attr_##type##_f), \ + MPI_Aint attr_val = translate_to_aint(attribute); \ + (*((keyval_obj->delete_attr_fn).attr_aint_delete_fn)) \ + (&attr_##type##_f, \ &f_key, (int*)&attr_val, &keyval_obj->extra_state.f_address, &f_err); \ if (MPI_SUCCESS != OMPI_FINT_2_INT(f_err)) { \ err = OMPI_FINT_2_INT(f_err); \ @@ -295,27 +334,31 @@ do { \ MPI_Fint f_err; \ ompi_fortran_logical_t f_flag; \ /* MPI-1 Fortran-style */ \ - if (0 != (keyval_obj->attr_flag & OMPI_KEYVAL_F77_MPI1)) { \ - MPI_Fint in, out; \ - in = translate_to_fortran_mpi1(in_attr); \ - (*((keyval_obj->copy_attr_fn).attr_mpi1_fortran_copy_fn)) \ - (&(((ompi_##type##_t 
*)old_object)->attr_##type##_f), \ + if (0 != (keyval_obj->attr_flag & OMPI_KEYVAL_F77_INT)) { \ + MPI_Fint in, out; \ + MPI_Fint attr_##type##_f; \ + in = translate_to_fint(in_attr); \ + attr_##type##_f = OMPI_INT_2_FINT(((ompi_##type##_t *)old_object)->attr_##type##_f); \ + (*((keyval_obj->copy_attr_fn).attr_fint_copy_fn)) \ + (&attr_##type##_f, \ &f_key, &keyval_obj->extra_state.f_integer, \ &in, &out, &f_flag, &f_err); \ if (MPI_SUCCESS != OMPI_FINT_2_INT(f_err)) { \ err = OMPI_FINT_2_INT(f_err); \ } else { \ out_attr->av_value = (void*) 0; \ - *out_attr->av_integer_pointer = out; \ + *out_attr->av_fint_pointer = out; \ flag = OMPI_LOGICAL_2_INT(f_flag); \ } \ } \ /* MPI-2 Fortran-style */ \ else { \ MPI_Aint in, out; \ - in = translate_to_fortran_mpi2(in_attr); \ - (*((keyval_obj->copy_attr_fn).attr_mpi2_fortran_copy_fn)) \ - (&(((ompi_##type##_t *)old_object)->attr_##type##_f), \ + MPI_Fint attr_##type##_f; \ + in = translate_to_aint(in_attr); \ + attr_##type##_f = OMPI_INT_2_FINT(((ompi_##type##_t *)old_object)->attr_##type##_f); \ + (*((keyval_obj->copy_attr_fn).attr_aint_copy_fn)) \ + (&attr_##type##_f, \ &f_key, &keyval_obj->extra_state.f_address, &in, &out, \ &f_flag, &f_err); \ if (MPI_SUCCESS != OMPI_FINT_2_INT(f_err)) { \ @@ -339,17 +382,16 @@ do { \ OPAL_THREAD_LOCK(&attribute_lock); \ } while (0) - /* * Cases for attribute values */ typedef enum ompi_attribute_translate_t { OMPI_ATTRIBUTE_C, - OMPI_ATTRIBUTE_FORTRAN_MPI1, - OMPI_ATTRIBUTE_FORTRAN_MPI2 + OMPI_ATTRIBUTE_INT, + OMPI_ATTRIBUTE_FINT, + OMPI_ATTRIBUTE_AINT } ompi_attribute_translate_t; - /* * struct to hold attribute values on each MPI object */ @@ -357,8 +399,9 @@ typedef struct attribute_value_t { opal_object_t super; int av_key; void *av_value; - MPI_Aint *av_address_kind_pointer; - MPI_Fint *av_integer_pointer; + int *av_int_pointer; + MPI_Fint *av_fint_pointer; + MPI_Aint *av_aint_pointer; int av_set_from; int av_sequence; } attribute_value_t; @@ -377,8 +420,9 @@ static int 
set_value(ompi_attribute_type_t type, void *object, static int get_value(opal_hash_table_t *attr_hash, int key, attribute_value_t **attribute, int *flag); static void *translate_to_c(attribute_value_t *val); -static MPI_Fint translate_to_fortran_mpi1(attribute_value_t *val); -static MPI_Aint translate_to_fortran_mpi2(attribute_value_t *val); +static MPI_Fint translate_to_fint(attribute_value_t *val); +static MPI_Aint translate_to_aint(attribute_value_t *val); + static int compare_attr_sequence(const void *attr1, const void *attr2); @@ -408,6 +452,7 @@ static opal_hash_table_t *keyval_hash; static opal_bitmap_t *key_bitmap; static int attr_sequence; static unsigned int int_pos = 12345; +static unsigned int integer_pos = 12345; /* * MPI attributes are *not* high performance, so just use a One Big Lock @@ -423,8 +468,9 @@ static opal_mutex_t attribute_lock; static void attribute_value_construct(attribute_value_t *item) { item->av_key = MPI_KEYVAL_INVALID; - item->av_address_kind_pointer = (MPI_Aint*) &item->av_value; - item->av_integer_pointer = &(((MPI_Fint*) &item->av_value)[int_pos]); + item->av_aint_pointer = (MPI_Aint*) &item->av_value; + item->av_int_pointer = (int *)&item->av_value + int_pos; + item->av_fint_pointer = (MPI_Fint *)&item->av_value + integer_pos; item->av_set_from = 0; item->av_sequence = -1; } @@ -475,7 +521,7 @@ int ompi_attr_init(void) { int ret; void *bogus = (void*) 1; - MPI_Fint *p = (MPI_Fint*) &bogus; + int *p = (int *) &bogus; keyval_hash = OBJ_NEW(opal_hash_table_t); if (NULL == keyval_hash) { @@ -490,13 +536,20 @@ int ompi_attr_init(void) return OMPI_ERR_OUT_OF_RESOURCE; } - for (int_pos = 0; int_pos < (sizeof(void*) / sizeof(MPI_Fint)); + for (int_pos = 0; int_pos < (sizeof(void*) / sizeof(int)); ++int_pos) { if (p[int_pos] == 1) { break; } } + for (integer_pos = 0; integer_pos < (sizeof(void*) / sizeof(MPI_Fint)); + ++integer_pos) { + if (p[integer_pos] == 1) { + break; + } + } + OBJ_CONSTRUCT(&attribute_lock, opal_mutex_t); if 
(OMPI_SUCCESS != (ret = opal_hash_table_init(keyval_hash, @@ -600,6 +653,9 @@ int ompi_attr_create_keyval_fint(ompi_attribute_type_t type, ompi_attribute_fortran_ptr_t es_tmp; es_tmp.f_integer = extra_state; +#if SIZEOF_INT == OMPI_SIZEOF_FORTRAN_INTEGER + flags |= OMPI_KEYVAL_F77_INT; +#endif return ompi_attr_create_keyval_impl(type, copy_attr_fn, delete_attr_fn, key, &es_tmp, flags, bindings_extra_state); @@ -686,14 +742,45 @@ int ompi_attr_set_c(ompi_attribute_type_t type, void *object, } +/* + * Front-end function internally called by the C API functions to set an + * int attribute. + */ +int ompi_attr_set_int(ompi_attribute_type_t type, void *object, + opal_hash_table_t **attr_hash, + int key, int attribute, bool predefined) +{ + int ret; + attribute_value_t *new_attr = OBJ_NEW(attribute_value_t); + if (NULL == new_attr) { + return OMPI_ERR_OUT_OF_RESOURCE; + } + + OPAL_THREAD_LOCK(&attribute_lock); + + new_attr->av_value = (void *) 0; + *new_attr->av_int_pointer = attribute; + new_attr->av_set_from = OMPI_ATTRIBUTE_INT; + ret = set_value(type, object, attr_hash, key, new_attr, predefined); + if (OMPI_SUCCESS != ret) { + OBJ_RELEASE(new_attr); + } + + opal_atomic_wmb(); + OPAL_THREAD_UNLOCK(&attribute_lock); + + return ret; +} + + /* * Front-end function called by the Fortran MPI-1 API functions to set * an attribute. 
*/ -int ompi_attr_set_fortran_mpi1(ompi_attribute_type_t type, void *object, - opal_hash_table_t **attr_hash, - int key, MPI_Fint attribute, - bool predefined) +int ompi_attr_set_fint(ompi_attribute_type_t type, void *object, + opal_hash_table_t **attr_hash, + int key, MPI_Fint attribute, + bool predefined) { int ret; attribute_value_t *new_attr = OBJ_NEW(attribute_value_t); @@ -704,8 +791,8 @@ int ompi_attr_set_fortran_mpi1(ompi_attribute_type_t type, void *object, OPAL_THREAD_LOCK(&attribute_lock); new_attr->av_value = (void *) 0; - *new_attr->av_integer_pointer = attribute; - new_attr->av_set_from = OMPI_ATTRIBUTE_FORTRAN_MPI1; + *new_attr->av_fint_pointer = attribute; + new_attr->av_set_from = OMPI_ATTRIBUTE_FINT; ret = set_value(type, object, attr_hash, key, new_attr, predefined); if (OMPI_SUCCESS != ret) { OBJ_RELEASE(new_attr); @@ -722,10 +809,10 @@ int ompi_attr_set_fortran_mpi1(ompi_attribute_type_t type, void *object, * Front-end function called by the Fortran MPI-2 API functions to set * an attribute. */ -int ompi_attr_set_fortran_mpi2(ompi_attribute_type_t type, void *object, - opal_hash_table_t **attr_hash, - int key, MPI_Aint attribute, - bool predefined) +int ompi_attr_set_aint(ompi_attribute_type_t type, void *object, + opal_hash_table_t **attr_hash, + int key, MPI_Aint attribute, + bool predefined) { int ret; attribute_value_t *new_attr = OBJ_NEW(attribute_value_t); @@ -736,7 +823,7 @@ int ompi_attr_set_fortran_mpi2(ompi_attribute_type_t type, void *object, OPAL_THREAD_LOCK(&attribute_lock); new_attr->av_value = (void *) attribute; - new_attr->av_set_from = OMPI_ATTRIBUTE_FORTRAN_MPI2; + new_attr->av_set_from = OMPI_ATTRIBUTE_AINT; ret = set_value(type, object, attr_hash, key, new_attr, predefined); if (OMPI_SUCCESS != ret) { OBJ_RELEASE(new_attr); @@ -777,8 +864,8 @@ int ompi_attr_get_c(opal_hash_table_t *attr_hash, int key, * Front-end function called by the Fortran MPI-1 API functions to get * attributes. 
*/ -int ompi_attr_get_fortran_mpi1(opal_hash_table_t *attr_hash, int key, - MPI_Fint *attribute, int *flag) +int ompi_attr_get_fint(opal_hash_table_t *attr_hash, int key, + MPI_Fint *attribute, int *flag) { attribute_value_t *val = NULL; int ret; @@ -787,7 +874,7 @@ int ompi_attr_get_fortran_mpi1(opal_hash_table_t *attr_hash, int key, ret = get_value(attr_hash, key, &val, flag); if (MPI_SUCCESS == ret && 1 == *flag) { - *attribute = translate_to_fortran_mpi1(val); + *attribute = translate_to_fint(val); } opal_atomic_wmb(); @@ -800,8 +887,8 @@ int ompi_attr_get_fortran_mpi1(opal_hash_table_t *attr_hash, int key, * Front-end function called by the Fortran MPI-2 API functions to get * attributes. */ -int ompi_attr_get_fortran_mpi2(opal_hash_table_t *attr_hash, int key, - MPI_Aint *attribute, int *flag) +int ompi_attr_get_aint(opal_hash_table_t *attr_hash, int key, + MPI_Aint *attribute, int *flag) { attribute_value_t *val = NULL; int ret; @@ -810,7 +897,7 @@ int ompi_attr_get_fortran_mpi2(opal_hash_table_t *attr_hash, int key, ret = get_value(attr_hash, key, &val, flag); if (MPI_SUCCESS == ret && 1 == *flag) { - *attribute = translate_to_fortran_mpi2(val); + *attribute = translate_to_aint(val); } opal_atomic_wmb(); @@ -903,10 +990,10 @@ int ompi_attr_copy_all(ompi_attribute_type_t type, void *old_object, -- not .TRUE. 
*/ if (1 == flag) { if (0 != (hash_value->attr_flag & OMPI_KEYVAL_F77)) { - if (0 != (hash_value->attr_flag & OMPI_KEYVAL_F77_MPI1)) { - new_attr->av_set_from = OMPI_ATTRIBUTE_FORTRAN_MPI1; + if (0 != (hash_value->attr_flag & OMPI_KEYVAL_F77_INT)) { + new_attr->av_set_from = OMPI_ATTRIBUTE_FINT; } else { - new_attr->av_set_from = OMPI_ATTRIBUTE_FORTRAN_MPI2; + new_attr->av_set_from = OMPI_ATTRIBUTE_AINT; } } else { new_attr->av_set_from = OMPI_ATTRIBUTE_C; @@ -1234,19 +1321,21 @@ static void *translate_to_c(attribute_value_t *val) { switch (val->av_set_from) { case OMPI_ATTRIBUTE_C: - /* Case 1: written in C, read in C (unity) */ + /* Case 1: wrote a C pointer, read a C pointer + (unity) */ return val->av_value; - break; - case OMPI_ATTRIBUTE_FORTRAN_MPI1: - /* Case 4: written in Fortran MPI-1, read in C */ - return (void *) val->av_integer_pointer; - break; + case OMPI_ATTRIBUTE_INT: + /* Case 4: wrote an int, read a C pointer */ + return (void *) val->av_int_pointer; + + case OMPI_ATTRIBUTE_FINT: + /* Case 7: wrote a MPI_Fint, read a C pointer */ + return (void *) val->av_fint_pointer; - case OMPI_ATTRIBUTE_FORTRAN_MPI2: - /* Case 7: written in Fortran MPI-2, read in C */ - return (void *) val->av_address_kind_pointer; - break; + case OMPI_ATTRIBUTE_AINT: + /* Case 10: wrote a MPI_Aint, read a C pointer */ + return (void *) val->av_aint_pointer; default: /* Should never reach here */ @@ -1262,24 +1351,25 @@ static void *translate_to_c(attribute_value_t *val) * This function does not fail -- it is only invoked in "safe" * situations. 
*/ -static MPI_Fint translate_to_fortran_mpi1(attribute_value_t *val) +static MPI_Fint translate_to_fint(attribute_value_t *val) { switch (val->av_set_from) { case OMPI_ATTRIBUTE_C: - /* Case 2: written in C, read in Fortran MPI-1 */ - return *val->av_integer_pointer; - break; + /* Case 2: wrote a C pointer, read a MPI_Fint */ + return (MPI_Fint)*val->av_int_pointer; - case OMPI_ATTRIBUTE_FORTRAN_MPI1: - /* Case 5: written in Fortran MPI-1, read in Fortran MPI-1 + case OMPI_ATTRIBUTE_INT: + /* Case 5: wrote an int, read a MPI_Fint */ + return (MPI_Fint)*val->av_int_pointer; + + case OMPI_ATTRIBUTE_FINT: + /* Case 8: wrote a MPI_Fint, read a MPI_Fint (unity) */ - return *val->av_integer_pointer; - break; + return *val->av_fint_pointer; - case OMPI_ATTRIBUTE_FORTRAN_MPI2: - /* Case 8: written in Fortran MPI-2, read in Fortran MPI-1 */ - return *val->av_integer_pointer; - break; + case OMPI_ATTRIBUTE_AINT: + /* Case 11: wrote a MPI_Aint, read a MPI_Fint */ + return (MPI_Fint)*val->av_fint_pointer; default: /* Should never reach here */ @@ -1295,24 +1385,25 @@ static MPI_Fint translate_to_fortran_mpi1(attribute_value_t *val) * This function does not fail -- it is only invoked in "safe" * situations. 
*/ -static MPI_Aint translate_to_fortran_mpi2(attribute_value_t *val) +static MPI_Aint translate_to_aint(attribute_value_t *val) { switch (val->av_set_from) { case OMPI_ATTRIBUTE_C: - /* Case 3: written in C, read in Fortran MPI-2 */ + /* Case 3: wrote a C pointer, read a MPI_Aint */ return (MPI_Aint) val->av_value; - break; - case OMPI_ATTRIBUTE_FORTRAN_MPI1: - /* Case 6: written in Fortran MPI-1, read in Fortran MPI-2 */ - return (MPI_Aint) *val->av_integer_pointer; - break; + case OMPI_ATTRIBUTE_INT: + /* Case 6: wrote an int, read a MPI_Aint */ + return (MPI_Aint) *val->av_int_pointer; + + case OMPI_ATTRIBUTE_FINT: + /* Case 9: wrote a MPI_Fint, read a MPI_Aint */ + return (MPI_Aint) *val->av_fint_pointer; - case OMPI_ATTRIBUTE_FORTRAN_MPI2: - /* Case 9: written in Fortran MPI-2, read in Fortran MPI-2 + case OMPI_ATTRIBUTE_AINT: + /* Case 12: wrote a MPI_Aint, read a MPI_Aint (unity) */ return (MPI_Aint) val->av_value; - break; default: /* Should never reach here */ diff --git a/ompi/attribute/attribute.h b/ompi/attribute/attribute.h index b762aa24f45..2bec4387dad 100644 --- a/ompi/attribute/attribute.h +++ b/ompi/attribute/attribute.h @@ -10,6 +10,8 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -41,7 +43,7 @@ */ #define OMPI_KEYVAL_PREDEFINED 0x0001 #define OMPI_KEYVAL_F77 0x0002 -#define OMPI_KEYVAL_F77_MPI1 0x0004 +#define OMPI_KEYVAL_F77_INT 0x0004 BEGIN_C_DECLS @@ -62,14 +64,14 @@ typedef enum ompi_attribute_type_t ompi_attribute_type_t; delete. These will only be used here and not in the front end functions. 
*/ -typedef void (ompi_mpi1_fortran_copy_attr_function)(MPI_Fint *oldobj, +typedef void (ompi_fint_copy_attr_function)(MPI_Fint *oldobj, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *attr_in, MPI_Fint *attr_out, ompi_fortran_logical_t *flag, MPI_Fint *ierr); -typedef void (ompi_mpi1_fortran_delete_attr_function)(MPI_Fint *obj, +typedef void (ompi_fint_delete_attr_function)(MPI_Fint *obj, MPI_Fint *keyval, MPI_Fint *attr_in, MPI_Fint *extra_state, @@ -79,18 +81,18 @@ typedef void (ompi_mpi1_fortran_delete_attr_function)(MPI_Fint *obj, delete. These will only be used here and not in the front end functions. */ -typedef void (ompi_mpi2_fortran_copy_attr_function)(MPI_Fint *oldobj, - MPI_Fint *keyval, - void *extra_state, - void *attr_in, - void *attr_out, - ompi_fortran_logical_t *flag, - MPI_Fint *ierr); -typedef void (ompi_mpi2_fortran_delete_attr_function)(MPI_Fint *obj, - MPI_Fint *keyval, - void *attr_in, - void *extra_state, - MPI_Fint *ierr); +typedef void (ompi_aint_copy_attr_function)(MPI_Fint *oldobj, + MPI_Fint *keyval, + void *extra_state, + void *attr_in, + void *attr_out, + ompi_fortran_logical_t *flag, + MPI_Fint *ierr); +typedef void (ompi_aint_delete_attr_function)(MPI_Fint *obj, + MPI_Fint *keyval, + void *attr_in, + void *extra_state, + MPI_Fint *ierr); /* * Internally the copy function for all kinds of MPI objects has one more * argument, the pointer to the new object. 
Therefore, we can do on the @@ -124,13 +126,13 @@ union ompi_attribute_fn_ptr_union_t { /* For Fortran old MPI-1 callback functions */ - ompi_mpi1_fortran_delete_attr_function *attr_mpi1_fortran_delete_fn; - ompi_mpi1_fortran_copy_attr_function *attr_mpi1_fortran_copy_fn; + ompi_fint_delete_attr_function *attr_fint_delete_fn; + ompi_fint_copy_attr_function *attr_fint_copy_fn; /* For Fortran new MPI-2 callback functions */ - ompi_mpi2_fortran_delete_attr_function *attr_mpi2_fortran_delete_fn; - ompi_mpi2_fortran_copy_attr_function *attr_mpi2_fortran_copy_fn; + ompi_aint_delete_attr_function *attr_aint_delete_fn; + ompi_aint_copy_attr_function *attr_aint_copy_fn; }; typedef union ompi_attribute_fn_ptr_union_t ompi_attribute_fn_ptr_union_t; @@ -297,8 +299,8 @@ int ompi_attr_free_keyval(ompi_attribute_type_t type, int *key, * If (*attr_hash) == NULL, a new hash will be created and * initialized. * - * All three of these functions (ompi_attr_set_c(), - * ompi_attr_set_fortran_mpi1(), and ompi_attr_set_fortran_mpi2()) + * All four of these functions (ompi_attr_set_c(), ompi_attr_set_int(), + * ompi_attr_set_fint(), and ompi_attr_set_aint()) * could have been combined into one function that took some kind of * (void*) and an enum to indicate which way to translate the final * representation, but that just seemed to make an already complicated @@ -312,6 +314,35 @@ int ompi_attr_set_c(ompi_attribute_type_t type, void *object, opal_hash_table_t **attr_hash, int key, void *attribute, bool predefined); +/** + * Set an int predefined attribute in a form valid for C. 
+ * + * @param type Type of attribute (COMM/WIN/DTYPE) (IN) + * @param object The actual Comm/Win/Datatype object (IN) + * @param attr_hash The attribute hash table hanging on the object(IN/OUT) + * @param key Key val for the attribute (IN) + * @param attribute The actual attribute value (IN) + * @param predefined Whether the key is predefined or not 0/1 (IN) + * @return OMPI error code + * + * If (*attr_hash) == NULL, a new hash will be created and + * initialized. + * + * All four of these functions (ompi_attr_set_c(), ompi_attr_set_int(), + * ompi_attr_set_fint(), and ompi_attr_set_aint()) + * could have been combined into one function that took some kind of + * (void*) and an enum to indicate which way to translate the final + * representation, but that just seemed to make an already complicated + * situation more complicated through yet another layer of + * indirection. + * + * So yes, this is more code, but it's clearer and less error-prone + * (read: better) this way. + */ +int ompi_attr_set_int(ompi_attribute_type_t type, void *object, + opal_hash_table_t **attr_hash, + int key, int attribute, bool predefined); + /** * Set an attribute on the comm/win/datatype in a form valid for * Fortran MPI-1. @@ -327,8 +358,8 @@ int ompi_attr_set_c(ompi_attribute_type_t type, void *object, * If (*attr_hash) == NULL, a new hash will be created and * initialized. 
* - * All three of these functions (ompi_attr_set_c(), - * ompi_attr_set_fortran_mpi1(), and ompi_attr_set_fortran_mpi2()) + * All four of these functions (ompi_attr_set_c(), ompi_attr_set_int(), + * ompi_attr_set_fint(), and ompi_attr_set_aint()) * could have been combined into one function that took some kind of * (void*) and an enum to indicate which way to translate the final * representation, but that just seemed to make an already complicated @@ -338,10 +369,10 @@ int ompi_attr_set_c(ompi_attribute_type_t type, void *object, * So yes, this is more code, but it's clearer and less error-prone * (read: better) this way. */ -OMPI_DECLSPEC int ompi_attr_set_fortran_mpi1(ompi_attribute_type_t type, void *object, - opal_hash_table_t **attr_hash, - int key, MPI_Fint attribute, - bool predefined); +OMPI_DECLSPEC int ompi_attr_set_fint(ompi_attribute_type_t type, void *object, + opal_hash_table_t **attr_hash, + int key, MPI_Fint attribute, + bool predefined); /** * Set an attribute on the comm/win/datatype in a form valid for @@ -358,8 +389,8 @@ OMPI_DECLSPEC int ompi_attr_set_fortran_mpi1(ompi_attribute_type_t type, void *o * If (*attr_hash) == NULL, a new hash will be created and * initialized. * - * All three of these functions (ompi_attr_set_c(), - * ompi_attr_set_fortran_mpi1(), and ompi_attr_set_fortran_mpi2()) + * All four of these functions (ompi_attr_set_c(), ompi_attr_set_int(), + * ompi_attr_set_fint(), and ompi_attr_set_aint()) * could have been combined into one function that took some kind of * (void*) and an enum to indicate which way to translate the final * representation, but that just seemed to make an already complicated @@ -369,10 +400,10 @@ OMPI_DECLSPEC int ompi_attr_set_fortran_mpi1(ompi_attribute_type_t type, void *o * So yes, this is more code, but it's clearer and less error-prone * (read: better) this way. 
*/ -OMPI_DECLSPEC int ompi_attr_set_fortran_mpi2(ompi_attribute_type_t type, void *object, - opal_hash_table_t **attr_hash, - int key, MPI_Aint attribute, - bool predefined); +OMPI_DECLSPEC int ompi_attr_set_aint(ompi_attribute_type_t type, void *object, + opal_hash_table_t **attr_hash, + int key, MPI_Aint attribute, + bool predefined); /** * Get an attribute on the comm/win/datatype in a form valid for C. @@ -385,7 +416,7 @@ OMPI_DECLSPEC int ompi_attr_set_fortran_mpi2(ompi_attribute_type_t type, void *o * @return OMPI error code * * All three of these functions (ompi_attr_get_c(), - * ompi_attr_get_fortran_mpi1(), and ompi_attr_get_fortran_mpi2()) + * ompi_attr_get_fint(), and ompi_attr_get_aint()) * could have been combined into one function that took some kind of * (void*) and an enum to indicate which way to translate the final * representation, but that just seemed to make an already complicated @@ -412,7 +443,7 @@ int ompi_attr_get_c(opal_hash_table_t *attr_hash, int key, * @return OMPI error code * * All three of these functions (ompi_attr_get_c(), - * ompi_attr_get_fortran_mpi1(), and ompi_attr_get_fortran_mpi2()) + * ompi_attr_get_fint(), and ompi_attr_get_aint()) * could have been combined into one function that took some kind of * (void*) and an enum to indicate which way to translate the final * representation, but that just seemed to make an already complicated @@ -423,8 +454,8 @@ int ompi_attr_get_c(opal_hash_table_t *attr_hash, int key, * (read: better) this way. 
*/ - OMPI_DECLSPEC int ompi_attr_get_fortran_mpi1(opal_hash_table_t *attr_hash, int key, - MPI_Fint *attribute, int *flag); + OMPI_DECLSPEC int ompi_attr_get_fint(opal_hash_table_t *attr_hash, int key, + MPI_Fint *attribute, int *flag); /** @@ -439,7 +470,7 @@ int ompi_attr_get_c(opal_hash_table_t *attr_hash, int key, * @return OMPI error code * * All three of these functions (ompi_attr_get_c(), - * ompi_attr_get_fortran_mpi1(), and ompi_attr_get_fortran_mpi2()) + * ompi_attr_get_fint(), and ompi_attr_get_aint()) * could have been combined into one function that took some kind of * (void*) and an enum to indicate which way to translate the final * representation, but that just seemed to make an already complicated @@ -450,8 +481,8 @@ int ompi_attr_get_c(opal_hash_table_t *attr_hash, int key, * (read: better) this way. */ -OMPI_DECLSPEC int ompi_attr_get_fortran_mpi2(opal_hash_table_t *attr_hash, int key, - MPI_Aint *attribute, int *flag); +OMPI_DECLSPEC int ompi_attr_get_aint(opal_hash_table_t *attr_hash, int key, + MPI_Aint *attribute, int *flag); /** diff --git a/ompi/attribute/attribute_predefined.c b/ompi/attribute/attribute_predefined.c index e9cdc1273e7..3213bbacdfc 100644 --- a/ompi/attribute/attribute_predefined.c +++ b/ompi/attribute/attribute_predefined.c @@ -11,6 +11,8 @@ * All rights reserved. * Copyright (c) 2006 University of Houston. All rights reserved. * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -270,8 +272,8 @@ static int free_win(int keyval) static int set_f(int keyval, MPI_Fint value) { - return ompi_attr_set_fortran_mpi1(COMM_ATTR, MPI_COMM_WORLD, - &MPI_COMM_WORLD->c_keyhash, - keyval, value, - true); + return ompi_attr_set_fint(COMM_ATTR, MPI_COMM_WORLD, + &MPI_COMM_WORLD->c_keyhash, + keyval, value, + true); } diff --git a/ompi/communicator/Makefile.am b/ompi/communicator/Makefile.am index e7f6dc731ee..6f57a3787f9 100644 --- a/ompi/communicator/Makefile.am +++ b/ompi/communicator/Makefile.am @@ -10,7 +10,7 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2013 Los Alamos National Security, LLC. All rights +# Copyright (c) 2013-2017 Los Alamos National Security, LLC. All rights # reserved. # Copyright (c) 2014 Research Organization for Information Science # and Technology (RIST). All rights reserved. @@ -26,13 +26,11 @@ headers += \ communicator/communicator.h \ - communicator/comm_request.h \ - communicator/comm_helpers.h + communicator/comm_request.h lib@OMPI_LIBMPI_NAME@_la_SOURCES += \ communicator/comm_init.c \ communicator/comm.c \ communicator/comm_cid.c \ - communicator/comm_request.c \ - communicator/comm_helpers.c + communicator/comm_request.c diff --git a/ompi/communicator/comm.c b/ompi/communicator/comm.c index a14785cd2ca..228abae7ab7 100644 --- a/ompi/communicator/comm.c +++ b/ompi/communicator/comm.c @@ -18,10 +18,11 @@ * Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2012-2016 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. * Copyright (c) 2015 Mellanox Technologies. 
All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -86,7 +87,7 @@ static int ompi_comm_copy_topo (ompi_communicator_t *oldcomm, /* idup with local group and info. the local group support is provided to support ompi_comm_set_nb */ static int ompi_comm_idup_internal (ompi_communicator_t *comm, ompi_group_t *group, ompi_group_t *remote_group, - ompi_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req); + opal_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req); /**********************************************************************/ @@ -157,6 +158,10 @@ int ompi_comm_set_nb ( ompi_communicator_t **ncomm, /* ompi_comm_allocate */ newcomm = OBJ_NEW(ompi_communicator_t); + if (NULL == newcomm) { + return OMPI_ERR_OUT_OF_RESOURCE; + } + newcomm->super.s_info = NULL; /* fill in the inscribing hyper-cube dimensions */ newcomm->c_cube_dim = opal_cube_dim(local_size); newcomm->c_id_available = MPI_UNDEFINED; @@ -352,11 +357,6 @@ int ompi_comm_create ( ompi_communicator_t *comm, ompi_group_t *group, goto exit; } - if ( NULL == newcomm ) { - rc = MPI_ERR_INTERN; - goto exit; - } - /* Determine context id. It is identical to f_2_c_handle */ rc = ompi_comm_nextcid (newcomp, comm, NULL, NULL, NULL, false, mode); if ( OMPI_SUCCESS != rc ) { @@ -578,10 +578,6 @@ int ompi_comm_split( ompi_communicator_t* comm, int color, int key, local_group, /* local group */ remote_group); /* remote group */ - if ( NULL == newcomp ) { - rc = MPI_ERR_INTERN; - goto exit; - } if ( OMPI_SUCCESS != rc ) { goto exit; } @@ -596,6 +592,14 @@ int ompi_comm_split( ompi_communicator_t* comm, int color, int key, } } + /* set the rank to MPI_UNDEFINED. This prevents this process from interfering + * in ompi_comm_nextcid() and the collective module selection in ompi_comm_activate() + * for a communicator that will be freed anyway. 
+ */ + if ( MPI_UNDEFINED == color || (inter && my_rsize==0)) { + newcomp->c_local_group->grp_my_rank = MPI_UNDEFINED; + } + /* Determine context id. It is identical to f_2_c_handle */ rc = ompi_comm_nextcid (newcomp, comm, NULL, NULL, NULL, false, mode); if ( OMPI_SUCCESS != rc ) { @@ -606,13 +610,6 @@ int ompi_comm_split( ompi_communicator_t* comm, int color, int key, snprintf(newcomp->c_name, MPI_MAX_OBJECT_NAME, "MPI COMMUNICATOR %d SPLIT FROM %d", newcomp->c_contextid, comm->c_contextid ); - /* set the rank to MPI_UNDEFINED. This prevents in comm_activate - * the collective module selection for a communicator that will - * be freed anyway. - */ - if ( MPI_UNDEFINED == color || (inter && my_rsize==0)) { - newcomp->c_local_group->grp_my_rank = MPI_UNDEFINED; - } /* Activate the communicator and init coll-component */ @@ -787,7 +784,7 @@ static int ompi_comm_split_verify (ompi_communicator_t *comm, int split_type, in } int ompi_comm_split_type (ompi_communicator_t *comm, int split_type, int key, - ompi_info_t *info, ompi_communicator_t **newcomm) + opal_info_t *info, ompi_communicator_t **newcomm) { bool need_split = false, no_reorder = false, no_undefined = false; ompi_communicator_t *newcomp = MPI_COMM_NULL; @@ -917,6 +914,12 @@ int ompi_comm_split_type (ompi_communicator_t *comm, int split_type, int key, break; } + // Copy info if there is one. 
+ newcomp->super.s_info = OBJ_NEW(opal_info_t); + if (info) { + opal_info_dup(info, &(newcomp->super.s_info)); + } + /* Activate the communicator and init coll-component */ rc = ompi_comm_activate (&newcomp, comm, NULL, NULL, NULL, false, mode); if (OPAL_UNLIKELY(OMPI_SUCCESS != rc)) { @@ -972,7 +975,7 @@ int ompi_comm_dup ( ompi_communicator_t * comm, ompi_communicator_t **newcomm ) /**********************************************************************/ /**********************************************************************/ /**********************************************************************/ -int ompi_comm_dup_with_info ( ompi_communicator_t * comm, ompi_info_t *info, ompi_communicator_t **newcomm ) +int ompi_comm_dup_with_info ( ompi_communicator_t * comm, opal_info_t *info, ompi_communicator_t **newcomm ) { ompi_communicator_t *newcomp = NULL; ompi_group_t *remote_group = NULL; @@ -996,11 +999,7 @@ int ompi_comm_dup_with_info ( ompi_communicator_t * comm, ompi_info_t *info, omp true, /* copy the topo */ comm->c_local_group, /* local group */ remote_group ); /* remote group */ - if ( NULL == newcomp ) { - rc = MPI_ERR_INTERN; - return rc; - } - if ( MPI_SUCCESS != rc) { + if ( OMPI_SUCCESS != rc) { return rc; } @@ -1014,6 +1013,12 @@ int ompi_comm_dup_with_info ( ompi_communicator_t * comm, ompi_info_t *info, omp snprintf(newcomp->c_name, MPI_MAX_OBJECT_NAME, "MPI COMMUNICATOR %d DUP FROM %d", newcomp->c_contextid, comm->c_contextid ); + // Copy info if there is one. 
+ newcomp->super.s_info = OBJ_NEW(opal_info_t); + if (info) { + opal_info_dup(info, &(newcomp->super.s_info)); + } + /* activate communicator and init coll-module */ rc = ompi_comm_activate (&newcomp, comm, NULL, NULL, NULL, false, mode); if ( OMPI_SUCCESS != rc ) { @@ -1042,14 +1047,14 @@ int ompi_comm_idup (ompi_communicator_t *comm, ompi_communicator_t **newcomm, om return ompi_comm_idup_with_info (comm, NULL, newcomm, req); } -int ompi_comm_idup_with_info (ompi_communicator_t *comm, ompi_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req) +int ompi_comm_idup_with_info (ompi_communicator_t *comm, opal_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req) { return ompi_comm_idup_internal (comm, comm->c_local_group, comm->c_remote_group, info, newcomm, req); } /* NTH: we need a way to idup with a smaller local group so this function takes a local group */ static int ompi_comm_idup_internal (ompi_communicator_t *comm, ompi_group_t *group, ompi_group_t *remote_group, - ompi_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req) + opal_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req) { ompi_comm_idup_with_info_context_t *context; ompi_comm_request_t *request; @@ -1089,11 +1094,20 @@ static int ompi_comm_idup_internal (ompi_communicator_t *comm, ompi_group_t *gro group, /* local group */ remote_group, /* remote group */ subreq); /* new subrequest */ - if (NULL == context->newcomp) { + if (OMPI_SUCCESS != rc) { ompi_comm_request_return (request); return rc; } + // Copy info if there is one. + { + ompi_communicator_t *newcomp = context->newcomp; + newcomp->super.s_info = OBJ_NEW(opal_info_t); + if (info) { + opal_info_dup(info, &(newcomp->super.s_info)); + } + } + ompi_comm_request_schedule_append (request, ompi_comm_idup_getcid, subreq, subreq[0] ? 
1 : 0); /* assign the newcomm now */ @@ -1187,11 +1201,7 @@ int ompi_comm_create_group (ompi_communicator_t *comm, ompi_group_t *group, int true, /* copy the topo */ group, /* local group */ NULL); /* remote group */ - if ( NULL == newcomp ) { - rc = MPI_ERR_INTERN; - return rc; - } - if ( MPI_SUCCESS != rc) { + if ( OMPI_SUCCESS != rc) { return rc; } @@ -1306,16 +1316,12 @@ int ompi_comm_compare(ompi_communicator_t *comm1, ompi_communicator_t *comm2, in int ompi_comm_set_name (ompi_communicator_t *comm, const char *name ) { -#ifdef USE_MUTEX_FOR_COMMS OPAL_THREAD_LOCK(&(comm->c_lock)); -#endif memset(comm->c_name, 0, MPI_MAX_OBJECT_NAME); strncpy(comm->c_name, name, MPI_MAX_OBJECT_NAME); comm->c_name[MPI_MAX_OBJECT_NAME - 1] = 0; comm->c_flags |= OMPI_COMM_NAMEISSET; -#ifdef USE_MUTEX_FOR_COMMS OPAL_THREAD_UNLOCK(&(comm->c_lock)); -#endif return OMPI_SUCCESS; } @@ -1471,6 +1477,10 @@ int ompi_comm_free( ompi_communicator_t **comm ) ompi_mpi_comm_parent = &ompi_mpi_comm_null.comm; } + if (NULL != ((*comm)->super.s_info)) { + OBJ_RELEASE((*comm)->super.s_info); + } + /* Release the communicator */ if ( OMPI_COMM_IS_DYNAMIC (*comm) ) { ompi_comm_num_dyncomm --; @@ -1768,8 +1778,8 @@ int ompi_comm_determine_first ( ompi_communicator_t *intercomm, int high ) theirproc = ompi_group_peer_lookup(intercomm->c_remote_group,0); mask = OMPI_RTE_CMP_JOBID | OMPI_RTE_CMP_VPID; - rc = ompi_rte_compare_name_fields(mask, (const orte_process_name_t*)&(ourproc->super.proc_name), - (const orte_process_name_t*)&(theirproc->super.proc_name)); + rc = ompi_rte_compare_name_fields(mask, (const ompi_process_name_t*)&(ourproc->super.proc_name), + (const ompi_process_name_t*)&(theirproc->super.proc_name)); if ( 0 > rc ) { flag = true; } diff --git a/ompi/communicator/comm_cid.c b/ompi/communicator/comm_cid.c index fa3ac47cc00..764fe42f4e7 100644 --- a/ompi/communicator/comm_cid.c +++ b/ompi/communicator/comm_cid.c @@ -21,6 +21,7 @@ * Copyright (c) 2014-2016 Research Organization for 
Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 IBM Corporation. All rights reserved. + * Copyright (c) 2017 Mellanox Technologies. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -303,6 +304,7 @@ static int ompi_comm_allreduce_getnextcid (ompi_comm_request_t *request) ompi_request_t *subreq; bool flag; int ret; + int participate = (context->newcomm->c_local_group->grp_my_rank != MPI_UNDEFINED); if (OPAL_THREAD_TRYLOCK(&ompi_cid_lock)) { return ompi_comm_request_schedule_append (request, ompi_comm_allreduce_getnextcid, NULL, 0); @@ -318,39 +320,47 @@ static int ompi_comm_allreduce_getnextcid (ompi_comm_request_t *request) /** * This is the real algorithm described in the doc */ - flag = false; - context->nextlocal_cid = mca_pml.pml_max_contextid; - for (unsigned int i = context->start ; i < mca_pml.pml_max_contextid ; ++i) { - flag = opal_pointer_array_test_and_set_item (&ompi_mpi_communicators, i, - context->comm); - if (true == flag) { - context->nextlocal_cid = i; - break; + if( participate ){ + flag = false; + context->nextlocal_cid = mca_pml.pml_max_contextid; + for (unsigned int i = context->start ; i < mca_pml.pml_max_contextid ; ++i) { + flag = opal_pointer_array_test_and_set_item (&ompi_mpi_communicators, i, + context->comm); + if (true == flag) { + context->nextlocal_cid = i; + break; + } } + } else { + context->nextlocal_cid = 0; } ret = context->allreduce_fn (&context->nextlocal_cid, &context->nextcid, 1, MPI_MAX, context, &subreq); + /* there was a failure during non-blocking collective + * all we can do is abort + */ if (OMPI_SUCCESS != ret) { - ompi_comm_cid_lowest_id = INT64_MAX; - OPAL_THREAD_UNLOCK(&ompi_cid_lock); - return ret; + goto err_exit; } - if ((unsigned int) context->nextlocal_cid == mca_pml.pml_max_contextid) { - /* at least one peer ran out of CIDs */ - if (flag) { - opal_pointer_array_test_and_set_item(&ompi_mpi_communicators, context->nextlocal_cid, NULL); - } - - 
ompi_comm_cid_lowest_id = INT64_MAX; - OPAL_THREAD_UNLOCK(&ompi_cid_lock); - return OMPI_ERR_OUT_OF_RESOURCE; + if ( ((unsigned int) context->nextlocal_cid == mca_pml.pml_max_contextid) ) { + /* Our local CID space is out, others already aware (allreduce above) */ + ret = OMPI_ERR_OUT_OF_RESOURCE; + goto err_exit; } OPAL_THREAD_UNLOCK(&ompi_cid_lock); /* next we want to verify that the resulting commid is ok */ return ompi_comm_request_schedule_append (request, ompi_comm_checkcid, &subreq, 1); +err_exit: + if (participate && flag) { + opal_pointer_array_test_and_set_item(&ompi_mpi_communicators, context->nextlocal_cid, NULL); + } + ompi_comm_cid_lowest_id = INT64_MAX; + OPAL_THREAD_UNLOCK(&ompi_cid_lock); + return ret; + } static int ompi_comm_checkcid (ompi_comm_request_t *request) @@ -358,18 +368,22 @@ static int ompi_comm_checkcid (ompi_comm_request_t *request) ompi_comm_cid_context_t *context = (ompi_comm_cid_context_t *) request->context; ompi_request_t *subreq; int ret; + int participate = (context->newcomm->c_local_group->grp_my_rank != MPI_UNDEFINED); if (OPAL_THREAD_TRYLOCK(&ompi_cid_lock)) { return ompi_comm_request_schedule_append (request, ompi_comm_checkcid, NULL, 0); } - context->flag = (context->nextcid == context->nextlocal_cid); - - if (!context->flag) { - opal_pointer_array_set_item(&ompi_mpi_communicators, context->nextlocal_cid, NULL); + if( !participate ){ + context->flag = 1; + } else { + context->flag = (context->nextcid == context->nextlocal_cid); + if ( participate && !context->flag) { + opal_pointer_array_set_item(&ompi_mpi_communicators, context->nextlocal_cid, NULL); - context->flag = opal_pointer_array_test_and_set_item (&ompi_mpi_communicators, - context->nextcid, context->comm); + context->flag = opal_pointer_array_test_and_set_item (&ompi_mpi_communicators, + context->nextcid, context->comm); + } } ++context->iter; @@ -377,22 +391,45 @@ static int ompi_comm_checkcid (ompi_comm_request_t *request) ret = context->allreduce_fn 
(&context->flag, &context->rflag, 1, MPI_MIN, context, &subreq); if (OMPI_SUCCESS == ret) { ompi_comm_request_schedule_append (request, ompi_comm_nextcid_check_flag, &subreq, 1); + } else { + if (participate && context->flag ) { + opal_pointer_array_test_and_set_item(&ompi_mpi_communicators, context->nextlocal_cid, NULL); + } + ompi_comm_cid_lowest_id = INT64_MAX; } OPAL_THREAD_UNLOCK(&ompi_cid_lock); - return ret; } static int ompi_comm_nextcid_check_flag (ompi_comm_request_t *request) { ompi_comm_cid_context_t *context = (ompi_comm_cid_context_t *) request->context; + int participate = (context->newcomm->c_local_group->grp_my_rank != MPI_UNDEFINED); if (OPAL_THREAD_TRYLOCK(&ompi_cid_lock)) { return ompi_comm_request_schedule_append (request, ompi_comm_nextcid_check_flag, NULL, 0); } if (1 == context->rflag) { + if( !participate ) { + /* we need to provide something sane here + * but we cannot use `nextcid` as we may have it + * in-use, go ahead with next locally-available CID + */ + context->nextlocal_cid = mca_pml.pml_max_contextid; + for (unsigned int i = context->start ; i < mca_pml.pml_max_contextid ; ++i) { + bool flag; + flag = opal_pointer_array_test_and_set_item (&ompi_mpi_communicators, i, + context->comm); + if (true == flag) { + context->nextlocal_cid = i; + break; + } + } + context->nextcid = context->nextlocal_cid; + } + /* set the according values to the newcomm */ context->newcomm->c_contextid = context->nextcid; opal_pointer_array_set_item (&ompi_mpi_communicators, context->nextcid, context->newcomm); @@ -405,7 +442,7 @@ static int ompi_comm_nextcid_check_flag (ompi_comm_request_t *request) return OMPI_SUCCESS; } - if (1 == context->flag) { + if (participate && (1 == context->flag)) { /* we could use this cid, but other don't agree */ opal_pointer_array_set_item (&ompi_mpi_communicators, context->nextcid, NULL); context->start = context->nextcid + 1; /* that's where we can start the next round */ @@ -469,7 +506,7 @@ int ompi_comm_activate_nb 
(ompi_communicator_t **newcomm, ompi_communicator_t *c if (MPI_UNDEFINED != (*newcomm)->c_local_group->grp_my_rank) { /* Initialize the PML stuff in the newcomm */ if ( OMPI_SUCCESS != (ret = MCA_PML_CALL(add_comm(*newcomm))) ) { - OBJ_RELEASE(newcomm); + OBJ_RELEASE(*newcomm); OBJ_RELEASE(context); *newcomm = MPI_COMM_NULL; return ret; diff --git a/ompi/communicator/comm_helpers.c b/ompi/communicator/comm_helpers.c deleted file mode 100644 index 584e80ee983..00000000000 --- a/ompi/communicator/comm_helpers.c +++ /dev/null @@ -1,92 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2006 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2006 The Technical University of Chemnitz. All - * rights reserved. - * Copyright (c) 2014 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. 
- * - * Author(s): Torsten Hoefler - * - */ - -#include "comm_helpers.h" - -int ompi_comm_neighbors_count(MPI_Comm comm, int *indegree, int *outdegree, int *weighted) { - int res; - - if (OMPI_COMM_IS_CART(comm)) { - int ndims; - res = MPI_Cartdim_get(comm, &ndims) ; - if (MPI_SUCCESS != res) { - return res; - } - /* outdegree is always 2*ndims because we need to iterate over empty buffers for MPI_PROC_NULL */ - *outdegree = *indegree = 2*ndims; - *weighted = 0; - } else if (OMPI_COMM_IS_GRAPH(comm)) { - int rank, nneighbors; - rank = ompi_comm_rank ((ompi_communicator_t *) comm); - res = MPI_Graph_neighbors_count(comm, rank, &nneighbors); - if (MPI_SUCCESS != res) { - return res; - } - *outdegree = *indegree = nneighbors; - *weighted = 0; - } else if (OMPI_COMM_IS_DIST_GRAPH(comm)) { - res = MPI_Dist_graph_neighbors_count(comm, indegree, outdegree, weighted); - } else { - return MPI_ERR_ARG; - } - - return MPI_SUCCESS; -} - -int ompi_comm_neighbors(MPI_Comm comm, int maxindegree, int sources[], int sourceweights[], int maxoutdegree, int destinations[], int destweights[]) { - int res; - int index = 0; - - int indeg, outdeg, wgtd; - res = ompi_comm_neighbors_count(comm, &indeg, &outdeg, &wgtd); - if (MPI_SUCCESS != res) { - return res; - } - if(indeg > maxindegree && outdeg > maxoutdegree) return MPI_ERR_TRUNCATE; /* we want to return *all* neighbors */ - - if (OMPI_COMM_IS_CART(comm)) { - int ndims, i, rpeer, speer; - res = MPI_Cartdim_get(comm, &ndims); - if (MPI_SUCCESS != res) { - return res; - } - - for(i = 0; i - * - * $HEADER$ - */ -#ifndef __TOPO_HELPERS_H__ -#define __TOPO_HELPERS_H__ -#include "ompi_config.h" - -#include "mpi.h" - -#include "ompi/include/ompi/constants.h" -#include "ompi/communicator/communicator.h" - -#include -#include -#include -#include -#include -#include - -#ifdef __cplusplus -extern "C" { -#endif - -int ompi_comm_neighbors_count(MPI_Comm comm, int *indegree, int *outdegree, int *weighted); -int ompi_comm_neighbors(MPI_Comm comm, int 
maxindegree, int sources[], int sourceweights[], int maxoutdegree, int destinations[], int destweights[]); - -#ifdef __cplusplus -} -#endif - -#endif diff --git a/ompi/communicator/comm_init.c b/ompi/communicator/comm_init.c index 914f58a7119..75aac4d49e3 100644 --- a/ompi/communicator/comm_init.c +++ b/ompi/communicator/comm_init.c @@ -10,7 +10,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2006-2010 University of Houston. All rights reserved. + * Copyright (c) 2006-2017 University of Houston. All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2012-2015 Los Alamos National Security, LLC. @@ -18,9 +18,10 @@ * Copyright (c) 2011-2013 Inria. All rights reserved. * Copyright (c) 2011-2013 Universite Bordeaux 1 * All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -33,6 +34,8 @@ #include #include "opal/util/bit_ops.h" +#include "opal/util/info_subscriber.h" +#include "opal/mca/pmix/pmix.h" #include "ompi/constants.h" #include "ompi/mca/pml/pml.h" #include "ompi/mca/coll/base/base.h" @@ -52,9 +55,9 @@ opal_pointer_array_t ompi_mpi_communicators = {{0}}; opal_pointer_array_t ompi_comm_f_to_c_table = {{0}}; -ompi_predefined_communicator_t ompi_mpi_comm_world = {{{0}}}; -ompi_predefined_communicator_t ompi_mpi_comm_self = {{{0}}}; -ompi_predefined_communicator_t ompi_mpi_comm_null = {{{0}}}; +ompi_predefined_communicator_t ompi_mpi_comm_world = {{{{0}}}}; +ompi_predefined_communicator_t ompi_mpi_comm_self = {{{{0}}}}; +ompi_predefined_communicator_t ompi_mpi_comm_null = {{{{0}}}}; ompi_communicator_t *ompi_mpi_comm_parent = NULL; ompi_predefined_communicator_t *ompi_mpi_comm_world_addr = @@ -67,7 +70,7 @@ ompi_predefined_communicator_t *ompi_mpi_comm_null_addr = static void ompi_comm_construct(ompi_communicator_t* comm); static void ompi_comm_destruct(ompi_communicator_t* comm); -OBJ_CLASS_INSTANCE(ompi_communicator_t, opal_object_t, +OBJ_CLASS_INSTANCE(ompi_communicator_t, opal_infosubscriber_t, ompi_comm_construct, ompi_comm_destruct); @@ -86,15 +89,15 @@ int ompi_comm_init(void) /* Setup communicator array */ OBJ_CONSTRUCT(&ompi_mpi_communicators, opal_pointer_array_t); - if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_mpi_communicators, 0, + if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_mpi_communicators, 16, OMPI_FORTRAN_HANDLE_MAX, 64) ) { return OMPI_ERROR; } /* Setup f to c table (we can no longer use the cid as the fortran handle) */ OBJ_CONSTRUCT(&ompi_comm_f_to_c_table, opal_pointer_array_t); - if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_comm_f_to_c_table, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_comm_f_to_c_table, 8, + OMPI_FORTRAN_HANDLE_MAX, 32) ) { return OMPI_ERROR; } @@ -148,6 +151,26 @@ int 
ompi_comm_init(void) because MPI_COMM_WORLD has some predefined attributes. */ ompi_attr_hash_init(&ompi_mpi_comm_world.comm.c_keyhash); + /* Check for the binding policy used. We are only interested in + whether mapby-node has been set right now (could be extended later) + and only on MPI_COMM_WORLD, since for all other sub-communicators + it is virtually impossible to identify their layout across nodes + in the most generic sense. This is used by OMPIO for deciding which + ranks to use for aggregators + */ + opal_process_name_t wildcard = {OMPI_PROC_MY_NAME->jobid, OPAL_VPID_WILDCARD}; + char *str=NULL; + int rc; + + OPAL_MODEX_RECV_VALUE_OPTIONAL(rc, OPAL_PMIX_MAPBY, &wildcard, &str, OPAL_STRING); + if ( 0 == rc && NULL != str) { + if ( strstr ( str, "BYNODE") ) { + OMPI_COMM_SET_MAPBY_NODE(&ompi_mpi_comm_world.comm); + } + if (NULL != str) { + free(str); + } + } /* Setup MPI_COMM_SELF */ OBJ_CONSTRUCT(&ompi_mpi_comm_self, ompi_communicator_t); assert(ompi_mpi_comm_self.comm.c_f_to_c_index == 1); @@ -219,6 +242,7 @@ ompi_communicator_t *ompi_comm_allocate ( int local_size, int remote_size ) /* create new communicator element */ new_comm = OBJ_NEW(ompi_communicator_t); + new_comm->super.s_info = NULL; new_comm->c_local_group = ompi_group_allocate ( local_size ); if ( 0 < remote_size ) { new_comm->c_remote_group = ompi_group_allocate (remote_size); @@ -363,6 +387,7 @@ static void ompi_comm_construct(ompi_communicator_t* comm) #ifdef OMPI_WANT_PERUSE comm->c_peruse_handles = NULL; #endif + OBJ_CONSTRUCT(&comm->c_lock, opal_mutex_t); } static void ompi_comm_destruct(ompi_communicator_t* comm) @@ -440,4 +465,43 @@ static void ompi_comm_destruct(ompi_communicator_t* comm) opal_pointer_array_set_item ( &ompi_comm_f_to_c_table, comm->c_f_to_c_index, NULL); } + + OBJ_DESTRUCT(&comm->c_lock); +} + +#define OMPI_COMM_SET_INFO_FN(name, flag) \ + static char *ompi_comm_set_ ## name (opal_infosubscriber_t *obj, char *key, char *value) \ + { \ + ompi_communicator_t *comm = 
(ompi_communicator_t *) obj; \ + \ + if (opal_str_to_bool(value)) { \ + comm->c_assertions |= flag; \ + } else { \ + comm->c_assertions &= ~flag; \ + } \ + \ + return OMPI_COMM_CHECK_ASSERT(comm, flag) ? "true" : "false"; \ + } + +OMPI_COMM_SET_INFO_FN(no_any_source, OMPI_COMM_ASSERT_NO_ANY_SOURCE) +OMPI_COMM_SET_INFO_FN(no_any_tag, OMPI_COMM_ASSERT_NO_ANY_TAG) +OMPI_COMM_SET_INFO_FN(allow_overtake, OMPI_COMM_ASSERT_ALLOW_OVERTAKE) +OMPI_COMM_SET_INFO_FN(exact_length, OMPI_COMM_ASSERT_EXACT_LENGTH) + +void ompi_comm_assert_subscribe (ompi_communicator_t *comm, int32_t assert_flag) +{ + switch (assert_flag) { + case OMPI_COMM_ASSERT_NO_ANY_SOURCE: + opal_infosubscribe_subscribe (&comm->super, "mpi_assert_no_any_source", "false", ompi_comm_set_no_any_source); + break; + case OMPI_COMM_ASSERT_NO_ANY_TAG: + opal_infosubscribe_subscribe (&comm->super, "mpi_assert_no_any_tag", "false", ompi_comm_set_no_any_tag); + break; + case OMPI_COMM_ASSERT_ALLOW_OVERTAKE: + opal_infosubscribe_subscribe (&comm->super, "mpi_assert_allow_overtaking", "false", ompi_comm_set_allow_overtake); + break; + case OMPI_COMM_ASSERT_EXACT_LENGTH: + opal_infosubscribe_subscribe (&comm->super, "mpi_assert_exact_length", "false", ompi_comm_set_exact_length); + break; + } } diff --git a/ompi/communicator/communicator.h b/ompi/communicator/communicator.h index 5c51ffdbe7b..4fe4721244c 100644 --- a/ompi/communicator/communicator.h +++ b/ompi/communicator/communicator.h @@ -10,8 +10,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2006-2010 University of Houston. All rights reserved. + * Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2006-2017 University of Houston. All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2011-2013 Inria. 
All rights reserved. * Copyright (c) 2011-2013 Universite Bordeaux 1 @@ -20,7 +20,7 @@ * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2017 IBM Corporation. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -33,6 +33,8 @@ #include "ompi_config.h" #include "opal/class/opal_object.h" +#include "opal/class/opal_hash_table.h" +#include "opal/util/info_subscriber.h" #include "ompi/errhandler/errhandler.h" #include "opal/threads/mutex.h" #include "ompi/communicator/comm_request.h" @@ -58,6 +60,7 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_communicator_t); #define OMPI_COMM_DIST_GRAPH 0x00000400 #define OMPI_COMM_PML_ADDED 0x00001000 #define OMPI_COMM_EXTRA_RETAIN 0x00004000 +#define OMPI_COMM_MAPBY_NODE 0x00008000 /* some utility #defines */ #define OMPI_COMM_IS_INTER(comm) ((comm)->c_flags & OMPI_COMM_INTER) @@ -74,12 +77,14 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_communicator_t); #define OMPI_COMM_IS_TOPO(comm) (OMPI_COMM_IS_CART((comm)) || \ OMPI_COMM_IS_GRAPH((comm)) || \ OMPI_COMM_IS_DIST_GRAPH((comm))) +#define OMPI_COMM_IS_MAPBY_NODE(comm) ((comm)->c_flags & OMPI_COMM_MAPBY_NODE) #define OMPI_COMM_SET_DYNAMIC(comm) ((comm)->c_flags |= OMPI_COMM_DYNAMIC) #define OMPI_COMM_SET_INVALID(comm) ((comm)->c_flags |= OMPI_COMM_INVALID) #define OMPI_COMM_SET_PML_ADDED(comm) ((comm)->c_flags |= OMPI_COMM_PML_ADDED) #define OMPI_COMM_SET_EXTRA_RETAIN(comm) ((comm)->c_flags |= OMPI_COMM_EXTRA_RETAIN) +#define OMPI_COMM_SET_MAPBY_NODE(comm) ((comm)->c_flags |= OMPI_COMM_MAPBY_NODE) /* a set of special tags: */ @@ -88,6 +93,17 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_communicator_t); #define OMPI_COMM_BARRIER_TAG -31079 #define OMPI_COMM_ALLREDUCE_TAG -31080 +#define OMPI_COMM_ASSERT_NO_ANY_TAG 0x00000001 +#define 
OMPI_COMM_ASSERT_NO_ANY_SOURCE 0x00000002 +#define OMPI_COMM_ASSERT_EXACT_LENGTH 0x00000004 +#define OMPI_COMM_ASSERT_ALLOW_OVERTAKE 0x00000008 + +#define OMPI_COMM_CHECK_ASSERT(comm, flag) !!((comm)->c_assertions & flag) +#define OMPI_COMM_CHECK_ASSERT_NO_ANY_TAG(comm) OMPI_COMM_CHECK_ASSERT(comm, OMPI_COMM_ASSERT_NO_ANY_TAG) +#define OMPI_COMM_CHECK_ASSERT_NO_ANY_SOURCE(comm) OMPI_COMM_CHECK_ASSERT(comm, OMPI_COMM_ASSERT_NO_ANY_SOURCE) +#define OMPI_COMM_CHECK_ASSERT_EXACT_LENGTH(comm) OMPI_COMM_CHECK_ASSERT(comm, OMPI_COMM_ASSERT_EXACT_LENGTH) +#define OMPI_COMM_CHECK_ASSERT_ALLOW_OVERTAKE(comm) OMPI_COMM_CHECK_ASSERT(comm, OMPI_COMM_ASSERT_ALLOW_OVERTAKE) + /** * Modes required for acquiring the new comm-id. * The first (INTER/INTRA) indicates whether the @@ -116,7 +132,7 @@ OMPI_DECLSPEC extern opal_pointer_array_t ompi_mpi_communicators; OMPI_DECLSPEC extern opal_pointer_array_t ompi_comm_f_to_c_table; struct ompi_communicator_t { - opal_object_t c_base; + opal_infosubscriber_t super; opal_mutex_t c_lock; /* mutex for name and potentially attributes */ char c_name[MPI_MAX_OBJECT_NAME]; @@ -124,6 +140,7 @@ struct ompi_communicator_t { int c_my_rank; uint32_t c_flags; /* flags, e.g. intercomm, topology, etc. */ + uint32_t c_assertions; /* info assertions */ int c_id_available; /* the currently available Cid for allocation to a child*/ @@ -240,6 +257,15 @@ typedef struct ompi_communicator_t ompi_communicator_t; * the ompi_communicator_t without impacting the size of the * ompi_predefined_communicator_t structure for some number of additions. * + * Note: we used to define the PAD as a multiple of sizeof(void*). + * However, this makes a different size PAD, depending on + * sizeof(void*). In some cases + * (https://github.com/open-mpi/ompi/issues/3610), 32 bit builds can + * run out of space when 64 bit builds are still ok. So we changed to + * use just a naked byte size. 
As a rule of thumb, however, the size + * should probably still be a multiple of 8 so that it has the + * possibility of being nicely aligned. + * * As an example: * If the size of ompi_communicator_t is less than the size of the _PAD then * the _PAD ensures that the size of the ompi_predefined_communicator_t is @@ -256,7 +282,7 @@ typedef struct ompi_communicator_t ompi_communicator_t; * the PREDEFINED_COMMUNICATOR_PAD macro? * A: Most likely not, but it would be good to check. */ -#define PREDEFINED_COMMUNICATOR_PAD (sizeof(void*) * 64) +#define PREDEFINED_COMMUNICATOR_PAD 512 struct ompi_predefined_communicator_t { struct ompi_communicator_t comm; @@ -442,7 +468,7 @@ OMPI_DECLSPEC int ompi_comm_split (ompi_communicator_t *comm, int color, int key */ OMPI_DECLSPEC int ompi_comm_split_type(ompi_communicator_t *comm, int split_type, int key, - struct ompi_info_t *info, + struct opal_info_t *info, ompi_communicator_t** newcomm); /** @@ -473,7 +499,7 @@ OMPI_DECLSPEC int ompi_comm_idup (ompi_communicator_t *comm, ompi_communicator_t * @param comm: input communicator * @param newcomm: the new communicator or MPI_COMM_NULL if any error is detected. */ -OMPI_DECLSPEC int ompi_comm_dup_with_info (ompi_communicator_t *comm, ompi_info_t *info, ompi_communicator_t **newcomm); +OMPI_DECLSPEC int ompi_comm_dup_with_info (ompi_communicator_t *comm, opal_info_t *info, ompi_communicator_t **newcomm); /** * dup a communicator (non-blocking) with info. @@ -483,7 +509,7 @@ OMPI_DECLSPEC int ompi_comm_dup_with_info (ompi_communicator_t *comm, ompi_info_ * @param comm: input communicator * @param newcomm: the new communicator or MPI_COMM_NULL if any error is detected. 
*/ -OMPI_DECLSPEC int ompi_comm_idup_with_info (ompi_communicator_t *comm, ompi_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req); +OMPI_DECLSPEC int ompi_comm_idup_with_info (ompi_communicator_t *comm, opal_info_t *info, ompi_communicator_t **newcomm, ompi_request_t **req); /** * compare two communicators. @@ -686,6 +712,8 @@ extern int ompi_comm_num_dyncomm; OMPI_DECLSPEC int ompi_comm_cid_init ( void ); +void ompi_comm_assert_subscribe (ompi_communicator_t *comm, int32_t assert_flag); + END_C_DECLS #endif /* OMPI_COMMUNICATOR_H */ diff --git a/ompi/contrib/libompitrace/Makefile.am b/ompi/contrib/libompitrace/Makefile.am index 9b4cbdf5e88..be35171d42a 100644 --- a/ompi/contrib/libompitrace/Makefile.am +++ b/ompi/contrib/libompitrace/Makefile.am @@ -27,13 +27,13 @@ libompitrace_la_SOURCES = \ add_error_class.c \ add_error_code.c \ add_error_string.c \ - address.c \ allgather.c \ allgatherv.c \ alloc_mem.c \ allreduce.c \ barrier.c \ bcast.c \ + get_address.c \ finalize.c \ init.c \ isend.c \ diff --git a/ompi/contrib/libompitrace/address.c b/ompi/contrib/libompitrace/address.c deleted file mode 100644 index 259d9f30639..00000000000 --- a/ompi/contrib/libompitrace/address.c +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" - -#include - -#include "opal_stdint.h" -#include "ompi/mpi/c/bindings.h" - -int MPI_Address(void *location, MPI_Aint *address) -{ - - int rank; - - PMPI_Comm_rank(MPI_COMM_WORLD, &rank); - - fprintf(stderr, "MPI_ADDRESS[%d]: location %0" PRIxPTR " address %0" PRIxPTR "\n", - rank, (uintptr_t)location, (uintptr_t)address); - fflush(stderr); - - return PMPI_Address(location, address); -} diff --git a/ompi/contrib/libompitrace/get_address.c b/ompi/contrib/libompitrace/get_address.c new file mode 100644 index 00000000000..bef3d347967 --- /dev/null +++ b/ompi/contrib/libompitrace/get_address.c @@ -0,0 +1,41 @@ +/* + * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2005 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2018 Los Alamos National Security, LLC. All + * rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include + +#include "opal_stdint.h" +#include "ompi/mpi/c/bindings.h" + +int MPI_Get_address(const void *location, MPI_Aint *address) +{ + + int rank; + + PMPI_Comm_rank(MPI_COMM_WORLD, &rank); + + fprintf(stderr, "MPI_GET_ADDRESS[%d]: location %0" PRIxPTR " address %0" PRIxPTR "\n", + rank, (uintptr_t)location, (uintptr_t)address); + fflush(stderr); + + return PMPI_Get_address(location, address); +} diff --git a/ompi/datatype/ompi_datatype.h b/ompi/datatype/ompi_datatype.h index 15284f1fd3c..8b48bc30973 100644 --- a/ompi/datatype/ompi_datatype.h +++ b/ompi/datatype/ompi_datatype.h @@ -4,10 +4,10 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2010-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -44,6 +44,8 @@ BEGIN_C_DECLS /* These flags are on top of the flags in opal_datatype.h */ /* Is the datatype predefined as MPI type (not necessarily as OPAL type, e.g. struct/block types) */ #define OMPI_DATATYPE_FLAG_PREDEFINED 0x0200 +#define OMPI_DATATYPE_FLAG_ANALYZED 0x0400 +#define OMPI_DATATYPE_FLAG_MONOTONIC 0x0800 /* Keep trace of the type of the predefined datatypes */ #define OMPI_DATATYPE_FLAG_DATA_INT 0x1000 #define OMPI_DATATYPE_FLAG_DATA_FLOAT 0x2000 @@ -95,7 +97,7 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_datatype_t); /* Using set constant for padding of the DATATYPE handles because the size of * base structure is very close to being the same no matter the bitness. 
*/ -#define PREDEFINED_DATATYPE_PAD (512) +#define PREDEFINED_DATATYPE_PAD 512 struct ompi_predefined_datatype_t { struct ompi_datatype_t dt; @@ -152,13 +154,23 @@ ompi_datatype_is_contiguous_memory_layout( const ompi_datatype_t* type, int32_t return opal_datatype_is_contiguous_memory_layout(&type->super, count); } +static inline int32_t +ompi_datatype_is_monotonic( ompi_datatype_t * type ) { + if (!(type->super.flags & OMPI_DATATYPE_FLAG_ANALYZED)) { + if (opal_datatype_is_monotonic(&type->super)) { + type->super.flags |= OMPI_DATATYPE_FLAG_MONOTONIC; + } + type->super.flags |= OMPI_DATATYPE_FLAG_ANALYZED; + } + return type->super.flags & OMPI_DATATYPE_FLAG_MONOTONIC; +} + static inline int32_t ompi_datatype_commit( ompi_datatype_t ** type ) { return opal_datatype_commit ( (opal_datatype_t*)*type ); } - OMPI_DECLSPEC int32_t ompi_datatype_destroy( ompi_datatype_t** type); @@ -166,8 +178,8 @@ OMPI_DECLSPEC int32_t ompi_datatype_destroy( ompi_datatype_t** type); * Datatype creation functions */ static inline int32_t -ompi_datatype_add( ompi_datatype_t* pdtBase, const ompi_datatype_t* pdtAdd, uint32_t count, - OPAL_PTRDIFF_TYPE disp, OPAL_PTRDIFF_TYPE extent ) +ompi_datatype_add( ompi_datatype_t* pdtBase, const ompi_datatype_t* pdtAdd, size_t count, + ptrdiff_t disp, ptrdiff_t extent ) { return opal_datatype_add( &pdtBase->super, &pdtAdd->super, count, disp, extent ); } @@ -178,17 +190,17 @@ ompi_datatype_duplicate( const ompi_datatype_t* oldType, ompi_datatype_t** newTy OMPI_DECLSPEC int32_t ompi_datatype_create_contiguous( int count, const ompi_datatype_t* oldType, ompi_datatype_t** newType ); OMPI_DECLSPEC int32_t ompi_datatype_create_vector( int count, int bLength, int stride, const ompi_datatype_t* oldType, ompi_datatype_t** newType ); -OMPI_DECLSPEC int32_t ompi_datatype_create_hvector( int count, int bLength, OPAL_PTRDIFF_TYPE stride, +OMPI_DECLSPEC int32_t ompi_datatype_create_hvector( int count, int bLength, ptrdiff_t stride, const ompi_datatype_t* oldType, 
ompi_datatype_t** newType ); OMPI_DECLSPEC int32_t ompi_datatype_create_indexed( int count, const int* pBlockLength, const int* pDisp, const ompi_datatype_t* oldType, ompi_datatype_t** newType ); -OMPI_DECLSPEC int32_t ompi_datatype_create_hindexed( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +OMPI_DECLSPEC int32_t ompi_datatype_create_hindexed( int count, const int* pBlockLength, const ptrdiff_t* pDisp, const ompi_datatype_t* oldType, ompi_datatype_t** newType ); OMPI_DECLSPEC int32_t ompi_datatype_create_indexed_block( int count, int bLength, const int* pDisp, const ompi_datatype_t* oldType, ompi_datatype_t** newType ); -OMPI_DECLSPEC int32_t ompi_datatype_create_hindexed_block( int count, int bLength, const OPAL_PTRDIFF_TYPE* pDisp, +OMPI_DECLSPEC int32_t ompi_datatype_create_hindexed_block( int count, int bLength, const ptrdiff_t* pDisp, const ompi_datatype_t* oldType, ompi_datatype_t** newType ); -OMPI_DECLSPEC int32_t ompi_datatype_create_struct( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +OMPI_DECLSPEC int32_t ompi_datatype_create_struct( int count, const int* pBlockLength, const ptrdiff_t* pDisp, ompi_datatype_t* const* pTypes, ompi_datatype_t** newType ); OMPI_DECLSPEC int32_t ompi_datatype_create_darray( int size, int rank, int ndims, int const* gsize_array, int const* distrib_array, int const* darg_array, @@ -199,8 +211,8 @@ OMPI_DECLSPEC int32_t ompi_datatype_create_subarray(int ndims, int const* size_a const ompi_datatype_t* oldtype, ompi_datatype_t** newtype); static inline int32_t ompi_datatype_create_resized( const ompi_datatype_t* oldType, - OPAL_PTRDIFF_TYPE lb, - OPAL_PTRDIFF_TYPE extent, + ptrdiff_t lb, + ptrdiff_t extent, ompi_datatype_t** newType ) { ompi_datatype_t * type; @@ -214,13 +226,13 @@ ompi_datatype_create_resized( const ompi_datatype_t* oldType, } static inline int32_t -ompi_datatype_type_lb( const ompi_datatype_t* type, OPAL_PTRDIFF_TYPE* disp ) +ompi_datatype_type_lb( const 
ompi_datatype_t* type, ptrdiff_t* disp ) { return opal_datatype_type_lb(&type->super, disp); } static inline int32_t -ompi_datatype_type_ub( const ompi_datatype_t* type, OPAL_PTRDIFF_TYPE* disp ) +ompi_datatype_type_ub( const ompi_datatype_t* type, ptrdiff_t* disp ) { return opal_datatype_type_ub( &type->super, disp); } @@ -232,19 +244,19 @@ ompi_datatype_type_size ( const ompi_datatype_t* type, size_t *size ) } static inline int32_t -ompi_datatype_type_extent( const ompi_datatype_t* type, OPAL_PTRDIFF_TYPE* extent ) +ompi_datatype_type_extent( const ompi_datatype_t* type, ptrdiff_t* extent ) { return opal_datatype_type_extent( &type->super, extent); } static inline int32_t -ompi_datatype_get_extent( const ompi_datatype_t* type, OPAL_PTRDIFF_TYPE* lb, OPAL_PTRDIFF_TYPE* extent) +ompi_datatype_get_extent( const ompi_datatype_t* type, ptrdiff_t* lb, ptrdiff_t* extent) { return opal_datatype_get_extent( &type->super, lb, extent); } static inline int32_t -ompi_datatype_get_true_extent( const ompi_datatype_t* type, OPAL_PTRDIFF_TYPE* true_lb, OPAL_PTRDIFF_TYPE* true_extent) +ompi_datatype_get_true_extent( const ompi_datatype_t* type, ptrdiff_t* true_lb, ptrdiff_t* true_extent) { return opal_datatype_get_true_extent( &type->super, true_lb, true_extent); } @@ -266,7 +278,7 @@ ompi_datatype_copy_content_same_ddt( const ompi_datatype_t* type, size_t count, char* pDestBuf, char* pSrcBuf ) { int32_t length, rc; - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; ompi_datatype_type_extent( type, &extent ); while( 0 != count ) { @@ -295,11 +307,11 @@ OMPI_DECLSPEC int32_t ompi_datatype_sndrcv( const void *sbuf, int32_t scount, co */ OMPI_DECLSPEC int32_t ompi_datatype_get_args( const ompi_datatype_t* pData, int32_t which, int32_t * ci, int32_t * i, - int32_t * ca, OPAL_PTRDIFF_TYPE* a, + int32_t * ca, ptrdiff_t* a, int32_t * cd, ompi_datatype_t** d, int32_t * type); OMPI_DECLSPEC int32_t ompi_datatype_set_args( ompi_datatype_t* pData, int32_t ci, const int32_t ** i, - int32_t ca, 
const OPAL_PTRDIFF_TYPE* a, + int32_t ca, const ptrdiff_t* a, int32_t cd, ompi_datatype_t* const * d,int32_t type); OMPI_DECLSPEC int32_t ompi_datatype_copy_args( const ompi_datatype_t* source_data, ompi_datatype_t* dest_data ); diff --git a/ompi/datatype/ompi_datatype_args.c b/ompi/datatype/ompi_datatype_args.c index 6fdd5167f19..d301aa44e78 100644 --- a/ompi/datatype/ompi_datatype_args.c +++ b/ompi/datatype/ompi_datatype_args.c @@ -11,10 +11,11 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2013-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2013-2017 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -40,7 +41,7 @@ static inline int __ompi_datatype_pack_description( ompi_datatype_t* datatype, void** packed_buffer, int* next_index ); static ompi_datatype_t* -__ompi_datatype_create_from_args( int32_t* i, OPAL_PTRDIFF_TYPE * a, +__ompi_datatype_create_from_args( int32_t* i, ptrdiff_t * a, ompi_datatype_t** d, int32_t type ); typedef struct __dt_args { @@ -51,7 +52,7 @@ typedef struct __dt_args { int32_t ca; int32_t cd; int* i; - OPAL_PTRDIFF_TYPE* a; + ptrdiff_t* a; ompi_datatype_t** d; } ompi_datatype_args_t; @@ -65,7 +66,7 @@ typedef struct __dt_args { */ #if OPAL_ALIGN_WORD_SIZE_INTEGERS #define OMPI_DATATYPE_ALIGN_PTR(PTR, TYPE) \ - (PTR) = OPAL_ALIGN_PTR((PTR), sizeof(OPAL_PTRDIFF_TYPE), TYPE) + (PTR) = OPAL_ALIGN_PTR((PTR), sizeof(ptrdiff_t), TYPE) #else #define OMPI_DATATYPE_ALIGN_PTR(PTR, TYPE) #endif /* OPAL_ALIGN_WORD_SIZE_INTEGERS */ @@ -80,7 +81,7 @@ typedef struct __dt_args { #define ALLOC_ARGS(PDATA, IC, AC, DC) \ do { \ int length = sizeof(ompi_datatype_args_t) + (IC) * sizeof(int) + \ - (AC) * sizeof(OPAL_PTRDIFF_TYPE) + (DC) * sizeof(MPI_Datatype); \ + (AC) * sizeof(ptrdiff_t) + (DC) * sizeof(MPI_Datatype); \ char* buf = (char*)malloc( length ); \ ompi_datatype_args_t* pArgs = (ompi_datatype_args_t*)buf; \ pArgs->ci = (IC); \ @@ -89,8 +90,8 @@ typedef struct __dt_args { buf += sizeof(ompi_datatype_args_t); \ if( pArgs->ca == 0 ) pArgs->a = NULL; \ else { \ - pArgs->a = (OPAL_PTRDIFF_TYPE*)buf; \ - buf += pArgs->ca * sizeof(OPAL_PTRDIFF_TYPE); \ + pArgs->a = (ptrdiff_t*)buf; \ + buf += pArgs->ca * sizeof(ptrdiff_t); \ } \ if( pArgs->cd == 0 ) pArgs->d = NULL; \ else { \ @@ -101,7 +102,7 @@ typedef struct __dt_args { else pArgs->i = (int*)buf; \ pArgs->ref_count = 1; \ pArgs->total_pack_size = (4 + (IC) + (DC)) * sizeof(int) + \ - (AC) * sizeof(OPAL_PTRDIFF_TYPE); \ + (AC) * sizeof(ptrdiff_t); \ (PDATA)->args = (void*)pArgs; \ (PDATA)->packed_description = NULL; \ } while(0) @@ -109,7 +110,7 @@ 
typedef struct __dt_args { int32_t ompi_datatype_set_args( ompi_datatype_t* pData, int32_t ci, const int32_t** i, - int32_t ca, const OPAL_PTRDIFF_TYPE* a, + int32_t ca, const ptrdiff_t* a, int32_t cd, ompi_datatype_t* const * d, int32_t type) { int pos; @@ -220,9 +221,9 @@ int32_t ompi_datatype_set_args( ompi_datatype_t* pData, break; } - /* copy the array of MPI_Aint, aka OPAL_PTRDIFF_TYPE */ + /* copy the array of MPI_Aint, aka ptrdiff_t */ if( pArgs->a != NULL ) - memcpy( pArgs->a, a, ca * sizeof(OPAL_PTRDIFF_TYPE) ); + memcpy( pArgs->a, a, ca * sizeof(ptrdiff_t) ); for( pos = 0; pos < cd; pos++ ) { pArgs->d[pos] = d[pos]; @@ -317,7 +318,7 @@ int32_t ompi_datatype_print_args( const ompi_datatype_t* pData ) int32_t ompi_datatype_get_args( const ompi_datatype_t* pData, int32_t which, int32_t* ci, int32_t* i, - int32_t* ca, OPAL_PTRDIFF_TYPE* a, + int32_t* ca, ptrdiff_t* a, int32_t* cd, ompi_datatype_t** d, int32_t* type) { ompi_datatype_args_t* pArgs = (ompi_datatype_args_t*)pData->args; @@ -354,7 +355,7 @@ int32_t ompi_datatype_get_args( const ompi_datatype_t* pData, int32_t which, memcpy( i, pArgs->i, pArgs->ci * sizeof(int) ); } if( (NULL != a) && (NULL != pArgs->a) ) { - memcpy( a, pArgs->a, pArgs->ca * sizeof(OPAL_PTRDIFF_TYPE) ); + memcpy( a, pArgs->a, pArgs->ca * sizeof(ptrdiff_t) ); } if( (NULL != d) && (NULL != pArgs->d) ) { memcpy( d, pArgs->d, pArgs->cd * sizeof(MPI_Datatype) ); @@ -377,7 +378,7 @@ int32_t ompi_datatype_copy_args( const ompi_datatype_t* source_data, * a read only memory). 
*/ if( NULL != pArgs ) { - OPAL_THREAD_ADD32(&pArgs->ref_count, 1); + OPAL_THREAD_ADD_FETCH32(&pArgs->ref_count, 1); dest_data->args = pArgs; } return OMPI_SUCCESS; @@ -395,7 +396,7 @@ int32_t ompi_datatype_release_args( ompi_datatype_t* pData ) ompi_datatype_args_t* pArgs = (ompi_datatype_args_t*)pData->args; assert( 0 < pArgs->ref_count ); - OPAL_THREAD_ADD32(&pArgs->ref_count, -1); + OPAL_THREAD_ADD_FETCH32(&pArgs->ref_count, -1); if( 0 == pArgs->ref_count ) { /* There are some duplicated datatypes around that have a pointer to this * args. We will release them only when the last datatype will dissapear. @@ -449,8 +450,8 @@ static inline int __ompi_datatype_pack_description( ompi_datatype_t* datatype, /* description of the displacements must be 64 bits aligned */ OMPI_DATATYPE_ALIGN_PTR(next_packed, char*); - memcpy( next_packed, args->a, sizeof(OPAL_PTRDIFF_TYPE) * args->ca ); - next_packed += sizeof(OPAL_PTRDIFF_TYPE) * args->ca; + memcpy( next_packed, args->a, sizeof(ptrdiff_t) * args->ca ); + next_packed += sizeof(ptrdiff_t) * args->ca; } position = (int*)next_packed; next_packed += sizeof(int) * args->cd; @@ -486,7 +487,8 @@ int ompi_datatype_get_pack_description( ompi_datatype_t* datatype, void* recursive_buffer; if (NULL == packed_description) { - if (opal_atomic_cmpset (&datatype->packed_description, NULL, (void *) 1)) { + void *_tmp_ptr = NULL; + if (opal_atomic_compare_exchange_strong_ptr (&datatype->packed_description, (void *) &_tmp_ptr, (void *) 1)) { if( ompi_datatype_is_predefined(datatype) ) { packed_description = malloc(2 * sizeof(int)); } else if( NULL == args ) { @@ -557,7 +559,7 @@ static ompi_datatype_t* __ompi_datatype_create_from_packed_description( void** p int* position; ompi_datatype_t* datatype = NULL; ompi_datatype_t** array_of_datatype; - OPAL_PTRDIFF_TYPE* array_of_disp; + ptrdiff_t* array_of_disp; int* array_of_length; int number_of_length, number_of_disp, number_of_datatype, data_id; int create_type, i; @@ -609,13 +611,13 @@ 
static ompi_datatype_t* __ompi_datatype_create_from_packed_description( void** p next_buffer += (4 * sizeof(int)); /* move after the header */ /* description of the displacements (if ANY !) should always be aligned - on MPI_Aint, aka OPAL_PTRDIFF_TYPE */ + on MPI_Aint, aka ptrdiff_t */ if (number_of_disp > 0) { OMPI_DATATYPE_ALIGN_PTR(next_buffer, char*); } - array_of_disp = (OPAL_PTRDIFF_TYPE*)next_buffer; - next_buffer += number_of_disp * sizeof(OPAL_PTRDIFF_TYPE); + array_of_disp = (ptrdiff_t*)next_buffer; + next_buffer += number_of_disp * sizeof(ptrdiff_t); /* the other datatypes */ position = (int*)next_buffer; next_buffer += number_of_datatype * sizeof(int); @@ -758,12 +760,12 @@ static ompi_datatype_t* __ompi_datatype_create_from_args( int32_t* i, MPI_Aint* /******************************************************************/ case MPI_COMBINER_DARRAY: ompi_datatype_create_darray( i[0] /* size */, i[1] /* rank */, i[2] /* ndims */, - &i[3 + 0 * i[0]], &i[3 + 1 * i[0]], - &i[3 + 2 * i[0]], &i[3 + 3 * i[0]], - i[3 + 4 * i[0]], d[0], &datatype ); + &i[3 + 0 * i[2]], &i[3 + 1 * i[2]], + &i[3 + 2 * i[2]], &i[3 + 3 * i[2]], + i[3 + 4 * i[2]], d[0], &datatype ); { - const int* a_i[8] = {&i[0], &i[1], &i[2], &i[3 + 0 * i[0]], &i[3 + 1 * i[0]], &i[3 + 2 * i[0]], - &i[3 + 3 * i[0]], &i[3 + 4 * i[0]]}; + const int* a_i[8] = {&i[0], &i[1], &i[2], &i[3 + 0 * i[2]], &i[3 + 1 * i[2]], &i[3 + 2 * i[2]], + &i[3 + 3 * i[2]], &i[3 + 4 * i[2]]}; ompi_datatype_set_args( datatype, 4 * i[2] + 4, a_i, 0, NULL, 1, d, MPI_COMBINER_DARRAY); } break; @@ -837,16 +839,19 @@ ompi_datatype_t* ompi_datatype_get_single_predefined_type_from_args( ompi_dataty return NULL; } } - if( NULL == predef ) { /* This is the first iteration */ - predef = current_predef; - } else { - /** - * What exactly should we consider as identical types? If they are - * the same MPI level type, or if they map to the same OPAL datatype? - * In other words, MPI_FLOAT and MPI_REAL4 are they identical? 
- */ - if( predef != current_predef ) { - return NULL; + if (current_predef != MPI_LB && current_predef != MPI_UB) { + if( NULL == predef ) { /* This is the first iteration */ + predef = current_predef; + } else { + /** + * What exactly should we consider as identical types? + * If they are the same MPI level type, or if they map + * to the same OPAL datatype? In other words, MPI_FLOAT + * and MPI_REAL4 are they identical? + */ + if( predef != current_predef ) { + return NULL; + } } } } diff --git a/ompi/datatype/ompi_datatype_create.c b/ompi/datatype/ompi_datatype_create.c index 8c942ba4baf..69ec1b2c6ce 100644 --- a/ompi/datatype/ompi_datatype_create.c +++ b/ompi/datatype/ompi_datatype_create.c @@ -9,7 +9,7 @@ * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, * University of Stuttgart. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -29,9 +29,11 @@ static void __ompi_datatype_allocate( ompi_datatype_t* datatype ) { datatype->args = NULL; - datatype->d_f_to_c_index = opal_pointer_array_add(&ompi_datatype_f_to_c_table, datatype); - /* Later generated datatypes will have their id according to the Fortran ID, as ALL types are registered */ - datatype->id = datatype->d_f_to_c_index; + /* Do not add the newly created datatypes to the f2c translation table. We will add them only + * if necessary, basically upon the first call the MPI_Datatype_f2c. 
+ */ + datatype->d_f_to_c_index = -1; + datatype->id = -1; datatype->d_keyhash = NULL; datatype->name[0] = '\0'; datatype->packed_description = NULL; @@ -48,8 +50,9 @@ static void __ompi_datatype_release(ompi_datatype_t * datatype) free( datatype->packed_description ); datatype->packed_description = NULL; } - if( NULL != opal_pointer_array_get_item(&ompi_datatype_f_to_c_table, datatype->d_f_to_c_index) ){ + if( datatype->d_f_to_c_index >= 0 ) { opal_pointer_array_set_item( &ompi_datatype_f_to_c_table, datatype->d_f_to_c_index, NULL ); + datatype->d_f_to_c_index = -1; } /* any pending attributes ? */ if (NULL != datatype->d_keyhash) { @@ -105,8 +108,12 @@ ompi_datatype_duplicate( const ompi_datatype_t* oldType, ompi_datatype_t** newTy the top level (specifically, MPI_TYPE_DUP). */ new_ompi_datatype->d_keyhash = NULL; new_ompi_datatype->args = NULL; - snprintf (new_ompi_datatype->name, MPI_MAX_OBJECT_NAME, "Dup %s", - oldType->name); + + char *new_name; + asprintf(&new_name, "Dup %s", oldType->name); + strncpy(new_ompi_datatype->name, new_name, MPI_MAX_OBJECT_NAME - 1); + new_ompi_datatype->name[MPI_MAX_OBJECT_NAME - 1] = '\0'; + free(new_name); return OMPI_SUCCESS; } diff --git a/ompi/datatype/ompi_datatype_create_darray.c b/ompi/datatype/ompi_datatype_create_darray.c index 98c81f0dc29..a245dcebce4 100644 --- a/ompi/datatype/ompi_datatype_create_darray.c +++ b/ompi/datatype/ompi_datatype_create_darray.c @@ -15,6 +15,7 @@ * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -35,7 +36,7 @@ block(const int *gsize_array, int dim, int ndims, int nprocs, ptrdiff_t *st_offset) { int blksize, global_size, mysize, i, j, rc, start_loop, step; - ptrdiff_t stride; + ptrdiff_t stride, disps[2]; global_size = gsize_array[dim]; @@ -71,6 +72,20 @@ block(const int *gsize_array, int dim, int ndims, int nprocs, /* in terms of no. of elements of type oldtype in this dimension */ if (mysize == 0) *st_offset = 0; + /* need to set the UB for block-cyclic to work */ + disps[0] = 0; disps[1] = orig_extent; + if (order == MPI_ORDER_FORTRAN) { + for(i=0; i<=dim; i++) { + disps[1] *= gsize_array[i]; + } + } else { + for(i=ndims-1; i>=dim; i--) { + disps[1] *= gsize_array[i]; + } + } + rc = opal_datatype_resize( &(*type_new)->super, disps[0], disps[1] ); + if (OMPI_SUCCESS != rc) return rc; + return OMPI_SUCCESS; } diff --git a/ompi/datatype/ompi_datatype_create_indexed.c b/ompi/datatype/ompi_datatype_create_indexed.c index 9311eac7972..50c521b7bf9 100644 --- a/ompi/datatype/ompi_datatype_create_indexed.c +++ b/ompi/datatype/ompi_datatype_create_indexed.c @@ -13,7 +13,7 @@ * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -35,7 +35,7 @@ int32_t ompi_datatype_create_indexed( int count, const int* pBlockLength, const { ompi_datatype_t* pdt; int i, dLength, endat, disp; - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; if( 0 == count ) { return ompi_datatype_duplicate( &ompi_mpi_datatype_null.dt, newType); @@ -66,12 +66,12 @@ int32_t ompi_datatype_create_indexed( int count, const int* pBlockLength, const } -int32_t ompi_datatype_create_hindexed( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +int32_t ompi_datatype_create_hindexed( int count, const int* pBlockLength, const ptrdiff_t* pDisp, const ompi_datatype_t* oldType, ompi_datatype_t** newType ) { ompi_datatype_t* pdt; int i, dLength; - OPAL_PTRDIFF_TYPE extent, disp, endat; + ptrdiff_t extent, disp, endat; if( 0 == count ) { *newType = ompi_datatype_create( 0 ); @@ -109,7 +109,7 @@ int32_t ompi_datatype_create_indexed_block( int count, int bLength, const int* p { ompi_datatype_t* pdt; int i, dLength, endat, disp; - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; ompi_datatype_type_extent( oldType, &extent ); if( (count == 0) || (bLength == 0) ) { @@ -143,12 +143,12 @@ int32_t ompi_datatype_create_indexed_block( int count, int bLength, const int* p return OMPI_SUCCESS; } -int32_t ompi_datatype_create_hindexed_block( int count, int bLength, const OPAL_PTRDIFF_TYPE* pDisp, +int32_t ompi_datatype_create_hindexed_block( int count, int bLength, const ptrdiff_t* pDisp, const ompi_datatype_t* oldType, ompi_datatype_t** newType ) { ompi_datatype_t* pdt; int i, dLength; - OPAL_PTRDIFF_TYPE extent, disp, endat; + ptrdiff_t extent, disp, endat; ompi_datatype_type_extent( oldType, &extent ); if( (count == 0) || (bLength == 0) ) { diff --git a/ompi/datatype/ompi_datatype_create_struct.c b/ompi/datatype/ompi_datatype_create_struct.c index e2457d16ec9..98daa8bacbb 100644 --- a/ompi/datatype/ompi_datatype_create_struct.c +++ b/ompi/datatype/ompi_datatype_create_struct.c @@ -13,6 +13,8 @@ * Copyright (c) 2009 Sun 
Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -26,11 +28,11 @@ #include "ompi/datatype/ompi_datatype.h" -int32_t ompi_datatype_create_struct( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +int32_t ompi_datatype_create_struct( int count, const int* pBlockLength, const ptrdiff_t* pDisp, ompi_datatype_t* const * pTypes, ompi_datatype_t** newType ) { int i; - OPAL_PTRDIFF_TYPE disp = 0, endto, lastExtent, lastDisp; + ptrdiff_t disp = 0, endto, lastExtent, lastDisp; int lastBlock; ompi_datatype_t *pdt, *lastType; diff --git a/ompi/datatype/ompi_datatype_create_vector.c b/ompi/datatype/ompi_datatype_create_vector.c index c899f1d9028..1de8df4d2d2 100644 --- a/ompi/datatype/ompi_datatype_create_vector.c +++ b/ompi/datatype/ompi_datatype_create_vector.c @@ -13,6 +13,8 @@ * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -36,7 +38,7 @@ int32_t ompi_datatype_create_vector( int count, int bLength, int stride, const ompi_datatype_t* oldType, ompi_datatype_t** newType ) { ompi_datatype_t *pTempData, *pData; - OPAL_PTRDIFF_TYPE extent = oldType->super.ub - oldType->super.lb; + ptrdiff_t extent = oldType->super.ub - oldType->super.lb; if( 0 == count ) { @@ -47,7 +49,7 @@ int32_t ompi_datatype_create_vector( int count, int bLength, int stride, pData = ompi_datatype_create( oldType->super.desc.used + 2 ); if( (bLength == stride) || (1 >= count) ) { /* the elements are contiguous */ - ompi_datatype_add( pData, oldType, count * bLength, 0, extent ); + ompi_datatype_add( pData, oldType, (size_t)count * bLength, 0, extent ); } else { if( 1 == bLength ) { ompi_datatype_add( pData, oldType, count, 0, extent * stride ); @@ -64,11 +66,11 @@ int32_t ompi_datatype_create_vector( int count, int bLength, int stride, } -int32_t ompi_datatype_create_hvector( int count, int bLength, OPAL_PTRDIFF_TYPE stride, +int32_t ompi_datatype_create_hvector( int count, int bLength, ptrdiff_t stride, const ompi_datatype_t* oldType, ompi_datatype_t** newType ) { ompi_datatype_t *pTempData, *pData; - OPAL_PTRDIFF_TYPE extent = oldType->super.ub - oldType->super.lb; + ptrdiff_t extent = oldType->super.ub - oldType->super.lb; if( 0 == count ) { *newType = ompi_datatype_create( 0 ); diff --git a/ompi/datatype/ompi_datatype_get_elements.c b/ompi/datatype/ompi_datatype_get_elements.c index 0c1f8a7b842..72ac87d6df7 100644 --- a/ompi/datatype/ompi_datatype_get_elements.c +++ b/ompi/datatype/ompi_datatype_get_elements.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2013 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. 
All rights * reserved. * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ -25,6 +25,7 @@ #include "ompi/runtime/params.h" #include "ompi/datatype/ompi_datatype.h" +#include "opal/datatype/opal_datatype_internal.h" int ompi_datatype_get_elements (ompi_datatype_t *datatype, size_t ucount, size_t *count) { @@ -48,9 +49,10 @@ int ompi_datatype_get_elements (ompi_datatype_t *datatype, size_t ucount, size_t there are no leftover bytes */ if (!ompi_datatype_is_predefined(datatype)) { if (0 != internal_count) { + opal_datatype_compute_ptypes(&datatype->super); /* count the basic elements in the datatype */ - for (i = 4, total = 0 ; i < OPAL_DATATYPE_MAX_PREDEFINED ; ++i) { - total += datatype->super.btypes[i]; + for (i = OPAL_DATATYPE_FIRST_TYPE, total = 0 ; i < OPAL_DATATYPE_MAX_PREDEFINED ; ++i) { + total += datatype->super.ptypes[i]; } internal_count = total * internal_count; } diff --git a/ompi/datatype/ompi_datatype_internal.h b/ompi/datatype/ompi_datatype_internal.h index 76485370dfa..0cbfb25a95a 100644 --- a/ompi/datatype/ompi_datatype_internal.h +++ b/ompi/datatype/ompi_datatype_internal.h @@ -1,13 +1,13 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2009-2013 The University of Tennessee and The University + * Copyright (c) 2009-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 FUJITSU LIMITED. All rights reserved. 
* $COPYRIGHT$ @@ -403,7 +403,7 @@ extern const ompi_datatype_t* ompi_datatype_basicDatatypes[OMPI_DATATYPE_MPI_MAX #define OMPI_DATATYPE_EMPTY_DATA(NAME) \ .id = OMPI_DATATYPE_MPI_ ## NAME, \ - .d_f_to_c_index = 0, \ + .d_f_to_c_index = -1, \ .d_keyhash = NULL, \ .args = NULL, \ .packed_description = NULL, \ @@ -416,6 +416,8 @@ extern const ompi_datatype_t* ompi_datatype_basicDatatypes[OMPI_DATATYPE_MPI_MAX { /*ompi_predefined_datatype_t*/ \ { /* ompi_datatype_t */ \ OMPI_DATATYPE_INITIALIZER_ ## TYPE (OMPI_DATATYPE_FLAG_PREDEFINED | \ + OMPI_DATATYPE_FLAG_ANALYZED | \ + OMPI_DATATYPE_FLAG_MONOTONIC | \ (FLAGS)) /*super*/, \ OMPI_DATATYPE_EMPTY_DATA(NAME) /*id,d_f_to_c_index,d_keyhash,args,packed_description,name*/ \ }, \ @@ -432,6 +434,8 @@ extern const ompi_datatype_t* ompi_datatype_basicDatatypes[OMPI_DATATYPE_MPI_MAX OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE( NAME, NAME, FLAGS ) #define OMPI_DATATYPE_INIT_UNAVAILABLE( NAME, FLAGS ) \ OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE( UNAVAILABLE, NAME, FLAGS ) +#define OMPI_DATATYPE_INIT_UNAVAILABLE_BASIC_TYPE(TYPE, NAME, FLAGS) \ + OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE( UNAVAILABLE, NAME, FLAGS ) /* * Initilization for these types is deferred until runtime. 
@@ -455,6 +459,8 @@ extern const ompi_datatype_t* ompi_datatype_basicDatatypes[OMPI_DATATYPE_MPI_MAX .super = OPAL_OBJ_STATIC_INIT(opal_datatype_t), \ .flags = OPAL_DATATYPE_FLAG_BASIC | \ OMPI_DATATYPE_FLAG_PREDEFINED | \ + OMPI_DATATYPE_FLAG_ANALYZED | \ + OMPI_DATATYPE_FLAG_MONOTONIC | \ OMPI_DATATYPE_FLAG_DATA_FORTRAN | (FLAGS), \ .id = OPAL_DATATYPE_ ## TYPE ## SIZE, \ .bdt_used = (((uint32_t)1)<<(OPAL_DATATYPE_ ## TYPE ## SIZE)), \ @@ -465,7 +471,7 @@ extern const ompi_datatype_t* ompi_datatype_basicDatatypes[OMPI_DATATYPE_MPI_MAX .name = OPAL_DATATYPE_INIT_NAME(TYPE ## SIZE), \ .desc = OPAL_DATATYPE_INIT_DESC_PREDEFINED(TYPE ## SIZE), \ .opt_desc = OPAL_DATATYPE_INIT_DESC_PREDEFINED(TYPE ## SIZE), \ - .btypes = OPAL_DATATYPE_INIT_BTYPES_ARRAY(TYPE ## SIZE) \ + .ptypes = OPAL_DATATYPE_INIT_PTYPES_ARRAY(TYPE ## SIZE) \ } #define OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE_FORTRAN( TYPE, NAME, SIZE, ALIGN, FLAGS ) \ diff --git a/ompi/datatype/ompi_datatype_module.c b/ompi/datatype/ompi_datatype_module.c index fb5a09e9072..3ee09173cd8 100644 --- a/ompi/datatype/ompi_datatype_module.c +++ b/ompi/datatype/ompi_datatype_module.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -15,7 +15,7 @@ * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 FUJITSU LIMITED. 
All rights reserved. * $COPYRIGHT$ @@ -384,8 +384,9 @@ opal_pointer_array_t ompi_datatype_f_to_c_table = {{0}}; (PDST)->super.opt_desc = (PSRC)->super.opt_desc; \ (PDST)->packed_description = (PSRC)->packed_description; \ (PSRC)->packed_description = NULL; \ - memcpy( (PDST)->super.btypes, (PSRC)->super.btypes, \ - OPAL_DATATYPE_MAX_PREDEFINED * sizeof(uint32_t) ); \ + /* transfer the ptypes */ \ + (PDST)->super.ptypes = (PSRC)->super.ptypes; \ + (PSRC)->super.ptypes = NULL; \ } while(0) #define DECLARE_MPI2_COMPOSED_STRUCT_DDT( PDATA, MPIDDT, MPIDDTNAME, type1, type2, MPIType1, MPIType2, FLAGS) \ @@ -393,27 +394,29 @@ opal_pointer_array_t ompi_datatype_f_to_c_table = {{0}}; struct { type1 v1; type2 v2; } s[2]; \ ompi_datatype_t *types[2], *ptype; \ int bLength[2] = {1, 1}; \ - OPAL_PTRDIFF_TYPE base, displ[2]; \ + ptrdiff_t base, displ[2]; \ \ types[0] = (ompi_datatype_t*)ompi_datatype_basicDatatypes[MPIType1]; \ types[1] = (ompi_datatype_t*)ompi_datatype_basicDatatypes[MPIType2]; \ - base = (OPAL_PTRDIFF_TYPE)(&(s[0])); \ - displ[0] = (OPAL_PTRDIFF_TYPE)(&(s[0].v1)); \ + base = (ptrdiff_t)(&(s[0])); \ + displ[0] = (ptrdiff_t)(&(s[0].v1)); \ displ[0] -= base; \ - displ[1] = (OPAL_PTRDIFF_TYPE)(&(s[0].v2)); \ + displ[1] = (ptrdiff_t)(&(s[0].v2)); \ displ[1] -= base; \ \ ompi_datatype_create_struct( 2, bLength, displ, types, &ptype ); \ - displ[0] = (OPAL_PTRDIFF_TYPE)(&(s[1])); \ + displ[0] = (ptrdiff_t)(&(s[1])); \ displ[0] -= base; \ - if( displ[0] != (displ[1] + (OPAL_PTRDIFF_TYPE)sizeof(type2)) ) \ + if( displ[0] != (displ[1] + (ptrdiff_t)sizeof(type2)) ) \ ptype->super.ub = displ[0]; /* force a new extent for the datatype */ \ ptype->super.flags |= (FLAGS); \ ptype->id = MPIDDT; \ ompi_datatype_commit( &ptype ); \ COPY_DATA_DESC( PDATA, ptype ); \ (PDATA)->super.flags &= ~OPAL_DATATYPE_FLAG_PREDEFINED; \ - (PDATA)->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED; \ + (PDATA)->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED | \ + OMPI_DATATYPE_FLAG_ANALYZED | \ + 
OMPI_DATATYPE_FLAG_MONOTONIC; \ ptype->super.desc.desc = NULL; \ ptype->super.opt_desc.desc = NULL; \ OBJ_RELEASE( ptype ); \ @@ -429,7 +432,9 @@ opal_pointer_array_t ompi_datatype_f_to_c_table = {{0}}; ompi_datatype_commit( &ptype ); \ COPY_DATA_DESC( (PDATA), ptype ); \ (PDATA)->super.flags &= ~OPAL_DATATYPE_FLAG_PREDEFINED; \ - (PDATA)->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED; \ + (PDATA)->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED | \ + OMPI_DATATYPE_FLAG_ANALYZED | \ + OMPI_DATATYPE_FLAG_MONOTONIC; \ ptype->super.desc.desc = NULL; \ ptype->super.opt_desc.desc = NULL; \ OBJ_RELEASE( ptype ); \ @@ -444,7 +449,9 @@ opal_pointer_array_t ompi_datatype_f_to_c_table = {{0}}; /* forget the language flag */ \ (PDATA)->super.flags &= ~OMPI_DATATYPE_FLAG_DATA_LANGUAGE; \ (PDATA)->super.flags &= ~OPAL_DATATYPE_FLAG_PREDEFINED; \ - (PDATA)->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED; \ + (PDATA)->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED | \ + OMPI_DATATYPE_FLAG_ANALYZED | \ + OMPI_DATATYPE_FLAG_MONOTONIC; \ } while(0) @@ -457,7 +464,7 @@ int32_t ompi_datatype_init( void ) /* Create the f2c translation table */ OBJ_CONSTRUCT(&ompi_datatype_f_to_c_table, opal_pointer_array_t); if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_datatype_f_to_c_table, - 0, OMPI_FORTRAN_HANDLE_MAX, 64)) { + 64, OMPI_FORTRAN_HANDLE_MAX, 32)) { return OMPI_ERROR; } /* All temporary datatypes created on the following statement will get registered @@ -512,7 +519,6 @@ int32_t ompi_datatype_init( void ) /* Copy the desc pointer from the ub - datatype->lb) == (OPAL_PTRDIFF_TYPE)datatype->size ) { + if( (datatype->ub - datatype->lb) == (ptrdiff_t)datatype->size ) { datatype->flags |= OPAL_DATATYPE_FLAG_NO_GAPS; } else { datatype->flags &= ~OPAL_DATATYPE_FLAG_NO_GAPS; @@ -737,7 +743,7 @@ void ompi_datatype_dump( const ompi_datatype_t* pData ) (long)pData->super.size, (int)pData->super.align, pData->super.id, (int)pData->super.desc.length, (int)pData->super.desc.used, 
(long)pData->super.true_lb, (long)pData->super.true_ub, (long)(pData->super.true_ub - pData->super.true_lb), (long)pData->super.lb, (long)pData->super.ub, (long)(pData->super.ub - pData->super.lb), - (int)pData->super.nbElems, (int)pData->super.btypes[OPAL_DATATYPE_LOOP], (int)pData->super.flags ); + (int)pData->super.nbElems, (int)pData->super.loops, (int)pData->super.flags ); /* dump the flags */ if( ompi_datatype_is_predefined(pData) ) { index += snprintf( buffer + index, length - index, "predefined " ); diff --git a/ompi/debuggers/dlopen_test.c b/ompi/debuggers/dlopen_test.c index 266a1c24389..27ef62c5a63 100644 --- a/ompi/debuggers/dlopen_test.c +++ b/ompi/debuggers/dlopen_test.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -12,6 +12,7 @@ #include #include #include +#include #include "opal/runtime/opal.h" #include "opal/mca/dl/base/base.h" diff --git a/ompi/debuggers/ompi_mpihandles_dll.c b/ompi/debuggers/ompi_mpihandles_dll.c index 05a20e113f6..131040b57fd 100644 --- a/ompi/debuggers/ompi_mpihandles_dll.c +++ b/ompi/debuggers/ompi_mpihandles_dll.c @@ -7,6 +7,7 @@ * Copyright (c) 2012-2013 Inria. All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corp. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -237,7 +238,7 @@ int mpidbg_init_per_image(mqs_image *image, const mqs_image_callbacks *icb, mqs_find_type(image, "ompi_file_t", mqs_lang_c); handle_types->hi_c_group = i_info->ompi_group_t.type; handle_types->hi_c_info = - mqs_find_type(image, "ompi_info_t", mqs_lang_c); + mqs_find_type(image, "opal_info_t", mqs_lang_c); /* JMS: "MPI_Offset" is a typedef (see comment about MPI_Aint above) */ handle_types->hi_c_offset = mqs_find_type(image, "MPI_Offset", mqs_lang_c); diff --git a/ompi/debuggers/predefined_gap_test.c b/ompi/debuggers/predefined_gap_test.c index 69eb1c1791b..0129eb63a23 100644 --- a/ompi/debuggers/predefined_gap_test.c +++ b/ompi/debuggers/predefined_gap_test.c @@ -5,6 +5,7 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2012-2013 Inria. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -52,8 +53,8 @@ int main(int argc, char **argv) { /* Test Predefined communicator sizes */ printf("ompi_predefined_communicator_t = %lu bytes\n", sizeof(ompi_predefined_communicator_t)); printf("ompi_communicator_t = %lu bytes\n", sizeof(ompi_communicator_t)); - GAP_CHECK("c_base", test_comm, c_base, c_base, 0); - GAP_CHECK("c_lock", test_comm, c_lock, c_base, 1); + GAP_CHECK("c_base", test_comm, super, super, 0); + GAP_CHECK("c_lock", test_comm, c_lock, super, 1); GAP_CHECK("c_name", test_comm, c_name, c_lock, 1); GAP_CHECK("c_contextid", test_comm, c_contextid, c_name, 1); GAP_CHECK("c_my_rank", test_comm, c_my_rank, c_contextid, 1); @@ -120,8 +121,8 @@ int main(int argc, char **argv) { printf("=============================================\n"); printf("ompi_predefined_win_t = %lu bytes\n", sizeof(ompi_predefined_win_t)); printf("ompi_win_t = %lu bytes\n", sizeof(ompi_win_t)); - GAP_CHECK("w_base", test_win, w_base, w_base, 0); - GAP_CHECK("w_lock", test_win, w_lock, w_base, 1); + 
GAP_CHECK("super", test_win, super, super, 0); + GAP_CHECK("w_lock", test_win, w_lock, super, 1); GAP_CHECK("w_name", test_win, w_name, w_lock, 1); GAP_CHECK("w_group", test_win, w_group, w_name, 1); GAP_CHECK("w_flags", test_win, w_flags, w_group, 1); @@ -137,8 +138,7 @@ int main(int argc, char **argv) { printf("ompi_info_t = %lu bytes\n", sizeof(ompi_info_t)); GAP_CHECK("super", test_info, super, super, 0); GAP_CHECK("i_f_to_c_index", test_info, i_f_to_c_index, super, 1); - GAP_CHECK("i_lock", test_info, i_lock, i_f_to_c_index, 1); - GAP_CHECK("i_freed", test_info, i_freed, i_lock, 1); + GAP_CHECK("i_freed", test_info, i_freed, i_f_to_c_index, 1); /* Test Predefined file sizes */ printf("=============================================\n"); @@ -148,8 +148,7 @@ int main(int argc, char **argv) { GAP_CHECK("f_comm", test_file, f_comm, super, 1); GAP_CHECK("f_filename", test_file, f_filename, f_comm, 1); GAP_CHECK("f_amode", test_file, f_amode, f_filename, 1); - GAP_CHECK("f_info", test_file, f_info, f_amode, 1); - GAP_CHECK("f_flags", test_file, f_flags, f_info, 1); + GAP_CHECK("f_flags", test_file, f_flags, f_amode, 1); GAP_CHECK("f_f_to_c_index", test_file, f_f_to_c_index, f_flags, 1); GAP_CHECK("error_handler", test_file, error_handler, f_f_to_c_index, 1); GAP_CHECK("errhandler_type", test_file, errhandler_type, error_handler, 1); diff --git a/ompi/dpm/dpm.c b/ompi/dpm/dpm.c index d4346e417d7..14810f6b028 100644 --- a/ompi/dpm/dpm.c +++ b/ompi/dpm/dpm.c @@ -10,12 +10,12 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2007-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2006-2009 University of Houston. All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2011-2015 Los Alamos National Security, LLC. All rights * reserved. 
- * Copyright (c) 2013-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ @@ -40,6 +40,7 @@ #include "opal/util/argv.h" #include "opal/util/opal_getcwd.h" #include "opal/util/proc.h" +#include "opal/util/show_help.h" #include "opal/dss/dss.h" #include "opal/mca/hwloc/base/base.h" #include "opal/mca/pmix/pmix.h" @@ -112,6 +113,12 @@ int ompi_dpm_connect_accept(ompi_communicator_t *comm, int root, if (NULL == opal_pmix.publish || NULL == opal_pmix.connect || NULL == opal_pmix.unpublish || (NULL == opal_pmix.lookup && NULL == opal_pmix.lookup_nb)) { + /* print a nice message explaining we don't have support */ + opal_show_help("help-mpi-runtime.txt", "noconxcpt", true); + return OMPI_ERR_NOT_SUPPORTED; + } + if (!ompi_rte_connect_accept_support(port_string)) { + /* they will have printed the help message */ return OMPI_ERR_NOT_SUPPORTED; } @@ -158,8 +165,8 @@ int ompi_dpm_connect_accept(ompi_communicator_t *comm, int root, sizeof(ompi_proc_t *)); for (i=0 ; i<group->grp_proc_count ; i++) { if (NULL == (proc_list[i] = ompi_group_peer_lookup(group,i))) { - ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); - rc = ORTE_ERR_NOT_FOUND; + OMPI_ERROR_LOG(OMPI_ERR_NOT_FOUND); + rc = OMPI_ERR_NOT_FOUND; free(proc_list); goto exit; } @@ -460,8 +467,7 @@ int ompi_dpm_connect_accept(ompi_communicator_t *comm, int root, group, /* local group */ new_group_pointer /* remote group */ ); - if ( NULL == newcomp ) { - rc = OMPI_ERR_OUT_OF_RESOURCE; + if (OMPI_SUCCESS != rc) { goto exit; } @@ -582,13 +588,8 @@ int ompi_dpm_disconnect(ompi_communicator_t *comm) } /* ensure we tell the host RM to disconnect us - this - * is a blocking operation that must include a fence */ - if (NULL == opal_pmix.disconnect) { - /* use the fence */ - ret = opal_pmix.fence(&coll, false); - } else { - ret = opal_pmix.disconnect(&coll); - } + * is a
blocking operation so just use a fence */ + ret = opal_pmix.fence(&coll, false); OPAL_LIST_DESTRUCT(&coll); return ret; @@ -665,10 +666,10 @@ int ompi_dpm_spawn(int count, const char *array_of_commands[], for (i = 0; i < count; ++i) { app = OBJ_NEW(opal_pmix_app_t); if (NULL == app) { - ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE); + OMPI_ERROR_LOG(OMPI_ERR_OUT_OF_RESOURCE); OPAL_LIST_DESTRUCT(&apps); opal_progress_event_users_decrement(); - return ORTE_ERR_OUT_OF_RESOURCE; + return OMPI_ERR_OUT_OF_RESOURCE; } /* add the app to the job data */ opal_list_append(&apps, &app->super); @@ -893,9 +894,9 @@ int ompi_dpm_spawn(int count, const char *array_of_commands[], ompi_info_get (array_of_info[i], "ompi_stdin_target", sizeof(stdin_target) - 1, stdin_target, &flag); if ( flag ) { if (0 == strcmp(stdin_target, "all")) { - ui32 = ORTE_VPID_WILDCARD; + ui32 = OPAL_VPID_WILDCARD; } else if (0 == strcmp(stdin_target, "none")) { - ui32 = ORTE_VPID_INVALID; + ui32 = OPAL_VPID_INVALID; } else { ui32 = strtoul(stdin_target, NULL, 10); } @@ -911,7 +912,7 @@ int ompi_dpm_spawn(int count, const char *array_of_commands[], */ if ( !have_wdir ) { if (OMPI_SUCCESS != (rc = opal_getcwd(cwd, OPAL_PATH_MAX))) { - ORTE_ERROR_LOG(rc); + OMPI_ERROR_LOG(rc); OPAL_LIST_DESTRUCT(&apps); opal_progress_event_users_decrement(); return rc; @@ -957,6 +958,7 @@ int ompi_dpm_open_port(char *port_name) r = opal_rand(&rnd); opal_convert_process_name_to_string(&tmp, OMPI_PROC_MY_NAME); snprintf(port_name, MPI_MAX_PORT_NAME-1, "%s:%u", tmp, r); + port_name[MPI_MAX_PORT_NAME - 1] = '\0'; free(tmp); return OMPI_SUCCESS; } diff --git a/ompi/errhandler/errcode.c b/ompi/errhandler/errcode.c index 3a63fa45dff..d807a1ae814 100644 --- a/ompi/errhandler/errcode.c +++ b/ompi/errhandler/errcode.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -133,8 +133,8 @@ int ompi_mpi_errcode_init (void) /* Initialize the pointer array, which will hold the references to the error objects */ OBJ_CONSTRUCT(&ompi_mpi_errcodes, opal_pointer_array_t); - if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_mpi_errcodes, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_mpi_errcodes, 64, + OMPI_FORTRAN_HANDLE_MAX, 32) ) { return OMPI_ERROR; } diff --git a/ompi/errhandler/errhandler.c b/ompi/errhandler/errhandler.c index 8ce4c383428..67cef457c0d 100644 --- a/ompi/errhandler/errhandler.c +++ b/ompi/errhandler/errhandler.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2014 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -83,8 +83,8 @@ int ompi_errhandler_init(void) /* initialize ompi_errhandler_f_to_c_table */ OBJ_CONSTRUCT( &ompi_errhandler_f_to_c_table, opal_pointer_array_t); - if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_errhandler_f_to_c_table, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_errhandler_f_to_c_table, 8, + OMPI_FORTRAN_HANDLE_MAX, 16) ) { return OMPI_ERROR; } diff --git a/ompi/errhandler/errhandler_predefined.c b/ompi/errhandler/errhandler_predefined.c index cd54bb6e30b..33134fb7f96 100644 --- a/ompi/errhandler/errhandler_predefined.c +++ b/ompi/errhandler/errhandler_predefined.c @@ -193,7 +193,7 @@ static void backend_fatal_aggregate(char *type, arg = va_arg(arglist, char*); va_end(arglist); - if (asprintf(&prefix, "[%s:%d]", + if (asprintf(&prefix, "[%s:%05d]", ompi_process_info.nodename, (int) ompi_process_info.pid) == -1) { prefix = NULL; diff --git a/ompi/file/file.c b/ompi/file/file.c index 93354de346a..bf546a55694 100644 --- a/ompi/file/file.c +++ b/ompi/file/file.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -12,9 +12,10 @@ * All rights reserved. * Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 University of Houston. All rights reserved. 
+ * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -44,7 +45,7 @@ opal_pointer_array_t ompi_file_f_to_c_table = {{0}}; /* * MPI_FILE_NULL (_addr flavor is for F03 bindings) */ -ompi_predefined_file_t ompi_mpi_file_null = {{{0}}}; +ompi_predefined_file_t ompi_mpi_file_null = {{{{0}}}}; ompi_predefined_file_t *ompi_mpi_file_null_addr = &ompi_mpi_file_null; @@ -59,7 +60,7 @@ static void file_destructor(ompi_file_t *obj); * Class instance for ompi_file_t */ OBJ_CLASS_INSTANCE(ompi_file_t, - opal_object_t, + opal_infosubscriber_t, file_constructor, file_destructor); @@ -73,7 +74,7 @@ int ompi_file_init(void) OBJ_CONSTRUCT(&ompi_file_f_to_c_table, opal_pointer_array_t); if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_file_f_to_c_table, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + OMPI_FORTRAN_HANDLE_MAX, 16) ) { return OMPI_ERROR; } @@ -97,7 +98,7 @@ int ompi_file_init(void) * Back end to MPI_FILE_OPEN */ int ompi_file_open(struct ompi_communicator_t *comm, const char *filename, - int amode, struct ompi_info_t *info, ompi_file_t **fh) + int amode, struct opal_info_t *info, ompi_file_t **fh) { int ret; ompi_file_t *file; @@ -113,17 +114,10 @@ int ompi_file_open(struct ompi_communicator_t *comm, const char *filename, file->f_comm = comm; OBJ_RETAIN(comm); - if (MPI_INFO_NULL != info) { - if(NULL == file->f_info) { - file->f_info = OBJ_NEW(ompi_info_t); - } - if (OMPI_SUCCESS != (ret = ompi_info_dup(info, &file->f_info))) { - OBJ_RELEASE(file); - return ret; - } - } else { - file->f_info = MPI_INFO_NULL; - OBJ_RETAIN(MPI_INFO_NULL); + /* Copy the info for the info layer */ + file->super.s_info = OBJ_NEW(opal_info_t); + if (info) { + opal_info_dup(info, &(file->super.s_info)); } file->f_amode = amode; @@ -134,7 +128,7 @@ int ompi_file_open(struct ompi_communicator_t *comm, const char *filename, } /* Create the mutex */ - OBJ_CONSTRUCT(&file->f_mutex, opal_mutex_t); + OBJ_CONSTRUCT(&file->f_lock, opal_mutex_t); 
/* Select a module and actually open the file */ @@ -156,7 +150,7 @@ int ompi_file_open(struct ompi_communicator_t *comm, const char *filename, int ompi_file_close(ompi_file_t **file) { - OBJ_DESTRUCT(&(*file)->f_mutex); + OBJ_DESTRUCT(&(*file)->f_lock); (*file)->f_flags |= OMPI_FILE_ISCLOSED; OBJ_RELEASE(*file); @@ -236,7 +230,6 @@ static void file_constructor(ompi_file_t *file) file->f_comm = NULL; file->f_filename = NULL; file->f_amode = 0; - file->f_info = NULL; /* Initialize flags */ @@ -316,10 +309,10 @@ static void file_destructor(ompi_file_t *file) #endif } - if (NULL != file->f_info) { - OBJ_RELEASE(file->f_info); + if (NULL != file->super.s_info) { + OBJ_RELEASE(file->super.s_info); #if OPAL_ENABLE_DEBUG - file->f_info = NULL; + file->super.s_info = NULL; #endif } diff --git a/ompi/file/file.h b/ompi/file/file.h index 92f49aa0581..8e3fbb85a3d 100644 --- a/ompi/file/file.h +++ b/ompi/file/file.h @@ -11,10 +11,11 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 University of Houston. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -30,6 +31,7 @@ #include "opal/class/opal_list.h" #include "ompi/errhandler/errhandler.h" #include "opal/threads/mutex.h" +#include "opal/util/info_subscriber.h" #include "ompi/mca/io/io.h" /* @@ -45,7 +47,7 @@ BEGIN_C_DECLS */ struct ompi_file_t { /** Base of OBJ_* interface */ - opal_object_t super; + opal_infosubscriber_t super; /** Communicator that this file was created with */ struct ompi_communicator_t *f_comm; @@ -56,10 +58,6 @@ struct ompi_file_t { /** Amode that this file was created with */ int f_amode; - /** MPI_Info that this file was created with. Note that this is - *NOT* what should be returned from OMPI_FILE_GET_INFO! */ - struct ompi_info_t *f_info; - /** Bit flags */ int32_t f_flags; @@ -81,7 +79,7 @@ struct ompi_file_t { /** Mutex to be used to protect access to the selected component on a per file-handle basis */ - opal_mutex_t f_mutex; + opal_mutex_t f_lock; /** The selected component (note that this is a union) -- we need this to add and remove the component from the list of @@ -105,7 +103,7 @@ typedef struct ompi_file_t ompi_file_t; * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. */ -#define PREDEFINED_FILE_PAD (sizeof(void*) * 192) +#define PREDEFINED_FILE_PAD 1536 struct ompi_predefined_file_t { struct ompi_file_t file; @@ -153,7 +151,7 @@ int ompi_file_init(void); * handling as well. 
*/ int ompi_file_open(struct ompi_communicator_t *comm, const char *filename, - int amode, struct ompi_info_t *info, + int amode, struct opal_info_t *info, ompi_file_t **fh); /** diff --git a/ompi/group/group.c b/ompi/group/group.c index dc8c4d49e6f..f5cc88be98c 100644 --- a/ompi/group/group.c +++ b/ompi/group/group.c @@ -563,10 +563,13 @@ bool ompi_group_have_remote_peers (ompi_group_t *group) #if OMPI_GROUP_SPARSE proc = ompi_group_peer_lookup (group, i); #else - if (ompi_proc_is_sentinel (group->grp_proc_pointers[i])) { + proc = ompi_group_get_proc_ptr_raw (group, i); + if (ompi_proc_is_sentinel (proc)) { + /* the proc must be stored in the group or cached in the proc + * hash table if the process resides in the local node + * (see ompi_proc_complete_init) */ return true; } - proc = group->grp_proc_pointers[i]; #endif if (!OPAL_PROC_ON_LOCAL_NODE(proc->super.proc_flags)) { return true; diff --git a/ompi/group/group.h b/ompi/group/group.h index c4ff03b6847..30664f8a4e0 100644 --- a/ompi/group/group.h +++ b/ompi/group/group.h @@ -11,10 +11,10 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2007 University of Houston. All rights reserved. - * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2007-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2013-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -107,7 +107,7 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_group_t); * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. */ -#define PREDEFINED_GROUP_PAD (sizeof(void*) * 32) +#define PREDEFINED_GROUP_PAD 256 struct ompi_predefined_group_t { struct ompi_group_t group; @@ -356,7 +356,7 @@ static inline struct ompi_proc_t *ompi_group_dense_lookup (ompi_group_t *group, ompi_proc_t *real_proc = (ompi_proc_t *) ompi_proc_for_name (ompi_proc_sentinel_to_name ((uintptr_t) proc)); - if (opal_atomic_cmpset_ptr (group->grp_proc_pointers + peer_id, proc, real_proc)) { + if (opal_atomic_compare_exchange_strong_ptr (group->grp_proc_pointers + peer_id, &proc, real_proc)) { OBJ_RETAIN(real_proc); } diff --git a/ompi/group/group_init.c b/ompi/group/group_init.c index 2155d262470..674e4749eda 100644 --- a/ompi/group/group_init.c +++ b/ompi/group/group_init.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -319,8 +319,8 @@ int ompi_group_init(void) { /* initialize ompi_group_f_to_c_table */ OBJ_CONSTRUCT( &ompi_group_f_to_c_table, opal_pointer_array_t); - if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_group_f_to_c_table, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_group_f_to_c_table, 4, + OMPI_FORTRAN_HANDLE_MAX, 16) ) { return OMPI_ERROR; } diff --git a/ompi/group/group_plist.c b/ompi/group/group_plist.c index 62007154f3b..244cd17385e 100644 --- a/ompi/group/group_plist.c +++ b/ompi/group/group_plist.c @@ -16,6 +16,7 @@ * reserved. 
* Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -24,6 +25,7 @@ */ #include "ompi_config.h" +#include "opal/class/opal_bitmap.h" #include "ompi/group/group.h" #include "ompi/constants.h" #include "ompi/proc/proc.h" diff --git a/ompi/include/mpi.h.in b/ompi/include/mpi.h.in index f9d21c636b1..e84435fabdc 100644 --- a/ompi/include/mpi.h.in +++ b/ompi/include/mpi.h.in @@ -17,7 +17,7 @@ * reserved. * Copyright (c) 2011-2013 INRIA. All rights reserved. * Copyright (c) 2015 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -158,7 +158,7 @@ #undef OMPI_MPI_COUNT_TYPE /* type to use for ptrdiff_t, if it does not exist, set to ptrdiff_t if it does exist */ -#undef OPAL_PTRDIFF_TYPE +#undef ptrdiff_t /* Whether we want MPI cxx support or not */ #undef OMPI_BUILD_CXX_BINDINGS @@ -195,9 +195,6 @@ /* Whether C compiler supports -fvisibility */ #undef OPAL_C_HAVE_VISIBILITY -/* Whether OMPI should provide MPI File interface */ -#undef OMPI_PROVIDE_MPI_FILE_INTERFACE - #ifndef OMPI_DECLSPEC # if defined(WIN32) || defined(_WIN32) # if defined(OMPI_IMPORTS) @@ -293,7 +290,7 @@ * To accomodate programs written for MPI implementations that use a * straight ROMIO import */ -#if !OMPI_BUILDING && OMPI_PROVIDE_MPI_FILE_INTERFACE +#if !OMPI_BUILDING #define MPIO_Request MPI_Request #define MPIO_Test MPI_Test #define MPIO_Wait MPI_Wait @@ -329,9 +326,7 @@ typedef OMPI_MPI_COUNT_TYPE MPI_Count; typedef struct ompi_communicator_t *MPI_Comm; typedef struct ompi_datatype_t *MPI_Datatype; typedef struct ompi_errhandler_t *MPI_Errhandler; -#if 
OMPI_PROVIDE_MPI_FILE_INTERFACE typedef struct ompi_file_t *MPI_File; -#endif typedef struct ompi_group_t *MPI_Group; typedef struct ompi_info_t *MPI_Info; typedef struct ompi_op_t *MPI_Op; @@ -380,7 +375,6 @@ typedef void (MPI_Comm_errhandler_function)(MPI_Comm *, int *, ...); typedef MPI_Comm_errhandler_function MPI_Comm_errhandler_fn __mpi_interface_deprecated__("MPI_Comm_errhandler_fn was deprecated in MPI-2.2; use MPI_Comm_errhandler_function instead"); -#if OMPI_PROVIDE_MPI_FILE_INTERFACE /* This is a little hackish, but errhandler.h needs space for a MPI_File_errhandler_fn. While it could just be removed, this allows us to maintain a stable ABI within OMPI, at least for @@ -389,10 +383,6 @@ typedef void (ompi_file_errhandler_fn)(MPI_File *, int *, ...); typedef ompi_file_errhandler_fn MPI_File_errhandler_fn __mpi_interface_deprecated__("MPI_File_errhandler_fn was deprecated in MPI-2.2; use MPI_File_errhandler_function instead"); typedef ompi_file_errhandler_fn MPI_File_errhandler_function; -#else -struct ompi_file_t; -typedef void (ompi_file_errhandler_fn)(struct ompi_file_t**, int *, ...); -#endif typedef void (MPI_Win_errhandler_function)(MPI_Win *, int *, ...); typedef MPI_Win_errhandler_function MPI_Win_errhandler_fn __mpi_interface_deprecated__("MPI_Win_errhandler_fn was deprecated in MPI-2.2; use MPI_Win_errhandler_function instead"); @@ -453,7 +443,6 @@ typedef int (MPI_Grequest_cancel_function)(void *, int); #define MPI_DISTRIBUTE_NONE 2 /* not distributed */ #define MPI_DISTRIBUTE_DFLT_DARG (-1) /* default distribution arg */ -#if OMPI_PROVIDE_MPI_FILE_INTERFACE /* * Since these values are arbitrary to Open MPI, we might as well make * them the same as ROMIO for ease of mapping. 
These values taken @@ -478,8 +467,6 @@ typedef int (MPI_Grequest_cancel_function)(void *, int); /* Max data representation length */ #define MPI_MAX_DATAREP_STRING OPAL_MAX_DATAREP_STRING -#endif /* #if OMPI_PROVIDE_MPI_FILE_INTERFACE */ - /* * MPI-2 One-Sided Communications asserts */ @@ -758,9 +745,7 @@ enum { #define MPI_ERRHANDLER_NULL OMPI_PREDEFINED_GLOBAL(MPI_Errhandler, ompi_mpi_errhandler_null) #define MPI_INFO_NULL OMPI_PREDEFINED_GLOBAL(MPI_Info, ompi_mpi_info_null) #define MPI_WIN_NULL OMPI_PREDEFINED_GLOBAL(MPI_Win, ompi_mpi_win_null) -#if OMPI_PROVIDE_MPI_FILE_INTERFACE #define MPI_FILE_NULL OMPI_PREDEFINED_GLOBAL(MPI_File, ompi_mpi_file_null) -#endif #define MPI_T_ENUM_NULL ((MPI_T_enum) NULL) /* @@ -1359,7 +1344,6 @@ OMPI_DECLSPEC int MPI_Fetch_and_op(const void *origin_addr, void *result_addr, int target_rank, MPI_Aint target_disp, MPI_Op op, MPI_Win win); OMPI_DECLSPEC int MPI_Iexscan(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm, MPI_Request *request); -#if OMPI_PROVIDE_MPI_FILE_INTERFACE OMPI_DECLSPEC MPI_Fint MPI_File_c2f(MPI_File file); OMPI_DECLSPEC MPI_File MPI_File_f2c(MPI_Fint file); OMPI_DECLSPEC int MPI_File_call_errhandler(MPI_File fh, int errorcode); @@ -1456,7 +1440,6 @@ OMPI_DECLSPEC int MPI_File_get_type_extent(MPI_File fh, MPI_Datatype datatype, OMPI_DECLSPEC int MPI_File_set_atomicity(MPI_File fh, int flag); OMPI_DECLSPEC int MPI_File_get_atomicity(MPI_File fh, int *flag); OMPI_DECLSPEC int MPI_File_sync(MPI_File fh); -#endif /* #if OMPI_PROVIDE_MPI_FILE_INTERFACE */ OMPI_DECLSPEC int MPI_Finalize(void); OMPI_DECLSPEC int MPI_Finalized(int *flag); OMPI_DECLSPEC int MPI_Free_mem(void *base); @@ -2059,7 +2042,6 @@ OMPI_DECLSPEC int PMPI_Fetch_and_op(const void *origin_addr, void *result_addr, int target_rank, MPI_Aint target_disp, MPI_Op op, MPI_Win win); OMPI_DECLSPEC int PMPI_Iexscan(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm, 
MPI_Request *request); -#if OMPI_PROVIDE_MPI_FILE_INTERFACE OMPI_DECLSPEC MPI_Fint PMPI_File_c2f(MPI_File file); OMPI_DECLSPEC MPI_File PMPI_File_f2c(MPI_Fint file); OMPI_DECLSPEC int PMPI_File_call_errhandler(MPI_File fh, int errorcode); @@ -2156,7 +2138,6 @@ OMPI_DECLSPEC int PMPI_File_get_type_extent(MPI_File fh, MPI_Datatype datatype, OMPI_DECLSPEC int PMPI_File_set_atomicity(MPI_File fh, int flag); OMPI_DECLSPEC int PMPI_File_get_atomicity(MPI_File fh, int *flag); OMPI_DECLSPEC int PMPI_File_sync(MPI_File fh); -#endif /* #if OMPI_PROVIDE_MPI_FILE_INTERFACE */ OMPI_DECLSPEC int PMPI_Finalize(void); OMPI_DECLSPEC int PMPI_Finalized(int *flag); OMPI_DECLSPEC int PMPI_Free_mem(void *base); @@ -2703,11 +2684,4 @@ OMPI_DECLSPEC int MPI_T_enum_get_item(MPI_T_enum enumtype, int index, int *valu #endif #endif -#if !OMPI_PROVIDE_MPI_FILE_INTERFACE && !OMPI_BUILDING -/* ROMIO requires MPI implementations to set this to 1 if they provide - MPI_OFFSET. We need to provide it because its used throughout the - DDT engine */ -#define HAVE_MPI_OFFSET 1 -#endif - #endif /* OMPI_MPI_H */ diff --git a/ompi/include/mpif-config.h.in b/ompi/include/mpif-config.h.in index 527d80035a2..a3a6d7b0c1e 100644 --- a/ompi/include/mpif-config.h.in +++ b/ompi/include/mpif-config.h.in @@ -10,7 +10,7 @@ ! University of Stuttgart. All rights reserved. ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. -! Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2013 Los Alamos National Security, LLC. All rights ! reserved. ! $COPYRIGHT$ @@ -74,7 +74,8 @@ ! logical MPI_SUBARRAYS_SUPPORTED logical MPI_ASYNC_PROTECTS_NONBLOCKING - parameter (MPI_SUBARRAYS_SUPPORTED=@OMPI_FORTRAN_SUBARRAYS_SUPPORTED@) + ! Hard-coded for .false. for now + parameter (MPI_SUBARRAYS_SUPPORTED= .false.) ! Hard-coded for .false. for now parameter (MPI_ASYNC_PROTECTS_NONBLOCKING = .false.) 
diff --git a/ompi/include/mpif-externals.h b/ompi/include/mpif-externals.h index afeb89ac0cd..31e15f7aa03 100644 --- a/ompi/include/mpif-externals.h +++ b/ompi/include/mpif-externals.h @@ -10,7 +10,7 @@ ! University of Stuttgart. All rights reserved. ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. -! Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved ! $COPYRIGHT$ ! ! Additional copyrights may follow @@ -41,4 +41,8 @@ ! external MPI_WTIME, MPI_WTICK , PMPI_WTICK, PMPI_WTIME double precision MPI_WTIME, MPI_WTICK , PMPI_WTICK, PMPI_WTIME - +! +! address integer functions +! + external MPI_AINT_ADD, MPI_AINT_DIFF + integer(kind=MPI_ADDRESS_KIND) MPI_AINT_ADD, MPI_AINT_DIFF diff --git a/ompi/include/mpif-values.pl b/ompi/include/mpif-values.pl index bb14229dbd7..1b955ec50d1 100755 --- a/ompi/include/mpif-values.pl +++ b/ompi/include/mpif-values.pl @@ -1,7 +1,7 @@ #!/usr/bin/env perl # # Copyright (c) 2011-2014 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2016 Research Organization for Information Science +# Copyright (c) 2016-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # Copyright (c) 2016 FUJITSU LIMITED. All rights reserved. 
# $COPYRIGHT$ @@ -511,7 +511,6 @@ sub write_fortran_file { $output .= "#define OMPI_$key $handles->{$key}\n"; } -$output .= "\n#if OMPI_PROVIDE_MPI_FILE_INTERFACE\n"; foreach my $key (sort(keys(%{$io_constants}))) { $output .= "#define OMPI_$key $io_constants->{$key}\n"; } @@ -522,9 +521,8 @@ sub write_fortran_file { foreach my $key (sort(keys(%{$io_handles}))) { $output .= "#define OMPI_$key $io_handles->{$key}\n"; } -$output .= "#endif /* OMPI_PROVIDE_MPI_FILE_INTERFACE */ - -#endif /* USE_MPI_F08_CONSTANTS_H */\n"; +$output .= "\n"; +$output .= "#endif /* USE_MPI_F08_CONSTANTS_H */\n"; write_file("$topdir/ompi/mpi/fortran/use-mpi-f08/constants.h", $output); diff --git a/ompi/include/mpif.h.in b/ompi/include/mpif.h.in index d9b825e4b37..d4cbd138325 100644 --- a/ompi/include/mpif.h.in +++ b/ompi/include/mpif.h.in @@ -11,6 +11,8 @@ ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. ! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2017 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! Additional copyrights may follow @@ -54,8 +56,8 @@ include 'mpif-config.h' include 'mpif-constants.h' include 'mpif-handles.h' - @OMPI_MPIF_IO_CONSTANTS_INCLUDE@ - @OMPI_MPIF_IO_HANDLES_INCLUDE@ + include 'mpif-io-constants.h' + include 'mpif-io-handles.h' include 'mpif-externals.h' include 'mpif-sentinels.h' include 'mpif-sizeof.h' diff --git a/ompi/include/ompi/memchecker.h b/ompi/include/ompi/memchecker.h index 90a89199353..a56f065c364 100644 --- a/ompi/include/ompi/memchecker.h +++ b/ompi/include/ompi/memchecker.h @@ -6,7 +6,7 @@ * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2012-2013 Inria. All rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). 
All rights reserved. * Copyright (c) 2014 Intel, Inc. All rights reserved. * @@ -353,10 +353,10 @@ static inline int memchecker_datatype(MPI_Datatype type) opal_memchecker_base_isdefined (&type->super.id, sizeof(uint16_t)); opal_memchecker_base_isdefined (&type->super.bdt_used, sizeof(uint32_t)); opal_memchecker_base_isdefined (&type->super.size, sizeof(size_t)); - opal_memchecker_base_isdefined (&type->super.true_lb, sizeof(OPAL_PTRDIFF_T)); - opal_memchecker_base_isdefined (&type->super.true_ub, sizeof(OPAL_PTRDIFF_T)); - opal_memchecker_base_isdefined (&type->super.lb, sizeof(OPAL_PTRDIFF_T)); - opal_memchecker_base_isdefined (&type->super.ub, sizeof(OPAL_PTRDIFF_T)); + opal_memchecker_base_isdefined (&type->super.true_lb, sizeof(ptrdiff_t)); + opal_memchecker_base_isdefined (&type->super.true_ub, sizeof(ptrdiff_t)); + opal_memchecker_base_isdefined (&type->super.lb, sizeof(ptrdiff_t)); + opal_memchecker_base_isdefined (&type->super.ub, sizeof(ptrdiff_t)); opal_memchecker_base_isdefined (&type->super.align, sizeof(uint32_t)); opal_memchecker_base_isdefined (&type->super.nbElems, sizeof(uint32_t)); /* name... 
*/ @@ -366,7 +366,8 @@ static inline int memchecker_datatype(MPI_Datatype type) opal_memchecker_base_isdefined (&type->super.opt_desc.length, sizeof(opal_datatype_count_t)); opal_memchecker_base_isdefined (&type->super.opt_desc.used, sizeof(opal_datatype_count_t)); opal_memchecker_base_isdefined (&type->super.opt_desc.desc, sizeof(dt_elem_desc_t *)); - opal_memchecker_base_isdefined (&type->super.btypes, OPAL_DATATYPE_MAX_PREDEFINED * sizeof(uint32_t)); + if( NULL != type->super.ptypes ) + opal_memchecker_base_isdefined (&type->super.ptypes, OPAL_DATATYPE_MAX_PREDEFINED * sizeof(size_t)); opal_memchecker_base_isdefined (&type->id, sizeof(int32_t)); opal_memchecker_base_isdefined (&type->d_f_to_c_index, sizeof(int32_t)); diff --git a/ompi/info/info.c b/ompi/info/info.c index 8f56311edfb..f209ca00574 100644 --- a/ompi/info/info.c +++ b/ompi/info/info.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -16,6 +16,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -43,46 +44,33 @@ #include "opal/util/opal_getcwd.h" #include "opal/util/output.h" #include "opal/util/strncpy.h" +#include "opal/util/info.h" #include "ompi/info/info.h" #include "ompi/runtime/mpiruntime.h" #include "ompi/runtime/params.h" - /* * Global variables */ -ompi_predefined_info_t ompi_mpi_info_null = {{{{0}}}}; +ompi_predefined_info_t ompi_mpi_info_null = {{{{{0}}}}}; ompi_predefined_info_t *ompi_mpi_info_null_addr = &ompi_mpi_info_null; -ompi_predefined_info_t ompi_mpi_info_env = {{{{0}}}}; - +ompi_predefined_info_t ompi_mpi_info_env = {{{{{0}}}}}; /* * Local functions */ static void info_constructor(ompi_info_t *info); static void info_destructor(ompi_info_t *info); -static void info_entry_constructor(ompi_info_entry_t *entry); -static void info_entry_destructor(ompi_info_entry_t *entry); -static ompi_info_entry_t *info_find_key (ompi_info_t *info, const char *key); - /* * ompi_info_t classes */ OBJ_CLASS_INSTANCE(ompi_info_t, - opal_list_t, + opal_info_t, info_constructor, info_destructor); -/* - * ompi_info_entry_t classes - */ -OBJ_CLASS_INSTANCE(ompi_info_entry_t, - opal_list_item_t, - info_entry_constructor, - info_entry_destructor); - /* * The global fortran <-> C translation table */ @@ -93,7 +81,7 @@ opal_pointer_array_t ompi_info_f_to_c_table = {{0}}; * fortran to C translation table. 
It also fills in the values * for the MPI_INFO_GET_ENV object */ -int ompi_info_init(void) +int ompi_mpiinfo_init(void) { char val[OPAL_MAXHOSTNAMELEN]; char *cptr; @@ -102,7 +90,7 @@ int ompi_info_init(void) OBJ_CONSTRUCT(&ompi_info_f_to_c_table, opal_pointer_array_t); if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_info_f_to_c_table, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + OMPI_FORTRAN_HANDLE_MAX, 16) ) { return OMPI_ERROR; } @@ -118,35 +106,35 @@ int ompi_info_init(void) /* command for this app_context */ if (NULL != (cptr = getenv("OMPI_COMMAND"))) { - ompi_info_set(&ompi_mpi_info_env.info, "command", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "command", cptr); } /* space-separated list of argv for this command */ if (NULL != (cptr = getenv("OMPI_ARGV"))) { - ompi_info_set(&ompi_mpi_info_env.info, "argv", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "argv", cptr); } /* max procs for the entire job */ if (NULL != (cptr = getenv("OMPI_MCA_orte_ess_num_procs"))) { - ompi_info_set(&ompi_mpi_info_env.info, "maxprocs", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "maxprocs", cptr); /* Open MPI does not support the "soft" option, so set it to maxprocs */ - ompi_info_set(&ompi_mpi_info_env.info, "soft", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "soft", cptr); } /* local host name */ gethostname(val, sizeof(val)); - ompi_info_set(&ompi_mpi_info_env.info, "host", val); + opal_info_set(&ompi_mpi_info_env.info.super, "host", val); /* architecture name */ if (NULL != (cptr = getenv("OMPI_MCA_orte_cpu_type"))) { - ompi_info_set(&ompi_mpi_info_env.info, "arch", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "arch", cptr); } #ifdef HAVE_SYS_UTSNAME_H else { struct utsname sysname; uname(&sysname); cptr = sysname.machine; - ompi_info_set(&ompi_mpi_info_env.info, "arch", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "arch", cptr); } #endif @@ -155,7 +143,7 @@ int ompi_info_init(void) * of determining the value */ if (NULL 
!= (cptr = getenv("OMPI_MCA_initial_wdir"))) { - ompi_info_set(&ompi_mpi_info_env.info, "wdir", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "wdir", cptr); } /* provide the REQUESTED thread level - may be different @@ -163,16 +151,16 @@ int ompi_info_init(void) * ugly, but have to do a switch to find the string representation */ switch (ompi_mpi_thread_requested) { case MPI_THREAD_SINGLE: - ompi_info_set(&ompi_mpi_info_env.info, "thread_level", "MPI_THREAD_SINGLE"); + opal_info_set(&ompi_mpi_info_env.info.super, "thread_level", "MPI_THREAD_SINGLE"); break; case MPI_THREAD_FUNNELED: - ompi_info_set(&ompi_mpi_info_env.info, "thread_level", "MPI_THREAD_FUNNELED"); + opal_info_set(&ompi_mpi_info_env.info.super, "thread_level", "MPI_THREAD_FUNNELED"); break; case MPI_THREAD_SERIALIZED: - ompi_info_set(&ompi_mpi_info_env.info, "thread_level", "MPI_THREAD_SERIALIZED"); + opal_info_set(&ompi_mpi_info_env.info.super, "thread_level", "MPI_THREAD_SERIALIZED"); break; case MPI_THREAD_MULTIPLE: - ompi_info_set(&ompi_mpi_info_env.info, "thread_level", "MPI_THREAD_MULTIPLE"); + opal_info_set(&ompi_mpi_info_env.info.super, "thread_level", "MPI_THREAD_MULTIPLE"); break; default: /* do nothing - don't know the value */ @@ -183,24 +171,24 @@ int ompi_info_init(void) /* the number of app_contexts in this job */ if (NULL != (cptr = getenv("OMPI_NUM_APP_CTX"))) { - ompi_info_set(&ompi_mpi_info_env.info, "ompi_num_apps", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "ompi_num_apps", cptr); } /* space-separated list of first MPI rank of each app_context */ if (NULL != (cptr = getenv("OMPI_FIRST_RANKS"))) { - ompi_info_set(&ompi_mpi_info_env.info, "ompi_first_rank", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "ompi_first_rank", cptr); } /* space-separated list of num procs for each app_context */ if (NULL != (cptr = getenv("OMPI_APP_CTX_NUM_PROCS"))) { - ompi_info_set(&ompi_mpi_info_env.info, "ompi_np", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, 
"ompi_np", cptr); } /* location of the directory containing any prepositioned files * the user may have requested */ if (NULL != (cptr = getenv("OMPI_FILE_LOCATION"))) { - ompi_info_set(&ompi_mpi_info_env.info, "ompi_positioned_file_dir", cptr); + opal_info_set(&ompi_mpi_info_env.info.super, "ompi_positioned_file_dir", cptr); } /* All done */ @@ -208,314 +196,69 @@ int ompi_info_init(void) return OMPI_SUCCESS; } +// Generally ompi_info_t processing is handled by opal_info_t now. +// But to avoid compiler warnings and to avoid having to constantly +// change code to mpiinfo->super to make MPI code use the opal_info_t +// it's convenient to have ompi_info_t wrappers for some of the opal_info_t +// related calls: -/* - * Duplicate an info - */ -int ompi_info_dup (ompi_info_t *info, ompi_info_t **newinfo) -{ - int err; - opal_list_item_t *item; - ompi_info_entry_t *iterator; - - OPAL_THREAD_LOCK(info->i_lock); - for (item = opal_list_get_first(&(info->super)); - item != opal_list_get_end(&(info->super)); - item = opal_list_get_next(iterator)) { - iterator = (ompi_info_entry_t *) item; - err = ompi_info_set(*newinfo, iterator->ie_key, iterator->ie_value); - if (MPI_SUCCESS != err) { - OPAL_THREAD_UNLOCK(info->i_lock); - return err; - } - } - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_SUCCESS; +int ompi_info_dup (ompi_info_t *info, ompi_info_t **newinfo) { + return opal_info_dup (&(info->super), (opal_info_t **)newinfo); } - - -/* - * Set a value on the info - */ -int ompi_info_set (ompi_info_t *info, const char *key, const char *value) -{ - char *new_value; - ompi_info_entry_t *new_info; - ompi_info_entry_t *old_info; - - new_value = strdup(value); - if (NULL == new_value) { - return MPI_ERR_NO_MEM; - } - - OPAL_THREAD_LOCK(info->i_lock); - old_info = info_find_key (info, key); - if (NULL != old_info) { - /* - * key already exists. 
remove the value associated with it - */ - free(old_info->ie_value); - old_info->ie_value = new_value; - } else { - new_info = OBJ_NEW(ompi_info_entry_t); - if (NULL == new_info) { - free(new_value); - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_ERR_NO_MEM; - } - strncpy (new_info->ie_key, key, MPI_MAX_INFO_KEY); - new_info->ie_value = new_value; - opal_list_append (&(info->super), (opal_list_item_t *) new_info); - } - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_SUCCESS; +int ompi_info_dup_mpistandard (ompi_info_t *info, ompi_info_t **newinfo) { + return opal_info_dup_mpistandard (&(info->super), (opal_info_t **)newinfo); +} +int ompi_info_set (ompi_info_t *info, const char *key, const char *value) { + return opal_info_set (&(info->super), key, value); } - - int ompi_info_set_value_enum (ompi_info_t *info, const char *key, int value, mca_base_var_enum_t *var_enum) { - char *string_value; - int ret; - - ret = var_enum->string_from_value (var_enum, value, &string_value); - if (OPAL_SUCCESS != ret) { - return ret; - } - - ret = ompi_info_set (info, key, string_value); - free (string_value); - return ret; + return opal_info_set_value_enum (&(info->super), key, value, var_enum); } - - - -/* - * Free an info handle and all of its keys and values. - */ -int ompi_info_free (ompi_info_t **info) -{ - (*info)->i_freed = true; - OBJ_RELEASE(*info); - *info = MPI_INFO_NULL; - return MPI_SUCCESS; -} - - -/* - * Get a value from an info - */ int ompi_info_get (ompi_info_t *info, const char *key, int valuelen, char *value, int *flag) { - ompi_info_entry_t *search; - int value_length; - - OPAL_THREAD_LOCK(info->i_lock); - search = info_find_key (info, key); - if (NULL == search){ - *flag = 0; - } else { - /* - * We have found the element, so we can return the value - * Set the flag, value_length and value - */ - *flag = 1; - value_length = strlen(search->ie_value); - /* - * If the stored value is shorter than valuelen, then - * we can copy the entire value out. 
Else, we have to - * copy ONLY valuelen bytes out - */ - if (value_length < valuelen ) { - strcpy(value, search->ie_value); - } else { - opal_strncpy(value, search->ie_value, valuelen); - if (MPI_MAX_INFO_VAL == valuelen) { - value[valuelen-1] = 0; - } else { - value[valuelen] = 0; - } - } - } - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_SUCCESS; + return opal_info_get (&(info->super), key, valuelen, value, flag); } - int ompi_info_get_value_enum (ompi_info_t *info, const char *key, int *value, int default_value, mca_base_var_enum_t *var_enum, int *flag) { - ompi_info_entry_t *search; - int ret; - - *value = default_value; - - OPAL_THREAD_LOCK(info->i_lock); - search = info_find_key (info, key); - if (NULL == search){ - OPAL_THREAD_UNLOCK(info->i_lock); - *flag = 0; - return MPI_SUCCESS; - } - - /* we found a mathing key. pass the string value to the enumerator and - * return */ - *flag = 1; - - ret = var_enum->value_from_string (var_enum, search->ie_value, value); - OPAL_THREAD_UNLOCK(info->i_lock); - - return ret; + return opal_info_get_value_enum (&(info->super), key, value, + default_value, var_enum, flag); } - - -/* - * Similar to ompi_info_get(), but cast the result into a boolean - * using some well-defined rules. 
- */ -int ompi_info_get_bool(ompi_info_t *info, char *key, bool *value, int *flag) -{ - char *ptr; - char str[256]; - - str[sizeof(str) - 1] = '\0'; - ompi_info_get(info, key, sizeof(str) - 1, str, flag); - if (*flag) { - *value = false; - - /* Trim whitespace */ - ptr = str + sizeof(str) - 1; - while (ptr >= str && isspace(*ptr)) { - *ptr = '\0'; - --ptr; - } - ptr = str; - while (ptr < str + sizeof(str) - 1 && *ptr != '\0' && - isspace(*ptr)) { - ++ptr; - } - if ('\0' != *ptr) { - if (isdigit(*ptr)) { - *value = (bool) atoi(ptr); - } else if (0 == strcasecmp(ptr, "yes") || - 0 == strcasecmp(ptr, "true")) { - *value = true; - } else if (0 != strcasecmp(ptr, "no") && - 0 != strcasecmp(ptr, "false")) { - /* RHC unrecognized value -- print a warning? */ - } - } - } - return MPI_SUCCESS; +int ompi_info_get_bool(ompi_info_t *info, char *key, bool *value, int *flag) { + return opal_info_get_bool(&(info->super), key, value, flag); } - -/* - * Delete a key from an info - */ -int ompi_info_delete (ompi_info_t *info, const char *key) -{ - ompi_info_entry_t *search; - - OPAL_THREAD_LOCK(info->i_lock); - search = info_find_key (info, key); - if (NULL == search){ - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_ERR_INFO_NOKEY; - } else { - /* - * An entry with this key value was found. Remove the item - * and free the memory allocated to it. - * As this key *must* be available, we do not check for errors. 
- */ - opal_list_remove_item (&(info->super), - (opal_list_item_t *)search); - OBJ_RELEASE(search); - } - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_SUCCESS; +int ompi_info_delete (ompi_info_t *info, const char *key) { + return opal_info_delete (&(info->super), key); } - - -/* - * Return the length of a value - */ int ompi_info_get_valuelen (ompi_info_t *info, const char *key, int *valuelen, int *flag) { - ompi_info_entry_t *search; - - OPAL_THREAD_LOCK(info->i_lock); - search = info_find_key (info, key); - if (NULL == search){ - *flag = 0; - } else { - /* - * We have found the element, so we can return the value - * Set the flag, value_length and value - */ - *flag = 1; - *valuelen = strlen(search->ie_value); - } - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_SUCCESS; + return opal_info_get_valuelen (&(info->super), key, valuelen, flag); } - - -/* - * Get the nth key - */ -int ompi_info_get_nthkey (ompi_info_t *info, int n, char *key) +int ompi_info_get_nthkey (ompi_info_t *info, int n, char *key) { + return opal_info_get_nthkey (&(info->super), n, key); +} +int ompi_info_get_nkeys(ompi_info_t *info, int *nkeys) { - ompi_info_entry_t *iterator; - - /* - * Iterate over and over till we get to the nth key - */ - OPAL_THREAD_LOCK(info->i_lock); - for (iterator = (ompi_info_entry_t *)opal_list_get_first(&(info->super)); - n > 0; - --n) { - iterator = (ompi_info_entry_t *)opal_list_get_next(iterator); - if (opal_list_get_end(&(info->super)) == - (opal_list_item_t *) iterator) { - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_ERR_ARG; - } - } - /* - * iterator is of the type opal_list_item_t. 
We have to - * cast it to ompi_info_entry_t before we can use it to - * access the value - */ - strncpy(key, iterator->ie_key, MPI_MAX_INFO_KEY); - OPAL_THREAD_UNLOCK(info->i_lock); - return MPI_SUCCESS; + return opal_info_get_nkeys (&(info->super), nkeys); } /* * Shut down MPI_Info handling */ -int ompi_info_finalize(void) +int ompi_mpiinfo_finalize(void) { size_t i, max; ompi_info_t *info; opal_list_item_t *item; - ompi_info_entry_t *entry; + opal_info_entry_t *entry; bool found = false; - /* Release MPI_INFO_NULL. Do this so that we don't get a bogus - leak report on it. Plus, it's statically allocated, so we - don't want to call OBJ_RELEASE on it. */ - - OBJ_DESTRUCT(&ompi_mpi_info_null.info); - opal_pointer_array_set_item(&ompi_info_f_to_c_table, 0, NULL); - - /* ditto for MPI_INFO_GET_ENV */ - OBJ_DESTRUCT(&ompi_mpi_info_env.info); - opal_pointer_array_set_item(&ompi_info_f_to_c_table, 1, NULL); - /* Go through the f2c table and see if anything is left. Free them all. */ @@ -544,10 +287,11 @@ int ompi_info_finalize(void) if (!info->i_freed && ompi_debug_show_handle_leaks) { if (ompi_debug_show_handle_leaks) { opal_output(0, "WARNING: MPI_Info still allocated at MPI_FINALIZE"); - for (item = opal_list_get_first(&(info->super)); - opal_list_get_end(&(info->super)) != item; + + for (item = opal_list_get_first(&info->super.super); + opal_list_get_end(&(info->super.super)) != item; item = opal_list_get_next(item)) { - entry = (ompi_info_entry_t *) item; + entry = (opal_info_entry_t *) item; opal_output(0, "WARNING: key=\"%s\", value=\"%s\"", entry->ie_key, NULL != entry->ie_value ? entry->ie_value : "(null)"); @@ -570,10 +314,11 @@ int ompi_info_finalize(void) /* All done -- destroy the table */ OBJ_DESTRUCT(&ompi_info_f_to_c_table); - return OMPI_SUCCESS; + return OPAL_SUCCESS; } + /* * This function is invoked when OBJ_NEW() is called. 
Here, we add this * info pointer to the table and then store its index as the handle @@ -582,39 +327,26 @@ static void info_constructor(ompi_info_t *info) { info->i_f_to_c_index = opal_pointer_array_add(&ompi_info_f_to_c_table, info); - info->i_lock = OBJ_NEW(opal_mutex_t); info->i_freed = false; - /* If the user doesn't want us to ever free it, then add an extra - RETAIN here */ - +/* + * If the user doesn't want us to ever free it, then add an extra + * RETAIN here + */ if (ompi_debug_no_free_handles) { OBJ_RETAIN(&(info->super)); } } - /* - * This function is called during OBJ_DESTRUCT of "info". When this - * done, we need to remove the entry from the ompi fortran to C - * translation table - */ + * * This function is called during OBJ_DESTRUCT of "info". When this + * * done, we need to remove the entry from the opal fortran to C + * * translation table + * */ static void info_destructor(ompi_info_t *info) { - opal_list_item_t *item; - ompi_info_entry_t *iterator; - - /* Remove every key in the list */ - - for (item = opal_list_remove_first(&(info->super)); - NULL != item; - item = opal_list_remove_first(&(info->super))) { - iterator = (ompi_info_entry_t *) item; - OBJ_RELEASE(iterator); - } - - /* reset the &ompi_info_f_to_c_table entry - make sure that the - entry is in the table */ + /* reset the &ompi_info_f_to_c_table entry - make sure that the + entry is in the table */ if (MPI_UNDEFINED != info->i_f_to_c_index && NULL != opal_pointer_array_get_item(&ompi_info_f_to_c_table, @@ -623,104 +355,16 @@ static void info_destructor(ompi_info_t *info) info->i_f_to_c_index, NULL); } - /* Release the lock */ - - OBJ_RELEASE(info->i_lock); } /* - * ompi_info_entry_t interface functions - */ -static void info_entry_constructor(ompi_info_entry_t *entry) -{ - memset(entry->ie_key, 0, sizeof(entry->ie_key)); - entry->ie_key[MPI_MAX_INFO_KEY] = 0; -} - - -static void info_entry_destructor(ompi_info_entry_t *entry) -{ - if (NULL != entry->ie_value) { - 
free(entry->ie_value); - } -} - - -/* - * Find a key - * - * Do NOT thread lock in here -- the calling function is responsible - * for that. + * Free an info handle and all of its keys and values. */ -static ompi_info_entry_t *info_find_key (ompi_info_t *info, const char *key) -{ - ompi_info_entry_t *iterator; - - /* No thread locking in here! */ - - /* Iterate over all the entries. If the key is found, then - * return immediately. Else, the loop will fall of the edge - * and NULL is returned - */ - for (iterator = (ompi_info_entry_t *)opal_list_get_first(&(info->super)); - opal_list_get_end(&(info->super)) != (opal_list_item_t*) iterator; - iterator = (ompi_info_entry_t *)opal_list_get_next(iterator)) { - if (0 == strcmp(key, iterator->ie_key)) { - return iterator; - } - } - return NULL; -} - - -int -ompi_info_value_to_int(char *value, int *interp) -{ - long tmp; - char *endp; - - if (NULL == value || '\0' == value[0]) return OMPI_ERR_BAD_PARAM; - - errno = 0; - tmp = strtol(value, &endp, 10); - /* we found something not a number */ - if (*endp != '\0') return OMPI_ERR_BAD_PARAM; - /* underflow */ - if (tmp == 0 && errno == EINVAL) return OMPI_ERR_BAD_PARAM; - - *interp = (int) tmp; - - return OMPI_SUCCESS; -} - - -int -ompi_info_value_to_bool(char *value, bool *interp) +int ompi_info_free (ompi_info_t **info) { - int tmp; - - /* idiot case */ - if (NULL == value || NULL == interp) return OMPI_ERR_BAD_PARAM; - - /* is it true / false? */ - if (0 == strcmp(value, "true")) { - *interp = true; - return OMPI_SUCCESS; - } else if (0 == strcmp(value, "false")) { - *interp = false; - return OMPI_SUCCESS; - - /* is it a number? 
*/ - } else if (OMPI_SUCCESS == ompi_info_value_to_int(value, &tmp)) { - if (tmp == 0) { - *interp = false; - } else { - *interp = true; - } - return OMPI_SUCCESS; - } - - return OMPI_ERR_BAD_PARAM; + (*info)->i_freed = true; + OBJ_RELEASE(*info); + *info = MPI_INFO_NULL; + return MPI_SUCCESS; } - diff --git a/ompi/info/info.h b/ompi/info/info.h index 15881273522..6e9466bc7c0 100644 --- a/ompi/info/info.h +++ b/ompi/info/info.h @@ -10,10 +10,11 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2007-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -28,32 +29,25 @@ #include #include "mpi.h" +#include "opal/util/info.h" #include "opal/class/opal_list.h" #include "opal/class/opal_pointer_array.h" #include "opal/threads/mutex.h" #include "opal/mca/base/mca_base_var_enum.h" -/** - * \internal - * ompi_info_t structure. MPI_Info is a pointer to this structure - */ + struct ompi_info_t { - opal_list_t super; + struct opal_info_t super; /**< generic list pointer which is the container for (key,value) pairs */ int i_f_to_c_index; /**< fortran handle for info. 
This is needed for translation from fortran to C and vice versa */ - opal_mutex_t *i_lock; /**< Mutex for thread safety */ bool i_freed; /**< Whether this info has been freed or not */ }; -/** - * \internal - * Convenience typedef - */ typedef struct ompi_info_t ompi_info_t; /** @@ -61,7 +55,7 @@ typedef struct ompi_info_t ompi_info_t; * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. */ -#define PREDEFINED_INFO_PAD (sizeof(void*) * 32) +#define PREDEFINED_INFO_PAD 256 struct ompi_predefined_info_t { struct ompi_info_t info; @@ -69,33 +63,8 @@ struct ompi_predefined_info_t { }; typedef struct ompi_predefined_info_t ompi_predefined_info_t; - -/** - * \internal - * - * ompi_info_entry_t object. Each item in ompi_info_list is of this - * type. It contains (key,value) pairs - */ -struct ompi_info_entry_t { - opal_list_item_t super; /**< required for opal_list_t type */ - char *ie_value; /**< value part of the (key, value) pair. 
- * Maximum length is MPI_MAX_INFO_VAL */ - char ie_key[MPI_MAX_INFO_KEY + 1]; /**< "key" part of the (key, value) - * pair */ -}; -/** - * \internal - * Convenience typedef - */ -typedef struct ompi_info_entry_t ompi_info_entry_t; - BEGIN_C_DECLS -/** - * Table for Fortran <-> C translation table - */ -extern opal_pointer_array_t ompi_info_f_to_c_table; - /** * Global instance for MPI_INFO_NULL */ @@ -106,11 +75,6 @@ OMPI_DECLSPEC extern ompi_predefined_info_t ompi_mpi_info_null; */ OMPI_DECLSPEC extern ompi_predefined_info_t *ompi_mpi_info_null_addr; -/** - * Global instance for MPI_INFO_ENV - */ -OMPI_DECLSPEC extern ompi_predefined_info_t ompi_mpi_info_env; - /** * \internal * Some declarations needed to use OBJ_NEW and OBJ_DESTRUCT macros @@ -118,229 +82,90 @@ OMPI_DECLSPEC extern ompi_predefined_info_t ompi_mpi_info_env; OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_info_t); /** - * \internal - * Some declarations needed to use OBJ_NEW and OBJ_DESTRUCT macros + * This function is invoked during ompi_mpi_init() and sets up + * MPI_Info handling. */ -OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_info_entry_t); +int ompi_mpiinfo_init(void); /** - * This function is invoked during ompi_mpi_init() and sets up - * MPI_Info handling. + * This function is used to free a ompi level info */ -int ompi_info_init(void); +int ompi_info_free (ompi_info_t **info); + /** * This functions is called during ompi_mpi_finalize() and shuts * down MPI_Info handling. */ -int ompi_info_finalize(void); +int ompi_mpiinfo_finalize(void); /** - * ompi_info_dup - Duplicate an 'MPI_Info' object - * - * @param info source info object (handle) - * @param newinfo pointer to the new info object (handle) - * - * @retval MPI_SUCCESS upon success - * @retval MPI_ERR_NO_MEM if out of memory - * - * Not only will the (key, value) pairs be duplicated, the order - * of keys will be the same in 'newinfo' as it is in 'info'. 
When - * an info object is no longer being used, it should be freed with - * 'MPI_Info_free'. + * ompi_info_foo() wrapper around various opal_info_foo() calls */ -int ompi_info_dup (ompi_info_t *info, ompi_info_t **newinfo); - +OMPI_DECLSPEC int ompi_info_dup (ompi_info_t *info, ompi_info_t **newinfo); /** - * Set a new key,value pair on info. - * - * @param info pointer to ompi_info_t object - * @param key pointer to the new key object - * @param value pointer to the new value object - * - * @retval MPI_SUCCESS upon success - * @retval MPI_ERR_NO_MEM if out of memory + * ompi_info_foo() wrapper around various opal_info_foo() calls + */ +OMPI_DECLSPEC int ompi_info_dup_mpistandard (ompi_info_t *info, ompi_info_t **newinfo); +/** + * ompi_info_foo() wrapper around various opal_info_foo() calls */ OMPI_DECLSPEC int ompi_info_set (ompi_info_t *info, const char *key, const char *value); - /** - * Set a new key,value pair from a variable enumerator. - * - * @param info pointer to ompi_info_t object - * @param key pointer to the new key object - * @param value integer value of the info key (must be valid in var_enum) - * @param var_enum variable enumerator - * - * @retval MPI_SUCCESS upon success - * @retval MPI_ERR_NO_MEM if out of memory - * @retval OPAL_ERR_VALUE_OUT_OF_BOUNDS if the value is not valid in the enumerator + * ompi_info_foo() wrapper around various opal_info_foo() calls */ OMPI_DECLSPEC int ompi_info_set_value_enum (ompi_info_t *info, const char *key, int value, mca_base_var_enum_t *var_enum); - /** - * ompi_info_free - Free an 'MPI_Info' object. - * - * @param info pointer to info (ompi_info_t *) object to be freed (handle) - * - * @retval MPI_SUCCESS - * @retval MPI_ERR_ARG - * - * Upon successful completion, 'info' will be set to - * 'MPI_INFO_NULL'. Free the info handle and all of its keys and - * values. 
+ * ompi_info_foo() wrapper around various opal_info_foo() calls */ -int ompi_info_free (ompi_info_t **info); - - /** - * Get a (key, value) pair from an 'MPI_Info' object and assign it - * into a boolen output. - * - * @param info Pointer to ompi_info_t object - * @param key null-terminated character string of the index key - * @param value Boolean output value - * @param flag true (1) if 'key' defined on 'info', false (0) if not - * (logical) - * - * @retval MPI_SUCCESS - * - * If found, the string value will be cast to the boolen output in - * the following manner: - * - * - If the string value is digits, the return value is "(bool) - * atoi(value)" - * - If the string value is (case-insensitive) "yes" or "true", the - * result is true - * - If the string value is (case-insensitive) "no" or "false", the - * result is false - * - All other values are false - */ -OMPI_DECLSPEC int ompi_info_get_bool (ompi_info_t *info, char *key, bool *value, - int *flag); - +OMPI_DECLSPEC int ompi_info_get_bool (ompi_info_t *info, char *key, bool *value, int *flag); /** - * Get a (key, value) pair from an 'MPI_Info' object and assign it - * into an integer output based on the enumerator value. 
- * - * @param info Pointer to ompi_info_t object - * @param key null-terminated character string of the index key - * @param value integer output value - * @param default_value value to use if the string does not conform to the - * values accepted by the enumerator - * @param var_enum variable enumerator for the value - * @param flag true (1) if 'key' defined on 'info', false (0) if not - * (logical) - * - * @retval MPI_SUCCESS + * ompi_info_foo() wrapper around various opal_info_foo() calls */ - OMPI_DECLSPEC int ompi_info_get_value_enum (ompi_info_t *info, const char *key, int *value, int default_value, mca_base_var_enum_t *var_enum, int *flag); - /** - * Get a (key, value) pair from an 'MPI_Info' object - * - * @param info Pointer to ompi_info_t object - * @param key null-terminated character string of the index key - * @param valuelen maximum length of 'value' (integer) - * @param value null-terminated character string of the value - * @param flag true (1) if 'key' defined on 'info', false (0) if not - * (logical) - * - * @retval MPI_SUCCESS - * - * In C and C++, 'valuelen' should be one less than the allocated - * space to allow for for the null terminator. 
+ * ompi_info_foo() wrapper around various opal_info_foo() calls */ OMPI_DECLSPEC int ompi_info_get (ompi_info_t *info, const char *key, int valuelen, char *value, int *flag); - /** - * Delete a (key,value) pair from "info" - * - * @param info ompi_info_t pointer on which we need to operate - * @param key The key portion of the (key,value) pair that - * needs to be deleted - * - * @retval MPI_SUCCESS - * @retval MPI_ERR_NOKEY + * ompi_info_foo() wrapper around various opal_info_foo() calls */ -int ompi_info_delete (ompi_info_t *info, const char *key); - +OMPI_DECLSPEC int ompi_info_delete (ompi_info_t *info, const char *key); /** - * @param info - ompi_info_t pointer object (handle) - * @param key - null-terminated character string of the index key - * @param valuelen - length of the value associated with 'key' (integer) - * @param flag - true (1) if 'key' defined on 'info', false (0) if not - * (logical) - * - * @retval MPI_SUCCESS - * @retval MPI_ERR_ARG - * @retval MPI_ERR_INFO_KEY - * - * The length returned in C and C++ does not include the end-of-string - * character. If the 'key' is not found on 'info', 'valuelen' is left - * alone. 
+ * ompi_info_foo() wrapper around various opal_info_foo() calls */ OMPI_DECLSPEC int ompi_info_get_valuelen (ompi_info_t *info, const char *key, int *valuelen, int *flag); - /** - * ompi_info_get_nthkey - Get a key indexed by integer from an 'MPI_Info' o - * - * @param info Pointer to ompi_info_t object - * @param n index of key to retrieve (integer) - * @param key character string of at least 'MPI_MAX_INFO_KEY' characters - * - * @retval MPI_SUCCESS - * @retval MPI_ERR_ARG + * ompi_info_foo() wrapper around various opal_info_foo() calls */ -int ompi_info_get_nthkey (ompi_info_t *info, int n, char *key); - +OMPI_DECLSPEC int ompi_info_get_nthkey (ompi_info_t *info, int n, char *key); /** - * Convert value string to boolean - * - * Convert value string \c value into a boolean, using the - * interpretation rules specified in MPI-2 Section 4.10. The - * strings "true", "false", and integer numbers can be converted - * into booleans. All others will return \c OMPI_ERR_BAD_PARAM - * - * @param value Value string for info key to interpret - * @param interp returned interpretation of the value key - * - * @retval OMPI_SUCCESS string was successfully interpreted - * @retval OMPI_ERR_BAD_PARAM string was not able to be interpreted + * ompi_info_foo() wrapper around various opal_info_foo() calls */ OMPI_DECLSPEC int ompi_info_value_to_bool(char *value, bool *interp); - /** - * Convert value string to integer - * - * Convert value string \c value into a integer, using the - * interpretation rules specified in MPI-2 Section 4.10. 
- * All others will return \c OMPI_ERR_BAD_PARAM - * - * @param value Value string for info key to interpret - * @param interp returned interpretation of the value key - * - * @retval OMPI_SUCCESS string was successfully interpreted - * @retval OMPI_ERR_BAD_PARAM string was not able to be interpreted + * ompi_info_foo() wrapper around various opal_info_foo() calls */ -int ompi_info_value_to_int(char *value, int *interp); +OMPI_DECLSPEC int ompi_info_get_nkeys(ompi_info_t *info, int *nkeys); + END_C_DECLS /** * Return whether this info has been freed already or not. * - * @param info Pointer to ompi_info_t object. + * @param info Pointer to opal_info_t object. * * @retval true If the info has already been freed * @retval false If the info has not yet been freed * * If the info has been freed, return true. This will likely only - * happen in a reliable manner if ompi_debug_handle_never_free is + * happen in a reliable manner if opal_debug_handle_never_free is * true, in which case an extra OBJ_RETAIN is set on the object during * OBJ_NEW, meaning that the user will never be able to actually free * the underlying object. It's a good way to find out if a process is @@ -352,18 +177,5 @@ static inline bool ompi_info_is_freed(ompi_info_t *info) } -/** - * Get the number of keys defined on on an MPI_Info object - * @param info Pointer to ompi_info_t object. - * @param nkeys Pointer to nkeys, which needs to be filled up. 
- * - * @retval The number of keys defined on info - */ -static inline int -ompi_info_get_nkeys(ompi_info_t *info, int *nkeys) -{ - *nkeys = (int) opal_list_get_size(&(info->super)); - return MPI_SUCCESS; -} #endif /* OMPI_INFO_H */ diff --git a/ompi/interlib/Makefile.am b/ompi/interlib/Makefile.am new file mode 100644 index 00000000000..1a40fe8b260 --- /dev/null +++ b/ompi/interlib/Makefile.am @@ -0,0 +1,29 @@ +# -*- makefile -*- +# +# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana +# University Research and Technology +# Corporation. All rights reserved. +# Copyright (c) 2004-2005 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. +# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, +# University of Stuttgart. All rights reserved. +# Copyright (c) 2004-2005 The Regents of the University of California. +# All rights reserved. +# Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Intel, Inc. All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# This makefile.am does not stand on its own - it is included from ompi/Makefile.am + +headers += \ + interlib/interlib.h + +lib@OMPI_LIBMPI_NAME@_la_SOURCES += \ + interlib/interlib.c diff --git a/ompi/interlib/interlib.c b/ompi/interlib/interlib.c new file mode 100644 index 00000000000..cf9cd2c7429 --- /dev/null +++ b/ompi/interlib/interlib.c @@ -0,0 +1,162 @@ +/* -*- Mode: C; c-basic-offset:4 ; -*- */ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. 
All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2008-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2015 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include + +#include "opal/mca/pmix/pmix.h" +#include "ompi/mca/rte/rte.h" +#include "ompi/interlib/interlib.h" + +#include "mpi.h" + +typedef struct { + int status; + volatile bool active; +} myreg_t; + +/* + * errhandler id + */ +static size_t interlibhandler_id = SIZE_MAX; + + +static void model_registration_callback(int status, + size_t errhandler_ref, + void *cbdata) +{ + myreg_t *trk = (myreg_t*)cbdata; + + trk->status = status; + interlibhandler_id = errhandler_ref; + trk->active = false; +} +static void model_callback(int status, + const opal_process_name_t *source, + opal_list_t *info, opal_list_t *results, + opal_pmix_notification_complete_fn_t cbfunc, + void *cbdata) +{ + opal_value_t *val; + + if (NULL != getenv("OMPI_SHOW_MODEL_CALLBACK")) { + /* we can ignore our own callback as we obviously + * know that we are MPI */ + if (NULL != info) { + OPAL_LIST_FOREACH(val, info, opal_value_t) { + if (0 == strcmp(val->key, OPAL_PMIX_PROGRAMMING_MODEL) && + 0 == strcmp(val->data.string, "MPI")) { + goto cback; + } + if (OPAL_STRING == val->type) { + opal_output(0, "OMPI Model Callback Key: %s Val %s", val->key, val->data.string); + } + } + } + } + /* otherwise, do something clever here */ + + cback: + /* we must NOT tell the event handler state machine that we + * are the last step as that will prevent it from notifying + * anyone else that might be listening for 
declarations */ + if (NULL != cbfunc) { + cbfunc(OMPI_SUCCESS, NULL, NULL, NULL, cbdata); + } +} + +int ompi_interlib_declare(int threadlevel, char *version) +{ + opal_list_t info, directives; + opal_value_t *kv; + myreg_t trk; + int ret; + + /* Register an event handler for library model declarations */ + trk.status = OPAL_ERROR; + trk.active = true; + /* give it a name so we can distinguish it */ + OBJ_CONSTRUCT(&directives, opal_list_t); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_EVENT_HDLR_NAME); + kv->type = OPAL_STRING; + kv->data.string = strdup("MPI-Model-Declarations"); + opal_list_append(&directives, &kv->super); + /* specify the event code */ + OBJ_CONSTRUCT(&info, opal_list_t); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup("status"); // the key here is irrelevant + kv->type = OPAL_INT; + kv->data.integer = OPAL_ERR_MODEL_DECLARED; + opal_list_append(&info, &kv->super); + /* we could constrain the range to proc_local - technically, this + * isn't required so long as the code that generates + * the event stipulates its range as proc_local. 
We rely + * on that here */ + opal_pmix.register_evhandler(&info, &directives, model_callback, + model_registration_callback, + (void*)&trk); + OMPI_LAZY_WAIT_FOR_COMPLETION(trk.active); + + OPAL_LIST_DESTRUCT(&directives); + OPAL_LIST_DESTRUCT(&info); + if (OPAL_SUCCESS != trk.status) { + return trk.status; + } + + /* declare that we are present and active */ + OBJ_CONSTRUCT(&info, opal_list_t); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_PROGRAMMING_MODEL); + kv->type = OPAL_STRING; + kv->data.string = strdup("MPI"); + opal_list_append(&info, &kv->super); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_MODEL_LIBRARY_NAME); + kv->type = OPAL_STRING; + kv->data.string = strdup("OpenMPI"); + opal_list_append(&info, &kv->super); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_MODEL_LIBRARY_VERSION); + kv->type = OPAL_STRING; + kv->data.string = strdup(version); + opal_list_append(&info, &kv->super); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_THREADING_MODEL); + kv->type = OPAL_STRING; + if (MPI_THREAD_SINGLE == threadlevel) { + kv->data.string = strdup("NONE"); + } else { + kv->data.string = strdup("PTHREAD"); + } + opal_list_append(&info, &kv->super); + /* call pmix to initialize these values */ + ret = opal_pmix.init(&info); + OPAL_LIST_DESTRUCT(&info); + /* account for our refcount on pmix_init */ + opal_pmix.finalize(); + return ret; +} diff --git a/ompi/interlib/interlib.h b/ompi/interlib/interlib.h new file mode 100644 index 00000000000..404c3e56043 --- /dev/null +++ b/ompi/interlib/interlib.h @@ -0,0 +1,45 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2011 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. 
+ * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2008-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * reserved. + * Copyright (c) 2016 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ +/** @file **/ + +#ifndef OMPI_INTERLIB_H +#define OMPI_INTERLIB_H + +#include "ompi_config.h" + + +BEGIN_C_DECLS + +/* declare the presence of the OMPI library to other + * libraries that may be used in this application, and + * register for callbacks when any other such libraries + * declare themselves */ +OMPI_DECLSPEC int ompi_interlib_declare(int threadlevel, char *version); + + +END_C_DECLS + +#endif /* OMPI_INTERLIB_H */ diff --git a/ompi/mca/bml/r2/Makefile.am b/ompi/mca/bml/r2/Makefile.am index 533a5bc86c1..cddb1e32ec9 100644 --- a/ompi/mca/bml/r2/Makefile.am +++ b/ompi/mca/bml/r2/Makefile.am @@ -8,6 +8,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -37,6 +38,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_bml_r2_la_SOURCES = $(r2_sources) mca_bml_r2_la_LDFLAGS = -module -avoid-version +mca_bml_r2_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_bml_r2_la_SOURCES = $(r2_sources) diff --git a/ompi/mca/coll/base/Makefile.am b/ompi/mca/coll/base/Makefile.am index 21c144bf782..e513dce6049 100644 --- a/ompi/mca/coll/base/Makefile.am +++ b/ompi/mca/coll/base/Makefile.am @@ -42,4 +42,7 @@ libmca_coll_la_SOURCES += \ base/coll_base_alltoallv.c \ base/coll_base_reduce.c \ base/coll_base_barrier.c \ - base/coll_base_reduce_scatter.c + base/coll_base_reduce_scatter.c \ + base/coll_base_reduce_scatter_block.c \ + base/coll_base_exscan.c \ + base/coll_base_scan.c diff --git a/ompi/mca/coll/base/coll_base_allgather.c b/ompi/mca/coll/base/coll_base_allgather.c index 3ceea29ceb9..c774b3cd41d 100644 --- a/ompi/mca/coll/base/coll_base_allgather.c +++ b/ompi/mca/coll/base/coll_base_allgather.c @@ -168,7 +168,7 @@ int ompi_coll_base_allgather_intra_bruck(const void *sbuf, int scount, */ if (0 != rank) { char *free_buf = NULL, *shift_buf = NULL; - ptrdiff_t span, gap; + ptrdiff_t span, gap = 0; span = opal_datatype_span(&rdtype->super, (int64_t)(size - rank) * rcount, &gap); diff --git a/ompi/mca/coll/base/coll_base_allreduce.c b/ompi/mca/coll/base/coll_base_allreduce.c index d05235ccca5..eeb1d35fb45 100644 --- a/ompi/mca/coll/base/coll_base_allreduce.c +++ b/ompi/mca/coll/base/coll_base_allreduce.c @@ -13,8 +13,10 @@ * Copyright (c) 2009 University of Houston. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All Rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
+ * Copyright (c) 2018 Siberian State University of Telecommunications + * and Information Science. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -135,8 +137,7 @@ ompi_coll_base_allreduce_intra_recursivedoubling(const void *sbuf, void *rbuf, int ret, line, rank, size, adjsize, remote, distance; int newrank, newremote, extra_ranks; char *tmpsend = NULL, *tmprecv = NULL, *tmpswap = NULL, *inplacebuf_free = NULL, *inplacebuf; - ompi_request_t *reqs[2] = {NULL, NULL}; - OPAL_PTRDIFF_TYPE span, gap; + ptrdiff_t span, gap = 0; size = ompi_comm_size(comm); rank = ompi_comm_rank(comm); @@ -215,14 +216,11 @@ ompi_coll_base_allreduce_intra_recursivedoubling(const void *sbuf, void *rbuf, (newremote * 2 + 1):(newremote + extra_ranks); /* Exchange the data */ - ret = MCA_PML_CALL(irecv(tmprecv, count, dtype, remote, - MCA_COLL_BASE_TAG_ALLREDUCE, comm, &reqs[0])); - if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; } - ret = MCA_PML_CALL(isend(tmpsend, count, dtype, remote, - MCA_COLL_BASE_TAG_ALLREDUCE, - MCA_PML_BASE_SEND_STANDARD, comm, &reqs[1])); - if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; } - ret = ompi_request_wait_all(2, reqs, MPI_STATUSES_IGNORE); + ret = ompi_coll_base_sendrecv_actual(tmpsend, count, dtype, remote, + MCA_COLL_BASE_TAG_ALLREDUCE, + tmprecv, count, dtype, remote, + MCA_COLL_BASE_TAG_ALLREDUCE, + comm, MPI_STATUS_IGNORE); if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; } /* Apply operation */ @@ -630,7 +628,7 @@ ompi_coll_base_allreduce_intra_ring_segmented(const void *sbuf, void *rbuf, int char *tmpsend = NULL, *tmprecv = NULL, *inbuf[2] = {NULL, NULL}; ptrdiff_t block_offset, max_real_segsize; ompi_request_t *reqs[2] = {NULL, NULL}; - OPAL_PTRDIFF_TYPE lb, extent, gap; + ptrdiff_t lb, extent, gap; size = ompi_comm_size(comm); rank = ompi_comm_rank(comm); @@ -911,5 +909,335 @@ ompi_coll_base_allreduce_intra_basic_linear(const void *sbuf, void *rbuf, int co return 
ompi_coll_base_bcast_intra_basic_linear(rbuf, count, dtype, 0, comm, module); } +/* + * ompi_coll_base_allreduce_intra_redscat_allgather + * + * Function: Allreduce using Rabenseifner's algorithm. + * Accepts: Same arguments as MPI_Allreduce + * Returns: MPI_SUCCESS or error code + * + * Description: an implementation of Rabenseifner's allreduce algorithm [1, 2]. + * [1] Rajeev Thakur, Rolf Rabenseifner and William Gropp. + * Optimization of Collective Communication Operations in MPICH // + * The Int. Journal of High Performance Computing Applications. Vol 19, + * Issue 1, pp. 49--66. + * [2] http://www.hlrs.de/mpi/myreduce.html. + * + * This algorithm is a combination of a reduce-scatter implemented with + * recursive vector halving and recursive distance doubling, followed + * by an allgather implemented with recursive doubling [1]. + * + * Step 1. If the number of processes is not a power of two, reduce it to + * the nearest lower power of two (p' = 2^{\floor{\log_2 p}}) + * by removing r = p - p' extra processes as follows. In the first 2r processes + * (ranks 0 to 2r - 1), all the even ranks send the second half of the input + * vector to their right neighbor (rank + 1), and all the odd ranks send + * the first half of the input vector to their left neighbor (rank - 1). + * The even ranks compute the reduction on the first half of the vector and + * the odd ranks compute the reduction on the second half. The odd ranks then + * send the result to their left neighbors (the even ranks). As a result, + * the even ranks among the first 2r processes now contain the reduction with + * the input vector on their right neighbors (the odd ranks). These odd ranks + * do not participate in the rest of the algorithm, which leaves behind + * a power-of-two number of processes. The first r even-ranked processes and + * the last p - 2r processes are now renumbered from 0 to p' - 1. + * + * Step 2.
The remaining processes now perform a reduce-scatter by using + * recursive vector halving and recursive distance doubling. The even-ranked + * processes send the second half of their buffer to rank + 1 and the odd-ranked + * processes send the first half of their buffer to rank - 1. All processes + * then compute the reduction between the local buffer and the received buffer. + * In the next log_2(p') - 1 steps, the buffers are recursively halved, and the + * distance is doubled. At the end, each of the p' processes has 1 / p' of the + * total reduction result. + * + * Step 3. An allgather is performed by using recursive vector doubling and + * distance halving. All exchanges are executed in reverse order relative + * to the recursive doubling in the previous step. If the number of processes is not + * a power of two, the total result vector must be sent to the r processes + * that were removed in the first step. + * + * Limitations: + * count >= 2^{\floor{\log_2 p}} + * commutative operations only + * intra-communicators only + * + * Memory requirements (per process): + * count * typesize + 4 * \log_2(p) * sizeof(int) = O(count) + */ +int ompi_coll_base_allreduce_intra_redscat_allgather( + const void *sbuf, void *rbuf, int count, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int *rindex = NULL, *rcount = NULL, *sindex = NULL, *scount = NULL; + + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:allreduce_intra_redscat_allgather: rank %d/%d", + rank, comm_size)); + + /* Find nearest power-of-two less than or equal to comm_size */ + int nsteps = opal_hibit(comm_size, comm->c_cube_dim + 1); /* ilog2(comm_size) */ + assert(nsteps >= 0); + int nprocs_pof2 = 1 << nsteps; /* flp2(comm_size) */ + + if (count < nprocs_pof2 || !ompi_op_is_commute(op)) { +
OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:allreduce_intra_redscat_allgather: rank %d/%d " + "count %d switching to basic linear allreduce", + rank, comm_size, count)); + return ompi_coll_base_allreduce_intra_basic_linear(sbuf, rbuf, count, dtype, + op, comm, module); + } + + int err = MPI_SUCCESS; + ptrdiff_t lb, extent, dsize, gap = 0; + ompi_datatype_get_extent(dtype, &lb, &extent); + dsize = opal_datatype_span(&dtype->super, count, &gap); + + /* Temporary buffer for receiving messages */ + char *tmp_buf = NULL; + char *tmp_buf_raw = (char *)malloc(dsize); + if (NULL == tmp_buf_raw) + return OMPI_ERR_OUT_OF_RESOURCE; + tmp_buf = tmp_buf_raw - gap; + + if (sbuf != MPI_IN_PLACE) { + err = ompi_datatype_copy_content_same_ddt(dtype, count, (char *)rbuf, + (char *)sbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + /* + * Step 1. Reduce the number of processes to the nearest lower power of two + * p' = 2^{\floor{\log_2 p}} by removing r = p - p' processes. + * 1. In the first 2r processes (ranks 0 to 2r - 1), all the even ranks send + * the second half of the input vector to their right neighbor (rank + 1) + * and all the odd ranks send the first half of the input vector to their + * left neighbor (rank - 1). + * 2. All 2r processes compute the reduction on their half. + * 3. The odd ranks then send the result to their left neighbors + * (the even ranks). + * + * The even ranks (0 to 2r - 1) now contain the reduction with the input + * vector on their right neighbors (the odd ranks). The first r even + * processes and the p - 2r last processes are renumbered from + * 0 to 2^{\floor{\log_2 p}} - 1. 
+ */ + + int vrank, step, wsize; + int nprocs_rem = comm_size - nprocs_pof2; + + if (rank < 2 * nprocs_rem) { + int count_lhalf = count / 2; + int count_rhalf = count - count_lhalf; + + if (rank % 2 != 0) { + /* + * Odd process -- exchange with rank - 1 + * Send the left half of the input vector to the left neighbor, + * Recv the right half of the input vector from the left neighbor + */ + err = ompi_coll_base_sendrecv(rbuf, count_lhalf, dtype, rank - 1, + MCA_COLL_BASE_TAG_ALLREDUCE, + (char *)tmp_buf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank - 1, + MCA_COLL_BASE_TAG_ALLREDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Reduce on the right half of the buffers (result in rbuf) */ + ompi_op_reduce(op, (char *)tmp_buf + (ptrdiff_t)count_lhalf * extent, + (char *)rbuf + count_lhalf * extent, count_rhalf, dtype); + + /* Send the right half to the left neighbor */ + err = MCA_PML_CALL(send((char *)rbuf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank - 1, + MCA_COLL_BASE_TAG_ALLREDUCE, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* This process does not participate in the recursive doubling phase */ + vrank = -1; + + } else { + /* + * Even process -- exchange with rank + 1 + * Send the right half of the input vector to the right neighbor, + * Recv the left half of the input vector from the right neighbor + */ + err = ompi_coll_base_sendrecv((char *)rbuf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank + 1, + MCA_COLL_BASE_TAG_ALLREDUCE, + tmp_buf, count_lhalf, dtype, rank + 1, + MCA_COLL_BASE_TAG_ALLREDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Reduce on the left half of the buffers (result in rbuf) */ + ompi_op_reduce(op, tmp_buf, rbuf, count_lhalf, dtype); + + /* Recv the right half from the right neighbor */ + err = MCA_PML_CALL(recv((char *)rbuf +
(ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank + 1, + MCA_COLL_BASE_TAG_ALLREDUCE, comm, + MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + vrank = rank / 2; + } + } else { /* rank >= 2 * nprocs_rem */ + vrank = rank - nprocs_rem; + } + + /* + * Step 2. Reduce-scatter implemented with recursive vector halving and + * recursive distance doubling. We have p' = 2^{\floor{\log_2 p}} + * power-of-two number of processes with new ranks (vrank) and result in rbuf. + * + * The even-ranked processes send the right half of their buffer to rank + 1 + * and the odd-ranked processes send the left half of their buffer to + * rank - 1. All processes then compute the reduction between the local + * buffer and the received buffer. In the next \log_2(p') - 1 steps, the + * buffers are recursively halved, and the distance is doubled. At the end, + * each of the p' processes has 1 / p' of the total reduction result. + */ + rindex = malloc(sizeof(*rindex) * nsteps); + sindex = malloc(sizeof(*sindex) * nsteps); + rcount = malloc(sizeof(*rcount) * nsteps); + scount = malloc(sizeof(*scount) * nsteps); + if (NULL == rindex || NULL == sindex || NULL == rcount || NULL == scount) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + + if (vrank != -1) { + step = 0; + wsize = count; + sindex[0] = rindex[0] = 0; + + for (int mask = 1; mask < nprocs_pof2; mask <<= 1) { + /* + * On each iteration: rindex[step] = sindex[step] -- beginning of the + * current window. Length of the current window is stored in wsize. + */ + int vdest = vrank ^ mask; + /* Translate vdest virtual rank to real rank */ + int dest = (vdest < nprocs_rem) ?
vdest * 2 : vdest + nprocs_rem; + + if (rank < dest) { + /* + * Recv into the left half of the current window, send the right + * half of the window to the peer (perform reduce on the left + * half of the current window) + */ + rcount[step] = wsize / 2; + scount[step] = wsize - rcount[step]; + sindex[step] = rindex[step] + rcount[step]; + } else { + /* + * Recv into the right half of the current window, send the left + * half of the window to the peer (perform reduce on the right + * half of the current window) + */ + scount[step] = wsize / 2; + rcount[step] = wsize - scount[step]; + rindex[step] = sindex[step] + scount[step]; + } + + /* Send part of data from the rbuf, recv into the tmp_buf */ + err = ompi_coll_base_sendrecv((char *)rbuf + (ptrdiff_t)sindex[step] * extent, + scount[step], dtype, dest, + MCA_COLL_BASE_TAG_ALLREDUCE, + (char *)tmp_buf + (ptrdiff_t)rindex[step] * extent, + rcount[step], dtype, dest, + MCA_COLL_BASE_TAG_ALLREDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Local reduce: rbuf[] = tmp_buf[] <op> rbuf[] */ + ompi_op_reduce(op, (char *)tmp_buf + (ptrdiff_t)rindex[step] * extent, + (char *)rbuf + (ptrdiff_t)rindex[step] * extent, + rcount[step], dtype); + + /* Move the current window to the received message */ + if (step + 1 < nsteps) { + rindex[step + 1] = rindex[step]; + sindex[step + 1] = rindex[step]; + wsize = rcount[step]; + step++; + } + } + /* + * Assertion: each process has 1 / p' of the total reduction result: + * rcount[nsteps - 1] elements in the rbuf[rindex[nsteps - 1], ...]. + */ + + /* + * Step 3. Allgather by the recursive doubling algorithm. + * Each process has 1 / p' of the total reduction result: + * rcount[nsteps - 1] elements in the rbuf[rindex[nsteps - 1], ...]. + * All exchanges are executed in reverse order relative + * to recursive doubling (previous step).
+ */ + + step = nsteps - 1; + + for (int mask = nprocs_pof2 >> 1; mask > 0; mask >>= 1) { + int vdest = vrank ^ mask; + /* Translate vdest virtual rank to real rank */ + int dest = (vdest < nprocs_rem) ? vdest * 2 : vdest + nprocs_rem; + + /* + * Send rcount[step] elements from rbuf[rindex[step]...] + * Recv scount[step] elements to rbuf[sindex[step]...] + */ + err = ompi_coll_base_sendrecv((char *)rbuf + (ptrdiff_t)rindex[step] * extent, + rcount[step], dtype, dest, + MCA_COLL_BASE_TAG_ALLREDUCE, + (char *)rbuf + (ptrdiff_t)sindex[step] * extent, + scount[step], dtype, dest, + MCA_COLL_BASE_TAG_ALLREDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + step--; + } + } + + /* + * Step 4. Send total result to excluded odd ranks. + */ + if (rank < 2 * nprocs_rem) { + if (rank % 2 != 0) { + /* Odd process -- recv result from rank - 1 */ + err = MCA_PML_CALL(recv(rbuf, count, dtype, rank - 1, + MCA_COLL_BASE_TAG_ALLREDUCE, comm, + MPI_STATUS_IGNORE)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + + } else { + /* Even process -- send result to rank + 1 */ + err = MCA_PML_CALL(send(rbuf, count, dtype, rank + 1, + MCA_COLL_BASE_TAG_ALLREDUCE, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + } + + cleanup_and_return: + if (NULL != tmp_buf_raw) + free(tmp_buf_raw); + if (NULL != rindex) + free(rindex); + if (NULL != sindex) + free(sindex); + if (NULL != rcount) + free(rcount); + if (NULL != scount) + free(scount); + return err; +} /* copied function (with appropriate renaming) ends here */ diff --git a/ompi/mca/coll/base/coll_base_alltoall.c b/ompi/mca/coll/base/coll_base_alltoall.c index 676c12612b2..3509ed36414 100644 --- a/ompi/mca/coll/base/coll_base_alltoall.c +++ b/ompi/mca/coll/base/coll_base_alltoall.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -12,8 +12,9 @@ * All rights reserved. * Copyright (c) 2013-2016 Los Alamos National Security, LLC. All Rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -42,7 +43,7 @@ mca_coll_base_alltoall_intra_basic_inplace(const void *rbuf, int rcount, mca_coll_base_module_t *module) { int i, j, size, rank, err = MPI_SUCCESS, line; - OPAL_PTRDIFF_TYPE ext, gap; + ptrdiff_t ext, gap = 0; ompi_request_t *req; char *allocated_buffer = NULL, *tmp_buffer; size_t max_size; @@ -197,7 +198,7 @@ int ompi_coll_base_alltoall_intra_bruck(const void *sbuf, int scount, int i, k, line = -1, rank, size, err = 0; int sendto, recvfrom, distance, *displs = NULL, *blen = NULL; char *tmpbuf = NULL, *tmpbuf_free = NULL; - OPAL_PTRDIFF_TYPE sext, rext, span, gap; + ptrdiff_t sext, rext, span, gap = 0; struct ompi_datatype_t *new_ddt; if (MPI_IN_PLACE == sbuf) { @@ -390,7 +391,7 @@ int ompi_coll_base_alltoall_intra_linear_sync(const void *sbuf, int scount, (max_outstanding_reqs <= 0)) ? (size - 1) : (max_outstanding_reqs)); if (0 < total_reqs) { - reqs = coll_base_comm_get_reqs(module->base_data, 2 * total_reqs); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, 2 * total_reqs); if (NULL == reqs) { error = -1; line = __LINE__; goto error_hndl; } } @@ -613,7 +614,7 @@ int ompi_coll_base_alltoall_intra_basic_linear(const void *sbuf, int scount, /* Initiate all send/recv to/from others. 
*/ - req = rreq = coll_base_comm_get_reqs(data, (size - 1) * 2); + req = rreq = ompi_coll_base_comm_get_reqs(data, (size - 1) * 2); if (NULL == req) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto err_hndl; } prcv = (char *) rbuf; diff --git a/ompi/mca/coll/base/coll_base_alltoallv.c b/ompi/mca/coll/base/coll_base_alltoallv.c index d74ebb5f016..aec8b859444 100644 --- a/ompi/mca/coll/base/coll_base_alltoallv.c +++ b/ompi/mca/coll/base/coll_base_alltoallv.c @@ -14,8 +14,9 @@ * Copyright (c) 2013 Los Alamos National Security, LLC. All Rights * reserved. * Copyright (c) 2013 FUJITSU LIMITED. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -43,30 +44,34 @@ mca_coll_base_alltoallv_intra_basic_inplace(const void *rbuf, const int *rcounts mca_coll_base_module_t *module) { int i, j, size, rank, err=MPI_SUCCESS; - ompi_request_t *req; char *allocated_buffer, *tmp_buffer; - size_t max_size, rdtype_size; - OPAL_PTRDIFF_TYPE ext, gap = 0; + size_t max_size; + ptrdiff_t ext, gap = 0; /* Initialize. */ size = ompi_comm_size(comm); rank = ompi_comm_rank(comm); - ompi_datatype_type_size(rdtype, &rdtype_size); /* If only one process, we're done. */ - if (1 == size || 0 == rdtype_size) { + if (1 == size) { return MPI_SUCCESS; } - /* Find the largest receive amount */ ompi_datatype_type_extent (rdtype, &ext); for (i = 0, max_size = 0 ; i < size ; ++i) { + if (i == rank) { + continue; + } size_t size = opal_datatype_span(&rdtype->super, rcounts[i], &gap); max_size = size > max_size ? 
size : max_size; } /* The gap will always be the same as we are working on the same datatype */ + if (OPAL_UNLIKELY(0 == max_size)) { + return MPI_SUCCESS; + } + /* Allocate a temporary buffer */ allocated_buffer = calloc (max_size, 1); if (NULL == allocated_buffer) { @@ -78,43 +83,33 @@ mca_coll_base_alltoallv_intra_basic_inplace(const void *rbuf, const int *rcounts /* in-place alltoallv slow algorithm (but works) */ for (i = 0 ; i < size ; ++i) { for (j = i+1 ; j < size ; ++j) { - if (i == rank && rcounts[j]) { + if (i == rank && 0 != rcounts[j]) { /* Copy the data into the temporary buffer */ err = ompi_datatype_copy_content_same_ddt (rdtype, rcounts[j], tmp_buffer, (char *) rbuf + rdisps[j] * ext); if (MPI_SUCCESS != err) { goto error_hndl; } /* Exchange data with the peer */ - err = MCA_PML_CALL(irecv ((char *) rbuf + rdisps[j] * ext, rcounts[j], rdtype, - j, MCA_COLL_BASE_TAG_ALLTOALLV, comm, &req)); - if (MPI_SUCCESS != err) { goto error_hndl; } - - err = MCA_PML_CALL(send ((void *) tmp_buffer, rcounts[j], rdtype, - j, MCA_COLL_BASE_TAG_ALLTOALLV, MCA_PML_BASE_SEND_STANDARD, - comm)); + err = ompi_coll_base_sendrecv_actual((void *) tmp_buffer, rcounts[j], rdtype, + j, MCA_COLL_BASE_TAG_ALLTOALLV, + (char *)rbuf + rdisps[j] * ext, rcounts[j], rdtype, + j, MCA_COLL_BASE_TAG_ALLTOALLV, + comm, MPI_STATUS_IGNORE); if (MPI_SUCCESS != err) { goto error_hndl; } - } else if (j == rank && rcounts[i]) { + } else if (j == rank && 0 != rcounts[i]) { /* Copy the data into the temporary buffer */ err = ompi_datatype_copy_content_same_ddt (rdtype, rcounts[i], tmp_buffer, (char *) rbuf + rdisps[i] * ext); if (MPI_SUCCESS != err) { goto error_hndl; } /* Exchange data with the peer */ - err = MCA_PML_CALL(irecv ((char *) rbuf + rdisps[i] * ext, rcounts[i], rdtype, - i, MCA_COLL_BASE_TAG_ALLTOALLV, comm, &req)); + err = ompi_coll_base_sendrecv_actual((void *) tmp_buffer, rcounts[i], rdtype, + i, MCA_COLL_BASE_TAG_ALLTOALLV, + (char *) rbuf + rdisps[i] * ext, rcounts[i], 
rdtype, + i, MCA_COLL_BASE_TAG_ALLTOALLV, + comm, MPI_STATUS_IGNORE); if (MPI_SUCCESS != err) { goto error_hndl; } - - err = MCA_PML_CALL(send ((void *) tmp_buffer, rcounts[i], rdtype, - i, MCA_COLL_BASE_TAG_ALLTOALLV, MCA_PML_BASE_SEND_STANDARD, - comm)); - if (MPI_SUCCESS != err) { goto error_hndl; } - } else { - continue; } - - /* Wait for the requests to complete */ - err = ompi_request_wait (&req, MPI_STATUSES_IGNORE); - if (MPI_SUCCESS != err) { goto error_hndl; } } } @@ -237,12 +232,12 @@ ompi_coll_base_alltoallv_intra_basic_linear(const void *sbuf, const int *scounts /* Now, initiate all send/recv to/from others. */ nreqs = 0; - reqs = preq = coll_base_comm_get_reqs(data, 2 * size); + reqs = preq = ompi_coll_base_comm_get_reqs(data, 2 * size); if( NULL == reqs ) { err = OMPI_ERR_OUT_OF_RESOURCE; goto err_hndl; } /* Post all receives first */ for (i = 0; i < size; ++i) { - if (i == rank || 0 == rcounts[i]) { + if (i == rank) { continue; } @@ -256,7 +251,7 @@ ompi_coll_base_alltoallv_intra_basic_linear(const void *sbuf, const int *scounts /* Now post all sends */ for (i = 0; i < size; ++i) { - if (i == rank || 0 == scounts[i]) { + if (i == rank) { continue; } diff --git a/ompi/mca/coll/base/coll_base_barrier.c b/ompi/mca/coll/base/coll_base_barrier.c index 3b3fb8ad733..a190f3be723 100644 --- a/ompi/mca/coll/base/coll_base_barrier.c +++ b/ompi/mca/coll/base/coll_base_barrier.c @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -342,7 +343,7 @@ int ompi_coll_base_barrier_intra_basic_linear(struct ompi_communicator_t *comm, /* The root collects and broadcasts the messages. 
*/ else { - requests = coll_base_comm_get_reqs(module->base_data, size); + requests = ompi_coll_base_comm_get_reqs(module->base_data, size); if( NULL == requests ) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto err_hndl; } for (i = 1; i < size; ++i) { diff --git a/ompi/mca/coll/base/coll_base_bcast.c b/ompi/mca/coll/base/coll_base_bcast.c index 737af89fe30..38210bab9df 100644 --- a/ompi/mca/coll/base/coll_base_bcast.c +++ b/ompi/mca/coll/base/coll_base_bcast.c @@ -13,6 +13,7 @@ * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -68,7 +69,7 @@ ompi_coll_base_bcast_intra_generic( void* buffer, tmpbuf = (char *) buffer; if( tree->tree_nextsize != 0 ) { - send_reqs = coll_base_comm_get_reqs(module->base_data, tree->tree_nextsize); + send_reqs = ompi_coll_base_comm_get_reqs(module->base_data, tree->tree_nextsize); if( NULL == send_reqs ) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto error_hndl; } } @@ -628,7 +629,7 @@ ompi_coll_base_bcast_intra_basic_linear(void *buff, int count, } /* Root sends data to all others. */ - preq = reqs = coll_base_comm_get_reqs(module->base_data, size-1); + preq = reqs = ompi_coll_base_comm_get_reqs(module->base_data, size-1); if( NULL == reqs ) { err = OMPI_ERR_OUT_OF_RESOURCE; goto err_hndl; } for (i = 0; i < size; ++i) { diff --git a/ompi/mca/coll/base/coll_base_exscan.c b/ompi/mca/coll/base/coll_base_exscan.c new file mode 100644 index 00000000000..ef984049ae1 --- /dev/null +++ b/ompi/mca/coll/base/coll_base_exscan.c @@ -0,0 +1,223 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2018 Siberian State University of Telecommunications + * and Information Science. All rights reserved. 
+ * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "mpi.h" +#include "ompi/constants.h" +#include "ompi/datatype/ompi_datatype.h" +#include "ompi/communicator/communicator.h" +#include "ompi/mca/coll/coll.h" +#include "ompi/mca/coll/base/coll_base_functions.h" +#include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/coll/base/coll_base_util.h" +#include "ompi/mca/pml/pml.h" +#include "ompi/op/op.h" + +/* + * ompi_coll_base_exscan_intra_linear + * + * Function: Linear algorithm for exclusive scan. + * Accepts: Same as MPI_Exscan + * Returns: MPI_SUCCESS or error code + */ +int +ompi_coll_base_exscan_intra_linear(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int size, rank, err; + ptrdiff_t dsize, gap; + char *free_buffer = NULL; + char *reduce_buffer = NULL; + + rank = ompi_comm_rank(comm); + size = ompi_comm_size(comm); + + /* For MPI_IN_PLACE, just adjust send buffer to point to + * receive buffer. */ + if (MPI_IN_PLACE == sbuf) { + sbuf = rbuf; + } + + /* If we're rank 0, then just send our sbuf to the next rank, and + * we are done. */ + if (0 == rank) { + return MCA_PML_CALL(send(sbuf, count, dtype, rank + 1, + MCA_COLL_BASE_TAG_EXSCAN, + MCA_PML_BASE_SEND_STANDARD, comm)); + } + + /* If we're the last rank, then just receive the result from the + * prior rank, and we are done. */ + else if ((size - 1) == rank) { + return MCA_PML_CALL(recv(rbuf, count, dtype, rank - 1, + MCA_COLL_BASE_TAG_EXSCAN, comm, + MPI_STATUS_IGNORE)); + } + + /* Otherwise, get the result from the prior rank, combine it with my + * data, and send it to the next rank */ + + /* Get a temporary buffer to perform the reduction into. 
Rationale + * for malloc'ing this size is provided in coll_basic_reduce.c. */ + dsize = opal_datatype_span(&dtype->super, count, &gap); + + free_buffer = (char*)malloc(dsize); + if (NULL == free_buffer) { + return OMPI_ERR_OUT_OF_RESOURCE; + } + reduce_buffer = free_buffer - gap; + err = ompi_datatype_copy_content_same_ddt(dtype, count, reduce_buffer, (char*)sbuf); + if (MPI_SUCCESS != err) { goto error; } + + /* Receive the reduced value from the prior rank */ + err = MCA_PML_CALL(recv(rbuf, count, dtype, rank - 1, + MCA_COLL_BASE_TAG_EXSCAN, comm, MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { + goto error; + } + + /* Now reduce the prior rank's result with my source buffer. The source + * buffer had been previously copied into the temporary reduce_buffer. */ + ompi_op_reduce(op, rbuf, reduce_buffer, count, dtype); + + /* Send my result off to the next rank */ + err = MCA_PML_CALL(send(reduce_buffer, count, dtype, rank + 1, + MCA_COLL_BASE_TAG_EXSCAN, + MCA_PML_BASE_SEND_STANDARD, comm)); + /* Error */ + error: + free(free_buffer); + + /* All done */ + return err; +} + + +/* + * ompi_coll_base_exscan_intra_recursivedoubling + * + * Function: Recursive doubling algorithm for exclusive scan. + * Accepts: Same as MPI_Exscan + * Returns: MPI_SUCCESS or error code + * + * Description: Implements the recursive doubling algorithm for MPI_Exscan. + * The algorithm preserves the order of operations, so it can + * be used with both commutative and non-commutative operations.
+ * + * Example for 5 processes and commutative operation MPI_SUM: + * Process: 0 1 2 3 4 + * recvbuf: - - - - - + * psend: [0] [1] [2] [3] [4] + * + * Step 1: + * recvbuf: - [0] - [2] - + * psend: [1+0] [0+1] [3+2] [2+3] [4] + * + * Step 2: + * recvbuf: - [0] [1+0] [(0+1)+2] - + * psend: [(3+2)+(1+0)] [(2+3)+(0+1)] [(1+0)+(3+2)] [(0+1)+(2+3)] [4] + * + * Step 3: + * recvbuf: - [0] [1+0] [(0+1)+2] [(3+2)+(1+0)] + * psend: [4+((3+2)+(1+0))] [((3+2)+(1+0))+4] + * + * Time complexity (worst case): \ceil(\log_2(p))(2\alpha + 2m\beta + 2m\gamma) + * Memory requirements (per process): 2 * count * typesize = O(count) + * Limitations: intra-communicators only + */ +int ompi_coll_base_exscan_intra_recursivedoubling( + const void *sendbuf, void *recvbuf, int count, struct ompi_datatype_t *datatype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int err = MPI_SUCCESS; + char *tmpsend_raw = NULL, *tmprecv_raw = NULL; + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "coll:base:exscan_intra_recursivedoubling: rank %d/%d", + rank, comm_size)); + if (count == 0) + return MPI_SUCCESS; + if (comm_size < 2) + return MPI_SUCCESS; + + ptrdiff_t dsize, gap; + dsize = opal_datatype_span(&datatype->super, count, &gap); + tmpsend_raw = malloc(dsize); + tmprecv_raw = malloc(dsize); + if (NULL == tmpsend_raw || NULL == tmprecv_raw) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + char *psend = tmpsend_raw - gap; + char *precv = tmprecv_raw - gap; + if (sendbuf != MPI_IN_PLACE) { + err = ompi_datatype_copy_content_same_ddt(datatype, count, psend, (char *)sendbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } else { + err = ompi_datatype_copy_content_same_ddt(datatype, count, psend, recvbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + int is_commute = ompi_op_is_commute(op); + int is_first_block = 1; + + for
(int mask = 1; mask < comm_size; mask <<= 1) { + int remote = rank ^ mask; + if (remote < comm_size) { + err = ompi_coll_base_sendrecv(psend, count, datatype, remote, + MCA_COLL_BASE_TAG_EXSCAN, + precv, count, datatype, remote, + MCA_COLL_BASE_TAG_EXSCAN, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + if (rank > remote) { + /* Assertion: rank > 0 and recvbuf is valid */ + if (is_first_block) { + err = ompi_datatype_copy_content_same_ddt(datatype, count, + recvbuf, precv); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + is_first_block = 0; + } else { + /* Accumulate prefix reduction: recvbuf = precv op recvbuf */ + ompi_op_reduce(op, precv, recvbuf, count, datatype); + } + /* Partial result: psend = precv op psend */ + ompi_op_reduce(op, precv, psend, count, datatype); + } else { + if (is_commute) { + /* psend = precv op psend */ + ompi_op_reduce(op, precv, psend, count, datatype); + } else { + /* precv = psend op precv */ + ompi_op_reduce(op, psend, precv, count, datatype); + char *tmp = psend; + psend = precv; + precv = tmp; + } + } + } + } + +cleanup_and_return: + if (NULL != tmpsend_raw) + free(tmpsend_raw); + if (NULL != tmprecv_raw) + free(tmprecv_raw); + return err; +} diff --git a/ompi/mca/coll/base/coll_base_find_available.c b/ompi/mca/coll/base/coll_base_find_available.c index e1f69d4ba47..b2e25944f3f 100644 --- a/ompi/mca/coll/base/coll_base_find_available.c +++ b/ompi/mca/coll/base/coll_base_find_available.c @@ -2,7 +2,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -46,9 +46,6 @@ static int init_query(const mca_base_component_t * ls, bool enable_progress_threads, bool enable_mpi_threads); -static int init_query_2_0_0(const mca_base_component_t * ls, - bool enable_progress_threads, - bool enable_mpi_threads); /* * Scan down the list of successfully opened components and query each of @@ -105,6 +102,20 @@ int mca_coll_base_find_available(bool enable_progress_threads, } +/* + * Query a specific component, coll v2.0.0 + */ +static inline int +init_query_2_0_0(const mca_base_component_t * component, + bool enable_progress_threads, + bool enable_mpi_threads) +{ + mca_coll_base_component_2_0_0_t *coll = + (mca_coll_base_component_2_0_0_t *) component; + + return coll->collm_init_query(enable_progress_threads, + enable_mpi_threads); +} /* * Query a component, see if it wants to run at all. If it does, save * some information. If it doesn't, close it. @@ -138,33 +149,11 @@ static int init_query(const mca_base_component_t * component, } /* Query done -- look at the return value to see what happened */ - - if (OMPI_SUCCESS != ret) { - opal_output_verbose(10, ompi_coll_base_framework.framework_output, - "coll:find_available: coll component %s is not available", - component->mca_component_name); - } else { - opal_output_verbose(10, ompi_coll_base_framework.framework_output, - "coll:find_available: coll component %s is available", - component->mca_component_name); - } - - /* All done */ + opal_output_verbose(10, ompi_coll_base_framework.framework_output, + "coll:find_available: coll component %s is %savailable", + component->mca_component_name, + (OMPI_SUCCESS == ret) ? 
"": "not "); return ret; } - -/* - * Query a specific component, coll v2.0.0 - */ -static int init_query_2_0_0(const mca_base_component_t * component, - bool enable_progress_threads, - bool enable_mpi_threads) -{ - mca_coll_base_component_2_0_0_t *coll = - (mca_coll_base_component_2_0_0_t *) component; - - return coll->collm_init_query(enable_progress_threads, - enable_mpi_threads); -} diff --git a/ompi/mca/coll/base/coll_base_frame.c b/ompi/mca/coll/base/coll_base_frame.c index edbbe04db1c..cd080e52030 100644 --- a/ompi/mca/coll/base/coll_base_frame.c +++ b/ompi/mca/coll/base/coll_base_frame.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -109,7 +110,7 @@ coll_base_comm_destruct(mca_coll_base_comm_t *data) OBJ_CLASS_INSTANCE(mca_coll_base_comm_t, opal_object_t, coll_base_comm_construct, coll_base_comm_destruct); -ompi_request_t** coll_base_comm_get_reqs(mca_coll_base_comm_t* data, int nreqs) +ompi_request_t** ompi_coll_base_comm_get_reqs(mca_coll_base_comm_t* data, int nreqs) { if( 0 == nreqs ) return NULL; diff --git a/ompi/mca/coll/base/coll_base_functions.h b/ompi/mca/coll/base/coll_base_functions.h index 54aa9e24353..3ea0c6dfe73 100644 --- a/ompi/mca/coll/base/coll_base_functions.h +++ b/ompi/mca/coll/base/coll_base_functions.h @@ -14,7 +14,7 @@ * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2013-2016 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. 
@@ -182,6 +182,7 @@ int ompi_coll_base_allreduce_intra_recursivedoubling(ALLREDUCE_ARGS); int ompi_coll_base_allreduce_intra_ring(ALLREDUCE_ARGS); int ompi_coll_base_allreduce_intra_ring_segmented(ALLREDUCE_ARGS, uint32_t segsize); int ompi_coll_base_allreduce_intra_basic_linear(ALLREDUCE_ARGS); +int ompi_coll_base_allreduce_intra_redscat_allgather(ALLREDUCE_ARGS); /* AlltoAll */ int ompi_coll_base_alltoall_intra_pairwise(ALLTOALL_ARGS); @@ -222,6 +223,9 @@ int ompi_coll_base_bcast_intra_bintree(BCAST_ARGS, uint32_t segsize); int ompi_coll_base_bcast_intra_split_bintree(BCAST_ARGS, uint32_t segsize); /* Exscan */ +int ompi_coll_base_exscan_intra_recursivedoubling(EXSCAN_ARGS); +int ompi_coll_base_exscan_intra_linear(EXSCAN_ARGS); +int ompi_coll_base_exscan_intra_recursivedoubling(EXSCAN_ARGS); /* Gather */ int ompi_coll_base_gather_intra_basic_linear(GATHER_ARGS); @@ -238,13 +242,23 @@ int ompi_coll_base_reduce_intra_pipeline(REDUCE_ARGS, uint32_t segsize, int max_ int ompi_coll_base_reduce_intra_binary(REDUCE_ARGS, uint32_t segsize, int max_outstanding_reqs ); int ompi_coll_base_reduce_intra_binomial(REDUCE_ARGS, uint32_t segsize, int max_outstanding_reqs ); int ompi_coll_base_reduce_intra_in_order_binary(REDUCE_ARGS, uint32_t segsize, int max_outstanding_reqs ); +int ompi_coll_base_reduce_intra_redscat_gather(REDUCE_ARGS); /* Reduce_scatter */ int ompi_coll_base_reduce_scatter_intra_nonoverlapping(REDUCESCATTER_ARGS); int ompi_coll_base_reduce_scatter_intra_basic_recursivehalving(REDUCESCATTER_ARGS); int ompi_coll_base_reduce_scatter_intra_ring(REDUCESCATTER_ARGS); +/* Reduce_scatter_block */ +int ompi_coll_base_reduce_scatter_block_basic_linear(REDUCESCATTERBLOCK_ARGS); +int ompi_coll_base_reduce_scatter_block_intra_recursivedoubling(REDUCESCATTERBLOCK_ARGS); +int ompi_coll_base_reduce_scatter_block_intra_recursivehalving(REDUCESCATTERBLOCK_ARGS); +int ompi_coll_base_reduce_scatter_block_intra_butterfly(REDUCESCATTERBLOCK_ARGS); + /* Scan */ +int 
ompi_coll_base_scan_intra_recursivedoubling(SCAN_ARGS); +int ompi_coll_base_scan_intra_linear(SCAN_ARGS); +int ompi_coll_base_scan_intra_recursivedoubling(SCAN_ARGS); /* Scatter */ int ompi_coll_base_scatter_intra_basic_linear(SCATTER_ARGS); @@ -455,6 +469,6 @@ static inline void ompi_coll_base_free_reqs(ompi_request_t **reqs, int count) * Return the array of requests on the data. If the array was not initialized * or if it's size was too small, allocate it to fit the requested size. */ -ompi_request_t** coll_base_comm_get_reqs(mca_coll_base_comm_t* data, int nreqs); +ompi_request_t** ompi_coll_base_comm_get_reqs(mca_coll_base_comm_t* data, int nreqs); #endif /* MCA_COLL_BASE_EXPORT_H */ diff --git a/ompi/mca/coll/base/coll_base_gather.c b/ompi/mca/coll/base/coll_base_gather.c index 41ae1f64105..83766bff2c8 100644 --- a/ompi/mca/coll/base/coll_base_gather.c +++ b/ompi/mca/coll/base/coll_base_gather.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2015 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2015-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -49,8 +50,8 @@ ompi_coll_base_gather_intra_binomial(const void *sbuf, int scount, char *ptmp = NULL, *tempbuf = NULL; ompi_coll_tree_t* bmtree; MPI_Status status; - MPI_Aint sextent, sgap, ssize; - MPI_Aint rextent, rgap, rsize; + MPI_Aint sextent, sgap = 0, ssize; + MPI_Aint rextent, rgap = 0, rsize; mca_coll_base_module_t *base_module = (mca_coll_base_module_t*) module; mca_coll_base_comm_t *data = base_module->base_data; @@ -267,7 +268,7 @@ ompi_coll_base_gather_intra_linear_sync(const void *sbuf, int scount, */ char *ptmp; ompi_request_t *first_segment_req; - reqs = coll_base_comm_get_reqs(module->base_data, size); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, size); if (NULL == reqs) { ret = -1; line = __LINE__; goto error_hndl; } ompi_datatype_type_size(rdtype, &typelng); diff --git a/ompi/mca/coll/base/coll_base_reduce.c b/ompi/mca/coll/base/coll_base_reduce.c index 711c0dea4c4..82838ddbcd5 100644 --- a/ompi/mca/coll/base/coll_base_reduce.c +++ b/ompi/mca/coll/base/coll_base_reduce.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2015 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -15,6 +15,8 @@ * Copyright (c) 2015-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. + * Copyright (c) 2018 Siberian State University of Telecommunications + * and Information Science. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -25,6 +27,7 @@ #include "ompi_config.h" #include "mpi.h" +#include "opal/util/bit_ops.h" #include "ompi/constants.h" #include "ompi/datatype/ompi_datatype.h" #include "ompi/communicator/communicator.h" @@ -34,6 +37,7 @@ #include "ompi/op/op.h" #include "ompi/mca/coll/base/coll_base_functions.h" #include "coll_base_topo.h" +#include "coll_base_util.h" int mca_coll_base_reduce_local(const void *inbuf, void *inoutbuf, int count, struct ompi_datatype_t * dtype, struct ompi_op_t * op, @@ -65,7 +69,7 @@ int ompi_coll_base_reduce_generic( const void* sendbuf, void* recvbuf, int origi char *inbuf[2] = {NULL, NULL}, *inbuf_free[2] = {NULL, NULL}; char *accumbuf = NULL, *accumbuf_free = NULL; char *local_op_buffer = NULL, *sendtmpbuf = NULL; - ptrdiff_t extent, size, gap, segment_increment; + ptrdiff_t extent, size, gap = 0, segment_increment; ompi_request_t **sreq = NULL, *reqs[2] = {MPI_REQUEST_NULL, MPI_REQUEST_NULL}; int num_segments, line, ret, segindex, i, rank; int recvcount, prevcount, inbi; @@ -287,7 +291,7 @@ int ompi_coll_base_reduce_generic( const void* sendbuf, void* recvbuf, int origi int creq = 0; - sreq = coll_base_comm_get_reqs(module->base_data, max_outstanding_reqs); + sreq = ompi_coll_base_comm_get_reqs(module->base_data, max_outstanding_reqs); if (NULL == sreq) { line = __LINE__; ret = -1; goto error_hndl; } /* post first group of requests */ @@ -526,7 +530,7 @@ int ompi_coll_base_reduce_intra_in_order_binary( const void *sendbuf, void *recv use_this_sendbuf = (void *)sendbuf; use_this_recvbuf = recvbuf; if (io_root != root) { - ptrdiff_t dsize, gap; + ptrdiff_t dsize, gap = 0; char *tmpbuf; dsize = opal_datatype_span(&datatype->super, count, &gap); @@ -610,7 +614,7 @@ ompi_coll_base_reduce_intra_basic_linear(const void *sbuf, void *rbuf, int count mca_coll_base_module_t *module) { int i, rank, err, size; - ptrdiff_t extent, dsize, gap; + ptrdiff_t extent, dsize, gap = 0; char *free_buffer = NULL; 
char *pml_buffer = NULL; char *inplace_temp_free = NULL; @@ -706,3 +710,395 @@ ompi_coll_base_reduce_intra_basic_linear(const void *sbuf, void *rbuf, int count return MPI_SUCCESS; } +/* + * ompi_coll_base_reduce_intra_redscat_gather + * + * Function: Reduce using Rabenseifner's algorithm. + * Accepts: Same arguments as MPI_Reduce + * Returns: MPI_SUCCESS or error code + * + * Description: an implementation of Rabenseifner's reduce algorithm [1, 2]. + * [1] Rajeev Thakur, Rolf Rabenseifner and William Gropp. + * Optimization of Collective Communication Operations in MPICH // + * The Int. Journal of High Performance Computing Applications. Vol 19, + * Issue 1, pp. 49--66. + * [2] http://www.hlrs.de/mpi/myreduce.html. + * + * This algorithm is a combination of a reduce-scatter implemented with + * recursive vector halving and recursive distance doubling, followed + * by a binomial tree gather [1]. + * + * Step 1. If the number of processes is not a power of two, reduce it to + * the nearest lower power of two (p' = 2^{\floor{\log_2 p}}) + * by removing r = p - p' extra processes as follows. In the first 2r processes + * (ranks 0 to 2r - 1), all the even ranks send the second half of the input + * vector to their right neighbor (rank + 1), and all the odd ranks send + * the first half of the input vector to their left neighbor (rank - 1). + * The even ranks compute the reduction on the first half of the vector and + * the odd ranks compute the reduction on the second half. The odd ranks then + * send the result to their left neighbors (the even ranks). As a result, + * the even ranks among the first 2r processes now contain the reduction with + * the input vector on their right neighbors (the odd ranks). These odd ranks + * do not participate in the rest of the algorithm, which leaves behind + * a power-of-two number of processes. The first r even-ranked processes and + * the last p - 2r processes are now renumbered from 0 to p' - 1. + * + * Step 2.
The remaining processes now perform a reduce-scatter by using + * recursive vector halving and recursive distance doubling. The even-ranked + * processes send the second half of their buffer to rank + 1 and the odd-ranked + * processes send the first half of their buffer to rank - 1. All processes + * then compute the reduction between the local buffer and the received buffer. + * In the next \log_2(p') - 1 steps, the buffers are recursively halved, and the + * distance is doubled. At the end, each of the p' processes has 1 / p' of the + * total reduction result. + * + * Step 3. A binomial tree gather is performed by using recursive vector + * doubling and distance halving. In the non-power-of-two case, if the root + * happens to be one of those odd-ranked processes that would normally + * be removed in the first step, then the roles of this process and process 0 + * are interchanged. + * + * Limitations: + * count >= 2^{\floor{\log_2 p}} + * commutative operations only + * intra-communicators only + * + * Memory requirements (per process): + * rank != root: 2 * count * typesize + 4 * \log_2(p) * sizeof(int) = O(count) + * rank == root: count * typesize + 4 * \log_2(p) * sizeof(int) = O(count) + * + * Recommendations: root = 0; otherwise, additional steps are required + * in the root process.
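+ *
+ * Worked example of Step 1 (an illustrative hand trace based on the vrank
+ * assignment in the code below: vrank = rank / 2 for the surviving even
+ * ranks below 2r, vrank = rank - r otherwise): for p = 6 we get p' = 4 and
+ * r = 2. Rank 0 pairs with rank 1 and rank 2 with rank 3; the odd ranks 1
+ * and 3 then drop out. Ranks 0 and 2 become vranks 0 and 1, and ranks 4
+ * and 5 become vranks 2 and 3, so Steps 2 and 3 run on p' = 4 processes.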
+ */ +int ompi_coll_base_reduce_intra_redscat_gather( + const void *sbuf, void *rbuf, int count, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, int root, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:reduce_intra_redscat_gather: rank %d/%d, root %d", + rank, comm_size, root)); + + /* Find nearest power-of-two less than or equal to comm_size */ + int nsteps = opal_hibit(comm_size, comm->c_cube_dim + 1); /* ilog2(comm_size) */ + assert(nsteps >= 0); + int nprocs_pof2 = 1 << nsteps; /* flp2(comm_size) */ + + if (nprocs_pof2 < 2 || count < nprocs_pof2 || !ompi_op_is_commute(op)) { + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:reduce_intra_redscat_gather: rank %d/%d count %d " + "switching to basic linear reduce", rank, comm_size, count)); + return ompi_coll_base_reduce_intra_basic_linear(sbuf, rbuf, count, dtype, + op, root, comm, module); + } + + int err = MPI_SUCCESS; + int *rindex = NULL, *rcount = NULL, *sindex = NULL, *scount = NULL; + ptrdiff_t lb, extent, dsize, gap; + ompi_datatype_get_extent(dtype, &lb, &extent); + dsize = opal_datatype_span(&dtype->super, count, &gap); + + /* Temporary buffers */ + char *tmp_buf_raw = NULL, *rbuf_raw = NULL; + tmp_buf_raw = malloc(dsize); + if (NULL == tmp_buf_raw) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + char *tmp_buf = tmp_buf_raw - gap; + + if (rank != root) { + rbuf_raw = malloc(dsize); + if (NULL == rbuf_raw) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + rbuf = rbuf_raw - gap; + } + + if ((rank != root) || (sbuf != MPI_IN_PLACE)) { + err = ompi_datatype_copy_content_same_ddt(dtype, count, rbuf, + (char *)sbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + /* + * Step 1. 
Reduce the number of processes to the nearest lower power of two + * p' = 2^{\floor{\log_2 p}} by removing r = p - p' processes. + * 1. In the first 2r processes (ranks 0 to 2r - 1), all the even ranks send + * the second half of the input vector to their right neighbor (rank + 1) + * and all the odd ranks send the first half of the input vector to their + * left neighbor (rank - 1). + * 2. All 2r processes compute the reduction on their half. + * 3. The odd ranks then send the result to their left neighbors + * (the even ranks). + * + * The even ranks (0 to 2r - 1) now contain the reduction with the input + * vector on their right neighbors (the odd ranks). The first r even + * processes and the p - 2r last processes are renumbered from + * 0 to 2^{\floor{\log_2 p}} - 1. These odd ranks do not participate in the + * rest of the algorithm. + */ + + int vrank, step, wsize; + int nprocs_rem = comm_size - nprocs_pof2; + + if (rank < 2 * nprocs_rem) { + int count_lhalf = count / 2; + int count_rhalf = count - count_lhalf; + + if (rank % 2 != 0) { + /* + * Odd process -- exchange with rank - 1 + * Send the left half of the input vector to the left neighbor, + * Recv the right half of the input vector from the left neighbor + */ + err = ompi_coll_base_sendrecv(rbuf, count_lhalf, dtype, rank - 1, + MCA_COLL_BASE_TAG_REDUCE, + (char *)tmp_buf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank - 1, + MCA_COLL_BASE_TAG_REDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Reduce on the right half of the buffers (result in rbuf) */ + ompi_op_reduce(op, (char *)tmp_buf + (ptrdiff_t)count_lhalf * extent, + (char *)rbuf + count_lhalf * extent, count_rhalf, dtype); + + /* Send the right half to the left neighbor */ + err = MCA_PML_CALL(send((char *)rbuf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank - 1, + MCA_COLL_BASE_TAG_REDUCE, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto 
cleanup_and_return; } + + /* This process does not participate in the recursive doubling phase */ + vrank = -1; + + } else { + /* + * Even process -- exchange with rank + 1 + * Send the right half of the input vector to the right neighbor, + * Recv the left half of the input vector from the right neighbor + */ + err = ompi_coll_base_sendrecv((char *)rbuf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank + 1, + MCA_COLL_BASE_TAG_REDUCE, + tmp_buf, count_lhalf, dtype, rank + 1, + MCA_COLL_BASE_TAG_REDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Reduce on the left half of the buffers (result in rbuf) */ + ompi_op_reduce(op, tmp_buf, rbuf, count_lhalf, dtype); + + /* Recv the right half from the right neighbor */ + err = MCA_PML_CALL(recv((char *)rbuf + (ptrdiff_t)count_lhalf * extent, + count_rhalf, dtype, rank + 1, + MCA_COLL_BASE_TAG_REDUCE, comm, + MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + vrank = rank / 2; + } + } else { /* rank >= 2 * nprocs_rem */ + vrank = rank - nprocs_rem; + } + + /* + * Step 2. Reduce-scatter implemented with recursive vector halving and + * recursive distance doubling. We have p' = 2^{\floor{\log_2 p}} + * power-of-two number of processes with new ranks (vrank) and result in rbuf. + * + * The even-ranked processes send the right half of their buffer to rank + 1 + * and the odd-ranked processes send the left half of their buffer to + * rank - 1. All processes then compute the reduction between the local + * buffer and the received buffer. In the next \log_2(p') - 1 steps, the + * buffers are recursively halved, and the distance is doubled. At the end, + * each of the p' processes has 1 / p' of the total reduction result.
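To see why each of the p' processes ends with a distinct 1 / p' slice, the window bookkeeping of Step 2 (`rindex`, `rcount` and `wsize` in the loop that follows) can be replayed without any communication. This is a sketch under the assumption that comparing virtual ranks gives the same decision as the real-rank test `rank < dest` (true because the vrank-to-rank mapping is monotonic); `halving_window` is a hypothetical helper, not part of Open MPI.

```c
#include <assert.h>

/* Replays Step 2 for one process: returns the final window
 * [*index, *index + *len) that vrank owns after recursive vector halving
 * over nprocs_pof2 processes and a vector of `count` elements. */
static void halving_window(int vrank, int nprocs_pof2, int count,
                           int *index, int *len)
{
    assert(nprocs_pof2 > 0 && (nprocs_pof2 & (nprocs_pof2 - 1)) == 0);
    int start = 0, wsize = count;
    for (int mask = 1; mask < nprocs_pof2; mask <<= 1) {
        int vpeer = vrank ^ mask;
        int lhalf = wsize / 2;
        if (vrank < vpeer) {
            /* keep (reduce into) the left half, send the right half */
            wsize = lhalf;
        } else {
            /* keep the right half, send the left half */
            start += lhalf;
            wsize -= lhalf;
        }
    }
    *index = start;
    *len = wsize;
}
```

With count >= p' every window is non-empty, and the windows of all p' processes tile the vector exactly once. The slices come out in bit-reversed order of vrank (for p' = 4, vrank 2 owns element 1), which is why Step 3 replays the exchanges in reverse order.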
+ */ + + rindex = malloc(sizeof(*rindex) * nsteps); /* O(\log_2(p)) */ + sindex = malloc(sizeof(*sindex) * nsteps); + rcount = malloc(sizeof(*rcount) * nsteps); + scount = malloc(sizeof(*scount) * nsteps); + if (NULL == rindex || NULL == sindex || NULL == rcount || NULL == scount) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + + if (vrank != -1) { + step = 0; + wsize = count; + sindex[0] = rindex[0] = 0; + + for (int mask = 1; mask < nprocs_pof2; mask <<= 1) { + /* + * On each iteration: rindex[step] = sindex[step] -- beginning of the + * current window. The length of the current window is stored in wsize. + */ + int vdest = vrank ^ mask; + /* Translate vdest virtual rank to real rank */ + int dest = (vdest < nprocs_rem) ? vdest * 2 : vdest + nprocs_rem; + + if (rank < dest) { + /* + * Recv into the left half of the current window, send the right + * half of the window to the peer (perform reduce on the left + * half of the current window) + */ + rcount[step] = wsize / 2; + scount[step] = wsize - rcount[step]; + sindex[step] = rindex[step] + rcount[step]; + } else { + /* + * Recv into the right half of the current window, send the left + * half of the window to the peer (perform reduce on the right + * half of the current window) + */ + scount[step] = wsize / 2; + rcount[step] = wsize - scount[step]; + rindex[step] = sindex[step] + scount[step]; + } + + /* Send part of data from the rbuf, recv into the tmp_buf */ + err = ompi_coll_base_sendrecv((char *)rbuf + (ptrdiff_t)sindex[step] * extent, + scount[step], dtype, dest, + MCA_COLL_BASE_TAG_REDUCE, + (char *)tmp_buf + (ptrdiff_t)rindex[step] * extent, + rcount[step], dtype, dest, + MCA_COLL_BASE_TAG_REDUCE, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Local reduce: rbuf[] = tmp_buf[] <op> rbuf[] */ + ompi_op_reduce(op, (char *)tmp_buf + (ptrdiff_t)rindex[step] * extent, + (char *)rbuf + (ptrdiff_t)rindex[step] * extent, + rcount[step], dtype); + + /*
Move the current window to the received message */ + if (step + 1 < nsteps) { + rindex[step + 1] = rindex[step]; + sindex[step + 1] = rindex[step]; + wsize = rcount[step]; + step++; + } + } + } + /* + * Assertion: each process has 1 / p' of the total reduction result: + * rcount[nsteps - 1] elements in the rbuf[rindex[nsteps - 1], ...]. + */ + + /* + * Set up the root process for the gather operation. + * Case 1: root < 2r and root is odd -- root process was excluded on step 1 + * Recv data from process 0, vroot = 0, vrank = 0 + * Case 2: root < 2r and root is even: vroot = root / 2 + * Case 3: root >= 2r: vroot = root - r + */ + int vroot = 0; + if (root < 2 * nprocs_rem) { + if (root % 2 != 0) { + vroot = 0; + if (rank == root) { + /* + * Case 1: root < 2r and root is odd -- root process was + * excluded on step 1 (vrank == -1). + * Receive the data from process 0. + */ + rindex[0] = 0; + step = 0, wsize = count; + for (int mask = 1; mask < nprocs_pof2; mask *= 2) { + rcount[step] = wsize / 2; + scount[step] = wsize - rcount[step]; + rindex[step] = 0; + sindex[step] = rcount[step]; + step++; + wsize /= 2; + } + + err = MCA_PML_CALL(recv(rbuf, rcount[nsteps - 1], dtype, 0, + MCA_COLL_BASE_TAG_REDUCE, comm, + MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + vrank = 0; + + } else if (vrank == 0) { + /* Send the data to the root */ + err = MCA_PML_CALL(send(rbuf, rcount[nsteps - 1], dtype, root, + MCA_COLL_BASE_TAG_REDUCE, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + vrank = -1; + } + } else { + /* Case 2: root < 2r and root is even: vroot = root / 2 */ + vroot = root / 2; + } + } else { + /* Case 3: root >= 2r: vroot = root - r */ + vroot = root - nprocs_rem; + } + + /* + * Step 3. Gather result at the vroot by the binomial tree algorithm. + * Each process has 1 / p' of the total reduction result: + * rcount[nsteps - 1] elements in the rbuf[rindex[nsteps - 1], ...].
+ * All exchanges are executed in reverse order relative + * to recursive doubling (previous step). + */ + + if (vrank != -1) { + int vdest_tree, vroot_tree; + step = nsteps - 1; /* step = ilog2(p') - 1 */ + + for (int mask = nprocs_pof2 >> 1; mask > 0; mask >>= 1) { + int vdest = vrank ^ mask; + /* Translate vdest virtual rank to real rank */ + int dest = (vdest < nprocs_rem) ? vdest * 2 : vdest + nprocs_rem; + if ((vdest == 0) && (root < 2 * nprocs_rem) && (root % 2 != 0)) + dest = root; + + vdest_tree = vdest >> step; + vdest_tree <<= step; + vroot_tree = vroot >> step; + vroot_tree <<= step; + if (vdest_tree == vroot_tree) { + /* Send data from rbuf and exit */ + err = MCA_PML_CALL(send((char *)rbuf + (ptrdiff_t)rindex[step] * extent, + rcount[step], dtype, dest, + MCA_COLL_BASE_TAG_REDUCE, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + break; + } else { + /* Recv and continue */ + err = MCA_PML_CALL(recv((char *)rbuf + (ptrdiff_t)sindex[step] * extent, + scount[step], dtype, dest, + MCA_COLL_BASE_TAG_REDUCE, comm, + MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + step--; + } + } + + cleanup_and_return: + if (NULL != tmp_buf_raw) + free(tmp_buf_raw); + if (NULL != rbuf_raw) + free(rbuf_raw); + if (NULL != rindex) + free(rindex); + if (NULL != sindex) + free(sindex); + if (NULL != rcount) + free(rcount); + if (NULL != scount) + free(scount); + return err; +} diff --git a/ompi/mca/coll/base/coll_base_reduce_scatter.c b/ompi/mca/coll/base/coll_base_reduce_scatter.c index 950acbe55a5..f24211d355f 100644 --- a/ompi/mca/coll/base/coll_base_reduce_scatter.c +++ b/ompi/mca/coll/base/coll_base_reduce_scatter.c @@ -76,7 +76,7 @@ int ompi_coll_base_reduce_scatter_intra_nonoverlapping(const void *sbuf, void *r if (root == rank) { /* We must allocate temporary receive buffer on root to ensure that rbuf is big enough */ - ptrdiff_t dsize, gap; + ptrdiff_t dsize, gap = 0; dsize = 
opal_datatype_span(&dtype->super, total_count, &gap); tmprbuf_free = (char*) malloc(dsize); @@ -138,7 +138,7 @@ ompi_coll_base_reduce_scatter_intra_basic_recursivehalving( const void *sbuf, { int i, rank, size, count, err = OMPI_SUCCESS; int tmp_size, remain = 0, tmp_rank, *disps = NULL; - ptrdiff_t extent, buf_size, gap; + ptrdiff_t extent, buf_size, gap = 0; char *recv_buf = NULL, *recv_buf_free = NULL; char *result_buf = NULL, *result_buf_free = NULL; @@ -462,7 +462,7 @@ ompi_coll_base_reduce_scatter_intra_ring( const void *sbuf, void *rbuf, const in int inbi, *displs = NULL; char *tmpsend = NULL, *tmprecv = NULL, *accumbuf = NULL, *accumbuf_free = NULL; char *inbuf_free[2] = {NULL, NULL}, *inbuf[2] = {NULL, NULL}; - ptrdiff_t extent, max_real_segsize, dsize, gap; + ptrdiff_t extent, max_real_segsize, dsize, gap = 0; ompi_request_t *reqs[2] = {NULL, NULL}; size = ompi_comm_size(comm); diff --git a/ompi/mca/coll/base/coll_base_reduce_scatter_block.c b/ompi/mca/coll/base/coll_base_reduce_scatter_block.c new file mode 100644 index 00000000000..9bc04c91351 --- /dev/null +++ b/ompi/mca/coll/base/coll_base_reduce_scatter_block.c @@ -0,0 +1,917 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. + * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. 
+ * Copyright (c) 2014-2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2018 Siberian State University of Telecommunications + * and Information Sciences. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "mpi.h" +#include "opal/util/bit_ops.h" +#include "ompi/constants.h" +#include "ompi/datatype/ompi_datatype.h" +#include "ompi/communicator/communicator.h" +#include "ompi/mca/coll/coll.h" +#include "ompi/mca/coll/basic/coll_basic.h" +#include "ompi/mca/pml/pml.h" +#include "ompi/op/op.h" +#include "coll_tags.h" +#include "coll_base_functions.h" +#include "coll_base_topo.h" +#include "coll_base_util.h" + +/* + * ompi_reduce_scatter_block_basic_linear + * + * Function: - reduce then scatter + * Accepts: - same as MPI_Reduce_scatter_block() + * Returns: - MPI_SUCCESS or error code + * + * Algorithm: + * reduce and scatter (needs to be cleaned + * up at some point) + */ +int +ompi_coll_base_reduce_scatter_block_basic_linear(const void *sbuf, void *rbuf, int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int rank, size, count, err = OMPI_SUCCESS; + ptrdiff_t gap, span; + char *recv_buf = NULL, *recv_buf_free = NULL; + + /* Initialize */ + rank = ompi_comm_rank(comm); + size = ompi_comm_size(comm); + + /* short cut the trivial case */ + count = rcount * size; + if (0 == count) { + return OMPI_SUCCESS; + } + + /* get datatype information */ + span = opal_datatype_span(&dtype->super, count, &gap); + + /* Handle MPI_IN_PLACE */ + if (MPI_IN_PLACE == sbuf) { + sbuf = rbuf; + } + + if (0 == rank) { + /* temporary receive buffer. 
See coll_basic_reduce.c for + details on sizing */ + recv_buf_free = (char*) malloc(span); + if (NULL == recv_buf_free) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup; + } + recv_buf = recv_buf_free - gap; + } + + /* reduction */ + err = + comm->c_coll->coll_reduce(sbuf, recv_buf, count, dtype, op, 0, + comm, comm->c_coll->coll_reduce_module); + + /* scatter */ + if (MPI_SUCCESS == err) { + err = comm->c_coll->coll_scatter(recv_buf, rcount, dtype, + rbuf, rcount, dtype, 0, + comm, comm->c_coll->coll_scatter_module); + } + + cleanup: + if (NULL != recv_buf_free) free(recv_buf_free); + + return err; +} +/* + * ompi_rounddown: Rounds a number down to nearest multiple. + * rounddown(10,4) = 8, rounddown(6,3) = 6, rounddown(14,3) = 12 + */ +static int ompi_rounddown(int num, int factor) +{ + num /= factor; + return num * factor; /* floor(num / factor) * factor */ +} + +/* + * ompi_coll_base_reduce_scatter_block_intra_recursivedoubling + * + * Function: Recursive doubling algorithm for reduce_scatter_block. + * Accepts: Same as MPI_Reduce_scatter_block + * Returns: MPI_SUCCESS or error code + * + * Description: Implements recursive doubling algorithm for MPI_Reduce_scatter_block. + * The algorithm preserves order of operations so it can + * be used both by commutative and non-commutative operations. 
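The per-step data volume listed inside the loop of this function (p*m - m, p*m - 2*m, p*m - 4*m, ...) follows from the subtree roots computed with `ompi_rounddown`: a process sends every block outside its own subtree `[cur_tree_root, cur_tree_root + mask)`. A minimal sketch, counting in blocks of `rcount` elements; `rounddown` copies the logic of `ompi_rounddown`, and `blocks_sent` is a hypothetical helper that mirrors the `blocklens[]` computation of the send datatype.

```c
#include <assert.h>

/* floor(num / factor) * factor, as in ompi_rounddown */
static int rounddown(int num, int factor)
{
    assert(factor > 0);
    return (num / factor) * factor;
}

/* Blocks sent by `rank` at the step with the given mask: all blocks to
 * the left of its subtree plus all blocks to the right of it. */
static int blocks_sent(int rank, int mask, int comm_size)
{
    int cur_tree_root = rounddown(rank, mask);
    int right = comm_size - cur_tree_root - mask;
    if (right < 0)
        right = 0; /* subtree runs past the last rank (non-power-of-two case) */
    return cur_tree_root + right;
}
```

For a power-of-two comm_size every rank sends p - mask blocks, which gives the 7, 6, 4 sequence for p = 8.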
+ * + * Time complexity: \alpha\log(p) + \beta*m(\log(p)-(p-1)/p) + \gamma*m(\log(p)-(p-1)/p), + * where m = rcount * comm_size, p = comm_size + * Memory requirements (per process): 2 * rcount * comm_size * typesize + */ +int +ompi_coll_base_reduce_scatter_block_intra_recursivedoubling( + const void *sbuf, void *rbuf, int rcount, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + struct ompi_datatype_t *dtypesend = NULL, *dtyperecv = NULL; + char *tmprecv_raw = NULL, *tmpbuf_raw = NULL, *tmprecv, *tmpbuf; + ptrdiff_t span, gap, totalcount, extent; + int blocklens[2], displs[2]; + int err = MPI_SUCCESS; + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:reduce_scatter_block_intra_recursivedoubling: rank %d/%d", + rank, comm_size)); + if (rcount == 0) + return MPI_SUCCESS; + if (comm_size < 2) + return MPI_SUCCESS; + + totalcount = comm_size * rcount; + ompi_datatype_type_extent(dtype, &extent); + span = opal_datatype_span(&dtype->super, totalcount, &gap); + tmpbuf_raw = malloc(span); + tmprecv_raw = malloc(span); + if (NULL == tmpbuf_raw || NULL == tmprecv_raw) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + tmpbuf = tmpbuf_raw - gap; + tmprecv = tmprecv_raw - gap; + + if (sbuf != MPI_IN_PLACE) { + err = ompi_datatype_copy_content_same_ddt(dtype, totalcount, tmpbuf, (char *)sbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } else { + err = ompi_datatype_copy_content_same_ddt(dtype, totalcount, tmpbuf, rbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + int is_commutative = ompi_op_is_commute(op); + + /* Recursive distance doubling */ + int rdoubling_step = 0; + for (int mask = 1; mask < comm_size; mask <<= 1) { + int remote = rank ^ mask; + int cur_tree_root = ompi_rounddown(rank, mask); + int remote_tree_root = ompi_rounddown(remote, mask); + + 
/* + * Let m be the block size (rcount elements), p the comm_size, + * and p*m the total message size in sbuf. + * Step 1: processes send and recv (p*m-m) amount of data + * Step 2: processes send and recv (p*m-2*m) amount of data + * Step 3: processes send and recv (p*m-4*m) amount of data + * ... + * Step ceil(\log_2(p)): send and recv (p*m-m*2^floor{\log_2(p-1)}) + * + * Send block from tmpbuf: [0..cur_tree_root - 1], [cur_tree_root + mask, p - 1] + * Recv block into tmprecv: [0..remote_tree_root - 1], [remote_tree_root + mask, p - 1] + */ + + /* Send type */ + blocklens[0] = rcount * cur_tree_root; + blocklens[1] = (comm_size >= cur_tree_root + mask) ? + rcount * (comm_size - cur_tree_root - mask) : 0; + displs[0] = 0; + displs[1] = comm_size * rcount - blocklens[1]; + err = ompi_datatype_create_indexed(2, blocklens, displs, dtype, &dtypesend); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + err = ompi_datatype_commit(&dtypesend); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + /* Recv type */ + blocklens[0] = rcount * remote_tree_root; + blocklens[1] = (comm_size >= remote_tree_root + mask) ? + rcount * (comm_size - remote_tree_root - mask) : 0; + displs[0] = 0; + displs[1] = comm_size * rcount - blocklens[1]; + err = ompi_datatype_create_indexed(2, blocklens, displs, dtype, &dtyperecv); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + err = ompi_datatype_commit(&dtyperecv); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + int is_block_received = 0; + if (remote < comm_size) { + err = ompi_coll_base_sendrecv(tmpbuf, 1, dtypesend, remote, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + tmprecv, 1, dtyperecv, remote, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + is_block_received = 1; + } + /* + * Non-power-of-two case: if a process had no peer to communicate + * with in this step, we need to send it the current result.
+ * Recursive halving is used to find the process that sends it this data. + */ + if (remote_tree_root + mask > comm_size) { + /* + * Compute the number of processes in current subtree + * that have all the data + */ + int nprocs_alldata = comm_size - cur_tree_root - mask; + for (int rhalving_mask = mask >> 1; rhalving_mask > 0; rhalving_mask >>= 1) { + remote = rank ^ rhalving_mask; + int tree_root = ompi_rounddown(rank, rhalving_mask << 1); + /* + * Send only if: + * 1) current process has data: (remote > rank) && (rank < tree_root + nprocs_alldata) + * 2) remote process does not have data at any step: remote >= tree_root + nprocs_alldata + */ + if ((remote > rank) && (rank < tree_root + nprocs_alldata) + && (remote >= tree_root + nprocs_alldata)) { + err = MCA_PML_CALL(send(tmprecv, 1, dtyperecv, remote, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + } else if ((remote < rank) && (remote < tree_root + nprocs_alldata) && + (rank >= tree_root + nprocs_alldata)) { + err = MCA_PML_CALL(recv(tmprecv, 1, dtyperecv, remote, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + is_block_received = 1; + } + } + } + + if (is_block_received) { + /* After reduction the result must be in tmpbuf */ + if (is_commutative || (remote_tree_root < cur_tree_root)) { + ompi_op_reduce(op, tmprecv, tmpbuf, blocklens[0], dtype); + ompi_op_reduce(op, tmprecv + (ptrdiff_t)displs[1] * extent, + tmpbuf + (ptrdiff_t)displs[1] * extent, + blocklens[1], dtype); + } else { + ompi_op_reduce(op, tmpbuf, tmprecv, blocklens[0], dtype); + ompi_op_reduce(op, tmpbuf + (ptrdiff_t)displs[1] * extent, + tmprecv + (ptrdiff_t)displs[1] * extent, + blocklens[1], dtype); + err = ompi_datatype_copy_content_same_ddt(dtyperecv, 1, + tmpbuf, tmprecv); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + } + rdoubling_step++; + err =
ompi_datatype_destroy(&dtypesend); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + err = ompi_datatype_destroy(&dtyperecv); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + err = ompi_datatype_copy_content_same_ddt(dtype, rcount, rbuf, + tmpbuf + (ptrdiff_t)rank * rcount * extent); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + +cleanup_and_return: + if (dtypesend) + ompi_datatype_destroy(&dtypesend); + if (dtyperecv) + ompi_datatype_destroy(&dtyperecv); + if (tmpbuf_raw) + free(tmpbuf_raw); + if (tmprecv_raw) + free(tmprecv_raw); + return err; +} + +/* + * ompi_range_sum: Returns the sum of the values at indices a..b (inclusive), + * where indices 0..r have value 2 and all later indices have value 1: + * index: 0 1 2 3 4 ... r r+1 r+2 ... nproc_pof2 + * value: 2 2 2 2 2 ... 2 1 1 ... 1 + */ +static int ompi_range_sum(int a, int b, int r) +{ + if (r < a) + return b - a + 1; + else if (r > b) + return 2 * (b - a + 1); + return 2 * (r - a + 1) + b - r; +} + +/* + * ompi_coll_base_reduce_scatter_block_intra_recursivehalving + * + * Function: Recursive halving algorithm for reduce_scatter_block + * Accepts: Same as MPI_Reduce_scatter_block + * Returns: MPI_SUCCESS or error code + * + * Description: Implements recursive halving algorithm for MPI_Reduce_scatter_block. + * The algorithm can be used only with commutative operations.
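The weights in `ompi_range_sum` count how many original `rcount`-sized blocks a range of virtual blocks covers: each of the first nprocs_rem virtual processes also carries the block of an excluded even rank. The sketch below restates the closed form and checks it against a brute-force sum. `range_sum_ref` and `block_offset` are hypothetical helpers; `block_offset` restates the `rdispl`/`sdispl` expression used in Step 2 of this function, in units of `rcount`, and is called there with r = nprocs_rem - 1.

```c
#include <assert.h>

/* Same closed form as ompi_range_sum: indices 0..r weigh 2, later ones 1 */
static int range_sum(int a, int b, int r)
{
    assert(a <= b);
    if (r < a)
        return b - a + 1;
    else if (r > b)
        return 2 * (b - a + 1);
    return 2 * (r - a + 1) + b - r;
}

/* Brute-force reference: add up the weights one index at a time */
static int range_sum_ref(int a, int b, int r)
{
    int sum = 0;
    for (int i = a; i <= b; i++)
        sum += (i <= r) ? 2 : 1;
    return sum;
}

/* Offset (in blocks) of virtual block `index`: 2 * index below
 * nprocs_rem, nprocs_rem + index otherwise (as for rdispl/sdispl). */
static int block_offset(int index, int nprocs_rem)
{
    return (index <= nprocs_rem - 1) ? 2 * index : nprocs_rem + index;
}
```

The displacement of virtual block i equals the weighted size of all blocks before it, that is `block_offset(i, rem) == range_sum(0, i - 1, rem - 1)`, so the counts and displacements used in Step 2 describe adjacent, non-overlapping ranges.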
+ * + * Limitations: commutative operations only + * Memory requirements (per process): 2 * rcount * comm_size * typesize + */ +int +ompi_coll_base_reduce_scatter_block_intra_recursivehalving( + const void *sbuf, void *rbuf, int rcount, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + char *tmprecv_raw = NULL, *tmpbuf_raw = NULL, *tmprecv, *tmpbuf; + ptrdiff_t span, gap, totalcount, extent; + int err = MPI_SUCCESS; + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:reduce_scatter_block_intra_recursivehalving: rank %d/%d", + rank, comm_size)); + if (rcount == 0 || comm_size < 2) + return MPI_SUCCESS; + + if (!ompi_op_is_commute(op)) { + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:reduce_scatter_block_intra_recursivehalving: rank %d/%d " + "switching to basic reduce_scatter_block", rank, comm_size)); + return ompi_coll_base_reduce_scatter_block_basic_linear(sbuf, rbuf, rcount, dtype, + op, comm, module); + } + totalcount = comm_size * rcount; + ompi_datatype_type_extent(dtype, &extent); + span = opal_datatype_span(&dtype->super, totalcount, &gap); + tmpbuf_raw = malloc(span); + tmprecv_raw = malloc(span); + if (NULL == tmpbuf_raw || NULL == tmprecv_raw) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + tmpbuf = tmpbuf_raw - gap; + tmprecv = tmprecv_raw - gap; + + if (sbuf != MPI_IN_PLACE) { + err = ompi_datatype_copy_content_same_ddt(dtype, totalcount, tmpbuf, (char *)sbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } else { + err = ompi_datatype_copy_content_same_ddt(dtype, totalcount, tmpbuf, rbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + /* + * Step 1. Reduce the number of processes to the nearest lower power of two + * p' = 2^{\floor{\log_2 p}} by removing r = p - p' processes. 
+ * In the first 2r processes (ranks 0 to 2r - 1), all the even ranks send + * the input vector to their neighbor (rank + 1) and all the odd ranks recv + * the input vector and perform local reduction. + * The odd ranks (0 to 2r - 1) contain the reduction with the input + * vector on their neighbors (the even ranks). The first r odd + * processes and the p - 2r last processes are renumbered from + * 0 to 2^{\floor{\log_2 p}} - 1. Even ranks do not participate in the + * rest of the algorithm. + */ + + /* Find nearest power-of-two less than or equal to comm_size */ + int nprocs_pof2 = opal_next_poweroftwo(comm_size); + nprocs_pof2 >>= 1; + int nprocs_rem = comm_size - nprocs_pof2; + + int vrank = -1; + if (rank < 2 * nprocs_rem) { + if ((rank % 2) == 0) { + /* Even process */ + err = MCA_PML_CALL(send(tmpbuf, totalcount, dtype, rank + 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + /* This process does not participate in the rest of the algorithm */ + vrank = -1; + } else { + /* Odd process */ + err = MCA_PML_CALL(recv(tmprecv, totalcount, dtype, rank - 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + ompi_op_reduce(op, tmprecv, tmpbuf, totalcount, dtype); + /* Adjust rank to be the bottom "remain" ranks */ + vrank = rank / 2; + } + } else { + /* Adjust rank to show that the bottom "even remain" ranks dropped out */ + vrank = rank - nprocs_rem; + } + + if (vrank != -1) { + /* + * Step 2. Recursive vector halving. We have p' = 2^{\floor{\log_2 p}} + * power-of-two number of processes with new ranks (vrank) and partial + * result in tmpbuf. + * All processes then compute the reduction between the local + * buffer and the received buffer. In the next \log_2(p') - 1 steps, the + * buffers are recursively halved. At the end, each of the p' processes + * has 1 / p' of the total reduction result.
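In this variant the odd ranks keep the partial result after Step 1, so the renumbering differs from the one used by the reduce algorithm earlier in this diff: rank 2k + 1 becomes vrank k for k < r, and the peer translation in Step 2 is `peer = (vpeer < nprocs_rem) ? vpeer * 2 + 1 : vpeer + nprocs_rem`. A minimal sketch with hypothetical helper names (`odd_vrank`, `odd_peer_rank`), checked against the comm_size = 6 example in the butterfly description of this file, where ranks 1, 3, 4 and 5 become vranks 0, 1, 2 and 3.

```c
#include <assert.h>

/* Virtual rank after Step 1 when the odd ranks keep the partial result:
 * even ranks below 2r drop out (-1), odd ranks below 2r map to rank / 2,
 * and the remaining ranks are shifted down by r. */
static int odd_vrank(int rank, int nprocs_rem)
{
    assert(rank >= 0 && nprocs_rem >= 0);
    if (rank < 2 * nprocs_rem)
        return (rank % 2 == 0) ? -1 : rank / 2;
    return rank - nprocs_rem;
}

/* Inverse translation, as used for `peer` in Step 2 */
static int odd_peer_rank(int vpeer, int nprocs_rem)
{
    return (vpeer < nprocs_rem) ? vpeer * 2 + 1 : vpeer + nprocs_rem;
}
```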
+ */ + int send_index = 0, recv_index = 0, last_index = nprocs_pof2; + for (int mask = nprocs_pof2 >> 1; mask > 0; mask >>= 1) { + int vpeer = vrank ^ mask; + int peer = (vpeer < nprocs_rem) ? vpeer * 2 + 1 : vpeer + nprocs_rem; + + /* + * Calculate the recv_count and send_count because the + * even-numbered processes who no longer participate will + * have their result calculated by the process to their + * right (rank + 1). + */ + int send_count = 0, recv_count = 0; + if (vrank < vpeer) { + /* Send the right half of the buffer, recv the left half */ + send_index = recv_index + mask; + send_count = rcount * ompi_range_sum(send_index, last_index - 1, nprocs_rem - 1); + recv_count = rcount * ompi_range_sum(recv_index, send_index - 1, nprocs_rem - 1); + } else { + /* Send the left half of the buffer, recv the right half */ + recv_index = send_index + mask; + send_count = rcount * ompi_range_sum(send_index, recv_index - 1, nprocs_rem - 1); + recv_count = rcount * ompi_range_sum(recv_index, last_index - 1, nprocs_rem - 1); + } + ptrdiff_t rdispl = rcount * ((recv_index <= nprocs_rem - 1) ? + 2 * recv_index : nprocs_rem + recv_index); + ptrdiff_t sdispl = rcount * ((send_index <= nprocs_rem - 1) ? 
+ 2 * send_index : nprocs_rem + send_index); + struct ompi_request_t *request = NULL; + + if (recv_count > 0) { + err = MCA_PML_CALL(irecv(tmprecv + rdispl * extent, recv_count, + dtype, peer, MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, &request)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + } + if (send_count > 0) { + err = MCA_PML_CALL(send(tmpbuf + sdispl * extent, send_count, + dtype, peer, MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + MCA_PML_BASE_SEND_STANDARD, + comm)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + } + if (recv_count > 0) { + err = ompi_request_wait(&request, MPI_STATUS_IGNORE); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + ompi_op_reduce(op, tmprecv + rdispl * extent, + tmpbuf + rdispl * extent, recv_count, dtype); + } + send_index = recv_index; + last_index = recv_index + mask; + } + err = ompi_datatype_copy_content_same_ddt(dtype, rcount, rbuf, + tmpbuf + (ptrdiff_t)rank * rcount * extent); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + /* Step 3. Send the result to excluded even ranks */ + if (rank < 2 * nprocs_rem) { + if ((rank % 2) == 0) { + /* Even process */ + err = MCA_PML_CALL(recv(rbuf, rcount, dtype, rank + 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, comm, + MPI_STATUS_IGNORE)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + } else { + /* Odd process */ + err = MCA_PML_CALL(send(tmpbuf + (ptrdiff_t)(rank - 1) * rcount * extent, + rcount, dtype, rank - 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + } + +cleanup_and_return: + if (tmpbuf_raw) + free(tmpbuf_raw); + if (tmprecv_raw) + free(tmprecv_raw); + return err; +} + +/* + * ompi_mirror_perm: Returns mirror permutation of nbits low-order bits + * of x [*]. + * [*] Warren Jr., Henry S. Hacker's Delight (2ed). 2013. + * Chapter 7. Rearranging Bits and Bytes. 
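The bit manipulation in `ompi_mirror_perm` reverses the order of the `nbits` low-order bits of `x`. The sketch below compares a copy of that trick with a straightforward loop. It assumes a 32-bit `unsigned int` (the masks are 32-bit constants) and `1 <= nbits <= 32`, because a shift by the full type width would be undefined behavior. `mirror_perm_loop` and `mirror_perm_bits` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Reference version: move bits one by one from x into r in reverse order */
static unsigned int mirror_perm_loop(unsigned int x, int nbits)
{
    unsigned int r = 0;
    for (int i = 0; i < nbits; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Copy of the Hacker's Delight bit reversal used by ompi_mirror_perm:
 * reverse all 32 bits, then shift the top nbits down. */
static unsigned int mirror_perm_bits(unsigned int x, int nbits)
{
    assert(sizeof(x) * CHAR_BIT == 32 && nbits >= 1 && nbits <= 32);
    x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
    x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
    x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
    x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
    x = ((x >> 16) | (x << 16));
    return x >> (sizeof(x) * CHAR_BIT - nbits);
}
```

For nbits = 2 both versions give mperm(0) = 0, mperm(1) = 2, mperm(2) = 1, mperm(3) = 3, matching the table in the butterfly example.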
+ */ +static unsigned int ompi_mirror_perm(unsigned int x, int nbits) +{ + x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); + x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); + x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); + x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); + x = ((x >> 16) | (x << 16)); + return x >> (sizeof(x) * CHAR_BIT - nbits); +} + +static int ompi_coll_base_reduce_scatter_block_intra_butterfly_pof2( + const void *sbuf, void *rbuf, int rcount, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +/* + * ompi_coll_base_reduce_scatter_block_intra_butterfly + * + * Function: Butterfly algorithm for reduce_scatter_block + * Accepts: Same as MPI_Reduce_scatter_block + * Returns: MPI_SUCCESS or error code + * + * Description: Implements butterfly algorithm for MPI_Reduce_scatter_block [*]. + * The algorithm can be used both by commutative and non-commutative + * operations, for power-of-two and non-power-of-two number of processes. + * + * [*] J.L. Traff. An improved Algorithm for (non-commutative) Reduce-scatter + * with an Application // Proc. of EuroPVM/MPI, 2005. -- pp. 129-137. + * + * Time complexity: + * m\lambda + (\alpha + m\beta + m\gamma) + + * + 2\log_2(p)\alpha + 2m(1-1/p)\beta + m(1-1/p)\gamma + + * + 3(\alpha + m/p\beta) = O(m\lambda + log(p)\alpha + m\beta + m\gamma), + * where m = rcount * comm_size, p = comm_size + * Memory requirements (per process): 2 * rcount * comm_size * typesize + * + * Example: comm_size=6, nprocs_pof2=4, nprocs_rem=2, rcount=1, sbuf=[0,1,...,5] + * Step 1. 
Reduce the number of processes to 4 + * rank 0: [0|1|2|3|4|5]: send to 1: vrank -1 + * rank 1: [0|1|2|3|4|5]: recv from 0, op: vrank 0: [0|2|4|6|8|10] + * rank 2: [0|1|2|3|4|5]: send to 3: vrank -1 + * rank 3: [0|1|2|3|4|5]: recv from 2, op: vrank 1: [0|2|4|6|8|10] + * rank 4: [0|1|2|3|4|5]: vrank 2: [0|1|2|3|4|5] + * rank 5: [0|1|2|3|4|5]: vrank 3: [0|1|2|3|4|5] + * + * Step 2. Butterfly. Buffer of 6 elements is divided into 4 blocks. + * Round 1 (mask=1, nblocks=2) + * 0: vrank -1 + * 1: vrank 0 [0 2|4 6|8|10]: exch with 1: send [2,3], recv [0,1]: [0 4|8 12|*|*] + * 2: vrank -1 + * 3: vrank 1 [0 2|4 6|8|10]: exch with 0: send [0,1], recv [2,3]: [**|**|16|20] + * 4: vrank 2 [0 1|2 3|4|5] : exch with 3: send [2,3], recv [0,1]: [0 2|4 6|*|*] + * 5: vrank 3 [0 1|2 3|4|5] : exch with 2: send [0,1], recv [2,3]: [**|**|8|10] + * + * Round 2 (mask=2, nblocks=1) + * 0: vrank -1 + * 1: vrank 0 [0 4|8 12|*|*]: exch with 2: send [1], recv [0]: [0 6|**|*|*] + * 2: vrank -1 + * 3: vrank 1 [**|**|16|20] : exch with 3: send [3], recv [2]: [**|**|24|*] + * 4: vrank 2 [0 2|4 6|*|*] : exch with 0: send [0], recv [1]: [**|12 18|*|*] + * 5: vrank 3 [**|**|8|10] : exch with 1: send [2], recv [3]: [**|**|*|30] + * + * Step 3. 
Exchange with remote process according to a mirror permutation: + * mperm(0)=0, mperm(1)=2, mperm(2)=1, mperm(3)=3 + * 0: vrank -1: recv "0" from process 1 + * 1: vrank 0 [0 6|**|*|*]: send "0" to 0, copy "6" to rbuf (mperm(0)=0) + * 2: vrank -1: recv result "12" from process 4 + * 3: vrank 1 [**|**|24|*]: send "24" to 4, recv "18" from 4 + * 4: vrank 2 [**|12 18|*|*]: send "12" to 2, send "18" to 3, recv "24" from 3 + * 5: vrank 3 [**|**|*|30]: copy "30" to rbuf (mperm(3)=3) + */ +int +ompi_coll_base_reduce_scatter_block_intra_butterfly( + const void *sbuf, void *rbuf, int rcount, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + char *tmpbuf[2] = {NULL, NULL}, *psend, *precv; + ptrdiff_t span, gap, totalcount, extent; + int err = MPI_SUCCESS; + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:reduce_scatter_block_intra_butterfly: rank %d/%d", + rank, comm_size)); + if (rcount == 0 || comm_size < 2) + return MPI_SUCCESS; + + if (!(comm_size & (comm_size - 1))) { + /* Special case: comm_size is a power of two */ + return ompi_coll_base_reduce_scatter_block_intra_butterfly_pof2( + sbuf, rbuf, rcount, dtype, op, comm, module); + } + + totalcount = comm_size * rcount; + ompi_datatype_type_extent(dtype, &extent); + span = opal_datatype_span(&dtype->super, totalcount, &gap); + tmpbuf[0] = malloc(span); + tmpbuf[1] = malloc(span); + if (NULL == tmpbuf[0] || NULL == tmpbuf[1]) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + psend = tmpbuf[0] - gap; + precv = tmpbuf[1] - gap; + + if (sbuf != MPI_IN_PLACE) { + err = ompi_datatype_copy_content_same_ddt(dtype, totalcount, psend, (char *)sbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } else { + err = ompi_datatype_copy_content_same_ddt(dtype, totalcount, psend, rbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + /* + * Step 1.
Reduce the number of processes to the nearest lower power of two + * p' = 2^{\floor{\log_2 p}} by removing r = p - p' processes. + * In the first 2r processes (ranks 0 to 2r - 1), all the even ranks send + * the input vector to their neighbor (rank + 1) and all the odd ranks recv + * the input vector and perform local reduction. + * The odd ranks (0 to 2r - 1) contain the reduction with the input + * vector on their neighbors (the even ranks). The first r odd + * processes and the p - 2r last processes are renumbered from + * 0 to 2^{\floor{\log_2 p}} - 1. Even ranks do not participate in the + * rest of the algorithm. + */ + + /* Find nearest power-of-two less than or equal to comm_size */ + int nprocs_pof2 = opal_next_poweroftwo(comm_size); + nprocs_pof2 >>= 1; + int nprocs_rem = comm_size - nprocs_pof2; + int log2_size = opal_cube_dim(nprocs_pof2); + + int vrank = -1; + if (rank < 2 * nprocs_rem) { + if ((rank % 2) == 0) { + /* Even process */ + err = MCA_PML_CALL(send(psend, totalcount, dtype, rank + 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + /* This process does not participate in the rest of the algorithm */ + vrank = -1; + } else { + /* Odd process */ + err = MCA_PML_CALL(recv(precv, totalcount, dtype, rank - 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + ompi_op_reduce(op, precv, psend, totalcount, dtype); + /* Adjust rank to be the bottom "remain" ranks */ + vrank = rank / 2; + } + } else { + /* Adjust rank to show that the bottom "even remain" ranks dropped out */ + vrank = rank - nprocs_rem; + } + + if (vrank != -1) { + /* + * Now, psend vector of size rcount * comm_size elements is divided into + * nprocs_pof2 blocks: + * block 0 has 2*rcount elems (for process 0 and 1) + * block 1 has 2*rcount elems (for process 2 and 3) + * ... 
+ * block r-1 has 2*rcount elems (for process 2*(r-1) and 2*(r-1)+1) + * block r has rcount elems (for process r+r) + * block r+1 has rcount elems (for process r+r+1) + * ... + * block nprocs_pof2 - 1 has rcount elems (for process r + nprocs_pof2-1) + */ + int nblocks = nprocs_pof2, send_index = 0, recv_index = 0; + for (int mask = 1; mask < nprocs_pof2; mask <<= 1) { + int vpeer = vrank ^ mask; + int peer = (vpeer < nprocs_rem) ? vpeer * 2 + 1 : vpeer + nprocs_rem; + + nblocks /= 2; + if ((vrank & mask) == 0) { + /* Send the upper half of reduction buffer, recv the lower half */ + send_index += nblocks; + } else { + /* Send the lower half of reduction buffer, recv the upper half */ + recv_index += nblocks; + } + int send_count = rcount * ompi_range_sum(send_index, + send_index + nblocks - 1, nprocs_rem - 1); + int recv_count = rcount * ompi_range_sum(recv_index, + recv_index + nblocks - 1, nprocs_rem - 1); + ptrdiff_t sdispl = rcount * ((send_index <= nprocs_rem - 1) ? + 2 * send_index : nprocs_rem + send_index); + ptrdiff_t rdispl = rcount * ((recv_index <= nprocs_rem - 1) ?
+ 2 * recv_index : nprocs_rem + recv_index); + + err = ompi_coll_base_sendrecv(psend + (ptrdiff_t)sdispl * extent, send_count, + dtype, peer, MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + precv + (ptrdiff_t)rdispl * extent, recv_count, + dtype, peer, MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + if (vrank < vpeer) { + /* precv = psend <op> precv */ + ompi_op_reduce(op, psend + (ptrdiff_t)rdispl * extent, + precv + (ptrdiff_t)rdispl * extent, recv_count, dtype); + char *p = psend; + psend = precv; + precv = p; + } else { + /* psend = precv <op> psend */ + ompi_op_reduce(op, precv + (ptrdiff_t)rdispl * extent, + psend + (ptrdiff_t)rdispl * extent, recv_count, dtype); + } + send_index = recv_index; + } + /* + * psend points to the result: [send_index, send_index + recv_count - 1] + * Exchange results with remote process according to a mirror permutation. + */ + int vpeer = ompi_mirror_perm(vrank, log2_size); + int peer = (vpeer < nprocs_rem) ? vpeer * 2 + 1 : vpeer + nprocs_rem; + + if (vpeer < nprocs_rem) { + /* + * Process has two blocks: for excluded process and own. + * Send result to the excluded process. + */ + ptrdiff_t sdispl = rcount * ((send_index <= nprocs_rem - 1) ? + 2 * send_index : nprocs_rem + send_index); + err = MCA_PML_CALL(send(psend + (ptrdiff_t)sdispl * extent, + rcount, dtype, peer - 1, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + /* Send result to a remote process according to a mirror permutation */ + ptrdiff_t sdispl = rcount * ((send_index <= nprocs_rem - 1) ?
+ 2 * send_index : nprocs_rem + send_index); + /* If process has two blocks, then send the second block (own block) */ + if (vpeer < nprocs_rem) + sdispl += rcount; + if (vpeer != vrank) { + err = ompi_coll_base_sendrecv(psend + (ptrdiff_t)sdispl * extent, rcount, + dtype, peer, MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + rbuf, rcount, dtype, peer, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } else { + err = ompi_datatype_copy_content_same_ddt(dtype, rcount, rbuf, + psend + (ptrdiff_t)sdispl * extent); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + } else { + /* Excluded process: receive result */ + int vpeer = ompi_mirror_perm((rank + 1) / 2, log2_size); + int peer = (vpeer < nprocs_rem) ? vpeer * 2 + 1 : vpeer + nprocs_rem; + err = MCA_PML_CALL(recv(rbuf, rcount, dtype, peer, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, comm, + MPI_STATUS_IGNORE)); + if (OMPI_SUCCESS != err) { goto cleanup_and_return; } + } + +cleanup_and_return: + if (tmpbuf[0]) + free(tmpbuf[0]); + if (tmpbuf[1]) + free(tmpbuf[1]); + return err; +} + +/* + * ompi_coll_base_reduce_scatter_block_intra_butterfly_pof2 + * + * Function: Butterfly algorithm for reduce_scatter_block + * Accepts: Same as MPI_Reduce_scatter_block + * Returns: MPI_SUCCESS or error code + * Limitations: Power-of-two number of processes. + * + * Description: Implements butterfly algorithm for MPI_Reduce_scatter_block [*]. + * The algorithm can be used both by commutative and non-commutative + * operations, for power-of-two number of processes. + * + * [*] J.L. Traff. An improved Algorithm for (non-commutative) Reduce-scatter + * with an Application // Proc. of EuroPVM/MPI, 2005. -- pp. 129-137. 
+ * + * Time complexity: + * m\lambda + 2\log_2(p)\alpha + 2m(1-1/p)\beta + m(1-1/p)\gamma + m/p\lambda = + * = O(m\lambda + log(p)\alpha + m\beta + m\gamma), + * where m = rcount * comm_size, p = comm_size + * Memory requirements (per process): 2 * rcount * comm_size * typesize + * + * Example: comm_size=4, rcount=1, sbuf=[0,1,2,3] + * Step 1. Permute the blocks according to a mirror permutation: + * mperm(0)=0, mperm(1)=2, mperm(2)=1, mperm(3)=3 + * sbuf=[0|1|2|3] ==> psend=[0|2|1|3] + * + * Step 2. Butterfly + * Round 1 (mask=1, nblocks=2) + * 0: [0|2|1|3]: exch with 1: send [2,3], recv [0,1]: [0|4|*|*] + * 1: [0|2|1|3]: exch with 0: send [0,1], recv [2,3]: [*|*|2|6] + * 2: [0|2|1|3]: exch with 3: send [2,3], recv [0,1]: [0|4|*|*] + * 3: [0|2|1|3]: exch with 2: send [0,1], recv [2,3]: [*|*|2|6] + * + * Round 2 (mask=2, nblocks=1) + * 0: [0|4|*|*]: exch with 2: send [1], recv [0]: [0|*|*|*] + * 1: [*|*|2|6]: exch with 3: send [3], recv [2]: [*|*|4|*] + * 2: [0|4|*|*]: exch with 0: send [0], recv [1]: [*|8|*|*] + * 3: [*|*|2|6]: exch with 1: send [2], recv [3]: [*|*|*|12] + * + * Step 3. 
Copy result to rbuf + */ +static int +ompi_coll_base_reduce_scatter_block_intra_butterfly_pof2( + const void *sbuf, void *rbuf, int rcount, struct ompi_datatype_t *dtype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + char *tmpbuf[2] = {NULL, NULL}, *psend, *precv; + ptrdiff_t span, gap, totalcount, extent; + int err = MPI_SUCCESS; + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + if (rcount == 0 || comm_size < 2) + return MPI_SUCCESS; + + totalcount = comm_size * rcount; + ompi_datatype_type_extent(dtype, &extent); + span = opal_datatype_span(&dtype->super, totalcount, &gap); + tmpbuf[0] = malloc(span); + tmpbuf[1] = malloc(span); + if (NULL == tmpbuf[0] || NULL == tmpbuf[1]) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + psend = tmpbuf[0] - gap; + precv = tmpbuf[1] - gap; + + /* Permute the blocks according to a mirror permutation */ + int log2_comm_size = opal_cube_dim(comm_size); + char *pdata = (sbuf != MPI_IN_PLACE) ? 
(char *)sbuf : rbuf; + for (int i = 0; i < comm_size; i++) { + char *src = pdata + (ptrdiff_t)i * extent * rcount; + char *dst = psend + (ptrdiff_t)ompi_mirror_perm(i, log2_comm_size) * extent * rcount; + err = ompi_datatype_copy_content_same_ddt(dtype, rcount, dst, src); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + + int nblocks = totalcount, send_index = 0, recv_index = 0; + for (int mask = 1; mask < comm_size; mask <<= 1) { + int peer = rank ^ mask; + nblocks /= 2; + + if ((rank & mask) == 0) { + /* Send the upper half of reduction buffer, recv the lower half */ + send_index += nblocks; + } else { + /* Send the lower half of reduction buffer, recv the upper half */ + recv_index += nblocks; + } + err = ompi_coll_base_sendrecv(psend + (ptrdiff_t)send_index * extent, + nblocks, dtype, peer, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + precv + (ptrdiff_t)recv_index * extent, + nblocks, dtype, peer, + MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK, + comm, MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + if (rank < peer) { + /* precv = psend <op> precv */ + ompi_op_reduce(op, psend + (ptrdiff_t)recv_index * extent, + precv + (ptrdiff_t)recv_index * extent, nblocks, dtype); + char *p = psend; + psend = precv; + precv = p; + } else { + /* psend = precv <op> psend */ + ompi_op_reduce(op, precv + (ptrdiff_t)recv_index * extent, + psend + (ptrdiff_t)recv_index * extent, nblocks, dtype); + } + send_index = recv_index; + } + /* Copy the result to the rbuf */ + err = ompi_datatype_copy_content_same_ddt(dtype, rcount, rbuf, + psend + (ptrdiff_t)recv_index * extent); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + +cleanup_and_return: + if (tmpbuf[0]) + free(tmpbuf[0]); + if (tmpbuf[1]) + free(tmpbuf[1]); + return err; +} diff --git a/ompi/mca/coll/base/coll_base_scan.c b/ompi/mca/coll/base/coll_base_scan.c new file mode 100644 index 00000000000..a82e837965f --- /dev/null +++ b/ompi/mca/coll/base/coll_base_scan.c @@ -0,0 +1,230 @@ +/*
-*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2018 Siberian State University of Telecommunications + * and Information Science. All rights reserved. + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "mpi.h" +#include "ompi/constants.h" +#include "ompi/datatype/ompi_datatype.h" +#include "ompi/communicator/communicator.h" +#include "ompi/mca/coll/coll.h" +#include "ompi/mca/coll/base/coll_base_functions.h" +#include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/coll/base/coll_base_util.h" +#include "ompi/mca/pml/pml.h" +#include "ompi/op/op.h" + +/* + * ompi_coll_base_scan_intra_linear + * + * Function: Linear algorithm for inclusive scan. + * Accepts: Same as MPI_Scan + * Returns: MPI_SUCCESS or error code + */ +int +ompi_coll_base_scan_intra_linear(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int size, rank, err; + ptrdiff_t dsize, gap; + char *free_buffer = NULL; + char *pml_buffer = NULL; + + /* Initialize */ + + rank = ompi_comm_rank(comm); + size = ompi_comm_size(comm); + + /* If I'm rank 0, just copy into the receive buffer */ + + if (0 == rank) { + if (MPI_IN_PLACE != sbuf) { + err = ompi_datatype_copy_content_same_ddt(dtype, count, (char*)rbuf, (char*)sbuf); + if (MPI_SUCCESS != err) { + return err; + } + } + } + + /* Otherwise receive previous buffer and reduce. */ + + else { + /* Allocate a temporary buffer. Rationale for this size is + * listed in coll_basic_reduce.c. Use this temporary buffer to + * receive into, later. 
*/ + + dsize = opal_datatype_span(&dtype->super, count, &gap); + free_buffer = malloc(dsize); + if (NULL == free_buffer) { + return OMPI_ERR_OUT_OF_RESOURCE; + } + pml_buffer = free_buffer - gap; + + /* Copy the send buffer into the receive buffer. */ + + if (MPI_IN_PLACE != sbuf) { + err = ompi_datatype_copy_content_same_ddt(dtype, count, (char*)rbuf, (char*)sbuf); + if (MPI_SUCCESS != err) { + if (NULL != free_buffer) { + free(free_buffer); + } + return err; + } + } + + /* Receive the prior answer */ + + err = MCA_PML_CALL(recv(pml_buffer, count, dtype, + rank - 1, MCA_COLL_BASE_TAG_SCAN, comm, + MPI_STATUS_IGNORE)); + if (MPI_SUCCESS != err) { + if (NULL != free_buffer) { + free(free_buffer); + } + return err; + } + + /* Perform the operation */ + + ompi_op_reduce(op, pml_buffer, rbuf, count, dtype); + + /* All done */ + + if (NULL != free_buffer) { + free(free_buffer); + } + } + + /* Send result to next process. */ + + if (rank < (size - 1)) { + return MCA_PML_CALL(send(rbuf, count, dtype, rank + 1, + MCA_COLL_BASE_TAG_SCAN, + MCA_PML_BASE_SEND_STANDARD, comm)); + } + + /* All done */ + + return MPI_SUCCESS; +} + + +/* + * ompi_coll_base_scan_intra_recursivedoubling + * + * Function: Recursive doubling algorithm for inclusive scan. + * Accepts: Same as MPI_Scan + * Returns: MPI_SUCCESS or error code + * + * Description: Implements recursive doubling algorithm for MPI_Scan. + * The algorithm preserves order of operations so it can + * be used both by commutative and non-commutative operations. 
+ * + * Example for 5 processes and commutative operation MPI_SUM: + * Process: 0 1 2 3 4 + * recvbuf: [0] [1] [2] [3] [4] + * psend: [0] [1] [2] [3] [4] + * + * Step 1: + * recvbuf: [0] [0+1] [2] [2+3] [4] + * psend: [1+0] [0+1] [3+2] [2+3] [4] + * + * Step 2: + * recvbuf: [0] [0+1] [(1+0)+2] [(1+0)+(2+3)] [4] + * psend: [(3+2)+(1+0)] [(2+3)+(0+1)] [(1+0)+(3+2)] [(1+0)+(2+3)] [4] + * + * Step 3: + * recvbuf: [0] [0+1] [(1+0)+2] [(1+0)+(2+3)] [((3+2)+(1+0))+4] + * psend: [4+((3+2)+(1+0))] [((3+2)+(1+0))+4] + * + * Time complexity (worst case): \ceil(\log_2(p))(2\alpha + 2m\beta + 2m\gamma) + * Memory requirements (per process): 2 * count * typesize = O(count) + * Limitations: intra-communicators only + */ +int ompi_coll_base_scan_intra_recursivedoubling( + const void *sendbuf, void *recvbuf, int count, struct ompi_datatype_t *datatype, + struct ompi_op_t *op, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + int err = MPI_SUCCESS; + char *tmpsend_raw = NULL, *tmprecv_raw = NULL; + int comm_size = ompi_comm_size(comm); + int rank = ompi_comm_rank(comm); + + OPAL_OUTPUT((ompi_coll_base_framework.framework_output, + "coll:base:scan_intra_recursivedoubling: rank %d/%d", + rank, comm_size)); + if (count == 0) + return MPI_SUCCESS; + + if (sendbuf != MPI_IN_PLACE) { + err = ompi_datatype_copy_content_same_ddt(datatype, count, recvbuf, (char *)sendbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + } + if (comm_size < 2) + return MPI_SUCCESS; + + ptrdiff_t dsize, gap; + dsize = opal_datatype_span(&datatype->super, count, &gap); + tmpsend_raw = malloc(dsize); + tmprecv_raw = malloc(dsize); + if (NULL == tmpsend_raw || NULL == tmprecv_raw) { + err = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; + } + char *psend = tmpsend_raw - gap; + char *precv = tmprecv_raw - gap; + err = ompi_datatype_copy_content_same_ddt(datatype, count, psend, recvbuf); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + int is_commute = 
ompi_op_is_commute(op); + + for (int mask = 1; mask < comm_size; mask <<= 1) { + int remote = rank ^ mask; + if (remote < comm_size) { + err = ompi_coll_base_sendrecv(psend, count, datatype, remote, + MCA_COLL_BASE_TAG_SCAN, + precv, count, datatype, remote, + MCA_COLL_BASE_TAG_SCAN, comm, + MPI_STATUS_IGNORE, rank); + if (MPI_SUCCESS != err) { goto cleanup_and_return; } + + if (rank > remote) { + /* Accumulate prefix reduction: recvbuf = precv <op> recvbuf */ + ompi_op_reduce(op, precv, recvbuf, count, datatype); + /* Partial result: psend = precv <op> psend */ + ompi_op_reduce(op, precv, psend, count, datatype); + } else { + if (is_commute) { + /* psend = precv <op> psend */ + ompi_op_reduce(op, precv, psend, count, datatype); + } else { + /* precv = psend <op> precv */ + ompi_op_reduce(op, psend, precv, count, datatype); + char *tmp = psend; + psend = precv; + precv = tmp; + } + } + } + } + +cleanup_and_return: + if (NULL != tmpsend_raw) + free(tmpsend_raw); + if (NULL != tmprecv_raw) + free(tmprecv_raw); + return err; +} diff --git a/ompi/mca/coll/base/coll_base_scatter.c b/ompi/mca/coll/base/coll_base_scatter.c index 0239bd9aea4..ba952885053 100644 --- a/ompi/mca/coll/base/coll_base_scatter.c +++ b/ompi/mca/coll/base/coll_base_scatter.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2015 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved.
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -49,7 +49,7 @@ ompi_coll_base_scatter_intra_binomial( const void *sbuf, int scount, MPI_Status status; mca_coll_base_module_t *base_module = (mca_coll_base_module_t*) module; mca_coll_base_comm_t *data = base_module->base_data; - ptrdiff_t sextent, rextent, ssize, rsize, sgap, rgap; + ptrdiff_t sextent, rextent, ssize, rsize, sgap = 0, rgap = 0; size = ompi_comm_size(comm); diff --git a/ompi/mca/coll/base/coll_base_util.c b/ompi/mca/coll/base/coll_base_util.c index 338146d4045..d35c14173a5 100644 --- a/ompi/mca/coll/base/coll_base_util.c +++ b/ompi/mca/coll/base/coll_base_util.c @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -29,16 +29,16 @@ #include "ompi/mca/pml/pml.h" #include "coll_base_util.h" -int ompi_coll_base_sendrecv_nonzero_actual( void* sendbuf, size_t scount, - ompi_datatype_t* sdatatype, - int dest, int stag, - void* recvbuf, size_t rcount, - ompi_datatype_t* rdatatype, - int source, int rtag, - struct ompi_communicator_t* comm, - ompi_status_public_t* status ) +int ompi_coll_base_sendrecv_actual( const void* sendbuf, size_t scount, + ompi_datatype_t* sdatatype, + int dest, int stag, + void* recvbuf, size_t rcount, + ompi_datatype_t* rdatatype, + int source, int rtag, + struct ompi_communicator_t* comm, + ompi_status_public_t* status ) -{ /* post receive first, then send, then waitall... should be fast (I hope) */ +{ /* post receive first, then send, then wait... 
should be fast (I hope) */ int err, line = 0; size_t rtypesize, stypesize; ompi_request_t *req; @@ -46,30 +46,21 @@ int ompi_coll_base_sendrecv_nonzero_actual( void* sendbuf, size_t scount, /* post new irecv */ ompi_datatype_type_size(rdatatype, &rtypesize); - if (0 != rcount && 0 != rtypesize) { - err = MCA_PML_CALL(irecv( recvbuf, rcount, rdatatype, source, rtag, - comm, &req)); - if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; } - } + err = MCA_PML_CALL(irecv( recvbuf, rcount, rdatatype, source, rtag, + comm, &req)); + if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; } /* send data to children */ ompi_datatype_type_size(sdatatype, &stypesize); - if (0 != scount && 0 != stypesize) { - err = MCA_PML_CALL(send( sendbuf, scount, sdatatype, dest, stag, - MCA_PML_BASE_SEND_STANDARD, comm)); - if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; } - } + err = MCA_PML_CALL(send( sendbuf, scount, sdatatype, dest, stag, + MCA_PML_BASE_SEND_STANDARD, comm)); + if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; } - if (0 != rcount && 0 != rtypesize) { - err = ompi_request_wait( &req, &rstatus); - if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; } + err = ompi_request_wait( &req, &rstatus); + if (err != MPI_SUCCESS) { line = __LINE__; goto error_handler; } - if (MPI_STATUS_IGNORE != status) { - *status = rstatus; - } - } else { - if( MPI_STATUS_IGNORE != status ) - *status = ompi_status_empty; + if (MPI_STATUS_IGNORE != status) { + *status = rstatus; } return (MPI_SUCCESS); diff --git a/ompi/mca/coll/base/coll_base_util.h b/ompi/mca/coll/base/coll_base_util.h index 12523d337f6..df1f7d18f40 100644 --- a/ompi/mca/coll/base/coll_base_util.h +++ b/ompi/mca/coll/base/coll_base_util.h @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
- * Copyright (c) 2014 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -36,14 +36,14 @@ BEGIN_C_DECLS * If one of the communications results in a zero-byte message the * communication is ignored, and no message will cross to the peer. */ -int ompi_coll_base_sendrecv_nonzero_actual( void* sendbuf, size_t scount, - ompi_datatype_t* sdatatype, - int dest, int stag, - void* recvbuf, size_t rcount, - ompi_datatype_t* rdatatype, - int source, int rtag, - struct ompi_communicator_t* comm, - ompi_status_public_t* status ); +int ompi_coll_base_sendrecv_actual( const void* sendbuf, size_t scount, + ompi_datatype_t* sdatatype, + int dest, int stag, + void* recvbuf, size_t rcount, + ompi_datatype_t* rdatatype, + int source, int rtag, + struct ompi_communicator_t* comm, + ompi_status_public_t* status ); /** @@ -64,10 +64,10 @@ ompi_coll_base_sendrecv( void* sendbuf, size_t scount, ompi_datatype_t* sdatatyp return (int) ompi_datatype_sndrcv(sendbuf, (int32_t) scount, sdatatype, recvbuf, (int32_t) rcount, rdatatype); } - return ompi_coll_base_sendrecv_nonzero_actual (sendbuf, scount, sdatatype, - dest, stag, - recvbuf, rcount, rdatatype, - source, rtag, comm, status); + return ompi_coll_base_sendrecv_actual (sendbuf, scount, sdatatype, + dest, stag, + recvbuf, rcount, rdatatype, + source, rtag, comm, status); } END_C_DECLS diff --git a/ompi/mca/coll/base/coll_tags.h b/ompi/mca/coll/base/coll_tags.h index 45c9724dba3..f40f029fbbc 100644 --- a/ompi/mca/coll/base/coll_tags.h +++ b/ompi/mca/coll/base/coll_tags.h @@ -37,10 +37,11 @@ #define MCA_COLL_BASE_TAG_GATHERV -20 #define MCA_COLL_BASE_TAG_REDUCE -21 #define MCA_COLL_BASE_TAG_REDUCE_SCATTER -22 -#define MCA_COLL_BASE_TAG_SCAN -23 -#define MCA_COLL_BASE_TAG_SCATTER -24 -#define MCA_COLL_BASE_TAG_SCATTERV -25 -#define MCA_COLL_BASE_TAG_NONBLOCKING_BASE -26 +#define 
MCA_COLL_BASE_TAG_REDUCE_SCATTER_BLOCK -23 +#define MCA_COLL_BASE_TAG_SCAN -24 +#define MCA_COLL_BASE_TAG_SCATTER -25 +#define MCA_COLL_BASE_TAG_SCATTERV -26 +#define MCA_COLL_BASE_TAG_NONBLOCKING_BASE -27 #define MCA_COLL_BASE_TAG_NONBLOCKING_END ((-1 * INT_MAX/2) + 1) #define MCA_COLL_BASE_TAG_HCOLL_BASE (-1 * INT_MAX/2) #define MCA_COLL_BASE_TAG_HCOLL_END (-1 * INT_MAX) diff --git a/ompi/mca/coll/basic/Makefile.am b/ompi/mca/coll/basic/Makefile.am index e0abe4f3211..341c5def950 100644 --- a/ompi/mca/coll/basic/Makefile.am +++ b/ompi/mca/coll/basic/Makefile.am @@ -13,6 +13,7 @@ # Copyright (c) 2012 Sandia National Laboratories. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights # reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -63,6 +64,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_basic_la_SOURCES = $(sources) mca_coll_basic_la_LDFLAGS = -module -avoid-version +mca_coll_basic_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_basic_la_SOURCES =$(sources) diff --git a/ompi/mca/coll/basic/coll_basic_allgather.c b/ompi/mca/coll/basic/coll_basic_allgather.c index 66ff5eed7fe..446a5fe49ad 100644 --- a/ompi/mca/coll/basic/coll_basic_allgather.c +++ b/ompi/mca/coll/basic/coll_basic_allgather.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -78,7 +79,7 @@ mca_coll_basic_allgather_inter(const void *sbuf, int scount, if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } /* Get a requests arrays of the right size */ - reqs = coll_base_comm_get_reqs(module->base_data, rsize + 1); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, rsize + 1); if( NULL == reqs ) { line = __LINE__; err = OMPI_ERR_OUT_OF_RESOURCE; goto exit; } /* Do a send-recv between the two root procs. to avoid deadlock */ diff --git a/ompi/mca/coll/basic/coll_basic_allreduce.c b/ompi/mca/coll/basic/coll_basic_allreduce.c index 23463ea0e24..84f60f2f685 100644 --- a/ompi/mca/coll/basic/coll_basic_allreduce.c +++ b/ompi/mca/coll/basic/coll_basic_allreduce.c @@ -9,8 +9,9 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -27,6 +28,7 @@ #include "ompi/op/op.h" #include "ompi/mca/coll/coll.h" #include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/coll/base/coll_base_util.h" #include "coll_basic.h" #include "ompi/mca/pml/pml.h" @@ -83,7 +85,6 @@ mca_coll_basic_allreduce_inter(const void *sbuf, void *rbuf, int count, int err, i, rank, root = 0, rsize, line; ptrdiff_t extent, dsize, gap; char *tmpbuf = NULL, *pml_buffer = NULL; - ompi_request_t *req[2]; ompi_request_t **reqs = NULL; rank = ompi_comm_rank(comm); @@ -109,23 +110,16 @@ mca_coll_basic_allreduce_inter(const void *sbuf, void *rbuf, int count, pml_buffer = tmpbuf - gap; if (rsize > 1) { - reqs = coll_base_comm_get_reqs(module->base_data, rsize - 1); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, rsize - 1); if( NULL == reqs ) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto exit; } } /* Do a send-recv between the two root procs. to avoid deadlock */ - err = MCA_PML_CALL(irecv(rbuf, count, dtype, 0, - MCA_COLL_BASE_TAG_ALLREDUCE, comm, - &(req[0]))); - if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } - - err = MCA_PML_CALL(isend(sbuf, count, dtype, 0, - MCA_COLL_BASE_TAG_ALLREDUCE, - MCA_PML_BASE_SEND_STANDARD, - comm, &(req[1]))); - if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } - - err = ompi_request_wait_all(2, req, MPI_STATUSES_IGNORE); + err = ompi_coll_base_sendrecv_actual(sbuf, count, dtype, 0, + MCA_COLL_BASE_TAG_ALLREDUCE, + rbuf, count, dtype, 0, + MCA_COLL_BASE_TAG_ALLREDUCE, + comm, MPI_STATUS_IGNORE); if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } /* Loop receiving and calling reduction function (C or Fortran). 
*/ @@ -154,18 +148,11 @@ mca_coll_basic_allreduce_inter(const void *sbuf, void *rbuf, int count, /***************************************************************************/ if (rank == root) { /* sendrecv between the two roots */ - err = MCA_PML_CALL(irecv(pml_buffer, count, dtype, 0, - MCA_COLL_BASE_TAG_ALLREDUCE, - comm, &(req[1]))); - if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } - - err = MCA_PML_CALL(isend(rbuf, count, dtype, 0, - MCA_COLL_BASE_TAG_ALLREDUCE, - MCA_PML_BASE_SEND_STANDARD, comm, - &(req[0]))); - if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } - - err = ompi_request_wait_all(2, req, MPI_STATUSES_IGNORE); + err = ompi_coll_base_sendrecv_actual(rbuf, count, dtype, 0, + MCA_COLL_BASE_TAG_ALLREDUCE, + pml_buffer, count, dtype, 0, + MCA_COLL_BASE_TAG_ALLREDUCE, + comm, MPI_STATUS_IGNORE); if (OMPI_SUCCESS != err) { line = __LINE__; goto exit; } /* distribute the data to other processes in remote group. diff --git a/ompi/mca/coll/basic/coll_basic_alltoall.c b/ompi/mca/coll/basic/coll_basic_alltoall.c index acb08b8455c..6d3ff46adcd 100644 --- a/ompi/mca/coll/basic/coll_basic_alltoall.c +++ b/ompi/mca/coll/basic/coll_basic_alltoall.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -77,7 +78,7 @@ mca_coll_basic_alltoall_inter(const void *sbuf, int scount, /* Initiate all send/recv to/from others. 
*/ nreqs = size * 2; - req = rreq = coll_base_comm_get_reqs( module->base_data, nreqs); + req = rreq = ompi_coll_base_comm_get_reqs( module->base_data, nreqs); if( NULL == req ) { return OMPI_ERR_OUT_OF_RESOURCE; } sreq = rreq + size; diff --git a/ompi/mca/coll/basic/coll_basic_alltoallv.c b/ompi/mca/coll/basic/coll_basic_alltoallv.c index aa66aa3c075..26e585ce2e8 100644 --- a/ompi/mca/coll/basic/coll_basic_alltoallv.c +++ b/ompi/mca/coll/basic/coll_basic_alltoallv.c @@ -15,6 +15,7 @@ * Copyright (c) 2013 FUJITSU LIMITED. All rights reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -68,7 +69,7 @@ mca_coll_basic_alltoallv_inter(const void *sbuf, const int *scounts, const int * /* Initiate all send/recv to/from others. */ nreqs = rsize * 2; - preq = coll_base_comm_get_reqs(module->base_data, nreqs); + preq = ompi_coll_base_comm_get_reqs(module->base_data, nreqs); if( NULL == preq ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* Post all receives first */ diff --git a/ompi/mca/coll/basic/coll_basic_alltoallw.c b/ompi/mca/coll/basic/coll_basic_alltoallw.c index fcdc4262c98..93fa880fc2d 100644 --- a/ompi/mca/coll/basic/coll_basic_alltoallw.c +++ b/ompi/mca/coll/basic/coll_basic_alltoallw.c @@ -17,6 +17,7 @@ * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -179,7 +180,7 @@ mca_coll_basic_alltoallw_intra(const void *sbuf, const int *scounts, const int * /* Initiate all send/recv to/from others. 
*/ nreqs = 0; - reqs = preq = coll_base_comm_get_reqs(module->base_data, 2 * size); + reqs = preq = ompi_coll_base_comm_get_reqs(module->base_data, 2 * size); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* Post all receives first -- a simple optimization */ @@ -269,7 +270,7 @@ mca_coll_basic_alltoallw_inter(const void *sbuf, const int *scounts, const int * /* Initiate all send/recv to/from others. */ nreqs = 0; - reqs = preq = coll_base_comm_get_reqs(module->base_data, 2 * size); + reqs = preq = ompi_coll_base_comm_get_reqs(module->base_data, 2 * size); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* Post all receives first -- a simple optimization */ diff --git a/ompi/mca/coll/basic/coll_basic_bcast.c b/ompi/mca/coll/basic/coll_basic_bcast.c index 9dbbb9ac36c..3003582ded3 100644 --- a/ompi/mca/coll/basic/coll_basic_bcast.c +++ b/ompi/mca/coll/basic/coll_basic_bcast.c @@ -12,6 +12,7 @@ * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -81,7 +82,7 @@ mca_coll_basic_bcast_log_intra(void *buff, int count, /* Send data to the children. 
*/ - reqs = coll_base_comm_get_reqs(module->base_data, size); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, size); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } err = MPI_SUCCESS; @@ -156,7 +157,7 @@ mca_coll_basic_bcast_lin_inter(void *buff, int count, MCA_COLL_BASE_TAG_BCAST, comm, MPI_STATUS_IGNORE)); } else { - reqs = coll_base_comm_get_reqs(module->base_data, rsize); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, rsize); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* root section */ diff --git a/ompi/mca/coll/basic/coll_basic_exscan.c b/ompi/mca/coll/basic/coll_basic_exscan.c index 057bcfa48c5..1c6c23dfee6 100644 --- a/ompi/mca/coll/basic/coll_basic_exscan.c +++ b/ompi/mca/coll/basic/coll_basic_exscan.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -48,72 +48,7 @@ mca_coll_basic_exscan_intra(const void *sbuf, void *rbuf, int count, struct ompi_communicator_t *comm, mca_coll_base_module_t *module) { - int size, rank, err; - ptrdiff_t dsize, gap; - char *free_buffer = NULL; - char *reduce_buffer = NULL; - - rank = ompi_comm_rank(comm); - size = ompi_comm_size(comm); - - /* For MPI_IN_PLACE, just adjust send buffer to point to - * receive buffer. */ - if (MPI_IN_PLACE == sbuf) { - sbuf = rbuf; - } - - /* If we're rank 0, then just send our sbuf to the next rank, and - * we are done. */ - if (0 == rank) { - return MCA_PML_CALL(send(sbuf, count, dtype, rank + 1, - MCA_COLL_BASE_TAG_EXSCAN, - MCA_PML_BASE_SEND_STANDARD, comm)); - } - - /* If we're the last rank, then just receive the result from the - * prior rank, and we are done. 
*/ - else if ((size - 1) == rank) { - return MCA_PML_CALL(recv(rbuf, count, dtype, rank - 1, - MCA_COLL_BASE_TAG_EXSCAN, comm, - MPI_STATUS_IGNORE)); - } - - /* Otherwise, get the result from the prior rank, combine it with my - * data, and send it to the next rank */ - - /* Get a temporary buffer to perform the reduction into. Rationale - * for malloc'ing this size is provided in coll_basic_reduce.c. */ - dsize = opal_datatype_span(&dtype->super, count, &gap); - - free_buffer = (char*)malloc(dsize); - if (NULL == free_buffer) { - return OMPI_ERR_OUT_OF_RESOURCE; - } - reduce_buffer = free_buffer - gap; - err = ompi_datatype_copy_content_same_ddt(dtype, count, - reduce_buffer, (char*)sbuf); - - /* Receive the reduced value from the prior rank */ - err = MCA_PML_CALL(recv(rbuf, count, dtype, rank - 1, - MCA_COLL_BASE_TAG_EXSCAN, comm, MPI_STATUS_IGNORE)); - if (MPI_SUCCESS != err) { - goto error; - } - - /* Now reduce the prior rank's result with my source buffer. The source - * buffer had been previously copied into the temporary reduce_buffer. */ - ompi_op_reduce(op, rbuf, reduce_buffer, count, dtype); - - /* Send my result off to the next rank */ - err = MCA_PML_CALL(send(reduce_buffer, count, dtype, rank + 1, - MCA_COLL_BASE_TAG_EXSCAN, - MCA_PML_BASE_SEND_STANDARD, comm)); - /* Error */ - error: - free(free_buffer); - - /* All done */ - return err; + return ompi_coll_base_exscan_intra_linear(sbuf, rbuf, count, dtype, op, comm, module); } diff --git a/ompi/mca/coll/basic/coll_basic_gatherv.c b/ompi/mca/coll/basic/coll_basic_gatherv.c index 047a70d4e01..6ea30c49afe 100644 --- a/ompi/mca/coll/basic/coll_basic_gatherv.c +++ b/ompi/mca/coll/basic/coll_basic_gatherv.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -27,7 +28,6 @@ #include "ompi/mca/coll/coll.h" #include "ompi/mca/coll/base/coll_tags.h" #include "ompi/mca/pml/pml.h" -#include "coll_basic.h" /* * gatherv_intra @@ -142,7 +142,7 @@ mca_coll_basic_gatherv_inter(const void *sbuf, int scount, return OMPI_ERROR; } - reqs = coll_base_comm_get_reqs(module->base_data, size); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, size); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } for (i = 0; i < size; ++i) { diff --git a/ompi/mca/coll/basic/coll_basic_neighbor_allgather.c b/ompi/mca/coll/basic/coll_basic_neighbor_allgather.c index 3bd17f0614f..8f79b43d870 100644 --- a/ompi/mca/coll/basic/coll_basic_neighbor_allgather.c +++ b/ompi/mca/coll/basic/coll_basic_neighbor_allgather.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -52,7 +53,7 @@ mca_coll_basic_neighbor_allgather_cart(const void *sbuf, int scount, ompi_datatype_get_extent(rdtype, &lb, &extent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims ); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims ); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* The ordering is defined as -1 then +1 in each dimension in @@ -139,7 +140,7 @@ mca_coll_basic_neighbor_allgather_graph(const void *sbuf, int scount, } ompi_datatype_get_extent(rdtype, &lb, &extent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 2 * degree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 2 * degree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } for (neighbor = 0; neighbor < degree ; ++neighbor) { @@ -190,7 +191,7 @@ mca_coll_basic_neighbor_allgather_dist_graph(const void *sbuf, int scount, outedges = dist_graph->out; ompi_datatype_get_extent(rdtype, &lb, &extent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, indegree + outdegree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, indegree + outdegree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } for (neighbor = 0; neighbor < indegree ; ++neighbor) { diff --git a/ompi/mca/coll/basic/coll_basic_neighbor_allgatherv.c b/ompi/mca/coll/basic/coll_basic_neighbor_allgatherv.c index 33465f55479..f837109f908 100644 --- a/ompi/mca/coll/basic/coll_basic_neighbor_allgatherv.c +++ b/ompi/mca/coll/basic/coll_basic_neighbor_allgatherv.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -51,7 +52,7 @@ mca_coll_basic_neighbor_allgatherv_cart(const void *sbuf, int scount, struct omp ompi_datatype_get_extent(rdtype, &lb, &extent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* The ordering is defined as -1 then +1 in each dimension in @@ -126,7 +127,7 @@ mca_coll_basic_neighbor_allgatherv_graph(const void *sbuf, int scount, struct om } ompi_datatype_get_extent(rdtype, &lb, &extent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 2 * degree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 2 * degree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } for (neighbor = 0; neighbor < degree ; ++neighbor) { @@ -175,7 +176,7 @@ mca_coll_basic_neighbor_allgatherv_dist_graph(const void *sbuf, int scount, stru outedges = dist_graph->out; ompi_datatype_get_extent(rdtype, &lb, &extent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, indegree + outdegree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, indegree + outdegree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } for (neighbor = 0; neighbor < indegree ; ++neighbor) { diff --git a/ompi/mca/coll/basic/coll_basic_neighbor_alltoall.c b/ompi/mca/coll/basic/coll_basic_neighbor_alltoall.c index 804d398d500..70fdf9dc1b6 100644 --- a/ompi/mca/coll/basic/coll_basic_neighbor_alltoall.c +++ b/ompi/mca/coll/basic/coll_basic_neighbor_alltoall.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -50,7 +51,7 @@ mca_coll_basic_neighbor_alltoall_cart(const void *sbuf, int scount, struct ompi_ ompi_datatype_get_extent(rdtype, &lb, &rdextent); ompi_datatype_get_extent(sdtype, &lb, &sdextent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post receives first */ @@ -157,7 +158,7 @@ mca_coll_basic_neighbor_alltoall_graph(const void *sbuf, int scount, struct ompi ompi_datatype_get_extent(rdtype, &lb, &rdextent); ompi_datatype_get_extent(sdtype, &lb, &sdextent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 2 * degree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 2 * degree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post receives first */ @@ -215,7 +216,7 @@ mca_coll_basic_neighbor_alltoall_dist_graph(const void *sbuf, int scount,struct ompi_datatype_get_extent(rdtype, &lb, &rdextent); ompi_datatype_get_extent(sdtype, &lb, &sdextent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, indegree + outdegree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, indegree + outdegree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post receives first */ diff --git a/ompi/mca/coll/basic/coll_basic_neighbor_alltoallv.c b/ompi/mca/coll/basic/coll_basic_neighbor_alltoallv.c index d6c41777856..8449778140f 100644 --- a/ompi/mca/coll/basic/coll_basic_neighbor_alltoallv.c +++ b/ompi/mca/coll/basic/coll_basic_neighbor_alltoallv.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -51,7 +52,7 @@ mca_coll_basic_neighbor_alltoallv_cart(const void *sbuf, const int scounts[], co ompi_datatype_get_extent(rdtype, &lb, &rdextent); ompi_datatype_get_extent(sdtype, &lb, &sdextent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims ); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims ); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post receives first */ @@ -144,7 +145,7 @@ mca_coll_basic_neighbor_alltoallv_graph(const void *sbuf, const int scounts[], c ompi_datatype_get_extent(rdtype, &lb, &rdextent); ompi_datatype_get_extent(sdtype, &lb, &sdextent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 2 * degree ); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 2 * degree ); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post all receives first */ @@ -201,7 +202,7 @@ mca_coll_basic_neighbor_alltoallv_dist_graph(const void *sbuf, const int scounts ompi_datatype_get_extent(rdtype, &lb, &rdextent); ompi_datatype_get_extent(sdtype, &lb, &sdextent); - reqs = preqs = coll_base_comm_get_reqs( module->base_data, indegree + outdegree); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, indegree + outdegree); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post all receives first */ diff --git a/ompi/mca/coll/basic/coll_basic_neighbor_alltoallw.c b/ompi/mca/coll/basic/coll_basic_neighbor_alltoallw.c index 5b15574d0ec..9060c82c106 100644 --- a/ompi/mca/coll/basic/coll_basic_neighbor_alltoallw.c +++ b/ompi/mca/coll/basic/coll_basic_neighbor_alltoallw.c @@ -49,7 +49,7 @@ mca_coll_basic_neighbor_alltoallw_cart(const void *sbuf, const int scounts[], co if (0 == cart->ndims) return OMPI_SUCCESS; - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims ); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 4 * cart->ndims ); if( NULL == reqs ) { 
return OMPI_ERR_OUT_OF_RESOURCE; } /* post receives first */ @@ -134,7 +134,7 @@ mca_coll_basic_neighbor_alltoallw_graph(const void *sbuf, const int scounts[], c mca_topo_base_graph_neighbors_count (comm, rank, &degree); if (0 == degree) return OMPI_SUCCESS; - reqs = preqs = coll_base_comm_get_reqs( module->base_data, 2 * degree ); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, 2 * degree ); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } edges = graph->edges; @@ -195,7 +195,7 @@ mca_coll_basic_neighbor_alltoallw_dist_graph(const void *sbuf, const int scounts if (0 == indegree+outdegree) return OMPI_SUCCESS; - reqs = preqs = coll_base_comm_get_reqs( module->base_data, indegree + outdegree ); + reqs = preqs = ompi_coll_base_comm_get_reqs( module->base_data, indegree + outdegree ); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } /* post all receives first */ diff --git a/ompi/mca/coll/basic/coll_basic_reduce_scatter.c b/ompi/mca/coll/basic/coll_basic_reduce_scatter.c index 9940753ffe8..8c8b9611af5 100644 --- a/ompi/mca/coll/basic/coll_basic_reduce_scatter.c +++ b/ompi/mca/coll/basic/coll_basic_reduce_scatter.c @@ -39,7 +39,7 @@ #include "coll_basic.h" #include "ompi/op/op.h" -#define COMMUTATIVE_LONG_MSG 8 * 1024 * 1024 +#define COMMUTATIVE_LONG_MSG (8 * 1024 * 1024) /* * reduce_scatter @@ -60,7 +60,7 @@ * usage for the recusive halving is msg_size + 2 * comm_size greater * for the recursive halving, so I've limited where the recursive * halving is used to be nice to the app memory wise. There are much - * better algorithms for large messages with cummutative operations, + * better algorithms for large messages with commutative operations, * so this should be investigated further.
*/ int diff --git a/ompi/mca/coll/basic/coll_basic_reduce_scatter_block.c b/ompi/mca/coll/basic/coll_basic_reduce_scatter_block.c index 21331b31023..044a0804209 100644 --- a/ompi/mca/coll/basic/coll_basic_reduce_scatter_block.c +++ b/ompi/mca/coll/basic/coll_basic_reduce_scatter_block.c @@ -12,7 +12,7 @@ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -57,58 +57,9 @@ mca_coll_basic_reduce_scatter_block_intra(const void *sbuf, void *rbuf, int rcou struct ompi_communicator_t *comm, mca_coll_base_module_t *module) { - int rank, size, count, err = OMPI_SUCCESS; - ptrdiff_t gap, span; - char *recv_buf = NULL, *recv_buf_free = NULL; - - /* Initialize */ - rank = ompi_comm_rank(comm); - size = ompi_comm_size(comm); - - /* short cut the trivial case */ - count = rcount * size; - if (0 == count) { - return OMPI_SUCCESS; - } - - /* get datatype information */ - span = opal_datatype_span(&dtype->super, count, &gap); - - /* Handle MPI_IN_PLACE */ - if (MPI_IN_PLACE == sbuf) { - sbuf = rbuf; - } - - if (0 == rank) { - /* temporary receive buffer. 
See coll_basic_reduce.c for - details on sizing */ - recv_buf_free = (char*) malloc(span); - if (NULL == recv_buf_free) { - err = OMPI_ERR_OUT_OF_RESOURCE; - goto cleanup; - } - recv_buf = recv_buf_free - gap; - } - - /* reduction */ - err = - comm->c_coll->coll_reduce(sbuf, recv_buf, count, dtype, op, 0, - comm, comm->c_coll->coll_reduce_module); - - /* scatter */ - if (MPI_SUCCESS == err) { - err = comm->c_coll->coll_scatter(recv_buf, rcount, dtype, - rbuf, rcount, dtype, 0, - comm, comm->c_coll->coll_scatter_module); - } - - cleanup: - if (NULL != recv_buf_free) free(recv_buf_free); - - return err; + return ompi_coll_base_reduce_scatter_block_basic_linear(sbuf, rbuf, rcount, dtype, op, comm, module); } - /* * reduce_scatter_block_inter * diff --git a/ompi/mca/coll/basic/coll_basic_scan.c b/ompi/mca/coll/basic/coll_basic_scan.c index 2ee07d0fd24..e7399eb91fa 100644 --- a/ompi/mca/coll/basic/coll_basic_scan.c +++ b/ompi/mca/coll/basic/coll_basic_scan.c @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -46,85 +46,5 @@ mca_coll_basic_scan_intra(const void *sbuf, void *rbuf, int count, struct ompi_communicator_t *comm, mca_coll_base_module_t *module) { - int size, rank, err; - ptrdiff_t dsize, gap; - char *free_buffer = NULL; - char *pml_buffer = NULL; - - /* Initialize */ - - rank = ompi_comm_rank(comm); - size = ompi_comm_size(comm); - - /* If I'm rank 0, just copy into the receive buffer */ - - if (0 == rank) { - if (MPI_IN_PLACE != sbuf) { - err = ompi_datatype_copy_content_same_ddt(dtype, count, (char*)rbuf, (char*)sbuf); - if (MPI_SUCCESS != err) { - return err; - } - } - } - - /* Otherwise receive previous buffer and reduce. 
*/ - - else { - /* Allocate a temporary buffer. Rationale for this size is - * listed in coll_basic_reduce.c. Use this temporary buffer to - * receive into, later. */ - - dsize = opal_datatype_span(&dtype->super, count, &gap); - free_buffer = malloc(dsize); - if (NULL == free_buffer) { - return OMPI_ERR_OUT_OF_RESOURCE; - } - pml_buffer = free_buffer - gap; - - /* Copy the send buffer into the receive buffer. */ - - if (MPI_IN_PLACE != sbuf) { - err = ompi_datatype_copy_content_same_ddt(dtype, count, (char*)rbuf, (char*)sbuf); - if (MPI_SUCCESS != err) { - if (NULL != free_buffer) { - free(free_buffer); - } - return err; - } - } - - /* Receive the prior answer */ - - err = MCA_PML_CALL(recv(pml_buffer, count, dtype, - rank - 1, MCA_COLL_BASE_TAG_SCAN, comm, - MPI_STATUS_IGNORE)); - if (MPI_SUCCESS != err) { - if (NULL != free_buffer) { - free(free_buffer); - } - return err; - } - - /* Perform the operation */ - - ompi_op_reduce(op, pml_buffer, rbuf, count, dtype); - - /* All done */ - - if (NULL != free_buffer) { - free(free_buffer); - } - } - - /* Send result to next process. */ - - if (rank < (size - 1)) { - return MCA_PML_CALL(send(rbuf, count, dtype, rank + 1, - MCA_COLL_BASE_TAG_SCAN, - MCA_PML_BASE_SEND_STANDARD, comm)); - } - - /* All done */ - - return MPI_SUCCESS; + return ompi_coll_base_scan_intra_linear(sbuf, rbuf, count, dtype, op, comm, module); } diff --git a/ompi/mca/coll/basic/coll_basic_scatter.c b/ompi/mca/coll/basic/coll_basic_scatter.c index eef5f3136bb..ea5aa7aecbe 100644 --- a/ompi/mca/coll/basic/coll_basic_scatter.c +++ b/ompi/mca/coll/basic/coll_basic_scatter.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -68,7 +69,7 @@ mca_coll_basic_scatter_inter(const void *sbuf, int scount, return OMPI_ERROR; } - reqs = coll_base_comm_get_reqs(module->base_data, size); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, size); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } incr *= scount; diff --git a/ompi/mca/coll/basic/coll_basic_scatterv.c b/ompi/mca/coll/basic/coll_basic_scatterv.c index fe0a49be223..16602158b2b 100644 --- a/ompi/mca/coll/basic/coll_basic_scatterv.c +++ b/ompi/mca/coll/basic/coll_basic_scatterv.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -144,7 +145,7 @@ mca_coll_basic_scatterv_inter(const void *sbuf, const int *scounts, return OMPI_ERROR; } - reqs = coll_base_comm_get_reqs(module->base_data, size); + reqs = ompi_coll_base_comm_get_reqs(module->base_data, size); if( NULL == reqs ) { return OMPI_ERR_OUT_OF_RESOURCE; } for (i = 0; i < size; ++i) { diff --git a/ompi/mca/coll/cuda/Makefile.am b/ompi/mca/coll/cuda/Makefile.am index e81d7ec45e3..74a6ecfd947 100644 --- a/ompi/mca/coll/cuda/Makefile.am +++ b/ompi/mca/coll/cuda/Makefile.am @@ -3,6 +3,7 @@ # of Tennessee Research Foundation. All rights # reserved. # Copyright (c) 2014 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -31,6 +32,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_cuda_la_SOURCES = $(sources) mca_coll_cuda_la_LDFLAGS = -module -avoid-version +mca_coll_cuda_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_cuda_la_SOURCES =$(sources) diff --git a/ompi/mca/coll/demo/Makefile.am b/ompi/mca/coll/demo/Makefile.am index 235ba68883a..1246c5d4389 100644 --- a/ompi/mca/coll/demo/Makefile.am +++ b/ompi/mca/coll/demo/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -56,6 +57,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_demo_la_SOURCES = $(sources) mca_coll_demo_la_LDFLAGS = -module -avoid-version +mca_coll_demo_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_demo_la_SOURCES = $(sources) diff --git a/ompi/mca/coll/fca/Makefile.am b/ompi/mca/coll/fca/Makefile.am index 9298b6f60ef..ccbe6b40e03 100644 --- a/ompi/mca/coll/fca/Makefile.am +++ b/ompi/mca/coll/fca/Makefile.am @@ -2,6 +2,7 @@ # # # Copyright (c) 2011 Mellanox Technologies. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -37,7 +38,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_fca_la_SOURCES = $(coll_fca_sources) -mca_coll_fca_la_LIBADD = $(coll_fca_LIBS) +mca_coll_fca_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(coll_fca_LIBS) mca_coll_fca_la_LDFLAGS = -module -avoid-version $(coll_fca_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/coll/hcoll/Makefile.am b/ompi/mca/coll/hcoll/Makefile.am index dafa2b32f91..37ec1c96c92 100644 --- a/ompi/mca/coll/hcoll/Makefile.am +++ b/ompi/mca/coll/hcoll/Makefile.am @@ -4,6 +4,7 @@ # Copyright (c) 2011 Mellanox Technologies. All rights reserved. # Copyright (c) 2015 Research Organization for Information Science # and Technology (RIST). All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -38,7 +39,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_hcoll_la_SOURCES = $(coll_hcoll_sources) -mca_coll_hcoll_la_LIBADD = $(coll_hcoll_LIBS) +mca_coll_hcoll_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(coll_hcoll_LIBS) mca_coll_hcoll_la_LDFLAGS = -module -avoid-version $(coll_hcoll_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/coll/hcoll/coll_hcoll.h b/ompi/mca/coll/hcoll/coll_hcoll.h index 6e8382d49f5..1ad34be11c6 100644 --- a/ompi/mca/coll/hcoll/coll_hcoll.h +++ b/ompi/mca/coll/hcoll/coll_hcoll.h @@ -56,6 +56,7 @@ typedef struct { } mca_coll_hcoll_dtype_t; OBJ_CLASS_DECLARATION(mca_coll_hcoll_dtype_t); +extern mca_coll_hcoll_dtype_t zero_dte_mapping; struct mca_coll_hcoll_component_t { /** Base coll component */ mca_coll_base_component_2_0_0_t super; diff --git a/ompi/mca/coll/hcoll/coll_hcoll_dtypes.h b/ompi/mca/coll/hcoll/coll_hcoll_dtypes.h index f0efb41c4fd..a818e6675ba 100644 --- a/ompi/mca/coll/hcoll/coll_hcoll_dtypes.h +++ 
b/ompi/mca/coll/hcoll/coll_hcoll_dtypes.h @@ -10,7 +10,7 @@ #include "ompi/mca/op/op.h" #include "hcoll/api/hcoll_dte.h" extern int hcoll_type_attr_keyval; - +extern mca_coll_hcoll_dtype_t zero_dte_mapping; /*to keep this at hand: Ids of the basic opal_datatypes: #define OPAL_DATATYPE_INT1 4 #define OPAL_DATATYPE_INT2 5 @@ -36,8 +36,16 @@ total 15 types static dte_data_representation_t* ompi_datatype_2_dte_data_rep[OMPI_DATATYPE_MAX_PREDEFINED] = { &DTE_ZERO, /*OPAL_DATATYPE_LOOP 0 */ &DTE_ZERO, /*OPAL_DATATYPE_END_LOOP 1 */ - &DTE_ZERO, /*OPAL_DATATYPE_LB 2 */ - &DTE_ZERO, /*OPAL_DATATYPE_UB 3 */ +#if defined(DTE_LB) + &DTE_LB, /*OPAL_DATATYPE_LB 2 */ +#else + &DTE_ZERO, +#endif +#if defined(DTE_UB) + &DTE_UB, /*OPAL_DATATYPE_UB 3 */ +#else + &DTE_ZERO, +#endif &DTE_BYTE, /*OPAL_DATATYPE_INT1 4 */ &DTE_INT16, /*OPAL_DATATYPE_INT2 5 */ &DTE_INT32, /*OPAL_DATATYPE_INT4 6 */ @@ -68,8 +76,16 @@ static dte_data_representation_t* ompi_datatype_2_dte_data_rep[OMPI_DATATYPE_MAX #else &DTE_ZERO, #endif - &DTE_ZERO, /*OPAL_DATATYPE_BOOL 22 */ - &DTE_ZERO, /*OPAL_DATATYPE_WCHAR 23 */ +#if defined(DTE_BOOL) + &DTE_BOOL, /*OPAL_DATATYPE_BOOL 22 */ +#else + &DTE_ZERO, +#endif +#if defined(DTE_WCHAR) + &DTE_WCHAR, /*OPAL_DATATYPE_WCHAR 23 */ +#else + &DTE_ZERO, +#endif &DTE_ZERO /*OPAL_DATATYPE_UNAVAILABLE 24 */ }; @@ -81,15 +97,21 @@ enum { #if HCOLL_API >= HCOLL_VERSION(3,6) static inline -int hcoll_map_derived_type(ompi_datatype_t *dtype, dte_data_representation_t *new_dte) +void hcoll_map_derived_type(ompi_datatype_t *dtype, dte_data_representation_t *new_dte) { int rc; if (NULL == dtype->args) { /* predefined type, shouldn't call this */ - return OMPI_SUCCESS; + return; } rc = hcoll_create_mpi_type((void*)dtype, new_dte); - return rc == HCOLL_SUCCESS ? OMPI_SUCCESS : OMPI_ERROR; + if (rc != HCOLL_SUCCESS) { + /* If hcoll fails to create mpi derived type let's set zero_dte on this dtype. 
+ This will save cycles on subsequent collective calls with the same derived + type since we will not try to create hcoll type again. */ + ompi_attr_set_c(TYPE_ATTR, (void*)dtype, &(dtype->d_keyhash), + hcoll_type_attr_keyval, &zero_dte_mapping, false); + } } static dte_data_representation_t find_derived_mapping(ompi_datatype_t *dtype){ @@ -222,6 +244,9 @@ static int hcoll_type_attr_del_fn(MPI_Datatype type, int keyval, void *attr_val, (mca_coll_hcoll_dtype_t*) attr_val; assert(dtype); + if (&zero_dte_mapping == dtype) { + return OMPI_SUCCESS; + } if (HCOLL_SUCCESS != (ret = hcoll_dt_destroy(dtype->type))) { HCOL_ERROR("failed to delete type attr: hcoll_dte_destroy returned %d",ret); return OMPI_ERROR; diff --git a/ompi/mca/coll/hcoll/coll_hcoll_module.c b/ompi/mca/coll/hcoll/coll_hcoll_module.c index dfc8f676727..1cd36fd89b5 100644 --- a/ompi/mca/coll/hcoll/coll_hcoll_module.c +++ b/ompi/mca/coll/hcoll/coll_hcoll_module.c @@ -17,7 +17,7 @@ int hcoll_comm_attr_keyval; int hcoll_type_attr_keyval; - +mca_coll_hcoll_dtype_t zero_dte_mapping; /* * Initial query function that is invoked during MPI_INIT, allowing * this module to indicate what level of thread support it provides. 
@@ -333,6 +333,7 @@ mca_coll_hcoll_comm_query(struct ompi_communicator_t *comm, int *priority) } if (mca_coll_hcoll_component.derived_types_support_enabled) { + zero_dte_mapping.type = DTE_ZERO; copy_fn.attr_datatype_copy_fn = (MPI_Type_internal_copy_attr_function *) MPI_TYPE_NULL_COPY_FN; del_fn.attr_datatype_delete_fn = hcoll_type_attr_del_fn; err = ompi_attr_create_keyval(TYPE_ATTR, copy_fn, del_fn, &hcoll_type_attr_keyval, NULL ,0, NULL); diff --git a/ompi/mca/coll/hcoll/coll_hcoll_rte.c b/ompi/mca/coll/hcoll/coll_hcoll_rte.c index ba64e99b13f..6df2dde7e90 100644 --- a/ompi/mca/coll/hcoll/coll_hcoll_rte.c +++ b/ompi/mca/coll/hcoll/coll_hcoll_rte.c @@ -185,7 +185,7 @@ static int recv_nb(struct dte_data_representation_t data, if (NULL == ec_h.handle && -1 != ec_h.rank) { fprintf(stderr,"***Error in hcolrte_rml_recv_nb: wrong null argument: " "ec_h.handle = %p, ec_h.rank = %d\n",ec_h.handle,ec_h.rank); - return 1; + return HCOLL_ERROR; } assert(HCOL_DTE_IS_INLINE(data)); /*do inline nb recv*/ @@ -195,7 +195,7 @@ static int recv_nb(struct dte_data_representation_t data, if (!buffer && !HCOL_DTE_IS_ZERO(data)) { fprintf(stderr, "***Error in hcolrte_rml_recv_nb: buffer pointer is NULL" " for non DTE_ZERO INLINE data representation\n"); - return 1; + return HCOLL_ERROR; } size = (size_t)data.rep.in_line_rep.data_handle.in_line.packed_size*count/8; @@ -204,7 +204,7 @@ static int recv_nb(struct dte_data_representation_t data, if (MCA_PML_CALL(irecv(buffer,size,&(ompi_mpi_unsigned_char.dt),ec_h.rank, tag,comm,&ompi_req))) { - return 1; + return HCOLL_ERROR; } req->data = (void *)ompi_req; req->status = HCOLRTE_REQUEST_ACTIVE; @@ -226,7 +226,7 @@ static int send_nb( dte_data_representation_t data, if (! 
ec_h.handle) { fprintf(stderr,"***Error in hcolrte_rml_send_nb: wrong null argument: " "ec_h.handle = %p, ec_h.rank = %d\n",ec_h.handle,ec_h.rank); - return 1; + return HCOLL_ERROR; } assert(HCOL_DTE_IS_INLINE(data)); /*do inline nb recv*/ @@ -235,7 +235,7 @@ static int send_nb( dte_data_representation_t data, if (!buffer && !HCOL_DTE_IS_ZERO(data)) { fprintf(stderr, "***Error in hcolrte_rml_send_nb: buffer pointer is NULL" " for non DTE_ZERO INLINE data representation\n"); - return 1; + return HCOLL_ERROR; } size = (size_t)data.rep.in_line_rep.data_handle.in_line.packed_size*count/8; HCOL_VERBOSE(30,"PML_ISEND: dest = %d: buf = %p: size = %u: comm = %p", @@ -243,7 +243,7 @@ static int send_nb( dte_data_representation_t data, if (MCA_PML_CALL(isend(buffer,size,&(ompi_mpi_unsigned_char.dt),ec_h.rank, tag,MCA_PML_BASE_SEND_STANDARD,comm,&ompi_req))) { - return 1; + return HCOLL_ERROR; } req->data = (void *)ompi_req; req->status = HCOLRTE_REQUEST_ACTIVE; diff --git a/ompi/mca/coll/inter/Makefile.am b/ompi/mca/coll/inter/Makefile.am index fb6585488e7..d9c691cf458 100644 --- a/ompi/mca/coll/inter/Makefile.am +++ b/ompi/mca/coll/inter/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -32,6 +33,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_inter_la_SOURCES = $(sources) mca_coll_inter_la_LDFLAGS = -module -avoid-version +mca_coll_inter_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_inter_la_SOURCES = $(sources) diff --git a/ompi/mca/coll/inter/coll_inter_allgather.c b/ompi/mca/coll/inter/coll_inter_allgather.c index d270ab2c73c..6bd0e91b58d 100644 --- a/ompi/mca/coll/inter/coll_inter_allgather.c +++ b/ompi/mca/coll/inter/coll_inter_allgather.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2010 University of Houston. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -27,11 +27,11 @@ #include "mpi.h" #include "ompi/constants.h" #include "ompi/datatype/ompi_datatype.h" -#include "ompi/request/request.h" #include "ompi/communicator/communicator.h" #include "ompi/mca/coll/coll.h" #include "ompi/mca/pml/pml.h" #include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/coll/base/coll_base_util.h" /* * allgather_inter @@ -51,7 +51,6 @@ mca_coll_inter_allgather_inter(const void *sbuf, int scount, int rank, root = 0, size, rsize, err = OMPI_SUCCESS; char *ptmp_free = NULL, *ptmp = NULL; ptrdiff_t gap, span; - ompi_request_t *req[2]; rank = ompi_comm_rank(comm); size = ompi_comm_size(comm->c_local_comm); @@ -77,22 +76,11 @@ mca_coll_inter_allgather_inter(const void *sbuf, int scount, if (rank == root) { /* Do a send-recv between the two root procs. 
to avoid deadlock */ - err = MCA_PML_CALL(irecv(rbuf, rcount*rsize, rdtype, 0, - MCA_COLL_BASE_TAG_ALLGATHER, comm, - &(req[0]))); - if (OMPI_SUCCESS != err) { - goto exit; - } - - err = MCA_PML_CALL(isend(ptmp, scount*size, sdtype, 0, - MCA_COLL_BASE_TAG_ALLGATHER, - MCA_PML_BASE_SEND_STANDARD, - comm, &(req[1]))); - if (OMPI_SUCCESS != err) { - goto exit; - } - - err = ompi_request_wait_all(2, req, MPI_STATUSES_IGNORE); + err = ompi_coll_base_sendrecv_actual(ptmp, scount*size, sdtype, 0, + MCA_COLL_BASE_TAG_ALLGATHER, + rbuf, rcount*rsize, rdtype, 0, + MCA_COLL_BASE_TAG_ALLGATHER, + comm, MPI_STATUS_IGNORE); if (OMPI_SUCCESS != err) { goto exit; } diff --git a/ompi/mca/coll/inter/coll_inter_allgatherv.c b/ompi/mca/coll/inter/coll_inter_allgatherv.c index c12cdfa846a..0728fd28072 100644 --- a/ompi/mca/coll/inter/coll_inter_allgatherv.c +++ b/ompi/mca/coll/inter/coll_inter_allgatherv.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2010 University of Houston. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -24,11 +24,11 @@ #include "mpi.h" #include "ompi/datatype/ompi_datatype.h" -#include "ompi/request/request.h" #include "ompi/communicator/communicator.h" #include "ompi/constants.h" #include "ompi/mca/coll/coll.h" #include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/coll/base/coll_base_util.h" #include "ompi/mca/pml/pml.h" @@ -51,7 +51,6 @@ mca_coll_inter_allgatherv_inter(const void *sbuf, int scount, int *count=NULL,*displace=NULL; char *ptmp_free=NULL, *ptmp=NULL; ompi_datatype_t *ndtype = NULL; - ompi_request_t *req[2]; rank = ompi_comm_rank(comm); size_local = ompi_comm_size(comm->c_local_comm); @@ -106,25 +105,14 @@ mca_coll_inter_allgatherv_inter(const void *sbuf, int scount, if (0 == rank) { /* Exchange data between roots */ - err = MCA_PML_CALL(irecv(rbuf, 1, ndtype, 0, - MCA_COLL_BASE_TAG_ALLGATHERV, comm, - &(req[0]))); + err = ompi_coll_base_sendrecv_actual(ptmp, total, sdtype, 0, + MCA_COLL_BASE_TAG_ALLGATHERV, + rbuf, 1, ndtype, 0, + MCA_COLL_BASE_TAG_ALLGATHERV, + comm, MPI_STATUS_IGNORE); if (OMPI_SUCCESS != err) { goto exit; } - - err = MCA_PML_CALL(isend(ptmp, total, sdtype, 0, - MCA_COLL_BASE_TAG_ALLGATHERV, - MCA_PML_BASE_SEND_STANDARD, - comm, &(req[1]))); - if (OMPI_SUCCESS != err) { - goto exit; - } - - err = ompi_request_wait_all(2, req, MPI_STATUSES_IGNORE); - if (OMPI_SUCCESS != err) { - goto exit; - } } /* bcast the message to all the local processes */ diff --git a/ompi/mca/coll/inter/coll_inter_allreduce.c b/ompi/mca/coll/inter/coll_inter_allreduce.c index 8c972a223de..91ca00ff858 100644 --- a/ompi/mca/coll/inter/coll_inter_allreduce.c +++ b/ompi/mca/coll/inter/coll_inter_allreduce.c @@ -11,7 +11,7 @@ * All rights reserved. * Copyright (c) 2006-2007 University of Houston. All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. 
- * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -27,10 +27,10 @@ #include "ompi/constants.h" #include "ompi/datatype/ompi_datatype.h" #include "ompi/communicator/communicator.h" -#include "ompi/request/request.h" #include "ompi/op/op.h" #include "ompi/mca/coll/coll.h" #include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/coll/base/coll_base_util.h" #include "ompi/mca/pml/pml.h" /* @@ -49,7 +49,6 @@ mca_coll_inter_allreduce_inter(const void *sbuf, void *rbuf, int count, { int err, rank, root = 0; char *tmpbuf = NULL, *pml_buffer = NULL; - ompi_request_t *req[2]; ptrdiff_t gap, span; rank = ompi_comm_rank(comm); @@ -73,22 +72,11 @@ mca_coll_inter_allreduce_inter(const void *sbuf, void *rbuf, int count, if (rank == root) { /* Do a send-recv between the two root procs. to avoid deadlock */ - err = MCA_PML_CALL(irecv(rbuf, count, dtype, 0, - MCA_COLL_BASE_TAG_ALLREDUCE, comm, - &(req[0]))); - if (OMPI_SUCCESS != err) { - goto exit; - } - - err = MCA_PML_CALL(isend(pml_buffer, count, dtype, 0, - MCA_COLL_BASE_TAG_ALLREDUCE, - MCA_PML_BASE_SEND_STANDARD, - comm, &(req[1]))); - if (OMPI_SUCCESS != err) { - goto exit; - } - - err = ompi_request_wait_all(2, req, MPI_STATUSES_IGNORE); + err = ompi_coll_base_sendrecv_actual(pml_buffer, count, dtype, 0, + MCA_COLL_BASE_TAG_ALLREDUCE, + rbuf, count, dtype, 0, + MCA_COLL_BASE_TAG_ALLREDUCE, + comm, MPI_STATUS_IGNORE); if (OMPI_SUCCESS != err) { goto exit; } diff --git a/ompi/mca/coll/libnbc/Makefile.am b/ompi/mca/coll/libnbc/Makefile.am index 4d3e90186a9..4afa48cdd2c 100644 --- a/ompi/mca/coll/libnbc/Makefile.am +++ b/ompi/mca/coll/libnbc/Makefile.am @@ -12,6 +12,9 @@ # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights # reserved. 
+# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -37,7 +40,6 @@ sources = \ nbc_ialltoallw.c \ nbc_ibarrier.c \ nbc_ibcast.c \ - nbc_ibcast_inter.c \ nbc_iexscan.c \ nbc_igather.c \ nbc_igatherv.c \ @@ -70,6 +72,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_libnbc_la_SOURCES = $(sources) mca_coll_libnbc_la_LDFLAGS = -module -avoid-version +mca_coll_libnbc_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_libnbc_la_SOURCES =$(sources) diff --git a/ompi/mca/coll/libnbc/coll_libnbc.h b/ompi/mca/coll/libnbc/coll_libnbc.h index c7c88dc9370..94f1d038368 100644 --- a/ompi/mca/coll/libnbc/coll_libnbc.h +++ b/ompi/mca/coll/libnbc/coll_libnbc.h @@ -75,7 +75,6 @@ struct ompi_coll_libnbc_component_t { opal_free_list_t requests; opal_list_t active_requests; int32_t active_comms; - opal_atomic_lock_t progress_lock; /* protect from recursive calls */ opal_mutex_t lock; /* protect access to the active_requests list */ }; typedef struct ompi_coll_libnbc_component_t ompi_coll_libnbc_component_t; diff --git a/ompi/mca/coll/libnbc/coll_libnbc_component.c b/ompi/mca/coll/libnbc/coll_libnbc_component.c index 1ac5b0b943e..a94ca934e00 100644 --- a/ompi/mca/coll/libnbc/coll_libnbc_component.c +++ b/ompi/mca/coll/libnbc/coll_libnbc_component.c @@ -39,6 +39,7 @@ const char *mca_coll_libnbc_component_version_string = static int libnbc_priority = 10; +static bool libnbc_in_progress = false; /* protect from recursive calls */ bool libnbc_ibcast_skip_dt_decision = true; @@ -102,8 +103,6 @@ libnbc_open(void) a non-blocking collective started */ mca_coll_libnbc_component.active_comms = 0; - opal_atomic_init(&mca_coll_libnbc_component.progress_lock, OPAL_ATOMIC_UNLOCKED); - return OMPI_SUCCESS; } @@ -263,37 +262,43 
@@ ompi_coll_libnbc_progress(void) ompi_coll_libnbc_request_t* request, *next; int res; - /* return if invoked recursively */ - if (opal_atomic_trylock(&mca_coll_libnbc_component.progress_lock)) return 0; + if (0 == opal_list_get_size (&mca_coll_libnbc_component.active_requests)) { + /* no requests -- nothing to do. do not grab a lock */ + return 0; + } /* process active requests, and use mca_coll_libnbc_component.lock to access the * mca_coll_libnbc_component.active_requests list */ OPAL_THREAD_LOCK(&mca_coll_libnbc_component.lock); - OPAL_LIST_FOREACH_SAFE(request, next, &mca_coll_libnbc_component.active_requests, - ompi_coll_libnbc_request_t) { - OPAL_THREAD_UNLOCK(&mca_coll_libnbc_component.lock); - res = NBC_Progress(request); - if( NBC_CONTINUE != res ) { - /* done, remove and complete */ - OPAL_THREAD_LOCK(&mca_coll_libnbc_component.lock); - opal_list_remove_item(&mca_coll_libnbc_component.active_requests, - &request->super.super.super); - OPAL_THREAD_UNLOCK(&mca_coll_libnbc_component.lock); + /* return if invoked recursively */ + if (!libnbc_in_progress) { + libnbc_in_progress = true; - if( OMPI_SUCCESS == res || NBC_OK == res || NBC_SUCCESS == res ) { - request->super.req_status.MPI_ERROR = OMPI_SUCCESS; - } - else { - request->super.req_status.MPI_ERROR = res; + OPAL_LIST_FOREACH_SAFE(request, next, &mca_coll_libnbc_component.active_requests, + ompi_coll_libnbc_request_t) { + OPAL_THREAD_UNLOCK(&mca_coll_libnbc_component.lock); + res = NBC_Progress(request); + if( NBC_CONTINUE != res ) { + /* done, remove and complete */ + OPAL_THREAD_LOCK(&mca_coll_libnbc_component.lock); + opal_list_remove_item(&mca_coll_libnbc_component.active_requests, + &request->super.super.super); + OPAL_THREAD_UNLOCK(&mca_coll_libnbc_component.lock); + + if( OMPI_SUCCESS == res || NBC_OK == res || NBC_SUCCESS == res ) { + request->super.req_status.MPI_ERROR = OMPI_SUCCESS; + } + else { + request->super.req_status.MPI_ERROR = res; + } + ompi_request_complete(&request->super, true); 
} - ompi_request_complete(&request->super, true); + OPAL_THREAD_LOCK(&mca_coll_libnbc_component.lock); } - OPAL_THREAD_LOCK(&mca_coll_libnbc_component.lock); + libnbc_in_progress = false; } OPAL_THREAD_UNLOCK(&mca_coll_libnbc_component.lock); - opal_atomic_unlock(&mca_coll_libnbc_component.progress_lock); - return 0; } @@ -314,7 +319,7 @@ libnbc_module_destruct(ompi_coll_libnbc_module_t *module) /* if we ever were used for a collective op, do the progress cleanup. */ if (true == module->comm_registered) { int32_t tmp = - OPAL_THREAD_ADD32(&mca_coll_libnbc_component.active_comms, -1); + OPAL_THREAD_ADD_FETCH32(&mca_coll_libnbc_component.active_comms, -1); if (0 == tmp) { opal_progress_unregister(ompi_coll_libnbc_progress); } diff --git a/ompi/mca/coll/libnbc/nbc.c b/ompi/mca/coll/libnbc/nbc.c index fe55fa5e757..28f022e5c99 100644 --- a/ompi/mca/coll/libnbc/nbc.c +++ b/ompi/mca/coll/libnbc/nbc.c @@ -10,7 +10,7 @@ * rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* * Author(s): Torsten Hoefler @@ -618,7 +618,7 @@ int NBC_Init_handle(struct ompi_communicator_t *comm, ompi_coll_libnbc_request_t /* register progress */ if (need_register) { int32_t tmp = - OPAL_THREAD_ADD32(&mca_coll_libnbc_component.active_comms, 1); + OPAL_THREAD_ADD_FETCH32(&mca_coll_libnbc_component.active_comms, 1); if (tmp == 1) { opal_progress_register(ompi_coll_libnbc_progress); } @@ -709,6 +709,25 @@ int NBC_Start(NBC_Handle *handle, NBC_Schedule *schedule) { return OMPI_SUCCESS; } +int NBC_Schedule_request(NBC_Schedule *schedule, ompi_communicator_t *comm, ompi_coll_libnbc_module_t *module, ompi_request_t **request, void *tmpbuf) { + int res; + NBC_Handle *handle; + res = NBC_Init_handle (comm, &handle, module); + if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { + return res; + } + handle->tmpbuf = tmpbuf; + + res = NBC_Start (handle, schedule); + if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { + NBC_Return_handle (handle); + return res; + } + + *request = (ompi_request_t *) handle; + return OMPI_SUCCESS; +} + #ifdef NBC_CACHE_SCHEDULE void NBC_SchedCache_args_delete_key_dummy(void *k) { /* do nothing because the key and the data element are identical :-) diff --git a/ompi/mca/coll/libnbc/nbc_iallgather.c b/ompi/mca/coll/libnbc/nbc_iallgather.c index b136d89b7a8..dd20b7a40fe 100644 --- a/ompi/mca/coll/libnbc/nbc_iallgather.c +++ b/ompi/mca/coll/libnbc/nbc_iallgather.c @@ -54,7 +54,6 @@ int ompi_coll_libnbc_iallgather(const void* sendbuf, int sendcount, MPI_Datatype #ifdef NBC_CACHE_SCHEDULE NBC_Allgather_args *args, *found, search; #endif - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -155,20 +154,12 @@ int ompi_coll_libnbc_iallgather(const void* sendbuf, int sendcount, MPI_Datatype } #endif - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { 
OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - OMPI_COLL_LIBNBC_REQUEST_RETURN(handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -180,7 +171,6 @@ int ompi_coll_libnbc_iallgather_inter(const void* sendbuf, int sendcount, MPI_Da MPI_Aint rcvext; NBC_Schedule *schedule; char *rbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; res = ompi_datatype_type_extent(recvtype, &rcvext); @@ -221,19 +211,11 @@ int ompi_coll_libnbc_iallgather_inter(const void* sendbuf, int sendcount, MPI_Da return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - OMPI_COLL_LIBNBC_REQUEST_RETURN(handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_iallgatherv.c b/ompi/mca/coll/libnbc/nbc_iallgatherv.c index 39fc662ac8f..ac711c6e87a 100644 --- a/ompi/mca/coll/libnbc/nbc_iallgatherv.c +++ b/ompi/mca/coll/libnbc/nbc_iallgatherv.c @@ -11,7 +11,7 @@ * Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ @@ -41,7 +41,6 @@ int ompi_coll_libnbc_iallgatherv(const void* sendbuf, int sendcount, MPI_Datatyp MPI_Aint rcvext; NBC_Schedule *schedule; char *rbuf, *sbuf, inplace; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -100,20 +99,12 @@ int ompi_coll_libnbc_iallgatherv(const void* sendbuf, int sendcount, MPI_Datatyp return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request (schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -124,7 +115,6 @@ int ompi_coll_libnbc_iallgatherv_inter(const void* sendbuf, int sendcount, MPI_D int res, rsize; MPI_Aint rcvext; NBC_Schedule *schedule; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -169,19 +159,11 @@ int ompi_coll_libnbc_iallgatherv_inter(const void* sendbuf, int sendcount, MPI_D return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_iallreduce.c b/ompi/mca/coll/libnbc/nbc_iallreduce.c index a1d98ec33b6..1a1e17039c2 100644 --- a/ompi/mca/coll/libnbc/nbc_iallreduce.c +++ b/ompi/mca/coll/libnbc/nbc_iallreduce.c @@ -25,13 +25,13 @@ #include static inline int allred_sched_diss(int rank, int p, int count, MPI_Datatype 
datatype, ptrdiff_t gap, const void *sendbuf, - void *recvbuf, MPI_Op op, char inplace, NBC_Schedule *schedule, NBC_Handle *handle); + void *recvbuf, MPI_Op op, char inplace, NBC_Schedule *schedule, void *tmpbuf); static inline int allred_sched_ring(int rank, int p, int count, MPI_Datatype datatype, const void *sendbuf, void *recvbuf, MPI_Op op, int size, int ext, NBC_Schedule *schedule, - NBC_Handle *handle); + void *tmpbuf); static inline int allred_sched_linear(int rank, int p, const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, ptrdiff_t gap, MPI_Op op, int ext, int size, - NBC_Schedule *schedule, NBC_Handle *handle); + NBC_Schedule *schedule, void *tmpbuf); #ifdef NBC_CACHE_SCHEDULE /* tree comparison function for schedule cache */ @@ -57,7 +57,7 @@ int ompi_coll_libnbc_iallreduce(const void* sendbuf, void* recvbuf, int count, M struct mca_coll_base_module_2_2_0_t *module) { int rank, p, res; - OPAL_PTRDIFF_TYPE ext, lb; + ptrdiff_t ext, lb; NBC_Schedule *schedule; size_t size; #ifdef NBC_CACHE_SCHEDULE @@ -65,7 +65,7 @@ int ompi_coll_libnbc_iallreduce(const void* sendbuf, void* recvbuf, int count, M #endif enum { NBC_ARED_BINOMIAL, NBC_ARED_RING } alg; char inplace; - NBC_Handle *handle = NULL; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; ptrdiff_t span, gap; @@ -91,7 +91,6 @@ int ompi_coll_libnbc_iallreduce(const void* sendbuf, void* recvbuf, int count, M /* for a single node - copy data to receivebuf */ res = NBC_Copy(sendbuf, count, datatype, recvbuf, count, datatype, comm); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); return res; } } @@ -99,15 +98,9 @@ int ompi_coll_libnbc_iallreduce(const void* sendbuf, void* recvbuf, int count, M return OMPI_SUCCESS; } - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - span = opal_datatype_span(&datatype->super, count, &gap); - handle->tmpbuf = malloc 
(span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc (span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } @@ -130,30 +123,29 @@ int ompi_coll_libnbc_iallreduce(const void* sendbuf, void* recvbuf, int count, M #endif schedule = OBJ_NEW(NBC_Schedule); if (NULL == schedule) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* ensure the schedule is released with the handle on error */ - handle->schedule = schedule; - switch(alg) { case NBC_ARED_BINOMIAL: - res = allred_sched_diss(rank, p, count, datatype, gap, sendbuf, recvbuf, op, inplace, schedule, handle); + res = allred_sched_diss(rank, p, count, datatype, gap, sendbuf, recvbuf, op, inplace, schedule, tmpbuf); break; case NBC_ARED_RING: - res = allred_sched_ring(rank, p, count, datatype, sendbuf, recvbuf, op, size, ext, schedule, handle); + res = allred_sched_ring(rank, p, count, datatype, sendbuf, recvbuf, op, size, ext, schedule, tmpbuf); break; } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -188,15 +180,13 @@ int ompi_coll_libnbc_iallreduce(const void* sendbuf, void* recvbuf, int count, M } #endif - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request (schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } @@ -208,7 +198,7 @@ int ompi_coll_libnbc_iallreduce_inter(const void* sendbuf, void* recvbuf, int co size_t size; MPI_Aint ext; NBC_Schedule *schedule; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t 
*libnbc_module = (ompi_coll_libnbc_module_t*) module; ptrdiff_t span, gap; @@ -227,49 +217,40 @@ int ompi_coll_libnbc_iallreduce_inter(const void* sendbuf, void* recvbuf, int co return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - span = opal_datatype_span(&datatype->super, count, &gap); - handle->tmpbuf = malloc (span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc (span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* ensure the schedule is released with the handle on error */ - handle->schedule = schedule; - res = allred_sched_linear (rank, rsize, sendbuf, recvbuf, count, datatype, gap, op, - ext, size, schedule, handle); + ext, size, schedule, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start(handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } @@ -310,7 +291,7 @@ int ompi_coll_libnbc_iallreduce_inter(const void* sendbuf, void* recvbuf, int co if (vrank == root) rank = 0; \ } static inline int allred_sched_diss(int rank, int p, int count, MPI_Datatype datatype, ptrdiff_t gap, const void *sendbuf, void *recvbuf, - MPI_Op op, char inplace, NBC_Schedule *schedule, NBC_Handle *handle) { + MPI_Op op, char inplace, 
NBC_Schedule *schedule, void *tmpbuf) { int root, vrank, maxr, vpeer, peer, res; char *rbuf, *lbuf, *buf; int tmprbuf, tmplbuf; @@ -330,7 +311,7 @@ static inline int allred_sched_diss(int rank, int p, int count, MPI_Datatype dat rbuf = recvbuf; tmprbuf = false; if (inplace) { - res = NBC_Copy(rbuf, count, datatype, ((char *)handle->tmpbuf) - gap, count, datatype, MPI_COMM_SELF); + res = NBC_Copy(rbuf, count, datatype, ((char *)tmpbuf) - gap, count, datatype, MPI_COMM_SELF); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { return res; } @@ -349,7 +330,7 @@ static inline int allred_sched_diss(int rank, int p, int count, MPI_Datatype dat return res; } - /* this cannot be done until handle->tmpbuf is unused :-( so barrier after the op */ + /* this cannot be done until tmpbuf is unused :-( so barrier after the op */ if (firstred && !inplace) { /* perform the reduce with the senbuf */ res = NBC_Sched_op (sendbuf, false, rbuf, tmprbuf, count, datatype, op, schedule, true); @@ -425,7 +406,7 @@ static inline int allred_sched_diss(int rank, int p, int count, MPI_Datatype dat } static inline int allred_sched_ring (int r, int p, int count, MPI_Datatype datatype, const void *sendbuf, void *recvbuf, MPI_Op op, - int size, int ext, NBC_Schedule *schedule, NBC_Handle *handle) { + int size, int ext, NBC_Schedule *schedule, void *tmpbuf) { int segsize, *segsizes, *segoffsets; /* segment sizes and offsets per segment (number of segments == number of nodes */ int speer, rpeer; /* send and recvpeer */ int res = OMPI_SUCCESS; @@ -625,7 +606,7 @@ static inline int allred_sched_ring (int r, int p, int count, MPI_Datatype datat } static inline int allred_sched_linear(int rank, int rsize, const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, - ptrdiff_t gap, MPI_Op op, int ext, int size, NBC_Schedule *schedule, NBC_Handle *handle) { + ptrdiff_t gap, MPI_Op op, int ext, int size, NBC_Schedule *schedule, void *tmpbuf) { int res; if (0 == count) { diff --git 
a/ompi/mca/coll/libnbc/nbc_ialltoall.c b/ompi/mca/coll/libnbc/nbc_ialltoall.c index 45d38a8735f..77432194aab 100644 --- a/ompi/mca/coll/libnbc/nbc_ialltoall.c +++ b/ompi/mca/coll/libnbc/nbc_ialltoall.c @@ -8,7 +8,7 @@ * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -28,7 +28,7 @@ static inline int a2a_sched_pairwise(int rank, int p, MPI_Aint sndext, MPI_Aint int recvcount, MPI_Datatype recvtype, MPI_Comm comm); static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcvext, NBC_Schedule* schedule, const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, - int recvcount, MPI_Datatype recvtype, MPI_Comm comm, NBC_Handle *handle); + int recvcount, MPI_Datatype recvtype, MPI_Comm comm, void* tmpbuf); static inline int a2a_sched_inplace(int rank, int p, NBC_Schedule* schedule, void* buf, int count, MPI_Datatype type, MPI_Aint ext, ptrdiff_t gap, MPI_Comm comm); @@ -66,7 +66,7 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype #endif char *rbuf, *sbuf, inplace; enum {NBC_A2A_LINEAR, NBC_A2A_PAIRWISE, NBC_A2A_DISS, NBC_A2A_INPLACE} alg; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; ptrdiff_t span, gap; @@ -119,17 +119,11 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype } } - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - /* allocate temp buffer if we need one */ if (alg == NBC_A2A_INPLACE) { span = opal_datatype_span(&recvtype->super, recvcount, &gap); - 
handle->tmpbuf = malloc(span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc(span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } } else if (alg == NBC_A2A_DISS) { @@ -140,21 +134,19 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype res = PMPI_Pack_size (sendcount, sendtype, comm, &datasize); if (MPI_SUCCESS != res) { NBC_Error("MPI Error in PMPI_Pack_size() (%i)", res); - NBC_Return_handle (handle); return res; } } /* allocate temporary buffers */ if ((p & 1) == 0) { - handle->tmpbuf = malloc (datasize * p * 2); + tmpbuf = malloc (datasize * p * 2); } else { /* we cannot divide p by two, so alloc more to be safe ... */ - handle->tmpbuf = malloc (datasize * (p / 2 + 1) * 2 * 2); + tmpbuf = malloc (datasize * (p / 2 + 1) * 2 * 2); } - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } @@ -165,29 +157,29 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype if (NBC_Type_intrinsic(sendtype)) { #endif /* OPAL_CUDA_SUPPORT */ /* contiguous - just copy (1st copy) */ - memcpy (handle->tmpbuf, (char *) sendbuf + datasize * rank, datasize * (p - rank)); + memcpy (tmpbuf, (char *) sendbuf + datasize * rank, datasize * (p - rank)); if (rank != 0) { - memcpy ((char *) handle->tmpbuf + datasize * (p - rank), sendbuf, datasize * rank); + memcpy ((char *) tmpbuf + datasize * (p - rank), sendbuf, datasize * rank); } } else { int pos=0; /* non-contiguous - pack */ - res = PMPI_Pack ((char *) sendbuf + rank * sendcount * sndext, (p - rank) * sendcount, sendtype, handle->tmpbuf, + res = PMPI_Pack ((char *) sendbuf + rank * sendcount * sndext, (p - rank) * sendcount, sendtype, tmpbuf, (p - rank) * datasize, &pos, comm); if (OPAL_UNLIKELY(MPI_SUCCESS != res)) { NBC_Error("MPI Error in PMPI_Pack() (%i)", res); - NBC_Return_handle (handle); + 
free(tmpbuf); return res; } if (rank != 0) { pos = 0; - res = PMPI_Pack(sendbuf, rank * sendcount, sendtype, (char *) handle->tmpbuf + datasize * (p - rank), + res = PMPI_Pack(sendbuf, rank * sendcount, sendtype, (char *) tmpbuf + datasize * (p - rank), rank * datasize, &pos, comm); if (OPAL_UNLIKELY(MPI_SUCCESS != res)) { NBC_Error("MPI Error in PMPI_Pack() (%i)", res); - NBC_Return_handle (handle); + free(tmpbuf); return res; } } @@ -208,13 +200,10 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype /* not found - generate new schedule */ schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* ensure the schedule is released with the handle on error */ - handle->schedule = schedule; - switch(alg) { case NBC_A2A_INPLACE: res = a2a_sched_inplace(rank, p, schedule, recvbuf, recvcount, recvtype, rcvext, gap, comm); @@ -223,7 +212,7 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype res = a2a_sched_linear(rank, p, sndext, rcvext, schedule, sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm); break; case NBC_A2A_DISS: - res = a2a_sched_diss(rank, p, sndext, rcvext, schedule, sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, handle); + res = a2a_sched_diss(rank, p, sndext, rcvext, schedule, sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, tmpbuf); break; case NBC_A2A_PAIRWISE: res = a2a_sched_pairwise(rank, p, sndext, rcvext, schedule, sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm); @@ -231,13 +220,15 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + 
free(tmpbuf); return res; } @@ -273,14 +264,13 @@ int ompi_coll_libnbc_ialltoall(const void* sendbuf, int sendcount, MPI_Datatype } #endif - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -292,7 +282,6 @@ int ompi_coll_libnbc_ialltoall_inter (const void* sendbuf, int sendcount, MPI_Da MPI_Aint sndext, rcvext; NBC_Schedule *schedule; char *rbuf, *sbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -341,20 +330,12 @@ int ompi_coll_libnbc_ialltoall_inter (const void* sendbuf, int sendcount, MPI_Da return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -416,7 +397,7 @@ static inline int a2a_sched_linear(int rank, int p, MPI_Aint sndext, MPI_Aint rc static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcvext, NBC_Schedule* schedule, const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, - MPI_Datatype recvtype, MPI_Comm comm, NBC_Handle *handle) { + MPI_Datatype recvtype, MPI_Comm comm, void* tmpbuf) { int res, speer, rpeer, datasize, offset, virtp; char *rbuf, *rtmpbuf, *stmpbuf; @@ -436,13 +417,13 @@ static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcve /* allocate temporary buffers */ if ((p & 1) == 0) { - rtmpbuf = (char *) handle->tmpbuf + datasize * p; - stmpbuf = 
(char *) handle->tmpbuf + datasize * (p + p / 2); + rtmpbuf = (char *)tmpbuf + datasize * p; + stmpbuf = (char *)tmpbuf + datasize * (p + p / 2); } else { /* we cannot divide p by two, so alloc more to be safe ... */ virtp = (p / 2 + 1) * 2; - rtmpbuf = (char *) handle->tmpbuf + datasize * p; - stmpbuf = (char *) handle->tmpbuf + datasize * (p + virtp / 2); + rtmpbuf = (char *)tmpbuf + datasize * p; + stmpbuf = (char *)tmpbuf + datasize * (p + virtp / 2); } /* phase 2 - communicate */ @@ -454,7 +435,7 @@ static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcve /* copy data to sendbuffer (2nd copy) - could be avoided using iovecs */ /*printf("[%i] round %i: copying element %i to buffer %lu\n", rank, r, i, (unsigned long)(stmpbuf+offset));*/ res = NBC_Sched_copy((void *)(intptr_t)(i * datasize), true, datasize, MPI_BYTE, stmpbuf + offset - - (intptr_t) handle->tmpbuf, true, datasize, MPI_BYTE, schedule, false); + (intptr_t)tmpbuf, true, datasize, MPI_BYTE, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { return res; } @@ -466,12 +447,12 @@ static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcve /* add p because modulo does not work with negative values */ rpeer = ((rank - r) + p) % p; - res = NBC_Sched_recv (rtmpbuf - (intptr_t) handle->tmpbuf, true, offset, MPI_BYTE, rpeer, schedule, false); + res = NBC_Sched_recv (rtmpbuf - (intptr_t)tmpbuf, true, offset, MPI_BYTE, rpeer, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { return res; } - res = NBC_Sched_send (stmpbuf - (intptr_t) handle->tmpbuf, true, offset, MPI_BYTE, speer, schedule, true); + res = NBC_Sched_send (stmpbuf - (intptr_t)tmpbuf, true, offset, MPI_BYTE, speer, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { return res; } @@ -482,7 +463,7 @@ static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcve /* test if bit r is set in rank number i */ if (i & r) { /* copy data to tmpbuffer (3rd copy) - could be 
avoided using iovecs */ - res = NBC_Sched_copy (rtmpbuf + offset - (intptr_t) handle->tmpbuf, true, datasize, MPI_BYTE, + res = NBC_Sched_copy (rtmpbuf + offset - (intptr_t)tmpbuf, true, datasize, MPI_BYTE, (void *)(intptr_t)(i * datasize), true, datasize, MPI_BYTE, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { @@ -494,8 +475,7 @@ static inline int a2a_sched_diss(int rank, int p, MPI_Aint sndext, MPI_Aint rcve } } - /* phase 3 - reorder - data is now in wrong order in handle->tmpbuf - - * reorder it into recvbuf */ + /* phase 3 - reorder - data is now in wrong order in tmpbuf - reorder it into recvbuf */ for (int i = 0 ; i < p; ++i) { rbuf = (char *) recvbuf + ((rank - i + p) % p) * recvcount * rcvext; res = NBC_Sched_unpack ((void *)(intptr_t) (i * datasize), true, recvcount, recvtype, rbuf, false, schedule, diff --git a/ompi/mca/coll/libnbc/nbc_ialltoallv.c b/ompi/mca/coll/libnbc/nbc_ialltoallv.c index f7dacac1f3c..61f9d1a4192 100644 --- a/ompi/mca/coll/libnbc/nbc_ialltoallv.c +++ b/ompi/mca/coll/libnbc/nbc_ialltoallv.c @@ -50,7 +50,7 @@ int ompi_coll_libnbc_ialltoallv(const void* sendbuf, const int *sendcounts, cons NBC_Schedule *schedule; char *rbuf, *sbuf, inplace; ptrdiff_t gap, span; - NBC_Handle *handle; + void * tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -64,11 +64,6 @@ int ompi_coll_libnbc_ialltoallv(const void* sendbuf, const int *sendcounts, cons return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - /* copy data to receivbuffer */ if (inplace) { int count = 0; @@ -80,12 +75,10 @@ int ompi_coll_libnbc_ialltoallv(const void* sendbuf, const int *sendcounts, cons span = opal_datatype_span(&recvtype->super, count, &gap); if (OPAL_UNLIKELY(0 == span)) { *request = &ompi_request_empty; - NBC_Return_handle (handle); return MPI_SUCCESS; } - handle->tmpbuf = malloc(span); - if 
(OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc(span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } sendcounts = recvcounts; @@ -94,7 +87,6 @@ int ompi_coll_libnbc_ialltoallv(const void* sendbuf, const int *sendcounts, cons res = ompi_datatype_type_extent (sendtype, &sndext); if (MPI_SUCCESS != res) { NBC_Error("MPI Error in ompi_datatype_type_extent() (%i)", res); - NBC_Return_handle (handle); return res; } if (sendcounts[rank] != 0) { @@ -109,7 +101,7 @@ int ompi_coll_libnbc_ialltoallv(const void* sendbuf, const int *sendcounts, cons schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } @@ -123,27 +115,25 @@ int ompi_coll_libnbc_ialltoallv(const void* sendbuf, const int *sendcounts, cons recvbuf, recvcounts, rdispls, rcvext, recvtype); } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit (schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start(handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -156,7 +146,6 @@ int ompi_coll_libnbc_ialltoallv_inter (const void* sendbuf, const int *sendcount int res, rsize; MPI_Aint sndext, rcvext; NBC_Schedule *schedule; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; @@ -206,21 +195,12 @@ int ompi_coll_libnbc_ialltoallv_inter (const void* sendbuf, const int *sendcount return res; } - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, 
libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - OBJ_RELEASE(schedule); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ialltoallw.c b/ompi/mca/coll/libnbc/nbc_ialltoallw.c index e818eef54bf..164a250eafc 100644 --- a/ompi/mca/coll/libnbc/nbc_ialltoallw.c +++ b/ompi/mca/coll/libnbc/nbc_ialltoallw.c @@ -49,7 +49,7 @@ int ompi_coll_libnbc_ialltoallw(const void* sendbuf, const int *sendcounts, cons NBC_Schedule *schedule; char *rbuf, *sbuf, inplace; ptrdiff_t span=0; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -57,11 +57,6 @@ int ompi_coll_libnbc_ialltoallw(const void* sendbuf, const int *sendcounts, cons rank = ompi_comm_rank (comm); p = ompi_comm_size (comm); - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - /* copy data to receivbuffer */ if (inplace) { ptrdiff_t lgap, lspan; @@ -73,12 +68,10 @@ int ompi_coll_libnbc_ialltoallw(const void* sendbuf, const int *sendcounts, cons } if (OPAL_UNLIKELY(0 == span)) { *request = &ompi_request_empty; - NBC_Return_handle (handle); return OMPI_SUCCESS; } - handle->tmpbuf = malloc(span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc(span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } sendcounts = recvcounts; @@ -89,14 +82,13 @@ int ompi_coll_libnbc_ialltoallw(const void* sendbuf, const int *sendcounts, cons sbuf = (char *) sendbuf + sdispls[rank]; res = NBC_Copy(sbuf, sendcounts[rank], sendtypes[rank], rbuf, recvcounts[rank], recvtypes[rank], comm); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle 
(handle); return res; } } schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } @@ -109,26 +101,25 @@ int ompi_coll_libnbc_ialltoallw(const void* sendbuf, const int *sendcounts, cons recvbuf, recvcounts, rdispls, recvtypes); } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit (schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -141,7 +132,6 @@ int ompi_coll_libnbc_ialltoallw_inter (const void* sendbuf, const int *sendcount int res, rsize; NBC_Schedule *schedule; char *rbuf, *sbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -178,20 +168,12 @@ int ompi_coll_libnbc_ialltoallw_inter (const void* sendbuf, const int *sendcount return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ibarrier.c b/ompi/mca/coll/libnbc/nbc_ibarrier.c index 2a0a14072f0..8e0b0a6bd6b 100644 --- a/ompi/mca/coll/libnbc/nbc_ibarrier.c +++ b/ompi/mca/coll/libnbc/nbc_ibarrier.c @@ -7,7 +7,7 @@ * rights reserved. 
* Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2014 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Mellanox Technologies. All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. diff --git a/ompi/mca/coll/libnbc/nbc_ibcast.c b/ompi/mca/coll/libnbc/nbc_ibcast.c index ec28465a70c..932341847d8 100644 --- a/ompi/mca/coll/libnbc/nbc_ibcast.c +++ b/ompi/mca/coll/libnbc/nbc_ibcast.c @@ -55,7 +55,6 @@ int ompi_coll_libnbc_ibcast(void *buffer, int count, MPI_Datatype datatype, int NBC_Bcast_args *args, *found, search; #endif enum { NBC_BCAST_LINEAR, NBC_BCAST_BINOMIAL, NBC_BCAST_CHAIN } alg; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rank = ompi_comm_rank (comm); @@ -163,20 +162,12 @@ int ompi_coll_libnbc_ibcast(void *buffer, int count, MPI_Datatype datatype, int } #endif - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -331,3 +322,55 @@ static inline int bcast_sched_chain(int rank, int p, int root, NBC_Schedule *sch return OMPI_SUCCESS; } + +int ompi_coll_libnbc_ibcast_inter(void *buffer, int count, MPI_Datatype datatype, int root, + struct ompi_communicator_t *comm, ompi_request_t ** request, + struct mca_coll_base_module_2_2_0_t *module) { + int res; + NBC_Schedule *schedule; + ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; + + schedule = OBJ_NEW(NBC_Schedule); + if (OPAL_UNLIKELY(NULL == schedule)) { + return 
OMPI_ERR_OUT_OF_RESOURCE; + } + + if (root != MPI_PROC_NULL) { + /* send to all others */ + if (root == MPI_ROOT) { + int remsize; + + remsize = ompi_comm_remote_size (comm); + + for (int peer = 0 ; peer < remsize ; ++peer) { + /* send msg to peer */ + res = NBC_Sched_send (buffer, false, count, datatype, peer, schedule, false); + if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { + OBJ_RELEASE(schedule); + return res; + } + } + } else { + /* recv msg from root */ + res = NBC_Sched_recv (buffer, false, count, datatype, root, schedule, false); + if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { + OBJ_RELEASE(schedule); + return res; + } + } + } + + res = NBC_Sched_commit (schedule); + if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { + OBJ_RELEASE(schedule); + return res; + } + + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); + if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { + OBJ_RELEASE(schedule); + return res; + } + + return OMPI_SUCCESS; +} diff --git a/ompi/mca/coll/libnbc/nbc_ibcast_inter.c b/ompi/mca/coll/libnbc/nbc_ibcast_inter.c deleted file mode 100644 index 9b591356146..00000000000 --- a/ompi/mca/coll/libnbc/nbc_ibcast_inter.c +++ /dev/null @@ -1,81 +0,0 @@ -/* -*- Mode: C; c-basic-offset:2 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2006 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2006 The Technical University of Chemnitz. All - * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * Copyright (c) 2017 IBM Corporation. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * Author(s): Torsten Hoefler - * - */ -#include "nbc_internal.h" - -int ompi_coll_libnbc_ibcast_inter(void *buffer, int count, MPI_Datatype datatype, int root, - struct ompi_communicator_t *comm, ompi_request_t ** request, - struct mca_coll_base_module_2_2_0_t *module) { - int res; - NBC_Schedule *schedule; - NBC_Handle *handle; - ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; - - schedule = OBJ_NEW(NBC_Schedule); - if (OPAL_UNLIKELY(NULL == schedule)) { - return OMPI_ERR_OUT_OF_RESOURCE; - } - - if (root != MPI_PROC_NULL) { - /* send to all others */ - if (root == MPI_ROOT) { - int remsize; - - remsize = ompi_comm_remote_size (comm); - - for (int peer = 0 ; peer < remsize ; ++peer) { - /* send msg to peer */ - res = NBC_Sched_send (buffer, false, count, datatype, peer, schedule, false); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - OBJ_RELEASE(schedule); - return res; - } - } - } else { - /* recv msg from root */ - res = NBC_Sched_recv (buffer, false, count, datatype, root, schedule, false); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - OBJ_RELEASE(schedule); - return res; - } - } - } - - res = NBC_Sched_commit (schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - OBJ_RELEASE(schedule); - return res; - } - - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - OBJ_RELEASE(schedule); - return res; - } - - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/coll/libnbc/nbc_iexscan.c b/ompi/mca/coll/libnbc/nbc_iexscan.c index 3ae838a29fb..a9fb0fba14d 100644 --- a/ompi/mca/coll/libnbc/nbc_iexscan.c +++ b/ompi/mca/coll/libnbc/nbc_iexscan.c @@ -7,7 +7,7 @@ * rights reserved. * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. 
- * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -55,7 +55,7 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ NBC_Scan_args *args, *found, search; #endif char inplace; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -63,25 +63,19 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ rank = ompi_comm_rank (comm); p = ompi_comm_size (comm); - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - span = opal_datatype_span(&datatype->super, count, &gap); if (0 < rank) { - handle->tmpbuf = malloc(span); - if (handle->tmpbuf == NULL) { - NBC_Return_handle (handle); + tmpbuf = malloc(span); + if (NULL == tmpbuf) { return OMPI_ERR_OUT_OF_RESOURCE; } if (inplace) { - res = NBC_Copy(recvbuf, count, datatype, (char *)handle->tmpbuf-gap, count, datatype, comm); + res = NBC_Copy(recvbuf, count, datatype, (char *)tmpbuf-gap, count, datatype, comm); } else { - res = NBC_Copy(sendbuf, count, datatype, (char *)handle->tmpbuf-gap, count, datatype, comm); + res = NBC_Copy(sendbuf, count, datatype, (char *)tmpbuf-gap, count, datatype, comm); } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + free(tmpbuf); return res; } } @@ -98,18 +92,16 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ #endif schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* make sure the schedule is released with the handle on error */ - handle->schedule = schedule; - if (rank != 0) { res = 
NBC_Sched_recv (recvbuf, false, count, datatype, rank-1, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -117,7 +109,8 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ /* we have to wait until we have the data */ res = NBC_Sched_barrier(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -125,14 +118,16 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ datatype, op, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } /* send reduced data onward */ res = NBC_Sched_send ((void *)(-gap), true, count, datatype, rank + 1, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -143,14 +138,16 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ res = NBC_Sched_send (sendbuf, false, count, datatype, 1, schedule, false); } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -185,14 +182,12 @@ int ompi_coll_libnbc_iexscan(const void* sendbuf, void* recvbuf, int count, MPI_ } #endif - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_igather.c 
b/ompi/mca/coll/libnbc/nbc_igather.c index b1971dda96c..bafb58517ce 100644 --- a/ompi/mca/coll/libnbc/nbc_igather.c +++ b/ompi/mca/coll/libnbc/nbc_igather.c @@ -8,7 +8,7 @@ * Copyright (c) 2013 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. @@ -51,7 +51,6 @@ int ompi_coll_libnbc_igather(const void* sendbuf, int sendcount, MPI_Datatype se MPI_Aint rcvext = 0; NBC_Schedule *schedule; char *rbuf, inplace = 0; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rank = ompi_comm_rank (comm); @@ -161,20 +160,12 @@ int ompi_coll_libnbc_igather(const void* sendbuf, int sendcount, MPI_Datatype se } #endif - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -186,7 +177,6 @@ int ompi_coll_libnbc_igather_inter (const void* sendbuf, int sendcount, MPI_Data MPI_Aint rcvext = 0; NBC_Schedule *schedule; char *rbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -230,19 +220,11 @@ int ompi_coll_libnbc_igather_inter (const void* sendbuf, int sendcount, MPI_Data return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { 
OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_igatherv.c b/ompi/mca/coll/libnbc/nbc_igatherv.c index 57d2ddbbefe..a15f800482b 100644 --- a/ompi/mca/coll/libnbc/nbc_igatherv.c +++ b/ompi/mca/coll/libnbc/nbc_igatherv.c @@ -8,7 +8,7 @@ * Copyright (c) 2013 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. @@ -36,7 +36,6 @@ int ompi_coll_libnbc_igatherv(const void* sendbuf, int sendcount, MPI_Datatype s MPI_Aint rcvext = 0; NBC_Schedule *schedule; char *rbuf, inplace = 0; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rank = ompi_comm_rank (comm); @@ -96,20 +95,12 @@ int ompi_coll_libnbc_igatherv(const void* sendbuf, int sendcount, MPI_Datatype s return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -121,7 +112,6 @@ int ompi_coll_libnbc_igatherv_inter (const void* sendbuf, int sendcount, MPI_Dat MPI_Aint rcvext; NBC_Schedule *schedule; char *rbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -165,19 +155,11 @@ 
int ompi_coll_libnbc_igatherv_inter (const void* sendbuf, int sendcount, MPI_Dat return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ineighbor_allgather.c b/ompi/mca/coll/libnbc/nbc_ineighbor_allgather.c index eeb63717302..77fbf3978f0 100644 --- a/ompi/mca/coll/libnbc/nbc_ineighbor_allgather.c +++ b/ompi/mca/coll/libnbc/nbc_ineighbor_allgather.c @@ -5,7 +5,7 @@ * Corporation. All rights reserved. * Copyright (c) 2006 The Technical University of Chemnitz. All * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
@@ -48,7 +48,6 @@ int ompi_coll_libnbc_ineighbor_allgather(const void *sbuf, int scount, MPI_Datat ompi_request_t ** request, struct mca_coll_base_module_2_2_0_t *module) { int res, indegree, outdegree, *srcs, *dsts; MPI_Aint rcvext; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_Schedule *schedule; @@ -153,20 +152,11 @@ int ompi_coll_libnbc_ineighbor_allgather(const void *sbuf, int scount, MPI_Datat } #endif - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - OBJ_RELEASE(schedule); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ineighbor_allgatherv.c b/ompi/mca/coll/libnbc/nbc_ineighbor_allgatherv.c index e89d1972725..d963fcc4235 100644 --- a/ompi/mca/coll/libnbc/nbc_ineighbor_allgatherv.c +++ b/ompi/mca/coll/libnbc/nbc_ineighbor_allgatherv.c @@ -5,7 +5,7 @@ * Corporation. All rights reserved. * Copyright (c) 2006 The Technical University of Chemnitz. All * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
@@ -49,7 +49,6 @@ int ompi_coll_libnbc_ineighbor_allgatherv(const void *sbuf, int scount, MPI_Data struct mca_coll_base_module_2_2_0_t *module) { int res, indegree, outdegree, *srcs, *dsts; MPI_Aint rcvext; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_Schedule *schedule; @@ -155,20 +154,11 @@ int ompi_coll_libnbc_ineighbor_allgatherv(const void *sbuf, int scount, MPI_Data } #endif - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - OBJ_RELEASE(schedule); - return OMPI_ERR_OUT_OF_RESOURCE; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ineighbor_alltoall.c b/ompi/mca/coll/libnbc/nbc_ineighbor_alltoall.c index f4bdc7259fc..d9ae492ee21 100644 --- a/ompi/mca/coll/libnbc/nbc_ineighbor_alltoall.c +++ b/ompi/mca/coll/libnbc/nbc_ineighbor_alltoall.c @@ -5,7 +5,7 @@ * Corporation. All rights reserved. * Copyright (c) 2006 The Technical University of Chemnitz. All * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
@@ -45,7 +45,6 @@ int ompi_coll_libnbc_ineighbor_alltoall(const void *sbuf, int scount, MPI_Dataty ompi_request_t ** request, struct mca_coll_base_module_2_2_0_t *module) { int res, indegree, outdegree, *srcs, *dsts; MPI_Aint sndext, rcvext; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_Schedule *schedule; @@ -157,19 +156,11 @@ int ompi_coll_libnbc_ineighbor_alltoall(const void *sbuf, int scount, MPI_Dataty } #endif - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return OMPI_ERR_OUT_OF_RESOURCE; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallv.c b/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallv.c index 8f2e99522dd..4caf50e010b 100644 --- a/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallv.c +++ b/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallv.c @@ -5,7 +5,7 @@ * Corporation. All rights reserved. * Copyright (c) 2006 The Technical University of Chemnitz. All * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
@@ -49,7 +49,6 @@ int ompi_coll_libnbc_ineighbor_alltoallv(const void *sbuf, const int *scounts, c struct mca_coll_base_module_2_2_0_t *module) { int res, indegree, outdegree, *srcs, *dsts; MPI_Aint sndext, rcvext; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_Schedule *schedule; @@ -162,19 +161,11 @@ int ompi_coll_libnbc_ineighbor_alltoallv(const void *sbuf, const int *scounts, c } #endif - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallw.c b/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallw.c index c434815c382..10033010c62 100644 --- a/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallw.c +++ b/ompi/mca/coll/libnbc/nbc_ineighbor_alltoallw.c @@ -5,7 +5,7 @@ * Corporation. All rights reserved. * Copyright (c) 2006 The Technical University of Chemnitz. All * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
@@ -47,7 +47,6 @@ int ompi_coll_libnbc_ineighbor_alltoallw(const void *sbuf, const int *scounts, c struct ompi_communicator_t *comm, ompi_request_t ** request, struct mca_coll_base_module_2_2_0_t *module) { int res, indegree, outdegree, *srcs, *dsts; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_Schedule *schedule; @@ -147,19 +146,11 @@ int ompi_coll_libnbc_ineighbor_alltoallw(const void *sbuf, const int *scounts, c } #endif - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_internal.h b/ompi/mca/coll/libnbc/nbc_internal.h index b463a20afd1..f43c5e905bb 100644 --- a/ompi/mca/coll/libnbc/nbc_internal.h +++ b/ompi/mca/coll/libnbc/nbc_internal.h @@ -10,7 +10,7 @@ * * Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2014 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
@@ -261,6 +261,7 @@ void NBC_SchedCache_args_delete_key_dummy(void *k); int NBC_Start(NBC_Handle *handle, NBC_Schedule *schedule); int NBC_Init_handle(struct ompi_communicator_t *comm, ompi_coll_libnbc_request_t **request, ompi_coll_libnbc_module_t *module); +int NBC_Schedule_request(NBC_Schedule *schedule, ompi_communicator_t *comm, ompi_coll_libnbc_module_t *module, ompi_request_t **request, void *tmpbuf); void NBC_Return_handle(ompi_coll_libnbc_request_t *request); static inline int NBC_Type_intrinsic(MPI_Datatype type); int NBC_Create_fortran_handle(int *fhandle, NBC_Handle **handle); @@ -539,7 +540,7 @@ static inline int NBC_Copy(const void *src, int srccount, MPI_Datatype srctype, static inline int NBC_Unpack(void *src, int srccount, MPI_Datatype srctype, void *tgt, MPI_Comm comm) { int size, pos, res; - OPAL_PTRDIFF_TYPE ext, lb; + ptrdiff_t ext, lb; #if OPAL_CUDA_SUPPORT if(NBC_Type_intrinsic(srctype) && !(opal_cuda_check_bufs((char *)tgt, (char *)src))) { diff --git a/ompi/mca/coll/libnbc/nbc_ireduce.c b/ompi/mca/coll/libnbc/nbc_ireduce.c index 377ebe02669..b35801aeb2d 100644 --- a/ompi/mca/coll/libnbc/nbc_ireduce.c +++ b/ompi/mca/coll/libnbc/nbc_ireduce.c @@ -24,12 +24,12 @@ #include "nbc_internal.h" static inline int red_sched_binomial (int rank, int p, int root, const void *sendbuf, void *redbuf, char tmpredbuf, int count, MPI_Datatype datatype, - MPI_Op op, char inplace, NBC_Schedule *schedule, NBC_Handle *handle); + MPI_Op op, char inplace, NBC_Schedule *schedule, void *tmpbuf); static inline int red_sched_chain (int rank, int p, int root, const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, - MPI_Op op, int ext, size_t size, NBC_Schedule *schedule, NBC_Handle *handle, int fragsize); + MPI_Op op, int ext, size_t size, NBC_Schedule *schedule, void *tmpbuf, int fragsize); static inline int red_sched_linear (int rank, int rsize, int root, const void *sendbuf, void *recvbuf, void *tmpbuf, int count, MPI_Datatype datatype, - MPI_Op op, 
NBC_Schedule *schedule, NBC_Handle *handle); + MPI_Op op, NBC_Schedule *schedule); #ifdef NBC_CACHE_SCHEDULE /* tree comparison function for schedule cache */ @@ -60,9 +60,9 @@ int ompi_coll_libnbc_ireduce(const void* sendbuf, void* recvbuf, int count, MPI_ MPI_Aint ext; NBC_Schedule *schedule; char *redbuf=NULL, inplace; + void *tmpbuf; char tmpredbuf = 0; enum { NBC_RED_BINOMIAL, NBC_RED_CHAIN } alg; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; ptrdiff_t span, gap; @@ -95,11 +95,6 @@ int ompi_coll_libnbc_ireduce(const void* sendbuf, void* recvbuf, int count, MPI_ return OMPI_SUCCESS; } - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - span = opal_datatype_span(&datatype->super, count, &gap); /* algorithm selection */ @@ -107,23 +102,22 @@ int ompi_coll_libnbc_ireduce(const void* sendbuf, void* recvbuf, int count, MPI_ alg = NBC_RED_BINOMIAL; if(rank == root) { /* root reduces in receivebuffer */ - handle->tmpbuf = malloc (span); + tmpbuf = malloc (span); redbuf = recvbuf; } else { /* recvbuf may not be valid on non-root nodes */ ptrdiff_t span_align = OPAL_ALIGN(span, datatype->super.align, ptrdiff_t); - handle->tmpbuf = malloc (span_align + span); + tmpbuf = malloc (span_align + span); redbuf = (char*)span_align - gap; tmpredbuf = 1; } } else { - handle->tmpbuf = malloc (span); + tmpbuf = malloc (span); alg = NBC_RED_CHAIN; segsize = 16384/2; } - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } @@ -142,30 +136,29 @@ int ompi_coll_libnbc_ireduce(const void* sendbuf, void* recvbuf, int count, MPI_ #endif schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* make sure the schedule is released with the handle on error */ - handle->schedule 
= schedule; - switch(alg) { case NBC_RED_BINOMIAL: - res = red_sched_binomial(rank, p, root, sendbuf, redbuf, tmpredbuf, count, datatype, op, inplace, schedule, handle); + res = red_sched_binomial(rank, p, root, sendbuf, redbuf, tmpredbuf, count, datatype, op, inplace, schedule, tmpbuf); break; case NBC_RED_CHAIN: - res = red_sched_chain(rank, p, root, sendbuf, recvbuf, count, datatype, op, ext, size, schedule, handle, segsize); + res = red_sched_chain(rank, p, root, sendbuf, recvbuf, count, datatype, op, ext, size, schedule, tmpbuf, segsize); break; } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } #ifdef NBC_CACHE_SCHEDULE @@ -200,15 +193,13 @@ int ompi_coll_libnbc_ireduce(const void* sendbuf, void* recvbuf, int count, MPI_ } #endif - res = NBC_Start(handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } @@ -217,52 +208,46 @@ int ompi_coll_libnbc_ireduce_inter(const void* sendbuf, void* recvbuf, int count struct mca_coll_base_module_2_2_0_t *module) { int rank, res, rsize; NBC_Schedule *schedule; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; ptrdiff_t span, gap; + void *tmpbuf; rank = ompi_comm_rank (comm); rsize = ompi_comm_remote_size (comm); - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - span = opal_datatype_span(&datatype->super, count, &gap); - handle->tmpbuf = malloc (span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - 
NBC_Return_handle (handle); + tmpbuf = malloc (span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - res = red_sched_linear (rank, rsize, root, sendbuf, recvbuf, (void *)(-gap), count, datatype, op, schedule, handle); + res = red_sched_linear (rank, rsize, root, sendbuf, recvbuf, (void *)(-gap), count, datatype, op, schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return OMPI_ERR_OUT_OF_RESOURCE; + OBJ_RELEASE(schedule); + free(tmpbuf); + return res; } - res = NBC_Start(handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } @@ -299,9 +284,9 @@ int ompi_coll_libnbc_ireduce_inter(const void* sendbuf, void* recvbuf, int count if (vrank == root) rank = 0; \ } static inline int red_sched_binomial (int rank, int p, int root, const void *sendbuf, void *redbuf, char tmpredbuf, int count, MPI_Datatype datatype, - MPI_Op op, char inplace, NBC_Schedule *schedule, NBC_Handle *handle) { + MPI_Op op, char inplace, NBC_Schedule *schedule, void *tmpbuf) { int vroot, vrank, vpeer, peer, res, maxr; - char *rbuf, *lbuf, *buf, tmpbuf; + char *rbuf, *lbuf, *buf; int tmprbuf, tmplbuf; ptrdiff_t gap; (void)opal_datatype_span(&datatype->super, count, &gap); @@ -330,7 +315,7 @@ static inline int red_sched_binomial (int rank, int p, int root, const void *sen rbuf = redbuf; tmprbuf = tmpredbuf; if (inplace) { 
- res = NBC_Copy(rbuf, count, datatype, ((char *)handle->tmpbuf)-gap, count, datatype, MPI_COMM_SELF); + res = NBC_Copy(rbuf, count, datatype, ((char *)tmpbuf)-gap, count, datatype, MPI_COMM_SELF); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { return res; } @@ -343,6 +328,7 @@ static inline int red_sched_binomial (int rank, int p, int root, const void *sen vpeer = vrank + (1 << (r - 1)); VRANK2RANK(peer, vpeer, vroot) if (peer < p) { + int tbuf; /* we have to wait until we have the data */ res = NBC_Sched_recv (rbuf, tmprbuf, count, datatype, peer, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { @@ -350,7 +336,7 @@ static inline int red_sched_binomial (int rank, int p, int root, const void *sen } /* perform the reduce in my local buffer */ - /* this cannot be done until handle->tmpbuf is unused :-( so barrier after the op */ + /* this cannot be done until tmpbuf is unused :-( so barrier after the op */ if (firstred && !inplace) { /* perform the reduce with the senbuf */ res = NBC_Sched_op (sendbuf, false, rbuf, tmprbuf, count, datatype, op, schedule, true); @@ -365,7 +351,7 @@ static inline int red_sched_binomial (int rank, int p, int root, const void *sen } /* swap left and right buffers */ buf = rbuf; rbuf = lbuf ; lbuf = buf; - tmpbuf = tmprbuf; tmprbuf = tmplbuf; tmplbuf = tmpbuf; + tbuf = tmprbuf; tmprbuf = tmplbuf; tmplbuf = tbuf; } } else { /* we have to send this round */ @@ -401,7 +387,7 @@ static inline int red_sched_binomial (int rank, int p, int root, const void *sen /* chain send ... 
*/ static inline int red_sched_chain (int rank, int p, int root, const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, - MPI_Op op, int ext, size_t size, NBC_Schedule *schedule, NBC_Handle *handle, int fragsize) { + MPI_Op op, int ext, size_t size, NBC_Schedule *schedule, void *tmpbuf, int fragsize) { int res, vrank, rpeer, speer, numfrag, fragcount, thiscount; long offset; @@ -479,7 +465,7 @@ static inline int red_sched_chain (int rank, int p, int root, const void *sendbu /* simple linear algorithm for intercommunicators */ static inline int red_sched_linear (int rank, int rsize, int root, const void *sendbuf, void *recvbuf, void *tmpbuf, int count, MPI_Datatype datatype, - MPI_Op op, NBC_Schedule *schedule, NBC_Handle *handle) { + MPI_Op op, NBC_Schedule *schedule) { int res; char *rbuf, *lbuf, *buf; int tmprbuf, tmplbuf; diff --git a/ompi/mca/coll/libnbc/nbc_ireduce_scatter.c b/ompi/mca/coll/libnbc/nbc_ireduce_scatter.c index ffc9506ec28..49edfeb7d30 100644 --- a/ompi/mca/coll/libnbc/nbc_ireduce_scatter.c +++ b/ompi/mca/coll/libnbc/nbc_ireduce_scatter.c @@ -7,7 +7,7 @@ * rights reserved. * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 The University of Tennessee and The University * of Tennessee Research Foundation. 
All rights @@ -49,7 +49,7 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i ptrdiff_t gap, span, span_align; char *sbuf, inplace; NBC_Schedule *schedule; - NBC_Handle *handle; + void *tmpbuf; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; char *rbuf, *lbuf, *buf; @@ -82,18 +82,12 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i return OMPI_SUCCESS; } - res = NBC_Init_handle (comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - maxr = (int) ceil ((log((double) p) / LOG2)); span = opal_datatype_span(&datatype->super, count, &gap); span_align = OPAL_ALIGN(span, datatype->super.align, ptrdiff_t); - handle->tmpbuf = malloc (span_align + span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc (span_align + span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } @@ -102,13 +96,10 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* make sure the schedule is released with the handle on error */ - handle->schedule = schedule; - for (int r = 1, firstred = 1 ; r <= maxr ; ++r) { if ((rank % (1 << r)) == 0) { /* we have to receive this round */ @@ -117,11 +108,12 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i /* we have to wait until we have the data */ res = NBC_Sched_recv(rbuf, true, count, datatype, peer, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - /* this cannot be done until handle->tmpbuf is unused :-( so barrier after the op */ + /* this cannot be done until tmpbuf is unused :-( so barrier after the op */ if (firstred) { /* take reduce 
data from the sendbuf in the first round -> save copy */ res = NBC_Sched_op (sendbuf, false, rbuf, true, count, datatype, op, schedule, true); @@ -132,7 +124,8 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } /* swap left and right buffers */ @@ -149,7 +142,8 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i res = NBC_Sched_send (lbuf, true, count, datatype, peer, schedule, false); } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -160,7 +154,8 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i res = NBC_Sched_barrier(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -173,7 +168,8 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i res = NBC_Sched_send (sbuf, true, recvcounts[r], datatype, r, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -185,25 +181,25 @@ int ompi_coll_libnbc_ireduce_scatter(const void* sendbuf, void* recvbuf, const i } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_commit (schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the 
handle */ return OMPI_SUCCESS; } @@ -214,7 +210,7 @@ int ompi_coll_libnbc_ireduce_scatter_inter (const void* sendbuf, void* recvbuf, MPI_Aint ext; ptrdiff_t gap, span, span_align; NBC_Schedule *schedule; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rank = ompi_comm_rank (comm); @@ -235,32 +231,24 @@ int ompi_coll_libnbc_ireduce_scatter_inter (const void* sendbuf, void* recvbuf, span = opal_datatype_span(&datatype->super, count, &gap); span_align = OPAL_ALIGN(span, datatype->super.align, ptrdiff_t); - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - if (count > 0) { - handle->tmpbuf = malloc (span_align + span); - if (OPAL_UNLIKELY(NULL == handle->tmpbuf)) { - NBC_Return_handle (handle); + tmpbuf = malloc (span_align + span); + if (OPAL_UNLIKELY(NULL == tmpbuf)) { return OMPI_ERR_OUT_OF_RESOURCE; } } schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* make sure the schedule is released with the handle on error */ - handle->schedule = schedule; - /* send my data to the remote root */ res = NBC_Sched_send(sendbuf, false, count, datatype, 0, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -270,7 +258,8 @@ int ompi_coll_libnbc_ireduce_scatter_inter (const void* sendbuf, void* recvbuf, rbuf = (char *)(span_align-gap); res = NBC_Sched_recv (lbuf, true, count, datatype, 0, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -278,14 +267,16 @@ int ompi_coll_libnbc_ireduce_scatter_inter (const void* sendbuf, void* recvbuf, char *tbuf; res = NBC_Sched_recv (rbuf, true, count, datatype, peer, schedule, true); if 
(OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_op (lbuf, true, rbuf, true, count, datatype, op, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } tbuf = lbuf; lbuf = rbuf; rbuf = tbuf; @@ -295,14 +286,16 @@ int ompi_coll_libnbc_ireduce_scatter_inter (const void* sendbuf, void* recvbuf, res = NBC_Sched_copy (lbuf, true, recvcounts[0], datatype, recvbuf, false, recvcounts[0], datatype, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } for (int peer = 1, offset = recvcounts[0] * ext; peer < lsize ; ++peer) { res = NBC_Sched_local_send (lbuf + offset, true, recvcounts[peer], datatype, peer, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -312,25 +305,25 @@ int ompi_coll_libnbc_ireduce_scatter_inter (const void* sendbuf, void* recvbuf, /* receive my block */ res = NBC_Sched_local_recv (recvbuf, false, recvcounts[rank], datatype, 0, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } res = NBC_Sched_commit (schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_ireduce_scatter_block.c b/ompi/mca/coll/libnbc/nbc_ireduce_scatter_block.c index 
f3fb6213f45..5c1cedf7c2d 100644 --- a/ompi/mca/coll/libnbc/nbc_ireduce_scatter_block.c +++ b/ompi/mca/coll/libnbc/nbc_ireduce_scatter_block.c @@ -8,7 +8,7 @@ * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -47,7 +47,7 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i ptrdiff_t gap, span; char *redbuf, *sbuf, inplace; NBC_Schedule *schedule; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -61,20 +61,11 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i return (MPI_SUCCESS == res) ? 
MPI_ERR_SIZE : res; } - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OMPI_SUCCESS != res) { - return res; - } - schedule = OBJ_NEW(NBC_Schedule); if (NULL == schedule) { - OMPI_COLL_LIBNBC_REQUEST_RETURN(handle); return OMPI_ERR_OUT_OF_RESOURCE; } - /* make sure the schedule is released with the handle on error */ - handle->schedule = schedule; - maxr = (int)ceil((log((double)p)/LOG2)); count = p * recvcount; @@ -85,23 +76,22 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i span = opal_datatype_span(&datatype->super, count, &gap); span_align = OPAL_ALIGN(span, datatype->super.align, ptrdiff_t); - handle->tmpbuf = malloc (span_align + span); - if (NULL == handle->tmpbuf) { - OMPI_COLL_LIBNBC_REQUEST_RETURN(handle); + tmpbuf = malloc (span_align + span); + if (NULL == tmpbuf) { OBJ_RELEASE(schedule); return OMPI_ERR_OUT_OF_RESOURCE; } rbuf = (void *)(-gap); lbuf = (char *)(span_align - gap); - redbuf = (char *) handle->tmpbuf + span_align - gap; + redbuf = (char *) tmpbuf + span_align - gap; /* copy data to redbuf if we only have a single node */ if ((p == 1) && !inplace) { res = NBC_Copy (sendbuf, count, datatype, redbuf, count, datatype, comm); if (OMPI_SUCCESS != res) { - NBC_Return_handle (handle); OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -114,7 +104,8 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i /* we have to wait until we have the data */ res = NBC_Sched_recv (rbuf, true, count, datatype, peer, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -128,7 +119,8 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } /* swap left and right buffers */ @@ -146,7 +138,8 @@ int 
ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -157,7 +150,8 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i res = NBC_Sched_barrier(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -165,7 +159,8 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i if (rank != 0) { res = NBC_Sched_recv (recvbuf, false, recvcount, datatype, 0, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } else { @@ -175,7 +170,8 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i /* root sends the right buffer to the right receiver */ res = NBC_Sched_send (sbuf, true, recvcount, datatype, r, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -185,7 +181,8 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i datatype, schedule, false); } if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -193,19 +190,18 @@ int ompi_coll_libnbc_ireduce_scatter_block(const void* sendbuf, void* recvbuf, i res = NBC_Sched_commit (schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start (handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed 
with the handle */ return OMPI_SUCCESS; } @@ -216,7 +212,7 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv MPI_Aint ext; ptrdiff_t gap, span, span_align; NBC_Schedule *schedule; - NBC_Handle *handle; + void *tmpbuf = NULL; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rank = ompi_comm_rank (comm); @@ -229,37 +225,29 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv return res; } - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - count = rcount * lsize; span = opal_datatype_span(&dtype->super, count, &gap); span_align = OPAL_ALIGN(span, dtype->super.align, ptrdiff_t); if (count > 0) { - handle->tmpbuf = malloc (span_align + span); - if (NULL == handle->tmpbuf) { - NBC_Return_handle (handle); + tmpbuf = malloc (span_align + span); + if (NULL == tmpbuf) { return OMPI_ERR_OUT_OF_RESOURCE; } } schedule = OBJ_NEW(NBC_Schedule); if (NULL == schedule) { - NBC_Return_handle (handle); + free(tmpbuf); return OMPI_ERR_OUT_OF_RESOURCE; } - /* make sure the schedule is released with the handle on error */ - handle->schedule = schedule; - /* send my data to the remote root */ res = NBC_Sched_send (sendbuf, false, count, dtype, 0, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -269,7 +257,8 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv rbuf = (char *)(span_align-gap); res = NBC_Sched_recv (lbuf, true, count, dtype, 0, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -277,14 +266,16 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv char *tbuf; res = NBC_Sched_recv (rbuf, true, count, dtype, peer, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != 
res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } res = NBC_Sched_op (lbuf, true, rbuf, true, count, dtype, op, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } tbuf = lbuf; lbuf = rbuf; rbuf = tbuf; @@ -294,13 +285,15 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv res = NBC_Sched_copy (lbuf, true, rcount, dtype, recvbuf, false, rcount, dtype, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } for (int peer = 1 ; peer < lsize ; ++peer) { res = NBC_Sched_local_send (lbuf + ext * rcount * peer, true, rcount, dtype, peer, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -308,7 +301,8 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv /* receive my block */ res = NBC_Sched_local_recv(recvbuf, false, rcount, dtype, 0, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -317,18 +311,17 @@ int ompi_coll_libnbc_ireduce_scatter_block_inter(const void *sendbuf, void *recv res = NBC_Sched_commit(schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - res = NBC_Start(handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_iscan.c b/ompi/mca/coll/libnbc/nbc_iscan.c index f99404d2cc7..87333251a04 
100644 --- a/ompi/mca/coll/libnbc/nbc_iscan.c +++ b/ompi/mca/coll/libnbc/nbc_iscan.c @@ -5,7 +5,7 @@ * Corporation. All rights reserved. * Copyright (c) 2006 The Technical University of Chemnitz. All * rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. @@ -51,8 +51,8 @@ int ompi_coll_libnbc_iscan(const void* sendbuf, void* recvbuf, int count, MPI_Da int rank, p, res; ptrdiff_t gap, span; NBC_Schedule *schedule; + void *tmpbuf = NULL; char inplace; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; NBC_IN_PLACE(sendbuf, recvbuf, inplace); @@ -68,11 +68,6 @@ int ompi_coll_libnbc_iscan(const void* sendbuf, void* recvbuf, int count, MPI_Da } } - res = NBC_Init_handle(comm, &handle, libnbc_module); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - return res; - } - #ifdef NBC_CACHE_SCHEDULE NBC_Scan_args *args, *found, search; @@ -87,34 +82,32 @@ int ompi_coll_libnbc_iscan(const void* sendbuf, void* recvbuf, int count, MPI_Da #endif schedule = OBJ_NEW(NBC_Schedule); if (OPAL_UNLIKELY(NULL == schedule)) { - NBC_Return_handle (handle); return OMPI_ERR_OUT_OF_RESOURCE; } - /* ensure the schedule is released with the handle */ - handle->schedule = schedule; - if(rank != 0) { span = opal_datatype_span(&datatype->super, count, &gap); - handle->tmpbuf = malloc (span); - if (NULL == handle->tmpbuf) { - NBC_Return_handle (handle); + tmpbuf = malloc (span); + if (NULL == tmpbuf) { + OBJ_RELEASE(schedule); return OMPI_ERR_OUT_OF_RESOURCE; } /* we have to wait until we have the data */ res = NBC_Sched_recv ((void *)(-gap), true, count, datatype, rank-1, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } /* 
perform the reduce in my local buffer */ - /* this cannot be done until handle->tmpbuf is unused :-( so barrier after the op */ + /* this cannot be done until tmpbuf is unused :-( so barrier after the op */ res = NBC_Sched_op ((void *)(-gap), true, recvbuf, false, count, datatype, op, schedule, true); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } @@ -122,14 +115,16 @@ int ompi_coll_libnbc_iscan(const void* sendbuf, void* recvbuf, int count, MPI_Da if (rank != p-1) { res = NBC_Sched_send (recvbuf, false, count, datatype, rank+1, schedule, false); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } } res = NBC_Sched_commit (schedule); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } @@ -164,14 +159,12 @@ int ompi_coll_libnbc_iscan(const void* sendbuf, void* recvbuf, int count, MPI_Da } #endif - res = NBC_Start(handle, schedule); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, tmpbuf); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); + OBJ_RELEASE(schedule); + free(tmpbuf); return res; } - *request = (ompi_request_t *) handle; - - /* tmpbuf is freed with the handle */ return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_iscatter.c b/ompi/mca/coll/libnbc/nbc_iscatter.c index ecd887c090c..48b0917cdc4 100644 --- a/ompi/mca/coll/libnbc/nbc_iscatter.c +++ b/ompi/mca/coll/libnbc/nbc_iscatter.c @@ -10,7 +10,7 @@ * Copyright (c) 2013 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ @@ -52,7 +52,6 @@ int ompi_coll_libnbc_iscatter (const void* sendbuf, int sendcount, MPI_Datatype MPI_Aint sndext = 0; NBC_Schedule *schedule; char *sbuf, inplace = 0; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; @@ -158,20 +157,12 @@ int ompi_coll_libnbc_iscatter (const void* sendbuf, int sendcount, MPI_Datatype } #endif - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -183,7 +174,6 @@ int ompi_coll_libnbc_iscatter_inter (const void* sendbuf, int sendcount, MPI_Dat MPI_Aint sndext; NBC_Schedule *schedule; char *sbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -227,19 +217,11 @@ int ompi_coll_libnbc_iscatter_inter (const void* sendbuf, int sendcount, MPI_Dat return res; } - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/libnbc/nbc_iscatterv.c b/ompi/mca/coll/libnbc/nbc_iscatterv.c index 3772fc9014f..b16ef085c13 100644 --- a/ompi/mca/coll/libnbc/nbc_iscatterv.c +++ b/ompi/mca/coll/libnbc/nbc_iscatterv.c @@ -10,7 +10,7 @@ * Copyright (c) 2013 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
- * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -35,7 +35,6 @@ int ompi_coll_libnbc_iscatterv(const void* sendbuf, const int *sendcounts, const MPI_Aint sndext; NBC_Schedule *schedule; char *sbuf, inplace = 0; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rank = ompi_comm_rank (comm); @@ -93,20 +92,12 @@ int ompi_coll_libnbc_iscatterv(const void* sendbuf, const int *sendcounts, const return res; } - res = NBC_Init_handle (comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start (handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } @@ -118,7 +109,6 @@ int ompi_coll_libnbc_iscatterv_inter (const void* sendbuf, const int *sendcounts MPI_Aint sndext; NBC_Schedule *schedule; char *sbuf; - NBC_Handle *handle; ompi_coll_libnbc_module_t *libnbc_module = (ompi_coll_libnbc_module_t*) module; rsize = ompi_comm_remote_size (comm); @@ -161,19 +151,11 @@ int ompi_coll_libnbc_iscatterv_inter (const void* sendbuf, const int *sendcounts return res; } - res = NBC_Init_handle(comm, &handle, libnbc_module); + res = NBC_Schedule_request(schedule, comm, libnbc_module, request, NULL); if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { OBJ_RELEASE(schedule); return res; } - res = NBC_Start(handle, schedule); - if (OPAL_UNLIKELY(OMPI_SUCCESS != res)) { - NBC_Return_handle (handle); - return res; - } - - *request = (ompi_request_t *) handle; - return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/monitoring/Makefile.am b/ompi/mca/coll/monitoring/Makefile.am new 
file mode 100644 index 00000000000..9c6e96b1c52 --- /dev/null +++ b/ompi/mca/coll/monitoring/Makefile.am @@ -0,0 +1,54 @@ +# +# Copyright (c) 2016 Inria. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +monitoring_sources = \ + coll_monitoring.h \ + coll_monitoring_allgather.c \ + coll_monitoring_allgatherv.c \ + coll_monitoring_allreduce.c \ + coll_monitoring_alltoall.c \ + coll_monitoring_alltoallv.c \ + coll_monitoring_alltoallw.c \ + coll_monitoring_barrier.c \ + coll_monitoring_bcast.c \ + coll_monitoring_component.c \ + coll_monitoring_exscan.c \ + coll_monitoring_gather.c \ + coll_monitoring_gatherv.c \ + coll_monitoring_neighbor_allgather.c \ + coll_monitoring_neighbor_allgatherv.c \ + coll_monitoring_neighbor_alltoall.c \ + coll_monitoring_neighbor_alltoallv.c \ + coll_monitoring_neighbor_alltoallw.c \ + coll_monitoring_reduce.c \ + coll_monitoring_reduce_scatter.c \ + coll_monitoring_reduce_scatter_block.c \ + coll_monitoring_scan.c \ + coll_monitoring_scatter.c \ + coll_monitoring_scatterv.c + +if MCA_BUILD_ompi_coll_monitoring_DSO +component_noinst = +component_install = mca_coll_monitoring.la +else +component_noinst = libmca_coll_monitoring.la +component_install = +endif + +mcacomponentdir = $(ompilibdir) +mcacomponent_LTLIBRARIES = $(component_install) +mca_coll_monitoring_la_SOURCES = $(monitoring_sources) +mca_coll_monitoring_la_LDFLAGS = -module -avoid-version +mca_coll_monitoring_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(OMPI_TOP_BUILDDIR)/ompi/mca/common/monitoring/libmca_common_monitoring.la + +noinst_LTLIBRARIES = $(component_noinst) +libmca_coll_monitoring_la_SOURCES = $(monitoring_sources) +libmca_coll_monitoring_la_LDFLAGS = -module -avoid-version diff --git a/ompi/mca/coll/monitoring/coll_monitoring.h b/ompi/mca/coll/monitoring/coll_monitoring.h new file mode 100644 index 00000000000..8046c982082 --- /dev/null +++ 
b/ompi/mca/coll/monitoring/coll_monitoring.h @@ -0,0 +1,387 @@ +/* + * Copyright (c) 2016 Inria. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_COLL_MONITORING_H +#define MCA_COLL_MONITORING_H + +BEGIN_C_DECLS + +#include +#include +#include +#include +#include +#include +#include + +struct mca_coll_monitoring_component_t { + mca_coll_base_component_t super; + int priority; +}; +typedef struct mca_coll_monitoring_component_t mca_coll_monitoring_component_t; + +OMPI_DECLSPEC extern mca_coll_monitoring_component_t mca_coll_monitoring_component; + +struct mca_coll_monitoring_module_t { + mca_coll_base_module_t super; + mca_coll_base_comm_coll_t real; + mca_monitoring_coll_data_t*data; + int32_t is_initialized; +}; +typedef struct mca_coll_monitoring_module_t mca_coll_monitoring_module_t; +OMPI_DECLSPEC OBJ_CLASS_DECLARATION(mca_coll_monitoring_module_t); + +/* + * Coll interface functions + */ + +/* Blocking */ +extern int mca_coll_monitoring_allgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_allgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_allreduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_alltoall(const void *sbuf, int scount, + struct 
ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_alltoallv(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_alltoallw(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_barrier(struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_bcast(void *buff, int count, + struct ompi_datatype_t *datatype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_exscan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_gather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, struct ompi_datatype_t *rdtype, + int root, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_gatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_reduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + int root, + struct ompi_communicator_t *comm, + 
mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_reduce_scatter(const void *sbuf, void *rbuf, + const int *rcounts, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_reduce_scatter_block(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_scan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_scatter(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_scatterv(const void *sbuf, const int *scounts, const int *disps, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +/* Nonblocking */ +extern int mca_coll_monitoring_iallgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_iallgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_iallreduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + 
ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ialltoall(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ialltoallv(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ialltoallw(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ibarrier(struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ibcast(void *buff, int count, + struct ompi_datatype_t *datatype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_iexscan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_igather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, struct ompi_datatype_t *rdtype, + int root, struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_igatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, const int 
*rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ireduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ireduce_scatter(const void *sbuf, void *rbuf, + const int *rcounts, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ireduce_scatter_block(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_iscan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_iscatter(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_iscatterv(const void *sbuf, const int *scounts, const int *disps, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +/* Neighbor */ +extern int mca_coll_monitoring_neighbor_allgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, void *rbuf, + int rcount, struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + 
mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_neighbor_allgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, void * rbuf, + const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_neighbor_alltoall(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_neighbor_alltoallv(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_neighbor_alltoallw(const void *sbuf, const int *scounts, + const MPI_Aint *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const MPI_Aint *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ineighbor_allgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, void *rbuf, + int rcount, struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ineighbor_allgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void * rbuf, const int *rcounts, + const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ineighbor_alltoall(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, void *rbuf, + int rcount, struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + 
ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ineighbor_alltoallv(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +extern int mca_coll_monitoring_ineighbor_alltoallw(const void *sbuf, const int *scounts, + const MPI_Aint *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const MPI_Aint *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module); + +END_C_DECLS + +#endif /* MCA_COLL_MONITORING_H */ diff --git a/ompi/mca/coll/monitoring/coll_monitoring_allgather.c b/ompi/mca/coll/monitoring/coll_monitoring_allgather.c new file mode 100644 index 00000000000..dc45d8f8974 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_allgather.c @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_allgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( i == my_rank ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_allgather(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, monitoring_module->real.coll_allgather_module); +} + +int mca_coll_monitoring_iallgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); +
for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_iallgather(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, request, monitoring_module->real.coll_iallgather_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_allgatherv.c b/ompi/mca/coll/monitoring/coll_monitoring_allgatherv.c new file mode 100644 index 00000000000..85510009df5 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_allgatherv.c @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_allgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void * rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS ==
mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_allgatherv(sbuf, scount, sdtype, rbuf, rcounts, disps, rdtype, comm, monitoring_module->real.coll_allgatherv_module); +} + +int mca_coll_monitoring_iallgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void * rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_iallgatherv(sbuf, scount, sdtype, rbuf, rcounts, disps, rdtype, comm, request, monitoring_module->real.coll_iallgatherv_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_allreduce.c b/ompi/mca/coll/monitoring/coll_monitoring_allreduce.c new file mode 100644 index 00000000000..c0f3a74d086 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_allreduce.c @@ -0,0 +1,70 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_allreduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_allreduce(sbuf, rbuf, count, dtype, op, comm, monitoring_module->real.coll_allreduce_module); +} + +int mca_coll_monitoring_iallreduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i )
continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_iallreduce(sbuf, rbuf, count, dtype, op, comm, request, monitoring_module->real.coll_iallreduce_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_alltoall.c b/ompi/mca/coll/monitoring/coll_monitoring_alltoall.c new file mode 100644 index 00000000000..60e8ebaeab4 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_alltoall.c @@ -0,0 +1,69 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_alltoall(const void *sbuf, int scount, struct ompi_datatype_t *sdtype, + void* rbuf, int rcount, struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } +
return monitoring_module->real.coll_alltoall(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, monitoring_module->real.coll_alltoall_module); +} + +int mca_coll_monitoring_ialltoall(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_ialltoall(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, request, monitoring_module->real.coll_ialltoall_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_alltoallv.c b/ompi/mca/coll/monitoring/coll_monitoring_alltoallv.c new file mode 100644 index 00000000000..97941e7687e --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_alltoallv.c @@ -0,0 +1,75 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_alltoallv(const void *sbuf, const int *scounts, const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + return monitoring_module->real.coll_alltoallv(sbuf, scounts, sdisps, sdtype, rbuf, rcounts, rdisps, rdtype, comm, monitoring_module->real.coll_alltoallv_module); +} + +int mca_coll_monitoring_ialltoallv(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + const int my_rank =
ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + return monitoring_module->real.coll_ialltoallv(sbuf, scounts, sdisps, sdtype, rbuf, rcounts, rdisps, rdtype, comm, request, monitoring_module->real.coll_ialltoallv_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_alltoallw.c b/ompi/mca/coll/monitoring/coll_monitoring_alltoallw.c new file mode 100644 index 00000000000..8d8b0591b2e --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_alltoallw.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_alltoallw(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + ompi_datatype_type_size(sdtypes[i], &type_size); + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + return monitoring_module->real.coll_alltoallw(sbuf, scounts, sdisps, sdtypes, rbuf, rcounts, rdisps, rdtypes, comm, monitoring_module->real.coll_alltoallw_module); +} + +int mca_coll_monitoring_ialltoallw(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size =
ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + ompi_datatype_type_size(sdtypes[i], &type_size); + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COMM_WORLD. + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + return monitoring_module->real.coll_ialltoallw(sbuf, scounts, sdisps, sdtypes, rbuf, rcounts, rdisps, rdtypes, comm, request, monitoring_module->real.coll_ialltoallw_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_barrier.c b/ompi/mca/coll/monitoring/coll_monitoring_barrier.c new file mode 100644 index 00000000000..f1e42efed39 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_barrier.c @@ -0,0 +1,56 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_barrier(struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + int i, rank; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, 0); + } + } + mca_common_monitoring_coll_a2a(0, monitoring_module->data); + return monitoring_module->real.coll_barrier(comm, monitoring_module->real.coll_barrier_module); +} + +int mca_coll_monitoring_ibarrier(struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + int i, rank; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, 0); + } + } + mca_common_monitoring_coll_a2a(0, monitoring_module->data); + return monitoring_module->real.coll_ibarrier(comm, request, monitoring_module->real.coll_ibarrier_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_bcast.c 
b/ompi/mca/coll/monitoring/coll_monitoring_bcast.c new file mode 100644 index 00000000000..bb877458abd --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_bcast.c @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_bcast(void *buff, int count, + struct ompi_datatype_t *datatype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(datatype, &type_size); + data_size = count * type_size; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + mca_common_monitoring_coll_o2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( i == root ) continue; /* No self sending */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + } + return monitoring_module->real.coll_bcast(buff, count, datatype, root, comm, monitoring_module->real.coll_bcast_module); +} + +int mca_coll_monitoring_ibcast(void *buff, int count, + struct ompi_datatype_t *datatype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + 
ompi_datatype_type_size(datatype, &type_size); + data_size = count * type_size; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + mca_common_monitoring_coll_o2a(data_size * (comm_size - 1), monitoring_module->data); + for( i = 0; i < comm_size; ++i ) { + if( i == root ) continue; /* No self sending */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + } + return monitoring_module->real.coll_ibcast(buff, count, datatype, root, comm, request, monitoring_module->real.coll_ibcast_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_component.c b/ompi/mca/coll/monitoring/coll_monitoring_component.c new file mode 100644 index 00000000000..47d14375e10 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_component.c @@ -0,0 +1,256 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights + * reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include "coll_monitoring.h" +#include +#include +#include +#include + +#define MONITORING_SAVE_PREV_COLL_API(__module, __comm, __api) \ + do { \ + if( NULL != __comm->c_coll->coll_ ## __api ## _module ) { \ + __module->real.coll_ ## __api = __comm->c_coll->coll_ ## __api; \ + __module->real.coll_ ## __api ## _module = __comm->c_coll->coll_ ## __api ## _module; \ + OBJ_RETAIN(__module->real.coll_ ## __api ## _module); \ + } else { \ + /* If no function previously provided, do not monitor */ \ + __module->super.coll_ ## __api = NULL; \ + OPAL_MONITORING_PRINT_WARN("COMM \"%s\": No monitoring available for " \ + "coll_" # __api, __comm->c_name); \ + } \ + if( NULL != __comm->c_coll->coll_i ## __api ## _module ) { \ + __module->real.coll_i ## __api = __comm->c_coll->coll_i ## __api; \ + __module->real.coll_i ## __api ## _module = __comm->c_coll->coll_i ## __api ## _module; \ + OBJ_RETAIN(__module->real.coll_i ## __api ## _module); \ + } else { \ + /* If no function previously provided, do not monitor */ \ + __module->super.coll_i ## __api = NULL; \ + OPAL_MONITORING_PRINT_WARN("COMM \"%s\": No monitoring available for " \ + "coll_i" # __api, __comm->c_name); \ + } \ + } while(0) + +#define MONITORING_RELEASE_PREV_COLL_API(__module, __comm, __api) \ + do { \ + if( NULL != __module->real.coll_ ## __api ## _module ) { \ + if( NULL != __module->real.coll_ ## __api ## _module->coll_module_disable ) { \ + __module->real.coll_ ## __api ## _module->coll_module_disable(__module->real.coll_ ## __api ## _module, __comm); \ + } \ + OBJ_RELEASE(__module->real.coll_ ## __api ## _module); \ + __module->real.coll_ ## __api = NULL; \ + __module->real.coll_ ## __api ## _module = NULL; \ + } \ + if( NULL != __module->real.coll_i ## __api ## _module ) { \ + if( NULL != __module->real.coll_i ## __api ## _module->coll_module_disable ) { \ + __module->real.coll_i ## __api ## 
_module->coll_module_disable(__module->real.coll_i ## __api ## _module, __comm); \ + } \ + OBJ_RELEASE(__module->real.coll_i ## __api ## _module); \ + __module->real.coll_i ## __api = NULL; \ + __module->real.coll_i ## __api ## _module = NULL; \ + } \ + } while(0) + +#define MONITORING_SET_FULL_PREV_COLL_API(m, c, operation) \ + do { \ + operation(m, c, allgather); \ + operation(m, c, allgatherv); \ + operation(m, c, allreduce); \ + operation(m, c, alltoall); \ + operation(m, c, alltoallv); \ + operation(m, c, alltoallw); \ + operation(m, c, barrier); \ + operation(m, c, bcast); \ + operation(m, c, exscan); \ + operation(m, c, gather); \ + operation(m, c, gatherv); \ + operation(m, c, reduce); \ + operation(m, c, reduce_scatter); \ + operation(m, c, reduce_scatter_block); \ + operation(m, c, scan); \ + operation(m, c, scatter); \ + operation(m, c, scatterv); \ + operation(m, c, neighbor_allgather); \ + operation(m, c, neighbor_allgatherv); \ + operation(m, c, neighbor_alltoall); \ + operation(m, c, neighbor_alltoallv); \ + operation(m, c, neighbor_alltoallw); \ + } while(0) + +#define MONITORING_SAVE_FULL_PREV_COLL_API(m, c) \ + MONITORING_SET_FULL_PREV_COLL_API((m), (c), MONITORING_SAVE_PREV_COLL_API) + +#define MONITORING_RELEASE_FULL_PREV_COLL_API(m, c) \ + MONITORING_SET_FULL_PREV_COLL_API((m), (c), MONITORING_RELEASE_PREV_COLL_API) + +static int mca_coll_monitoring_component_open(void) +{ + return OMPI_SUCCESS; +} + +static int mca_coll_monitoring_component_close(void) +{ + OPAL_MONITORING_PRINT_INFO("coll_module_close"); + mca_common_monitoring_finalize(); + return OMPI_SUCCESS; +} + +static int mca_coll_monitoring_component_init(bool enable_progress_threads, + bool enable_mpi_threads) +{ + OPAL_MONITORING_PRINT_INFO("coll_module_init"); + return mca_common_monitoring_init(); +} + +static int mca_coll_monitoring_component_register(void) +{ + return OMPI_SUCCESS; +} + +static int +mca_coll_monitoring_module_enable(mca_coll_base_module_t*module, struct 
ompi_communicator_t*comm) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( 1 == opal_atomic_add_fetch_32(&monitoring_module->is_initialized, 1) ) { + MONITORING_SAVE_FULL_PREV_COLL_API(monitoring_module, comm); + monitoring_module->data = mca_common_monitoring_coll_new(comm); + OPAL_MONITORING_PRINT_INFO("coll_module_enabled"); + } + return OMPI_SUCCESS; +} + +static int +mca_coll_monitoring_module_disable(mca_coll_base_module_t*module, struct ompi_communicator_t*comm) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( 0 == opal_atomic_sub_fetch_32(&monitoring_module->is_initialized, 1) ) { + MONITORING_RELEASE_FULL_PREV_COLL_API(monitoring_module, comm); + mca_common_monitoring_coll_release(monitoring_module->data); + monitoring_module->data = NULL; + OPAL_MONITORING_PRINT_INFO("coll_module_disabled"); + } + return OMPI_SUCCESS; +} + +static int mca_coll_monitoring_ft_event(int state) +{ + switch(state) { + case OPAL_CRS_CHECKPOINT: + case OPAL_CRS_CONTINUE: + case OPAL_CRS_RESTART: + case OPAL_CRS_TERM: + default: + ; + } + return OMPI_SUCCESS; +} + +static mca_coll_base_module_t* +mca_coll_monitoring_component_query(struct ompi_communicator_t*comm, int*priority) +{ + OPAL_MONITORING_PRINT_INFO("coll_module_query"); + mca_coll_monitoring_module_t*monitoring_module = OBJ_NEW(mca_coll_monitoring_module_t); + if( NULL == monitoring_module ) return (*priority = -1, NULL); + + /* Initialize module functions */ + monitoring_module->super.coll_module_enable = mca_coll_monitoring_module_enable; + monitoring_module->super.coll_module_disable = mca_coll_monitoring_module_disable; + monitoring_module->super.ft_event = mca_coll_monitoring_ft_event; + + /* Initialise module collectives functions */ + /* Blocking functions */ + monitoring_module->super.coll_allgather = mca_coll_monitoring_allgather; + monitoring_module->super.coll_allgatherv = mca_coll_monitoring_allgatherv; + 
monitoring_module->super.coll_allreduce = mca_coll_monitoring_allreduce; + monitoring_module->super.coll_alltoall = mca_coll_monitoring_alltoall; + monitoring_module->super.coll_alltoallv = mca_coll_monitoring_alltoallv; + monitoring_module->super.coll_alltoallw = mca_coll_monitoring_alltoallw; + monitoring_module->super.coll_barrier = mca_coll_monitoring_barrier; + monitoring_module->super.coll_bcast = mca_coll_monitoring_bcast; + monitoring_module->super.coll_exscan = mca_coll_monitoring_exscan; + monitoring_module->super.coll_gather = mca_coll_monitoring_gather; + monitoring_module->super.coll_gatherv = mca_coll_monitoring_gatherv; + monitoring_module->super.coll_reduce = mca_coll_monitoring_reduce; + monitoring_module->super.coll_reduce_scatter = mca_coll_monitoring_reduce_scatter; + monitoring_module->super.coll_reduce_scatter_block = mca_coll_monitoring_reduce_scatter_block; + monitoring_module->super.coll_scan = mca_coll_monitoring_scan; + monitoring_module->super.coll_scatter = mca_coll_monitoring_scatter; + monitoring_module->super.coll_scatterv = mca_coll_monitoring_scatterv; + + /* Nonblocking functions */ + monitoring_module->super.coll_iallgather = mca_coll_monitoring_iallgather; + monitoring_module->super.coll_iallgatherv = mca_coll_monitoring_iallgatherv; + monitoring_module->super.coll_iallreduce = mca_coll_monitoring_iallreduce; + monitoring_module->super.coll_ialltoall = mca_coll_monitoring_ialltoall; + monitoring_module->super.coll_ialltoallv = mca_coll_monitoring_ialltoallv; + monitoring_module->super.coll_ialltoallw = mca_coll_monitoring_ialltoallw; + monitoring_module->super.coll_ibarrier = mca_coll_monitoring_ibarrier; + monitoring_module->super.coll_ibcast = mca_coll_monitoring_ibcast; + monitoring_module->super.coll_iexscan = mca_coll_monitoring_iexscan; + monitoring_module->super.coll_igather = mca_coll_monitoring_igather; + monitoring_module->super.coll_igatherv = mca_coll_monitoring_igatherv; + monitoring_module->super.coll_ireduce = 
mca_coll_monitoring_ireduce; + monitoring_module->super.coll_ireduce_scatter = mca_coll_monitoring_ireduce_scatter; + monitoring_module->super.coll_ireduce_scatter_block = mca_coll_monitoring_ireduce_scatter_block; + monitoring_module->super.coll_iscan = mca_coll_monitoring_iscan; + monitoring_module->super.coll_iscatter = mca_coll_monitoring_iscatter; + monitoring_module->super.coll_iscatterv = mca_coll_monitoring_iscatterv; + + /* Neighborhood functions */ + monitoring_module->super.coll_neighbor_allgather = mca_coll_monitoring_neighbor_allgather; + monitoring_module->super.coll_neighbor_allgatherv = mca_coll_monitoring_neighbor_allgatherv; + monitoring_module->super.coll_neighbor_alltoall = mca_coll_monitoring_neighbor_alltoall; + monitoring_module->super.coll_neighbor_alltoallv = mca_coll_monitoring_neighbor_alltoallv; + monitoring_module->super.coll_neighbor_alltoallw = mca_coll_monitoring_neighbor_alltoallw; + monitoring_module->super.coll_ineighbor_allgather = mca_coll_monitoring_ineighbor_allgather; + monitoring_module->super.coll_ineighbor_allgatherv = mca_coll_monitoring_ineighbor_allgatherv; + monitoring_module->super.coll_ineighbor_alltoall = mca_coll_monitoring_ineighbor_alltoall; + monitoring_module->super.coll_ineighbor_alltoallv = mca_coll_monitoring_ineighbor_alltoallv; + monitoring_module->super.coll_ineighbor_alltoallw = mca_coll_monitoring_ineighbor_alltoallw; + + /* Initialization flag */ + monitoring_module->is_initialized = 0; + + *priority = mca_coll_monitoring_component.priority; + + return &(monitoring_module->super); +} + +mca_coll_monitoring_component_t mca_coll_monitoring_component = { + .super = { + /* First, the mca_base_component_t struct containing meta + information about the component itself */ + .collm_version = { + MCA_COLL_BASE_VERSION_2_0_0, + + .mca_component_name = "monitoring", /* MCA component name */ + MCA_MONITORING_MAKE_VERSION, + .mca_open_component = mca_coll_monitoring_component_open, /* component open */ + 
.mca_close_component = mca_coll_monitoring_component_close, /* component close */ + .mca_register_component_params = mca_coll_monitoring_component_register + }, + .collm_data = { + /* The component is checkpoint ready */ + MCA_BASE_METADATA_PARAM_CHECKPOINT + }, + + .collm_init_query = mca_coll_monitoring_component_init, + .collm_comm_query = mca_coll_monitoring_component_query + }, + .priority = INT_MAX +}; + +OBJ_CLASS_INSTANCE(mca_coll_monitoring_module_t, + mca_coll_base_module_t, + NULL, + NULL); + diff --git a/ompi/mca/coll/monitoring/coll_monitoring_exscan.c b/ompi/mca/coll/monitoring/coll_monitoring_exscan.c new file mode 100644 index 00000000000..14a038d8985 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_exscan.c @@ -0,0 +1,68 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_exscan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - my_rank), monitoring_module->data); + for( i = my_rank + 1; i < comm_size; ++i ) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_exscan(sbuf, rbuf, 
count, dtype, op, comm, monitoring_module->real.coll_exscan_module); +} + +int mca_coll_monitoring_iexscan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - my_rank), monitoring_module->data); + for( i = my_rank + 1; i < comm_size; ++i ) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_iexscan(sbuf, rbuf, count, dtype, op, comm, request, monitoring_module->real.coll_iexscan_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_gather.c b/ompi/mca/coll/monitoring/coll_monitoring_gather.c new file mode 100644 index 00000000000..331cf3725e9 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_gather.c @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_gather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, struct ompi_datatype_t *rdtype, + int root, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(rdtype, &type_size); + data_size = rcount * type_size; + for( i = 0; i < comm_size; ++i ) { + if( root == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_a2o(data_size * (comm_size - 1), monitoring_module->data); + } + return monitoring_module->real.coll_gather(sbuf, scount, sdtype, rbuf, rcount, rdtype, root, comm, monitoring_module->real.coll_gather_module); +} + +int mca_coll_monitoring_igather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, struct ompi_datatype_t *rdtype, + int root, struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(rdtype, &type_size); + data_size = rcount * type_size; + for( i = 0; i < comm_size; ++i ) { + if( root == i ) continue; /* No communication for 
self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_a2o(data_size * (comm_size - 1), monitoring_module->data); + } + return monitoring_module->real.coll_igather(sbuf, scount, sdtype, rbuf, rcount, rdtype, root, comm, request, monitoring_module->real.coll_igather_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_gatherv.c b/ompi/mca/coll/monitoring/coll_monitoring_gatherv.c new file mode 100644 index 00000000000..bf28a56a87a --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_gatherv.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_gatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(rdtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + if( root == i ) continue; /* No communication for self */ + data_size = rcounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + 
mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_a2o(data_size_aggreg, monitoring_module->data); + } + return monitoring_module->real.coll_gatherv(sbuf, scount, sdtype, rbuf, rcounts, disps, rdtype, root, comm, monitoring_module->real.coll_gatherv_module); +} + +int mca_coll_monitoring_igatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(rdtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + if( root == i ) continue; /* No communication for self */ + data_size = rcounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_a2o(data_size_aggreg, monitoring_module->data); + } + return monitoring_module->real.coll_igatherv(sbuf, scount, sdtype, rbuf, rcounts, disps, rdtype, root, comm, request, monitoring_module->real.coll_igatherv_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_neighbor_allgather.c b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_allgather.c new file mode 100644 index 00000000000..459b8d62209 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_allgather.c @@ -0,0 +1,120 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_neighbor_allgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, void *rbuf, + int rcount, struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + + for( dim = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + + if (MPI_PROC_NULL != drank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_neighbor_allgather(sbuf, scount, sdtype, rbuf, rcount, rdtype, 
comm, monitoring_module->real.coll_neighbor_allgather_module); +} + +int mca_coll_monitoring_ineighbor_allgather(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, void *rbuf, + int rcount, struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + + for( dim = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + + if (MPI_PROC_NULL != drank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_ineighbor_allgather(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, request, 
monitoring_module->real.coll_ineighbor_allgather_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_neighbor_allgatherv.c b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_allgatherv.c new file mode 100644 index 00000000000..1f74e141846 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_allgatherv.c @@ -0,0 +1,124 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_neighbor_allgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void * rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_2_2_0_t *cart = comm->c_topo->mtc.cart; + int dim, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + + for( dim = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + + if 
(MPI_PROC_NULL != drank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_neighbor_allgatherv(sbuf, scount, sdtype, rbuf, rcounts, disps, rdtype, comm, monitoring_module->real.coll_neighbor_allgatherv_module); +} + +int mca_coll_monitoring_ineighbor_allgatherv(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void * rbuf, const int *rcounts, const int *disps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_2_2_0_t *cart = comm->c_topo->mtc.cart; + int dim, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + + for( dim = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + + if (MPI_PROC_NULL != 
drank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_ineighbor_allgatherv(sbuf, scount, sdtype, rbuf, rcounts, disps, rdtype, comm, request, monitoring_module->real.coll_ineighbor_allgatherv_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoall.c b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoall.c new file mode 100644 index 00000000000..7e9e31e7968 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoall.c @@ -0,0 +1,122 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_neighbor_alltoall(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void* rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + + for( dim = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self 
*/ + continue; + } + + if (MPI_PROC_NULL != srank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + + if (MPI_PROC_NULL != drank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_neighbor_alltoall(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, monitoring_module->real.coll_neighbor_alltoall_module); +} + +int mca_coll_monitoring_ineighbor_alltoall(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + + for( dim = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + /** + * If 
this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + + if (MPI_PROC_NULL != drank) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_ineighbor_alltoall(sbuf, scount, sdtype, rbuf, rcount, rdtype, comm, request, monitoring_module->real.coll_ineighbor_alltoall_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoallv.c b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoallv.c new file mode 100644 index 00000000000..c355a1a54d8 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoallv.c @@ -0,0 +1,130 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_neighbor_alltoallv(const void *sbuf, const int *scounts, + const int *sdisps, struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, i, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + + for( dim = 0, i = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + + if (MPI_PROC_NULL != drank) { + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, 
monitoring_module->data); + + return monitoring_module->real.coll_neighbor_alltoallv(sbuf, scounts, sdisps, sdtype, rbuf, rcounts, rdisps, rdtype, comm, monitoring_module->real.coll_neighbor_alltoallv_module); +} + +int mca_coll_monitoring_ineighbor_alltoallv(const void *sbuf, const int *scounts, + const int *sdisps, + struct ompi_datatype_t *sdtype, + void *rbuf, const int *rcounts, + const int *rdisps, + struct ompi_datatype_t *rdtype, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, i, srank, drank, world_rank; + + ompi_datatype_type_size(sdtype, &type_size); + + for( dim = 0, i = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + + if (MPI_PROC_NULL != drank) { + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += 
data_size; + } + ++i; + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_ineighbor_alltoallv(sbuf, scounts, sdisps, sdtype, rbuf, rcounts, rdisps, rdtype, comm, request, monitoring_module->real.coll_ineighbor_alltoallv_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoallw.c b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoallw.c new file mode 100644 index 00000000000..f707d36a287 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_neighbor_alltoallw.c @@ -0,0 +1,132 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_neighbor_alltoallw(const void *sbuf, const int *scounts, + const MPI_Aint *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const MPI_Aint *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, i, srank, drank, world_rank; + + for( dim = 0, i = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + ompi_datatype_type_size(sdtypes[i], &type_size); + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == 
mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + + if (MPI_PROC_NULL != drank) { + ompi_datatype_type_size(sdtypes[i], &type_size); + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_neighbor_alltoallw(sbuf, scounts, sdisps, sdtypes, rbuf, rcounts, rdisps, rdtypes, comm, monitoring_module->real.coll_neighbor_alltoallw_module); +} + +int mca_coll_monitoring_ineighbor_alltoallw(const void *sbuf, const int *scounts, + const MPI_Aint *sdisps, + struct ompi_datatype_t * const *sdtypes, + void *rbuf, const int *rcounts, + const MPI_Aint *rdisps, + struct ompi_datatype_t * const *rdtypes, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const mca_topo_base_comm_cart_t *cart = comm->c_topo->mtc.cart; + int dim, i, srank, drank, world_rank; + + for( dim = 0, i = 0; dim < cart->ndims; ++dim ) { + srank = MPI_PROC_NULL, drank = MPI_PROC_NULL; + + if (cart->dims[dim] > 1) { + mca_topo_base_cart_shift (comm, dim, 1, &srank, &drank); + } else if (1 == cart->dims[dim] && cart->periods[dim]) { + /* Don't record exchanges with self */ + continue; + } + + if (MPI_PROC_NULL != srank) { + ompi_datatype_type_size(sdtypes[i], &type_size); + data_size = scounts[i] * type_size; + /** + * 
If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(srank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + + if (MPI_PROC_NULL != drank) { + ompi_datatype_type_size(sdtypes[i], &type_size); + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(drank, comm->c_remote_group, &world_rank) ) { + mca_common_monitoring_record_coll(world_rank, data_size); + data_size_aggreg += data_size; + } + ++i; + } + } + + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + + return monitoring_module->real.coll_ineighbor_alltoallw(sbuf, scounts, sdisps, sdtypes, rbuf, rcounts, rdisps, rdtypes, comm, request, monitoring_module->real.coll_ineighbor_alltoallw_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_reduce.c b/ompi/mca/coll/monitoring/coll_monitoring_reduce.c new file mode 100644 index 00000000000..afe417243b7 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_reduce.c @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_reduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + for( i = 0; i < comm_size; ++i ) { + if( root == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_a2o(data_size * (comm_size - 1), monitoring_module->data); + } + return monitoring_module->real.coll_reduce(sbuf, rbuf, count, dtype, op, root, comm, monitoring_module->real.coll_reduce_module); +} + +int mca_coll_monitoring_ireduce(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + if( root == ompi_comm_rank(comm) ) { + int i, rank; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + for( i = 0; i < comm_size; ++i ) { + if( root == i ) continue; /* No communication for self */ + /** + * If this fails the destination is 
not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_a2o(data_size * (comm_size - 1), monitoring_module->data); + } + return monitoring_module->real.coll_ireduce(sbuf, rbuf, count, dtype, op, root, comm, request, monitoring_module->real.coll_ireduce_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_reduce_scatter.c b/ompi/mca/coll/monitoring/coll_monitoring_reduce_scatter.c new file mode 100644 index 00000000000..86cce794a13 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_reduce_scatter.c @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_reduce_scatter(const void *sbuf, void *rbuf, + const int *rcounts, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + data_size = rcounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + data_size_aggreg += data_size; + } + 
mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + return monitoring_module->real.coll_reduce_scatter(sbuf, rbuf, rcounts, dtype, op, comm, monitoring_module->real.coll_reduce_scatter_module); +} + +int mca_coll_monitoring_ireduce_scatter(const void *sbuf, void *rbuf, + const int *rcounts, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + data_size = rcounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + data_size_aggreg += data_size; + } + mca_common_monitoring_coll_a2a(data_size_aggreg, monitoring_module->data); + return monitoring_module->real.coll_ireduce_scatter(sbuf, rbuf, rcounts, dtype, op, comm, request, monitoring_module->real.coll_ireduce_scatter_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_reduce_scatter_block.c b/ompi/mca/coll/monitoring/coll_monitoring_reduce_scatter_block.c new file mode 100644 index 00000000000..5f76b413bb0 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_reduce_scatter_block.c @@ -0,0 +1,72 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_reduce_scatter_block(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = rcount * type_size; + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + return monitoring_module->real.coll_reduce_scatter_block(sbuf, rbuf, rcount, dtype, op, comm, monitoring_module->real.coll_reduce_scatter_block_module); +} + +int mca_coll_monitoring_ireduce_scatter_block(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = rcount * type_size; + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self 
*/ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_a2a(data_size * (comm_size - 1), monitoring_module->data); + return monitoring_module->real.coll_ireduce_scatter_block(sbuf, rbuf, rcount, dtype, op, comm, request, monitoring_module->real.coll_ireduce_scatter_block_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_scan.c b/ompi/mca/coll/monitoring/coll_monitoring_scan.c new file mode 100644 index 00000000000..1fd7deef70f --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_scan.c @@ -0,0 +1,68 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_scan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - my_rank), monitoring_module->data); + for( i = my_rank + 1; i < comm_size; ++i ) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return 
monitoring_module->real.coll_scan(sbuf, rbuf, count, dtype, op, comm, monitoring_module->real.coll_scan_module); +} + +int mca_coll_monitoring_iscan(const void *sbuf, void *rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + const int my_rank = ompi_comm_rank(comm); + int i, rank; + ompi_datatype_type_size(dtype, &type_size); + data_size = count * type_size; + mca_common_monitoring_coll_a2a(data_size * (comm_size - my_rank), monitoring_module->data); + for( i = my_rank + 1; i < comm_size; ++i ) { + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + return monitoring_module->real.coll_iscan(sbuf, rbuf, count, dtype, op, comm, request, monitoring_module->real.coll_iscan_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_scatter.c b/ompi/mca/coll/monitoring/coll_monitoring_scatter.c new file mode 100644 index 00000000000..82ca0da3dc3 --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_scatter.c @@ -0,0 +1,78 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_scatter(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + const int my_rank = ompi_comm_rank(comm); + if( root == my_rank ) { + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + for( i = 0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_o2a(data_size * (comm_size - 1), monitoring_module->data); + } + return monitoring_module->real.coll_scatter(sbuf, scount, sdtype, rbuf, rcount, rdtype, root, comm, monitoring_module->real.coll_scatter_module); +} + + +int mca_coll_monitoring_iscatter(const void *sbuf, int scount, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, + struct ompi_datatype_t *rdtype, + int root, + struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + const int my_rank = ompi_comm_rank(comm); + if( root == my_rank ) { + size_t type_size, data_size; + const int comm_size = ompi_comm_size(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + data_size = scount * type_size; + for( i = 
0; i < comm_size; ++i ) { + if( my_rank == i ) continue; /* No communication for self */ + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + } + } + mca_common_monitoring_coll_o2a(data_size * (comm_size - 1), monitoring_module->data); + } + return monitoring_module->real.coll_iscatter(sbuf, scount, sdtype, rbuf, rcount, rdtype, root, comm, request, monitoring_module->real.coll_iscatter_module); +} diff --git a/ompi/mca/coll/monitoring/coll_monitoring_scatterv.c b/ompi/mca/coll/monitoring/coll_monitoring_scatterv.c new file mode 100644 index 00000000000..af009cdbe4a --- /dev/null +++ b/ompi/mca/coll/monitoring/coll_monitoring_scatterv.c @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include +#include +#include "coll_monitoring.h" + +int mca_coll_monitoring_scatterv(const void *sbuf, const int *scounts, const int *disps, + struct ompi_datatype_t *sdtype, + void* rbuf, int rcount, struct ompi_datatype_t *rdtype, + int root, struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + const int my_rank = ompi_comm_rank(comm); + if( root == my_rank ) { + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, 
comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_o2a(data_size_aggreg, monitoring_module->data); + } + return monitoring_module->real.coll_scatterv(sbuf, scounts, disps, sdtype, rbuf, rcount, rdtype, root, comm, monitoring_module->real.coll_scatterv_module); +} + +int mca_coll_monitoring_iscatterv(const void *sbuf, const int *scounts, const int *disps, + struct ompi_datatype_t *sdtype, + void *rbuf, int rcount, struct ompi_datatype_t *rdtype, + int root, struct ompi_communicator_t *comm, + ompi_request_t ** request, + mca_coll_base_module_t *module) +{ + mca_coll_monitoring_module_t*monitoring_module = (mca_coll_monitoring_module_t*) module; + const int my_rank = ompi_comm_rank(comm); + if( root == my_rank ) { + size_t type_size, data_size, data_size_aggreg = 0; + const int comm_size = ompi_comm_size(comm); + int i, rank; + ompi_datatype_type_size(sdtype, &type_size); + for( i = 0; i < comm_size; ++i ) { + data_size = scounts[i] * type_size; + /** + * If this fails the destination is not part of my MPI_COM_WORLD + * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank + */ + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, comm->c_remote_group, &rank) ) { + mca_common_monitoring_record_coll(rank, data_size); + data_size_aggreg += data_size; + } + } + mca_common_monitoring_coll_o2a(data_size_aggreg, monitoring_module->data); + } + return monitoring_module->real.coll_iscatterv(sbuf, scounts, disps, sdtype, rbuf, rcount, rdtype, root, comm, request, monitoring_module->real.coll_iscatterv_module); +} diff --git a/ompi/mca/coll/monitoring/configure.m4 b/ompi/mca/coll/monitoring/configure.m4 new file mode 100644 index 00000000000..008bff46994 --- /dev/null +++ b/ompi/mca/coll/monitoring/configure.m4 @@ -0,0 +1,23 @@ +# -*- shell-script -*- +# +# Copyright (c) 2017 The University of Tennessee and The University +# of Tennessee Research 
Foundation. All rights +# reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# MCA_ompi_coll_monitoring_CONFIG([action-if-can-compile], +# [action-if-cant-compile]) +# ------------------------------------------------ +AC_DEFUN([MCA_ompi_coll_monitoring_CONFIG],[ + AC_CONFIG_FILES([ompi/mca/coll/monitoring/Makefile]) + + AS_IF([test "$MCA_BUILD_ompi_common_monitoring_DSO_TRUE" = ''], + [$1], + [$2]) +])dnl + diff --git a/ompi/mca/coll/portals4/Makefile.am b/ompi/mca/coll/portals4/Makefile.am index c8668033564..8f9babbd13b 100644 --- a/ompi/mca/coll/portals4/Makefile.am +++ b/ompi/mca/coll/portals4/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013-2015 Sandia National Laboratories. All rights reserved. # Copyright (c) 2015 Bull SAS. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -32,7 +33,8 @@ AM_CPPFLAGS = $(coll_portals4_CPPFLAGS) mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_portals4_la_SOURCES = $(local_sources) -mca_coll_portals4_la_LIBADD = $(coll_portals4_LIBS) +mca_coll_portals4_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(coll_portals4_LIBS) mca_coll_portals4_la_LDFLAGS = -module -avoid-version $(coll_portals4_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/coll/portals4/coll_portals4_allreduce.c b/ompi/mca/coll/portals4/coll_portals4_allreduce.c index 935ce6cd9d3..56f1ea30621 100644 --- a/ompi/mca/coll/portals4/coll_portals4_allreduce.c +++ b/ompi/mca/coll/portals4/coll_portals4_allreduce.c @@ -68,7 +68,7 @@ allreduce_kary_tree_top(const void *sendbuf, void *recvbuf, int count, zero_md_h = mca_coll_portals4_component.zero_md_h; data_md_h = mca_coll_portals4_component.data_md_h; - internal_count = opal_atomic_add_size_t(&module->coll_count, 1); + internal_count = opal_atomic_add_fetch_size_t(&module->coll_count, 1); /* ** DATATYPE and SIZES diff --git 
a/ompi/mca/coll/portals4/coll_portals4_barrier.c b/ompi/mca/coll/portals4/coll_portals4_barrier.c index 9d5c4f3c164..f2544ce0cd1 100644 --- a/ompi/mca/coll/portals4/coll_portals4_barrier.c +++ b/ompi/mca/coll/portals4/coll_portals4_barrier.c @@ -44,7 +44,7 @@ barrier_hypercube_top(struct ompi_communicator_t *comm, request->type = OMPI_COLL_PORTALS4_TYPE_BARRIER; - count = opal_atomic_add_size_t(&portals4_module->coll_count, 1); + count = opal_atomic_add_fetch_size_t(&portals4_module->coll_count, 1); ret = PtlCTAlloc(mca_coll_portals4_component.ni_h, &request->u.barrier.rtr_ct_h); diff --git a/ompi/mca/coll/portals4/coll_portals4_bcast.c b/ompi/mca/coll/portals4/coll_portals4_bcast.c index 11132f6ce4c..8432d5823cd 100644 --- a/ompi/mca/coll/portals4/coll_portals4_bcast.c +++ b/ompi/mca/coll/portals4/coll_portals4_bcast.c @@ -176,7 +176,7 @@ bcast_kary_tree_top(void *buff, int count, zero_md_h = mca_coll_portals4_component.zero_md_h; data_md_h = mca_coll_portals4_component.data_md_h; - internal_count = opal_atomic_add_size_t(&portals4_module->coll_count, 1); + internal_count = opal_atomic_add_fetch_size_t(&portals4_module->coll_count, 1); /* @@ -513,7 +513,7 @@ bcast_pipeline_top(void *buff, int count, zero_md_h = mca_coll_portals4_component.zero_md_h; data_md_h = mca_coll_portals4_component.data_md_h; - internal_count = opal_atomic_add_size_t(&portals4_module->coll_count, 1); + internal_count = opal_atomic_add_fetch_size_t(&portals4_module->coll_count, 1); /* ** DATATYPE and SIZES diff --git a/ompi/mca/coll/portals4/coll_portals4_gather.c b/ompi/mca/coll/portals4/coll_portals4_gather.c index 45ff4c07728..7e38e27c009 100644 --- a/ompi/mca/coll/portals4/coll_portals4_gather.c +++ b/ompi/mca/coll/portals4/coll_portals4_gather.c @@ -582,7 +582,7 @@ ompi_coll_portals4_gather_intra_binomial_top(const void *sbuf, int scount, struc /* Setup Common Parameters */ /**********************************/ - request->u.gather.coll_count = 
opal_atomic_add_size_t(&portals4_module->coll_count, 1); + request->u.gather.coll_count = opal_atomic_add_fetch_size_t(&portals4_module->coll_count, 1); COLL_PORTALS4_UPDATE_IN_ORDER_BMTREE( comm, portals4_module, request->u.gather.root_rank ); bmtree = portals4_module->cached_in_order_bmtree; @@ -879,7 +879,7 @@ ompi_coll_portals4_gather_intra_linear_top(const void *sbuf, int scount, struct i_am_root = (request->u.gather.my_rank == request->u.gather.root_rank); - request->u.gather.coll_count = opal_atomic_add_size_t(&portals4_module->coll_count, 1); + request->u.gather.coll_count = opal_atomic_add_fetch_size_t(&portals4_module->coll_count, 1); ret = setup_gather_buffers_linear(comm, request, portals4_module); if (MPI_SUCCESS != ret) { line = __LINE__; goto err_hdlr; } diff --git a/ompi/mca/coll/portals4/coll_portals4_reduce.c b/ompi/mca/coll/portals4/coll_portals4_reduce.c index 1a55a5c3f70..2fdb36b739c 100644 --- a/ompi/mca/coll/portals4/coll_portals4_reduce.c +++ b/ompi/mca/coll/portals4/coll_portals4_reduce.c @@ -69,7 +69,7 @@ reduce_kary_tree_top(const void *sendbuf, void *recvbuf, int count, zero_md_h = mca_coll_portals4_component.zero_md_h; data_md_h = mca_coll_portals4_component.data_md_h; - internal_count = opal_atomic_add_size_t(&module->coll_count, 1); + internal_count = opal_atomic_add_fetch_size_t(&module->coll_count, 1); /* ** DATATYPE and SIZES diff --git a/ompi/mca/coll/portals4/coll_portals4_scatter.c b/ompi/mca/coll/portals4/coll_portals4_scatter.c index d1cfbbaa0d2..4f3351ac784 100644 --- a/ompi/mca/coll/portals4/coll_portals4_scatter.c +++ b/ompi/mca/coll/portals4/coll_portals4_scatter.c @@ -399,7 +399,7 @@ ompi_coll_portals4_scatter_intra_linear_top(const void *sbuf, int scount, struct i_am_root = (request->u.scatter.my_rank == request->u.scatter.root_rank); - request->u.scatter.coll_count = opal_atomic_add_size_t(&portals4_module->coll_count, 1); + request->u.scatter.coll_count = opal_atomic_add_fetch_size_t(&portals4_module->coll_count, 1); 
ret = setup_scatter_buffers_linear(comm, request, portals4_module); if (MPI_SUCCESS != ret) { line = __LINE__; goto err_hdlr; } diff --git a/ompi/mca/coll/self/Makefile.am b/ompi/mca/coll/self/Makefile.am index a3735ece346..6b06aab4028 100644 --- a/ompi/mca/coll/self/Makefile.am +++ b/ompi/mca/coll/self/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -54,6 +55,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_self_la_SOURCES = $(sources) mca_coll_self_la_LDFLAGS = -module -avoid-version +mca_coll_self_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_self_la_SOURCES =$(sources) diff --git a/ompi/mca/coll/sm/Makefile.am b/ompi/mca/coll/sm/Makefile.am index 47a6582d16c..fafcbd7e473 100644 --- a/ompi/mca/coll/sm/Makefile.am +++ b/ompi/mca/coll/sm/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -61,7 +62,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_sm_la_SOURCES = $(sources) mca_coll_sm_la_LDFLAGS = -module -avoid-version -mca_coll_sm_la_LIBADD = \ +mca_coll_sm_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ $(OMPI_TOP_BUILDDIR)/opal/mca/common/sm/lib@OPAL_LIB_PREFIX@mca_common_sm.la noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/coll/sm/coll_sm.h b/ompi/mca/coll/sm/coll_sm.h index baaa510ed19..b2da6ede425 100644 --- a/ompi/mca/coll/sm/coll_sm.h +++ b/ompi/mca/coll/sm/coll_sm.h @@ -358,7 +358,7 @@ extern uint32_t mca_coll_sm_one; * Macro to release an in-use flag from this process */ #define FLAG_RELEASE(flag) \ - (void)opal_atomic_add(&(flag)->mcsiuf_num_procs_using, -1) + opal_atomic_add(&(flag)->mcsiuf_num_procs_using, -1) /** * Macro to copy a single segment in from a user buffer to a shared diff --git a/ompi/mca/coll/sm/coll_sm_barrier.c b/ompi/mca/coll/sm/coll_sm_barrier.c index a3000b7d847..2722bbf09f5 100644 --- a/ompi/mca/coll/sm/coll_sm_barrier.c +++ b/ompi/mca/coll/sm/coll_sm_barrier.c @@ -101,7 +101,7 @@ int mca_coll_sm_barrier_intra(struct ompi_communicator_t *comm, if (0 != rank) { /* Get parent *in* buffer */ parent = &data->mcb_barrier_control_parent[buffer_set]; - (void)opal_atomic_add(parent, 1); + opal_atomic_add (parent, 1); SPIN_CONDITION(0 != *me_out, exit_label2); *me_out = 0; diff --git a/ompi/mca/coll/sm/coll_sm_module.c b/ompi/mca/coll/sm/coll_sm_module.c index 6c34851ee46..8922a70eafe 100644 --- a/ompi/mca/coll/sm/coll_sm_module.c +++ b/ompi/mca/coll/sm/coll_sm_module.c @@ -463,7 +463,7 @@ int ompi_coll_sm_lazy_enable(mca_coll_base_module_t *module, OBJ_RETAIN(sm_module->previous_reduce_module); /* Indicate that we have successfully attached and setup */ - (void)opal_atomic_add(&(data->sm_bootstrap_meta->module_seg->seg_inited), 1); + opal_atomic_add 
(&(data->sm_bootstrap_meta->module_seg->seg_inited), 1); /* Wait for everyone in this communicator to attach and setup */ opal_output_verbose(10, ompi_coll_base_framework.framework_output, diff --git a/ompi/mca/coll/sync/Makefile.am b/ompi/mca/coll/sync/Makefile.am index 61c2437e96e..2f75cd2dfa5 100644 --- a/ompi/mca/coll/sync/Makefile.am +++ b/ompi/mca/coll/sync/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2016 Intel, Inc. All rights reserved +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -46,6 +47,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_sync_la_SOURCES = $(sources) mca_coll_sync_la_LDFLAGS = -module -avoid-version +mca_coll_sync_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_sync_la_SOURCES =$(sources) diff --git a/ompi/mca/coll/sync/coll_sync_module.c b/ompi/mca/coll/sync/coll_sync_module.c index 2e99c9925c0..02c81f513c5 100644 --- a/ompi/mca/coll/sync/coll_sync_module.c +++ b/ompi/mca/coll/sync/coll_sync_module.c @@ -12,6 +12,7 @@ * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2018 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -30,8 +31,8 @@ #include "mpi.h" -#include "orte/util/show_help.h" -#include "orte/util/proc_info.h" +#include "opal/util/show_help.h" +#include "ompi/mca/rte/rte.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" @@ -173,8 +174,8 @@ int mca_coll_sync_module_enable(mca_coll_base_module_t *module, if (good) { return OMPI_SUCCESS; } - orte_show_help("help-coll-sync.txt", "missing collective", true, - orte_process_info.nodename, + opal_show_help("help-coll-sync.txt", "missing collective", true, + ompi_process_info.nodename, mca_coll_sync_component.priority, msg); return OMPI_ERR_NOT_FOUND; } diff --git a/ompi/mca/coll/tuned/Makefile.am b/ompi/mca/coll/tuned/Makefile.am index cc426671c5d..82be7bb72aa 100644 --- a/ompi/mca/coll/tuned/Makefile.am +++ b/ompi/mca/coll/tuned/Makefile.am @@ -10,6 +10,9 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. +# Copyright (c) 2018 Research Organization for Information Science +# and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -37,7 +40,10 @@ sources = \ coll_tuned_reduce_decision.c \ coll_tuned_bcast_decision.c \ coll_tuned_reduce_scatter_decision.c \ - coll_tuned_scatter_decision.c + coll_tuned_scatter_decision.c \ + coll_tuned_reduce_scatter_block_decision.c \ + coll_tuned_exscan_decision.c \ + coll_tuned_scan_decision.c # Make the output library in this directory, and name it either # mca__.la (for DSO builds) or libmca__.la @@ -55,6 +61,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_coll_tuned_la_SOURCES = $(sources) mca_coll_tuned_la_LDFLAGS = -module -avoid-version +mca_coll_tuned_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_coll_tuned_la_SOURCES =$(sources) diff --git a/ompi/mca/coll/tuned/coll_tuned.h b/ompi/mca/coll/tuned/coll_tuned.h index 661fcde591f..d4b201bc7a3 100644 --- a/ompi/mca/coll/tuned/coll_tuned.h +++ b/ompi/mca/coll/tuned/coll_tuned.h @@ -3,8 +3,8 @@ * Copyright (c) 2004-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. + * Copyright (c) 2015-2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -150,12 +150,30 @@ int ompi_coll_tuned_reduce_scatter_intra_dec_dynamic(REDUCESCATTER_ARGS); int ompi_coll_tuned_reduce_scatter_intra_do_this(REDUCESCATTER_ARGS, int algorithm, int faninout, int segsize); int ompi_coll_tuned_reduce_scatter_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices); +/* Reduce_scatter_block */ +int ompi_coll_tuned_reduce_scatter_block_intra_dec_fixed(REDUCESCATTERBLOCK_ARGS); +int ompi_coll_tuned_reduce_scatter_block_intra_dec_dynamic(REDUCESCATTERBLOCK_ARGS); +int ompi_coll_tuned_reduce_scatter_block_intra_do_this(REDUCESCATTERBLOCK_ARGS, int algorithm, int faninout, int segsize); +int ompi_coll_tuned_reduce_scatter_block_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices); + /* Scatter */ int ompi_coll_tuned_scatter_intra_dec_fixed(SCATTER_ARGS); int ompi_coll_tuned_scatter_intra_dec_dynamic(SCATTER_ARGS); int ompi_coll_tuned_scatter_intra_do_this(SCATTER_ARGS, int algorithm, int faninout, int segsize); int ompi_coll_tuned_scatter_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices); +/* Exscan */ +int ompi_coll_tuned_exscan_intra_dec_fixed(EXSCAN_ARGS); +int ompi_coll_tuned_exscan_intra_dec_dynamic(EXSCAN_ARGS); +int ompi_coll_tuned_exscan_intra_do_this(EXSCAN_ARGS, int algorithm); +int ompi_coll_tuned_exscan_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices); + +/* Scan */ +int ompi_coll_tuned_scan_intra_dec_fixed(SCAN_ARGS); +int ompi_coll_tuned_scan_intra_dec_dynamic(SCAN_ARGS); +int ompi_coll_tuned_scan_intra_do_this(SCAN_ARGS, int algorithm); +int ompi_coll_tuned_scan_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices); + int mca_coll_tuned_ft_event(int state); struct mca_coll_tuned_component_t { diff --git a/ompi/mca/coll/tuned/coll_tuned_allreduce_decision.c 
b/ompi/mca/coll/tuned/coll_tuned_allreduce_decision.c index 5ad46e2ce7a..a25c69f7c48 100644 --- a/ompi/mca/coll/tuned/coll_tuned_allreduce_decision.c +++ b/ompi/mca/coll/tuned/coll_tuned_allreduce_decision.c @@ -3,8 +3,8 @@ * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. + * Copyright (c) 2015-2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -41,6 +41,7 @@ static mca_base_var_enum_value_t allreduce_algorithms[] = { {3, "recursive_doubling"}, {4, "ring"}, {5, "segmented_ring"}, + {6, "rabenseifner"}, {0, NULL} }; @@ -142,6 +143,8 @@ int ompi_coll_tuned_allreduce_intra_do_this(const void *sbuf, void *rbuf, int co return ompi_coll_base_allreduce_intra_ring(sbuf, rbuf, count, dtype, op, comm, module); case (5): return ompi_coll_base_allreduce_intra_ring_segmented(sbuf, rbuf, count, dtype, op, comm, module, segsize); + case (6): + return ompi_coll_base_allreduce_intra_redscat_allgather(sbuf, rbuf, count, dtype, op, comm, module); } /* switch */ OPAL_OUTPUT((ompi_coll_tuned_stream,"coll:tuned:allreduce_intra_do_this attempt to select algorithm %d when only 0-%d is valid?", algorithm, ompi_coll_tuned_forced_max_algorithms[ALLREDUCE])); diff --git a/ompi/mca/coll/tuned/coll_tuned_component.c b/ompi/mca/coll/tuned/coll_tuned_component.c index 7b9410da02f..be0d14a988f 100644 --- a/ompi/mca/coll/tuned/coll_tuned_component.c +++ b/ompi/mca/coll/tuned/coll_tuned_component.c @@ -14,7 +14,7 @@ * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -187,8 +187,11 @@ static int tuned_register(void) ompi_coll_tuned_bcast_intra_check_forced_init(&ompi_coll_tuned_forced_params[BCAST]); ompi_coll_tuned_reduce_intra_check_forced_init(&ompi_coll_tuned_forced_params[REDUCE]); ompi_coll_tuned_reduce_scatter_intra_check_forced_init(&ompi_coll_tuned_forced_params[REDUCESCATTER]); + ompi_coll_tuned_reduce_scatter_block_intra_check_forced_init(&ompi_coll_tuned_forced_params[REDUCESCATTERBLOCK]); ompi_coll_tuned_gather_intra_check_forced_init(&ompi_coll_tuned_forced_params[GATHER]); ompi_coll_tuned_scatter_intra_check_forced_init(&ompi_coll_tuned_forced_params[SCATTER]); + ompi_coll_tuned_exscan_intra_check_forced_init(&ompi_coll_tuned_forced_params[EXSCAN]); + ompi_coll_tuned_scan_intra_check_forced_init(&ompi_coll_tuned_forced_params[SCAN]); return OMPI_SUCCESS; } diff --git a/ompi/mca/coll/tuned/coll_tuned_decision_dynamic.c b/ompi/mca/coll/tuned/coll_tuned_decision_dynamic.c index 2a7914e7880..f52686caa09 100644 --- a/ompi/mca/coll/tuned/coll_tuned_decision_dynamic.c +++ b/ompi/mca/coll/tuned/coll_tuned_decision_dynamic.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -386,6 +386,58 @@ int ompi_coll_tuned_reduce_scatter_intra_dec_dynamic(const void *sbuf, void *rbu dtype, op, comm, module); } +/* + * reduce_scatter_block_intra_dec + * + * Function: - selects reduce_scatter_block algorithm to use + * Accepts: - same arguments as MPI_Reduce_scatter_block() + * Returns: - MPI_SUCCESS or error code (passed from + * the reduce_scatter implementation) + * + */ +int ompi_coll_tuned_reduce_scatter_block_intra_dec_dynamic(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_tuned_module_t *tuned_module = (mca_coll_tuned_module_t*) module; + + OPAL_OUTPUT((ompi_coll_tuned_stream, "coll:tuned:reduce_scatter_block_intra_dec_dynamic")); + + /* check to see if we have some filebased rules */ + if (tuned_module->com_rules[REDUCESCATTERBLOCK]) { + /* we do, so calc the message size or whatever we need and use + this for the evaluation */ + int alg, faninout, segsize, ignoreme, size; + size_t dsize; + size = ompi_comm_size(comm); + ompi_datatype_type_size (dtype, &dsize); + dsize *= rcount * size; + + alg = ompi_coll_tuned_get_target_method_params(tuned_module->com_rules[REDUCESCATTERBLOCK], + dsize, &faninout, + &segsize, &ignoreme); + if (alg) { + /* we have found a valid choice from the file based rules for this message size */ + return ompi_coll_tuned_reduce_scatter_block_intra_do_this (sbuf, rbuf, rcount, dtype, + op, comm, module, + alg, faninout, segsize); + } /* found a method */ + } /* end if any com rules to check */ + + if (tuned_module->user_forced[REDUCESCATTERBLOCK].algorithm) { + return ompi_coll_tuned_reduce_scatter_block_intra_do_this(sbuf, rbuf, rcount, dtype, + op, comm, module, + tuned_module->user_forced[REDUCESCATTERBLOCK].algorithm, + tuned_module->user_forced[REDUCESCATTERBLOCK].chain_fanout, + tuned_module->user_forced[REDUCESCATTERBLOCK].segsize); + } + return
ompi_coll_tuned_reduce_scatter_block_intra_dec_fixed (sbuf, rbuf, rcount, + dtype, op, comm, module); +} + /* * allgather_intra_dec * @@ -610,3 +662,89 @@ int ompi_coll_tuned_scatter_intra_dec_dynamic(const void *sbuf, int scount, rbuf, rcount, rdtype, root, comm, module); } + +int ompi_coll_tuned_exscan_intra_dec_dynamic(const void *sbuf, void* rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_tuned_module_t *tuned_module = (mca_coll_tuned_module_t*) module; + + OPAL_OUTPUT((ompi_coll_tuned_stream, + "ompi_coll_tuned_exscan_intra_dec_dynamic")); + + /** + * check to see if we have some filebased rules. + */ + if (tuned_module->com_rules[EXSCAN]) { + int comsize, alg, faninout, segsize, max_requests; + size_t dsize; + + comsize = ompi_comm_size(comm); + ompi_datatype_type_size (dtype, &dsize); + dsize *= comsize; + + alg = ompi_coll_tuned_get_target_method_params (tuned_module->com_rules[EXSCAN], + dsize, &faninout, &segsize, &max_requests); + + if (alg) { + /* we have found a valid choice from the file based rules for this message size */ + return ompi_coll_tuned_exscan_intra_do_this (sbuf, rbuf, count, dtype, + op, comm, module, + alg); + } /* found a method */ + } /*end if any com rules to check */ + + if (tuned_module->user_forced[EXSCAN].algorithm) { + return ompi_coll_tuned_exscan_intra_do_this(sbuf, rbuf, count, dtype, + op, comm, module, + tuned_module->user_forced[EXSCAN].algorithm); + } + + return ompi_coll_base_exscan_intra_linear(sbuf, rbuf, count, dtype, + op, comm, module); +} + +int ompi_coll_tuned_scan_intra_dec_dynamic(const void *sbuf, void* rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + mca_coll_tuned_module_t *tuned_module = (mca_coll_tuned_module_t*) module; + + OPAL_OUTPUT((ompi_coll_tuned_stream, + 
"ompi_coll_tuned_scan_intra_dec_dynamic")); + + /** + * check to see if we have some filebased rules. + */ + if (tuned_module->com_rules[SCAN]) { + int comsize, alg, faninout, segsize, max_requests; + size_t dsize; + + comsize = ompi_comm_size(comm); + ompi_datatype_type_size (dtype, &dsize); + dsize *= comsize; + + alg = ompi_coll_tuned_get_target_method_params (tuned_module->com_rules[SCAN], + dsize, &faninout, &segsize, &max_requests); + + if (alg) { + /* we have found a valid choice from the file based rules for this message size */ + return ompi_coll_tuned_scan_intra_do_this (sbuf, rbuf, count, dtype, + op, comm, module, + alg); + } /* found a method */ + } /*end if any com rules to check */ + + if (tuned_module->user_forced[SCAN].algorithm) { + return ompi_coll_tuned_scan_intra_do_this(sbuf, rbuf, count, dtype, + op, comm, module, + tuned_module->user_forced[SCAN].algorithm); + } + + return ompi_coll_base_scan_intra_linear(sbuf, rbuf, count, dtype, + op, comm, module); +} diff --git a/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c b/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c index 3d289a97591..102e4ee11f3 100644 --- a/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c +++ b/ompi/mca/coll/tuned/coll_tuned_decision_fixed.c @@ -13,7 +13,7 @@ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -500,6 +500,26 @@ int ompi_coll_tuned_reduce_scatter_intra_dec_fixed( const void *sbuf, void *rbuf comm, module); } +/* + * reduce_scatter_block_intra_dec + * + * Function: - selects reduce_scatter_block algorithm to use + * Accepts: - same arguments as MPI_Reduce_scatter_block() + * Returns: - MPI_SUCCESS or error code (passed from + * the reduce scatter implementation) + */ +int ompi_coll_tuned_reduce_scatter_block_intra_dec_fixed(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module) +{ + OPAL_OUTPUT((ompi_coll_tuned_stream, "ompi_coll_tuned_reduce_scatter_block_intra_dec_fixed")); + return ompi_coll_base_reduce_scatter_block_basic_linear(sbuf, rbuf, rcount, + dtype, op, comm, module); +} + /* * allgather_intra_dec * diff --git a/ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c b/ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c index 6b85dac8508..2c2b4469635 100644 --- a/ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c +++ b/ompi/mca/coll/tuned/coll_tuned_dynamic_rules.c @@ -10,6 +10,8 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 FUJITSU LIMITED. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -289,7 +291,7 @@ ompi_coll_com_rule_t* ompi_coll_tuned_get_com_rule_ptr (ompi_coll_alg_rule_t* ru ompi_coll_alg_rule_t* alg_p = (ompi_coll_alg_rule_t*) NULL; ompi_coll_com_rule_t* com_p = (ompi_coll_com_rule_t*) NULL; ompi_coll_com_rule_t* best_com_p = (ompi_coll_com_rule_t*) NULL; - int i, best; + int i; if (!rules) { /* no rule base no resulting com rule */ return ((ompi_coll_com_rule_t*)NULL); @@ -305,13 +307,12 @@ ompi_coll_com_rule_t* ompi_coll_tuned_get_com_rule_ptr (ompi_coll_alg_rule_t* ru /* make a copy of the first com rule */ best_com_p = com_p = alg_p->com_rules; - i = best = 0; + i = 0; while( i < alg_p->n_com_sizes ) { if (com_p->mpi_comsize > mpi_comsize) { break; } - best = i; best_com_p = com_p; /* go to the next entry */ com_p++; @@ -344,7 +345,7 @@ int ompi_coll_tuned_get_target_method_params (ompi_coll_com_rule_t* base_com_rul { ompi_coll_msg_rule_t* msg_p = (ompi_coll_msg_rule_t*) NULL; ompi_coll_msg_rule_t* best_msg_p = (ompi_coll_msg_rule_t*) NULL; - int i, best; + int i; /* No rule or zero rules */ if( (NULL == base_com_rule) || (0 == base_com_rule->n_msg_sizes)) { @@ -355,13 +356,12 @@ int ompi_coll_tuned_get_target_method_params (ompi_coll_com_rule_t* base_com_rul /* make a copy of the first msg rule */ best_msg_p = msg_p = base_com_rule->msg_rules; - i = best = 0; + i = 0; while (in_msg_sizes) { /* OPAL_OUTPUT((ompi_coll_tuned_stream,"checking mpi_msgsize %d against com_id %d msg_id %d index %d msg_size %d", */ /* mpi_msgsize, msg_p->com_rule_id, msg_p->msg_rule_id, i, msg_p->msg_size)); */ if (msg_p->msg_size <= mpi_msgsize) { - best = i; best_msg_p = msg_p; /* OPAL_OUTPUT((ompi_coll_tuned_stream(":ok\n")); */ } diff --git a/ompi/mca/coll/tuned/coll_tuned_exscan_decision.c b/ompi/mca/coll/tuned/coll_tuned_exscan_decision.c new file mode 100644 index 00000000000..8b4c78869f5 --- /dev/null +++ b/ompi/mca/coll/tuned/coll_tuned_exscan_decision.c @@ -0,0 +1,104 @@ +/* -*- Mode: C; 
c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "mpi.h" +#include "ompi/constants.h" +#include "ompi/datatype/ompi_datatype.h" +#include "ompi/communicator/communicator.h" +#include "ompi/mca/coll/coll.h" +#include "ompi/mca/coll/base/coll_base_topo.h" +#include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/pml/pml.h" +#include "ompi/op/op.h" +#include "coll_tuned.h" + +/* exscan algorithm variables */ +static int coll_tuned_exscan_forced_algorithm = 0; + +/* valid values for coll_tuned_exscan_forced_algorithm */ +static mca_base_var_enum_value_t exscan_algorithms[] = { + {0, "ignore"}, + {1, "linear"}, + {2, "recursive_doubling"}, + {0, NULL} +}; + +/** + * The following are used by dynamic and forced rules + * + * publish details of each algorithm and whether it is forced/fixed/locked in + * as you add methods/algorithms you must update this and the query/map routines + * + * this routine is called by the component only + * this makes sure that the mca parameters are set to their initial values and + * the perms module does not call this; it calls the forced_getvalues routine + * instead.
+ */ + +int ompi_coll_tuned_exscan_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices) +{ + mca_base_var_enum_t*new_enum; + int cnt; + + for( cnt = 0; NULL != exscan_algorithms[cnt].string; cnt++ ); + ompi_coll_tuned_forced_max_algorithms[EXSCAN] = cnt; + + (void) mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "exscan_algorithm_count", + "Number of exscan algorithms available", + MCA_BASE_VAR_TYPE_INT, NULL, 0, + MCA_BASE_VAR_FLAG_DEFAULT_ONLY, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_CONSTANT, + &ompi_coll_tuned_forced_max_algorithms[EXSCAN]); + + /* MPI_T: This variable should eventually be bound to a communicator */ + coll_tuned_exscan_forced_algorithm = 0; + (void) mca_base_var_enum_create("coll_tuned_exscan_algorithms", exscan_algorithms, &new_enum); + mca_param_indices->algorithm_param_index = + mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "exscan_algorithm", + "Which exscan algorithm is used. 
Can be locked down to choice of: 0 ignore, 1 linear, 2 recursive_doubling", + MCA_BASE_VAR_TYPE_INT, new_enum, 0, MCA_BASE_VAR_FLAG_SETTABLE, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_ALL, + &coll_tuned_exscan_forced_algorithm); + OBJ_RELEASE(new_enum); + if (mca_param_indices->algorithm_param_index < 0) { + return mca_param_indices->algorithm_param_index; + } + + return (MPI_SUCCESS); +} + +int ompi_coll_tuned_exscan_intra_do_this(const void *sbuf, void* rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module, + int algorithm) +{ + OPAL_OUTPUT((ompi_coll_tuned_stream,"coll:tuned:exscan_intra_do_this selected algorithm %d", + algorithm)); + + switch (algorithm) { + case (0): + case (1): return ompi_coll_base_exscan_intra_linear(sbuf, rbuf, count, dtype, + op, comm, module); + case (2): return ompi_coll_base_exscan_intra_recursivedoubling(sbuf, rbuf, count, dtype, + op, comm, module); + } /* switch */ + OPAL_OUTPUT((ompi_coll_tuned_stream,"coll:tuned:exscan_intra_do_this attempt to select algorithm %d when only 0-%d is valid?", + algorithm, ompi_coll_tuned_forced_max_algorithms[EXSCAN])); + return (MPI_ERR_ARG); +} diff --git a/ompi/mca/coll/tuned/coll_tuned_module.c b/ompi/mca/coll/tuned/coll_tuned_module.c index 9f312844773..343cfbb222e 100644 --- a/ompi/mca/coll/tuned/coll_tuned_module.c +++ b/ompi/mca/coll/tuned/coll_tuned_module.c @@ -11,6 +11,8 @@ * All rights reserved. * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -106,6 +108,7 @@ ompi_coll_tuned_comm_query(struct ompi_communicator_t *comm, int *priority) tuned_module->super.coll_gatherv = NULL; tuned_module->super.coll_reduce = ompi_coll_tuned_reduce_intra_dec_fixed; tuned_module->super.coll_reduce_scatter = ompi_coll_tuned_reduce_scatter_intra_dec_fixed; + tuned_module->super.coll_reduce_scatter_block = ompi_coll_tuned_reduce_scatter_block_intra_dec_fixed; tuned_module->super.coll_scan = NULL; tuned_module->super.coll_scatter = ompi_coll_tuned_scatter_intra_dec_fixed; tuned_module->super.coll_scatterv = NULL; @@ -229,7 +232,7 @@ tuned_module_enable( mca_coll_base_module_t *module, COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, BCAST, tuned_module->super.coll_bcast = ompi_coll_tuned_bcast_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, EXSCAN, - tuned_module->super.coll_exscan = NULL); + tuned_module->super.coll_exscan = ompi_coll_tuned_exscan_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, GATHER, tuned_module->super.coll_gather = ompi_coll_tuned_gather_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, GATHERV, @@ -238,8 +241,10 @@ tuned_module_enable( mca_coll_base_module_t *module, tuned_module->super.coll_reduce = ompi_coll_tuned_reduce_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, REDUCESCATTER, tuned_module->super.coll_reduce_scatter = ompi_coll_tuned_reduce_scatter_intra_dec_dynamic); + COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, REDUCESCATTERBLOCK, + tuned_module->super.coll_reduce_scatter_block = ompi_coll_tuned_reduce_scatter_block_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, SCAN, - tuned_module->super.coll_scan = NULL); + tuned_module->super.coll_scan = ompi_coll_tuned_scan_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, SCATTER, tuned_module->super.coll_scatter = ompi_coll_tuned_scatter_intra_dec_dynamic); COLL_TUNED_EXECUTE_IF_DYNAMIC(tuned_module, SCATTERV, diff 
--git a/ompi/mca/coll/tuned/coll_tuned_reduce_decision.c b/ompi/mca/coll/tuned/coll_tuned_reduce_decision.c index eee424658e1..3aeeb1220c6 100644 --- a/ompi/mca/coll/tuned/coll_tuned_reduce_decision.c +++ b/ompi/mca/coll/tuned/coll_tuned_reduce_decision.c @@ -3,8 +3,8 @@ * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. + * Copyright (c) 2015-2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -41,6 +41,7 @@ static mca_base_var_enum_value_t reduce_algorithms[] = { {4, "binary"}, {5, "binomial"}, {6, "in-order_binary"}, + {7, "rabenseifner"}, {0, NULL} }; @@ -79,7 +80,7 @@ int ompi_coll_tuned_reduce_intra_check_forced_init (coll_tuned_force_algorithm_m mca_param_indices->algorithm_param_index = mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, "reduce_algorithm", - "Which reduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 chain, 3 pipeline, 4 binary, 5 binomial, 6 in-order binary", + "Which reduce algorithm is used. 
Can be locked down to choice of: 0 ignore, 1 linear, 2 chain, 3 pipeline, 4 binary, 5 binomial, 6 in-order binary, 7 rabenseifner", MCA_BASE_VAR_TYPE_INT, new_enum, 0, MCA_BASE_VAR_FLAG_SETTABLE, OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_ALL, @@ -173,6 +174,8 @@ int ompi_coll_tuned_reduce_intra_do_this(const void *sbuf, void* rbuf, int count case (6): return ompi_coll_base_reduce_intra_in_order_binary(sbuf, rbuf, count, dtype, op, root, comm, module, segsize, max_requests); + case (7): return ompi_coll_base_reduce_intra_redscat_gather(sbuf, rbuf, count, dtype, + op, root, comm, module); } /* switch */ OPAL_OUTPUT((ompi_coll_tuned_stream,"coll:tuned:reduce_intra_do_this attempt to select algorithm %d when only 0-%d is valid?", algorithm, ompi_coll_tuned_forced_max_algorithms[REDUCE])); diff --git a/ompi/mca/coll/tuned/coll_tuned_reduce_scatter_block_decision.c b/ompi/mca/coll/tuned/coll_tuned_reduce_scatter_block_decision.c new file mode 100644 index 00000000000..131787b0925 --- /dev/null +++ b/ompi/mca/coll/tuned/coll_tuned_reduce_scatter_block_decision.c @@ -0,0 +1,139 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2018 Siberian State University of Telecommunications + * and Information Sciences. All rights reserved. + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "mpi.h" +#include "opal/util/bit_ops.h" +#include "ompi/constants.h" +#include "ompi/datatype/ompi_datatype.h" +#include "ompi/communicator/communicator.h" +#include "ompi/mca/coll/coll.h" +#include "ompi/mca/coll/base/coll_base_topo.h" +#include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/pml/pml.h" +#include "ompi/op/op.h" +#include "coll_tuned.h" + +/* reduce_scatter_block algorithm variables */ +static int coll_tuned_reduce_scatter_block_forced_algorithm = 0; +static int coll_tuned_reduce_scatter_block_segment_size = 0; +static int coll_tuned_reduce_scatter_block_tree_fanout; + +/* valid values for coll_tuned_reduce_scatter_block_forced_algorithm */ +static mca_base_var_enum_value_t reduce_scatter_block_algorithms[] = { + {0, "ignore"}, + {1, "basic_linear"}, + {2, "recursive_doubling"}, + {3, "recursive_halving"}, + {4, "butterfly"}, + {0, NULL} +}; + +/** + * The following are used by dynamic and forced rules + * + * publish details of each algorithm and whether it is forced/fixed/locked in + * as you add methods/algorithms you must update this and the query/map routines + * + * this routine is called by the component only + * this makes sure that the mca parameters are set to their initial values and + * perms module does not call this they call the forced_getvalues routine + * instead + */ + +int ompi_coll_tuned_reduce_scatter_block_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices) +{ + mca_base_var_enum_t *new_enum; + int cnt; + + for( cnt = 0; NULL != reduce_scatter_block_algorithms[cnt].string; cnt++ ); + ompi_coll_tuned_forced_max_algorithms[REDUCESCATTERBLOCK] = cnt; + + (void) mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "reduce_scatter_block_algorithm_count", + "Number of reduce_scatter_block algorithms available", + MCA_BASE_VAR_TYPE_INT, NULL, 0, +
MCA_BASE_VAR_FLAG_DEFAULT_ONLY, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_CONSTANT, + &ompi_coll_tuned_forced_max_algorithms[REDUCESCATTERBLOCK]); + + /* MPI_T: This variable should eventually be bound to a communicator */ + coll_tuned_reduce_scatter_block_forced_algorithm = 0; + (void) mca_base_var_enum_create("coll_tuned_reduce_scatter_block_algorithms", reduce_scatter_block_algorithms, &new_enum); + mca_param_indices->algorithm_param_index = + mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "reduce_scatter_block_algorithm", + "Which reduce_scatter_block algorithm is used. " + "Can be locked down to choice of: 0 ignore, 1 basic_linear, 2 recursive_doubling, " + "3 recursive_halving, 4 butterfly", + MCA_BASE_VAR_TYPE_INT, new_enum, 0, MCA_BASE_VAR_FLAG_SETTABLE, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_ALL, + &coll_tuned_reduce_scatter_block_forced_algorithm); + OBJ_RELEASE(new_enum); + if (mca_param_indices->algorithm_param_index < 0) { + return mca_param_indices->algorithm_param_index; + } + + coll_tuned_reduce_scatter_block_segment_size = 0; + mca_param_indices->segsize_param_index = + mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "reduce_scatter_block_algorithm_segmentsize", + "Segment size in bytes used by default for reduce_scatter_block algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.", + MCA_BASE_VAR_TYPE_INT, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_ALL, + &coll_tuned_reduce_scatter_block_segment_size); + + coll_tuned_reduce_scatter_block_tree_fanout = ompi_coll_tuned_init_tree_fanout; /* get system wide default */ + mca_param_indices->tree_fanout_param_index = + mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "reduce_scatter_block_algorithm_tree_fanout", + "Fanout for n-tree used for reduce_scatter_block algorithms.
Only has meaning if algorithm is forced and supports n-tree topo based operation.", + MCA_BASE_VAR_TYPE_INT, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_ALL, + &coll_tuned_reduce_scatter_block_tree_fanout); + + return (MPI_SUCCESS); +} + +int ompi_coll_tuned_reduce_scatter_block_intra_do_this(const void *sbuf, void *rbuf, + int rcount, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module, + int algorithm, int faninout, int segsize) +{ + OPAL_OUTPUT((ompi_coll_tuned_stream, "coll:tuned:reduce_scatter_block_intra_do_this selected algorithm %d topo faninout %d segsize %d", + algorithm, faninout, segsize)); + + switch (algorithm) { + case (0): return ompi_coll_tuned_reduce_scatter_block_intra_dec_fixed(sbuf, rbuf, rcount, + dtype, op, comm, module); + case (1): return ompi_coll_base_reduce_scatter_block_basic_linear(sbuf, rbuf, rcount, + dtype, op, comm, module); + case (2): return ompi_coll_base_reduce_scatter_block_intra_recursivedoubling(sbuf, rbuf, rcount, + dtype, op, comm, module); + case (3): return ompi_coll_base_reduce_scatter_block_intra_recursivehalving(sbuf, rbuf, rcount, + dtype, op, comm, module); + case (4): return ompi_coll_base_reduce_scatter_block_intra_butterfly(sbuf, rbuf, rcount, dtype, op, comm, + module); + } /* switch */ + OPAL_OUTPUT((ompi_coll_tuned_stream, "coll:tuned:reduce_scatter_block_intra_do_this attempt to select algorithm %d when only 0-%d is valid?", + algorithm, ompi_coll_tuned_forced_max_algorithms[REDUCESCATTERBLOCK])); + return (MPI_ERR_ARG); +} diff --git a/ompi/mca/coll/tuned/coll_tuned_scan_decision.c b/ompi/mca/coll/tuned/coll_tuned_scan_decision.c new file mode 100644 index 00000000000..7bff86f0d5d --- /dev/null +++ b/ompi/mca/coll/tuned/coll_tuned_scan_decision.c @@ -0,0 +1,104 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2018 Research Organization for Information Science + * 
and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "mpi.h" +#include "ompi/constants.h" +#include "ompi/datatype/ompi_datatype.h" +#include "ompi/communicator/communicator.h" +#include "ompi/mca/coll/coll.h" +#include "ompi/mca/coll/base/coll_base_topo.h" +#include "ompi/mca/coll/base/coll_tags.h" +#include "ompi/mca/pml/pml.h" +#include "ompi/op/op.h" +#include "coll_tuned.h" + +/* scan algorithm variables */ +static int coll_tuned_scan_forced_algorithm = 0; + +/* valid values for coll_tuned_scan_forced_algorithm */ +static mca_base_var_enum_value_t scan_algorithms[] = { + {0, "ignore"}, + {1, "linear"}, + {2, "recursive_doubling"}, + {0, NULL} +}; + +/** + * The following are used by dynamic and forced rules + * + * publish details of each algorithm and whether it is forced/fixed/locked in + * as you add methods/algorithms you must update this and the query/map routines + * + * this routine is called by the component only + * this makes sure that the mca parameters are set to their initial values and + * perms module does not call this they call the forced_getvalues routine + * instead.
+ */ + +int ompi_coll_tuned_scan_intra_check_forced_init (coll_tuned_force_algorithm_mca_param_indices_t *mca_param_indices) +{ + mca_base_var_enum_t *new_enum; + int cnt; + + for( cnt = 0; NULL != scan_algorithms[cnt].string; cnt++ ); + ompi_coll_tuned_forced_max_algorithms[SCAN] = cnt; + + (void) mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "scan_algorithm_count", + "Number of scan algorithms available", + MCA_BASE_VAR_TYPE_INT, NULL, 0, + MCA_BASE_VAR_FLAG_DEFAULT_ONLY, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_CONSTANT, + &ompi_coll_tuned_forced_max_algorithms[SCAN]); + + /* MPI_T: This variable should eventually be bound to a communicator */ + coll_tuned_scan_forced_algorithm = 0; + (void) mca_base_var_enum_create("coll_tuned_scan_algorithms", scan_algorithms, &new_enum); + mca_param_indices->algorithm_param_index = + mca_base_component_var_register(&mca_coll_tuned_component.super.collm_version, + "scan_algorithm", + "Which scan algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 recursive_doubling", + MCA_BASE_VAR_TYPE_INT, new_enum, 0, MCA_BASE_VAR_FLAG_SETTABLE, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_ALL, + &coll_tuned_scan_forced_algorithm); + OBJ_RELEASE(new_enum); + if (mca_param_indices->algorithm_param_index < 0) { + return mca_param_indices->algorithm_param_index; + } + + return (MPI_SUCCESS); +} + +int ompi_coll_tuned_scan_intra_do_this(const void *sbuf, void* rbuf, int count, + struct ompi_datatype_t *dtype, + struct ompi_op_t *op, + struct ompi_communicator_t *comm, + mca_coll_base_module_t *module, + int algorithm) +{ + OPAL_OUTPUT((ompi_coll_tuned_stream,"coll:tuned:scan_intra_do_this selected algorithm %d", + algorithm)); + + switch (algorithm) { + case (0): + case (1): return ompi_coll_base_scan_intra_linear(sbuf, rbuf, count, dtype, + op, comm, module); + case (2): return ompi_coll_base_scan_intra_recursivedoubling(sbuf, rbuf, count, dtype, + op, comm, module); + } /* switch */ +
OPAL_OUTPUT((ompi_coll_tuned_stream,"coll:tuned:scan_intra_do_this attempt to select algorithm %d when only 0-%d is valid?", + algorithm, ompi_coll_tuned_forced_max_algorithms[SCAN])); + return (MPI_ERR_ARG); +} diff --git a/ompi/mca/common/monitoring/HowTo_pml_monitoring.tex b/ompi/mca/common/monitoring/HowTo_pml_monitoring.tex new file mode 100644 index 00000000000..752ed464520 --- /dev/null +++ b/ompi/mca/common/monitoring/HowTo_pml_monitoring.tex @@ -0,0 +1,1298 @@ +% Copyright (c) 2016-2017 Inria. All rights reserved. +% $COPYRIGHT$ +% +% Additional copyrights may follow +% +% $HEADER$ + +\documentclass[notitlepage]{article} + +\usepackage[english]{babel} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage[a4paper]{geometry} +\usepackage{verbatim} +\usepackage{dirtree} + +\title{How to use the Open~MPI monitoring component} + +\author{C. FOYER - INRIA} + +\newcommand{\mpit}[1]{\textit{MPI\_Tool#1}} +\newcommand{\ompi}[0]{Open~MPI} +\newcommand{\brkunds}[0]{\allowbreak\_} + +\begin{document} + +\maketitle + +\section{Introduction} + +\mpit{} is a concept introduced in the MPI-3 standard. It allows MPI +developers, or third parties, to offer a portable interface to different +tools. These tools may be used to monitor an application, measure its +performance, or profile it. \mpit{} is an interface that eases the +addition of external functions to an MPI library. It also allows the +user to control and monitor given internal variables of the runtime +system. + +This document introduces the use of the \mpit{} +interface from a user point of view, and aims to facilitate the usage of +the \ompi{} monitoring component. This component precisely +records the message exchanges between nodes during the execution of MPI +applications. The number of messages and the amount of data +exchanged are recorded, including or excluding internal communications +(such as those generated by the implementation of the collective +algorithms).
+ +This component offers two types of monitoring, depending on whether the user wants +fine control over the monitoring or just an overall view of the +messages. Moreover, the fine control allows the user to access the +results from within the application, and to reset the variables when +needed. The fine control is achieved via the \mpit{} interface, which +requires the code to be adapted by adding a specific initialization +function. However, the basic overall monitoring is achieved without +any modification of the application code. + +Whichever version you use, monitoring needs to +be enabled with parameters added when calling \texttt{mpiexec}, or +globally in your \ompi{} MCA configuration file +(\${HOME}/.openmpi/mca-params.conf). Three new parameters have been +introduced: +\begin{description} +\item [\texttt{-{}-mca pml\brkunds{}monitoring\brkunds{}enable value}] + This parameter sets the monitoring mode. \texttt{value} may be: + \begin{description} + \item [0] monitoring is disabled + \item [1] monitoring is enabled, with no distinction between user + issued and library issued messages. + \item [$\ge$ 2] monitoring is enabled, with a distinction between + messages issued from the library ({\bf internal}) and messages + issued from the user ({\bf external}). + \end{description} +\item [\texttt{-{}-mca + pml\brkunds{}monitoring\brkunds{}enable\brkunds{}output value}] + This parameter enables the automatic flushing of monitored values + during the call to \texttt{MPI\brkunds{}Finalize}. {\bf This option + is to be used only without \mpit{}, or with \texttt{value} = + 0}.
\texttt{value} may be: + \begin{description} + \item [0] final output flushing is disabled + \item [1] final output flushing is done in the standard output + stream (\texttt{stdout}) + \item [2] final output flushing is done in the error output stream + (\texttt{stderr}) + \item [$\ge$ 3] final output flushing is done in the file whose name + is given by the + \texttt{pml\brkunds{}monitoring\brkunds{}filename} parameter. + \end{description} + Each MPI process flushes its recorded data. The pieces of + information can be aggregated either with the use of PMPI (see + Section~\ref{subsec:ldpreload}) or with the distributed script {\it + test/monitoring/profile2mat.pl}. +\item [\texttt{-{}-mca pml\brkunds{}monitoring\brkunds{}filename + filename}] Sets the file to which the resulting monitoring output is + flushed. The output is a communication matrix of both the number + of messages and the total size of exchanged data between each pair + of nodes. This parameter is needed if + \texttt{pml\brkunds{}monitoring\brkunds{}enable\brkunds{}output} + $\ge$ 3. +\end{description} + + +Also, in order to run an application with some of the monitoring components disabled, +you need to add the following parameters when calling \texttt{mpiexec}: +\begin{description} +\item [\texttt{-{}-mca pml \^{}monitoring}] This parameter disables the + monitoring component of the PML framework +\item [\texttt{-{}-mca osc \^{}monitoring}] This parameter disables the + monitoring component of the OSC framework +\item [\texttt{-{}-mca coll \^{}monitoring}] This parameter disables + the monitoring component of the COLL framework +\end{description} + +\section{Without \mpit{}} + +This mode should be used to monitor the whole application from its +start until its end. It is designed so that you can record the amount +of communication without any code modification. + +In order to do so, you need \ompi{} compiled with monitoring +enabled.
When you launch your application, you need to set the +parameter \texttt{pml\brkunds{}monitoring\brkunds{}enable} to a value +$> 0$, and, if +\texttt{pml\brkunds{}monitoring\brkunds{}enable\brkunds{}output} $\ge$ +3, to set the \texttt{pml\brkunds{}monitoring\brkunds{}filename} +parameter to a proper filename, whose path must exist. + +\section{With \mpit{}} + +This section explains how to monitor your applications with the use +of \mpit{}. + +\subsection{How it works} + +\mpit{} is a layer that is added to the standard MPI +implementation. As such, it must be noted first that it may have an +impact on performance. + +As this functionality is orthogonal to the core one, \mpit{} +initialization and finalization are independent from MPI's. There +is no restriction regarding the order of the different calls. Also, +the \mpit{} interface initialization function can be called more than +once within the execution, as long as the finalize function is called +as many times. + +\mpit{} introduces two types of variables, \textit{control variables} +and \textit{performance variables}. These variables will be referred +to respectively as \textit{cvar} and \textit{pvar}. The variables can +be used to dynamically tune the application to best fit its +needs. They are defined by the library (or by the external +component), and accessed with the accessor functions specified +in the standard. Each variable has a name that is unique within the +application. Every variable, once defined and registered within the +MPI engine, is given an index that will not change during the entire +execution. + +As for the monitoring without \mpit{}, you need to start your +application with the control variable +\textit{pml\brkunds{}monitoring\brkunds{}enable} properly set. Even +though it is not required, you can also add to your command line the +desired filename to which the monitoring output is flushed. As long as no +filename is provided, no output can be generated.
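The documented rules for the three monitoring parameters (pml_monitoring_enable, pml_monitoring_enable_output and pml_monitoring_filename) can be condensed into a small decision function. The C sketch below only illustrates those rules as stated in this document. The functions monitoring_mode, monitoring_output and monitoring_self_check, and the enum names, are hypothetical and not part of Open MPI. Treating negative values as "disabled", and producing no output when monitoring is disabled, are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented parameter values; not Open MPI code. */
typedef enum { MON_DISABLED, MON_NO_DISTINCTION, MON_INTERNAL_EXTERNAL } mon_mode_t;
typedef enum { OUT_NONE, OUT_STDOUT, OUT_STDERR, OUT_FILE, OUT_MISSING_FILENAME } mon_output_t;

/* pml_monitoring_enable: 0 disables monitoring, 1 records messages with no
 * distinction, >= 2 separates internal (library) from external (user)
 * messages. Treating negative values as "disabled" is an assumption. */
mon_mode_t monitoring_mode(int enable)
{
    if (enable <= 0)
        return MON_DISABLED;
    return (enable == 1) ? MON_NO_DISTINCTION : MON_INTERNAL_EXTERNAL;
}

/* pml_monitoring_enable_output: 0 none, 1 stdout, 2 stderr, >= 3 the file
 * named by pml_monitoring_filename, which must then be provided.
 * Assumption: nothing is flushed when monitoring itself is disabled. */
mon_output_t monitoring_output(int enable, int enable_output, const char *filename)
{
    if (monitoring_mode(enable) == MON_DISABLED || enable_output <= 0)
        return OUT_NONE;
    if (enable_output == 1)
        return OUT_STDOUT;
    if (enable_output == 2)
        return OUT_STDERR;
    if (filename == NULL || filename[0] == '\0')
        return OUT_MISSING_FILENAME;
    return OUT_FILE;
}

/* Encodes the cases described in the text as assertions. */
void monitoring_self_check(void)
{
    assert(monitoring_mode(0) == MON_DISABLED);
    assert(monitoring_mode(1) == MON_NO_DISTINCTION);
    assert(monitoring_mode(2) == MON_INTERNAL_EXTERNAL);
    assert(monitoring_output(2, 1, NULL) == OUT_STDOUT);
    assert(monitoring_output(2, 3, NULL) == OUT_MISSING_FILENAME);
    assert(monitoring_output(1, 3, "prof/monitoring") == OUT_FILE);
}
```

For instance, monitoring_output(2, 3, NULL) reports a missing filename. This matches the requirement that pml_monitoring_filename be set whenever pml_monitoring_enable_output is 3 or more. The filename "prof/monitoring" is only a placeholder.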
+ +\subsection{Initialization} + +The initialization is made by a call to +\texttt{MPI\brkunds{}T\brkunds{}init\brkunds{}thread}. This function +takes two parameters. The first one is the desired level of thread +support, the second one is the provided level of thread support. It +has the same semantics as the +\texttt{MPI\brkunds{}Init\brkunds{}thread} function. Please note that +whichever of these two functions is called first +(\texttt{MPI\brkunds{}T\brkunds{}init\brkunds{}thread} or +\texttt{MPI\brkunds{}Init\brkunds{}thread}) may influence the level of +thread support provided by the second one. The purpose of +\texttt{MPI\brkunds{}T\brkunds{}init\brkunds{}thread} is to +initialize control and performance variables. + +However, in order to use the performance variables within one context +without influencing another context, a variable has to +be bound to a session. To create a session, you have to call +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}session\brkunds{}create}. + +In addition to being bound to a session, a performance variable may +also depend on an MPI object. For example, the +\textit{pml\brkunds{}monitoring\brkunds{}flush} variable needs to be +bound to a communicator. In order to do so, you need to use the +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}handle\brkunds{}alloc} +function, which takes as parameters the session, the index of the +variable, the MPI object +(e.g. \texttt{MPI\brkunds{}COMM\brkunds{}WORLD} in the case of +\textit{pml\brkunds{}monitoring\brkunds{}flush}), a reference to the +performance variable handle and a reference to an integer value. The +last parameter allows the user to receive some additional information +about the variable or the bound MPI object.
As an example, when +binding to the \textit{pml\brkunds{}monitoring\brkunds{}flush} +performance variable, the last parameter is set to the length of the +current filename used for the flush, if any, and 0 otherwise; when +binding to the +\textit{pml\brkunds{}monitoring\brkunds{}messages\brkunds{}count} +performance variable, the parameter is set to the size of the +bound communicator, as it corresponds to the expected size of the +array (in number of elements) when retrieving the data. This parameter +lets the application determine the amount of data to be +returned when reading the performance variables. Please note that the +\textit{handle\brkunds{}alloc} function takes the variable index as a +parameter. In order to retrieve this value, you have to call +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}get\brkunds{}index}, +which takes as IN parameters a string that contains the name of the +desired variable and the variable class. + +\subsection{How to use the performance variables} + +The monitoring component defines the following performance variables: +\begin{description} +\item [\textit{pml\brkunds{}monitoring\brkunds{}flush}] Allows the user + to define a file to which the recorded data is flushed. +\item + [\textit{pml\brkunds{}monitoring\brkunds{}messages\brkunds{}count}] + Allows the user to access, from within the application, the number of + messages exchanged through the PML framework with each node of the + bound communicator (\textit{MPI\brkunds{}Comm}). This variable + returns an array of size-typed integers, one per node. +\item + [\textit{pml\brkunds{}monitoring\brkunds{}messages\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + exchanged through the PML framework with each node of the bound + communicator (\textit{MPI\brkunds{}Comm}). This variable returns an + array of size-typed integers, one per node.
+\item + [\textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}sent\brkunds{}count}] + Allows the user to access, from within the application, the number of + messages sent through the OSC framework to each node of the + bound communicator (\textit{MPI\brkunds{}Comm}). This variable + returns an array of size-typed integers, one per node. +\item + [\textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}sent\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + sent through the OSC framework to each node of the bound + communicator (\textit{MPI\brkunds{}Comm}). This variable returns an + array of size-typed integers, one per node. +\item + [\textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}recv\brkunds{}count}] + Allows the user to access, from within the application, the number of + messages received through the OSC framework from each node of the + bound communicator (\textit{MPI\brkunds{}Comm}). This variable + returns an array of size-typed integers, one per node. +\item + [\textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}recv\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + received through the OSC framework from each node of the bound + communicator (\textit{MPI\brkunds{}Comm}). This variable returns an + array of size-typed integers, one per node. +\item + [\textit{coll\brkunds{}monitoring\brkunds{}messages\brkunds{}count}] + Allows the user to access, from within the application, the number of + messages exchanged through the COLL framework with each node of + the bound communicator (\textit{MPI\brkunds{}Comm}). This variable + returns an array of size-typed integers, one per node. +\item + [\textit{coll\brkunds{}monitoring\brkunds{}messages\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + exchanged through the COLL framework with each node of the bound + communicator (\textit{MPI\brkunds{}Comm}). This variable returns an + array of size-typed integers, one per node. 
+\item [\textit{coll\brkunds{}monitoring\brkunds{}o2a\brkunds{}count}] + Allows the user to access, from within the application, the number of + one-to-all collective operations across the bound communicator + (\textit{MPI\brkunds{}Comm}) where the process was defined as + root. This variable returns a single size-typed integer. +\item [\textit{coll\brkunds{}monitoring\brkunds{}o2a\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + sent as one-to-all collective operations across the bound + communicator (\textit{MPI\brkunds{}Comm}). This variable returns a + single size-typed integer. Communications between a process + and itself are not taken into account. +\item [\textit{coll\brkunds{}monitoring\brkunds{}a2o\brkunds{}count}] + Allows the user to access, from within the application, the number of + all-to-one collective operations across the bound communicator + (\textit{MPI\brkunds{}Comm}) where the process was defined as + root. This variable returns a single size-typed integer. +\item [\textit{coll\brkunds{}monitoring\brkunds{}a2o\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + received from all-to-one collective operations across the bound + communicator (\textit{MPI\brkunds{}Comm}). This variable returns a + single size-typed integer. Communications between a process + and itself are not taken into account. +\item [\textit{coll\brkunds{}monitoring\brkunds{}a2a\brkunds{}count}] + Allows the user to access, from within the application, the number of + all-to-all collective operations across the bound communicator + (\textit{MPI\brkunds{}Comm}). This variable returns a single + size-typed integer. +\item [\textit{coll\brkunds{}monitoring\brkunds{}a2a\brkunds{}size}] + Allows the user to access, from within the application, the amount of data + sent as all-to-all collective operations across the bound + communicator (\textit{MPI\brkunds{}Comm}). 
This variable returns a + single size-typed integer. Communications between a process + and itself are not taken into account. +\end{description} + +If you are unsure how a collective is categorized, please refer to the list given in Table~\ref{tab:coll-cat}. + +\begin{table} + \begin{center} + \begin{tabular}{|l|l|l|} + \hline + One-To-All & All-To-One & All-To-All \\ + \hline + MPI\_Bcast & MPI\_Gather & MPI\_Allgather \\ + MPI\_Ibcast & MPI\_Gatherv & MPI\_Allgatherv \\ + MPI\_Iscatter & MPI\_Igather & MPI\_Allreduce \\ + MPI\_Iscatterv & MPI\_Igatherv & MPI\_Alltoall \\ + MPI\_Scatter & MPI\_Ireduce & MPI\_Alltoallv \\ + MPI\_Scatterv & MPI\_Reduce & MPI\_Alltoallw \\ + && MPI\_Barrier \\ + && MPI\_Exscan \\ + && MPI\_Iallgather \\ + && MPI\_Iallgatherv \\ + && MPI\_Iallreduce \\ + && MPI\_Ialltoall \\ + && MPI\_Ialltoallv \\ + && MPI\_Ialltoallw \\ + && MPI\_Ibarrier \\ + && MPI\_Iexscan \\ + && MPI\_Ineighbor\_allgather \\ + && MPI\_Ineighbor\_allgatherv \\ + && MPI\_Ineighbor\_alltoall \\ + && MPI\_Ineighbor\_alltoallv \\ + && MPI\_Ineighbor\_alltoallw \\ + && MPI\_Ireduce\_scatter \\ + && MPI\_Ireduce\_scatter\_block \\ + && MPI\_Iscan \\ + && MPI\_Neighbor\_allgather \\ + && MPI\_Neighbor\_allgatherv \\ + && MPI\_Neighbor\_alltoall \\ + && MPI\_Neighbor\_alltoallv \\ + && MPI\_Neighbor\_alltoallw \\ + && MPI\_Reduce\_scatter \\ + && MPI\_Reduce\_scatter\_block \\ + && MPI\_Scan \\ + \hline + \end{tabular} +\end{center} + \caption{Collective Operations Categorization} + \label{tab:coll-cat} +\end{table} + +Once bound to a session and to the proper MPI object, these variables +may be accessed through a set of given functions. It must be noted +here that each of the functions applied to the different variables +needs, in fact, to be called with the handle of the variable. + +The first variable, \textit{pml\brkunds{}monitoring\brkunds{}flush}, may be modified by using the +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}write} function.
The +other variables may be read using +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}read} but cannot be +written. Stopping the \textit{flush} performance variable, with a call +to \texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}stop}, forces the +counters to be flushed into the given file, resetting the counters to 0 +at the same time. Also, binding a new handle to the \textit{flush} +variable will reset the counters. Finally, please note that the size +and count performance variables may overflow after very large +amounts of communication. + +Monitoring starts with the call to +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}start} and lasts until +you call the \texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}stop} +function. + +Once you are done with monitoring, you can clean up +everything by calling +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}handle\brkunds{}free} to +free the allocated handles, +\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}session\brkunds{}free} +to free the session, and \texttt{MPI\brkunds{}T\brkunds{}finalize} to +signal the end of your use of performance and control variables. + +\subsection{Overview of the calls} + +To summarize the previous information, here is the list of available +performance variables, and the outline of the different calls to be +used to properly access monitored data through the \mpit{} interface.
+\begin{itemize} +\item \textit{pml\brkunds{}monitoring\brkunds{}flush} +\item + \textit{pml\brkunds{}monitoring\brkunds{}messages\brkunds{}count} +\item \textit{pml\brkunds{}monitoring\brkunds{}messages\brkunds{}size} +\item + \textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}sent\brkunds{}count} +\item + \textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}sent\brkunds{}size} +\item + \textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}recv\brkunds{}count} +\item + \textit{osc\brkunds{}monitoring\brkunds{}messages\brkunds{}recv\brkunds{}size} +\item + \textit{coll\brkunds{}monitoring\brkunds{}messages\brkunds{}count} +\item + \textit{coll\brkunds{}monitoring\brkunds{}messages\brkunds{}size} +\item \textit{coll\brkunds{}monitoring\brkunds{}o2a\brkunds{}count} +\item \textit{coll\brkunds{}monitoring\brkunds{}o2a\brkunds{}size} +\item \textit{coll\brkunds{}monitoring\brkunds{}a2o\brkunds{}count} +\item \textit{coll\brkunds{}monitoring\brkunds{}a2o\brkunds{}size} +\item \textit{coll\brkunds{}monitoring\brkunds{}a2a\brkunds{}count} +\item \textit{coll\brkunds{}monitoring\brkunds{}a2a\brkunds{}size} +\end{itemize} +Add to your command line at least \texttt{-{}-mca + pml\brkunds{}monitoring\brkunds{}enable [1,2]} \\ Sequence of +\mpit{} calls: +\begin{enumerate} +\item {\texttt{MPI\brkunds{}T\brkunds{}init\brkunds{}thread}} + Initialize the MPI\brkunds{}Tools interface +\item + {\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}get\brkunds{}index}} + To retrieve the variable index +\item {\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}session\brkunds{}create}} To + create a new context in which you use your variable +\item {\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}handle\brkunds{}alloc}} To bind + your variable to the proper session and MPI object +\item {\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}start}} To start + the monitoring +\item Now perform all the communications you want to monitor +\item {\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}stop}} To stop + and flush the monitoring +\item + 
{\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}handle\brkunds{}free}} +\item + {\texttt{MPI\brkunds{}T\brkunds{}pvar\brkunds{}session\brkunds{}free}} +\item {\texttt{MPI\brkunds{}T\brkunds{}finalize}} +\end{enumerate} + +\subsection{Use of \textsc{LD\brkunds{}PRELOAD}} +\label{subsec:ldpreload} + +In order to automatically generate communication matrices, you can use +the {\it monitoring\brkunds{}prof} tool that can be found in +\textit{test/monitoring/monitoring\brkunds{}prof.c}. While launching +your application, you can add the following option in addition to the +\texttt{-{}-mca pml\brkunds{}monitoring\brkunds{}enable} parameter: +\begin{description} +\item [\texttt{-x + LD\_PRELOAD=ompi\_install\_dir/lib/monitoring\_prof.so}] +\end{description} + +This library automatically gathers sent and received data into one +communication matrix. Note, however, that using the monitoring \mpit{} variables within +the code may interfere with this library. The main goal of this +library is to avoid dumping one file per MPI process, and to gather +everything into one file aggregating all pieces of information. + +The resulting communication matrices are as close as possible to the +effective amount of data exchanged between nodes. However, it has to be +kept in mind that because of the stacking of logical layers in +\ompi{}, the amount of data recorded as part of collectives or +one-sided operations may be duplicated when the PML layer handles the +communication. For an exact measure of communications, the application +must use \mpit{}'s monitoring performance variables to potentially +subtract double-recorded data. + +\subsection{Examples} + +The first example shows how to use \mpit{} in order +to define phases during which the monitoring component is active. A +second snippet shows how to access monitoring performance +variables with \mpit{}.
+ +\subsubsection{Monitoring Phases} + +You can execute the following example with +\\ \verb|mpiexec -n 4 --mca pml_monitoring_enable 2 test_monitoring|. Please +note that the prof directory must already exist in order to retrieve +the dumped files. After the code extract, you will find a +sample dumped file and the corresponding explanations. + +\paragraph{test\_monitoring.c} (extract) + +\begin{verbatim} +#include <stdio.h> +#include <stdlib.h> +#include <mpi.h> + +static const void* nullbuff = NULL; +static MPI_T_pvar_handle flush_handle; +static const char flush_pvar_name[] = "pml_monitoring_flush"; +static const char flush_cvar_name[] = "pml_monitoring_enable"; +static int flush_pvar_idx; + +int main(int argc, char* argv[]) +{ + int rank, size, n, to, from, tagno, MPIT_result, provided, count; + MPI_T_pvar_session session; + MPI_Status status; + MPI_Comm newcomm; + MPI_Request request; + char filename[1024]; + + /* Initialization of parameters */ + + n = -1; + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Comm_size(MPI_COMM_WORLD, &size); + to = (rank + 1) % size; + from = (rank + size - 1) % size; + tagno = 201; + + /* Initialization of performance variables */ + + MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); + if (MPIT_result != MPI_SUCCESS) + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + + MPIT_result = MPI_T_pvar_get_index(flush_pvar_name, + MPI_T_PVAR_CLASS_GENERIC, + &flush_pvar_idx); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot find monitoring MPI_T \"%s\" pvar, " + "check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_session_create(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot create a session for \"%s\" pvar\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Allocating a new PVAR in a session will reset the counters */ + + MPIT_result = MPI_T_pvar_handle_alloc(session, flush_pvar_idx,
MPI_COMM_WORLD, + &flush_handle, + &count); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to allocate handle on \"%s\" pvar, " + "check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* First phase: make a token circulate in MPI_COMM_WORLD */ + + MPIT_result = MPI_T_pvar_start(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, " + "check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + if (rank == 0) { + n = 25; + MPI_Isend(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD,&request); + } + while (1) { + MPI_Irecv(&n, 1, MPI_INT, from, tagno, MPI_COMM_WORLD, &request); + MPI_Wait(&request, &status); + if (rank == 0) {n--;tagno++;} + MPI_Isend(&n, 1, MPI_INT, to, tagno, MPI_COMM_WORLD, &request); + if (rank != 0) {n--;tagno++;} + if (n<0){ + break; + } + } + + /* + * Build one file per process. + * Everything that has been monitored by each + * process since the last flush will be output in filename. + * + * Requires directory prof to be created. + * Filename format should display the phase number + * and the process rank for ease of parsing with + * aggregate_profile.pl script + */ + + snprintf(filename, sizeof(filename), "prof/phase_1"); + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, + filename) ) + { + fprintf(stderr, + "Process %d cannot save monitoring in %s.%d.prof\n", + rank, filename, rank); + } + + /* Force the writing of the monitoring data */ + + MPIT_result = MPI_T_pvar_stop(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, " + "check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* + * Don't set a filename. If we stop the session before setting + * it, then no output will be generated.
+ */ + + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, + &nullbuff) ) + { + fprintf(stderr, + "Process %d cannot reset the monitoring output filename\n", + rank); + } + + /* Release the MPI_T objects before finalizing the tools interface */ + (void)MPI_T_pvar_handle_free(session, &flush_handle); + (void)MPI_T_pvar_session_free(&session); + + (void)MPI_T_finalize(); + + MPI_Finalize(); + + return EXIT_SUCCESS; +} +\end{verbatim} + +\paragraph{prof/phase\_1.0.prof} + +\begin{verbatim} +# POINT TO POINT +E 0 1 108 bytes 27 msgs sent 0,0,0,27,0,[...],0 +# OSC +# COLLECTIVES +D MPI_COMM_WORLD procs: 0,1,2,3 +O2A 0 0 bytes 0 msgs sent +A2O 0 0 bytes 0 msgs sent +A2A 0 0 bytes 0 msgs sent +\end{verbatim} + +As shown in the sample profile, for each kind of communication +(point-to-point, one-sided and collective), you find all the related +information. There is one line per pair of communicating peers. Each line +starts with a letter describing the kind of communication, as +follows: + +\begin{description} +\item [{\tt E}] External messages, i.e. issued by the user +\item [{\tt I}] Internal messages, i.e. issued by the library +\item [{\tt S}] Sent one-sided messages, i.e. writing access to the remote memory +\item [{\tt R}] Received one-sided messages, i.e. reading access to the remote memory +\item [{\tt C}] Collective messages +\end{description} + +This letter is followed by the rank of the issuing process and the +rank of the receiving one, then by the total amount of bytes +exchanged and the number of messages. For point-to-point entries +(i.e. {\tt E} or {\tt I} entries), the line is completed by the full +distribution of message sizes in the form of a histogram. See variable {\tt + size\brkunds{}histogram} in +Section~\ref{subsubsec:TDI-common-monitoring} for the meaning of each +bin. When filtering between external and +internal messages is disabled, the {\tt I} lines are merged with the {\tt E} +lines, keeping the {\tt E} header.
+ +The end of the summary contains per-communicator information: +the name of the communicator, the ranks of the processes included +in this communicator, and, for each kind of collective, the amount of data +sent (or received) with the corresponding number of operations. + +\subsubsection{Accessing Monitoring Performance Variables} + +The following snippet shows how to access the performance +variables defined as part of the \mpit{} interface. The session +allocation is not shown, as it is the same as in the previous +example. Note that the monitoring performance variables are of class {\tt + MPI\brkunds{}T\brkunds{}PVAR\brkunds{}CLASS\brkunds{}SIZE}, whereas +the {\it + pml\brkunds{}monitoring\brkunds{}flush} variable is of class {\tt GENERIC}. Also, these performance +variables are read-only. + +\paragraph{test/monitoring/example\_reduce\_count.c} (extract) + +\begin{verbatim} +MPI_T_pvar_handle count_handle; +int count_pvar_idx; +const char count_pvar_name[] = "pml_monitoring_messages_count"; +size_t *counts; + +/* Retrieve the proper pvar index */ +MPIT_result = MPI_T_pvar_get_index(count_pvar_name, MPI_T_PVAR_CLASS_SIZE, &count_pvar_idx); +if (MPIT_result != MPI_SUCCESS) { + printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); +} + +/* Allocating a new PVAR in a session will reset the counters */ +MPIT_result = MPI_T_pvar_handle_alloc(session, count_pvar_idx, + MPI_COMM_WORLD, &count_handle, &count); +if (MPIT_result != MPI_SUCCESS) { + printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); +} + +counts = (size_t*)malloc(count * sizeof(size_t)); + +MPIT_result = MPI_T_pvar_start(session, count_handle); +if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); +} +
+/* Token Ring communications */ +if (rank == 0) { + n = 25; + MPI_Isend(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD,&request); +} +while (1) { + MPI_Irecv(&n, 1, MPI_INT, from, tagno, MPI_COMM_WORLD, &request); + MPI_Wait(&request, &status); + if (rank == 0) {n--;tagno++;} + MPI_Isend(&n, 1, MPI_INT, to, tagno, MPI_COMM_WORLD, &request); + if (rank != 0) {n--;tagno++;} + if (n<0){ + break; + } +} + +MPIT_result = MPI_T_pvar_read(session, count_handle, counts); +if (MPIT_result != MPI_SUCCESS) { + printf("failed to read handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); +} + +/* Global reduce so everyone knows the maximum messages sent to each rank. + * Assumes size_t has the same size as unsigned long (true on LP64 platforms). */ +MPI_Allreduce(MPI_IN_PLACE, counts, count, MPI_UNSIGNED_LONG, MPI_MAX, MPI_COMM_WORLD); + +/* OPERATIONS ON COUNTS */ +... + +free(counts); + +MPIT_result = MPI_T_pvar_stop(session, count_handle); +if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); +} + +MPIT_result = MPI_T_pvar_handle_free(session, &count_handle); +if (MPIT_result != MPI_SUCCESS) { + printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); +} +\end{verbatim} + +\section{Technical Documentation of the Implementation} +\label{sec:TDI} + +This section describes the technical details of the components' +implementation. It is not needed from a user's point of view, but is intended +to help future developers who debug or extend +the monitoring components. + +The architecture of this component is as follows. The Common component +is the core part, where the monitored data is managed.
PML, OSC and COLL components +are the entry points to the monitoring tool from the software stack +point of view. The relevant files are listed in +the partial directory tree shown in Figure~\ref{fig:tree}. + +\begin{figure} + \dirtree{% + .1 ompi/mca/. + .2 common. + .3 monitoring. + .4 common\_monitoring.h. + .4 common\_monitoring.c. + .4 common\_monitoring\_coll.h. + .4 common\_monitoring\_coll.c. + .4 HowTo\_pml\_monitoring.tex. + .4 Makefile.am. + .2 pml. + .3 monitoring. + .4 pml\_monitoring.h. + .4 pml\_monitoring\_component.c. + .4 pml\_monitoring\_comm.c. + .4 pml\_monitoring\_irecv.c. + .4 pml\_monitoring\_isend.c. + .4 pml\_monitoring\_start.c. + .4 pml\_monitoring\_iprobe.c. + .4 Makefile.am. + .2 osc. + .3 monitoring. + .4 osc\_monitoring.h. + .4 osc\_monitoring\_component.c. + .4 osc\_monitoring\_comm.h. + .4 osc\_monitoring\_module.h. + .4 osc\_monitoring\_dynamic.h. + .4 osc\_monitoring\_template.h. + .4 osc\_monitoring\_accumulate.h. + .4 osc\_monitoring\_active\_target.h. + .4 osc\_monitoring\_passive\_target.h. + .4 configure.m4. + .4 Makefile.am. + .2 coll. + .3 monitoring. + .4 coll\_monitoring.h. + .4 coll\_monitoring\_component.c. + .4 coll\_monitoring\_bcast.c. + .4 coll\_monitoring\_reduce.c. + .4 coll\_monitoring\_barrier.c. + .4 coll\_monitoring\_alltoall.c. + .4 {...} . + .4 Makefile.am. + } +\caption{Monitoring component files architecture (partial)} +\label{fig:tree} +\end{figure} + +\subsection{Common} +\label{subsec:TDI-common} +This part of the monitoring components is the place where data is +managed. It centralizes all recorded information and the translation +hash table, and ensures that the monitoring +structures are initialized only once. This component also defines the MCA variables (to +be set on the command line) and handles the final +output, if any was requested.
+ +The header file defines the unique monitoring version number, +different preprocessor macros for printing information using the +monitoring output stream object, and the ompi monitoring API (i.e. the +API to be used INSIDE the ompi software stack, not the one +exposed to the end user). Note that the {\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}record\brkunds{}*} +functions must be called with the destination rank translated into the +corresponding rank in {\tt MPI\brkunds{}COMM\brkunds{}WORLD}. This +translation is done with {\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}get\brkunds{}world\brkunds{}rank}. The +use of this function may be limited by how the initialization occurred +(see Section~\ref{subsec:TDI-pml}). + +\subsubsection{Common monitoring} +\label{subsubsec:TDI-common-monitoring} + +The common\brkunds{}monitoring.c file defines multiple variables +with the following uses: +\begin{description} +\item[{\tt mca\brkunds{}common\brkunds{}monitoring\brkunds{}hold}] is + the counter that keeps track of whether the common component has + already been initialized or is to be released. The operations + on this variable are atomic to avoid race conditions in a + multi-threaded environment. +\item[{\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}output\brkunds{}stream\brkunds{}obj}] + is the structure used internally by \ompi{} for output streams. The + monitoring output stream states that this output is for debugging, so + the actual output will only happen when OPAL is configured with {\tt + -{}-enable-debug}. The output is sent to the standard + error stream.
The prefix field, initialized in {\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}init}, states + that every log message emitted from this stream object will be + prefixed by ``{\tt [hostname:PID] monitoring: }'', where {\tt + hostname} is the configured name of the machine running the + process and {\tt PID} is the process id, zero-padded to 6 digits. +\item[{\tt mca\brkunds{}common\brkunds{}monitoring\brkunds{}enabled}] + is the variable retaining the original value given to the MCA option + system, for example on the command line. The corresponding + MCA parameter is {\tt pml\brkunds{}monitoring\brkunds{}enable}. This + variable must not be written by the monitoring component. It is + used to reset the {\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}current\brkunds{}state} + variable between phases. The value given to this parameter also + defines whether or not the filtering between internal and external + messages is enabled. +\item[{\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}current\brkunds{}state}] + is the variable used to determine the actual current state of the + monitoring. This variable is the one used to define phases. +\item[{\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}output\brkunds{}enabled}] + is a variable, set by the MCA engine, that states whether or not the + user requested a summary of the monitored data to be streamed out at + the end of the execution. It also states whether the output should + go to stdout, stderr or a file. If a file is requested, the next + two variables have to be set. The corresponding MCA parameter is {\tt + pml\brkunds{}monitoring\brkunds{}enable\brkunds{}output}. {\bf + Warning:} This variable may be set to 0 when the monitoring is + also controlled with \mpit{}: we cannot both control the monitoring + via \mpit{} and expect an accurate summary upon {\tt + MPI\brkunds{}Finalize}.
+\item[{\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}initial\brkunds{}filename}] + works the same as {\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}ena\allowbreak{}bled}. This + variable is, and must be, only used as a placeholder for the {\tt + pml\brkunds{}monitoring\allowbreak\brkunds{}filename} + variable. This variable has to be handled very carefully: it has + to live as long as the program and it has to be a valid pointer + address, whose content must not be released by the component. The + way MCA handles variables (especially strings) makes it very easy to + cause segmentation faults, but MCA does take care of releasing the + content. So, in the end, {\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}initial\brkunds{}filename} + is only to be read. +\item[{\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}current\brkunds{}filename}] + is the variable the monitoring component works with. This + variable is the one set by \mpit{}'s control variable {\tt + pml\brkunds{}monitoring\brkunds{}flush}. Even though this control + variable is prefixed with {\tt pml} for historical and practical reasons, + its behavior depends on the common component. +\item[{\tt pml\brkunds{}data} and {\tt pml\brkunds{}count}] arrays of + unsigned 64-bit integers record, respectively, the cumulative number + of bytes sent from the current process to another process $p$, and + the number of messages. The value at index $i$ of these arrays + corresponds to the data sent to the process $p$ whose rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD} is $i$. These arrays are of size $N$, + where $N$ is the number of processes in the MPI application. If the + filtering is disabled, these variables gather all information + regardless of the tags. In this case, the next two arrays are + not used, even though they are still allocated.
The + {\tt pml\brkunds{}data} and {\tt pml\brkunds{}count} arrays, and the + nine arrays described next, are allocated, initialized, reset and + freed all at once, and are contiguous in memory. +\item[{\tt filtered\brkunds{}pml\brkunds{}data} and {\tt + filtered\brkunds{}pml\brkunds{}count}] arrays of unsigned 64-bit + integers record, respectively, the cumulative number of bytes of internal + messages sent from the current process to another process $p$, and the + number of such messages. The value at index $i$ of these arrays + corresponds to the data sent to the process $p$ whose rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD} is $i$. These arrays are of size $N$, + where $N$ is the number of processes in the MPI application. The + internal messages are defined as messages sent through the PML + layer with a negative tag. They are issued, for example, by the + decomposition of collective operations. +\item[{\tt osc\brkunds{}data\brkunds{}s} and {\tt + osc\brkunds{}count\brkunds{}s}] arrays of unsigned 64-bit + integers record, respectively, the cumulative number of bytes sent from + the current process to another process $p$, and the number of + messages. The value at index $i$ of these arrays corresponds to the + data sent to the process $p$ whose rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD} is $i$. These arrays are of size $N$, + where $N$ is the number of processes in the MPI application. +\item[{\tt osc\brkunds{}data\brkunds{}r} and {\tt + osc\brkunds{}count\brkunds{}r}] arrays of unsigned 64-bit + integers record, respectively, the cumulative number of bytes received + by the current process from another process $p$, and the number of + messages. The value at index $i$ of these arrays corresponds to the + data received from the process $p$ whose rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD} is $i$. These arrays are of size $N$, + where $N$ is the number of processes in the MPI application.
+\item[{\tt coll\brkunds{}data} and {\tt coll\brkunds{}count}] arrays + of unsigned 64-bit integers record, respectively, the cumulative + number of bytes sent from the current process to another process + $p$, in the case of all-to-all or one-to-all operations, or + received from another process $p$ by the current process, in the + case of all-to-one operations, and the number of messages. The value + at index $i$ of these arrays corresponds to the data exchanged with the + process $p$ whose rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD} is $i$. These arrays are of size $N$, + where $N$ is the number of processes in the MPI application. The + communications are thus considered symmetrical in the resulting + matrices. +\item[{\tt size\brkunds{}histogram}] array of unsigned 64-bit + integers records the distribution of sizes of PML messages, filtered + or not, between the current process and a process $p$. This + histogram uses a log-2 scale. Index 0 is for empty + messages. Messages of size between 1 and $2^{64}-1$ bytes are recorded + as follows: for a given size $S$, with $2^k \le S < 2^{k+1}$, + the element at index $k+1$ of the histogram is incremented. This array is of + size $N \times {\tt max\brkunds{}size\brkunds{}histogram}$, where + $N$ is the number of processes in the MPI application. +\item[{\tt max\brkunds{}size\brkunds{}histogram}] is a constant + corresponding to the number of elements in the {\tt + size\brkunds{}histo\allowbreak{}gram} array for each process. It + is stored here to avoid scattering its value across the + code. It is used to compute the total size of the block to + be allocated, initialized, reset or freed, which equals $(10 + + {\tt max\brkunds{}size\brkunds{}histogram}) \times N$, where $N$ + is the number of processes in the MPI application.
This constant + is also used to compute the offset of the histogram of a given + process $p$; this offset equals $i \times {\tt + max\brkunds{}size\brkunds{}histogram}$, where $i$ is $p$'s rank in + {\tt MPI\brkunds{}COMM\brkunds{}WORLD}. +\item[{\tt log10\brkunds{}2}] is a cached value of the common + (decimal) logarithm of 2. This value is used to compute + the index of the histogram bin to increment. For a non-empty message, + this index $j$ is computed as follows: $j = 1 + + \left\lfloor \log_{10}(S)/\log_{10}(2) \right\rfloor$, where + $\log_{10}$ is the decimal logarithm and $S$ the size of the message. +\item[{\tt rank\brkunds{}world}] is the cached rank of the current + process in {\tt MPI\brkunds{}COMM\brkunds{}WORLD}. +\item[{\tt nprocs\brkunds{}world}] is the cached value of the size of + {\tt MPI\brkunds{}COMM\brkunds{}WORLD}. +\item[{\tt + common\brkunds{}monitoring\brkunds{}translation\brkunds{}ht}] is + the hash table used to translate the rank $r$ of any process $p$ + in any communicator into its rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD}. It lives as long as the + monitoring components do. +\end{description} + +In any case, we never monitor communications between a process and +itself. + +The different functions to access \mpit{} performance variables are +straightforward. Note that for PML, OSC and COLL, the count and size +performance variables share the same {\it notify} function. At binding, +it sets the {\tt count} variable to the size of {\tt + MPI\brkunds{}COMM\brkunds{}WORLD}, as requested by the MPI-3 +standard (for arrays, the parameter should be set to the number of +elements of the array). The {\it notify} function is also responsible +for enabling the monitoring when any monitoring performance variable +handle is started, and for disabling it when any such +handle is stopped. The {\it flush} +control variable behaves as follows.
On binding, it returns the size of +the filename if one is defined, 0 otherwise. On the start event, this +variable also enables the monitoring, as the performance variables do, +but it also disables the final output, even if it was previously +requested by the end user. On the stop event, this variable flushes +the monitored data to the proper output stream (i.e. stdout, stderr or +the requested file). Note that these variables are only to be bound +with the {\tt MPI\brkunds{}COMM\brkunds{}WORLD} communicator. So far, +the behavior when binding to another communicator has not been +tested. + +Flushing itself is split into two functions. The +first one ({\tt + mca\brkunds{}common\brkunds{}monitoring\brkunds{}flush}) is +responsible for opening the proper stream. If it is given 0 as its +first parameter, it does nothing and returns no error, since this value +corresponds to disabled output. The {\tt filename} parameter is +only taken into account if {\tt fd} is strictly greater than 2. Note +that upon flushing, the record arrays are reset to 0. Also, the +flush function in {\it common\brkunds{}monitoring.c} calls the +specific flush function for the per-communicator collective monitoring data. + +For historical reasons, and because the PML layer is +the first one to be loaded, MCA parameters and the {\it + monitoring\brkunds{}flush} control variable are linked to the PML +framework. The other performance variables, however, are linked to their +respective frameworks. + +\subsubsection{Common Coll Monitoring} +\label{subsubsec:TDI-common-coll} + +In addition to the monitored data kept in the arrays, the monitoring +component also provides a per-communicator set of records holding +information about collective operations. As we cannot know +how the data are actually exchanged (see Section~\ref{subsec:TDI-coll}), +we added this complement to the final summary of the monitored +operations.
+ +We keep the per-communicator data set as part of the {\it + coll\brkunds{}monitoring\brkunds{}module}. Each data set is also +kept in a hash table, with the communicator structure address as the +hash key. This data set keeps track of the amount of data +sent through a communicator with collective operations and the count +of each kind of operation. It also caches the list of the processes' +ranks, translated to their rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD}, as a string; the rank of the +current process, translated into its rank in {\tt + MPI\brkunds{}COMM\brkunds{}WORLD}; and the communicator's name. + +The process list is generated with the following algorithm. First, we +allocate a string long enough to contain it, namely of +$1 + (d + 2) \times s$ characters, where $d$ is the number of digits of the highest +rank in {\tt MPI\brkunds{}COMM\brkunds{}WORLD} and $s$ the size of the +current communicator. We add 2 to $d$ to account for the comma +and the space between ranks, and 1 to ensure there +is enough room for the NUL character terminating the string. Then, we +fill the string with the proper values, and adjust the final size of +the string. + +When possible, this list is built when the communicator is +created. If that fails, it is attempted again when the +communicator is released. + +The lifetime of this data set differs from that of its corresponding +communicator: it is destroyed only once its data has been +flushed (at the end of the execution or at the end of a monitoring +phase). To this end, this structure keeps a flag indicating whether it is safe +to release it. + +\subsection{PML} +\label{subsec:TDI-pml} + +As specified in Section~\ref{subsubsec:TDI-common-monitoring}, this +component works closely with the common component. They were +initially merged, but later separated in order to provide a cleaner +and more logical architecture.
+ +This module is the first one to be initialized by the \ompi{} software +stack; it is therefore responsible for the proper initialization of, +for example, the translation hash table. \ompi{} relies on the +PML layer to add the logical process structures used by communicators. + +To this end, and because of the way the PML layer is managed by the +MCA engine, this component has some specific variables to manage its +own state, in order to be properly instantiated. The module selection +process works as follows. All the PML modules available for the +framework are loaded, initialized and asked for a priority. The higher +the priority, the more likely a module is to be selected. This is why our +component returns a priority of 0: it is not meant to be selected itself. Note that +the priority is returned, and the common module initialized, at this point only if +monitoring has been requested by the user. + +% CF - TODO: check what happens if the monitoring is the only PML module available. +If everything works properly, we should not be selected. The next step +in the PML initialization is to finalize every module that is not the +selected one, and then close the components that were not used. At this +point the winning component and its module are saved for the PML. The +variables {\tt + mca\brkunds{}pml\brkunds{}base\brkunds{}selected\brkunds{}component} +and {\tt mca\brkunds{}pml}, defined in {\it + ompi/mca/pml/base/pml\brkunds{}base\brkunds{}frame.c}, are now +initialized. This is the point where we install our interception +layer. We also mark ourselves as initialized, so that on +the next call to the {\it component\brkunds{}close} function we know that we +actually have to be closed this time.
Note that adding our +layer requires setting the {\tt + MCA\brkunds{}PML\brkunds{}BASE\brkunds{}FLAG\brkunds{}REQUIRE\brkunds{}WORLD} +flag, which requests the whole list of processes +at the initialization of {\tt MPI\brkunds{}COMM\brkunds{}WORLD} so that we +can properly fill our hash table. The downside of this trick is that +it disables the \ompi{} optimization of adding processes lazily. + +Once that is done, we are properly installed, and we can monitor every +message going through the PML layer. As we only monitor messages on +the sender side, we only record messages sent with +the {\tt MPI\brkunds{}Send}, {\tt MPI\brkunds{}Isend} or {\tt + MPI\brkunds{}Start} functions. + +\subsection{OSC} +\label{subsec:TDI-osc} + +This layer is responsible for remote memory access operations, and +thus has its own specificities. Even though its component selection +process is quite close to the PML one, some +aspects of how OSC modules are used forced us to adapt the +interception layer. + +The first problem comes from how the module is accessed inside the +components. In the OSC layer, the module is part of the {\tt + ompi\brkunds{}win\brkunds{}t} structure, and the components +access the proper field of this structure directly to +find the reference to the module. Because of +that, it is not possible to simply replace a module with one of ours that +would have saved the original module. The first solution was then to +``extend'' the module (in the ompi manner of extending {\it objects}) with a +structure containing, as its first field, a union of +every possible module type. We would then copy their field values, +save their functions, and replace them with pointers to our interception +functions. This solution was implemented, but a second problem +prevented us from adopting it.
+ +The second problem was that the {\it osc/rdma} component internally uses a hash +table to keep track of its modules and allocated segments, with the +module's pointer address as the hash key. Hence, it was not possible +for us to modify this address, as the RDMA module would no longer be able to +find the corresponding segments. For the same reason, we could +not extend the structures either. Therefore, we could only +modify the common fields of the structures to keep our ``module'' +compatible with any OSC component. We designed templates, instantiated +for each kind of module. + +To this end, for each kind of OSC module, we generate and +instantiate three variables: +\begin{description} +\item[{\tt + OMPI\brkunds{}OSC\brkunds{}MONITORING\brkunds{}MODULE\brkunds{}VARIABLE(template)}] + is the structure that keeps the addresses of the original module + functions of a given component type (i.e.\ RDMA, PORTALS4, PT2PT or + SM). It is initialized once, and used to propagate the calls + after the initial interception. One is generated for each kind + of OSC component. +\item[{\tt + OMPI\brkunds{}OSC\brkunds{}MONITORING\brkunds{}MODULE\brkunds{}INIT(template)}] + is a flag ensuring that the module variable is initialized only once, + in order to avoid race conditions. One is generated for each {\tt + OMPI\brkunds{}OSC\brkunds{}MONITORING\brkunds{}MODULE\brkunds{}VARIABLE(template)}, + thus one per kind of OSC component. +\item[{\tt + OMPI\brkunds{}OSC\brkunds{}MONITORING\brkunds{}TEMPLATE\brkunds{}VARIABLE(template)}] + is a structure containing the addresses of the interception + functions. One is generated for each kind of OSC component. +\end{description} + +The interception is done in the following steps. First, we take part +in the selection process with a priority of {\tt INT\brkunds{}MAX}, +to ensure that our component is the one selected. Then we perform the +actual selection ourselves.
This gives us the opportunity to modify the +communication module as needed. If a module of this kind of component +is used for the first time, we extract the function addresses from the +given module and save them to the {\tt + OMPI\brkunds{}OSC\brkunds{}MONITORING\brkunds{}MODULE\brkunds{}VARIABLE(template)} +structure, after setting the initialization flag. Then we replace the +original functions in the module with our interception functions. + +To make everything work for each kind of component, the variables are +generated with the corresponding interception functions. These +operations are done at compilation time. An issue appeared with +PORTALS4, whose symbols are only propagated when the corresponding card +is available on the system. In the header files, where we define the +template functions and structures, {\it template} refers to the OSC +component name. + +We found two drawbacks to this solution. First, the code is hard to +read. Second, the solution does not adapt automatically to new +components. If a new component is added, the code in {\it + ompi/mca/osc/monitoring/osc\brkunds{}monitoring\brkunds{}component.c} +needs to be modified in order to monitor the operations going through +it. Even though the modification is only three lines long, it may be +preferable to have the monitoring work without any modification +related to other components. + +A second solution for the OSC monitoring could have been to use a +hash table. We would have saved in the hash table the structure +containing the original function addresses, with the module address +as the hash key. Our interception functions would then have looked up +the corresponding structure in the hash table on every call, in order to +propagate the function calls. This solution was not implemented +because it has a higher memory footprint when a large +number of windows is allocated.
Also, the cost of our interceptions would +then have been higher, because of the lookup in the hash table. This +was the main reason we chose the first solution: the OSC layer +is designed to be very cost-effective in order to take full +advantage of background communication and +communication/computation overlap. However, this solution would have +given us the adaptability that the first one lacks. + +\subsection{COLL} +\label{subsec:TDI-coll} + +The collective module (or, more accurately, the collective {\it modules}) +is part of the communicator. The modules are selected with the +following algorithm. First, all available components are selected, +queried and sorted in ascending order of priority. A module may +provide some or all of the operations, keeping in mind that a module with +a higher priority may override it. The sorted list of modules is +iterated over, and for each module and each operation, if the +function's address is not {\tt NULL}, the previous module is replaced +with the current one, and so is the corresponding function. Every time +a module is selected it is retained and enabled (i.e.\ the {\tt + coll\brkunds{}module\brkunds{}enable} function is called), and every +time it gets replaced, it is disabled (i.e.\ the {\tt + coll\brkunds{}module\brkunds{}disable} function is called) and +released. + +When the monitoring module is queried, it returns a priority of {\tt + INT\brkunds{}MAX} to ensure that it comes last in the +list. Then, when it is enabled, all the previous function-module pairs are +kept as part of our monitoring module. These modules are retained so +that they are not freed when the selection process releases +them. To preserve error detection in the communicator (i.e.\ detection of an +incomplete collective API), if no module provides a given +operation, we set the corresponding function address to {\tt + NULL}.
Symmetrically, when our module is released, we propagate +this call to each underlying module, and we also release those +objects. Likewise, when the module is enabled, we initialize the +per-communicator data record, which is released when the module is +disabled. + +When a collective operation is called, whether blocking or non-blocking, +we intercept the call and record the data in two different +entries. The operations are grouped into three kinds: one-to-all +operations, all-to-one operations and all-to-all operations. + +For one-to-all operations, the root process of the operation computes +the total amount of data to be sent, and keeps it as part of the +per-communicator data (see Section~\ref{subsubsec:TDI-common-coll}). Then +it updates the {\it common\brkunds{}monitoring} array with the amount +of data each peer has to receive in the end. As we cannot predict the +actual algorithm used to communicate the data, we assume that the root +sends everything directly to each process. + +For all-to-one operations, each non-root process computes the amount of +data to send to the root and adds it to the {\it common\brkunds{}monitoring} +array at index $i$, where $i$ is the +rank of the root process in {\tt MPI\brkunds{}COMM\brkunds{}WORLD}. As +we cannot predict the actual algorithm used to communicate the data, +we assume that each process sends its data directly to the root. The root +process computes the total amount of data to receive and updates the +per-communicator data. + +For all-to-all operations, each process computes, for every other process, +the amount of data to send to it and to receive from it. The amount of +data to be sent to each process $p$ is added to the {\it + common\brkunds{}monitoring} array at index $i$, where $i$ is +the rank of $p$ in {\tt MPI\brkunds{}COMM\brkunds{}WORLD}. The total +amount of data sent by a process is also added to the per-communicator +data.
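The three accounting rules above can be summarized by the following C sketch. It is a simplified model under explicit assumptions, not the code of the COLL monitoring component: {\tt world\_bytes} stands in for the {\it common\brkunds{}monitoring} array, the {\tt to\_world} table replaces the lookup in the translation hash table, {\tt coll\_comm\_data\_t} is a hypothetical per-communicator record, every peer exchanges the same number of bytes, and a process does not count data it would send to itself.

```c
#include <assert.h>
#include <stddef.h>

#define WORLD_SIZE 8  /* hypothetical size of MPI_COMM_WORLD */

/* Bytes this process sent to each peer, indexed by world rank
 * (stand-in for the common_monitoring array). */
static size_t world_bytes[WORLD_SIZE];

/* Hypothetical per-communicator record. */
typedef struct {
    size_t o2a_size, a2o_size, a2a_size;
} coll_comm_data_t;

/* One-to-all (e.g. a broadcast): only the root records, assuming it
 * sends `bytes` directly to every other process. */
static void record_one_to_all(int me, int root, int comm_size,
                              const int *to_world, size_t bytes,
                              coll_comm_data_t *data)
{
    assert(root >= 0 && root < comm_size);
    if (me != root) return;
    for (int i = 0; i < comm_size; ++i) {
        if (i == root) continue;             /* no traffic to self */
        world_bytes[to_world[i]] += bytes;   /* what peer i receives */
    }
    data->o2a_size += bytes * (size_t)(comm_size - 1);
}

/* All-to-one (e.g. a gather): non-roots charge the root directly;
 * the root records the total it expects to receive. */
static void record_all_to_one(int me, int root, int comm_size,
                              const int *to_world, size_t bytes,
                              coll_comm_data_t *data)
{
    assert(root >= 0 && root < comm_size);
    if (me == root)
        data->a2o_size += bytes * (size_t)(comm_size - 1);
    else
        world_bytes[to_world[root]] += bytes;
}

/* All-to-all: every process charges every other process and adds the
 * total it sends to the per-communicator record. */
static void record_all_to_all(int me, int comm_size, const int *to_world,
                              size_t bytes, coll_comm_data_t *data)
{
    for (int p = 0; p < comm_size; ++p) {
        if (p == me) continue;
        world_bytes[to_world[p]] += bytes;
        data->a2a_size += bytes;
    }
}
```

In a real run each process owns its own array and record; the sketch keeps a single copy only so that the rules can be exercised in one address space. Operations with per-peer counts (the {\tt v} variants) would add a different amount for each peer instead of a single {\tt bytes} value.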
+ +For every rank translation, we use the {\tt + common\brkunds{}monitoring\brkunds{}translation\brkunds{}ht} hash +table. + +\end{document} diff --git a/ompi/mca/common/monitoring/Makefile.am b/ompi/mca/common/monitoring/Makefile.am new file mode 100644 index 00000000000..1812245cdeb --- /dev/null +++ b/ompi/mca/common/monitoring/Makefile.am @@ -0,0 +1,70 @@ +# +# Copyright (c) 2016 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. +# Copyright (c) 2016 Inria. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. +# Copyright (c) 2018 Cisco Systems, Inc. All rights reserved +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +EXTRA_DIST = profile2mat.pl aggregate_profile.pl + +sources = common_monitoring.c common_monitoring_coll.c +headers = common_monitoring.h common_monitoring_coll.h + +lib_LTLIBRARIES = +noinst_LTLIBRARIES = +component_install = libmca_common_monitoring.la +component_noinst = libmca_common_monitoring_noinst.la + +if MCA_BUILD_ompi_common_monitoring_DSO +lib_LTLIBRARIES += $(component_install) +lib_LTLIBRARIES += ompi_monitoring_prof.la + +ompi_monitoring_prof_la_SOURCES = monitoring_prof.c +ompi_monitoring_prof_la_LDFLAGS= \ + -module -avoid-version -shared $(WRAPPER_EXTRA_LDFLAGS) +ompi_monitoring_prof_la_LIBADD = \ + $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + +if OPAL_INSTALL_BINARIES +bin_SCRIPTS = profile2mat.pl aggregate_profile.pl +endif # OPAL_INSTALL_BINARIES + +else # MCA_BUILD_ompi_common_monitoring_DSO +noinst_LTLIBRARIES += $(component_noinst) +endif # MCA_BUILD_ompi_common_monitoring_DSO + +libmca_common_monitoring_la_SOURCES = $(headers) $(sources) +libmca_common_monitoring_la_CPPFLAGS = $(common_monitoring_CPPFLAGS) +libmca_common_monitoring_la_LDFLAGS = \ + -version-info 
$(libmca_ompi_common_monitoring_so_version) \ + $(common_monitoring_LDFLAGS) +libmca_common_monitoring_la_LIBADD = $(common_monitoring_LIBS) +libmca_common_monitoring_noinst_la_SOURCES = $(headers) $(sources) + +# These two rules will sym link the "noinst" libtool library filename +# to the installable libtool library filename in the case where we are +# compiling this component statically (case 2), described above). +V=0 +OMPI_V_LN_SCOMP = $(ompi__v_LN_SCOMP_$V) +ompi__v_LN_SCOMP_ = $(ompi__v_LN_SCOMP_$AM_DEFAULT_VERBOSITY) +ompi__v_LN_SCOMP_0 = @echo " LN_S " `basename $(component_install)`; + +all-local: + $(OMPI_V_LN_SCOMP) if test -z "$(lib_LTLIBRARIES)"; then \ + rm -f "$(component_install)"; \ + $(LN_S) "$(component_noinst)" "$(component_install)"; \ + fi + +clean-local: + if test -z "$(lib_LTLIBRARIES)"; then \ + rm -f "$(component_install)"; \ + fi diff --git a/ompi/mca/pml/monitoring/README b/ompi/mca/common/monitoring/README similarity index 100% rename from ompi/mca/pml/monitoring/README rename to ompi/mca/common/monitoring/README diff --git a/test/monitoring/aggregate_profile.pl b/ompi/mca/common/monitoring/aggregate_profile.pl similarity index 83% rename from test/monitoring/aggregate_profile.pl rename to ompi/mca/common/monitoring/aggregate_profile.pl index da6d3780b00..1af60b93371 100644 --- a/test/monitoring/aggregate_profile.pl +++ b/ompi/mca/common/monitoring/aggregate_profile.pl @@ -28,7 +28,7 @@ # ensure that this script as the executable right: chmod +x ... 
# -die "$0 \n\tProfile files should be of the form \"name_phaseid_processesid.prof\"\n\tFor instance if you saved the monitoring into phase_0_0.prof, phase_0_1.prof, ..., phase_1_0.prof etc you should call: $0 phase\n" if ($#ARGV!=0); +die "$0 \n\tProfile files should be of the form \"name_phaseid.processesid.prof\"\n\tFor instance if you saved the monitoring into phase_0.0.prof, phase_0.1.prof, ..., phase_1.0.prof etc you should call: $0 phase\n" if ($#ARGV!=0); $name = $ARGV[0]; @@ -39,7 +39,7 @@ # Detect the different phases foreach $file (@files) { - ($id)=($file =~ m/$name\_(\d+)_\d+/); + ($id)=($file =~ m/$name\_(\d+)\.\d+/); $phaseid{$id} = 1 if ($id); } @@ -53,12 +53,13 @@ sub aggregate{ $phase = $_[0]; - + # Aggregate all files of the given phase into the files array. This must be done + # before creating $phase.prof, to avoid adding $phase.prof to the files array + @files = glob ($phase."*"); print "Building $phase.prof\n"; open OUT,">$phase.prof"; - @files = glob ($phase."*"); foreach $file ( @files) { open IN,$file; diff --git a/ompi/mca/common/monitoring/common_monitoring.c b/ompi/mca/common/monitoring/common_monitoring.c new file mode 100644 index 00000000000..e521ca56417 --- /dev/null +++ b/ompi/mca/common/monitoring/common_monitoring.c @@ -0,0 +1,799 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2013-2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. + * Copyright (c) 2015 Bull SAS. All rights reserved. + * Copyright (c) 2016-2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include "common_monitoring.h" +#include "common_monitoring_coll.h" +#include +#include +#include +#include +#include +#include + +#if SIZEOF_LONG_LONG == SIZEOF_SIZE_T +#define MCA_MONITORING_VAR_TYPE MCA_BASE_VAR_TYPE_UNSIGNED_LONG_LONG +#elif SIZEOF_LONG == SIZEOF_SIZE_T +#define MCA_MONITORING_VAR_TYPE MCA_BASE_VAR_TYPE_UNSIGNED_LONG +#endif + +/*** Monitoring specific variables ***/ +/* Keep track of how many components are currently using the common part */ +static int32_t mca_common_monitoring_hold = 0; +/* Output parameters */ +int mca_common_monitoring_output_stream_id = -1; +static opal_output_stream_t mca_common_monitoring_output_stream_obj = { + .lds_verbose_level = 0, + .lds_want_syslog = false, + .lds_prefix = NULL, + .lds_suffix = NULL, + .lds_is_debugging = true, + .lds_want_stdout = false, + .lds_want_stderr = true, + .lds_want_file = false, + .lds_want_file_append = false, + .lds_file_suffix = NULL +}; + +/*** MCA params to mark the monitoring as enabled.
***/ +/* This signals that the monitoring will hijack the PML, OSC and COLL */ +int mca_common_monitoring_enabled = 0; +int mca_common_monitoring_current_state = 0; +/* Signals there will be an output of the monitored data at component close */ +static int mca_common_monitoring_output_enabled = 0; +/* File where to output the monitored data */ +static char* mca_common_monitoring_initial_filename = ""; +static char* mca_common_monitoring_current_filename = NULL; + +/* arrays for storing monitoring data */ +static size_t* pml_data = NULL; +static size_t* pml_count = NULL; +static size_t* filtered_pml_data = NULL; +static size_t* filtered_pml_count = NULL; +static size_t* osc_data_s = NULL; +static size_t* osc_count_s = NULL; +static size_t* osc_data_r = NULL; +static size_t* osc_count_r = NULL; +static size_t* coll_data = NULL; +static size_t* coll_count = NULL; + +static size_t* size_histogram = NULL; +static const int max_size_histogram = 66; +static double log10_2 = 0.; + +static int rank_world = -1; +static int nprocs_world = 0; + +opal_hash_table_t *common_monitoring_translation_ht = NULL; + +/* Reset all the monitoring arrays */ +static void mca_common_monitoring_reset ( void ); + +/* Flushes the monitored data and resets the values */ +static int mca_common_monitoring_flush (int fd, char* filename); + +/* Retrieve the PML recorded count of messages sent */ +static int mca_common_monitoring_get_pml_count (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Retrieve the PML recorded amount of data sent */ +static int mca_common_monitoring_get_pml_size (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Retrieve the OSC recorded count of messages sent */ +static int mca_common_monitoring_get_osc_sent_count (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Retrieve the OSC recorded amount of data sent */ +static int mca_common_monitoring_get_osc_sent_size (const struct 
mca_base_pvar_t
*pvar, + void *value, void *obj_handle); + +/* Retrieve the OSC recorded count of messages received */ +static int mca_common_monitoring_get_osc_recv_count (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Retrieve the OSC recorded amount of data received */ +static int mca_common_monitoring_get_osc_recv_size (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Retrieve the COLL recorded count of messages sent */ +static int mca_common_monitoring_get_coll_count (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Retrieve the COLL recorded amount of data sent */ +static int mca_common_monitoring_get_coll_size (const struct mca_base_pvar_t *pvar, + void *value, void *obj_handle); + +/* Set the filename where to output the monitored data */ +static int mca_common_monitoring_set_flush(struct mca_base_pvar_t *pvar, + const void *value, void *obj); + +/* Does nothing, as the pml_monitoring_flush pvar is not meant to be read */ +static int mca_common_monitoring_get_flush(const struct mca_base_pvar_t *pvar, + void *value, void *obj); + +/* pml_monitoring_count, pml_monitoring_size, + osc_monitoring_sent_count, osc_monitoring_sent_size, + osc_monitoring_recv_size and osc_monitoring_recv_count pvar notify + function */ +static int mca_common_monitoring_comm_size_notify(mca_base_pvar_t *pvar, + mca_base_pvar_event_t event, + void *obj_handle, int *count); + +/* pml_monitoring_flush pvar notify function */ +static int mca_common_monitoring_notify_flush(struct mca_base_pvar_t *pvar, + mca_base_pvar_event_t event, + void *obj, int *count); + +static int mca_common_monitoring_set_flush(struct mca_base_pvar_t *pvar, + const void *value, void *obj) +{ + if( NULL != mca_common_monitoring_current_filename ) { + free(mca_common_monitoring_current_filename); + } + if( NULL == *(char**)value || 0 == strlen((char*)value) ) { /* No more 
output */ + mca_common_monitoring_current_filename = NULL; + } else { +
mca_common_monitoring_current_filename = strdup((char*)value); + if( NULL == mca_common_monitoring_current_filename ) + return OMPI_ERROR; + } + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_get_flush(const struct mca_base_pvar_t *pvar, + void *value, void *obj) +{ + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_notify_flush(struct mca_base_pvar_t *pvar, + mca_base_pvar_event_t event, + void *obj, int *count) +{ + switch (event) { + case MCA_BASE_PVAR_HANDLE_BIND: + mca_common_monitoring_reset(); + *count = (NULL == mca_common_monitoring_current_filename + ? 0 : strlen(mca_common_monitoring_current_filename)); + case MCA_BASE_PVAR_HANDLE_UNBIND: + return OMPI_SUCCESS; + case MCA_BASE_PVAR_HANDLE_START: + mca_common_monitoring_current_state = mca_common_monitoring_enabled; + mca_common_monitoring_output_enabled = 0; /* we can't control the monitoring via MPIT and + * expect accurate answer upon MPI_Finalize. */ + return OMPI_SUCCESS; + case MCA_BASE_PVAR_HANDLE_STOP: + return mca_common_monitoring_flush(3, mca_common_monitoring_current_filename); + } + return OMPI_ERROR; +} + +static int mca_common_monitoring_comm_size_notify(mca_base_pvar_t *pvar, + mca_base_pvar_event_t event, + void *obj_handle, + int *count) +{ + switch (event) { + case MCA_BASE_PVAR_HANDLE_BIND: + /* Return the size of the communicator as the number of values */ + *count = ompi_comm_size ((ompi_communicator_t *) obj_handle); + case MCA_BASE_PVAR_HANDLE_UNBIND: + return OMPI_SUCCESS; + case MCA_BASE_PVAR_HANDLE_START: + mca_common_monitoring_current_state = mca_common_monitoring_enabled; + return OMPI_SUCCESS; + case MCA_BASE_PVAR_HANDLE_STOP: + mca_common_monitoring_current_state = 0; + return OMPI_SUCCESS; + } + + return OMPI_ERROR; +} + +int mca_common_monitoring_init( void ) +{ + if( !mca_common_monitoring_enabled ) return OMPI_ERROR; + if( 1 < opal_atomic_add_fetch_32(&mca_common_monitoring_hold, 1) ) return OMPI_SUCCESS; /* Already initialized */ + + char 
hostname[OPAL_MAXHOSTNAMELEN] = "NA"; + /* Initialize constant */ + log10_2 = log10(2.); + /* Open the opal_output stream */ + gethostname(hostname, sizeof(hostname)); + asprintf(&mca_common_monitoring_output_stream_obj.lds_prefix, + "[%s:%06d] monitoring: ", hostname, getpid()); + mca_common_monitoring_output_stream_id = + opal_output_open(&mca_common_monitoring_output_stream_obj); + /* Initialize proc translation hashtable */ + common_monitoring_translation_ht = OBJ_NEW(opal_hash_table_t); + opal_hash_table_init(common_monitoring_translation_ht, 2048); + return OMPI_SUCCESS; +} + +void mca_common_monitoring_finalize( void ) +{ + if( ! mca_common_monitoring_enabled || /* Don't release if not last */ + 0 < opal_atomic_sub_fetch_32(&mca_common_monitoring_hold, 1) ) return; + + OPAL_MONITORING_PRINT_INFO("common_component_finish"); + /* Dump monitoring information */ + mca_common_monitoring_flush(mca_common_monitoring_output_enabled, + mca_common_monitoring_current_filename); + /* Disable all monitoring */ + mca_common_monitoring_enabled = 0; + /* Close the opal_output stream */ + opal_output_close(mca_common_monitoring_output_stream_id); + free(mca_common_monitoring_output_stream_obj.lds_prefix); + /* Free internal data structure */ + free(pml_data); /* a single allocation */ + opal_hash_table_remove_all( common_monitoring_translation_ht ); + OBJ_RELEASE(common_monitoring_translation_ht); + mca_common_monitoring_coll_finalize(); + if( NULL != mca_common_monitoring_current_filename ) { + free(mca_common_monitoring_current_filename); + mca_common_monitoring_current_filename = NULL; + } +} + +void mca_common_monitoring_register(void*pml_monitoring_component) +{ + /* Because we are playing tricks with the component close, we should not + * use mca_base_component_var_register but instead stay with the basic + * version mca_base_var_register. + */ + (void)mca_base_var_register("ompi", "pml", "monitoring", "enable", + "Enable the monitoring at the PML level. "
A value of 0 " + "will disable the monitoring (default). A value of 1 will " + "aggregate all monitoring information (point-to-point and " + "collective). Any other value will enable filtered monitoring", + MCA_BASE_VAR_TYPE_INT, NULL, MPI_T_BIND_NO_OBJECT, + MCA_BASE_VAR_FLAG_DWG, OPAL_INFO_LVL_4, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_common_monitoring_enabled); + + mca_common_monitoring_current_state = mca_common_monitoring_enabled; + + (void)mca_base_var_register("ompi", "pml", "monitoring", "enable_output", + "Enable the PML monitoring textual output at MPI_Finalize " + "(it will be automatically turned off when MPIT is used to " + "monitor communications). This value should be different " + "than 0 in order for the output to be enabled (default disable)", + MCA_BASE_VAR_TYPE_INT, NULL, MPI_T_BIND_NO_OBJECT, + MCA_BASE_VAR_FLAG_DWG, OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_common_monitoring_output_enabled); + + (void)mca_base_var_register("ompi", "pml", "monitoring", "filename", + /*&mca_common_monitoring_component.pmlm_version, "filename",*/ + "The name of the file where the monitoring information " + "should be saved (the filename will be extended with the " + "process rank and the \".prof\" extension). If this field " + "is NULL the monitoring will not be saved.", + MCA_BASE_VAR_TYPE_STRING, NULL, MPI_T_BIND_NO_OBJECT, + MCA_BASE_VAR_FLAG_DWG, OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_common_monitoring_initial_filename); + + /* Now that the MCA variables are automatically unregistered when + * their component close, we need to keep a safe copy of the + * filename. + * Keep the copy completely separated in order to let the initial + * filename to be handled by the framework. It's easier to deal + * with the string lifetime. 
+ */ + if( NULL != mca_common_monitoring_initial_filename ) + mca_common_monitoring_current_filename = strdup(mca_common_monitoring_initial_filename); + + /* Register PVARs */ + + /* PML PVARs */ + (void)mca_base_pvar_register("ompi", "pml", "monitoring", "flush", "Flush the monitoring " + "information in the provided file. The filename is appended with " + "the .%d.prof suffix, where %d is replaced with the process " + "rank in MPI_COMM_WORLD.", + OPAL_INFO_LVL_1, MCA_BASE_PVAR_CLASS_GENERIC, + MCA_BASE_VAR_TYPE_STRING, NULL, MPI_T_BIND_NO_OBJECT, MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_flush, mca_common_monitoring_set_flush, + mca_common_monitoring_notify_flush, NULL); + + (void)mca_base_pvar_register("ompi", "pml", "monitoring", "messages_count", "Number of " + "messages sent to each peer through the PML framework.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_pml_count, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + (void)mca_base_pvar_register("ompi", "pml", "monitoring", "messages_size", "Size of messages " + "sent to each peer in a communicator through the PML framework.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_pml_size, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + /* OSC PVARs */ + (void)mca_base_pvar_register("ompi", "osc", "monitoring", "messages_sent_count", "Number of " + "messages sent through the OSC framework with each peer.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_osc_sent_count, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + (void)mca_base_pvar_register("ompi", "osc", "monitoring", "
"messages_sent_size", "Size of " + "messages sent through the OSC framework with each peer.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_osc_sent_size, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + (void)mca_base_pvar_register("ompi", "osc", "monitoring", "messages_recv_count", "Number of " + "messages received through the OSC framework with each peer.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_osc_recv_count, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + (void)mca_base_pvar_register("ompi", "osc", "monitoring", "messages_recv_size", "Size of " + "messages received through the OSC framework with each peer.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_osc_recv_size, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + /* COLL PVARs */ + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "messages_count", "Number of " + "messages exchanged through the COLL framework with each peer.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_coll_count, NULL, + mca_common_monitoring_comm_size_notify, NULL); + + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "messages_size", "Size of " + "messages exchanged through the COLL framework with each peer.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_get_coll_size, NULL, + mca_common_monitoring_comm_size_notify, NULL); 
+ + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "o2a_count", "Number of messages " + "exchanged as one-to-all operations in a communicator.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_COUNTER, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_coll_get_o2a_count, NULL, + mca_common_monitoring_coll_messages_notify, NULL); + + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "o2a_size", "Size of messages " + "exchanged as one-to-all operations in a communicator.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_AGGREGATE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_coll_get_o2a_size, NULL, + mca_common_monitoring_coll_messages_notify, NULL); + + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "a2o_count", "Number of messages " + "exchanged as all-to-one operations in a communicator.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_COUNTER, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_coll_get_a2o_count, NULL, + mca_common_monitoring_coll_messages_notify, NULL); + + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "a2o_size", "Size of messages " + "exchanged as all-to-one operations in a communicator.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_AGGREGATE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_coll_get_a2o_size, NULL, + mca_common_monitoring_coll_messages_notify, NULL); + + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "a2a_count", "Number of messages " + "exchanged as all-to-all operations in a communicator.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_COUNTER, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + 
mca_common_monitoring_coll_get_a2a_count, NULL, + mca_common_monitoring_coll_messages_notify, NULL); + + (void)mca_base_pvar_register("ompi", "coll", "monitoring", "a2a_size", "Size of messages " + "exchanged as all-to-all operations in a communicator.", + OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_AGGREGATE, + MCA_MONITORING_VAR_TYPE, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_IWG, + mca_common_monitoring_coll_get_a2a_size, NULL, + mca_common_monitoring_coll_messages_notify, NULL); +} + +/** + * This PML monitors only the processes in the MPI_COMM_WORLD. As OMPI is now lazily + * adding peers on the first call to add_procs we need to check how many processes + * are in the MPI_COMM_WORLD to create the storage with the right size. + */ +int mca_common_monitoring_add_procs(struct ompi_proc_t **procs, + size_t nprocs) +{ + opal_process_name_t tmp, wp_name; + size_t i; + int peer_rank; + uint64_t key; + if( 0 > rank_world ) + rank_world = ompi_comm_rank((ompi_communicator_t*)&ompi_mpi_comm_world); + if( !nprocs_world ) + nprocs_world = ompi_comm_size((ompi_communicator_t*)&ompi_mpi_comm_world); + + if( NULL == pml_data ) { + int array_size = (10 + max_size_histogram) * nprocs_world; + pml_data = (size_t*)calloc(array_size, sizeof(size_t)); + pml_count = pml_data + nprocs_world; + filtered_pml_data = pml_count + nprocs_world; + filtered_pml_count = filtered_pml_data + nprocs_world; + osc_data_s = filtered_pml_count + nprocs_world; + osc_count_s = osc_data_s + nprocs_world; + osc_data_r = osc_count_s + nprocs_world; + osc_count_r = osc_data_r + nprocs_world; + coll_data = osc_count_r + nprocs_world; + coll_count = coll_data + nprocs_world; + + size_histogram = coll_count + nprocs_world; + } + + /* For all procs in the same MPI_COMM_WORLD we need to add them to the hash table */ + for( i = 0; i < nprocs; i++ ) { + + /* Extract the peer procname from the procs array */ + if( ompi_proc_is_sentinel(procs[i]) ) { + tmp = 
ompi_proc_sentinel_to_name((uintptr_t)procs[i]); + } else { + tmp = procs[i]->super.proc_name; + } + if( tmp.jobid != ompi_proc_local_proc->super.proc_name.jobid ) + continue; + + /* each process will only be added once, so there is no way it already exists in the hash */ + for( peer_rank = 0; peer_rank < nprocs_world; peer_rank++ ) { + wp_name = ompi_group_get_proc_name(((ompi_communicator_t*)&ompi_mpi_comm_world)->c_remote_group, peer_rank); + if( 0 != opal_compare_proc( tmp, wp_name ) ) + continue; + + key = *((uint64_t*)&tmp); + /* save the rank of the process in MPI_COMM_WORLD in the hash using the proc_name as the key */ + if( OPAL_SUCCESS != opal_hash_table_set_value_uint64(common_monitoring_translation_ht, + key, (void*)(uintptr_t)peer_rank) ) { + return OMPI_ERR_OUT_OF_RESOURCE; /* failed to allocate memory or to grow the hash table */ + } + break; + } + } + return OMPI_SUCCESS; +} + +static void mca_common_monitoring_reset( void ) +{ + int array_size = (10 + max_size_histogram) * nprocs_world; + memset(pml_data, 0, array_size * sizeof(size_t)); + mca_common_monitoring_coll_reset(); +} + +void mca_common_monitoring_record_pml(int world_rank, size_t data_size, int tag) +{ + if( 0 == mca_common_monitoring_current_state ) return; /* right now the monitoring is not started */ + + /* Keep track of the data_size distribution */ + if( 0 == data_size ) { + opal_atomic_add_fetch_size_t(&size_histogram[world_rank * max_size_histogram], 1); + } else { + int log2_size = log10(data_size)/log10_2; + if(log2_size > max_size_histogram - 2) /* Avoid out-of-bound write */ + log2_size = max_size_histogram - 2; + opal_atomic_add_fetch_size_t(&size_histogram[world_rank * max_size_histogram + log2_size + 1], 1); + } + + /* distinguish positive and negative tags if requested */ + if( (tag < 0) && (mca_common_monitoring_filter()) ) { + opal_atomic_add_fetch_size_t(&filtered_pml_data[world_rank], data_size); + 
opal_atomic_add_fetch_size_t(&filtered_pml_count[world_rank], 1);
} else { /* otherwise (non-negative tag, or filtering disabled) data is aggregated without distinction */ + opal_atomic_add_fetch_size_t(&pml_data[world_rank], data_size); + opal_atomic_add_fetch_size_t(&pml_count[world_rank], 1); + } +} + +static int mca_common_monitoring_get_pml_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int i, comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_count) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = pml_count[i]; + } + + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_get_pml_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + int i; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_data) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = pml_data[i]; + } + + return OMPI_SUCCESS; +} + +void mca_common_monitoring_record_osc(int world_rank, size_t data_size, + enum mca_monitoring_osc_direction dir) +{ + if( 0 == mca_common_monitoring_current_state ) return; /* right now the monitoring is not started */ + + if( SEND == dir ) { + opal_atomic_add_fetch_size_t(&osc_data_s[world_rank], data_size); + opal_atomic_add_fetch_size_t(&osc_count_s[world_rank], 1); + } else { + opal_atomic_add_fetch_size_t(&osc_data_r[world_rank], data_size); + opal_atomic_add_fetch_size_t(&osc_count_r[world_rank], 1); + } +} + +static int mca_common_monitoring_get_osc_sent_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int i, comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_count) + return
OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = osc_count_s[i]; + } + + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_get_osc_sent_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + int i; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_data) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = osc_data_s[i]; + } + + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_get_osc_recv_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int i, comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_count) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = osc_count_r[i]; + } + + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_get_osc_recv_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + int i; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_data) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = osc_data_r[i]; + } + + return OMPI_SUCCESS; +} + +void mca_common_monitoring_record_coll(int world_rank, size_t data_size) +{ + if( 0 == mca_common_monitoring_current_state ) return; /* right now the monitoring is not started */ + + opal_atomic_add_fetch_size_t(&coll_data[world_rank], data_size); + opal_atomic_add_fetch_size_t(&coll_count[world_rank], 1); +} + +static int mca_common_monitoring_get_coll_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) 
obj_handle; + int i, comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_count) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = coll_count[i]; + } + + return OMPI_SUCCESS; +} + +static int mca_common_monitoring_get_coll_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + int comm_size = ompi_comm_size (comm); + size_t *values = (size_t*) value; + int i; + + if(comm != &ompi_mpi_comm_world.comm || NULL == pml_data) + return OMPI_ERROR; + + for (i = 0 ; i < comm_size ; ++i) { + values[i] = coll_data[i]; + } + + return OMPI_SUCCESS; +} + +static void mca_common_monitoring_output( FILE *pf, int my_rank, int nbprocs ) +{ + /* Dump outgoing messages */ + fprintf(pf, "# POINT TO POINT\n"); + for (int i = 0 ; i < nbprocs ; i++) { + if(pml_count[i] > 0) { + fprintf(pf, "E\t%" PRId32 "\t%" PRId32 "\t%zu bytes\t%zu msgs sent\t", + my_rank, i, pml_data[i], pml_count[i]); + for(int j = 0 ; j < max_size_histogram ; ++j) + fprintf(pf, "%zu%s", size_histogram[i * max_size_histogram + j], + j < max_size_histogram - 1 ? "," : "\n"); + } + } + + /* Dump outgoing synchronization/collective messages */ + if( mca_common_monitoring_filter() ) { + for (int i = 0 ; i < nbprocs ; i++) { + if(filtered_pml_count[i] > 0) { + fprintf(pf, "I\t%" PRId32 "\t%" PRId32 "\t%zu bytes\t%zu msgs sent%s", + my_rank, i, filtered_pml_data[i], filtered_pml_count[i], + 0 == pml_count[i] ? "\t" : "\n"); + /* + * If no external messages were + * exchanged between the two processes, the histogram + * has not been dumped yet, so it is appended at + * the end of the internal category. + */ + if(0 == pml_count[i]) { + for(int j = 0 ; j < max_size_histogram ; ++j) + fprintf(pf, "%zu%s", size_histogram[i * max_size_histogram + j], + j < max_size_histogram - 1 ?
"," : "\n"); + } + } + } + } + + /* Dump one-sided (OSC) messages, both sent (S) and received (R) */ + fprintf(pf, "# OSC\n"); + for (int i = 0 ; i < nbprocs ; i++) { + if(osc_count_s[i] > 0) { + fprintf(pf, "S\t%" PRId32 "\t%" PRId32 "\t%zu bytes\t%zu msgs sent\n", + my_rank, i, osc_data_s[i], osc_count_s[i]); + } + if(osc_count_r[i] > 0) { + fprintf(pf, "R\t%" PRId32 "\t%" PRId32 "\t%zu bytes\t%zu msgs sent\n", + my_rank, i, osc_data_r[i], osc_count_r[i]); + } + } + + /* Dump collectives */ + fprintf(pf, "# COLLECTIVES\n"); + for (int i = 0 ; i < nbprocs ; i++) { + if(coll_count[i] > 0) { + fprintf(pf, "C\t%" PRId32 "\t%" PRId32 "\t%zu bytes\t%zu msgs sent\n", + my_rank, i, coll_data[i], coll_count[i]); + } + } + mca_common_monitoring_coll_flush_all(pf); +} + +/* + * Flushes the monitoring data into filename + * Useful for phases (see example in test/monitoring) + */ +static int mca_common_monitoring_flush(int fd, char* filename) +{ + /* If we are not driven by MPI_T, dump the monitoring information */ + if( 0 == mca_common_monitoring_current_state || 0 == fd ) /* if disabled do nothing */ + return OMPI_SUCCESS; + + if( 1 == fd ) { + OPAL_MONITORING_PRINT_INFO("Proc %" PRId32 " flushing monitoring to stdout", rank_world); + mca_common_monitoring_output( stdout, rank_world, nprocs_world ); + } else if( 2 == fd ) { + OPAL_MONITORING_PRINT_INFO("Proc %" PRId32 " flushing monitoring to stderr", rank_world); + mca_common_monitoring_output( stderr, rank_world, nprocs_world ); + } else { + FILE *pf = NULL; + char* tmpfn = NULL; + + if( NULL == filename ) { /* No filename */ + OPAL_MONITORING_PRINT_ERR("Error while flushing: no filename provided"); + return OMPI_ERROR; + } else if( 0 > asprintf(&tmpfn, "%s.%" PRId32 ".prof", filename, rank_world) ) { + OPAL_MONITORING_PRINT_ERR("Error while flushing: cannot allocate the output filename"); + return OMPI_ERROR; + } else { + pf = fopen(tmpfn, "w"); + free(tmpfn); + } + + if(NULL == pf) { /* Error during open */ + OPAL_MONITORING_PRINT_ERR("Error while flushing to: %s.%" PRId32 ".prof", + filename, rank_world); + return OMPI_ERROR; + } + + OPAL_MONITORING_PRINT_INFO("Proc %d flushing
monitoring to: %s.%" PRId32 ".prof", + rank_world, filename, rank_world); + + mca_common_monitoring_output( pf, rank_world, nprocs_world ); + + fclose(pf); + } + /* Reset all monitored data to 0 */ + mca_common_monitoring_reset(); + return OMPI_SUCCESS; +} diff --git a/ompi/mca/common/monitoring/common_monitoring.h b/ompi/mca/common/monitoring/common_monitoring.h new file mode 100644 index 00000000000..5dedf371bc7 --- /dev/null +++ b/ompi/mca/common/monitoring/common_monitoring.h @@ -0,0 +1,121 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_COMMON_MONITORING_H +#define MCA_COMMON_MONITORING_H + +BEGIN_C_DECLS + +#include +#include +#include +#include +#include +#include + +#define MCA_MONITORING_MAKE_VERSION \ + MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION) + +#define OPAL_MONITORING_VERBOSE(x, ...) \ + OPAL_OUTPUT_VERBOSE((x, mca_common_monitoring_output_stream_id, __VA_ARGS__)) + +/* When built in debug mode, always display error messages */ +#if OPAL_ENABLE_DEBUG +#define OPAL_MONITORING_PRINT_ERR(...) \ + OPAL_MONITORING_VERBOSE(0, __VA_ARGS__) +#else /* if( ! OPAL_ENABLE_DEBUG ) */ +#define OPAL_MONITORING_PRINT_ERR(...) \ + OPAL_MONITORING_VERBOSE(1, __VA_ARGS__) +#endif /* OPAL_ENABLE_DEBUG */ + +#define OPAL_MONITORING_PRINT_WARN(...) \ + OPAL_MONITORING_VERBOSE(5, __VA_ARGS__) + +#define OPAL_MONITORING_PRINT_INFO(...)
\ + OPAL_MONITORING_VERBOSE(10, __VA_ARGS__) + +extern int mca_common_monitoring_output_stream_id; +extern int mca_common_monitoring_enabled; +extern int mca_common_monitoring_current_state; +extern opal_hash_table_t *common_monitoring_translation_ht; + +OMPI_DECLSPEC void mca_common_monitoring_register(void*pml_monitoring_component); +OMPI_DECLSPEC int mca_common_monitoring_init( void ); +OMPI_DECLSPEC void mca_common_monitoring_finalize( void ); +OMPI_DECLSPEC int mca_common_monitoring_add_procs(struct ompi_proc_t **procs, size_t nprocs); + +/* Records PML communication */ +OMPI_DECLSPEC void mca_common_monitoring_record_pml(int world_rank, size_t data_size, int tag); + +/* SEND corresponds to data emitted from the current proc to the given + * one. RECV represents data emitted from the given proc to the + * current one. + */ +enum mca_monitoring_osc_direction { SEND, RECV }; + +/* Records OSC communications. */ +OMPI_DECLSPEC void mca_common_monitoring_record_osc(int world_rank, size_t data_size, + enum mca_monitoring_osc_direction dir); + +/* Records COLL communications. */ +OMPI_DECLSPEC void mca_common_monitoring_record_coll(int world_rank, size_t data_size); + +/* Translate the rank of a process in the given group to its rank in MPI_COMM_WORLD.
*/ +static inline int mca_common_monitoring_get_world_rank(int dest, ompi_group_t *group, + int *world_rank) +{ + opal_process_name_t tmp; + + /* find the process of the destination */ + ompi_proc_t *proc = ompi_group_get_proc_ptr(group, dest, true); + if( ompi_proc_is_sentinel(proc) ) { + tmp = ompi_proc_sentinel_to_name((uintptr_t)proc); + } else { + tmp = proc->super.proc_name; + } + + /* use its name as the hash key */ + uint64_t rank, key = *((uint64_t*)&tmp); + /** + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank. + * If this fails, the destination is not part of my MPI_COMM_WORLD. + */ + int ret = opal_hash_table_get_value_uint64(common_monitoring_translation_ht, + key, (void *)&rank); + + /* Use intermediate variable to avoid overwriting while looking up in the hashtable. */ + if( ret == OPAL_SUCCESS ) *world_rank = (int)rank; + return ret; +} + +/* Return 0 if the monitoring system is off or if the + * separation between internal tags and external tags is disabled, and a + * nonzero value if the segregation between point-to-point and + * collective is enabled.
+ */ +static inline int mca_common_monitoring_filter( void ) +{ + return 1 < mca_common_monitoring_current_state; +} + +/* Collective operation monitoring */ +struct mca_monitoring_coll_data_t; +typedef struct mca_monitoring_coll_data_t mca_monitoring_coll_data_t; +OMPI_DECLSPEC OBJ_CLASS_DECLARATION(mca_monitoring_coll_data_t); + +OMPI_DECLSPEC mca_monitoring_coll_data_t*mca_common_monitoring_coll_new(ompi_communicator_t*comm); +OMPI_DECLSPEC int mca_common_monitoring_coll_cache_name(ompi_communicator_t*comm); +OMPI_DECLSPEC void mca_common_monitoring_coll_release(mca_monitoring_coll_data_t*data); +OMPI_DECLSPEC void mca_common_monitoring_coll_o2a(size_t size, mca_monitoring_coll_data_t*data); +OMPI_DECLSPEC void mca_common_monitoring_coll_a2o(size_t size, mca_monitoring_coll_data_t*data); +OMPI_DECLSPEC void mca_common_monitoring_coll_a2a(size_t size, mca_monitoring_coll_data_t*data); + +END_C_DECLS + +#endif /* MCA_COMMON_MONITORING_H */ diff --git a/ompi/mca/common/monitoring/common_monitoring_coll.c b/ompi/mca/common/monitoring/common_monitoring_coll.c new file mode 100644 index 00000000000..571f48070e6 --- /dev/null +++ b/ompi/mca/common/monitoring/common_monitoring_coll.c @@ -0,0 +1,371 @@ +/* + * Copyright (c) 2013-2016 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2013-2018 Inria. All rights reserved. + * Copyright (c) 2015 Bull SAS. All rights reserved. + * Copyright (c) 2016-2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include "common_monitoring.h" +#include "common_monitoring_coll.h" +#include +#include +#include +#include +#include + +/*** Monitoring specific variables ***/ +struct mca_monitoring_coll_data_t { + opal_object_t super; + char*procs; + char*comm_name; + int world_rank; + int is_released; + ompi_communicator_t*p_comm; + size_t o2a_count; + size_t o2a_size; + size_t a2o_count; + size_t a2o_size; + size_t a2a_count; + size_t a2a_size; +}; + +/* Collective operation monitoring */ +static opal_hash_table_t *comm_data = NULL; + +int mca_common_monitoring_coll_cache_name(ompi_communicator_t*comm) +{ + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + data->comm_name = strdup(comm->c_name); + data->p_comm = NULL; + } + return ret; +} + +static inline void mca_common_monitoring_coll_cache(mca_monitoring_coll_data_t*data) +{ + if( -1 == data->world_rank ) { + /* Get current process world_rank */ + mca_common_monitoring_get_world_rank(ompi_comm_rank(data->p_comm), + data->p_comm->c_remote_group, + &data->world_rank); + } + /* Only list procs if the hashtable is already initialized, + i.e.
if the previous call worked */ + if( (-1 != data->world_rank) && (NULL == data->procs || 0 == strlen(data->procs)) ) { + int i, pos = 0, size, world_size = -1, max_length, world_rank; + char*tmp_procs; + size = ompi_comm_size(data->p_comm); + world_size = ompi_comm_size((ompi_communicator_t*)&ompi_mpi_comm_world) - 1; + assert( 0 < size ); + /* Allocate enough space for list: world_size holds the largest rank in MPI_COMM_WORLD (add 1 to keep the final '\0' if already exact size) */ + max_length = snprintf(NULL, 0, "%d,", world_size) + 1; + tmp_procs = malloc((1 + max_length * size) * sizeof(char)); + if( NULL == tmp_procs ) { + OPAL_MONITORING_PRINT_ERR("Cannot allocate memory for caching proc list."); + } else { + tmp_procs[0] = '\0'; + /* Build procs list */ + for(i = 0; i < size; ++i) { + if( OPAL_SUCCESS == mca_common_monitoring_get_world_rank(i, data->p_comm->c_remote_group, &world_rank) ) + pos += sprintf(&tmp_procs[pos], "%d,", world_rank); + } + tmp_procs[pos - 1] = '\0'; /* Remove final comma */ + data->procs = realloc(tmp_procs, pos * sizeof(char)); /* Adjust to size required */ + } + } +} + +mca_monitoring_coll_data_t*mca_common_monitoring_coll_new( ompi_communicator_t*comm ) +{ + mca_monitoring_coll_data_t*data = OBJ_NEW(mca_monitoring_coll_data_t); + if( NULL == data ) { + OPAL_MONITORING_PRINT_ERR("coll: new: data structure cannot be allocated"); + return NULL; + } + + data->p_comm = comm; + + /* Allocate hashtable */ + if( NULL == comm_data ) { + comm_data = OBJ_NEW(opal_hash_table_t); + if( NULL == comm_data ) { + OPAL_MONITORING_PRINT_ERR("coll: new: failed to allocate hashtable"); + return data; + } + opal_hash_table_init(comm_data, 2048); + } + + /* Insert in hashtable */ + uint64_t key = *((uint64_t*)&comm); + if( OPAL_SUCCESS != opal_hash_table_set_value_uint64(comm_data, key, (void*)data) ) { + OPAL_MONITORING_PRINT_ERR("coll: new: failed to allocate memory or " + "to grow the hash table"); + } + + /* Cache data so the procs can be released without affecting the output */ +
mca_common_monitoring_coll_cache(data); + + return data; +} + +void mca_common_monitoring_coll_release(mca_monitoring_coll_data_t*data) +{ +#if OPAL_ENABLE_DEBUG + if( NULL == data ) { + OPAL_MONITORING_PRINT_ERR("coll: release: data structure empty or already deallocated"); + return; + } +#endif /* OPAL_ENABLE_DEBUG */ + + /* Only mark as released: the data is freed once it has been flushed */ + data->is_released = 1; + mca_common_monitoring_coll_cache(data); +} + +static void mca_common_monitoring_coll_cond_release(mca_monitoring_coll_data_t*data) +{ +#if OPAL_ENABLE_DEBUG + if( NULL == data ) { + OPAL_MONITORING_PRINT_ERR("coll: release: data structure empty or already deallocated"); + return; + } +#endif /* OPAL_ENABLE_DEBUG */ + + if( data->is_released ) { /* if the communicator is already released */ + opal_hash_table_remove_value_uint64(comm_data, *((uint64_t*)&data->p_comm)); + data->p_comm = NULL; + free(data->comm_name); + free(data->procs); + OBJ_RELEASE(data); + } +} + +void mca_common_monitoring_coll_finalize( void ) +{ + if( NULL != comm_data ) { + opal_hash_table_remove_all( comm_data ); + OBJ_RELEASE(comm_data); + } +} + +void mca_common_monitoring_coll_flush(FILE *pf, mca_monitoring_coll_data_t*data) +{ + /* Flush data */ + fprintf(pf, + "D\t%s\tprocs: %s\n" + "O2A\t%" PRId32 "\t%zu bytes\t%zu msgs sent\n" + "A2O\t%" PRId32 "\t%zu bytes\t%zu msgs sent\n" + "A2A\t%" PRId32 "\t%zu bytes\t%zu msgs sent\n", + data->comm_name ? data->comm_name : data->p_comm ?
+ data->p_comm->c_name : "(no-name)", data->procs, + data->world_rank, data->o2a_size, data->o2a_count, + data->world_rank, data->a2o_size, data->a2o_count, + data->world_rank, data->a2a_size, data->a2a_count); +} + +void mca_common_monitoring_coll_flush_all(FILE *pf) +{ + if( NULL == comm_data ) return; /* No hashtable */ + + uint64_t key; + mca_monitoring_coll_data_t*previous = NULL, *data; + + OPAL_HASH_TABLE_FOREACH(key, uint64, data, comm_data) { + if( NULL != previous && NULL == previous->p_comm ) { + /* Phase flushed: free the coll_data_t of communicators already released */ + mca_common_monitoring_coll_cond_release(previous); + } + mca_common_monitoring_coll_flush(pf, data); + previous = data; + } + if( NULL != previous ) { /* the hashtable may be empty */ + mca_common_monitoring_coll_cond_release(previous); + } +} + + +void mca_common_monitoring_coll_reset(void) +{ + if( NULL == comm_data ) return; /* No hashtable */ + + uint64_t key; + mca_monitoring_coll_data_t*data; + + OPAL_HASH_TABLE_FOREACH(key, uint64, data, comm_data) { + data->o2a_count = 0; data->o2a_size = 0; + data->a2o_count = 0; data->a2o_size = 0; + data->a2a_count = 0; data->a2a_size = 0; + } +} + +int mca_common_monitoring_coll_messages_notify(mca_base_pvar_t *pvar, + mca_base_pvar_event_t event, + void *obj_handle, + int *count) +{ + switch (event) { + case MCA_BASE_PVAR_HANDLE_BIND: + *count = 1; + /* fall through */ + case MCA_BASE_PVAR_HANDLE_UNBIND: + return OMPI_SUCCESS; + case MCA_BASE_PVAR_HANDLE_START: + mca_common_monitoring_current_state = mca_common_monitoring_enabled; + return OMPI_SUCCESS; + case MCA_BASE_PVAR_HANDLE_STOP: + mca_common_monitoring_current_state = 0; + return OMPI_SUCCESS; + } + + return OMPI_ERROR; +} + +void mca_common_monitoring_coll_o2a(size_t size, mca_monitoring_coll_data_t*data) +{ + if( 0 == mca_common_monitoring_current_state ) return; /* right now the monitoring is not started */ +#if OPAL_ENABLE_DEBUG + if( NULL == data ) { + OPAL_MONITORING_PRINT_ERR("coll: o2a: data structure empty"); + return; + } +#endif /* OPAL_ENABLE_DEBUG */ +
opal_atomic_add_fetch_size_t(&data->o2a_size, size); + opal_atomic_add_fetch_size_t(&data->o2a_count, 1); +} + +int mca_common_monitoring_coll_get_o2a_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + size_t *value_size = (size_t*) value; + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + *value_size = data->o2a_count; + } + return ret; +} + +int mca_common_monitoring_coll_get_o2a_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + size_t *value_size = (size_t*) value; + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + *value_size = data->o2a_size; + } + return ret; +} + +void mca_common_monitoring_coll_a2o(size_t size, mca_monitoring_coll_data_t*data) +{ + if( 0 == mca_common_monitoring_current_state ) return; /* right now the monitoring is not started */ +#if OPAL_ENABLE_DEBUG + if( NULL == data ) { + OPAL_MONITORING_PRINT_ERR("coll: a2o: data structure empty"); + return; + } +#endif /* OPAL_ENABLE_DEBUG */ + opal_atomic_add_fetch_size_t(&data->a2o_size, size); + opal_atomic_add_fetch_size_t(&data->a2o_count, 1); +} + +int mca_common_monitoring_coll_get_a2o_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + size_t *value_size = (size_t*) value; + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + *value_size = data->a2o_count; + } + return ret; +} + +int mca_common_monitoring_coll_get_a2o_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) 
+{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + size_t *value_size = (size_t*) value; + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + *value_size = data->a2o_size; + } + return ret; +} + +void mca_common_monitoring_coll_a2a(size_t size, mca_monitoring_coll_data_t*data) +{ + if( 0 == mca_common_monitoring_current_state ) return; /* right now the monitoring is not started */ +#if OPAL_ENABLE_DEBUG + if( NULL == data ) { + OPAL_MONITORING_PRINT_ERR("coll: a2a: data structure empty"); + return; + } +#endif /* OPAL_ENABLE_DEBUG */ + opal_atomic_add_fetch_size_t(&data->a2a_size, size); + opal_atomic_add_fetch_size_t(&data->a2a_count, 1); +} + +int mca_common_monitoring_coll_get_a2a_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + size_t *value_size = (size_t*) value; + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + *value_size = data->a2a_count; + } + return ret; +} + +int mca_common_monitoring_coll_get_a2a_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle) +{ + ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle; + size_t *value_size = (size_t*) value; + mca_monitoring_coll_data_t*data; + int ret = opal_hash_table_get_value_uint64(comm_data, *((uint64_t*)&comm), (void*)&data); + if( OPAL_SUCCESS == ret ) { + *value_size = data->a2a_size; + } + return ret; +} + +static void mca_monitoring_coll_construct (mca_monitoring_coll_data_t*coll_data) +{ + coll_data->procs = NULL; + coll_data->comm_name = NULL; + coll_data->world_rank = -1; + coll_data->p_comm = NULL; + coll_data->is_released = 0; + coll_data->o2a_count = 0; + coll_data->o2a_size = 0; + coll_data->a2o_count = 0; + coll_data->a2o_size = 0; + 
coll_data->a2a_count = 0; + coll_data->a2a_size = 0; +} + +static void mca_monitoring_coll_destruct (mca_monitoring_coll_data_t*coll_data){} + +OBJ_CLASS_INSTANCE(mca_monitoring_coll_data_t, opal_object_t, mca_monitoring_coll_construct, mca_monitoring_coll_destruct); diff --git a/ompi/mca/common/monitoring/common_monitoring_coll.h b/ompi/mca/common/monitoring/common_monitoring_coll.h new file mode 100644 index 00000000000..3deb4d0ad4f --- /dev/null +++ b/ompi/mca/common/monitoring/common_monitoring_coll.h @@ -0,0 +1,59 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * Copyright (c) 2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_COMMON_MONITORING_COLL_H +#define MCA_COMMON_MONITORING_COLL_H + +BEGIN_C_DECLS + +#include +#include + +OMPI_DECLSPEC void mca_common_monitoring_coll_flush(FILE *pf, mca_monitoring_coll_data_t*data); + +OMPI_DECLSPEC void mca_common_monitoring_coll_flush_all(FILE *pf); + +OMPI_DECLSPEC void mca_common_monitoring_coll_reset( void ); + +OMPI_DECLSPEC int mca_common_monitoring_coll_messages_notify(mca_base_pvar_t *pvar, + mca_base_pvar_event_t event, + void *obj_handle, + int *count); + +OMPI_DECLSPEC int mca_common_monitoring_coll_get_o2a_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle); + +OMPI_DECLSPEC int mca_common_monitoring_coll_get_o2a_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle); + +OMPI_DECLSPEC int mca_common_monitoring_coll_get_a2o_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle); + +OMPI_DECLSPEC int mca_common_monitoring_coll_get_a2o_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle); + +OMPI_DECLSPEC int mca_common_monitoring_coll_get_a2a_count(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle); + +OMPI_DECLSPEC int 
mca_common_monitoring_coll_get_a2a_size(const struct mca_base_pvar_t *pvar, + void *value, + void *obj_handle); + +OMPI_DECLSPEC void mca_common_monitoring_coll_finalize( void ); +END_C_DECLS + +#endif /* MCA_COMMON_MONITORING_COLL_H */ diff --git a/ompi/mca/common/monitoring/configure.m4 b/ompi/mca/common/monitoring/configure.m4 new file mode 100644 index 00000000000..b7632bd4b8d --- /dev/null +++ b/ompi/mca/common/monitoring/configure.m4 @@ -0,0 +1,28 @@ +# -*- shell-script -*- +# +# Copyright (c) 2017 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# MCA_ompi_common_monitoring_CONFIG([action-if-can-compile], +# [action-if-cant-compile]) +# ------------------------------------------------ +AC_DEFUN([MCA_ompi_common_monitoring_CONFIG],[ + AC_CONFIG_FILES([ompi/mca/common/monitoring/Makefile]) + + m4_ifdef([project_ompi], [ + m4_ifdef([MCA_BUILD_ompi_common_monitoring_DSO_TRUE], + [AC_CONFIG_LINKS(profile2mat.pl:test/monitoring/profile2mat.pl + aggregate_profile.pl:test/monitoring/aggregate_profile.pl)])]) + + + AS_IF([test "$MCA_BUILD_ompi_common_monitoring_DSO_TRUE" = ''], + [$1], + [$2]) +])dnl diff --git a/ompi/mca/common/monitoring/monitoring_prof.c b/ompi/mca/common/monitoring/monitoring_prof.c new file mode 100644 index 00000000000..3585c4927cf --- /dev/null +++ b/ompi/mca/common/monitoring/monitoring_prof.c @@ -0,0 +1,444 @@ +/* + * Copyright (c) 2013-2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. + * Copyright (c) 2013-2015 Bull SAS. All rights reserved. + * Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +/* +pml monitoring PMPI profiler + +Designed by: + George Bosilca + Emmanuel Jeannot + Guillaume Papauré + Clément Foyer + +Contact the authors for questions. + +To be run as: + +mpirun -np 4 \ + --mca pml_monitoring_enable 1 \ + -x LD_PRELOAD=ompi_install_dir/lib/ompi_monitoring_prof.so \ + ./my_app + +... +... +... + +writing 4x4 matrix to monitoring_msg.mat +writing 4x4 matrix to monitoring_size.mat +writing 4x4 matrix to monitoring_avg.mat + +*/ + +#include +#include +#include +#include + +static MPI_T_pvar_session session; +static int comm_world_size; +static int comm_world_rank; + +struct monitoring_result +{ + char * pvar_name; + int pvar_idx; + MPI_T_pvar_handle pvar_handle; + size_t * vector; +}; +typedef struct monitoring_result monitoring_result; + +/* PML Sent */ +static monitoring_result pml_counts; +static monitoring_result pml_sizes; +/* OSC Sent */ +static monitoring_result osc_scounts; +static monitoring_result osc_ssizes; +/* OSC Recv */ +static monitoring_result osc_rcounts; +static monitoring_result osc_rsizes; +/* COLL Sent/Recv */ +static monitoring_result coll_counts; +static monitoring_result coll_sizes; + +static int write_mat(char *, size_t *, unsigned int); +static void init_monitoring_result(const char *, monitoring_result *); +static void start_monitoring_result(monitoring_result *); +static void stop_monitoring_result(monitoring_result *); +static void get_monitoring_result(monitoring_result *); +static void destroy_monitoring_result(monitoring_result *); + +int MPI_Init(int* argc, char*** argv) +{ + int result, MPIT_result; + int provided; + + result = PMPI_Init(argc, argv); + + PMPI_Comm_size(MPI_COMM_WORLD, &comm_world_size); + PMPI_Comm_rank(MPI_COMM_WORLD, &comm_world_rank); + + MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to initialize MPI_T interface, which prevents
retrieving monitoring results: check your Open MPI installation\n"); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_session_create(&session); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to create MPI_T session, which prevents retrieving monitoring results: check your Open MPI installation\n"); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + init_monitoring_result("pml_monitoring_messages_count", &pml_counts); + init_monitoring_result("pml_monitoring_messages_size", &pml_sizes); + init_monitoring_result("osc_monitoring_messages_sent_count", &osc_scounts); + init_monitoring_result("osc_monitoring_messages_sent_size", &osc_ssizes); + init_monitoring_result("osc_monitoring_messages_recv_count", &osc_rcounts); + init_monitoring_result("osc_monitoring_messages_recv_size", &osc_rsizes); + init_monitoring_result("coll_monitoring_messages_count", &coll_counts); + init_monitoring_result("coll_monitoring_messages_size", &coll_sizes); + + start_monitoring_result(&pml_counts); + start_monitoring_result(&pml_sizes); + start_monitoring_result(&osc_scounts); + start_monitoring_result(&osc_ssizes); + start_monitoring_result(&osc_rcounts); + start_monitoring_result(&osc_rsizes); + start_monitoring_result(&coll_counts); + start_monitoring_result(&coll_sizes); + + return result; +} + +int MPI_Finalize(void) +{ + int result, MPIT_result; + size_t * exchange_count_matrix_1 = NULL; + size_t * exchange_size_matrix_1 = NULL; + size_t * exchange_count_matrix_2 = NULL; + size_t * exchange_size_matrix_2 = NULL; + size_t * exchange_all_size_matrix = NULL; + size_t * exchange_all_count_matrix = NULL; + size_t * exchange_all_avg_matrix = NULL; + + stop_monitoring_result(&pml_counts); + stop_monitoring_result(&pml_sizes); + stop_monitoring_result(&osc_scounts); + stop_monitoring_result(&osc_ssizes); + stop_monitoring_result(&osc_rcounts); + stop_monitoring_result(&osc_rsizes); + stop_monitoring_result(&coll_counts); +
stop_monitoring_result(&coll_sizes); + + get_monitoring_result(&pml_counts); + get_monitoring_result(&pml_sizes); + get_monitoring_result(&osc_scounts); + get_monitoring_result(&osc_ssizes); + get_monitoring_result(&osc_rcounts); + get_monitoring_result(&osc_rsizes); + get_monitoring_result(&coll_counts); + get_monitoring_result(&coll_sizes); + + if (0 == comm_world_rank) { + exchange_count_matrix_1 = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + exchange_size_matrix_1 = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + exchange_count_matrix_2 = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + exchange_size_matrix_2 = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + exchange_all_size_matrix = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + exchange_all_count_matrix = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + exchange_all_avg_matrix = (size_t *) calloc(comm_world_size * comm_world_size, sizeof(size_t)); + } + + /* Gather PML and COLL results */ + PMPI_Gather(pml_counts.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_count_matrix_1, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + PMPI_Gather(pml_sizes.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_size_matrix_1, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + PMPI_Gather(coll_counts.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_count_matrix_2, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + PMPI_Gather(coll_sizes.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_size_matrix_2, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + + if (0 == comm_world_rank) { + int i, j; + + for (i = 0; i < comm_world_size; ++i) { + for (j = i + 1; j < comm_world_size; ++j) { + /* Reduce PML results */ + exchange_count_matrix_1[i * comm_world_size + j] = exchange_count_matrix_1[j * comm_world_size + i] = (exchange_count_matrix_1[i * 
comm_world_size + j] + exchange_count_matrix_1[j * comm_world_size + i]) / 2; + exchange_size_matrix_1[i * comm_world_size + j] = exchange_size_matrix_1[j * comm_world_size + i] = (exchange_size_matrix_1[i * comm_world_size + j] + exchange_size_matrix_1[j * comm_world_size + i]) / 2; + if (exchange_count_matrix_1[i * comm_world_size + j] != 0) + exchange_all_size_matrix[i * comm_world_size + j] = exchange_all_size_matrix[j * comm_world_size + i] = exchange_size_matrix_1[i * comm_world_size + j] / exchange_count_matrix_1[i * comm_world_size + j]; + + /* Reduce COLL results */ + exchange_count_matrix_2[i * comm_world_size + j] = exchange_count_matrix_2[j * comm_world_size + i] = (exchange_count_matrix_2[i * comm_world_size + j] + exchange_count_matrix_2[j * comm_world_size + i]) / 2; + exchange_size_matrix_2[i * comm_world_size + j] = exchange_size_matrix_2[j * comm_world_size + i] = (exchange_size_matrix_2[i * comm_world_size + j] + exchange_size_matrix_2[j * comm_world_size + i]) / 2; + if (exchange_count_matrix_2[i * comm_world_size + j] != 0) + exchange_all_count_matrix[i * comm_world_size + j] = exchange_all_count_matrix[j * comm_world_size + i] = exchange_size_matrix_2[i * comm_world_size + j] / exchange_count_matrix_2[i * comm_world_size + j]; + } + } + + /* Write PML matrices */ + write_mat("monitoring_pml_msg.mat", exchange_count_matrix_1, comm_world_size); + write_mat("monitoring_pml_size.mat", exchange_size_matrix_1, comm_world_size); + write_mat("monitoring_pml_avg.mat", exchange_all_size_matrix, comm_world_size); + + /* Write COLL matrices */ + write_mat("monitoring_coll_msg.mat", exchange_count_matrix_2, comm_world_size); + write_mat("monitoring_coll_size.mat", exchange_size_matrix_2, comm_world_size); + write_mat("monitoring_coll_avg.mat", exchange_all_count_matrix, comm_world_size); + + /* Aggregate PML and COLL in ALL matrices */ + for (i = 0; i < comm_world_size; ++i) { + for (j = i + 1; j < comm_world_size; ++j) { + exchange_all_size_matrix[i * 
comm_world_size + j] = exchange_all_size_matrix[j * comm_world_size + i] = exchange_size_matrix_1[i * comm_world_size + j] + exchange_size_matrix_2[i * comm_world_size + j]; + exchange_all_count_matrix[i * comm_world_size + j] = exchange_all_count_matrix[j * comm_world_size + i] = exchange_count_matrix_1[i * comm_world_size + j] + exchange_count_matrix_2[i * comm_world_size + j]; + } + } + } + + /* Gather OSC results */ + PMPI_Gather(osc_scounts.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_count_matrix_1, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + PMPI_Gather(osc_ssizes.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_size_matrix_1, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + PMPI_Gather(osc_rcounts.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_count_matrix_2, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + PMPI_Gather(osc_rsizes.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_size_matrix_2, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); + + if (0 == comm_world_rank) { + int i, j; + + for (i = 0; i < comm_world_size; ++i) { + for (j = i + 1; j < comm_world_size; ++j) { + /* Reduce OSC results */ + exchange_count_matrix_1[i * comm_world_size + j] = exchange_count_matrix_1[j * comm_world_size + i] = (exchange_count_matrix_1[i * comm_world_size + j] + exchange_count_matrix_1[j * comm_world_size + i] + exchange_count_matrix_2[i * comm_world_size + j] + exchange_count_matrix_2[j * comm_world_size + i]) / 2; + exchange_size_matrix_1[i * comm_world_size + j] = exchange_size_matrix_1[j * comm_world_size + i] = (exchange_size_matrix_1[i * comm_world_size + j] + exchange_size_matrix_1[j * comm_world_size + i] + exchange_size_matrix_2[i * comm_world_size + j] + exchange_size_matrix_2[j * comm_world_size + i]) / 2; + if (exchange_count_matrix_1[i * comm_world_size + j] != 0) + exchange_all_avg_matrix[i * comm_world_size + j] = exchange_all_avg_matrix[j * comm_world_size + i] = exchange_size_matrix_1[i * 
comm_world_size + j] / exchange_count_matrix_1[i * comm_world_size + j]; + } + } + + /* Write OSC matrices */ + write_mat("monitoring_osc_msg.mat", exchange_count_matrix_1, comm_world_size); + write_mat("monitoring_osc_size.mat", exchange_size_matrix_1, comm_world_size); + write_mat("monitoring_osc_avg.mat", exchange_all_avg_matrix, comm_world_size); + + /* Aggregate OSC in ALL matrices and compute AVG */ + for (i = 0; i < comm_world_size; ++i) { + for (j = i + 1; j < comm_world_size; ++j) { + exchange_all_size_matrix[i * comm_world_size + j] = exchange_all_size_matrix[j * comm_world_size + i] += exchange_size_matrix_1[i * comm_world_size + j]; + exchange_all_count_matrix[i * comm_world_size + j] = exchange_all_count_matrix[j * comm_world_size + i] += exchange_count_matrix_1[i * comm_world_size + j]; + if (exchange_all_count_matrix[i * comm_world_size + j] != 0) + exchange_all_avg_matrix[i * comm_world_size + j] = exchange_all_avg_matrix[j * comm_world_size + i] = exchange_all_size_matrix[i * comm_world_size + j] / exchange_all_count_matrix[i * comm_world_size + j]; + } + } + + /* Write ALL matrices */ + write_mat("monitoring_all_msg.mat", exchange_all_count_matrix, comm_world_size); + write_mat("monitoring_all_size.mat", exchange_all_size_matrix, comm_world_size); + write_mat("monitoring_all_avg.mat", exchange_all_avg_matrix, comm_world_size); + + /* Free matrices */ + free(exchange_count_matrix_1); + free(exchange_size_matrix_1); + free(exchange_count_matrix_2); + free(exchange_size_matrix_2); + free(exchange_all_count_matrix); + free(exchange_all_size_matrix); + free(exchange_all_avg_matrix); + } + + destroy_monitoring_result(&pml_counts); + destroy_monitoring_result(&pml_sizes); + destroy_monitoring_result(&osc_scounts); + destroy_monitoring_result(&osc_ssizes); + destroy_monitoring_result(&osc_rcounts); + destroy_monitoring_result(&osc_rsizes); + destroy_monitoring_result(&coll_counts); + destroy_monitoring_result(&coll_sizes); + + MPIT_result = 
MPI_T_pvar_session_free(&session); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "WARNING : failed to free MPI_T session, monitoring results may be impacted : check your Open MPI installation\n"); + } + + MPIT_result = MPI_T_finalize(); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "WARNING : failed to finalize MPI_T interface, monitoring results may be impacted : check your Open MPI installation\n"); + } + + result = PMPI_Finalize(); + + return result; +} + +void init_monitoring_result(const char * pvar_name, monitoring_result * res) +{ + int count; + int MPIT_result; + MPI_Comm comm_world = MPI_COMM_WORLD; + + res->pvar_name = strdup(pvar_name); + + MPIT_result = MPI_T_pvar_get_index(res->pvar_name, MPI_T_PVAR_CLASS_SIZE, &(res->pvar_idx)); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : cannot find monitoring MPI_T \"%s\" pvar, check that you have the monitoring pml\n", pvar_name); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_handle_alloc(session, res->pvar_idx, comm_world, &(res->pvar_handle), &count); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to allocate handle on \"%s\" pvar, check that you have the monitoring pml\n", pvar_name); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + if (count != comm_world_size) { + fprintf(stderr, "ERROR : COMM_WORLD has %d ranks but \"%s\" pvar contains %d values, check that you have the monitoring pml\n", comm_world_size, pvar_name, count); + PMPI_Abort(MPI_COMM_WORLD, count); + } + + res->vector = (size_t *) malloc(comm_world_size * sizeof(size_t)); +} + +void start_monitoring_result(monitoring_result * res) +{ + int MPIT_result; + + MPIT_result = MPI_T_pvar_start(session, res->pvar_handle); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to start handle on \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } +} + +void
stop_monitoring_result(monitoring_result * res) +{ + int MPIT_result; + + MPIT_result = MPI_T_pvar_stop(session, res->pvar_handle); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to stop handle on \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } +} + +void get_monitoring_result(monitoring_result * res) +{ + int MPIT_result; + + MPIT_result = MPI_T_pvar_read(session, res->pvar_handle, res->vector); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to read \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } +} + +void destroy_monitoring_result(monitoring_result * res) +{ + int MPIT_result; + + MPIT_result = MPI_T_pvar_handle_free(session, &(res->pvar_handle)); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "ERROR : failed to free handle on \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); + PMPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + free(res->pvar_name); + free(res->vector); +} + +int write_mat(char * filename, size_t * mat, unsigned int dim) +{ + FILE *matrix_file; + unsigned int i, j; + + matrix_file = fopen(filename, "w"); + if (!matrix_file) { + fprintf(stderr, "ERROR : failed to open \"%s\" file in write mode, check your permissions\n", filename); + return -1; + } + + printf("writing %ux%u matrix to %s\n", dim, dim, filename); + + for (i = 0; i < dim; ++i) { + for (j = 0; j < dim; ++j) { + fprintf(matrix_file, "%zu ", mat[i * dim + j]); + } + fprintf(matrix_file, "\n"); + } + fflush(matrix_file); + fclose(matrix_file); + + return 0; +} + +/** + * MPI binding for Fortran + */ + +#include +#include "ompi_config.h" +#include "opal/threads/thread_usage.h" +#include "ompi/mpi/fortran/base/constants.h" +#include "ompi/mpi/fortran/base/fint_2_int.h" + +void monitoring_prof_mpi_init_f2c( MPI_Fint * ); +void
monitoring_prof_mpi_finalize_f2c( MPI_Fint * ); + +void monitoring_prof_mpi_init_f2c( MPI_Fint *ierr ) { + int c_ierr; + int argc = 0; + char ** argv = NULL; + + c_ierr = MPI_Init(&argc, &argv); + if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); +} + +void monitoring_prof_mpi_finalize_f2c( MPI_Fint *ierr ) { + int c_ierr; + + c_ierr = MPI_Finalize(); + if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); +} + +#if OPAL_HAVE_WEAK_SYMBOLS +#pragma weak MPI_INIT = monitoring_prof_mpi_init_f2c +#pragma weak mpi_init = monitoring_prof_mpi_init_f2c +#pragma weak mpi_init_ = monitoring_prof_mpi_init_f2c +#pragma weak mpi_init__ = monitoring_prof_mpi_init_f2c +#pragma weak MPI_Init_f = monitoring_prof_mpi_init_f2c +#pragma weak MPI_Init_f08 = monitoring_prof_mpi_init_f2c + +#pragma weak MPI_FINALIZE = monitoring_prof_mpi_finalize_f2c +#pragma weak mpi_finalize = monitoring_prof_mpi_finalize_f2c +#pragma weak mpi_finalize_ = monitoring_prof_mpi_finalize_f2c +#pragma weak mpi_finalize__ = monitoring_prof_mpi_finalize_f2c +#pragma weak MPI_Finalize_f = monitoring_prof_mpi_finalize_f2c +#pragma weak MPI_Finalize_f08 = monitoring_prof_mpi_finalize_f2c +#elif OMPI_BUILD_FORTRAN_BINDINGS +#define OMPI_F77_PROTOTYPES_MPI_H +#include "ompi/mpi/fortran/mpif-h/bindings.h" + +OMPI_GENERATE_F77_BINDINGS (MPI_INIT, + mpi_init, + mpi_init_, + mpi_init__, + monitoring_prof_mpi_init_f2c, + (MPI_Fint *ierr), + (ierr) ) + +OMPI_GENERATE_F77_BINDINGS (MPI_FINALIZE, + mpi_finalize, + mpi_finalize_, + mpi_finalize__, + monitoring_prof_mpi_finalize_f2c, + (MPI_Fint *ierr), + (ierr) ) +#endif diff --git a/test/monitoring/profile2mat.pl b/ompi/mca/common/monitoring/profile2mat.pl similarity index 92% rename from test/monitoring/profile2mat.pl rename to ompi/mca/common/monitoring/profile2mat.pl index a6ea6a52bb4..69275a24ff5 100644 --- a/test/monitoring/profile2mat.pl +++ b/ompi/mca/common/monitoring/profile2mat.pl @@ -4,7 +4,7 @@ # Copyright (c) 2013-2015 The University of Tennessee and The 
University # of Tennessee Research Foundation. All rights # reserved. -# Copyright (c) 2013-2015 Inria. All rights reserved. +# Copyright (c) 2013-2016 Inria. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -35,9 +35,11 @@ $filename=$ARGV[0]; } -profile($filename,"I|E","all"); +profile($filename,"I|E|S|R|C","all"); if ( profile($filename,"E","external") ){ - profile($filename,"I","internal"); + profile($filename,"I","internal"); + profile($filename,"S|R","osc"); + profile($filename,"C","coll"); } sub profile{ diff --git a/ompi/mca/common/ompio/Makefile.am b/ompi/mca/common/ompio/Makefile.am index 76e08a5986c..0b14e910b53 100644 --- a/ompi/mca/common/ompio/Makefile.am +++ b/ompi/mca/common/ompio/Makefile.am @@ -11,6 +11,8 @@ # All rights reserved. # Copyright (c) 2008-2016 University of Houston. All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # # $COPYRIGHT$ # @@ -19,8 +21,6 @@ # $HEADER$ # -if OMPI_PROVIDE_MPI_FILE_INTERFACE - headers = \ common_ompio_print_queue.h \ common_ompio.h @@ -88,14 +88,3 @@ clean-local: if test -z "$(lib_LTLIBRARIES)"; then \ rm -f "$(comp_inst)"; \ fi - -else - -# Need to have empty targets because AM can't handle having an -# AM_CONDITIONAL was targets in the "if" statement but not in the -# "else". 
:-( - -all-local: -clean-local: - -endif diff --git a/ompi/mca/common/ompio/common_ompio.h b/ompi/mca/common/ompio/common_ompio.h index bebdcf72802..2edb9d280c8 100644 --- a/ompi/mca/common/ompio/common_ompio.h +++ b/ompi/mca/common/ompio/common_ompio.h @@ -49,7 +49,8 @@ OMPI_DECLSPEC int mca_common_ompio_file_iwrite_at_all (mca_io_ompio_file_t *fp, OMPI_DECLSPEC int mca_common_ompio_build_io_array ( mca_io_ompio_file_t *fh, int index, int cycles, size_t bytes_per_cycle, int max_data, uint32_t iov_count, - struct iovec *decoded_iov, int *ii, int *jj, size_t *tbw ); + struct iovec *decoded_iov, int *ii, int *jj, size_t *tbw, + size_t *spc ); OMPI_DECLSPEC int mca_common_ompio_file_read (mca_io_ompio_file_t *fh, void *buf, int count, @@ -75,7 +76,7 @@ OMPI_DECLSPEC int mca_common_ompio_file_iread_at_all (mca_io_ompio_file_t *fp, O ompi_request_t **request); OMPI_DECLSPEC int mca_common_ompio_file_open (ompi_communicator_t *comm, const char *filename, - int amode, ompi_info_t *info, + int amode, opal_info_t *info, mca_io_ompio_file_t *ompio_fh, bool use_sharedfp); OMPI_DECLSPEC int mca_common_ompio_file_close (mca_io_ompio_file_t *ompio_fh); @@ -85,7 +86,7 @@ OMPI_DECLSPEC int mca_common_ompio_set_explicit_offset (mca_io_ompio_file_t *fh, OMPI_DECLSPEC int mca_common_ompio_set_file_defaults (mca_io_ompio_file_t *fh); OMPI_DECLSPEC int mca_common_ompio_set_view (mca_io_ompio_file_t *fh, OMPI_MPI_OFFSET_TYPE disp, ompi_datatype_t *etype, ompi_datatype_t *filetype, const char *datarep, - ompi_info_t *info); + opal_info_t *info); #endif /* MCA_COMMON_OMPIO_H */ diff --git a/ompi/mca/common/ompio/common_ompio_file_open.c b/ompi/mca/common/ompio/common_ompio_file_open.c index 8aa809a2581..db9bbe930ab 100644 --- a/ompi/mca/common/ompio/common_ompio_file_open.c +++ b/ompi/mca/common/ompio/common_ompio_file_open.c @@ -13,6 +13,7 @@ * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -43,7 +44,7 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, const char *filename, int amode, - ompi_info_t *info, + opal_info_t *info, mca_io_ompio_file_t *ompio_fh, bool use_sharedfp) { int ret = OMPI_SUCCESS; @@ -89,6 +90,7 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, ompio_fh->f_amode = amode; ompio_fh->f_info = info; ompio_fh->f_atomicity = 0; + ompio_fh->f_fs_block_size = 4096; mca_common_ompio_set_file_defaults (ompio_fh); ompio_fh->f_filename = filename; @@ -104,16 +106,17 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, ompio_fh->f_decode_datatype=ompi_io_ompio_decode_datatype; ompio_fh->f_generate_current_file_view=ompi_io_ompio_generate_current_file_view; - ompio_fh->f_get_num_aggregators=mca_io_ompio_get_num_aggregators; - ompio_fh->f_get_bytes_per_agg=mca_io_ompio_get_bytes_per_agg; + ompio_fh->f_get_mca_parameter_value=mca_io_ompio_get_mca_parameter_value; ompio_fh->f_set_aggregator_props=mca_io_ompio_set_aggregator_props; /* This fix is needed for data seiving to work with two-phase collective I/O */ - if ((amode & MPI_MODE_WRONLY)){ - amode -= MPI_MODE_WRONLY; - amode += MPI_MODE_RDWR; - } + if ( mca_io_ompio_overwrite_amode && !(amode & MPI_MODE_SEQUENTIAL) ) { + if ((amode & MPI_MODE_WRONLY)){ + amode -= MPI_MODE_WRONLY; + amode += MPI_MODE_RDWR; + } + } /*--------------------------------------------------*/ @@ -128,11 +131,6 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, goto fn_fail; } - if (OMPI_SUCCESS != (ret = mca_fcoll_base_file_select (ompio_fh, - NULL))) { - opal_output(1, "mca_fcoll_base_file_select() failed\n"); - goto fn_fail; - } ompio_fh->f_sharedfp_component = NULL; /*component*/ ompio_fh->f_sharedfp = NULL; /*module*/ @@ -153,14 +151,6 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, 
ompio_fh->f_flags |= OMPIO_SHAREDFP_IS_SET; } - /*Determine topology information if set*/ - if (ompio_fh->f_comm->c_flags & OMPI_COMM_CART){ - ret = mca_io_ompio_cart_based_grouping(ompio_fh); - if(OMPI_SUCCESS != ret ){ - ret = MPI_ERR_FILE; - } - } - ret = ompio_fh->f_fs->fs_file_open (comm, filename, amode, @@ -168,7 +158,15 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, ompio_fh); if ( OMPI_SUCCESS != ret ) { - ret = MPI_ERR_FILE; +#ifdef OMPIO_DEBUG + opal_output(1, "fs_file failed, error code %d\n", ret); +#endif + goto fn_fail; + } + + if (OMPI_SUCCESS != (ret = mca_fcoll_base_file_select (ompio_fh, + NULL))) { + opal_output(1, "mca_fcoll_base_file_select() failed\n"); goto fn_fail; } @@ -207,12 +205,12 @@ int mca_common_ompio_file_open (ompi_communicator_t *comm, !mca_io_ompio_sharedfp_lazy_open ) { shared_fp_base_module = ompio_fh->f_sharedfp; ret = shared_fp_base_module->sharedfp_seek(ompio_fh,current_size, MPI_SEEK_SET); - } - else { - opal_output(1, "mca_common_ompio_file_open: Could not adjust position of " - "shared file pointer with MPI_MODE_APPEND\n"); - ret = MPI_ERR_OTHER; - goto fn_fail; + if ( MPI_SUCCESS != ret ) { + opal_output(1, "mca_common_ompio_file_open: Could not adjust position of " + "shared file pointer with MPI_MODE_APPEND\n"); + ret = MPI_ERR_OTHER; + goto fn_fail; + } } } @@ -281,7 +279,7 @@ int mca_common_ompio_file_close (mca_io_ompio_file_t *ompio_fh) ret = ompio_fh->f_fs->fs_file_close (ompio_fh); } if ( delete_flag && 0 == ompio_fh->f_rank ) { - mca_io_ompio_file_delete ( ompio_fh->f_filename, MPI_INFO_NULL ); + mca_io_ompio_file_delete ( ompio_fh->f_filename, &(MPI_INFO_NULL->super) ); } if ( NULL != ompio_fh->f_fs ) { @@ -400,7 +398,7 @@ int mca_common_ompio_set_file_defaults (mca_io_ompio_file_t *fh) if (NULL != fh) { ompi_datatype_t *types[2]; int blocklen[2] = {1, 1}; - OPAL_PTRDIFF_TYPE d[2], base; + ptrdiff_t d[2], base; int i; fh->f_io_array = NULL; @@ -427,7 +425,7 @@ int 
mca_common_ompio_set_file_defaults (mca_io_ompio_file_t *fh) /* Default file View */ fh->f_iov_type = MPI_DATATYPE_NULL; - fh->f_stripe_size = mca_io_ompio_bytes_per_agg; + fh->f_stripe_size = 0; /*Decoded iovec of the file-view*/ fh->f_decoded_iov = NULL; fh->f_etype = NULL; @@ -446,8 +444,8 @@ int mca_common_ompio_set_file_defaults (mca_io_ompio_file_t *fh) types[0] = &ompi_mpi_long.dt; types[1] = &ompi_mpi_long.dt; - d[0] = (OPAL_PTRDIFF_TYPE) fh->f_decoded_iov; - d[1] = (OPAL_PTRDIFF_TYPE) &fh->f_decoded_iov[0].iov_len; + d[0] = (ptrdiff_t) fh->f_decoded_iov; + d[1] = (ptrdiff_t) &fh->f_decoded_iov[0].iov_len; base = d[0]; for (i=0 ; i<2 ; i++) { diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c index c8fb15e5a2c..284e8101e34 100644 --- a/ompi/mca/common/ompio/common_ompio_file_read.c +++ b/ompi/mca/common/ompio/common_ompio_file_read.c @@ -67,6 +67,7 @@ int mca_common_ompio_file_read (mca_io_ompio_file_t *fh, struct iovec *decoded_iov = NULL; size_t max_data=0, real_bytes_read=0; + size_t spc=0; ssize_t ret_code=0; int i = 0; /* index into the decoded iovec of the buffer */ int j = 0; /* index into the file vie iovec */ @@ -117,7 +118,8 @@ int mca_common_ompio_file_read (mca_io_ompio_file_t *fh, decoded_iov, &i, &j, - &total_bytes_read); + &total_bytes_read, + &spc); if (fh->f_num_of_io_entries) { ret_code = fh->f_fbtl->fbtl_preadv (fh); @@ -181,6 +183,7 @@ int mca_common_ompio_file_iread (mca_io_ompio_file_t *fh, { int ret = OMPI_SUCCESS; mca_ompio_request_t *ompio_req=NULL; + size_t spc=0; ompio_req = OBJ_NEW(mca_ompio_request_t); ompio_req->req_type = MCA_OMPIO_REQUEST_READ; @@ -226,7 +229,8 @@ int mca_common_ompio_file_iread (mca_io_ompio_file_t *fh, decoded_iov, &i, &j, - &total_bytes_read); + &total_bytes_read, + &spc); if (fh->f_num_of_io_entries) { fh->f_fbtl->fbtl_ipreadv (fh, (ompi_request_t *) ompio_req); diff --git a/ompi/mca/common/ompio/common_ompio_file_view.c 
b/ompi/mca/common/ompio/common_ompio_file_view.c index 43b42ee72eb..a41cc6e964f 100644 --- a/ompi/mca/common/ompio/common_ompio_file_view.c +++ b/ompi/mca/common/ompio/common_ompio_file_view.c @@ -9,7 +9,10 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2008-2017 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -25,8 +28,9 @@ #include "common_ompio.h" #include "ompi/mca/fcoll/base/base.h" +#include "ompi/mca/topo/topo.h" -static OMPI_MPI_OFFSET_TYPE get_contiguous_chunk_size (mca_io_ompio_file_t *); +static OMPI_MPI_OFFSET_TYPE get_contiguous_chunk_size (mca_io_ompio_file_t *, int flag); static int datatype_duplicate (ompi_datatype_t *oldtype, ompi_datatype_t **newtype ); static int datatype_duplicate (ompi_datatype_t *oldtype, ompi_datatype_t **newtype ) { @@ -54,16 +58,16 @@ int mca_common_ompio_set_view (mca_io_ompio_file_t *fh, ompi_datatype_t *etype, ompi_datatype_t *filetype, const char *datarep, - ompi_info_t *info) + opal_info_t *info) { - + int ret=OMPI_SUCCESS; size_t max_data = 0; int i; int num_groups = 0; - mca_io_ompio_contg *contg_groups; + mca_io_ompio_contg *contg_groups=NULL; size_t ftype_size; - OPAL_PTRDIFF_TYPE ftype_extent, lb, ub; + ptrdiff_t ftype_extent, lb, ub; ompi_datatype_t *newfiletype; if ( NULL != fh->f_etype ) { @@ -92,7 +96,6 @@ int mca_common_ompio_set_view (mca_io_ompio_file_t *fh, if ( fh->f_flags & OMPIO_UNIFORM_FVIEW ) { fh->f_flags &= ~OMPIO_UNIFORM_FVIEW; } - fh->f_flags |= OMPIO_FILE_VIEW_IS_SET; fh->f_datarep = strdup (datarep); datatype_duplicate (filetype, &fh->f_orig_filetype ); @@ -101,7 +104,7 @@ int 
mca_common_ompio_set_view (mca_io_ompio_file_t *fh, if ( etype == filetype && ompi_datatype_is_predefined (filetype ) && - ftype_extent == (OPAL_PTRDIFF_TYPE)ftype_size ){ + ftype_extent == (ptrdiff_t)ftype_size ){ ompi_datatype_create_contiguous(MCA_IO_DEFAULT_FILE_VIEW_SIZE, &ompi_mpi_byte.dt, &newfiletype); @@ -109,6 +112,7 @@ int mca_common_ompio_set_view (mca_io_ompio_file_t *fh, } else { newfiletype = filetype; + fh->f_flags |= OMPIO_FILE_VIEW_IS_SET; } fh->f_iov_count = 0; @@ -135,11 +139,17 @@ int mca_common_ompio_set_view (mca_io_ompio_file_t *fh, // in orig_file type, No need to set args on this one. ompi_datatype_duplicate (newfiletype, &fh->f_filetype); - fh->f_cc_size = get_contiguous_chunk_size (fh); + + if( SIMPLE_PLUS == mca_io_ompio_grouping_option ) { + fh->f_cc_size = get_contiguous_chunk_size (fh, 1); + } + else { + fh->f_cc_size = get_contiguous_chunk_size (fh, 0); + } if (opal_datatype_is_contiguous_memory_layout(&etype->super,1)) { if (opal_datatype_is_contiguous_memory_layout(&filetype->super,1) && - fh->f_view_extent == (OPAL_PTRDIFF_TYPE)fh->f_view_size ) { + fh->f_view_extent == (ptrdiff_t)fh->f_view_size ) { fh->f_flags |= OMPIO_CONTIGUOUS_FVIEW; } } @@ -162,54 +172,95 @@ int mca_common_ompio_set_view (mca_io_ompio_file_t *fh, } } - if ( SIMPLE != mca_io_ompio_grouping_option ) { - if( OMPI_SUCCESS != mca_io_ompio_fview_based_grouping(fh, - &num_groups, - contg_groups)){ + if ( SIMPLE != mca_io_ompio_grouping_option && SIMPLE_PLUS != mca_io_ompio_grouping_option ) { + + ret = mca_io_ompio_fview_based_grouping(fh, + &num_groups, + contg_groups); + if ( OMPI_SUCCESS != ret ) { opal_output(1, "mca_common_ompio_set_view: mca_io_ompio_fview_based_grouping failed\n"); - free(contg_groups); - return OMPI_ERROR; + goto exit; } } else { - if( OMPI_SUCCESS != mca_io_ompio_simple_grouping(fh, - &num_groups, - contg_groups)){ - opal_output(1, "mca_common_ompio_set_view: mca_io_ompio_simple_grouping failed\n"); - free(contg_groups); - return 
OMPI_ERROR; + int done=0; + int ndims; + + if ( fh->f_comm->c_flags & OMPI_COMM_CART ){ + ret = fh->f_comm->c_topo->topo.cart.cartdim_get( fh->f_comm, &ndims); + if ( OMPI_SUCCESS != ret ){ + goto exit; + } + if ( ndims > 1 ) { + ret = mca_io_ompio_cart_based_grouping( fh, + &num_groups, + contg_groups); + if (OMPI_SUCCESS != ret ) { + opal_output(1, "mca_common_ompio_set_view: mca_io_ompio_cart_based_grouping failed\n"); + goto exit; + } + done=1; + } + } + + if ( !done ) { + ret = mca_io_ompio_simple_grouping(fh, + &num_groups, + contg_groups); + if ( OMPI_SUCCESS != ret ){ + opal_output(1, "mca_common_ompio_set_view: mca_io_ompio_simple_grouping failed\n"); + goto exit; + } } } - - - if ( OMPI_SUCCESS != mca_io_ompio_finalize_initial_grouping(fh, - num_groups, - contg_groups) ){ - opal_output(1, "mca_common_ompio_set_view: mca_io_ompio_finalize_initial_grouping failed\n"); - free(contg_groups); - return OMPI_ERROR; + +#ifdef DEBUG_OMPIO + if ( fh->f_rank == 0) { + int ii, jj; + printf("BEFORE finalize_init: comm size = %d num_groups = %d\n", fh->f_size, num_groups); + for ( ii=0; ii< num_groups; ii++ ) { + printf("contg_groups[%d].procs_per_contg_group=%d\n", ii, contg_groups[ii].procs_per_contg_group); + printf("contg_groups[%d].procs_in_contg_group.[", ii); + + for ( jj=0; jj< contg_groups[ii].procs_per_contg_group; jj++ ) { + printf("%d,", contg_groups[ii].procs_in_contg_group[jj]); + } + printf("]\n"); + } } - for( i = 0; i < fh->f_size; i++){ - free(contg_groups[i].procs_in_contg_group); +#endif + + ret = mca_io_ompio_finalize_initial_grouping(fh, + num_groups, + contg_groups); + if ( OMPI_SUCCESS != ret ) { + opal_output(1, "mca_common_ompio_set_view: mca_io_ompio_finalize_initial_grouping failed\n"); + goto exit; } - free(contg_groups); if ( etype == filetype && ompi_datatype_is_predefined (filetype ) && - ftype_extent == (OPAL_PTRDIFF_TYPE)ftype_size ){ + ftype_extent == (ptrdiff_t)ftype_size ){ ompi_datatype_destroy ( &newfiletype ); } - if 
(OMPI_SUCCESS != mca_fcoll_base_file_select (fh, NULL)) { + ret = mca_fcoll_base_file_select (fh, NULL); + if ( OMPI_SUCCESS != ret ) { opal_output(1, "mca_common_ompio_set_view: mca_fcoll_base_file_select() failed\n"); - return OMPI_ERROR; + goto exit; } - return OMPI_SUCCESS; +exit: + for( i = 0; i < fh->f_size; i++){ + free(contg_groups[i].procs_in_contg_group); + } + free(contg_groups); + + return ret; } -OMPI_MPI_OFFSET_TYPE get_contiguous_chunk_size (mca_io_ompio_file_t *fh) +OMPI_MPI_OFFSET_TYPE get_contiguous_chunk_size (mca_io_ompio_file_t *fh, int flag) { int uniform = 0; OMPI_MPI_OFFSET_TYPE avg[3] = {0,0,0}; @@ -224,60 +275,66 @@ OMPI_MPI_OFFSET_TYPE get_contiguous_chunk_size (mca_io_ompio_file_t *fh) ** 2. each section in the file view has exactly the same size */ - for (i=0 ; i<(int)fh->f_iov_count ; i++) { - avg[0] += fh->f_decoded_iov[i].iov_len; - if (i && 0 == uniform) { - if (fh->f_decoded_iov[i].iov_len != fh->f_decoded_iov[i-1].iov_len) { - uniform = 1; + if ( flag ) { + global_avg[0] = MCA_IO_DEFAULT_FILE_VIEW_SIZE; + } + else { + for (i=0 ; i<(int)fh->f_iov_count ; i++) { + avg[0] += fh->f_decoded_iov[i].iov_len; + if (i && 0 == uniform) { + if (fh->f_decoded_iov[i].iov_len != fh->f_decoded_iov[i-1].iov_len) { + uniform = 1; + } } } - } - if ( 0 != fh->f_iov_count ) { - avg[0] = avg[0]/fh->f_iov_count; - } - avg[1] = (OMPI_MPI_OFFSET_TYPE) fh->f_iov_count; - avg[2] = (OMPI_MPI_OFFSET_TYPE) uniform; - - fh->f_comm->c_coll->coll_allreduce (avg, - global_avg, - 3, - OMPI_OFFSET_DATATYPE, - MPI_SUM, - fh->f_comm, - fh->f_comm->c_coll->coll_allreduce_module); - global_avg[0] = global_avg[0]/fh->f_size; - global_avg[1] = global_avg[1]/fh->f_size; - + if ( 0 != fh->f_iov_count ) { + avg[0] = avg[0]/fh->f_iov_count; + } + avg[1] = (OMPI_MPI_OFFSET_TYPE) fh->f_iov_count; + avg[2] = (OMPI_MPI_OFFSET_TYPE) uniform; + + fh->f_comm->c_coll->coll_allreduce (avg, + global_avg, + 3, + OMPI_OFFSET_DATATYPE, + MPI_SUM, + fh->f_comm, + 
fh->f_comm->c_coll->coll_allreduce_module); + global_avg[0] = global_avg[0]/fh->f_size; + global_avg[1] = global_avg[1]/fh->f_size; + #if 0 - /* Disabling the feature since we are not using it anyway. Saves us one allreduce operation. */ - int global_uniform=0; - - if ( global_avg[0] == avg[0] && - global_avg[1] == avg[1] && - 0 == avg[2] && - 0 == global_avg[2] ) { - uniform = 0; - } - else { - uniform = 1; + /* Disabling the feature since we are not using it anyway. Saves us one allreduce operation. */ + int global_uniform=0; + + if ( global_avg[0] == avg[0] && + global_avg[1] == avg[1] && + 0 == avg[2] && + 0 == global_avg[2] ) { + uniform = 0; + } + else { + uniform = 1; + } + + /* second confirmation round to see whether all processes agree + ** on having a uniform file view or not + */ + fh->f_comm->c_coll->coll_allreduce (&uniform, + &global_uniform, + 1, + MPI_INT, + MPI_MAX, + fh->f_comm, + fh->f_comm->c_coll->coll_allreduce_module); + + if ( 0 == global_uniform ){ + /* yes, everybody agrees on having a uniform file view */ + fh->f_flags |= OMPIO_UNIFORM_FVIEW; + } +#endif } - /* second confirmation round to see whether all processes agree - ** on having a uniform file view or not - */ - fh->f_comm->c_coll->coll_allreduce (&uniform, - &global_uniform, - 1, - MPI_INT, - MPI_MAX, - fh->f_comm, - fh->f_comm->c_coll->coll_allreduce_module); - - if ( 0 == global_uniform ){ - /* yes, everybody agrees on having a uniform file view */ - fh->f_flags |= OMPIO_UNIFORM_FVIEW; - } -#endif return global_avg[0]; } diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c index 97fe28671f8..bdfc5dd66e3 100644 --- a/ompi/mca/common/ompio/common_ompio_file_write.c +++ b/ompi/mca/common/ompio/common_ompio_file_write.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -51,6 +51,7 @@ int mca_common_ompio_file_write (mca_io_ompio_file_t *fh, size_t total_bytes_written = 0; size_t max_data=0, real_bytes_written=0; ssize_t ret_code=0; + size_t spc=0; int i = 0; /* index into the decoded iovec of the buffer */ int j = 0; /* index into the file view iovec */ @@ -92,7 +93,8 @@ int mca_common_ompio_file_write (mca_io_ompio_file_t *fh, decoded_iov, &i, &j, - &total_bytes_written); + &total_bytes_written, + &spc); if (fh->f_num_of_io_entries) { ret_code =fh->f_fbtl->fbtl_pwritev (fh); @@ -152,6 +154,7 @@ int mca_common_ompio_file_iwrite (mca_io_ompio_file_t *fh, { int ret = OMPI_SUCCESS; mca_ompio_request_t *ompio_req=NULL; + size_t spc=0; ompio_req = OBJ_NEW(mca_ompio_request_t); ompio_req->req_type = MCA_OMPIO_REQUEST_WRITE; @@ -195,7 +198,8 @@ int mca_common_ompio_file_iwrite (mca_io_ompio_file_t *fh, decoded_iov, &i, &j, - &total_bytes_written); + &total_bytes_written, + &spc); if (fh->f_num_of_io_entries) { fh->f_fbtl->fbtl_ipwritev (fh, (ompi_request_t *) ompio_req); @@ -327,13 +331,16 @@ int mca_common_ompio_file_iwrite_at_all (mca_io_ompio_file_t *fp, int mca_common_ompio_build_io_array ( mca_io_ompio_file_t *fh, int index, int cycles, size_t bytes_per_cycle, int max_data, uint32_t iov_count, - struct iovec *decoded_iov, int *ii, int *jj, size_t *tbw ) + struct iovec *decoded_iov, int *ii, int *jj, size_t *tbw, + size_t *spc) { - OPAL_PTRDIFF_TYPE disp; + ptrdiff_t disp; int block = 1; size_t total_bytes_written = *tbw; /* total bytes that have been written*/ size_t bytes_to_write_in_cycle = 0; /* left to be written in a cycle*/ - size_t sum_previous_counts = 0; + size_t sum_previous_counts = *spc; /* total bytes used, up to the start + of the memory block decoded_iov[*ii]; + is always less than or equal to tbw */ size_t 
sum_previous_length = 0; int k = 0; /* index into the io_array */ int i = *ii; @@ -374,7 +381,7 @@ int mca_common_ompio_build_io_array ( mca_io_ompio_file_t *fh, int index, int cy i = i + 1; } - disp = (OPAL_PTRDIFF_TYPE)decoded_iov[i].iov_base + + disp = (ptrdiff_t)decoded_iov[i].iov_base + (total_bytes_written - sum_previous_counts); fh->f_io_array[k].memory_address = (IOVBASE_TYPE *)disp; @@ -404,7 +411,7 @@ int mca_common_ompio_build_io_array ( mca_io_ompio_file_t *fh, int index, int cy } } - disp = (OPAL_PTRDIFF_TYPE)fh->f_decoded_iov[j].iov_base + + disp = (ptrdiff_t)fh->f_decoded_iov[j].iov_base + (fh->f_total_bytes - sum_previous_length); fh->f_io_array[k].offset = (IOVBASE_TYPE *)(intptr_t)(disp + fh->f_offset); @@ -432,16 +439,18 @@ int mca_common_ompio_build_io_array ( mca_io_ompio_file_t *fh, int index, int cy printf("*************************** %d\n", fh->f_num_of_io_entries); for (d=0 ; d<fh->f_num_of_io_entries ; d++) { - printf(" ADDRESS: %p OFFSET: %p LENGTH: %d\n", + printf(" ADDRESS: %p OFFSET: %p LENGTH: %d prev_count=%zu prev_length=%zu\n", fh->f_io_array[d].memory_address, fh->f_io_array[d].offset, - fh->f_io_array[d].length); + fh->f_io_array[d].length, + sum_previous_counts, sum_previous_length); } } #endif *ii = i; *jj = j; *tbw = total_bytes_written; + *spc = sum_previous_counts; return OMPI_SUCCESS; } diff --git a/ompi/mca/common/ompio/configure.m4 b/ompi/mca/common/ompio/configure.m4 index eee33e5750d..4ee2d96fffe 100644 --- a/ompi/mca/common/ompio/configure.m4 +++ b/ompi/mca/common/ompio/configure.m4 @@ -1,6 +1,6 @@ # -*- shell-script -*- # -# Copyright (c) 2016 Research Organization for Information Science +# Copyright (c) 2016-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved.
# $COPYRIGHT$ # @@ -15,8 +15,7 @@ AC_DEFUN([MCA_ompi_common_ompio_CONFIG],[ AC_CONFIG_FILES([ompi/mca/common/ompio/Makefile]) - AS_IF([test "$enable_mpi_io" != "no" && - test "$enable_io_ompio" != "no"], + AS_IF([test "$enable_io_ompio" != "no"], [$1], [$2]) ])dnl diff --git a/ompi/mca/crcp/bkmrk/Makefile.am b/ompi/mca/crcp/bkmrk/Makefile.am index e1a72081a43..be788df3497 100644 --- a/ompi/mca/crcp/bkmrk/Makefile.am +++ b/ompi/mca/crcp/bkmrk/Makefile.am @@ -8,6 +8,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -38,6 +39,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_crcp_bkmrk_la_SOURCES = $(sources) mca_crcp_bkmrk_la_LDFLAGS = -module -avoid-version +mca_crcp_bkmrk_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_crcp_bkmrk_la_SOURCES = $(sources) diff --git a/ompi/mca/fbtl/configure.m4 b/ompi/mca/fbtl/configure.m4 index 803de5aaf79..e06e8bc4d71 100644 --- a/ompi/mca/fbtl/configure.m4 +++ b/ompi/mca/fbtl/configure.m4 @@ -1,7 +1,7 @@ # -*- shell-script -*- # # Copyright (c) 2011 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2016 Research Organization for Information Science +# Copyright (c) 2016-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. 
# # $COPYRIGHT$ @@ -17,8 +17,7 @@ AC_DEFUN([MCA_ompi_fbtl_CONFIG], [ OPAL_VAR_SCOPE_PUSH([want_io_ompio]) - AS_IF([test "$enable_mpi_io" != "no" && - test "$enable_io_ompio" != "no"], + AS_IF([test "$enable_io_ompio" != "no"], [want_io_ompio=1], [want_io_ompio=0]) diff --git a/ompi/mca/fbtl/plfs/Makefile.am b/ompi/mca/fbtl/plfs/Makefile.am deleted file mode 100644 index 68fb67d034d..00000000000 --- a/ompi/mca/fbtl/plfs/Makefile.am +++ /dev/null @@ -1,54 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2008-2011 University of Houston. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# Make the output library in this directory, and name it either -# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la -# (for static builds).
- -if MCA_BUILD_ompi_fbtl_plfs_DSO -component_noinst = -component_install = mca_fbtl_plfs.la -else -component_noinst = libmca_fbtl_plfs.la -component_install = -endif - -# Source files - -fbtl_plfs_sources = \ - fbtl_plfs.h \ - fbtl_plfs.c \ - fbtl_plfs_component.c \ - fbtl_plfs_preadv.c \ - fbtl_plfs_ipreadv.c \ - fbtl_plfs_pwritev.c \ - fbtl_plfs_ipwritev.c - -AM_CPPFLAGS = $(fbtl_plfs_CPPFLAGS) - -mcacomponentdir = $(pkglibdir) -mcacomponent_LTLIBRARIES = $(component_install) -mca_fbtl_plfs_la_SOURCES = $(fbtl_plfs_sources) -mca_fbtl_plfs_la_LIBADD = $(fbtl_plfs_LIBS) -mca_fbtl_plfs_la_LDFLAGS = -module -avoid-version $(fbtl_plfs_LDFLAGS) - -noinst_LTLIBRARIES = $(component_noinst) -libmca_fbtl_plfs_la_SOURCES = $(fbtl_plfs_sources) -libmca_fbtl_plfs_la_LIBADD = $(fbtl_plfs_LIBS) -libmca_fbtl_plfs_la_LDFLAGS = -module -avoid-version $(fbtl_plfs_LDFLAGS) diff --git a/ompi/mca/fbtl/plfs/configure.m4 b/ompi/mca/fbtl/plfs/configure.m4 deleted file mode 100644 index c7502b51107..00000000000 --- a/ompi/mca/fbtl/plfs/configure.m4 +++ /dev/null @@ -1,42 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2008-2014 University of Houston. All rights reserved. 
-# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - - -# MCA_fbtl_plfs_CONFIG(action-if-can-compile, -# [action-if-cant-compile]) -# ------------------------------------------------ -AC_DEFUN([MCA_ompi_fbtl_plfs_CONFIG],[ - AC_CONFIG_FILES([ompi/mca/fbtl/plfs/Makefile]) - - OMPI_CHECK_PLFS([fbtl_plfs], - [fbtl_plfs_happy="yes"], - [fbtl_plfs_happy="no"]) - - AS_IF([test "$fbtl_plfs_happy" = "yes"], - [$1], - [$2]) - - - # substitute in the things needed to build plfs - AC_SUBST([fbtl_plfs_CPPFLAGS]) - AC_SUBST([fbtl_plfs_LDFLAGS]) - AC_SUBST([fbtl_plfs_LIBS]) -])dnl diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs.c b/ompi/mca/fbtl/plfs/fbtl_plfs.c deleted file mode 100644 index df4391a8f04..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs.c +++ /dev/null @@ -1,85 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. 
Since linkers generally pull in symbols by object fules, - * keeping these symbols as the only symbols in this file prevents - * utility programs such as "ompi_info" from having to import entire - * modules just to query their version and parameters - */ - -#include "ompi_config.h" -#include "mpi.h" -#include "ompi/mca/fbtl/fbtl.h" -#include "ompi/mca/fbtl/plfs/fbtl_plfs.h" - -/* - * ******************************************************************* - * ************************ actions structure ************************ - * ******************************************************************* - */ -static mca_fbtl_base_module_1_0_0_t plfs = { - mca_fbtl_plfs_module_init, /* initalise after being selected */ - mca_fbtl_plfs_module_finalize, /* close a module on a communicator */ - mca_fbtl_plfs_preadv, /* blocking read */ - NULL, /* non-blocking read */ - mca_fbtl_plfs_pwritev, /* blocking write */ - NULL, /* non-blocking write */ - NULL, /* module specific progress */ - NULL /* free module specific data items on the request */ -}; -/* - * ******************************************************************* - * ************************* structure ends ************************** - * ******************************************************************* - */ - -int mca_fbtl_plfs_component_init_query(bool enable_progress_threads, - bool enable_mpi_threads) { - /* Nothing to do */ - return OMPI_SUCCESS; -} - -struct mca_fbtl_base_module_1_0_0_t * -mca_fbtl_plfs_component_file_query (mca_io_ompio_file_t *fh, int *priority) { - *priority = mca_fbtl_plfs_priority; - - if (PLFS == fh->f_fstype) { - if (*priority < 50) { - *priority = 50; - } - } - return &plfs; -} - -int mca_fbtl_plfs_component_file_unquery (mca_io_ompio_file_t *file) { - /* This function might be needed for some purposes later. 
for now it - * does not have anything to do since there are no steps which need - * to be undone if this module is not selected */ - - return OMPI_SUCCESS; -} - -int mca_fbtl_plfs_module_init (mca_io_ompio_file_t *file) { - return OMPI_SUCCESS; -} - - -int mca_fbtl_plfs_module_finalize (mca_io_ompio_file_t *file) { - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs.h b/ompi/mca/fbtl/plfs/fbtl_plfs.h deleted file mode 100644 index dc1f4ea7859..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs.h +++ /dev/null @@ -1,64 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MCA_FBTL_PLFS_H -#define MCA_FBTL_PLFS_H - -#include "ompi_config.h" -#include "ompi/mca/mca.h" -#include "ompi/mca/fbtl/fbtl.h" -#include "ompi/mca/common/ompio/common_ompio.h" -#include - -extern int mca_fbtl_plfs_priority; - -BEGIN_C_DECLS - -int mca_fbtl_plfs_component_init_query(bool enable_progress_threads, - bool enable_mpi_threads); -struct mca_fbtl_base_module_1_0_0_t * -mca_fbtl_plfs_component_file_query (mca_io_ompio_file_t *file, int *priority); -int mca_fbtl_plfs_component_file_unquery (mca_io_ompio_file_t *file); - -int mca_fbtl_plfs_module_init (mca_io_ompio_file_t *file); -int mca_fbtl_plfs_module_finalize (mca_io_ompio_file_t *file); - -OMPI_MODULE_DECLSPEC extern mca_fbtl_base_component_2_0_0_t mca_fbtl_plfs_component; -/* - * ****************************************************************** - * ********* functions which are implemented in this module ********* - * ****************************************************************** - */ - -ssize_t mca_fbtl_plfs_preadv (mca_io_ompio_file_t *file ); -ssize_t mca_fbtl_plfs_pwritev (mca_io_ompio_file_t *file ); -ssize_t mca_fbtl_plfs_ipreadv (mca_io_ompio_file_t *file, - ompi_request_t **request); -ssize_t mca_fbtl_plfs_ipwritev (mca_io_ompio_file_t *file, - ompi_request_t **request); - -/* - * ****************************************************************** - * ************ functions implemented in this module end ************ - * ****************************************************************** - */ - -END_C_DECLS - -#endif /* MCA_FBTL_PLFS_H */ diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs_component.c b/ompi/mca/fbtl/plfs/fbtl_plfs_component.c deleted file mode 100644 index aa2b9f347a6..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs_component.c +++ /dev/null @@ -1,65 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and 
Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. Since linkers generally pull in symbols by object - * files, keeping these symbols as the only symbols in this file - * prevents utility programs such as "ompi_info" from having to import - * entire components just to query their version and parameters. 
- */ - -#include "ompi_config.h" -#include "fbtl_plfs.h" -#include "mpi.h" - -/* - * Public string showing the fbtl plfs component version number - */ -const char *mca_fbtl_plfs_component_version_string = - "OMPI/MPI plfs FBTL MCA component version " OMPI_VERSION; - -int mca_fbtl_plfs_priority = 10; - -/* - * Instantiate the public struct with all of our public information - * and pointers to our public functions in it - */ -mca_fbtl_base_component_2_0_0_t mca_fbtl_plfs_component = { - - /* First, the mca_component_t struct containing meta information - about the component itself */ - - .fbtlm_version = { - MCA_FBTL_BASE_VERSION_2_0_0, - - /* Component name and version */ - .mca_component_name = "plfs", - MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, - OMPI_RELEASE_VERSION), - }, - .fbtlm_data = { - /* This component is checkpointable */ - MCA_BASE_METADATA_PARAM_CHECKPOINT - }, - .fbtlm_init_query = mca_fbtl_plfs_component_init_query, /* get thread level */ - .fbtlm_file_query = mca_fbtl_plfs_component_file_query, /* get priority and actions */ - .fbtlm_file_unquery = mca_fbtl_plfs_component_file_unquery, /* undo what was done by previous function */ -}; diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs_ipreadv.c b/ompi/mca/fbtl/plfs/fbtl_plfs_ipreadv.c deleted file mode 100644 index 9cb16785034..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs_ipreadv.c +++ /dev/null @@ -1,33 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. 
All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fbtl_plfs.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fbtl/fbtl.h" - -ssize_t mca_fbtl_plfs_ipreadv (mca_io_ompio_file_t *file, - ompi_request_t **request) -{ - printf ("PLFS IPREADV\n"); - return OMPI_ERROR; -} diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs_ipwritev.c b/ompi/mca/fbtl/plfs/fbtl_plfs_ipwritev.c deleted file mode 100644 index 4365ac99238..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs_ipwritev.c +++ /dev/null @@ -1,33 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include "fbtl_plfs.h" - -#include "mpi.h" -#include -#include "ompi/constants.h" -#include "ompi/mca/fbtl/fbtl.h" - -ssize_t mca_fbtl_plfs_ipwritev (mca_io_ompio_file_t *fh, - ompi_request_t **request) -{ - printf ("PLFS IPWRITEV\n"); - return OMPI_ERROR; -} diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs_preadv.c b/ompi/mca/fbtl/plfs/fbtl_plfs_preadv.c deleted file mode 100644 index 26e60065a5a..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs_preadv.c +++ /dev/null @@ -1,55 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. 
- * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include "fbtl_plfs.h" - -#include "mpi.h" -#include -#include "ompi/constants.h" -#include "ompi/mca/fbtl/fbtl.h" - -ssize_t mca_fbtl_plfs_preadv (mca_io_ompio_file_t *fh ) -{ - - Plfs_fd *pfd = fh->f_fs_ptr; - plfs_error_t plfs_ret; - ssize_t total_bytes_read=0; - int i; - ssize_t bytes_read; - - if (NULL == fh->f_io_array) { - return OMPI_ERROR; - } - - for (i=0 ; i<fh->f_num_of_io_entries ; i++) { - plfs_ret = plfs_read( pfd, fh->f_io_array[i].memory_address, fh->f_io_array[i].length, - (off_t )fh->f_io_array[i].offset, &bytes_read ); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fbtl_plfs_preadv: Error in plfs_read:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - - if (bytes_read < 0) - return OMPI_ERROR; - total_bytes_read += bytes_read; - } - - return total_bytes_read; -} diff --git a/ompi/mca/fbtl/plfs/fbtl_plfs_pwritev.c b/ompi/mca/fbtl/plfs/fbtl_plfs_pwritev.c deleted file mode 100644 index cd63c9db5a2..00000000000 --- a/ompi/mca/fbtl/plfs/fbtl_plfs_pwritev.c +++ /dev/null @@ -1,54 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved.
- * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fbtl_plfs.h" - -#include "mpi.h" -#include -#include "ompi/constants.h" -#include "ompi/mca/fbtl/fbtl.h" - -ssize_t mca_fbtl_plfs_pwritev (mca_io_ompio_file_t *fh ) -{ - Plfs_fd *pfd = fh->f_fs_ptr; - plfs_error_t plfs_ret; - ssize_t total_bytes_written=0; - ssize_t bytes_written; - int i; - - if (NULL == fh->f_io_array) { - return OMPI_ERROR; - } - - for (i=0 ; i<fh->f_num_of_io_entries ; i++) { - plfs_ret = plfs_write( pfd, fh->f_io_array[i].memory_address, - fh->f_io_array[i].length, - (off_t) fh->f_io_array[i].offset, - fh->f_rank, &bytes_written ); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fbtl_plfs_pwritev: Error in plfs_write:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - total_bytes_written += bytes_written; - } - - return total_bytes_written; -} diff --git a/ompi/mca/fbtl/plfs/owner.txt b/ompi/mca/fbtl/plfs/owner.txt deleted file mode 100644 index 2e9726c28a4..00000000000 --- a/ompi/mca/fbtl/plfs/owner.txt +++ /dev/null @@ -1,7 +0,0 @@ -# -# owner/status file -# owner: institution that is responsible for this package -# status: e.g. active, maintenance, unmaintained -# -owner: UH -status: active diff --git a/ompi/mca/fbtl/posix/Makefile.am b/ompi/mca/fbtl/posix/Makefile.am index 2c806f08e00..a7b0624d3ec 100644 --- a/ompi/mca/fbtl/posix/Makefile.am +++ b/ompi/mca/fbtl/posix/Makefile.am @@ -9,7 +9,8 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2008-2011 University of Houston. All rights reserved. +# Copyright (c) 2008-2017 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved.
# $COPYRIGHT$ # # Additional copyrights may follow @@ -33,6 +34,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fbtl_posix_la_SOURCES = $(sources) mca_fbtl_posix_la_LDFLAGS = -module -avoid-version +mca_fbtl_posix_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_fbtl_posix_la_SOURCES = $(sources) @@ -47,4 +49,5 @@ sources = \ fbtl_posix_preadv.c \ fbtl_posix_ipreadv.c \ fbtl_posix_pwritev.c \ - fbtl_posix_ipwritev.c + fbtl_posix_ipwritev.c \ + fbtl_posix_lock.c diff --git a/ompi/mca/fbtl/posix/fbtl_posix.c b/ompi/mca/fbtl/posix/fbtl_posix.c index 4c6d21ab011..f8c1c46385c 100644 --- a/ompi/mca/fbtl/posix/fbtl_posix.c +++ b/ompi/mca/fbtl/posix/fbtl_posix.c @@ -10,6 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2015 University of Houston. All rights reserved. + * Copyright (c) 2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -116,8 +117,9 @@ bool mca_fbtl_posix_progress ( mca_ompio_request_t *req) { bool ret=false; #if defined (FBTL_POSIX_HAVE_AIO) - int i=0, lcount=0; + int i=0, lcount=0, ret_code=0; mca_fbtl_posix_request_data_t *data=(mca_fbtl_posix_request_data_t *)req->req_data; + off_t start_offset, end_offset, total_length; for (i=data->aio_first_active_req; i < data->aio_last_active_req; i++ ) { if ( EINPROGRESS == data->aio_req_status[i] ) { @@ -154,6 +156,9 @@ bool mca_fbtl_posix_progress ( mca_ompio_request_t *req) #endif if ( (lcount == data->aio_req_chunks) && (0 != data->aio_open_reqs )) { + /* release the lock of the previous operations */ + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); + /* post the next batch of operations */ data->aio_first_active_req = data->aio_last_active_req; if ( (data->aio_req_count-data->aio_last_active_req) > data->aio_req_chunks ) { @@ -162,16 +167,36 @@ bool mca_fbtl_posix_progress ( 
mca_ompio_request_t *req) else { data->aio_last_active_req = data->aio_req_count; } + + start_offset = data->aio_reqs[data->aio_first_active_req].aio_offset; + end_offset = data->aio_reqs[data->aio_last_active_req-1].aio_offset + data->aio_reqs[data->aio_last_active_req-1].aio_nbytes; + total_length = (end_offset - start_offset); + + if ( FBTL_POSIX_READ == data->aio_req_type ) { + ret_code = mca_fbtl_posix_lock( &data->aio_lock, data->aio_fh, F_RDLCK, start_offset, total_length, OMPIO_LOCK_ENTIRE_REGION ); + } + else if ( FBTL_POSIX_WRITE == data->aio_req_type ) { + ret_code = mca_fbtl_posix_lock( &data->aio_lock, data->aio_fh, F_WRLCK, start_offset, total_length, OMPIO_LOCK_ENTIRE_REGION ); + } + if ( 0 < ret_code ) { + opal_output(1, "mca_fbtl_posix_progress: error in mca_fbtl_posix_lock() %d", ret_code); + /* Just in case some part of the lock actually succeeded. */ + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); + return OMPI_ERROR; + } + for ( i=data->aio_first_active_req; i< data->aio_last_active_req; i++ ) { if ( FBTL_POSIX_READ == data->aio_req_type ) { if (-1 == aio_read(&data->aio_reqs[i])) { - perror("aio_read() error"); + opal_output(1, "mca_fbtl_posix_progress: error in aio_read()"); + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); return OMPI_ERROR; } } else if ( FBTL_POSIX_WRITE == data->aio_req_type ) { if (-1 == aio_write(&data->aio_reqs[i])) { - perror("aio_write() error"); + opal_output(1, "mca_fbtl_posix_progress: error in aio_write()"); + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); return OMPI_ERROR; } } @@ -185,6 +210,7 @@ bool mca_fbtl_posix_progress ( mca_ompio_request_t *req) /* all pending operations are finished for this request */ req->req_ompi.req_status.MPI_ERROR = OMPI_SUCCESS; req->req_ompi.req_status._ucount = data->aio_total_len; + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); ret = true; } #endif @@ -197,6 +223,7 @@ void mca_fbtl_posix_request_free ( mca_ompio_request_t *req) /* Free 
the fbtl specific data structures */ mca_fbtl_posix_request_data_t *data=(mca_fbtl_posix_request_data_t *)req->req_data; if (NULL != data ) { + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); if ( NULL != data->aio_reqs ) { free ( data->aio_reqs); } diff --git a/ompi/mca/fbtl/posix/fbtl_posix.h b/ompi/mca/fbtl/posix/fbtl_posix.h index 9111cba7612..1ee6ffb40a5 100644 --- a/ompi/mca/fbtl/posix/fbtl_posix.h +++ b/ompi/mca/fbtl/posix/fbtl_posix.h @@ -58,6 +58,11 @@ ssize_t mca_fbtl_posix_ipwritev (mca_io_ompio_file_t *file, bool mca_fbtl_posix_progress ( mca_ompio_request_t *req); void mca_fbtl_posix_request_free ( mca_ompio_request_t *req); +int mca_fbtl_posix_lock ( struct flock *lock, mca_io_ompio_file_t *fh, int op, + OMPI_MPI_OFFSET_TYPE iov_offset, off_t len, int flags); +void mca_fbtl_posix_unlock ( struct flock *lock, mca_io_ompio_file_t *fh ); + + struct mca_fbtl_posix_request_data_t { int aio_req_count; /* total number of aio reqs */ int aio_open_reqs; /* number of unfinished reqs */ @@ -68,6 +73,8 @@ struct mca_fbtl_posix_request_data_t { struct aiocb *aio_reqs; /* pointer array of req structures */ int *aio_req_status; /* array of statuses */ ssize_t aio_total_len; /* total amount of data written */ + struct flock aio_lock; /* lock used for certain file systems */ + mca_io_ompio_file_t *aio_fh; /* pointer back to the mca_io_ompio_fh structure */ }; typedef struct mca_fbtl_posix_request_data_t mca_fbtl_posix_request_data_t; @@ -78,6 +85,7 @@ typedef struct mca_fbtl_posix_request_data_t mca_fbtl_posix_request_data_t; #define FBTL_POSIX_READ 1 #define FBTL_POSIX_WRITE 2 + /* * ****************************************************************** * ************ functions implemented in this module end ************ diff --git a/ompi/mca/fbtl/posix/fbtl_posix_ipreadv.c b/ompi/mca/fbtl/posix/fbtl_posix_ipreadv.c index 00eaedeaf74..0b56d8334ad 100644 --- a/ompi/mca/fbtl/posix/fbtl_posix_ipreadv.c +++ b/ompi/mca/fbtl/posix/fbtl_posix_ipreadv.c @@ -39,7 +39,8 
@@ ssize_t mca_fbtl_posix_ipreadv (mca_io_ompio_file_t *fh, #if defined (FBTL_POSIX_HAVE_AIO) mca_fbtl_posix_request_data_t *data; mca_ompio_request_t *req = (mca_ompio_request_t *) request; - int i=0; + int i=0, ret; + off_t start_offset, end_offset, total_length; data = (mca_fbtl_posix_request_data_t *) malloc ( sizeof (mca_fbtl_posix_request_data_t)); if ( NULL == data ) { @@ -67,6 +68,7 @@ ssize_t mca_fbtl_posix_ipreadv (mca_io_ompio_file_t *fh, free(data); return 0; } + data->aio_fh = fh; for ( i=0; i<fh->f_num_of_io_entries; i++ ) { data->aio_reqs[i].aio_offset = (OMPI_MPI_OFFSET_TYPE)(intptr_t) @@ -86,9 +88,24 @@ ssize_t mca_fbtl_posix_ipreadv (mca_io_ompio_file_t *fh, else { data->aio_last_active_req = data->aio_req_count; } + + start_offset = data->aio_reqs[data->aio_first_active_req].aio_offset; + end_offset = data->aio_reqs[data->aio_last_active_req-1].aio_offset + data->aio_reqs[data->aio_last_active_req-1].aio_nbytes; + total_length = (end_offset - start_offset); + ret = mca_fbtl_posix_lock( &data->aio_lock, data->aio_fh, F_RDLCK, start_offset, total_length, OMPIO_LOCK_ENTIRE_REGION ); + if ( 0 < ret ) { + opal_output(1, "mca_fbtl_posix_ipreadv: error in mca_fbtl_posix_lock() error ret=%d %s", ret, strerror(errno)); + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); + free(data->aio_reqs); + free(data->aio_req_status); + free(data); + return OMPI_ERROR; + } + for (i=0; i < data->aio_last_active_req; i++) { if (-1 == aio_read(&data->aio_reqs[i])) { - opal_output(1, "aio_read() error: %s", strerror(errno)); + opal_output(1, "mca_fbtl_posix_ipreadv: error in aio_read(): %s", strerror(errno)); + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); free(data->aio_reqs); free(data->aio_req_status); free(data); diff --git a/ompi/mca/fbtl/posix/fbtl_posix_ipwritev.c b/ompi/mca/fbtl/posix/fbtl_posix_ipwritev.c index 1d869c2a756..11790f453f9 100644 --- a/ompi/mca/fbtl/posix/fbtl_posix_ipwritev.c +++ b/ompi/mca/fbtl/posix/fbtl_posix_ipwritev.c @@ -38,7
+38,8 @@ ssize_t mca_fbtl_posix_ipwritev (mca_io_ompio_file_t *fh, #if defined(FBTL_POSIX_HAVE_AIO) mca_fbtl_posix_request_data_t *data; mca_ompio_request_t *req = (mca_ompio_request_t *) request; - int i=0; + int i=0, ret; + off_t start_offset, end_offset, total_length; data = (mca_fbtl_posix_request_data_t *) malloc ( sizeof (mca_fbtl_posix_request_data_t)); if ( NULL == data ) { @@ -66,6 +67,7 @@ ssize_t mca_fbtl_posix_ipwritev (mca_io_ompio_file_t *fh, free(data); return 0; } + data->aio_fh = fh; for ( i=0; i<fh->f_num_of_io_entries; i++ ) { data->aio_reqs[i].aio_offset = (OMPI_MPI_OFFSET_TYPE)(intptr_t) @@ -85,10 +87,24 @@ ssize_t mca_fbtl_posix_ipwritev (mca_io_ompio_file_t *fh, else { data->aio_last_active_req = data->aio_req_count; } + + start_offset = data->aio_reqs[data->aio_first_active_req].aio_offset; + end_offset = data->aio_reqs[data->aio_last_active_req-1].aio_offset + data->aio_reqs[data->aio_last_active_req-1].aio_nbytes; + total_length = (end_offset - start_offset); + ret = mca_fbtl_posix_lock( &data->aio_lock, data->aio_fh, F_WRLCK, start_offset, total_length, OMPIO_LOCK_ENTIRE_REGION ); + if ( 0 < ret ) { + opal_output(1, "mca_fbtl_posix_ipwritev: error in mca_fbtl_posix_lock() error ret=%d %s", ret, strerror(errno)); + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); + free(data->aio_reqs); + free(data->aio_req_status); + free(data); + return OMPI_ERROR; + } for (i=0; i < data->aio_last_active_req; i++) { if (-1 == aio_write(&data->aio_reqs[i])) { - opal_output(1, "aio_write() error: %s", strerror(errno)); + opal_output(1, "mca_fbtl_posix_ipwritev: error in aio_write(): %s", strerror(errno)); + mca_fbtl_posix_unlock ( &data->aio_lock, data->aio_fh ); free(data->aio_req_status); free(data->aio_reqs); free(data); diff --git a/ompi/mca/fbtl/posix/fbtl_posix_lock.c b/ompi/mca/fbtl/posix/fbtl_posix_lock.c new file mode 100644 index 00000000000..b59ec057e90 --- /dev/null +++ b/ompi/mca/fbtl/posix/fbtl_posix_lock.c @@ -0,0 +1,149 @@ +/* + *
Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2011 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2017 University of Houston. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" +#include "fbtl_posix.h" + +#include "mpi.h" +#include +#include +#include +#include +#include "ompi/constants.h" +#include "ompi/mca/fbtl/fbtl.h" + +#define MAX_ERRCOUNT 100 + +/* + op: can be F_WRLCK or F_RDLCK + flags: can be OMPIO_LOCK_ENTIRE_REGION or OMPIO_LOCK_SELECTIVE. This is typically set by the operation, not the fs component. + e.g. a collective and an individual component might require different level of protection through locking, + also one might need to do different things for blocking (pwritev,preadv) operations and non-blocking (aio) operations. + + fh->f_flags can contain similar sounding flags, those were set by the fs component and/or user requests. + + Support for MPI atomicity operations are envisioned, but not yet tested. 
+*/ + +int mca_fbtl_posix_lock ( struct flock *lock, mca_io_ompio_file_t *fh, int op, + OMPI_MPI_OFFSET_TYPE offset, off_t len, int flags) +{ + off_t lmod, bmod; + int ret, err_count; + + lock->l_type = op; + lock->l_whence = SEEK_SET; + lock->l_start =-1; + lock->l_len =-1; + if ( 0 == len ) { + return 0; + } + + if ( fh->f_flags & OMPIO_LOCK_ENTIRE_FILE ) { + lock->l_start = (off_t) 0; + lock->l_len = 0; + } + else { + if ( (fh->f_flags & OMPIO_LOCK_NEVER) || + (fh->f_flags & OMPIO_LOCK_NOT_THIS_OP )){ + /* OMPIO_LOCK_NEVER: + ompio tells us not to worry about locking. This can be due to three + reasons: + 1. user enforced + 2. single node job where the locking is handled already in the kernel + 3. file view is set to distinct regions such that multiple processes + do not collide on the block level. ( not entirely sure yet how + to check for this except in trivial cases). + OMPI_LOCK_NOT_THIS_OP: + will typically be set by fcoll components indicating that the file partitioning + ensures no overlap in blocks. + */ + return 0; + } + if ( flags == OMPIO_LOCK_ENTIRE_REGION ) { + lock->l_start = (off_t) offset; + lock->l_len = len; + } + else { + /* We only try to lock the first block in the data range if + the starting offset is not the starting offset of a file system + block. And the last block in the data range if the offset+len + is not equal to the end of a file system block. + If we need to lock both beginning + end, we combine + the two into a single lock. 
+ */ + bmod = offset % fh->f_fs_block_size; + if ( bmod ) { + lock->l_start = (off_t) offset; + lock->l_len = bmod; + } + lmod = (offset+len)%fh->f_fs_block_size; + if ( lmod ) { + if ( !bmod ) { + lock->l_start = (offset+len-lmod ); + lock->l_len = lmod; + } + else { + lock->l_len = len; + } + } + if ( -1 == lock->l_start && -1 == lock->l_len ) { + /* no need to lock in this instance */ + return 0; + } + } + } + + +#ifdef OMPIO_DEBUG + printf("%d: acquiring lock for offset %ld length %ld requested offset %ld request len %ld \n", + fh->f_rank, lock->l_start, lock->l_len, offset, len); +#endif + errno=0; + err_count=0; + do { + ret = fcntl ( fh->fd, F_SETLKW, lock); + if ( ret ) { +#ifdef OMPIO_DEBUG + printf("[%d] ret = %d errno=%d %s\n", fh->f_rank, ret, errno, strerror(errno) ); +#endif + err_count++; + } + } while ( ret && ((errno == EINTR) || ((errno == EINPROGRESS) && err_count < MAX_ERRCOUNT ))); + + + return ret; +} + +void mca_fbtl_posix_unlock ( struct flock *lock, mca_io_ompio_file_t *fh ) +{ + if ( -1 == lock->l_start && -1 == lock->l_len ) { + return; + } + + lock->l_type = F_UNLCK; +#ifdef OMPIO_DEBUG + printf("%d: releasing lock for offset %ld length %ld\n", fh->f_rank, lock->l_start, lock->l_len); +#endif + fcntl ( fh->fd, F_SETLK, lock); + lock->l_start = -1; + lock->l_len = -1; + + return; +} diff --git a/ompi/mca/fbtl/posix/fbtl_posix_preadv.c b/ompi/mca/fbtl/posix/fbtl_posix_preadv.c index 27dc589ee0a..5f5593c8273 100644 --- a/ompi/mca/fbtl/posix/fbtl_posix_preadv.c +++ b/ompi/mca/fbtl/posix/fbtl_posix_preadv.c @@ -9,8 +9,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2008-2017 University of Houston. All rights reserved. 
+ * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -31,11 +31,13 @@ ssize_t mca_fbtl_posix_preadv (mca_io_ompio_file_t *fh ) { /*int *fp = NULL;*/ - int i, block=1; + int i, block=1, ret; struct iovec *iov = NULL; int iov_count = 0; OMPI_MPI_OFFSET_TYPE iov_offset = 0; ssize_t bytes_read=0, ret_code=0; + struct flock lock; + off_t total_length, end_offset=0; if (NULL == fh->f_io_array) { return OMPI_ERROR; @@ -53,6 +55,7 @@ ssize_t mca_fbtl_posix_preadv (mca_io_ompio_file_t *fh ) iov[iov_count].iov_base = fh->f_io_array[i].memory_address; iov[iov_count].iov_len = fh->f_io_array[i].length; iov_offset = (OMPI_MPI_OFFSET_TYPE)(intptr_t)fh->f_io_array[i].offset; + end_offset = (off_t)fh->f_io_array[i].offset + (off_t)fh->f_io_array[i].length; iov_count ++; } @@ -69,35 +72,45 @@ ssize_t mca_fbtl_posix_preadv (mca_io_ompio_file_t *fh ) if (fh->f_num_of_io_entries != i+1) { if (((((OMPI_MPI_OFFSET_TYPE)(intptr_t)fh->f_io_array[i].offset + - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].length) == + (ptrdiff_t)fh->f_io_array[i].length) == (OMPI_MPI_OFFSET_TYPE)(intptr_t)fh->f_io_array[i+1].offset)) && (iov_count < IOV_MAX ) ){ iov[iov_count].iov_base = fh->f_io_array[i+1].memory_address; iov[iov_count].iov_len = fh->f_io_array[i+1].length; + end_offset = (off_t)fh->f_io_array[i].offset + (off_t)fh->f_io_array[i].length; iov_count ++; continue; } } + total_length = (end_offset - (off_t)iov_offset ); + + ret = mca_fbtl_posix_lock ( &lock, fh, F_RDLCK, iov_offset, total_length, OMPIO_LOCK_SELECTIVE ); + if ( 0 < ret ) { + opal_output(1, "mca_fbtl_posix_preadv: error in mca_fbtl_posix_lock() ret=%d: %s", ret, strerror(errno)); + free (iov); + /* Just in case some part of the lock worked */ + mca_fbtl_posix_unlock ( &lock, fh); + return OMPI_ERROR; + } #if defined(HAVE_PREADV) ret_code = preadv (fh->fd, iov, iov_count, iov_offset); - if ( 0 < ret_code ) { - bytes_read+=ret_code; - } #else if (-1 == 
lseek (fh->fd, iov_offset, SEEK_SET)) { - opal_output(1, "lseek:%s", strerror(errno)); + opal_output(1, "mca_fbtl_posix_preadv: error in lseek:%s", strerror(errno)); free(iov); + mca_fbtl_posix_unlock ( &lock, fh ); return OMPI_ERROR; } ret_code = readv (fh->fd, iov, iov_count); +#endif + mca_fbtl_posix_unlock ( &lock, fh ); if ( 0 < ret_code ) { bytes_read+=ret_code; } -#endif else if ( ret_code == -1 ) { - opal_output(1, "readv:%s", strerror(errno)); + opal_output(1, "mca_fbtl_posix_preadv: error in (p)readv:%s", strerror(errno)); free(iov); return OMPI_ERROR; } diff --git a/ompi/mca/fbtl/posix/fbtl_posix_pwritev.c b/ompi/mca/fbtl/posix/fbtl_posix_pwritev.c index fbf69489ff8..c6a640290d9 100644 --- a/ompi/mca/fbtl/posix/fbtl_posix_pwritev.c +++ b/ompi/mca/fbtl/posix/fbtl_posix_pwritev.c @@ -9,8 +9,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2008-2017 University of Houston. All rights reserved. + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -33,11 +33,13 @@ ssize_t mca_fbtl_posix_pwritev(mca_io_ompio_file_t *fh ) { /*int *fp = NULL;*/ - int i, block = 1; + int i, block = 1, ret; struct iovec *iov = NULL; int iov_count = 0; OMPI_MPI_OFFSET_TYPE iov_offset = 0; ssize_t ret_code=0, bytes_written=0; + struct flock lock; + off_t total_length, end_offset=0; if (NULL == fh->f_io_array) { return OMPI_ERROR; @@ -55,6 +57,7 @@ ssize_t mca_fbtl_posix_pwritev(mca_io_ompio_file_t *fh ) iov[iov_count].iov_base = fh->f_io_array[i].memory_address; iov[iov_count].iov_len = fh->f_io_array[i].length; iov_offset = (OMPI_MPI_OFFSET_TYPE)(intptr_t)fh->f_io_array[i].offset; + end_offset = (off_t)fh->f_io_array[i].offset + (off_t)fh->f_io_array[i].length; iov_count ++; } @@ -71,13 +74,13 @@ ssize_t mca_fbtl_posix_pwritev(mca_io_ompio_file_t *fh ) if (fh->f_num_of_io_entries != i+1) { if ( (((OMPI_MPI_OFFSET_TYPE)(intptr_t)fh->f_io_array[i].offset + - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].length) == + (ptrdiff_t)fh->f_io_array[i].length) == (OMPI_MPI_OFFSET_TYPE)(intptr_t)fh->f_io_array[i+1].offset) && (iov_count < IOV_MAX )) { - iov[iov_count].iov_base = - fh->f_io_array[i+1].memory_address; - iov[iov_count].iov_len = fh->f_io_array[i+1].length; - iov_count ++; + iov[iov_count].iov_base = fh->f_io_array[i+1].memory_address; + iov[iov_count].iov_len = fh->f_io_array[i+1].length; + end_offset = (off_t)fh->f_io_array[i].offset + (off_t)fh->f_io_array[i].length; + iov_count ++; continue; } } @@ -93,25 +96,33 @@ ssize_t mca_fbtl_posix_pwritev(mca_io_ompio_file_t *fh ) } */ + + total_length = (end_offset - (off_t)iov_offset); + ret = mca_fbtl_posix_lock ( &lock, fh, F_WRLCK, iov_offset, total_length, OMPIO_LOCK_SELECTIVE ); + if ( 0 < ret ) { + opal_output(1, "mca_fbtl_posix_pwritev: error in mca_fbtl_posix_lock() error ret=%d %s", ret, strerror(errno)); + free (iov); + /* just in case some part of the lock worked */ + mca_fbtl_posix_unlock ( &lock, fh ); + return OMPI_ERROR; + } #if defined (HAVE_PWRITEV) 
ret_code = pwritev (fh->fd, iov, iov_count, iov_offset); - if ( 0 < ret_code ) { - bytes_written += ret_code; - } - #else if (-1 == lseek (fh->fd, iov_offset, SEEK_SET)) { - opal_output(1, "lseek:%s", strerror(errno)); + opal_output(1, "mca_fbtl_posix_pwritev: error in lseek:%s", strerror(errno)); free(iov); + mca_fbtl_posix_unlock ( &lock, fh ); return OMPI_ERROR; } ret_code = writev (fh->fd, iov, iov_count); +#endif + mca_fbtl_posix_unlock ( &lock, fh ); if ( 0 < ret_code ) { bytes_written += ret_code; } -#endif else if (-1 == ret_code ) { - opal_output(1, "writev:%s", strerror(errno)); + opal_output(1, "mca_fbtl_posix_pwritev: error in writev:%s", strerror(errno)); free (iov); return OMPI_ERROR; } diff --git a/ompi/mca/fbtl/pvfs2/Makefile.am b/ompi/mca/fbtl/pvfs2/Makefile.am index fc877c819c1..66582947d93 100644 --- a/ompi/mca/fbtl/pvfs2/Makefile.am +++ b/ompi/mca/fbtl/pvfs2/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008-2011 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -45,7 +46,8 @@ AM_CPPFLAGS = $(fbtl_pvfs2_CPPFLAGS) mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fbtl_pvfs2_la_SOURCES = $(fbtl_pvfs2_sources) -mca_fbtl_pvfs2_la_LIBADD = $(fbtl_pvfs2_LIBS) +mca_fbtl_pvfs2_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(fbtl_pvfs2_LIBS) mca_fbtl_pvfs2_la_LDFLAGS = -module -avoid-version $(fbtl_pvfs2_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_preadv.c b/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_preadv.c index 61e9e2460c7..362c6e789b3 100644 --- a/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_preadv.c +++ b/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_preadv.c @@ -10,6 +10,8 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
* Copyright (c) 2008-2014 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -54,7 +56,7 @@ ssize_t mca_fbtl_pvfs2_preadv (mca_io_ompio_file_t *fh) for (i=0 ; i<fh->f_num_of_io_entries ; i++) { if (fh->f_num_of_io_entries != i+1) { if (((OMPI_MPI_OFFSET_TYPE)fh->f_io_array[i].offset + - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].length) == + (ptrdiff_t)fh->f_io_array[i].length) == (OMPI_MPI_OFFSET_TYPE)fh->f_io_array[i+1].offset) { if (!merge) { merge_offset = (OMPI_MPI_OFFSET_TYPE) diff --git a/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_pwritev.c b/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_pwritev.c index 31c5b46c5df..cd7c846169c 100644 --- a/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_pwritev.c +++ b/ompi/mca/fbtl/pvfs2/fbtl_pvfs2_pwritev.c @@ -10,6 +10,8 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2014 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -55,7 +57,7 @@ ssize_t mca_fbtl_pvfs2_pwritev (mca_io_ompio_file_t *fh ) for (i=0 ; i<fh->f_num_of_io_entries ; i++) { if (fh->f_num_of_io_entries != i+1) { if (((OMPI_MPI_OFFSET_TYPE)fh->f_io_array[i].offset + - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].length) == + (ptrdiff_t)fh->f_io_array[i].length) == (OMPI_MPI_OFFSET_TYPE)fh->f_io_array[i+1].offset) { if (!merge) { merge_offset = (OMPI_MPI_OFFSET_TYPE) diff --git a/ompi/mca/fcoll/base/base.h b/ompi/mca/fcoll/base/base.h index e0951cfc016..2ee125ac167 100644 --- a/ompi/mca/fcoll/base/base.h +++ b/ompi/mca/fcoll/base/base.h @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2008-2011 University of Houston. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -47,7 +48,7 @@ OMPI_DECLSPEC int mca_fcoll_base_find_available(bool enable_progress_threads, OMPI_DECLSPEC int mca_fcoll_base_init_file (struct mca_io_ompio_file_t *file); OMPI_DECLSPEC int mca_fcoll_base_get_param (struct mca_io_ompio_file_t *file, int keyval); -OMPI_DECLSPEC int fcoll_base_sort_iovec (struct iovec *iov, int num_entries, int *sorted); +OMPI_DECLSPEC int ompi_fcoll_base_sort_iovec (struct iovec *iov, int num_entries, int *sorted); /* * Globals diff --git a/ompi/mca/fcoll/base/fcoll_base_coll_array.c b/ompi/mca/fcoll/base/fcoll_base_coll_array.c index 4b334f13310..4812426f560 100644 --- a/ompi/mca/fcoll/base/fcoll_base_coll_array.c +++ b/ompi/mca/fcoll/base/fcoll_base_coll_array.c @@ -11,6 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -32,7 +35,7 @@ #include "ompi/mca/common/ompio/common_ompio.h" -int fcoll_base_coll_allgatherv_array (void *sbuf, +int ompi_fcoll_base_coll_allgatherv_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -45,7 +48,7 @@ int fcoll_base_coll_allgatherv_array (void *sbuf, ompi_communicator_t *comm) { int err = OMPI_SUCCESS; - OPAL_PTRDIFF_TYPE extent, lb; + ptrdiff_t extent, lb; int i, rank, j; char *send_buf = NULL; struct ompi_datatype_t *newtype, *send_type; @@ -74,7 +77,7 @@ int fcoll_base_coll_allgatherv_array (void *sbuf, send_type = sdtype; } - err = fcoll_base_coll_gatherv_array (send_buf, + err = ompi_fcoll_base_coll_gatherv_array (send_buf, rcounts[j], send_type, rbuf, @@ -102,7 +105,7 @@ int fcoll_base_coll_allgatherv_array (void *sbuf, return err; } - fcoll_base_coll_bcast_array (rbuf, + ompi_fcoll_base_coll_bcast_array (rbuf, 1, newtype, root_index, @@ -115,7 +118,7 @@ int fcoll_base_coll_allgatherv_array (void *sbuf, return OMPI_SUCCESS; } -int fcoll_base_coll_gatherv_array (void *sbuf, +int ompi_fcoll_base_coll_gatherv_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -130,7 +133,7 @@ int fcoll_base_coll_gatherv_array (void *sbuf, int i, rank; int err = OMPI_SUCCESS; char *ptmp; - OPAL_PTRDIFF_TYPE extent, lb; + ptrdiff_t extent, lb; ompi_request_t **reqs=NULL; rank = ompi_comm_rank (comm); @@ -204,7 +207,7 @@ int fcoll_base_coll_gatherv_array (void *sbuf, return err; } -int fcoll_base_coll_scatterv_array (void *sbuf, +int ompi_fcoll_base_coll_scatterv_array (void *sbuf, int *scounts, int *disps, ompi_datatype_t *sdtype, @@ -219,7 +222,7 @@ int fcoll_base_coll_scatterv_array (void *sbuf, int i, rank; int err = OMPI_SUCCESS; char *ptmp; - OPAL_PTRDIFF_TYPE extent, lb; + ptrdiff_t extent, lb; ompi_request_t ** reqs=NULL; rank = ompi_comm_rank (comm); @@ -294,7 +297,7 @@ int fcoll_base_coll_scatterv_array (void *sbuf, return err; } -int 
fcoll_base_coll_allgather_array (void *sbuf, +int ompi_fcoll_base_coll_allgather_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -307,7 +310,7 @@ int fcoll_base_coll_allgather_array (void *sbuf, { int err = OMPI_SUCCESS; int rank; - OPAL_PTRDIFF_TYPE extent, lb; + ptrdiff_t extent, lb; rank = ompi_comm_rank (comm); @@ -322,7 +325,7 @@ int fcoll_base_coll_allgather_array (void *sbuf, } /* Gather and broadcast. */ - err = fcoll_base_coll_gather_array (sbuf, + err = ompi_fcoll_base_coll_gather_array (sbuf, scount, sdtype, rbuf, @@ -334,7 +337,7 @@ int fcoll_base_coll_allgather_array (void *sbuf, comm); if (OMPI_SUCCESS == err) { - err = fcoll_base_coll_bcast_array (rbuf, + err = ompi_fcoll_base_coll_bcast_array (rbuf, rcount * procs_per_group, rdtype, root_index, @@ -347,7 +350,7 @@ int fcoll_base_coll_allgather_array (void *sbuf, return err; } -int fcoll_base_coll_gather_array (void *sbuf, +int ompi_fcoll_base_coll_gather_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -361,8 +364,8 @@ int fcoll_base_coll_gather_array (void *sbuf, int i; int rank; char *ptmp; - OPAL_PTRDIFF_TYPE incr; - OPAL_PTRDIFF_TYPE extent, lb; + ptrdiff_t incr; + ptrdiff_t extent, lb; int err = OMPI_SUCCESS; ompi_request_t ** reqs=NULL; @@ -437,7 +440,7 @@ int fcoll_base_coll_gather_array (void *sbuf, return err; } -int fcoll_base_coll_bcast_array (void *buff, +int ompi_fcoll_base_coll_bcast_array (void *buff, int count, ompi_datatype_t *datatype, int root_index, diff --git a/ompi/mca/fcoll/base/fcoll_base_coll_array.h b/ompi/mca/fcoll/base/fcoll_base_coll_array.h index a0f97d7b2ab..7f6c21ca488 100644 --- a/ompi/mca/fcoll/base/fcoll_base_coll_array.h +++ b/ompi/mca/fcoll/base/fcoll_base_coll_array.h @@ -13,6 +13,7 @@ * Copyright (c) 2008-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. 
All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -41,7 +42,7 @@ * Modified versions of Collective operations * Based on an array of procs in group */ -OMPI_DECLSPEC int fcoll_base_coll_gatherv_array (void *sbuf, +OMPI_DECLSPEC int ompi_fcoll_base_coll_gatherv_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -52,7 +53,7 @@ OMPI_DECLSPEC int fcoll_base_coll_gatherv_array (void *sbuf, int *procs_in_group, int procs_per_group, ompi_communicator_t *comm); -OMPI_DECLSPEC int fcoll_base_coll_scatterv_array (void *sbuf, +OMPI_DECLSPEC int ompi_fcoll_base_coll_scatterv_array (void *sbuf, int *scounts, int *disps, ompi_datatype_t *sdtype, @@ -63,7 +64,7 @@ OMPI_DECLSPEC int fcoll_base_coll_scatterv_array (void *sbuf, int *procs_in_group, int procs_per_group, ompi_communicator_t *comm); -OMPI_DECLSPEC int fcoll_base_coll_allgather_array (void *sbuf, +OMPI_DECLSPEC int ompi_fcoll_base_coll_allgather_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -74,7 +75,7 @@ OMPI_DECLSPEC int fcoll_base_coll_allgather_array (void *sbuf, int procs_per_group, ompi_communicator_t *comm); -OMPI_DECLSPEC int fcoll_base_coll_allgatherv_array (void *sbuf, +OMPI_DECLSPEC int ompi_fcoll_base_coll_allgatherv_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -85,7 +86,7 @@ OMPI_DECLSPEC int fcoll_base_coll_allgatherv_array (void *sbuf, int *procs_in_group, int procs_per_group, ompi_communicator_t *comm); -OMPI_DECLSPEC int fcoll_base_coll_gather_array (void *sbuf, +OMPI_DECLSPEC int ompi_fcoll_base_coll_gather_array (void *sbuf, int scount, ompi_datatype_t *sdtype, void *rbuf, @@ -95,7 +96,7 @@ OMPI_DECLSPEC int fcoll_base_coll_gather_array (void *sbuf, int *procs_in_group, int procs_per_group, ompi_communicator_t *comm); -OMPI_DECLSPEC int fcoll_base_coll_bcast_array (void *buff, +OMPI_DECLSPEC int ompi_fcoll_base_coll_bcast_array (void *buff, int count, ompi_datatype_t *datatype, int root_index, diff --git 
a/ompi/mca/fcoll/base/fcoll_base_file_select.c b/ompi/mca/fcoll/base/fcoll_base_file_select.c index b0a410937a0..bcb547db543 100644 --- a/ompi/mca/fcoll/base/fcoll_base_file_select.c +++ b/ompi/mca/fcoll/base/fcoll_base_file_select.c @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2008-2017 University of Houston. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -279,7 +279,8 @@ int mca_fcoll_base_query_table (struct mca_io_ompio_file_t *file, char *name) } if (!strcmp (name, "two_phase")) { if ((int)file->f_cc_size < file->f_bytes_per_agg && - file->f_cc_size < file->f_stripe_size) { + (0 == file->f_stripe_size || file->f_cc_size < file->f_stripe_size) && + (LUSTRE != file->f_fstype) ) { return 1; } } diff --git a/ompi/mca/fcoll/base/fcoll_base_sort.c b/ompi/mca/fcoll/base/fcoll_base_sort.c index 685a6d8b113..03a74aaf2cb 100644 --- a/ompi/mca/fcoll/base/fcoll_base_sort.c +++ b/ompi/mca/fcoll/base/fcoll_base_sort.c @@ -11,6 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -23,7 +24,7 @@ #include "ompi/mca/common/ompio/common_ompio.h" -int fcoll_base_sort_iovec (struct iovec *iov, +int ompi_fcoll_base_sort_iovec (struct iovec *iov, int num_entries, int *sorted) { diff --git a/ompi/mca/fcoll/configure.m4 b/ompi/mca/fcoll/configure.m4 index 30f5cbfc52b..abb84810212 100644 --- a/ompi/mca/fcoll/configure.m4 +++ b/ompi/mca/fcoll/configure.m4 @@ -1,7 +1,7 @@ # -*- shell-script -*- # # Copyright (c) 2011 Cisco Systems, Inc. All rights reserved. 
-# Copyright (c) 2016 Research Organization for Information Science +# Copyright (c) 2016-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # # $COPYRIGHT$ @@ -17,8 +17,7 @@ AC_DEFUN([MCA_ompi_fcoll_CONFIG], [ OPAL_VAR_SCOPE_PUSH([want_io_ompio]) - AS_IF([test "$enable_mpi_io" != "no" && - test "$enable_io_ompio" != "no"], + AS_IF([test "$enable_io_ompio" != "no"], [want_io_ompio=1], [want_io_ompio=0]) diff --git a/ompi/mca/fcoll/dynamic/Makefile.am b/ompi/mca/fcoll/dynamic/Makefile.am index e6d4cc02906..6b77394ec6b 100644 --- a/ompi/mca/fcoll/dynamic/Makefile.am +++ b/ompi/mca/fcoll/dynamic/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2008-2015 University of Houston. All rights reserved. # Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -41,6 +42,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fcoll_dynamic_la_SOURCES = $(sources) mca_fcoll_dynamic_la_LDFLAGS = -module -avoid-version +mca_fcoll_dynamic_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_fcoll_dynamic_la_SOURCES =$(sources) diff --git a/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_read_all.c b/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_read_all.c index 4e3c7c73277..608d231f708 100644 --- a/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_read_all.c +++ b/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_read_all.c @@ -10,6 +10,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2015 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -97,7 +100,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, int my_aggregator =-1; bool recvbuf_is_contiguous=false; size_t ftype_size; - OPAL_PTRDIFF_TYPE ftype_extent, lb; + ptrdiff_t ftype_extent, lb; #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN @@ -114,7 +117,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, opal_datatype_type_size ( &datatype->super, &ftype_size ); opal_datatype_get_extent ( &datatype->super, &lb, &ftype_extent ); - if ( (ftype_extent == (OPAL_PTRDIFF_TYPE) ftype_size) && + if ( (ftype_extent == (ptrdiff_t) ftype_size) && opal_datatype_is_contiguous_memory_layout(&datatype->super,1) && 0 == lb ) { recvbuf_is_contiguous = true; @@ -141,7 +144,11 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, status->_ucount = max_data; } - fh->f_get_num_aggregators ( &dynamic_num_io_procs); + dynamic_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == dynamic_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } ret = fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *) fh, dynamic_num_io_procs, max_data); @@ -162,7 +169,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rcomm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&max_data, + ret = ompi_fcoll_base_coll_allgather_array (&max_data, 1, MPI_LONG, total_bytes_per_process, @@ -214,7 +221,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rcomm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&local_count, + ret = ompi_fcoll_base_coll_allgather_array (&local_count, 1, MPI_INT, fview_count, @@ -272,7 +279,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rcomm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgatherv_array (local_iov_array, + ret = 
ompi_fcoll_base_coll_allgatherv_array (local_iov_array, local_count, fh->f_iov_type, global_iov_array, @@ -307,7 +314,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, ret = OMPI_ERR_OUT_OF_RESOURCE; goto exit; } - fcoll_base_sort_iovec (global_iov_array, total_fview_count, sorted); + ompi_fcoll_base_sort_iovec (global_iov_array, total_fview_count, sorted); } if (NULL != local_iov_array) { @@ -330,7 +337,12 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, *** 6. Determine the number of cycles required to execute this *** operation *************************************************************/ - fh->f_get_bytes_per_agg ( (int *) &bytes_per_cycle); + bytes_per_cycle = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == bytes_per_cycle ) { + ret = OMPI_ERROR; + goto exit; + } + cycles = ceil((double)total_bytes/bytes_per_cycle); if ( my_aggregator == fh->f_rank) { @@ -503,7 +515,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_remaining; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base + + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base + (global_iov_array[sorted[current_index]].iov_len - bytes_remaining); blocklen_per_process[n] = (int *) realloc @@ -528,7 +540,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_to_read_in_cycle; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base + + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base + (global_iov_array[sorted[current_index]].iov_len - bytes_remaining); } @@ -548,7 +560,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = 
bytes_to_read_in_cycle; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base ; + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base ; } if (fh->f_procs_in_group[n] == fh->f_rank) { @@ -564,7 +576,7 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = global_iov_array[sorted[current_index]].iov_len; - displs_per_process[n][disp_index[n] - 1] = (OPAL_PTRDIFF_TYPE) + displs_per_process[n][disp_index[n] - 1] = (ptrdiff_t) global_iov_array[sorted[current_index]].iov_base; blocklen_per_process[n] = (int *) realloc ((void *)blocklen_per_process[n], (disp_index[n]+1)*sizeof(int)); @@ -813,14 +825,14 @@ mca_fcoll_dynamic_file_read_all (mca_io_ompio_file_t *fh, /* If data is not contigous in memory, copy the data from the receive buffer into the buffer passed in */ if (!recvbuf_is_contiguous ) { - OPAL_PTRDIFF_TYPE mem_address; + ptrdiff_t mem_address; size_t remaining = 0; size_t temp_position = 0; remaining = bytes_received; while (remaining) { - mem_address = (OPAL_PTRDIFF_TYPE) + mem_address = (ptrdiff_t) (decoded_iov[iov_index].iov_base) + current_position; if (remaining >= diff --git a/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_write_all.c b/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_write_all.c index 7bc41c4590d..86f30d7df14 100644 --- a/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_write_all.c +++ b/ompi/mca/fcoll/dynamic/fcoll_dynamic_file_write_all.c @@ -10,8 +10,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2015 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -101,7 +102,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, int my_aggregator=-1; bool sendbuf_is_contiguous = false; size_t ftype_size; - OPAL_PTRDIFF_TYPE ftype_extent, lb; + ptrdiff_t ftype_extent, lb; #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN @@ -117,7 +118,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, /************************************************************************** ** 1. In case the data is not contigous in memory, decode it into an iovec **************************************************************************/ - if ( ( ftype_extent == (OPAL_PTRDIFF_TYPE) ftype_size) && + if ( ( ftype_extent == (ptrdiff_t) ftype_size) && opal_datatype_is_contiguous_memory_layout(&datatype->super,1) && 0 == lb ) { sendbuf_is_contiguous = true; @@ -145,7 +146,11 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, status->_ucount = max_data; } - fh->f_get_num_aggregators ( &dynamic_num_io_procs ); + dynamic_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == dynamic_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } ret = fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *) fh, dynamic_num_io_procs, max_data); @@ -168,7 +173,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_comm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&max_data, + ret = ompi_fcoll_base_coll_allgather_array (&max_data, 1, MPI_LONG, total_bytes_per_process, @@ -231,7 +236,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_comm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&local_count, + ret = ompi_fcoll_base_coll_allgather_array (&local_count, 1, MPI_INT, fview_count, @@ -293,7 +298,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, #if 
OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_comm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgatherv_array (local_iov_array, + ret = ompi_fcoll_base_coll_allgatherv_array (local_iov_array, local_count, fh->f_iov_type, global_iov_array, @@ -327,7 +332,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, ret = OMPI_ERR_OUT_OF_RESOURCE; goto exit; } - fcoll_base_sort_iovec (global_iov_array, total_fview_count, sorted); + ompi_fcoll_base_sort_iovec (global_iov_array, total_fview_count, sorted); } if (NULL != local_iov_array){ @@ -356,7 +361,11 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, *** 6. Determine the number of cycles required to execute this *** operation *************************************************************/ - fh->f_get_bytes_per_agg ( (int *)&bytes_per_cycle ); + bytes_per_cycle = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == bytes_per_cycle ) { + ret = OMPI_ERROR; + goto exit; + } cycles = ceil((double)total_bytes/bytes_per_cycle); if (my_aggregator == fh->f_rank) { @@ -523,7 +532,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_remaining; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base + + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base + (global_iov_array[sorted[current_index]].iov_len - bytes_remaining); @@ -551,7 +560,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_to_write_in_cycle; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base + + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base + (global_iov_array[sorted[current_index]].iov_len - bytes_remaining); } @@ -572,7 +581,7 @@ mca_fcoll_dynamic_file_write_all 
(mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_to_write_in_cycle; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base ; + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base ; } if (fh->f_procs_in_group[n] == fh->f_rank) { bytes_sent += bytes_to_write_in_cycle; @@ -588,7 +597,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = global_iov_array[sorted[current_index]].iov_len; - displs_per_process[n][disp_index[n] - 1] = (OPAL_PTRDIFF_TYPE) + displs_per_process[n][disp_index[n] - 1] = (ptrdiff_t) global_iov_array[sorted[current_index]].iov_base; /*realloc for next blocklength @@ -798,7 +807,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, /* allocate a send buffer and copy the data that needs to be sent into it in case the data is non-contigous in memory */ - OPAL_PTRDIFF_TYPE mem_address; + ptrdiff_t mem_address; size_t remaining = 0; size_t temp_position = 0; @@ -812,7 +821,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, remaining = bytes_sent; while (remaining) { - mem_address = (OPAL_PTRDIFF_TYPE) + mem_address = (ptrdiff_t) (decoded_iov[iov_index].iov_base) + current_position; if (remaining >= @@ -946,7 +955,7 @@ mca_fcoll_dynamic_file_write_all (mca_io_ompio_file_t *fh, for (i=0 ; if_num_of_io_entries ; i++) { printf(" ADDRESS: %p OFFSET: %ld LENGTH: %ld\n", fh->f_io_array[i].memory_address, - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].offset, + (ptrdiff_t)fh->f_io_array[i].offset, fh->f_io_array[i].length); } diff --git a/ompi/mca/fcoll/dynamic/fcoll_dynamic_module.c b/ompi/mca/fcoll/dynamic/fcoll_dynamic_module.c index 4d3466b3ec8..17b4636a925 100644 --- a/ompi/mca/fcoll/dynamic/fcoll_dynamic_module.c +++ b/ompi/mca/fcoll/dynamic/fcoll_dynamic_module.c @@ -61,8 +61,8 @@ mca_fcoll_dynamic_component_file_query 
(mca_io_ompio_file_t *fh, int *priority) } if (mca_fcoll_base_query_table (fh, "dynamic")) { - if (*priority < 50) { - *priority = 50; + if (*priority < 30) { + *priority = 30; } } diff --git a/ompi/mca/fcoll/dynamic_gen2/Makefile.am b/ompi/mca/fcoll/dynamic_gen2/Makefile.am index f4910ac5e97..052e34fc50a 100644 --- a/ompi/mca/fcoll/dynamic_gen2/Makefile.am +++ b/ompi/mca/fcoll/dynamic_gen2/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2008-2015 University of Houston. All rights reserved. # Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -41,6 +42,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fcoll_dynamic_gen2_la_SOURCES = $(sources) mca_fcoll_dynamic_gen2_la_LDFLAGS = -module -avoid-version +mca_fcoll_dynamic_gen2_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_fcoll_dynamic_gen2_la_SOURCES =$(sources) diff --git a/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_read_all.c b/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_read_all.c index 44cc0a2bdee..c80cbd36dc4 100644 --- a/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_read_all.c +++ b/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_read_all.c @@ -10,6 +10,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2015 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -97,7 +100,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, int my_aggregator =-1; bool recvbuf_is_contiguous=false; size_t ftype_size; - OPAL_PTRDIFF_TYPE ftype_extent, lb; + ptrdiff_t ftype_extent, lb; #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN @@ -114,7 +117,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, opal_datatype_type_size ( &datatype->super, &ftype_size ); opal_datatype_get_extent ( &datatype->super, &lb, &ftype_extent ); - if ( (ftype_extent == (OPAL_PTRDIFF_TYPE) ftype_size) && + if ( (ftype_extent == (ptrdiff_t) ftype_size) && opal_datatype_is_contiguous_memory_layout(&datatype->super,1) && 0 == lb ) { recvbuf_is_contiguous = true; @@ -141,7 +144,11 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, status->_ucount = max_data; } - fh->f_get_num_aggregators ( &dynamic_gen2_num_io_procs); + dynamic_gen2_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == dynamic_gen2_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } ret = fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *) fh, dynamic_gen2_num_io_procs, max_data); @@ -162,7 +169,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rcomm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&max_data, + ret = ompi_fcoll_base_coll_allgather_array (&max_data, 1, MPI_LONG, total_bytes_per_process, @@ -214,7 +221,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rcomm_time = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&local_count, + ret = ompi_fcoll_base_coll_allgather_array (&local_count, 1, MPI_INT, fview_count, @@ -272,7 +279,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rcomm_time = MPI_Wtime(); #endif - ret = 
fcoll_base_coll_allgatherv_array (local_iov_array, + ret = ompi_fcoll_base_coll_allgatherv_array (local_iov_array, local_count, fh->f_iov_type, global_iov_array, @@ -307,7 +314,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, ret = OMPI_ERR_OUT_OF_RESOURCE; goto exit; } - fcoll_base_sort_iovec (global_iov_array, total_fview_count, sorted); + ompi_fcoll_base_sort_iovec (global_iov_array, total_fview_count, sorted); } if (NULL != local_iov_array) { @@ -330,7 +337,11 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, *** 6. Determine the number of cycles required to execute this *** operation *************************************************************/ - fh->f_get_bytes_per_agg ( (int *) &bytes_per_cycle); + bytes_per_cycle = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == bytes_per_cycle ) { + ret = OMPI_ERROR; + goto exit; + } cycles = ceil((double)total_bytes/bytes_per_cycle); if ( my_aggregator == fh->f_rank) { @@ -503,7 +514,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_remaining; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base + + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base + (global_iov_array[sorted[current_index]].iov_len - bytes_remaining); blocklen_per_process[n] = (int *) realloc @@ -528,7 +539,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_to_read_in_cycle; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base + + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base + (global_iov_array[sorted[current_index]].iov_len - bytes_remaining); } @@ -548,7 +559,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t 
*fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = bytes_to_read_in_cycle; displs_per_process[n][disp_index[n] - 1] = - (OPAL_PTRDIFF_TYPE)global_iov_array[sorted[current_index]].iov_base ; + (ptrdiff_t)global_iov_array[sorted[current_index]].iov_base ; } if (fh->f_procs_in_group[n] == fh->f_rank) { @@ -564,7 +575,7 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, if (my_aggregator == fh->f_rank) { blocklen_per_process[n][disp_index[n] - 1] = global_iov_array[sorted[current_index]].iov_len; - displs_per_process[n][disp_index[n] - 1] = (OPAL_PTRDIFF_TYPE) + displs_per_process[n][disp_index[n] - 1] = (ptrdiff_t) global_iov_array[sorted[current_index]].iov_base; blocklen_per_process[n] = (int *) realloc ((void *)blocklen_per_process[n], (disp_index[n]+1)*sizeof(int)); @@ -813,14 +824,14 @@ mca_fcoll_dynamic_gen2_file_read_all (mca_io_ompio_file_t *fh, /* If data is not contigous in memory, copy the data from the receive buffer into the buffer passed in */ if (!recvbuf_is_contiguous ) { - OPAL_PTRDIFF_TYPE mem_address; + ptrdiff_t mem_address; size_t remaining = 0; size_t temp_position = 0; remaining = bytes_received; while (remaining) { - mem_address = (OPAL_PTRDIFF_TYPE) + mem_address = (ptrdiff_t) (decoded_iov[iov_index].iov_base) + current_position; if (remaining >= diff --git a/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_write_all.c b/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_write_all.c index 31bfa83150b..8ec536bcb7f 100644 --- a/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_write_all.c +++ b/ompi/mca/fcoll/dynamic_gen2/fcoll_dynamic_gen2_file_write_all.c @@ -10,8 +10,10 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. + * Copyright (c) 2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -35,6 +37,7 @@ #define DEBUG_ON 0 #define FCOLL_DYNAMIC_GEN2_SHUFFLE_TAG 123 +#define INIT_LEN 10 /*Used for loading file-offsets per aggregator*/ typedef struct mca_io_ompio_local_io_array{ @@ -45,6 +48,7 @@ typedef struct mca_io_ompio_local_io_array{ typedef struct mca_io_ompio_aggregator_data { int *disp_index, *sorted, *fview_count, n; + int *max_disp_index; int **blocklen_per_process; MPI_Aint **displs_per_process, total_bytes, bytes_per_cycle, total_bytes_written; MPI_Comm comm; @@ -54,13 +58,11 @@ typedef struct mca_io_ompio_aggregator_data { int current_index, current_position; int bytes_to_write_in_cycle, bytes_remaining, procs_per_group; int *procs_in_group, iov_index; - bool sendbuf_is_contiguous, prev_sendbuf_is_contiguous; int bytes_sent, prev_bytes_sent; struct iovec *decoded_iov; int bytes_to_write, prev_bytes_to_write; mca_io_ompio_io_array_t *io_array, *prev_io_array; int num_io_entries, prev_num_io_entries; - char *send_buf, *prev_send_buf; } mca_io_ompio_aggregator_data; @@ -75,9 +77,7 @@ typedef struct mca_io_ompio_aggregator_data { for (_i=0; _i<_num; _i++ ) { \ _aggr[_i]->prev_io_array=_aggr[_i]->io_array; \ _aggr[_i]->prev_num_io_entries=_aggr[_i]->num_io_entries; \ - _aggr[_i]->prev_send_buf=_aggr[_i]->send_buf; \ _aggr[_i]->prev_bytes_sent=_aggr[_i]->bytes_sent; \ - _aggr[_i]->prev_sendbuf_is_contiguous=_aggr[_i]->sendbuf_is_contiguous; \ _aggr[_i]->prev_bytes_to_write=_aggr[_i]->bytes_to_write; \ _t=_aggr[_i]->prev_global_buf; \ _aggr[_i]->prev_global_buf=_aggr[_i]->global_buf; \ @@ -159,7 +159,11 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, 
/************************************************************************** ** 1. In case the data is not contigous in memory, decode it into an iovec **************************************************************************/ - fh->f_get_bytes_per_agg ( (int *)&bytes_per_cycle ); + bytes_per_cycle = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == bytes_per_cycle ) { + ret = OMPI_ERROR; + goto exit; + } /* since we want to overlap 2 iterations, define the bytes_per_cycle to be half of what the user requested */ bytes_per_cycle =bytes_per_cycle/2; @@ -187,7 +191,11 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, dynamic_gen2_num_io_procs = fh->f_stripe_count; } else { - fh->f_get_num_aggregators ( &dynamic_gen2_num_io_procs ); + dynamic_gen2_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == dynamic_gen2_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } } @@ -220,8 +228,6 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, aggr_data[i]->procs_in_group = fh->f_procs_in_group; aggr_data[i]->comm = fh->f_comm; aggr_data[i]->buf = (char *)buf; // should not be used in the new version. 
- aggr_data[i]->sendbuf_is_contiguous = false; //safe assumption for right now - aggr_data[i]->prev_sendbuf_is_contiguous = false; //safe assumption for right now } /********************************************************************* @@ -274,7 +280,7 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, fh->f_comm->c_coll->coll_allgather_module); } else { - ret = fcoll_base_coll_allgather_array (broken_total_lengths, + ret = ompi_fcoll_base_coll_allgather_array (broken_total_lengths, dynamic_gen2_num_io_procs, MPI_LONG, total_bytes_per_process, @@ -333,7 +339,7 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, fh->f_comm->c_coll->coll_allgather_module); } else { - ret = fcoll_base_coll_allgather_array (broken_counts, + ret = ompi_fcoll_base_coll_allgather_array (broken_counts, dynamic_gen2_num_io_procs, MPI_INT, result_counts, @@ -420,7 +426,7 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, fh->f_comm->c_coll->coll_allgatherv_module ); } else { - ret = fcoll_base_coll_allgatherv_array (broken_iov_arrays[i], + ret = ompi_fcoll_base_coll_allgatherv_array (broken_iov_arrays[i], broken_counts[i], fh->f_iov_type, aggr_data[i]->global_iov_array, @@ -455,7 +461,7 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, ret = OMPI_ERR_OUT_OF_RESOURCE; goto exit; } - fcoll_base_sort_iovec (aggr_data[i]->global_iov_array, total_fview_count, aggr_data[i]->sorted); + ompi_fcoll_base_sort_iovec (aggr_data[i]->global_iov_array, total_fview_count, aggr_data[i]->sorted); } if (NULL != local_iov_array){ @@ -495,6 +501,13 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, goto exit; } + aggr_data[i]->max_disp_index = (int *)calloc (fh->f_procs_per_group, sizeof (int)); + if (NULL == aggr_data[i]->max_disp_index) { + opal_output (1, "OUT OF MEMORY\n"); + ret = OMPI_ERR_OUT_OF_RESOURCE; + goto exit; + } + aggr_data[i]->blocklen_per_process = (int **)calloc (fh->f_procs_per_group, sizeof 
(int*)); if (NULL == aggr_data[i]->blocklen_per_process) { opal_output (1, "OUT OF MEMORY\n"); @@ -602,10 +615,6 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, end_write_time = MPI_Wtime(); write_time += end_write_time - start_write_time; #endif - - if (!aggr_data[i]->prev_sendbuf_is_contiguous && aggr_data[i]->prev_bytes_sent) { - free (aggr_data[i]->prev_send_buf); - } } } /* end for (index = 0; index < cycles; index++) */ @@ -635,10 +644,6 @@ int mca_fcoll_dynamic_gen2_file_write_all (mca_io_ompio_file_t *fh, end_write_time = MPI_Wtime(); write_time += end_write_time - start_write_time; #endif - - if (!aggr_data[i]->prev_sendbuf_is_contiguous && aggr_data[i]->prev_bytes_sent) { - free (aggr_data[i]->prev_send_buf); - } } } @@ -683,6 +688,7 @@ exit : } free (aggr_data[i]->disp_index); + free (aggr_data[i]->max_disp_index); free (aggr_data[i]->global_buf); free (aggr_data[i]->prev_global_buf); for(l=0;lprocs_per_group;l++){ @@ -702,6 +708,7 @@ exit : } free (aggr_data); } + free(local_iov_array); free(displs); free(decoded_iov); free(broken_counts); @@ -772,11 +779,12 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i MPI_Aint *memory_displacements=NULL; int *temp_disp_index=NULL; MPI_Aint global_count = 0; + int* blocklength_proc=NULL; + ptrdiff_t* displs_proc=NULL; data->num_io_entries = 0; data->bytes_sent = 0; data->io_array=NULL; - data->send_buf=NULL; /********************************************************************** *** 7a. 
Getting ready for next cycle: initializing and freeing buffers **********************************************************************/ @@ -793,16 +801,20 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i for(l=0;lprocs_per_group;l++){ data->disp_index[l] = 1; - - free(data->blocklen_per_process[l]); - free(data->displs_per_process[l]); - - data->blocklen_per_process[l] = (int *) calloc (1, sizeof(int)); - data->displs_per_process[l] = (MPI_Aint *) calloc (1, sizeof(MPI_Aint)); - if (NULL == data->displs_per_process[l] || NULL == data->blocklen_per_process[l]){ - opal_output (1, "OUT OF MEMORY for displs\n"); - ret = OMPI_ERR_OUT_OF_RESOURCE; - goto exit; + + if(data->max_disp_index[l] == 0) { + data->blocklen_per_process[l] = (int *) calloc (INIT_LEN, sizeof(int)); + data->displs_per_process[l] = (MPI_Aint *) calloc (INIT_LEN, sizeof(MPI_Aint)); + if (NULL == data->displs_per_process[l] || NULL == data->blocklen_per_process[l]){ + opal_output (1, "OUT OF MEMORY for displs\n"); + ret = OMPI_ERR_OUT_OF_RESOURCE; + goto exit; + } + data->max_disp_index[l] = INIT_LEN; + } + else { + memset ( data->blocklen_per_process[l], 0, data->max_disp_index[l]*sizeof(int) ); + memset ( data->displs_per_process[l], 0, data->max_disp_index[l]*sizeof(MPI_Aint) ); } } } /* (aggregator == rank */ @@ -869,19 +881,26 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i if (aggregator == rank) { data->blocklen_per_process[data->n][data->disp_index[data->n] - 1] = data->bytes_remaining; data->displs_per_process[data->n][data->disp_index[data->n] - 1] = - (OPAL_PTRDIFF_TYPE)data->global_iov_array[data->sorted[data->current_index]].iov_base + + (ptrdiff_t)data->global_iov_array[data->sorted[data->current_index]].iov_base + (data->global_iov_array[data->sorted[data->current_index]].iov_len - data->bytes_remaining); + data->disp_index[data->n] += 1; + /* In this cases the length is consumed so allocating for next displacement and 
blocklength*/ - data->blocklen_per_process[data->n] = (int *) realloc - ((void *)data->blocklen_per_process[data->n], (data->disp_index[data->n]+1)*sizeof(int)); - data->displs_per_process[data->n] = (MPI_Aint *) realloc - ((void *)data->displs_per_process[data->n], (data->disp_index[data->n]+1)*sizeof(MPI_Aint)); + if ( data->disp_index[data->n] == data->max_disp_index[data->n] ) { + data->max_disp_index[data->n] *= 2; + data->blocklen_per_process[data->n] = (int *) realloc( + (void *)data->blocklen_per_process[data->n], + (data->max_disp_index[data->n])*sizeof(int)); + data->displs_per_process[data->n] = (MPI_Aint *) realloc( + (void *)data->displs_per_process[data->n], + (data->max_disp_index[data->n])*sizeof(MPI_Aint)); + } + data->blocklen_per_process[data->n][data->disp_index[data->n]] = 0; data->displs_per_process[data->n][data->disp_index[data->n]] = 0; - data->disp_index[data->n] += 1; } if (data->procs_in_group[data->n] == rank) { bytes_sent += data->bytes_remaining; @@ -889,7 +908,6 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i data->current_index ++; data->bytes_to_write_in_cycle -= data->bytes_remaining; data->bytes_remaining = 0; -// continue; } else { /* the remaining data from the previous cycle is larger than the @@ -897,7 +915,7 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i if (aggregator == rank) { data->blocklen_per_process[data->n][data->disp_index[data->n] - 1] = data->bytes_to_write_in_cycle; data->displs_per_process[data->n][data->disp_index[data->n] - 1] = - (OPAL_PTRDIFF_TYPE)data->global_iov_array[data->sorted[data->current_index]].iov_base + + (ptrdiff_t)data->global_iov_array[data->sorted[data->current_index]].iov_base + (data->global_iov_array[data->sorted[data->current_index]].iov_len - data->bytes_remaining); } @@ -918,7 +936,7 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i if (aggregator == rank) { 
data->blocklen_per_process[data->n][data->disp_index[data->n] - 1] = data->bytes_to_write_in_cycle; data->displs_per_process[data->n][data->disp_index[data->n] - 1] = - (OPAL_PTRDIFF_TYPE)data->global_iov_array[data->sorted[data->current_index]].iov_base ; + (ptrdiff_t)data->global_iov_array[data->sorted[data->current_index]].iov_base ; } if (data->procs_in_group[data->n] == rank) { bytes_sent += data->bytes_to_write_in_cycle; @@ -934,19 +952,25 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i if (aggregator == rank) { data->blocklen_per_process[data->n][data->disp_index[data->n] - 1] = data->global_iov_array[data->sorted[data->current_index]].iov_len; - data->displs_per_process[data->n][data->disp_index[data->n] - 1] = (OPAL_PTRDIFF_TYPE) + data->displs_per_process[data->n][data->disp_index[data->n] - 1] = (ptrdiff_t) data->global_iov_array[data->sorted[data->current_index]].iov_base; + data->disp_index[data->n] += 1; + /*realloc for next blocklength and assign this displacement and check for next displs as the total length of this entry has been consumed!*/ - data->blocklen_per_process[data->n] = - (int *) realloc ((void *)data->blocklen_per_process[data->n], (data->disp_index[data->n]+1)*sizeof(int)); - data->displs_per_process[data->n] = (MPI_Aint *)realloc - ((void *)data->displs_per_process[data->n], (data->disp_index[data->n]+1)*sizeof(MPI_Aint)); + if ( data->disp_index[data->n] == data->max_disp_index[data->n] ) { + data->max_disp_index[data->n] *= 2; + data->blocklen_per_process[data->n] = (int *) realloc( + (void *)data->blocklen_per_process[data->n], + (data->max_disp_index[data->n]*sizeof(int))); + data->displs_per_process[data->n] = (MPI_Aint *)realloc( + (void *)data->displs_per_process[data->n], + (data->max_disp_index[data->n]*sizeof(MPI_Aint))); + } data->blocklen_per_process[data->n][data->disp_index[data->n]] = 0; data->displs_per_process[data->n][data->disp_index[data->n]] = 0; - data->disp_index[data->n] += 
1; } if (data->procs_in_group[data->n] == rank) { bytes_sent += data->global_iov_array[data->sorted[data->current_index]].iov_len; @@ -954,7 +978,6 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i data->bytes_to_write_in_cycle -= data->global_iov_array[data->sorted[data->current_index]].iov_len; data->current_index ++; -// continue; } } } @@ -1134,73 +1157,86 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i } } /* end if (entries_per_aggr > 0 ) */ }/* end if (aggregator == rank ) */ - - if ( data->sendbuf_is_contiguous ) { - data->send_buf = &((char*)data->buf)[data->total_bytes_written]; - } - else if (bytes_sent) { - /* allocate a send buffer and copy the data that needs - to be sent into it in case the data is non-contigous - in memory */ - OPAL_PTRDIFF_TYPE mem_address; - size_t remaining = 0; - size_t temp_position = 0; - - data->send_buf = malloc (bytes_sent); - if (NULL == data->send_buf) { + + if (bytes_sent) { + size_t remaining = bytes_sent; + int block_index = -1; + int blocklength_size = INIT_LEN; + + ptrdiff_t send_mem_address = (ptrdiff_t) NULL; + ompi_datatype_t *newType = MPI_DATATYPE_NULL; + blocklength_proc = (int *) calloc (blocklength_size, sizeof (int)); + displs_proc = (ptrdiff_t *) calloc (blocklength_size, sizeof (ptrdiff_t)); + + if (NULL == blocklength_proc || NULL == displs_proc ) { opal_output (1, "OUT OF MEMORY\n"); ret = OMPI_ERR_OUT_OF_RESOURCE; goto exit; } - - remaining = bytes_sent; - + while (remaining) { - mem_address = (OPAL_PTRDIFF_TYPE) - (data->decoded_iov[data->iov_index].iov_base) + data->current_position; - + block_index++; + + if(0 == block_index) { + send_mem_address = (ptrdiff_t) (data->decoded_iov[data->iov_index].iov_base) + + data->current_position; + } + else { + // Reallocate more memory if blocklength_size is not enough + if(0 == block_index % INIT_LEN) { + blocklength_size += INIT_LEN; + blocklength_proc = (int *) realloc(blocklength_proc, 
blocklength_size * sizeof(int)); + displs_proc = (ptrdiff_t *) realloc(displs_proc, blocklength_size * sizeof(ptrdiff_t)); + } + displs_proc[block_index] = (ptrdiff_t) (data->decoded_iov[data->iov_index].iov_base) + + data->current_position - send_mem_address; + } + if (remaining >= (data->decoded_iov[data->iov_index].iov_len - data->current_position)) { - memcpy (data->send_buf+temp_position, - (IOVBASE_TYPE *)mem_address, - data->decoded_iov[data->iov_index].iov_len - data->current_position); + + blocklength_proc[block_index] = data->decoded_iov[data->iov_index].iov_len - + data->current_position; remaining = remaining - - (data->decoded_iov[data->iov_index].iov_len - data->current_position); - temp_position = temp_position + - (data->decoded_iov[data->iov_index].iov_len - data->current_position); + (data->decoded_iov[data->iov_index].iov_len - data->current_position); data->iov_index = data->iov_index + 1; data->current_position = 0; } else { - memcpy (data->send_buf+temp_position, - (IOVBASE_TYPE *) mem_address, - remaining); + blocklength_proc[block_index] = remaining; data->current_position += remaining; remaining = 0; } } - } - data->total_bytes_written += bytes_sent; - data->bytes_sent = bytes_sent; - /* Gather the sendbuf from each process in appropritate locations in - aggregators*/ - - if (bytes_sent){ - ret = MCA_PML_CALL(isend(data->send_buf, - bytes_sent, - MPI_BYTE, - aggregator, - FCOLL_DYNAMIC_GEN2_SHUFFLE_TAG+index, - MCA_PML_BASE_SEND_STANDARD, - data->comm, - &reqs[data->procs_per_group])); - - - if ( OMPI_SUCCESS != ret ){ - goto exit; + + data->total_bytes_written += bytes_sent; + data->bytes_sent = bytes_sent; + + if ( 0 <= block_index ) { + ompi_datatype_create_hindexed(block_index+1, + blocklength_proc, + displs_proc, + MPI_BYTE, + &newType); + ompi_datatype_commit(&newType); + + ret = MCA_PML_CALL(isend((char *)send_mem_address, + 1, + newType, + aggregator, + FCOLL_DYNAMIC_GEN2_SHUFFLE_TAG+index, + MCA_PML_BASE_SEND_STANDARD, + 
data->comm, + &reqs[data->procs_per_group])); + if ( MPI_DATATYPE_NULL != newType ) { + ompi_datatype_destroy(&newType); + } + if (OMPI_SUCCESS != ret){ + goto exit; + } } - } + #if DEBUG_ON if (aggregator == rank){ @@ -1266,7 +1302,7 @@ static int shuffle_init ( int index, int cycles, int aggregator, int rank, mca_i for (i=0 ; i= fh->f_size ) { *priority = 100; diff --git a/ompi/mca/fcoll/static/Makefile.am b/ompi/mca/fcoll/static/Makefile.am index c9ff1893d2f..f72f28ed273 100644 --- a/ompi/mca/fcoll/static/Makefile.am +++ b/ompi/mca/fcoll/static/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2008-2015 University of Houston. All rights reserved. # Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -41,6 +42,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fcoll_static_la_SOURCES = $(sources) mca_fcoll_static_la_LDFLAGS = -module -avoid-version +mca_fcoll_static_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_fcoll_static_la_SOURCES =$(sources) diff --git a/ompi/mca/fcoll/static/fcoll_static_file_read_all.c b/ompi/mca/fcoll/static/fcoll_static_file_read_all.c index 73f0a8dbbd6..c6410c72095 100644 --- a/ompi/mca/fcoll/static/fcoll_static_file_read_all.c +++ b/ompi/mca/fcoll/static/fcoll_static_file_read_all.c @@ -11,7 +11,10 @@ * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -79,7 +82,8 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, MPI_Aint **displs_per_process=NULL, global_iov_count=0, global_count=0; MPI_Aint *memory_displacements=NULL; int bytes_to_read_in_cycle=0; - size_t max_data=0, bytes_per_cycle=0; + size_t max_data=0; + MPI_Aint bytes_per_cycle=0; uint32_t iov_count=0, iov_index=0; struct iovec *decoded_iov=NULL, *iov=NULL; mca_fcoll_static_local_io_array *local_iov_array=NULL, *global_iov_array=NULL; @@ -89,7 +93,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, int blocklen[3] = {1, 1, 1}; int static_num_io_procs=1; - OPAL_PTRDIFF_TYPE d[3], base; + ptrdiff_t d[3], base; ompi_datatype_t *types[3]; ompi_datatype_t *io_array_type=MPI_DATATYPE_NULL; ompi_datatype_t **sendtype = NULL; @@ -97,7 +101,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, int my_aggregator=-1; bool recvbuf_is_contiguous=false; size_t ftype_size; - OPAL_PTRDIFF_TYPE ftype_extent, lb; + ptrdiff_t ftype_extent, lb; #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN double read_time = 0.0, start_read_time = 0.0, end_read_time = 0.0; @@ -114,7 +118,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, /************************************************************************** ** 1. 
In case the data is not contigous in memory, decode it into an iovec **************************************************************************/ - if ( ( ftype_extent == (OPAL_PTRDIFF_TYPE) ftype_size) && + if ( ( ftype_extent == (ptrdiff_t) ftype_size) && opal_datatype_is_contiguous_memory_layout(&datatype->super,1) && 0 == lb ) { recvbuf_is_contiguous = true; @@ -140,7 +144,11 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, } - fh->f_get_num_aggregators ( &static_num_io_procs ); + static_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == static_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *) fh, static_num_io_procs, max_data); @@ -186,9 +194,9 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, local_iov_array[0].process_id = fh->f_rank; } - d[0] = (OPAL_PTRDIFF_TYPE)&local_iov_array[0]; - d[1] = (OPAL_PTRDIFF_TYPE)&local_iov_array[0].length; - d[2] = (OPAL_PTRDIFF_TYPE)&local_iov_array[0].process_id; + d[0] = (ptrdiff_t)&local_iov_array[0]; + d[1] = (ptrdiff_t)&local_iov_array[0].length; + d[2] = (ptrdiff_t)&local_iov_array[0].process_id; base = d[0]; for (i=0 ; i<3 ; i++) { d[i] -= base; @@ -207,7 +215,11 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, ompi_datatype_commit (&io_array_type); /* #########################################################*/ - fh->f_get_bytes_per_agg ( (int*) &bytes_per_cycle); + bytes_per_cycle = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == bytes_per_cycle ) { + ret = OMPI_ERROR; + goto exit; + } local_cycles = ceil((double)max_data*fh->f_procs_per_group/bytes_per_cycle); #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN @@ -292,7 +304,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rexch = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&iov_size, + ret = 
ompi_fcoll_base_coll_allgather_array (&iov_size, 1, MPI_INT, iovec_count_per_process, @@ -335,7 +347,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rexch = MPI_Wtime(); #endif - ret = fcoll_base_coll_gatherv_array (local_iov_array, + ret = ompi_fcoll_base_coll_gatherv_array (local_iov_array, iov_size, io_array_type, global_iov_array, @@ -480,7 +492,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, if ((index == local_cycles-1) && (max_data % (bytes_per_cycle/fh->f_procs_per_group))) { bytes_to_read_in_cycle = max_data - position; } - else if (max_data <= bytes_per_cycle/fh->f_procs_per_group) { + else if (max_data <= (size_t) (bytes_per_cycle/fh->f_procs_per_group)) { bytes_to_read_in_cycle = max_data; } else { @@ -494,7 +506,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_rexch = MPI_Wtime(); #endif - fcoll_base_coll_gather_array (&bytes_to_read_in_cycle, + ompi_fcoll_base_coll_gather_array (&bytes_to_read_in_cycle, 1, MPI_INT, bytes_per_process, @@ -768,7 +780,7 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, for (i=0 ; if_num_of_io_entries ; i++) { printf(" ADDRESS: %p OFFSET: %ld LENGTH: %ld\n", fh->f_io_array[i].memory_address, - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].offset, + (ptrdiff_t)fh->f_io_array[i].offset, fh->f_io_array[i].length); } #endif @@ -871,14 +883,14 @@ mca_fcoll_static_file_read_all (mca_io_ompio_file_t *fh, position += bytes_to_read_in_cycle; if (!recvbuf_is_contiguous) { - OPAL_PTRDIFF_TYPE mem_address; + ptrdiff_t mem_address; size_t remaining = 0; size_t temp_position = 0; remaining = bytes_to_read_in_cycle; while (remaining && (iov_count > iov_index)){ - mem_address = (OPAL_PTRDIFF_TYPE) + mem_address = (ptrdiff_t) (decoded_iov[iov_index].iov_base) + current_position; if (remaining >= diff --git a/ompi/mca/fcoll/static/fcoll_static_file_write_all.c b/ompi/mca/fcoll/static/fcoll_static_file_write_all.c 
index 75dcf88b979..07f57db2116 100644 --- a/ompi/mca/fcoll/static/fcoll_static_file_write_all.c +++ b/ompi/mca/fcoll/static/fcoll_static_file_write_all.c @@ -11,8 +11,10 @@ * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. + * Copyright (c) 2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -68,7 +70,8 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, - size_t max_data = 0, bytes_per_cycle=0; + size_t max_data = 0; + MPI_Aint bytes_per_cycle=0; struct iovec *iov=NULL, *decoded_iov=NULL; uint32_t iov_count=0, iov_index=0; int i=0,j=0,l=0, temp_index; @@ -90,13 +93,13 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, /* For creating datatype of type io_array */ int blocklen[3] = {1, 1, 1}; int static_num_io_procs=1; - OPAL_PTRDIFF_TYPE d[3], base; + ptrdiff_t d[3], base; ompi_datatype_t *types[3]; ompi_datatype_t *io_array_type=MPI_DATATYPE_NULL; int my_aggregator=-1; bool sendbuf_is_contiguous= false; size_t ftype_size; - OPAL_PTRDIFF_TYPE ftype_extent, lb; + ptrdiff_t ftype_extent, lb; /*----------------------------------------------*/ @@ -118,7 +121,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, /************************************************************************** ** 1. 
In case the data is not contigous in memory, decode it into an iovec **************************************************************************/ - if ( ( ftype_extent == (OPAL_PTRDIFF_TYPE) ftype_size) && + if ( ( ftype_extent == (ptrdiff_t) ftype_size) && opal_datatype_is_contiguous_memory_layout(&datatype->super,1) && 0 == lb ) { sendbuf_is_contiguous = true; @@ -143,7 +146,11 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, status->_ucount = max_data; } - fh->f_get_num_aggregators ( & static_num_io_procs ); + static_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == static_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *)fh, static_num_io_procs, max_data); @@ -155,9 +162,9 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, types[1] = &ompi_mpi_long.dt; types[2] = &ompi_mpi_int.dt; - d[0] = (OPAL_PTRDIFF_TYPE)&local_iov_array[0]; - d[1] = (OPAL_PTRDIFF_TYPE)&local_iov_array[0].length; - d[2] = (OPAL_PTRDIFF_TYPE)&local_iov_array[0].process_id; + d[0] = (ptrdiff_t)&local_iov_array[0]; + d[1] = (ptrdiff_t)&local_iov_array[0].length; + d[2] = (ptrdiff_t)&local_iov_array[0].process_id; base = d[0]; for (i=0 ; i<3 ; i++) { d[i] -= base; @@ -201,7 +208,11 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, } - fh->f_get_bytes_per_agg ( (int *) &bytes_per_cycle); + bytes_per_cycle = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == bytes_per_cycle ) { + ret = OMPI_ERROR; + goto exit; + } local_cycles = ceil( ((double)max_data*fh->f_procs_per_group) /bytes_per_cycle); #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN @@ -295,7 +306,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_exch = MPI_Wtime(); #endif - ret = fcoll_base_coll_allgather_array (&iov_size, + ret = ompi_fcoll_base_coll_allgather_array (&iov_size, 1, MPI_INT, 
iovec_count_per_process, @@ -339,7 +350,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN start_exch = MPI_Wtime(); #endif - ret = fcoll_base_coll_gatherv_array (local_iov_array, + ret = ompi_fcoll_base_coll_gatherv_array (local_iov_array, iov_size, io_array_type, global_iov_array, @@ -474,7 +485,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, if ((index == local_cycles-1) && (max_data % (bytes_per_cycle/fh->f_procs_per_group)) ) { bytes_to_write_in_cycle = max_data - total_bytes_written; } - else if (max_data <= bytes_per_cycle/fh->f_procs_per_group) { + else if (max_data <= (size_t) (bytes_per_cycle/fh->f_procs_per_group)) { bytes_to_write_in_cycle = max_data; } else { @@ -500,7 +511,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, start_exch = MPI_Wtime(); #endif /* gather from each process how many bytes each will be sending */ - ret = fcoll_base_coll_gather_array (&bytes_to_write_in_cycle, + ret = ompi_fcoll_base_coll_gather_array (&bytes_to_write_in_cycle, 1, MPI_INT, bytes_per_process, @@ -787,7 +798,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, /* allocate a send buffer and copy the data that needs to be sent into it in case the data is non-contigous in memory */ - OPAL_PTRDIFF_TYPE mem_address; + ptrdiff_t mem_address; size_t remaining = 0; size_t temp_position = 0; @@ -800,7 +811,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, remaining = bytes_to_write_in_cycle; while (remaining) { - mem_address = (OPAL_PTRDIFF_TYPE) + mem_address = (ptrdiff_t) (decoded_iov[iov_index].iov_base) + current_position; if (remaining >= @@ -914,7 +925,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, for (i=0 ; if_num_of_io_entries ; i++) { printf(" ADDRESS: %p OFFSET: %ld LENGTH: %ld\n", fh->f_io_array[i].memory_address, - (OPAL_PTRDIFF_TYPE)fh->f_io_array[i].offset, + (ptrdiff_t)fh->f_io_array[i].offset, fh->f_io_array[i].length); } #endif @@ -937,9 
+948,7 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, #endif } - - if (my_aggregator == fh->f_rank) { - } } + } #if OMPIO_FCOLL_WANT_TIME_BREAKDOWN end_exch = MPI_Wtime(); @@ -966,12 +975,12 @@ mca_fcoll_static_file_write_all (mca_io_ompio_file_t *fh, decoded_iov = NULL; } - if (my_aggregator == fh->f_rank) { + if (NULL != local_iov_array){ + free(local_iov_array); + local_iov_array = NULL; + } - if (NULL != local_iov_array){ - free(local_iov_array); - local_iov_array = NULL; - } + if (my_aggregator == fh->f_rank) { for(l=0;lf_procs_per_group;l++){ if (NULL != blocklen_per_process[l]){ free(blocklen_per_process[l]); diff --git a/ompi/mca/fcoll/static/fcoll_static_module.c b/ompi/mca/fcoll/static/fcoll_static_module.c index e4438f70a19..79e2377f0af 100644 --- a/ompi/mca/fcoll/static/fcoll_static_module.c +++ b/ompi/mca/fcoll/static/fcoll_static_module.c @@ -61,8 +61,8 @@ mca_fcoll_static_component_file_query (mca_io_ompio_file_t *fh, int *priority) } if (mca_fcoll_base_query_table (fh, "static")) { - if (*priority < 50) { - *priority = 50; + if (*priority < 30) { + *priority = 30; } } diff --git a/ompi/mca/fcoll/two_phase/Makefile.am b/ompi/mca/fcoll/two_phase/Makefile.am index 7b9395f55e7..154d9a32e93 100644 --- a/ompi/mca/fcoll/two_phase/Makefile.am +++ b/ompi/mca/fcoll/two_phase/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2008-2015 University of Houston. All rights reserved. # Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -42,6 +43,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fcoll_two_phase_la_SOURCES = $(sources) mca_fcoll_two_phase_la_LDFLAGS = -module -avoid-version +mca_fcoll_two_phase_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_fcoll_two_phase_la_SOURCES =$(sources) diff --git a/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_read_all.c b/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_read_all.c index 6c4b717bc90..2e6bb9b7246 100644 --- a/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_read_all.c +++ b/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_read_all.c @@ -11,9 +11,11 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -25,6 +27,7 @@ #include "fcoll_two_phase.h" #include "mpi.h" #include "ompi/constants.h" +#include "ompi/communicator/communicator.h" #include "ompi/mca/fcoll/fcoll.h" #include "ompi/mca/io/ompio/io_ompio.h" #include "ompi/mca/io/io.h" @@ -165,11 +168,11 @@ mca_fcoll_two_phase_file_read_all (mca_io_ompio_file_t *fh, for (ti = 0; ti < iov_count; ti++){ decoded_iov[ti].iov_base = (IOVBASE_TYPE *) - ((OPAL_PTRDIFF_TYPE)temp_iov[ti].iov_base - recv_buf_addr); + ((ptrdiff_t)temp_iov[ti].iov_base - recv_buf_addr); decoded_iov[ti].iov_len = temp_iov[ti].iov_len; #if DEBUG printf("d_offset[%d]: %ld, d_len[%d]: %ld\n", - ti, (OPAL_PTRDIFF_TYPE)decoded_iov[ti].iov_base, + ti, (ptrdiff_t)decoded_iov[ti].iov_base, ti, decoded_iov[ti].iov_len); #endif } @@ -183,7 +186,11 @@ mca_fcoll_two_phase_file_read_all (mca_io_ompio_file_t *fh, status->_ucount = max_data; } - fh->f_get_num_aggregators (&two_phase_num_io_procs); + two_phase_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == two_phase_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } if (-1 == two_phase_num_io_procs ){ ret = fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *)fh, two_phase_num_io_procs, @@ -197,7 +204,7 @@ mca_fcoll_two_phase_file_read_all (mca_io_ompio_file_t *fh, } if (two_phase_num_io_procs > fh->f_size){ - two_phase_num_io_procs = fh->f_size; + two_phase_num_io_procs = fh->f_size; } aggregator_list = (int *) calloc (two_phase_num_io_procs, sizeof(int)); @@ -206,9 +213,16 @@ mca_fcoll_two_phase_file_read_all (mca_io_ompio_file_t *fh, goto exit; } - for (i=0; i< two_phase_num_io_procs; i++){ - aggregator_list[i] = i * fh->f_size / two_phase_num_io_procs; + if ( OMPI_COMM_IS_MAPBY_NODE (&ompi_mpi_comm_world.comm) ) { + for (i =0; i< two_phase_num_io_procs; i++){ + aggregator_list[i] = i; + } } + else { + for (i =0; i< two_phase_num_io_procs; i++){ + aggregator_list[i] = i * 
fh->f_size / two_phase_num_io_procs; + } + } ret = fh->f_generate_current_file_view ((struct mca_io_ompio_file_t *)fh, max_data, @@ -565,7 +579,11 @@ static int two_phase_read_and_exch(mca_io_ompio_file_t *fh, } } - fh->f_get_bytes_per_agg ( &two_phase_cycle_buffer_size); + two_phase_cycle_buffer_size = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == two_phase_cycle_buffer_size ) { + ret = OMPI_ERROR; + goto exit; + } ntimes = (int)((end_loc - st_loc + two_phase_cycle_buffer_size)/ two_phase_cycle_buffer_size); @@ -671,7 +689,7 @@ static int two_phase_read_and_exch(mca_io_ompio_file_t *fh, } if (req_off < real_off + real_size) { count[i]++; - PMPI_Address(read_buf+req_off-real_off, + PMPI_Get_address(read_buf+req_off-real_off, &(others_req[i].mem_ptrs[j])); send_size[i] += (int)(OMPIO_MIN(real_off + real_size - req_off, diff --git a/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_write_all.c b/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_write_all.c index f78cb143864..9fab19f63f3 100644 --- a/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_write_all.c +++ b/ompi/mca/fcoll/two_phase/fcoll_two_phase_file_write_all.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015-2016 Los Alamos National Security, LLC. All rights * reserved. 
@@ -27,6 +27,7 @@ #include "mpi.h" #include "ompi/constants.h" +#include "ompi/communicator/communicator.h" #include "ompi/mca/fcoll/fcoll.h" #include "ompi/mca/io/ompio/io_ompio.h" #include "ompi/mca/io/io.h" @@ -190,7 +191,7 @@ mca_fcoll_two_phase_file_write_all (mca_io_ompio_file_t *fh, goto exit; } - send_buf_addr = (OPAL_PTRDIFF_TYPE)buf; + send_buf_addr = (ptrdiff_t)buf; if ( 0 < iov_count ) { decoded_iov = (struct iovec *)malloc (iov_count * sizeof(struct iovec)); @@ -201,13 +202,13 @@ mca_fcoll_two_phase_file_write_all (mca_io_ompio_file_t *fh, } for (ti = 0; ti < iov_count; ti ++){ decoded_iov[ti].iov_base = (IOVBASE_TYPE *)( - (OPAL_PTRDIFF_TYPE)temp_iov[ti].iov_base - + (ptrdiff_t)temp_iov[ti].iov_base - send_buf_addr); decoded_iov[ti].iov_len = temp_iov[ti].iov_len ; #if DEBUG_ON printf("d_offset[%d]: %ld, d_len[%d]: %ld\n", - ti, (OPAL_PTRDIFF_TYPE)decoded_iov[ti].iov_base, + ti, (ptrdiff_t)decoded_iov[ti].iov_base, ti, decoded_iov[ti].iov_len); #endif } @@ -221,7 +222,11 @@ mca_fcoll_two_phase_file_write_all (mca_io_ompio_file_t *fh, status->_ucount = max_data; } - fh->f_get_num_aggregators ( &two_phase_num_io_procs ); + two_phase_num_io_procs = fh->f_get_mca_parameter_value ( "num_aggregators", strlen ("num_aggregators")); + if ( OMPI_ERR_MAX == two_phase_num_io_procs ) { + ret = OMPI_ERROR; + goto exit; + } if(-1 == two_phase_num_io_procs){ ret = fh->f_set_aggregator_props ((struct mca_io_ompio_file_t *)fh, two_phase_num_io_procs, @@ -235,7 +240,7 @@ mca_fcoll_two_phase_file_write_all (mca_io_ompio_file_t *fh, } if (two_phase_num_io_procs > fh->f_size){ - two_phase_num_io_procs = fh->f_size; + two_phase_num_io_procs = fh->f_size; } #if DEBUG_ON @@ -248,10 +253,16 @@ mca_fcoll_two_phase_file_write_all (mca_io_ompio_file_t *fh, goto exit; } - for (i =0; i< two_phase_num_io_procs; i++){ - aggregator_list[i] = i * fh->f_size / two_phase_num_io_procs; + if ( OMPI_COMM_IS_MAPBY_NODE (&ompi_mpi_comm_world.comm) ) { + for (i =0; i< two_phase_num_io_procs; 
i++){ + aggregator_list[i] = i; + } } - + else { + for (i =0; i< two_phase_num_io_procs; i++){ + aggregator_list[i] = i * fh->f_size / two_phase_num_io_procs; + } + } ret = fh->f_generate_current_file_view ((struct mca_io_ompio_file_t*)fh, max_data, @@ -635,7 +646,11 @@ static int two_phase_exch_and_write(mca_io_ompio_file_t *fh, } } - fh->f_get_bytes_per_agg ( &two_phase_cycle_buffer_size ); + two_phase_cycle_buffer_size = fh->f_get_mca_parameter_value ("bytes_per_agg", strlen ("bytes_per_agg")); + if ( OMPI_ERR_MAX == two_phase_cycle_buffer_size ) { + ret = OMPI_ERROR; + goto exit; + } ntimes = (int) ((end_loc - st_loc + two_phase_cycle_buffer_size)/two_phase_cycle_buffer_size); if ((st_loc == -1) && (end_loc == -1)) { @@ -766,7 +781,7 @@ static int two_phase_exch_and_write(mca_io_ompio_file_t *fh, size,i, count[i]); #endif - PMPI_Address(write_buf+req_off-off, + PMPI_Get_address(write_buf+req_off-off, &(others_req[i].mem_ptrs[j])); #if DEBUG_ON printf("%d : mem_ptrs : %ld\n", fh->f_rank, @@ -1053,7 +1068,7 @@ static int two_phase_exchage_data(mca_io_ompio_file_t *fh, if (nprocs_recv){ if (*hole){ - if (off > 0){ + if (off >= 0){ fh->f_io_array = (mca_io_ompio_io_array_t *)malloc (sizeof(mca_io_ompio_io_array_t)); if (NULL == fh->f_io_array) { @@ -1066,6 +1081,23 @@ static int two_phase_exchage_data(mca_io_ompio_file_t *fh, fh->f_io_array[0].length = size; fh->f_io_array[0].memory_address = write_buf; if (fh->f_num_of_io_entries){ + int amode_overwrite; + amode_overwrite = fh->f_get_mca_parameter_value ("overwrite_amode", strlen("overwrite_amode")); + if ( OMPI_ERR_MAX == amode_overwrite ) { + ret = OMPI_ERROR; + goto exit; + } + if ( fh->f_amode & MPI_MODE_WRONLY && !amode_overwrite ){ + if ( 0 == fh->f_rank ) { + printf("\n File not opened in RDWR mode, can not continue." + "\n To resolve this problem, you can either \n" + " a. open the file with MPI_MODE_RDWR instead of MPI_MODE_WRONLY\n" + " b. 
ensure that the mca parameter mca_io_ompio_overwrite_amode is set to 1\n" + " c. use an fcoll component that does not use data sieving (e.g. dynamic)\n"); + } + ret = MPI_ERR_FILE; + goto exit; + } if ( 0 > fh->f_fbtl->fbtl_preadv (fh)) { opal_output(1, "READ FAILED\n"); ret = OMPI_ERROR; diff --git a/ompi/mca/fs/configure.m4 b/ompi/mca/fs/configure.m4 index 98454764ee5..d8ab8db881c 100644 --- a/ompi/mca/fs/configure.m4 +++ b/ompi/mca/fs/configure.m4 @@ -1,7 +1,7 @@ # -*- shell-script -*- # # Copyright (c) 2011 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2016 Research Organization for Information Science +# Copyright (c) 2016-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # # $COPYRIGHT$ @@ -17,8 +17,7 @@ AC_DEFUN([MCA_ompi_fs_CONFIG], [ OPAL_VAR_SCOPE_PUSH([want_io_ompio]) - AS_IF([test "$enable_mpi_io" != "no" && - test "$enable_io_ompio" != "no"], + AS_IF([test "$enable_io_ompio" != "no"], [want_io_ompio=1], [want_io_ompio=0]) diff --git a/ompi/mca/fs/fs.h b/ompi/mca/fs/fs.h index cdb6922827c..b5a5aee7018 100644 --- a/ompi/mca/fs/fs.h +++ b/ompi/mca/fs/fs.h @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -29,6 +30,7 @@ #include "mpi.h" #include "ompi/mca/mca.h" #include "opal/mca/base/base.h" +#include "ompi/info/info.h" BEGIN_C_DECLS @@ -110,10 +112,10 @@ typedef int (*mca_fs_base_module_finalize_1_0_0_fn_t) typedef int (*mca_fs_base_module_file_open_fn_t)( struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, struct mca_io_ompio_file_t *fh); + struct opal_info_t *info, struct mca_io_ompio_file_t *fh); typedef int (*mca_fs_base_module_file_close_fn_t)(struct mca_io_ompio_file_t *fh); typedef int (*mca_fs_base_module_file_delete_fn_t)( - char *filename, struct ompi_info_t *info); + char *filename, struct opal_info_t *info); typedef int (*mca_fs_base_module_file_set_size_fn_t) (struct mca_io_ompio_file_t *fh, OMPI_MPI_OFFSET_TYPE size); typedef int (*mca_fs_base_module_file_get_size_fn_t) diff --git a/ompi/mca/fs/lustre/Makefile.am b/ompi/mca/fs/lustre/Makefile.am index 4fe256888ef..210d4102c53 100644 --- a/ompi/mca/fs/lustre/Makefile.am +++ b/ompi/mca/fs/lustre/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008-2011 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -47,7 +48,8 @@ AM_CPPFLAGS = $(fs_lustre_CPPFLAGS) mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fs_lustre_la_SOURCES = $(fs_lustre_sources) -mca_fs_lustre_la_LIBADD = $(fs_lustre_LIBS) +mca_fs_lustre_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(fs_lustre_LIBS) mca_fs_lustre_la_LDFLAGS = -module -avoid-version $(fs_lustre_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/fs/lustre/configure.m4 b/ompi/mca/fs/lustre/configure.m4 index ab660ed0b26..9e64c6b5351 100644 --- a/ompi/mca/fs/lustre/configure.m4 +++ b/ompi/mca/fs/lustre/configure.m4 @@ -10,7 +10,7 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2010-2017 Cisco Systems, Inc. All rights reserved # Copyright (c) 2008-2012 University of Houston. All rights reserved. # $COPYRIGHT$ # @@ -34,11 +34,7 @@ AC_DEFUN([MCA_ompi_fs_lustre_CONFIG],[ [$1], [$2]) -# AC_CHECK_HEADERS([lustre/liblustreapi.h], [], -# [AC_CHECK_HEADERS([lustre/liblustreapi.h], [], [$2], -# [AC_INCLUDES_DEFAULT])], -# [AC_INCLUDES_DEFAULT]) - + OPAL_SUMMARY_ADD([[OMPIO File Systems]],[[Lustre]],[$1],[$fs_lustre_happy]) # substitute in the things needed to build lustre AC_SUBST([fs_lustre_CPPFLAGS]) diff --git a/ompi/mca/fs/lustre/fs_lustre.c b/ompi/mca/fs/lustre/fs_lustre.c index 5af09c528b9..466368168c6 100644 --- a/ompi/mca/fs/lustre/fs_lustre.c +++ b/ompi/mca/fs/lustre/fs_lustre.c @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2008-2017 University of Houston. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -43,8 +43,6 @@ #endif #include -#include -#include /* * ******************************************************************* diff --git a/ompi/mca/fs/lustre/fs_lustre.h b/ompi/mca/fs/lustre/fs_lustre.h index 8e36a3933f0..11042606e9b 100644 --- a/ompi/mca/fs/lustre/fs_lustre.h +++ b/ompi/mca/fs/lustre/fs_lustre.h @@ -9,9 +9,10 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2008-2017 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -33,7 +34,7 @@ extern int mca_fs_lustre_stripe_width; BEGIN_C_DECLS -#include +#include #include #ifndef LOV_MAX_STRIPE_COUNT @@ -60,13 +61,13 @@ OMPI_MODULE_DECLSPEC extern mca_fs_base_component_2_0_0_t mca_fs_lustre_componen int mca_fs_lustre_file_open (struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh); int mca_fs_lustre_file_close (mca_io_ompio_file_t *fh); int mca_fs_lustre_file_delete (char *filename, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_fs_lustre_file_set_size (mca_io_ompio_file_t *fh, OMPI_MPI_OFFSET_TYPE size); diff --git a/ompi/mca/fs/lustre/fs_lustre_file_delete.c b/ompi/mca/fs/lustre/fs_lustre_file_delete.c index 1fc6da84080..eb293b2021f 100644 --- a/ompi/mca/fs/lustre/fs_lustre_file_delete.c +++ b/ompi/mca/fs/lustre/fs_lustre_file_delete.c @@ -10,6 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2011 University of Houston. All rights reserved. 
+ * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -35,7 +36,7 @@ */ int mca_fs_lustre_file_delete (char* file_name, - struct ompi_info_t *info) + struct opal_info_t *info) { int ret; diff --git a/ompi/mca/fs/lustre/fs_lustre_file_open.c b/ompi/mca/fs/lustre/fs_lustre_file_open.c index 4dd23e529cd..fc1c870d2ac 100644 --- a/ompi/mca/fs/lustre/fs_lustre_file_open.c +++ b/ompi/mca/fs/lustre/fs_lustre_file_open.c @@ -9,9 +9,10 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2015 University of Houston. All rights reserved. + * Copyright (c) 2008-2018 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -59,12 +60,12 @@ int mca_fs_lustre_file_open (struct ompi_communicator_t *comm, const char* filename, int access_mode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh) { - int amode; + int amode, rank; int old_mask, perm; - int rc; + int rc, ret=OMPI_SUCCESS; int flag; int fs_lustre_stripe_size = -1; int fs_lustre_stripe_width = -1; @@ -80,65 +81,121 @@ mca_fs_lustre_file_open (struct ompi_communicator_t *comm, else { perm = fh->f_perm; } + + rank = fh->f_rank; amode = 0; - if (access_mode & MPI_MODE_CREATE) - amode = amode | O_CREAT; if (access_mode & MPI_MODE_RDONLY) amode = amode | O_RDONLY; if (access_mode & MPI_MODE_WRONLY) amode = amode | O_WRONLY; if (access_mode & MPI_MODE_RDWR) amode = amode | O_RDWR; - if (access_mode & MPI_MODE_EXCL) - amode = amode | O_EXCL; - - ompi_info_get (info, "stripe_size", MPI_MAX_INFO_VAL, char_stripe, &flag); + opal_info_get (info, "stripe_size", MPI_MAX_INFO_VAL, char_stripe, &flag); if ( flag ) 
{ sscanf ( char_stripe, "%d", &fs_lustre_stripe_size ); } - ompi_info_get (info, "stripe_width", MPI_MAX_INFO_VAL, char_stripe, &flag); + opal_info_get (info, "stripe_width", MPI_MAX_INFO_VAL, char_stripe, &flag); if ( flag ) { sscanf ( char_stripe, "%d", &fs_lustre_stripe_width ); } if (fs_lustre_stripe_size < 0) { fs_lustre_stripe_size = mca_fs_lustre_stripe_size; - } + } if (fs_lustre_stripe_width < 0) { fs_lustre_stripe_width = mca_fs_lustre_stripe_width; } - if ( (fs_lustre_stripe_size>0 || fs_lustre_stripe_width>0) && - (amode&O_CREAT) && (amode&O_RDWR)) { - if (0 == fh->f_rank) { + + /* Reset errno */ + errno = 0; + if (0 == fh->f_rank) { + /* MODE_CREATE and MODE_EXCL can only be set by one process */ + if ( !(fh->f_flags & OMPIO_SHAREDFP_IS_SET)) { + if ( access_mode & MPI_MODE_CREATE ) + amode = amode | O_CREAT; + if (access_mode & MPI_MODE_EXCL) + amode = amode | O_EXCL; + } + + if ( (fs_lustre_stripe_size>0 || fs_lustre_stripe_width>0) && + ( amode&O_CREAT) && + ( (amode&O_RDWR)|| amode&O_WRONLY) ) { llapi_file_create(filename, fs_lustre_stripe_size, -1, /* MSC need to change that */ fs_lustre_stripe_width, 0); /* MSC need to change that */ - - fh->fd = open(filename, O_CREAT | O_RDWR | O_LOV_DELAY_CREATE, perm); - if (fh->fd < 0) { - fprintf(stderr, "Can't open %s file: %d (%s)\n", - filename, errno, strerror(errno)); - return OMPI_ERROR; + + fh->fd = open(filename, amode | O_LOV_DELAY_CREATE, perm); + } + else { + fh->fd = open (filename, amode, perm); + } + if ( 0 > fh->fd ) { + if ( EACCES == errno ) { + ret = MPI_ERR_ACCESS; + } + else if ( ENAMETOOLONG == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( ENOENT == errno ) { + ret = MPI_ERR_NO_SUCH_FILE; + } + else if ( EISDIR == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( EROFS == errno ) { + ret = MPI_ERR_READ_ONLY; + } + else if ( EEXIST == errno ) { + ret = MPI_ERR_FILE_EXISTS; + } + else { + ret = MPI_ERR_OTHER; } - close (fh->fd); } - fh->f_comm->c_coll->coll_barrier (fh->f_comm, - 
fh->f_comm->c_coll->coll_barrier_module); } - fh->fd = open (filename, amode, perm); - if (fh->fd < 0) { - opal_output(1, "error opening file %s\n", filename); - return OMPI_ERROR; + comm->c_coll->coll_bcast ( &ret, 1, MPI_INT, 0, comm, comm->c_coll->coll_bcast_module); + if ( OMPI_SUCCESS != ret ) { + fh->fd = -1; + return ret; + } + + if ( 0 != rank ) { + fh->fd = open (filename, amode, perm); + if ( 0 > fh->fd) { + if ( EACCES == errno ) { + ret = MPI_ERR_ACCESS; + } + else if ( ENAMETOOLONG == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( ENOENT == errno ) { + ret = MPI_ERR_NO_SUCH_FILE; + } + else if ( EISDIR == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( EROFS == errno ) { + ret = MPI_ERR_READ_ONLY; + } + else if ( EEXIST == errno ) { + ret = MPI_ERR_FILE_EXISTS; + } + else { + ret = MPI_ERR_OTHER; + } + } } + + lump = alloc_lum(); if (NULL == lump ){ fprintf(stderr,"Cannot allocate memory for extracting stripe size\n"); @@ -149,11 +206,9 @@ mca_fs_lustre_file_open (struct ompi_communicator_t *comm, opal_output(1, "get_stripe failed: %d (%s)\n", errno, strerror(errno)); return OMPI_ERROR; } - fh->f_stripe_size = lump->lmm_stripe_size; - fh->f_stripe_count = lump->lmm_stripe_count; + fh->f_stripe_size = lump->lmm_stripe_size; + fh->f_stripe_count = lump->lmm_stripe_count; + fh->f_fs_block_size = lump->lmm_stripe_size; - // if ( NULL != lump ) { - // free ( lump ); - // } return OMPI_SUCCESS; } diff --git a/ompi/mca/fs/plfs/Makefile.am b/ompi/mca/fs/plfs/Makefile.am deleted file mode 100644 index be6409d131a..00000000000 --- a/ompi/mca/fs/plfs/Makefile.am +++ /dev/null @@ -1,56 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. 
-# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2008-2014 University of Houston. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# Make the output library in this directory, and name it either -# mca__.la (for DSO builds) or libmca__.la -# (for static builds). - -if MCA_BUILD_ompi_fs_plfs_DSO -component_noinst = -component_install = mca_fs_plfs.la -else -component_noinst = libmca_fs_plfs.la -component_install = -endif - -# Source files - -fs_plfs_sources = \ - fs_plfs.h \ - fs_plfs.c \ - fs_plfs_component.c \ - fs_plfs_file_open.c \ - fs_plfs_file_close.c \ - fs_plfs_file_delete.c \ - fs_plfs_file_sync.c \ - fs_plfs_file_set_size.c \ - fs_plfs_file_get_size.c - -AM_CPPFLAGS = $(fs_plfs_CPPFLAGS) - -mcacomponentdir = $(pkglibdir) -mcacomponent_LTLIBRARIES = $(component_install) -mca_fs_plfs_la_SOURCES = $(fs_plfs_sources) -mca_fs_plfs_la_LIBADD = $(fs_plfs_LIBS) -mca_fs_plfs_la_LDFLAGS = -module -avoid-version $(fs_plfs_LDFLAGS) - -noinst_LTLIBRARIES = $(component_noinst) -libmca_fs_plfs_la_SOURCES = $(fs_plfs_sources) -libmca_fs_plfs_la_LIBADD = $(fs_plfs_LIBS) -libmca_fs_plfs_la_LDFLAGS = -module -avoid-version $(fs_plfs_LDFLAGS) diff --git a/ompi/mca/fs/plfs/configure.m4 b/ompi/mca/fs/plfs/configure.m4 deleted file mode 100644 index 012422ea760..00000000000 --- a/ompi/mca/fs/plfs/configure.m4 +++ /dev/null @@ -1,41 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. 
All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2008-2014 University of Houston. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - - -# MCA_fs_plfs_CONFIG(action-if-can-compile, -# [action-if-cant-compile]) -# ------------------------------------------------ -AC_DEFUN([MCA_ompi_fs_plfs_CONFIG],[ - AC_CONFIG_FILES([ompi/mca/fs/plfs/Makefile]) - - OMPI_CHECK_PLFS([fs_plfs], - [fs_plfs_happy="yes"], - [fs_plfs_happy="no"]) - - AS_IF([test "$fs_plfs_happy" = "yes"], - [$1], - [$2]) - - # substitute in the things needed to build plfs - AC_SUBST([fs_plfs_CPPFLAGS]) - AC_SUBST([fs_plfs_LDFLAGS]) - AC_SUBST([fs_plfs_LIBS]) -])dnl diff --git a/ompi/mca/fs/plfs/fs_plfs.c b/ompi/mca/fs/plfs/fs_plfs.c deleted file mode 100644 index e5dcd1e0407..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs.c +++ /dev/null @@ -1,154 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2015 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. 
Since linkers generally pull in symbols by object fules, - * keeping these symbols as the only symbols in this file prevents - * utility programs such as "ompi_info" from having to import entire - * modules just to query their version and parameters - */ - -#include "ompi_config.h" -#include "mpi.h" -#include "ompi/mca/fs/fs.h" -#include "ompi/mca/fs/base/base.h" -#include "ompi/mca/fs/plfs/fs_plfs.h" - -#ifdef HAVE_SYS_STATFS_H -#include /* or */ -#endif -#ifdef HAVE_SYS_PARAM_H -#include -#endif -#ifdef HAVE_SYS_MOUNT_H -#include -#endif -#ifdef HAVE_SYS_STAT_H -#include -#endif - -#include -#include - -/* - * ******************************************************************* - * ************************ actions structure ************************ - * ******************************************************************* - */ -static mca_fs_base_module_1_0_0_t plfs = { - mca_fs_plfs_module_init, /* initalise after being selected */ - mca_fs_plfs_module_finalize, /* close a module on a communicator */ - mca_fs_plfs_file_open, - mca_fs_plfs_file_close, - mca_fs_plfs_file_delete, - mca_fs_plfs_file_set_size, - mca_fs_plfs_file_get_size, - mca_fs_plfs_file_sync -}; -/* - * ******************************************************************* - * ************************* structure ends ************************** - * ******************************************************************* - */ - -int mca_fs_plfs_component_init_query(bool enable_progress_threads, - bool enable_mpi_threads) -{ - /* Nothing to do */ - return OMPI_SUCCESS; -} - -struct mca_fs_base_module_1_0_0_t * -mca_fs_plfs_component_file_query (mca_io_ompio_file_t *fh, int *priority) -{ - int err; - char *dir; - struct statfs fsbuf; - char *tmp; - char wpath[1024]; - - /* The code in this function is based on the ADIO FS selection in ROMIO - * Copyright (C) 1997 University of Chicago. - * See COPYRIGHT notice in top-level directory. 
- */ - - *priority = mca_fs_plfs_priority; - - tmp = strchr (fh->f_filename, ':'); - if (!tmp) { - if (OMPIO_ROOT == fh->f_rank) { - do { - err = statfs (fh->f_filename, &fsbuf); - } while (err && (errno == ESTALE)); - - if (err && (ENOENT == errno)) { - mca_fs_base_get_parent_dir (fh->f_filename, &dir); - err = statfs (dir, &fsbuf); - free (dir); - } - - getcwd( wpath, sizeof(wpath) ); - if(is_plfs_path(wpath) == 1) { - fh->f_fstype = PLFS; - } - } - fh->f_comm->c_coll->coll_bcast (&(fh->f_fstype), - 1, - MPI_INT, - OMPIO_ROOT, - fh->f_comm, - fh->f_comm->c_coll->coll_bcast_module); - } - else { - if (!strncmp(fh->f_filename, "plfs:", 7) || - !strncmp(fh->f_filename, "PLFS:", 7)) { - fh->f_fstype = PLFS; - } - } - - if (PLFS == fh->f_fstype) { - if (*priority < 50) { - *priority = 50; - return &plfs; - } - } - return NULL; -} - -int mca_fs_plfs_component_file_unquery (mca_io_ompio_file_t *file) -{ - /* This function might be needed for some purposes later. for now it - * does not have anything to do since there are no steps which need - * to be undone if this module is not selected */ - - return OMPI_SUCCESS; -} - -int mca_fs_plfs_module_init (mca_io_ompio_file_t *file) -{ - /* Make sure the file type is not overwritten by the last queried - * component */ - file->f_fstype = PLFS; - return OMPI_SUCCESS; -} - - -int mca_fs_plfs_module_finalize (mca_io_ompio_file_t *file) -{ - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs.h b/ompi/mca/fs/plfs/fs_plfs.h deleted file mode 100644 index 755a8c6b8c6..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs.h +++ /dev/null @@ -1,82 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. 
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MCA_FS_PLFS_H -#define MCA_FS_PLFS_H - -#include "ompi_config.h" -#include "ompi/mca/mca.h" -#include "ompi/mca/fs/fs.h" -#include "ompi/mca/common/ompio/common_ompio.h" - -#include - -extern int mca_fs_plfs_priority; - -BEGIN_C_DECLS - -int mca_fs_plfs_component_init_query(bool enable_progress_threads, - bool enable_mpi_threads); -struct mca_fs_base_module_1_0_0_t * -mca_fs_plfs_component_file_query (mca_io_ompio_file_t *fh, int *priority); -int mca_fs_plfs_component_file_unquery (mca_io_ompio_file_t *file); - -int mca_fs_plfs_module_init (mca_io_ompio_file_t *file); -int mca_fs_plfs_module_finalize (mca_io_ompio_file_t *file); - -OMPI_MODULE_DECLSPEC extern mca_fs_base_component_2_0_0_t mca_fs_plfs_component; -/* - * ****************************************************************** - * ********* functions which are implemented in this module ********* - * ****************************************************************** - */ - -int mca_fs_plfs_file_open (struct ompi_communicator_t *comm, - const char *filename, - int amode, - struct ompi_info_t *info, - mca_io_ompio_file_t *fh); - -int mca_fs_plfs_file_close (mca_io_ompio_file_t *fh); - -int mca_fs_plfs_file_delete (char *filename, - struct ompi_info_t *info); - -int mca_fs_plfs_file_set_size (mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE size); - -int mca_fs_plfs_file_get_size (mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE *size); - -int mca_fs_plfs_file_sync (mca_io_ompio_file_t *fh); - -int mca_fs_plfs_file_seek 
(mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE offset, - int whence); -/* - * ****************************************************************** - * ************ functions implemented in this module end ************ - * ****************************************************************** - */ - -END_C_DECLS - -#endif /* MCA_FS_PLFS_H */ diff --git a/ompi/mca/fs/plfs/fs_plfs_component.c b/ompi/mca/fs/plfs/fs_plfs_component.c deleted file mode 100644 index 6df5f7db22b..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_component.c +++ /dev/null @@ -1,81 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. Since linkers generally pull in symbols by object - * files, keeping these symbols as the only symbols in this file - * prevents utility programs such as "ompi_info" from having to import - * entire components just to query their version and parameters. 
- */ - -#include "ompi_config.h" -#include "fs_plfs.h" -#include "mpi.h" - -/* - * Public string showing the fs plfs component version number - */ -const char *mca_fs_plfs_component_version_string = - "OMPI/MPI plfs FS MCA component version " OMPI_VERSION; - -static int plfs_register(void); - -int mca_fs_plfs_priority = 20; - -/* - * Instantiate the public struct with all of our public information - * and pointers to our public functions in it - */ -mca_fs_base_component_2_0_0_t mca_fs_plfs_component = { - - /* First, the mca_component_t struct containing meta information - about the component itself */ - - .fsm_version = { - MCA_FS_BASE_VERSION_2_0_0, - - /* Component name and version */ - .mca_component_name = "plfs", - MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, - OMPI_RELEASE_VERSION), - .mca_register_component_params = plfs_register, - }, - .fsm_data = { - /* This component is checkpointable */ - MCA_BASE_METADATA_PARAM_CHECKPOINT - }, - .fsm_init_query = mca_fs_plfs_component_init_query, /* get thread level */ - .fsm_file_query = mca_fs_plfs_component_file_query, /* get priority and actions */ - .fsm_file_unquery = mca_fs_plfs_component_file_unquery, /* undo what was done by previous function */ -}; - -static int -plfs_register(void) -{ - mca_fs_plfs_priority = 20; - (void) mca_base_component_var_register(&mca_fs_plfs_component.fsm_version, - "priority", "Priority of the plfs fs component", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, &mca_fs_plfs_priority); - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs_file_close.c b/ompi/mca/fs/plfs/fs_plfs_file_close.c deleted file mode 100644 index 5004770c239..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_file_close.c +++ /dev/null @@ -1,84 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. 
- * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fs_plfs.h" - -#include -#include -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fs/fs.h" - - -/* - * file_close_plfs - * - * Function: - closes a new file - * Accepts: - file handle - * Returns: - Success if file closed - */ -int -mca_fs_plfs_file_close (mca_io_ompio_file_t *fh) -{ - int flags; - plfs_error_t plfs_ret = PLFS_SUCCESS; - int amode; - char wpath[1024]; - - fh->f_comm->c_coll->coll_barrier (fh->f_comm, - fh->f_comm->c_coll->coll_barrier_module); - - getcwd( wpath, sizeof(wpath) ); - sprintf( wpath,"%s/%s",wpath,fh->f_filename ); - - plfs_ret = plfs_access(wpath, F_OK); - if ( PLFS_SUCCESS != plfs_ret ) { - opal_output(0, "fs_plfs_file_close: Error in plfs_access:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; // file doesn't exist - } - - amode = 0; - if (fh->f_amode & MPI_MODE_CREATE) - amode = amode | O_CREAT; - if (fh->f_amode & MPI_MODE_RDONLY) - amode = amode | O_RDONLY; - if (fh->f_amode & MPI_MODE_WRONLY) - amode = amode | O_WRONLY; - if (fh->f_amode & MPI_MODE_RDWR) - amode = amode | O_RDWR; - if (fh->f_amode & MPI_MODE_EXCL) { - return OMPI_ERROR; - } - - plfs_ret = plfs_sync(fh->f_fs_ptr); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fs_plfs_file_close: Error in plfs_sync:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - - - plfs_ret = plfs_close(fh->f_fs_ptr, fh->f_rank, 0, amode ,NULL, &flags); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, 
"fs_plfs_file_close: Error in plfs_close:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs_file_delete.c b/ompi/mca/fs/plfs/fs_plfs_file_delete.c deleted file mode 100644 index d20a8e88c59..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_file_delete.c +++ /dev/null @@ -1,50 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fs_plfs.h" -#include - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fs/fs.h" - -/* - * file_delete_plfs - * - * Function: - deletes a file - * Accepts: - file name & info - * Returns: - Success if file closed - */ -int -mca_fs_plfs_file_delete (char* file_name, - struct ompi_info_t *info) -{ - plfs_error_t plfs_ret; - char wpath[1024]; - getcwd( wpath, sizeof(wpath) ); - sprintf( wpath,"%s/%s",wpath,file_name ); - plfs_ret = plfs_unlink( wpath ); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fs_plfs_file_delete: Error in plfs_unlink:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs_file_get_size.c b/ompi/mca/fs/plfs/fs_plfs_file_get_size.c deleted file mode 100644 index c59bd8c53ae..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_file_get_size.c +++ /dev/null @@ -1,56 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana 
University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fs_plfs.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fs/fs.h" - -/* - * file_get_size_plfs - * - * Function: - get_size of a file - * Accepts: - same arguments as MPI_File_get_size() - * Returns: - Success if size is get - */ -int -mca_fs_plfs_file_get_size (mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE *size) -{ - Plfs_fd *pfd = NULL; - plfs_error_t plfs_ret; - struct stat st; - char wpath[1024]; - int size_only = 1; - - getcwd(wpath, sizeof(wpath)); - sprintf(wpath,"%s/%s",wpath,fh->f_filename); - - plfs_ret = plfs_getattr(pfd, wpath, &st, size_only); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fs_plfs_file_get_size: Error in plfs_getattr:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - - *size = st.st_size; - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs_file_open.c b/ompi/mca/fs/plfs/fs_plfs_file_open.c deleted file mode 100644 index 623a6d99004..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_file_open.c +++ /dev/null @@ -1,111 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. 
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fs_plfs.h" - -#include -#include -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fs/fs.h" -#include "ompi/communicator/communicator.h" -#include "ompi/info/info.h" - -#include - -/* - * file_open_plfs - * - * Function: - opens a new file - * Accepts: - same arguments as MPI_File_open() - * Returns: - Success if new file handle - */ -int -mca_fs_plfs_file_open (struct ompi_communicator_t *comm, - const char* filename, - int access_mode, - struct ompi_info_t *info, - mca_io_ompio_file_t *fh) -{ - int rank; - int amode; - int old_mask, perm; - plfs_error_t plfs_ret; - Plfs_fd *pfd = NULL; - char wpath[1024]; - - rank = ompi_comm_rank ( comm ); - - getcwd( wpath, sizeof(wpath) ); - sprintf( wpath,"%s/%s",wpath,filename ); - - if (OMPIO_PERM_NULL == fh->f_perm) { - old_mask = umask(022); - umask(old_mask); - perm = old_mask ^ 0666; - } - else { - perm = fh->f_perm; - } - - amode = 0; - - if (access_mode & MPI_MODE_RDONLY) - amode = amode | O_RDONLY; - if (access_mode & MPI_MODE_WRONLY) - amode = amode | O_WRONLY; - if (access_mode & MPI_MODE_RDWR) - amode = amode | O_RDWR; - if (access_mode & MPI_MODE_EXCL) { - if( is_plfs_path(wpath) == 1 ) { //the file already exists - return OMPI_ERROR; - } - } - - if (0 == rank) { - /* MODE_CREATE and MODE_EXCL can only be set by one process */ - if (access_mode & MPI_MODE_CREATE) - amode = amode | O_CREAT; - - plfs_ret = plfs_open( &pfd, wpath, amode, fh->f_rank, perm, NULL ); - fh->f_fs_ptr = 
pfd; - } - - comm->c_coll->coll_bcast ( &plfs_ret, 1, MPI_INT, 0, comm, comm->c_coll->coll_bcast_module); - if ( PLFS_SUCCESS != plfs_ret ) { - return OMPI_ERROR; - } - - if (0 != rank) { - plfs_ret = plfs_open( &pfd, wpath, amode, fh->f_rank, perm, NULL ); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fs_plfs_file_open: Error in plfs_open:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - else { - fh->f_fs_ptr = pfd; - } - } - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs_file_set_size.c b/ompi/mca/fs/plfs/fs_plfs_file_set_size.c deleted file mode 100644 index 6c24fb44c53..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_file_set_size.c +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2011 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fs_plfs.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fs/fs.h" - -/* - * file_set_size_plfs - * - * Function: - set_size of a file - * Accepts: - same arguments as MPI_File_set_size() - * Returns: - Success if size is set - */ -int -mca_fs_plfs_file_set_size (mca_io_ompio_file_t *file_handle, - OMPI_MPI_OFFSET_TYPE size) -{ - printf ("PLFS SET SIZE\n"); - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/fs_plfs_file_sync.c b/ompi/mca/fs/plfs/fs_plfs_file_sync.c deleted file mode 100644 index 6bbf056b0f9..00000000000 --- a/ompi/mca/fs/plfs/fs_plfs_file_sync.c +++ /dev/null @@ -1,45 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "fs_plfs.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/fs/fs.h" - -/* - * file_sync_plfs - * - * Function: - closes a new file - * Accepts: - file handle - * Returns: - Success if file closed - */ -int -mca_fs_plfs_file_sync (mca_io_ompio_file_t *fh) -{ - plfs_error_t plfs_ret; - plfs_ret = plfs_sync( fh->f_fs_ptr ); - if (PLFS_SUCCESS != plfs_ret) { - opal_output(0, "fs_plfs_file_sync: Error in plfs_sync:\n%s\n", strplfserr(plfs_ret)); - return OMPI_ERROR; - } - return OMPI_SUCCESS; -} diff --git a/ompi/mca/fs/plfs/owner.txt b/ompi/mca/fs/plfs/owner.txt deleted file mode 100644 index 2e9726c28a4..00000000000 --- a/ompi/mca/fs/plfs/owner.txt +++ /dev/null @@ -1,7 +0,0 @@ -# -# owner/status file -# owner: institution that is responsible for this package -# status: e.g. active, maintenance, unmaintained -# -owner: UH -status: active diff --git a/ompi/mca/fs/pvfs2/Makefile.am b/ompi/mca/fs/pvfs2/Makefile.am index 4b8f6bfc578..64c147493ac 100644 --- a/ompi/mca/fs/pvfs2/Makefile.am +++ b/ompi/mca/fs/pvfs2/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008-2011 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -47,7 +48,8 @@ AM_CPPFLAGS = $(fs_pvfs2_CPPFLAGS) mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fs_pvfs2_la_SOURCES = $(fs_pvfs2_sources) -mca_fs_pvfs2_la_LIBADD = $(fs_pvfs2_LIBS) +mca_fs_pvfs2_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(fs_pvfs2_LIBS) mca_fs_pvfs2_la_LDFLAGS = -module -avoid-version $(fs_pvfs2_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/fs/pvfs2/configure.m4 b/ompi/mca/fs/pvfs2/configure.m4 index 17539ba070f..0f404ea0319 100644 --- a/ompi/mca/fs/pvfs2/configure.m4 +++ b/ompi/mca/fs/pvfs2/configure.m4 @@ -30,6 +30,7 @@ AC_DEFUN([MCA_ompi_fs_pvfs2_CONFIG],[ [fs_pvfs2_happy="yes"], [fs_pvfs2_happy="no"]) + OPAL_SUMMARY_ADD([[OMPIO File Systems]],[[PVFS2/OrangeFS]],[$1],[$fs_pvfs2_happy]) AS_IF([test "$fs_pvfs2_happy" = "yes"], [$1], [$2]) diff --git a/ompi/mca/fs/pvfs2/fs_pvfs2.h b/ompi/mca/fs/pvfs2/fs_pvfs2.h index 2555996e861..dc4e724f3db 100644 --- a/ompi/mca/fs/pvfs2/fs_pvfs2.h +++ b/ompi/mca/fs/pvfs2/fs_pvfs2.h @@ -12,6 +12,7 @@ * Copyright (c) 2008-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -70,13 +71,13 @@ OMPI_MODULE_DECLSPEC extern mca_fs_base_component_2_0_0_t mca_fs_pvfs2_component int mca_fs_pvfs2_file_open (struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh); int mca_fs_pvfs2_file_close (mca_io_ompio_file_t *fh); int mca_fs_pvfs2_file_delete (char *filename, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_fs_pvfs2_file_set_size (mca_io_ompio_file_t *fh, OMPI_MPI_OFFSET_TYPE size); diff --git a/ompi/mca/fs/pvfs2/fs_pvfs2_file_delete.c b/ompi/mca/fs/pvfs2/fs_pvfs2_file_delete.c index d69007fe6a1..1033e895b71 100644 --- a/ompi/mca/fs/pvfs2/fs_pvfs2_file_delete.c +++ b/ompi/mca/fs/pvfs2/fs_pvfs2_file_delete.c @@ -10,6 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2011 University of Houston. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -38,7 +39,7 @@ */ int mca_fs_pvfs2_file_delete (char* file_name, - struct ompi_info_t *info) + struct opal_info_t *info) { PVFS_credentials credentials; PVFS_sysresp_getparent resp_getparent; diff --git a/ompi/mca/fs/pvfs2/fs_pvfs2_file_open.c b/ompi/mca/fs/pvfs2/fs_pvfs2_file_open.c index 6017cda1481..754cd815bd5 100644 --- a/ompi/mca/fs/pvfs2/fs_pvfs2_file_open.c +++ b/ompi/mca/fs/pvfs2/fs_pvfs2_file_open.c @@ -10,8 +10,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -60,7 +61,7 @@ int mca_fs_pvfs2_file_open (struct ompi_communicator_t *comm, const char* filename, int access_mode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh) { int ret; @@ -72,7 +73,7 @@ mca_fs_pvfs2_file_open (struct ompi_communicator_t *comm, struct ompi_datatype_t *open_status_type; struct ompi_datatype_t *types[2] = {&ompi_mpi_int.dt, &ompi_mpi_byte.dt}; int lens[2] = {1, sizeof(PVFS_object_ref)}; - OPAL_PTRDIFF_TYPE offsets[2]; + ptrdiff_t offsets[2]; char char_stripe[MPI_MAX_INFO_KEY]; int flag; int fs_pvfs2_stripe_size = -1; @@ -108,12 +109,12 @@ mca_fs_pvfs2_file_open (struct ompi_communicator_t *comm, update mca_fs_pvfs2_stripe_width and mca_fs_pvfs2_stripe_size before calling fake_an_open() */ - ompi_info_get (info, "stripe_size", MPI_MAX_INFO_VAL, char_stripe, &flag); + opal_info_get (info, "stripe_size", MPI_MAX_INFO_VAL, char_stripe, &flag); if ( flag ) { sscanf ( char_stripe, "%d", &fs_pvfs2_stripe_size ); } - ompi_info_get (info, "stripe_width", MPI_MAX_INFO_VAL, char_stripe, &flag); + opal_info_get (info, "stripe_width", MPI_MAX_INFO_VAL, char_stripe, &flag); if ( flag ) { sscanf ( char_stripe, "%d", &fs_pvfs2_stripe_width ); } diff --git a/ompi/mca/fs/ufs/Makefile.am b/ompi/mca/fs/ufs/Makefile.am index a66f1d8993f..d9deca4c18b 100644 --- a/ompi/mca/fs/ufs/Makefile.am +++ b/ompi/mca/fs/ufs/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008-2011 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -33,6 +34,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_fs_ufs_la_SOURCES = $(sources) mca_fs_ufs_la_LDFLAGS = -module -avoid-version +mca_fs_ufs_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_fs_ufs_la_SOURCES = $(sources) diff --git a/ompi/mca/fs/ufs/configure.m4 b/ompi/mca/fs/ufs/configure.m4 new file mode 100644 index 00000000000..dcc617b55d3 --- /dev/null +++ b/ompi/mca/fs/ufs/configure.m4 @@ -0,0 +1,29 @@ +# -*- shell-script -*- +# +# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana +# University Research and Technology +# Corporation. All rights reserved. +# Copyright (c) 2004-2005 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. +# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, +# University of Stuttgart. All rights reserved. +# Copyright (c) 2004-2012 The Regents of the University of California. +# All rights reserved. +# Copyright (c) 2010-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2008-2018 University of Houston. All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# MCA_ompi_fs_ufs_CONFIG(action-if-can-compile, +# [action-if-cant-compile]) +# ------------------------------------------------ +AC_DEFUN([MCA_ompi_fs_ufs_CONFIG],[ + AC_CONFIG_FILES([ompi/mca/fs/ufs/Makefile]) + + OPAL_SUMMARY_ADD([[OMPIO File Systems]],[[Generic Unix FS]],[$1],[yes]) +])dnl diff --git a/ompi/mca/fs/ufs/fs_ufs.h b/ompi/mca/fs/ufs/fs_ufs.h index 66ec4c6ce24..08fb426e4e7 100644 --- a/ompi/mca/fs/ufs/fs_ufs.h +++ b/ompi/mca/fs/ufs/fs_ufs.h @@ -12,6 +12,7 @@ * Copyright (c) 2008-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved.
+ * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -28,6 +29,12 @@ #include "ompi/mca/common/ompio/common_ompio.h" extern int mca_fs_ufs_priority; +extern int mca_fs_ufs_lock_algorithm; + +#define FS_UFS_LOCK_AUTO 0 +#define FS_UFS_LOCK_NEVER 1 +#define FS_UFS_LOCK_ENTIRE_FILE 2 +#define FS_UFS_LOCK_RANGES 3 BEGIN_C_DECLS @@ -50,13 +57,13 @@ OMPI_MODULE_DECLSPEC extern mca_fs_base_component_2_0_0_t mca_fs_ufs_component; int mca_fs_ufs_file_open (struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh); int mca_fs_ufs_file_close (mca_io_ompio_file_t *fh); int mca_fs_ufs_file_delete (char *filename, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_fs_ufs_file_set_size (mca_io_ompio_file_t *fh, OMPI_MPI_OFFSET_TYPE size); diff --git a/ompi/mca/fs/ufs/fs_ufs_component.c b/ompi/mca/fs/ufs/fs_ufs_component.c index d5f3c157daf..7ecaf9e0fd3 100644 --- a/ompi/mca/fs/ufs/fs_ufs_component.c +++ b/ompi/mca/fs/ufs/fs_ufs_component.c @@ -31,6 +31,12 @@ #include "mpi.h" int mca_fs_ufs_priority = 10; +int mca_fs_ufs_lock_algorithm=0; /* auto */ +/* + * Private functions + */ +static int register_component(void); + /* * Public string showing the fs ufs component version number @@ -54,6 +60,7 @@ mca_fs_base_component_2_0_0_t mca_fs_ufs_component = { .mca_component_name = "ufs", MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION), + .mca_register_component_params = register_component, }, .fsm_data = { /* This component is checkpointable */ @@ -63,3 +70,26 @@ mca_fs_base_component_2_0_0_t mca_fs_ufs_component = { .fsm_file_query = mca_fs_ufs_component_file_query, /* get priority and actions */ .fsm_file_unquery = mca_fs_ufs_component_file_unquery, /* undo what was done by previous function */ }; + +static int register_component(void) +{ + mca_fs_ufs_priority = 10; + 
(void) mca_base_component_var_register(&mca_fs_ufs_component.fsm_version, + "priority", "Priority of the fs ufs component", + MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, + OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_fs_ufs_priority); + + mca_fs_ufs_lock_algorithm = 0; + (void) mca_base_component_var_register(&mca_fs_ufs_component.fsm_version, + "lock_algorithm", "Locking algorithm used by the fs ufs component. " + " 0: auto (default), 1: skip locking, 2: always lock entire file, " + "3: lock only specific ranges", + MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, + OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_fs_ufs_lock_algorithm ); + + return OMPI_SUCCESS; +} diff --git a/ompi/mca/fs/ufs/fs_ufs_file_delete.c b/ompi/mca/fs/ufs/fs_ufs_file_delete.c index c585ee18da0..d6be6c32246 100644 --- a/ompi/mca/fs/ufs/fs_ufs_file_delete.c +++ b/ompi/mca/fs/ufs/fs_ufs_file_delete.c @@ -10,6 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2011 University of Houston. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -35,7 +36,7 @@ */ int mca_fs_ufs_file_delete (char* file_name, - struct ompi_info_t *info) + struct opal_info_t *info) { int ret; diff --git a/ompi/mca/fs/ufs/fs_ufs_file_open.c b/ompi/mca/fs/ufs/fs_ufs_file_open.c index fe8d722b8ab..ada11edeb20 100644 --- a/ompi/mca/fs/ufs/fs_ufs_file_open.c +++ b/ompi/mca/fs/ufs/fs_ufs_file_open.c @@ -9,9 +9,10 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2014 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2008-2018 University of Houston. All rights reserved. + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). 
All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -27,9 +28,11 @@ #include #include "mpi.h" #include "ompi/constants.h" +#include "ompi/mca/fs/base/base.h" #include "ompi/mca/fs/fs.h" #include "ompi/communicator/communicator.h" #include "ompi/info/info.h" +#include "opal/util/path.h" /* * file_open_ufs @@ -42,12 +45,12 @@ int mca_fs_ufs_file_open (struct ompi_communicator_t *comm, const char* filename, int access_mode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh) { int amode; int old_mask, perm; - int rank, ret; + int rank, ret=OMPI_SUCCESS; rank = ompi_comm_rank ( comm ); @@ -69,6 +72,8 @@ mca_fs_ufs_file_open (struct ompi_communicator_t *comm, if (access_mode & MPI_MODE_RDWR) amode = amode | O_RDWR; + /* Reset errno */ + errno = 0; if ( 0 == rank ) { /* MODE_CREATE and MODE_EXCL can only be set by one process */ if ( !(fh->f_flags & OMPIO_SHAREDFP_IS_SET)) { @@ -78,23 +83,133 @@ mca_fs_ufs_file_open (struct ompi_communicator_t *comm, amode = amode | O_EXCL; } fh->fd = open (filename, amode, perm); - ret = fh->fd; + if ( 0 > fh->fd ) { + if ( EACCES == errno ) { + ret = MPI_ERR_ACCESS; + } + else if ( ENAMETOOLONG == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( ENOENT == errno ) { + ret = MPI_ERR_NO_SUCH_FILE; + } + else if ( EISDIR == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( EROFS == errno ) { + ret = MPI_ERR_READ_ONLY; + } + else if ( EEXIST == errno ) { + ret = MPI_ERR_FILE_EXISTS; + } + else if ( ENOSPC == errno ) { + ret = MPI_ERR_NO_SPACE; + } + else if ( EDQUOT == errno ) { + ret = MPI_ERR_QUOTA; + } + else if ( ETXTBSY == errno ) { + ret = MPI_ERR_FILE_IN_USE; + } + else { + ret = MPI_ERR_OTHER; + } + } } comm->c_coll->coll_bcast ( &ret, 1, MPI_INT, 0, comm, comm->c_coll->coll_bcast_module); - if ( -1 == ret ) { - fh->fd = ret; - return OMPI_ERROR; + if ( OMPI_SUCCESS != ret ) { + fh->fd = -1; + return ret; } + if 
( 0 != rank ) { fh->fd = open (filename, amode, perm); - if (-1 == fh->fd) { - return OMPI_ERROR; + if ( 0 > fh->fd) { + if ( EACCES == errno ) { + ret = MPI_ERR_ACCESS; + } + else if ( ENAMETOOLONG == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( ENOENT == errno ) { + ret = MPI_ERR_NO_SUCH_FILE; + } + else if ( EISDIR == errno ) { + ret = MPI_ERR_BAD_FILE; + } + else if ( EROFS == errno ) { + ret = MPI_ERR_READ_ONLY; + } + else if ( EEXIST == errno ) { + ret = MPI_ERR_FILE_EXISTS; + } + else if ( ENOSPC == errno ) { + ret = MPI_ERR_NO_SPACE; + } + else if ( EDQUOT == errno ) { + ret = MPI_ERR_QUOTA; + } + else if ( ETXTBSY == errno ) { + ret = MPI_ERR_FILE_IN_USE; + } + else { + ret = MPI_ERR_OTHER; + } } } fh->f_stripe_size=0; fh->f_stripe_count=1; + /* Need to check for NFS here. If the file system is not NFS but a regular UFS file system, + we do not need to enforce locking. A regular XFS or EXT4 file system can only be used + within a single-node, local environment, and in this case the OS will already ensure correct + handling of file system blocks. + */ + + if ( FS_UFS_LOCK_AUTO == mca_fs_ufs_lock_algorithm ) { + char *fstype=NULL; + bool bret = opal_path_nfs ( (char *)filename, &fstype ); + + if ( false == bret ) { + char *dir; + mca_fs_base_get_parent_dir ( (char *)filename, &dir ); + bret = opal_path_nfs (dir, &fstype); + free(dir); + } + + if ( true == bret ) { + if ( 0 == strncasecmp(fstype, "nfs", sizeof("nfs")) ) { + /* Based on my tests, only locking the entire file for all operations + guaranteed that the entire test suite passes correctly. I would not + be surprised if, depending on the NFS configuration, that might not always + be necessary; the user can change that with an MCA parameter of this + component.
*/ + fh->f_flags |= OMPIO_LOCK_ENTIRE_FILE; + } + else { + fh->f_flags |= OMPIO_LOCK_NEVER; + } + } + else { + fh->f_flags |= OMPIO_LOCK_NEVER; + } + free (fstype); + } + else if ( FS_UFS_LOCK_NEVER == mca_fs_ufs_lock_algorithm ) { + fh->f_flags |= OMPIO_LOCK_NEVER; + } + else if ( FS_UFS_LOCK_ENTIRE_FILE == mca_fs_ufs_lock_algorithm ) { + fh->f_flags |= OMPIO_LOCK_ENTIRE_FILE; + } + else if ( FS_UFS_LOCK_RANGES == mca_fs_ufs_lock_algorithm ) { + /* Nothing to be done. This is what the posix fbtl component would do + anyway without additional information. */ + } + else { + opal_output ( 1, "Invalid value for mca_fs_ufs_lock_algorithm %d", mca_fs_ufs_lock_algorithm ); + } + return OMPI_SUCCESS; } diff --git a/ompi/mca/io/base/base.h b/ompi/mca/io/base/base.h index 19e96b56933..e81f8e94c90 100644 --- a/ompi/mca/io/base/base.h +++ b/ompi/mca/io/base/base.h @@ -12,6 +12,7 @@ * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -29,6 +30,7 @@ #include "opal/class/opal_list.h" #include "opal/util/argv.h" #include "opal/util/output.h" +#include "opal/util/info.h" #include "ompi/mca/mca.h" #include "opal/mca/base/base.h" #include "ompi/mca/io/io.h" @@ -52,19 +54,19 @@ typedef struct avail_io_t avail_io_t; * Local functions */ static opal_list_t *check_components(opal_list_t *components, - const char *filename, struct ompi_info_t *info, + const char *filename, struct opal_info_t *info, char **names, int num_names); static avail_io_t *check_one_component(const mca_base_component_t *component, - const char *filename, struct ompi_info_t *info); + const char *filename, struct opal_info_t *info); static avail_io_t *query(const mca_base_component_t *component, - const char *filename, struct ompi_info_t *info); + const char *filename, struct opal_info_t *info); static avail_io_t *query_2_0_0(const mca_io_base_component_2_0_0_t *io_component, - const char *filename, struct ompi_info_t *info); + const char *filename, struct opal_info_t *info); -static void unquery(avail_io_t *avail, const char *filename, struct ompi_info_t *info); +static void unquery(avail_io_t *avail, const char *filename, struct opal_info_t *info); -static int delete_file(avail_io_t *avail, const char *filename, struct ompi_info_t *info); +static int delete_file(avail_io_t *avail, const char *filename, struct opal_info_t *info); /* @@ -75,7 +77,7 @@ static OBJ_CLASS_INSTANCE(avail_io_t, opal_list_item_t, NULL, NULL); /* */ -int mca_io_base_delete(const char *filename, struct ompi_info_t *info) +int mca_io_base_delete(const char *filename, struct opal_info_t *info) { int err; opal_list_t *selectable; @@ -180,7 +182,7 @@ static int avail_io_compare (opal_list_item_t **itema, * priority order. 
*/ static opal_list_t *check_components(opal_list_t *components, - const char *filename, struct ompi_info_t *info, + const char *filename, struct opal_info_t *info, char **names, int num_names) { int i; @@ -249,7 +251,7 @@ static opal_list_t *check_components(opal_list_t *components, * Check a single component */ static avail_io_t *check_one_component(const mca_base_component_t *component, - const char *filename, struct ompi_info_t *info) + const char *filename, struct opal_info_t *info) { avail_io_t *avail; @@ -282,7 +284,7 @@ static avail_io_t *check_one_component(const mca_base_component_t *component, * module struct */ static avail_io_t *query(const mca_base_component_t *component, - const char *filename, struct ompi_info_t *info) + const char *filename, struct opal_info_t *info) { const mca_io_base_component_2_0_0_t *ioc_200; @@ -303,7 +305,7 @@ static avail_io_t *query(const mca_base_component_t *component, static avail_io_t *query_2_0_0(const mca_io_base_component_2_0_0_t *component, - const char *filename, struct ompi_info_t *info) + const char *filename, struct opal_info_t *info) { bool usable; int priority, ret; @@ -333,7 +335,7 @@ static avail_io_t *query_2_0_0(const mca_io_base_component_2_0_0_t *component, * Unquery functions **************************************************************************/ -static void unquery(avail_io_t *avail, const char *filename, struct ompi_info_t *info) +static void unquery(avail_io_t *avail, const char *filename, struct opal_info_t *info) { const mca_io_base_component_2_0_0_t *ioc_200; @@ -358,7 +360,7 @@ static void unquery(avail_io_t *avail, const char *filename, struct ompi_info_t /* * Invoke the component's delete function */ -static int delete_file(avail_io_t *avail, const char *filename, struct ompi_info_t *info) +static int delete_file(avail_io_t *avail, const char *filename, struct opal_info_t *info) { const mca_io_base_component_2_0_0_t *ioc_200; diff --git a/ompi/mca/io/base/io_base_file_select.c 
b/ompi/mca/io/base/io_base_file_select.c index fd91033244c..2a30f097437 100644 --- a/ompi/mca/io/base/io_base_file_select.c +++ b/ompi/mca/io/base/io_base_file_select.c @@ -13,6 +13,7 @@ * Copyright (c) 2008-2011 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -30,6 +31,7 @@ #include "ompi/file/file.h" #include "opal/util/argv.h" #include "opal/util/output.h" +#include "opal/util/info.h" #include "opal/class/opal_list.h" #include "opal/class/opal_object.h" #include "ompi/mca/mca.h" @@ -459,7 +461,7 @@ static int module_init(ompi_file_t *file) case MCA_IO_BASE_V_2_0_0: iom_200 = &(file->f_io_selected_module.v2_0_0); return iom_200->io_module_file_open(file->f_comm, file->f_filename, - file->f_amode, file->f_info, + file->f_amode, file->super.s_info, file); break; diff --git a/ompi/mca/io/configure.m4 b/ompi/mca/io/configure.m4 deleted file mode 100644 index 8643772a150..00000000000 --- a/ompi/mca/io/configure.m4 +++ /dev/null @@ -1,40 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (c) 2006-2007 Los Alamos National Security, LLC. -# All rights reserved. -# Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2016 Research Organization for Information Science -# and Technology (RIST). All rights reserved. 
-# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# MCA_ompi_io_CONFIG(project_name, framework_name) -# ------------------------------------------- -AC_DEFUN([MCA_ompi_io_CONFIG], -[ - OPAL_VAR_SCOPE_PUSH([define_mpi_io]) - - AS_IF([test "$enable_mpi_io" != "no"], - [OMPI_MPIF_IO_CONSTANTS_INCLUDE="include \"mpif-io-constants.h\"" - OMPI_MPIF_IO_HANDLES_INCLUDE="include \"mpif-io-handles.h\"" - define_mpi_io=1], - [OMPI_MPIF_IO_CONSTANTS_INCLUDE= - OMPI_MPIF_IO_HANDLES_INCLUDE= - define_mpi_io=0]) - AC_SUBST(OMPI_MPIF_IO_CONSTANTS_INCLUDE) - AC_SUBST(OMPI_MPIF_IO_HANDLES_INCLUDE) - - MCA_CONFIGURE_FRAMEWORK([$1], [$2], [$define_mpi_io]) - - OMPI_PROVIDE_MPI_FILE_INTERFACE=$define_mpi_io - AC_SUBST(OMPI_PROVIDE_MPI_FILE_INTERFACE) - AC_DEFINE_UNQUOTED([OMPI_PROVIDE_MPI_FILE_INTERFACE], [$define_mpi_io], - [Whether OMPI should provide MPI File interface]) - AM_CONDITIONAL([OMPI_PROVIDE_MPI_FILE_INTERFACE], [test "$enable_mpi_io" != "no"]) - - OPAL_VAR_SCOPE_POP -]) diff --git a/ompi/mca/io/io.h b/ompi/mca/io/io.h index 5caa7b6079a..54eac054ecf 100644 --- a/ompi/mca/io/io.h +++ b/ompi/mca/io/io.h @@ -16,6 +16,7 @@ * Copyright (c) 2015 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -29,6 +30,7 @@ #include "mpi.h" #include "ompi/mca/mca.h" #include "ompi/request/request.h" +#include "ompi/info/info.h" /* * Forward declaration for private data on io components and modules. 
@@ -89,14 +91,14 @@ typedef int (*mca_io_base_component_file_unquery_fn_t) (struct ompi_file_t *file, struct mca_io_base_file_t *private_data); typedef int (*mca_io_base_component_file_delete_query_fn_t) - (const char *filename, struct ompi_info_t *info, + (const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t **private_data, bool *usable, int *priority); typedef int (*mca_io_base_component_file_delete_select_fn_t) - (const char *filename, struct ompi_info_t *info, + (const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t *private_data); typedef int (*mca_io_base_component_file_delete_unselect_fn_t) - (const char *filename, struct ompi_info_t *info, + (const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t *private_data); typedef int (*mca_io_base_component_register_datarep_fn_t)( @@ -140,7 +142,7 @@ typedef union mca_io_base_components_t mca_io_base_components_t; typedef int (*mca_io_base_module_file_open_fn_t) (struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, struct ompi_file_t *fh); + struct opal_info_t *info, struct ompi_file_t *fh); typedef int (*mca_io_base_module_file_close_fn_t)(struct ompi_file_t *fh); typedef int (*mca_io_base_module_file_set_size_fn_t) @@ -151,15 +153,11 @@ typedef int (*mca_io_base_module_file_get_size_fn_t) (struct ompi_file_t *fh, MPI_Offset *size); typedef int (*mca_io_base_module_file_get_amode_fn_t) (struct ompi_file_t *fh, int *amode); -typedef int (*mca_io_base_module_file_set_info_fn_t) - (struct ompi_file_t *fh, struct ompi_info_t *info); -typedef int (*mca_io_base_module_file_get_info_fn_t) - (struct ompi_file_t *fh, struct ompi_info_t **info_used); typedef int (*mca_io_base_module_file_set_view_fn_t) (struct ompi_file_t *fh, MPI_Offset disp, struct ompi_datatype_t *etype, struct ompi_datatype_t *filetype, const char *datarep, - struct ompi_info_t *info); + struct opal_info_t *info); typedef int 
(*mca_io_base_module_file_get_view_fn_t) (struct ompi_file_t *fh, MPI_Offset *disp, struct ompi_datatype_t **etype, struct ompi_datatype_t **filetype, @@ -309,8 +307,6 @@ struct mca_io_base_module_2_0_0_t { mca_io_base_module_file_preallocate_fn_t io_module_file_preallocate; mca_io_base_module_file_get_size_fn_t io_module_file_get_size; mca_io_base_module_file_get_amode_fn_t io_module_file_get_amode; - mca_io_base_module_file_set_info_fn_t io_module_file_set_info; - mca_io_base_module_file_get_info_fn_t io_module_file_get_info; mca_io_base_module_file_set_view_fn_t io_module_file_set_view; mca_io_base_module_file_get_view_fn_t io_module_file_get_view; diff --git a/ompi/mca/io/ompio/Makefile.am b/ompi/mca/io/ompio/Makefile.am index 851b96b4c32..8d25ca79a17 100644 --- a/ompi/mca/io/ompio/Makefile.am +++ b/ompi/mca/io/ompio/Makefile.am @@ -10,7 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008-2012 University of Houston. All rights reserved. -# Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2016-2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -34,7 +34,8 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_io_ompio_la_SOURCES = $(headers) $(sources) mca_io_ompio_la_LDFLAGS = -module -avoid-version -mca_io_ompio_la_LIBADD = $(io_ompio_LIBS) \ +mca_io_ompio_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(io_ompio_LIBS) \ $(OMPI_TOP_BUILDDIR)/ompi/mca/common/ompio/libmca_common_ompio.la noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/io/ompio/io_ompio.c b/ompi/mca/io/ompio/io_ompio.c index b07d8ad2dd5..05188ad764e 100644 --- a/ompi/mca/io/ompio/io_ompio.c +++ b/ompi/mca/io/ompio/io_ompio.c @@ -13,7 +13,7 @@ * Copyright (c) 2008-2016 University of Houston. All rights reserved. * Copyright (c) 2011-2015 Cisco Systems, Inc. All rights reserved. 
* Copyright (c) 2012-2013 Inria. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -67,7 +67,7 @@ int ompi_io_ompio_generate_current_file_view (struct mca_io_ompio_file_t *fh, k = 0; while (bytes_to_write) { - OPAL_PTRDIFF_TYPE disp; + ptrdiff_t disp; /* reallocate if needed */ if (OMPIO_IOVEC_INITIAL_SIZE*block <= k) { block ++; @@ -93,7 +93,7 @@ int ompi_io_ompio_generate_current_file_view (struct mca_io_ompio_file_t *fh, } } - disp = (OPAL_PTRDIFF_TYPE)(fh->f_decoded_iov[j].iov_base) + + disp = (ptrdiff_t)(fh->f_decoded_iov[j].iov_base) + (fh->f_total_bytes - sum_previous_counts); iov[k].iov_base = (IOVBASE_TYPE *)(intptr_t)(disp + fh->f_offset); @@ -125,7 +125,7 @@ int ompi_io_ompio_generate_current_file_view (struct mca_io_ompio_file_t *fh, int *row_index=NULL, i=0, l=0, m=0; int column_index=0, r_index=0; int blocklen[3] = {1, 1, 1}; - OPAL_PTRDIFF_TYPE d[3], base; + ptrdiff_t d[3], base; ompi_datatype_t *types[3]; ompi_datatype_t *io_array_type=MPI_DATATYPE_NULL; int **adj_matrix=NULL; @@ -172,9 +172,9 @@ int ompi_io_ompio_generate_current_file_view (struct mca_io_ompio_file_t *fh, types[1] = &ompi_mpi_long.dt; types[2] = &ompi_mpi_int.dt; - d[0] = (OPAL_PTRDIFF_TYPE)&per_process[0]; - d[1] = (OPAL_PTRDIFF_TYPE)&per_process[0].length; - d[2] = (OPAL_PTRDIFF_TYPE)&per_process[0].process_id; + d[0] = (ptrdiff_t)&per_process[0]; + d[1] = (ptrdiff_t)&per_process[0].length; + d[2] = (ptrdiff_t)&per_process[0].process_id; base = d[0]; for (i=0;i<3;i++){ d[i] -= base; @@ -640,18 +640,38 @@ int ompi_io_ompio_sort_offlen (mca_io_ompio_offlen_array_t *io_array, } -void mca_io_ompio_get_num_aggregators ( int *num_aggregators) +int mca_io_ompio_get_mca_parameter_value ( char *mca_parameter_name, int name_length ) { - *num_aggregators = mca_io_ompio_num_aggregators; - return; -} + if ( !strncmp 
( mca_parameter_name, "num_aggregators", name_length )) { + return mca_io_ompio_num_aggregators; + } + else if ( !strncmp ( mca_parameter_name, "bytes_per_agg", name_length )) { + return mca_io_ompio_bytes_per_agg; + } + else if ( !strncmp ( mca_parameter_name, "overwrite_amode", name_length )) { + return mca_io_ompio_overwrite_amode; + } + else if ( !strncmp ( mca_parameter_name, "cycle_buffer_size", name_length )) { + return mca_io_ompio_cycle_buffer_size; + } + else if ( !strncmp ( mca_parameter_name, "max_aggregators_ratio", name_length )) { + return mca_io_ompio_max_aggregators_ratio; + } + else if ( !strncmp ( mca_parameter_name, "aggregators_cutoff_threshold", name_length )) { + return mca_io_ompio_aggregators_cutoff_threshold; + } + else { + opal_output (1, "Error in mca_io_ompio_get_mca_parameter_value: unknown parameter name"); + } -void mca_io_ompio_get_bytes_per_agg ( int *bytes_per_agg) -{ - *bytes_per_agg = mca_io_ompio_bytes_per_agg; - return; + /* OMPI_ERR_MAX is used here instead of OMPI_ERROR, since -1 (which is OMPI_ERROR) + ** is a valid value for some mca parameters, indicating that the user did not set + ** that parameter value. + */ + return OMPI_ERR_MAX; } + diff --git a/ompi/mca/io/ompio/io_ompio.h b/ompi/mca/io/ompio/io_ompio.h index 078e66c6763..8c26b59f744 100644 --- a/ompi/mca/io/ompio/io_ompio.h +++ b/ompi/mca/io/ompio/io_ompio.h @@ -11,8 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -48,6 +49,10 @@ extern int mca_io_ompio_num_aggregators; extern int mca_io_ompio_record_offset_info; extern int mca_io_ompio_sharedfp_lazy_open; extern int mca_io_ompio_grouping_option; +extern int mca_io_ompio_max_aggregators_ratio; +extern int mca_io_ompio_aggregators_cutoff_threshold; +extern int mca_io_ompio_overwrite_amode; + OMPI_DECLSPEC extern int mca_io_ompio_coll_timing_info; /* @@ -60,10 +65,14 @@ OMPI_DECLSPEC extern int mca_io_ompio_coll_timing_info; #define OMPIO_CONTIGUOUS_FVIEW 0x00000010 #define OMPIO_AGGREGATOR_IS_SET 0x00000020 #define OMPIO_SHAREDFP_IS_SET 0x00000040 +#define OMPIO_LOCK_ENTIRE_FILE 0x00000080 +#define OMPIO_LOCK_NEVER 0x00000100 +#define OMPIO_LOCK_NOT_THIS_OP 0x00000200 + #define QUEUESIZE 2048 #define MCA_IO_DEFAULT_FILE_VIEW_SIZE 4*1024*1024 -#define OMPIO_FCOLL_WANT_TIME_BREAKDOWN 1 +#define OMPIO_FCOLL_WANT_TIME_BREAKDOWN 0 #define OMPIO_MIN(a, b) (((a) < (b)) ? (a) : (b)) #define OMPIO_MAX(a, b) (((a) < (b)) ? 
(b) : (a)) @@ -107,7 +116,7 @@ OMPI_DECLSPEC extern int mca_io_ompio_coll_timing_info; #define OPTIMIZE_GROUPING 4 #define SIMPLE 5 #define NO_REFINEMENT 6 - +#define SIMPLE_PLUS 7 #define OMPIO_UNIFORM_DIST_THRESHOLD 0.5 #define OMPIO_CONTG_THRESHOLD 1048576 @@ -117,6 +126,10 @@ OMPI_DECLSPEC extern int mca_io_ompio_coll_timing_info; #define OMPIO_PROCS_IN_GROUP_TAG 1 #define OMPIO_MERGE_THRESHOLD 0.5 + +#define OMPIO_LOCK_ENTIRE_REGION 10 +#define OMPIO_LOCK_SELECTIVE 11 + /*---------------------------*/ BEGIN_C_DECLS @@ -183,8 +196,7 @@ typedef int (*mca_io_ompio_generate_current_file_view_fn_t) (struct mca_io_ompio /* functions to retrieve the number of aggregators and the size of the temporary buffer on aggregators from the fcoll modules */ -typedef void (*mca_io_ompio_get_num_aggregators_fn_t) ( int *num_aggregators); -typedef void (*mca_io_ompio_get_bytes_per_agg_fn_t) ( int *bytes_per_agg); +typedef int (*mca_io_ompio_get_mca_parameter_value_fn_t) ( char *mca_parameter_name, int name_length ); typedef int (*mca_io_ompio_set_aggregator_props_fn_t) (struct mca_io_ompio_file_t *fh, int num_aggregators, size_t bytes_per_proc); @@ -209,9 +221,10 @@ struct mca_io_ompio_file_t { const char *f_filename; char *f_datarep; opal_convertor_t *f_convertor; - ompi_info_t *f_info; + opal_info_t *f_info; int32_t f_flags; void *f_fs_ptr; + int f_fs_block_size; int f_atomicity; size_t f_stripe_size; int f_stripe_count; @@ -237,7 +250,7 @@ struct mca_io_ompio_file_t { size_t f_position_in_file_view; /* in bytes */ size_t f_total_bytes; /* total bytes read/written within 1 Fview*/ int f_index_in_file_view; - OPAL_PTRDIFF_TYPE f_view_extent; + ptrdiff_t f_view_extent; size_t f_view_size; ompi_datatype_t *f_etype; ompi_datatype_t *f_filetype; @@ -277,8 +290,7 @@ struct mca_io_ompio_file_t { mca_io_ompio_decode_datatype_fn_t f_decode_datatype; mca_io_ompio_generate_current_file_view_fn_t f_generate_current_file_view; - mca_io_ompio_get_num_aggregators_fn_t f_get_num_aggregators; - 
mca_io_ompio_get_bytes_per_agg_fn_t f_get_bytes_per_agg; + mca_io_ompio_get_mca_parameter_value_fn_t f_get_mca_parameter_value; mca_io_ompio_set_aggregator_props_fn_t f_set_aggregator_props; }; typedef struct mca_io_ompio_file_t mca_io_ompio_file_t; @@ -294,8 +306,7 @@ typedef struct mca_io_ompio_data_t mca_io_ompio_data_t; /* functions to retrieve the number of aggregators and the size of the temporary buffer on aggregators from the fcoll modules */ -OMPI_DECLSPEC void mca_io_ompio_get_num_aggregators ( int *num_aggregators); -OMPI_DECLSPEC void mca_io_ompio_get_bytes_per_agg ( int *bytes_per_agg); +OMPI_DECLSPEC int mca_io_ompio_get_mca_parameter_value ( char *mca_parameter_name, int name_length); /* * Function that takes in a datatype and buffer, and decodes that datatype @@ -340,7 +351,7 @@ int mca_io_ompio_file_set_view (struct ompi_file_t *fh, struct ompi_datatype_t *etype, struct ompi_datatype_t *filetype, const char *datarep, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_io_ompio_file_get_view (struct ompi_file_t *fh, OMPI_MPI_OFFSET_TYPE *disp, @@ -350,11 +361,11 @@ int mca_io_ompio_file_get_view (struct ompi_file_t *fh, int mca_io_ompio_file_open (struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, struct ompi_file_t *fh); int mca_io_ompio_file_close (struct ompi_file_t *fh); int mca_io_ompio_file_delete (const char *filename, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_io_ompio_file_set_size (struct ompi_file_t *fh, OMPI_MPI_OFFSET_TYPE size); int mca_io_ompio_file_preallocate (struct ompi_file_t *fh, @@ -363,10 +374,6 @@ int mca_io_ompio_file_get_size (struct ompi_file_t *fh, OMPI_MPI_OFFSET_TYPE * size); int mca_io_ompio_file_get_amode (struct ompi_file_t *fh, int *amode); -int mca_io_ompio_file_set_info (struct ompi_file_t *fh, - struct ompi_info_t *info); -int mca_io_ompio_file_get_info (struct ompi_file_t *fh, - struct ompi_info_t ** 
info_used); int mca_io_ompio_file_sync (struct ompi_file_t *fh); int mca_io_ompio_file_seek (struct ompi_file_t *fh, OMPI_MPI_OFFSET_TYPE offet, @@ -377,7 +384,7 @@ int mca_io_ompio_file_set_view (struct ompi_file_t *fh, struct ompi_datatype_t *etype, struct ompi_datatype_t *filetype, const char *datarep, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_io_ompio_file_get_view (struct ompi_file_t *fh, OMPI_MPI_OFFSET_TYPE *disp, struct ompi_datatype_t **etype, diff --git a/ompi/mca/io/ompio/io_ompio_aggregators.c b/ompi/mca/io/ompio/io_ompio_aggregators.c index bc825349e88..d2e59a61dd5 100644 --- a/ompi/mca/io/ompio/io_ompio_aggregators.c +++ b/ompi/mca/io/ompio/io_ompio_aggregators.c @@ -10,11 +10,12 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * Copyright (c) 2011-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2017 University of Houston. All rights reserved. + * Copyright (c) 2011-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2012-2013 Inria. All rights reserved. * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -46,41 +47,148 @@ ** ** The first group functions determines the number of aggregators based on various characteristics ** -** 1. simple_grouping:aA simple heuristic based on the amount of data written and size of -** of the temporary buffer used by aggregator processes +** 1. simple_grouping: A heuristic based on a cost model ** 2. fview_based_grouping: analysis the fileview to detect regular patterns ** 3. 
cart_based_grouping: uses a cartesian communicator to derive certain (probable) properties ** of the access pattern */ + +static double cost_calc (int P, int P_agg, size_t Data_proc, size_t coll_buffer, int dim ); +#define DIM1 1 +#define DIM2 2 + int mca_io_ompio_simple_grouping(mca_io_ompio_file_t *fh, - int *num_groups, + int *num_groups_out, mca_io_ompio_contg *contg_groups) { - size_t stripe_size = (size_t) fh->f_stripe_size; int group_size = 0; int k=0, p=0, g=0; int total_procs = 0; + int num_groups=1; - if ( 0 < fh->f_stripe_size ) { - stripe_size = OMPIO_DEFAULT_STRIPE_SIZE; - } + double time=0.0, time_prev=0.0, dtime=0.0, dtime_abs=0.0, dtime_diff=0.0, dtime_prev=0.0; + double dtime_threshold=0.0; + + /* This is the threshold for absolute improvement. It is not + ** exposed as an MCA parameter to avoid overwhelming users. It is + ** mostly relevant for smaller process counts and data volumes. + */ + double time_threshold=0.001; + + int incr=1, mode=1; + int P_a, P_a_prev; + + /* The aggregator selection algorithm is based on the formulas described + ** in: Shweta Jha, Edgar Gabriel, 'Performance Models for Communication in + ** Collective I/O operations', Proceedings of the 17th IEEE/ACM Symposium + ** on Cluster, Cloud and Grid Computing, Workshop on Theoretical + ** Approaches to Performance Evaluation, Modeling and Simulation, 2017. + ** + ** The current implementation is based on the 1-D and 2-D models derived for the even + ** file partitioning strategy in the paper. Note that the formulas currently only model + ** the communication aspect of collective I/O operations. There are two extensions in this + ** implementation: + ** + ** 1. Since the resulting formula has an asymptotic behavior w.r.t. the + ** no. of aggregators, this version determines the no. of aggregators to + ** be used iteratively and stops increasing the no.
of aggregators once the + ** benefit of increasing the number of aggregators falls below a certain threshold + ** value relative to the last number tested. How aggressively the growth + ** in the number of aggregators is cut off is controlled by the new mca + ** parameter mca_io_ompio_aggregators_cutoff_threshold. Lower values for + ** this parameter will lead to a higher number of aggregators (useful, e.g., + ** for PVFS2 and GPFS file systems), while higher values will lead to a + ** lower no. of aggregators (useful for regular UNIX or NFS file systems). + ** + ** 2. The algorithm further caps the maximum no. of aggregators so that it does not exceed + ** (no. of processes / mca_io_ompio_max_aggregators_ratio), i.e. a higher value + ** for mca_io_ompio_max_aggregators_ratio will decrease the maximum number of aggregators + ** allowed for the given no. of processes. + */ + dtime_threshold = (double) mca_io_ompio_aggregators_cutoff_threshold / 100.0; + + /* Determine whether to use the formula for 1-D or 2-D data decomposition. Anything + ** that is not 1-D is assumed to be 2-D in this version + */ + mode = ( fh->f_cc_size == fh->f_view_size ) ? 1 : 2; - if ( 0 != fh->f_cc_size && stripe_size > fh->f_cc_size ) { - group_size = (((int)stripe_size/(int)fh->f_cc_size) > fh->f_size ) ? fh->f_size : ((int)stripe_size/(int)fh->f_cc_size); - *num_groups = fh->f_size / group_size; + /* Determine the increment size when searching for the optimal + ** no. of aggregators + */ + if ( fh->f_size < 16 ) { + incr = 2; + } + else if (fh->f_size < 128 ) { + incr = 4; + } + else if ( fh->f_size < 4096 ) { + incr = 16; } - else if ( fh->f_cc_size <= OMPIO_CONTG_FACTOR * stripe_size) { - *num_groups = fh->f_size/OMPIO_CONTG_FACTOR > 0 ?
(fh->f_size/OMPIO_CONTG_FACTOR) : 1 ; - group_size = OMPIO_CONTG_FACTOR; - } else { - *num_groups = fh->f_size; - group_size = 1; + incr = 32; + } + + P_a = 1; + time_prev = cost_calc ( fh->f_size, P_a, fh->f_view_size, (size_t) fh->f_bytes_per_agg, mode ); + P_a_prev = P_a; + for ( P_a = incr; P_a <= fh->f_size; P_a += incr ) { + time = cost_calc ( fh->f_size, P_a, fh->f_view_size, (size_t) fh->f_bytes_per_agg, mode ); + dtime_abs = (time_prev - time); + dtime = dtime_abs / time_prev; + dtime_diff = ( P_a == incr ) ? dtime : (dtime_prev - dtime); +#ifdef OMPIO_DEBUG + if ( 0 == fh->f_rank ){ + printf(" d_p = %ld P_a = %d time = %lf dtime = %lf dtime_abs =%lf dtime_diff=%lf\n", + fh->f_view_size, P_a, time, dtime, dtime_abs, dtime_diff ); + } +#endif + if ( dtime_diff < dtime_threshold ) { + /* The relative improvement compared to the last number + ** of aggregators was below a certain threshold. This is typically + ** the dominating factor for large data volumes and larger process + ** counts + */ +#ifdef OMPIO_DEBUG + if ( 0 == fh->f_rank ) { + printf("dtime_diff below threshold\n"); + } +#endif + break; + } + if ( dtime_abs < time_threshold ) { + /* The absolute improvement compared to the last number + ** of aggregators was below a given threshold. 
This is typically + ** important for small data volumes and smaller process counts + */ +#ifdef OMPIO_DEBUG + if ( 0 == fh->f_rank ) { + printf("dtime_abs below threshold\n"); + } +#endif + break; + } + time_prev = time; + dtime_prev = dtime; + P_a_prev = P_a; + } + num_groups = P_a_prev; +#ifdef OMPIO_DEBUG + printf(" For P=%d d_p=%ld b_c=%d threshold=%f chosen P_a = %d \n", + fh->f_size, fh->f_view_size, fh->f_bytes_per_agg, dtime_threshold, P_a_prev); +#endif + + /* Cap the maximum number of aggregators.*/ + if ( num_groups > (fh->f_size/mca_io_ompio_max_aggregators_ratio)) { + num_groups = (fh->f_size/mca_io_ompio_max_aggregators_ratio); + } + if ( 1 >= num_groups ) { + num_groups = 1; } + group_size = fh->f_size / num_groups; - for ( k=0, p=0; p<*num_groups; p++ ) { - if ( p == (*num_groups - 1) ) { + for ( k=0, p=0; pf_size - total_procs; } else { @@ -92,6 +200,8 @@ int mca_io_ompio_simple_grouping(mca_io_ompio_file_t *fh, k++; } } + + *num_groups_out = num_groups; return OMPI_SUCCESS; } @@ -192,19 +302,26 @@ int mca_io_ompio_fview_based_grouping(mca_io_ompio_file_t *fh, return ret; } -int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh) +int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh, + int *num_groups, + mca_io_ompio_contg *contg_groups) { int k = 0; - int j = 0; - int n = 0; + int g=0; int ret = OMPI_SUCCESS, tmp_rank = 0; - int coords_tmp[2] = { 0 }; + int *coords_tmp = NULL; mca_io_ompio_cart_topo_components cart_topo; memset (&cart_topo, 0, sizeof(mca_io_ompio_cart_topo_components)); ret = ompio_fh->f_comm->c_topo->topo.cart.cartdim_get(ompio_fh->f_comm, &cart_topo.ndims); - if (OMPI_SUCCESS != ret ) { + if (OMPI_SUCCESS != ret ) { + goto exit; + } + + if (cart_topo.ndims < 2 ) { + /* We shouldn't be here, this routine only works for more than 1 dimension */ + ret = MPI_ERR_INTERN; goto exit; } @@ -227,6 +344,13 @@ int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh) goto exit; } + coords_tmp = 
(int*)malloc (cart_topo.ndims * sizeof(int)); + if (NULL == coords_tmp) { + opal_output (1, "OUT OF MEMORY\n"); + ret = OMPI_ERR_OUT_OF_RESOURCE; + goto exit; + } + ret = ompio_fh->f_comm->c_topo->topo.cart.cart_get(ompio_fh->f_comm, cart_topo.ndims, cart_topo.dims, @@ -237,55 +361,50 @@ int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh) goto exit; } - ompio_fh->f_init_procs_per_group = cart_topo.dims[1]; //number of elements per row - ompio_fh->f_init_num_aggrs = cart_topo.dims[0]; //number of rows - - //Make an initial list of potential aggregators - ompio_fh->f_init_aggr_list = (int *) malloc (ompio_fh->f_init_num_aggrs * sizeof(int)); - if (NULL == ompio_fh->f_init_aggr_list) { - opal_output (1, "OUT OF MEMORY\n"); - ret = OMPI_ERR_OUT_OF_RESOURCE; - goto exit; - } + *num_groups = cart_topo.dims[0]; //number of rows for(k = 0; k < cart_topo.dims[0]; k++){ + int done = 0; + int index = cart_topo.ndims-1; + + memset ( coords_tmp, 0, cart_topo.ndims * sizeof(int)); + contg_groups[k].procs_per_contg_group = (ompio_fh->f_size / cart_topo.dims[0]); coords_tmp[0] = k; - coords_tmp[1] = k * cart_topo.dims[1]; + ret = ompio_fh->f_comm->c_topo->topo.cart.cart_rank (ompio_fh->f_comm,coords_tmp,&tmp_rank); if ( OMPI_SUCCESS != ret ) { opal_output (1, "mca_io_ompio_cart_based_grouping: Error in cart_rank\n"); goto exit; } - ompio_fh->f_init_aggr_list[k] = tmp_rank; - } - - //Initial Grouping - ompio_fh->f_init_procs_in_group = (int*)malloc (ompio_fh->f_init_procs_per_group * sizeof(int)); - if (NULL == ompio_fh->f_init_procs_in_group) { - opal_output (1, "OUT OF MEMORY\n"); - free (ompio_fh->f_init_aggr_list ); - ompio_fh->f_init_aggr_list=NULL; - ret = OMPI_ERR_OUT_OF_RESOURCE; - goto exit; - } + contg_groups[k].procs_in_contg_group[0] = tmp_rank; + + for ( g=1; g< contg_groups[k].procs_per_contg_group; g++ ) { + done = 0; + index = cart_topo.ndims-1; + + while ( ! 
done ) { + coords_tmp[index]++; + if ( coords_tmp[index] ==cart_topo.dims[index] ) { + coords_tmp[index]=0; + index--; + } + else { + done = 1; + } + if ( index == 0 ) { + done = 1; + } + } - for (j=0 ; j< ompio_fh->f_size ; j++) { - ompio_fh->f_comm->c_topo->topo.cart.cart_coords (ompio_fh->f_comm, j, cart_topo.ndims, coords_tmp); - if (coords_tmp[0] == cart_topo.coords[0]) { - if ((coords_tmp[1]/ompio_fh->f_init_procs_per_group) == - (cart_topo.coords[1]/ompio_fh->f_init_procs_per_group)) { - ompio_fh->f_init_procs_in_group[n] = j; - n++; - } + ret = ompio_fh->f_comm->c_topo->topo.cart.cart_rank (ompio_fh->f_comm,coords_tmp,&tmp_rank); + if ( OMPI_SUCCESS != ret ) { + opal_output (1, "mca_io_ompio_cart_based_grouping: Error in cart_rank\n"); + goto exit; + } + contg_groups[k].procs_in_contg_group[g] = tmp_rank; } } - /*print original group */ - /*printf("RANK%d Initial distribution \n",ompio_fh->f_rank); - for(k = 0; k < ompio_fh->f_init_procs_per_group; k++){ - printf("%d,", ompio_fh->f_init_procs_in_group[k]); - } - printf("\n");*/ exit: if (NULL != cart_topo.dims) { @@ -300,6 +419,10 @@ int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh) free (cart_topo.coords); cart_topo.coords = NULL; } + if (NULL != coords_tmp) { + free (coords_tmp); + coords_tmp = NULL; + } return ret; } @@ -374,8 +497,9 @@ int mca_io_ompio_set_aggregator_props (struct mca_io_ompio_file_t *fh, fh->f_flags |= OMPIO_AGGREGATOR_IS_SET; if (-1 == num_aggregators) { - if ( SIMPLE == mca_io_ompio_grouping_option || - NO_REFINEMENT == mca_io_ompio_grouping_option ) { + if ( SIMPLE == mca_io_ompio_grouping_option || + NO_REFINEMENT == mca_io_ompio_grouping_option || + SIMPLE_PLUS == mca_io_ompio_grouping_option ) { fh->f_aggregator_index = 0; fh->f_final_num_aggrs = fh->f_init_num_aggrs; fh->f_procs_per_group = fh->f_init_procs_per_group; @@ -399,6 +523,9 @@ int mca_io_ompio_set_aggregator_props (struct mca_io_ompio_file_t *fh, /* Forced number of aggregators ** calculate the 
offset at which each group of processes will start */ + if ( num_aggregators > fh->f_size ) { + num_aggregators = fh->f_size; + } procs_per_group = ceil ((float)fh->f_size/num_aggregators); /* calculate the number of processes in the local group */ @@ -908,7 +1035,7 @@ int mca_io_ompio_merge_groups(mca_io_ompio_file_t *fh, //merge_aggrs[0] is considered the new aggregator //New aggregator collects group sizes of the groups to be merged - ret = fcoll_base_coll_allgather_array (&fh->f_init_procs_per_group, + ret = ompi_fcoll_base_coll_allgather_array (&fh->f_init_procs_per_group, 1, MPI_INT, sizes_old_group, @@ -944,7 +1071,7 @@ int mca_io_ompio_merge_groups(mca_io_ompio_file_t *fh, //New aggregator also collects the grouping distribution //This is the actual merge //use allgatherv array - ret = fcoll_base_coll_allgatherv_array (fh->f_init_procs_in_group, + ret = ompi_fcoll_base_coll_allgatherv_array (fh->f_init_procs_in_group, fh->f_init_procs_per_group, MPI_INT, fh->f_procs_in_group, @@ -1127,7 +1254,7 @@ int mca_io_ompio_prepare_to_group(mca_io_ompio_file_t *fh, } //Gather start offsets across processes in a group on aggregator - ret = fcoll_base_coll_allgather_array (start_offset_len, + ret = ompi_fcoll_base_coll_allgather_array (start_offset_len, 3, OMPI_OFFSET_DATATYPE, start_offsets_lens_tmp, @@ -1138,7 +1265,7 @@ int mca_io_ompio_prepare_to_group(mca_io_ompio_file_t *fh, fh->f_init_procs_per_group, fh->f_comm); if ( OMPI_SUCCESS != ret ) { - opal_output (1, "mca_io_ompio_prepare_to_grou[: error in fcoll_base_coll_allgather_array\n"); + opal_output (1, "mca_io_ompio_prepare_to_group: error in ompi_fcoll_base_coll_allgather_array\n"); goto exit; } end_offsets_tmp = (OMPI_MPI_OFFSET_TYPE* )malloc (fh->f_init_procs_per_group * sizeof(OMPI_MPI_OFFSET_TYPE)); @@ -1169,16 +1296,21 @@ int mca_io_ompio_prepare_to_group(mca_io_ompio_file_t *fh, if (NULL == aggr_bytes_per_group_tmp) { opal_output (1, "OUT OF MEMORY\n"); ret = OMPI_ERR_OUT_OF_RESOURCE; +
free(end_offsets_tmp); goto exit; } decision_list_tmp = (int* )malloc (fh->f_init_num_aggrs * sizeof(int)); if (NULL == decision_list_tmp) { opal_output (1, "OUT OF MEMORY\n"); ret = OMPI_ERR_OUT_OF_RESOURCE; + free(end_offsets_tmp); + if (NULL != aggr_bytes_per_group_tmp) { + free(aggr_bytes_per_group_tmp); + } goto exit; } //Communicate bytes per group between all aggregators - ret = fcoll_base_coll_allgather_array (bytes_per_group, + ret = ompi_fcoll_base_coll_allgather_array (bytes_per_group, 1, OMPI_OFFSET_DATATYPE, aggr_bytes_per_group_tmp, @@ -1189,7 +1321,7 @@ int mca_io_ompio_prepare_to_group(mca_io_ompio_file_t *fh, fh->f_init_num_aggrs, fh->f_comm); if ( OMPI_SUCCESS != ret ) { - opal_output (1, "mca_io_ompio_prepare_to_grou[: error in fcoll_base_coll_allgather_array 2\n"); + opal_output (1, "mca_io_ompio_prepare_to_group: error in ompi_fcoll_base_coll_allgather_array 2\n"); free(decision_list_tmp); goto exit; } @@ -1263,7 +1395,7 @@ int mca_io_ompio_prepare_to_group(mca_io_ompio_file_t *fh, *decision_list = &decision_list_tmp[0]; } //Communicate flag to all group members - ret = fcoll_base_coll_bcast_array (ompio_grouping_flag, + ret = ompi_fcoll_base_coll_bcast_array (ompio_grouping_flag, 1, MPI_INT, 0, @@ -1272,17 +1404,86 @@ int mca_io_ompio_prepare_to_group(mca_io_ompio_file_t *fh, fh->f_comm); exit: - if (NULL != aggr_bytes_per_group_tmp) { - free(aggr_bytes_per_group_tmp); - } - if (NULL != start_offsets_lens_tmp) { - free(start_offsets_lens_tmp); - } - if (NULL != end_offsets_tmp) { - free(end_offsets_tmp); - } + /* Do not free aggr_bytes_per_group_tmp, + ** start_offsets_lens_tmp, and end_offsets_tmp + ** here. The memory is released in the layer above. + */ + return ret; } - +/* +** This is the actual formula of the cost function from the paper. +** One change made here is to use floating point values for +** all parameters, since the ceil() function sometimes leads to +** unexpected jumps in the execution time. Using float leads to
Using float leads to +** more consistent predictions for the no. of aggregators. +*/ +static double cost_calc (int P, int P_a, size_t d_p, size_t b_c, int dim ) +{ + float n_as=1.0, m_s=1.0, n_s=1.0; + float n_ar=1.0; + double t_send, t_recv, t_tot; + + /* LogGP parameters based on DDR InfiniBand values */ + double L=.00000184; + double o=.00000149; + double g=.0000119; + double G=.00000000067; + + long file_domain = (P * d_p) / P_a; + float n_r = (float)file_domain/(float) b_c; + + switch (dim) { + case DIM1: + { + if( d_p > b_c ){ + //printf("case 1\n"); + n_ar = 1; + n_as = 1; + m_s = b_c; + n_s = (float)d_p/(float)b_c; + } + else { + n_ar = (float)b_c/(float)d_p; + n_as = 1; + m_s = d_p; + n_s = 1; + } + break; + } + case DIM2: + { + int P_x, P_y, c; + + P_x = P_y = (int) sqrt(P); + c = (float) P_a / (float)P_x; + + n_ar = (float) P_y; + n_as = (float) c; + if ( d_p > (P_a*b_c/P )) { + m_s = fmin(b_c / P_y, d_p); + } + else { + m_s = fmin(d_p * P_x / P_a, d_p); + } + break; + } + default : + printf("stop putting random values\n"); + break; + } + + n_s = (float) d_p / (float)(n_as * m_s); + + if( m_s < 33554432) { + g = .00000108; + } + t_send = n_s * (L + 2 * o + (n_as -1) * g + (m_s - 1) * n_as * G); + t_recv= n_r * (L + 2 * o + (n_ar -1) * g + (m_s - 1) * n_ar * G);; + t_tot = t_send + t_recv; + + return t_tot; +} + diff --git a/ompi/mca/io/ompio/io_ompio_aggregators.h b/ompi/mca/io/ompio/io_ompio_aggregators.h index f1b60057d17..dd6b87b7023 100644 --- a/ompi/mca/io/ompio/io_ompio_aggregators.h +++ b/ompi/mca/io/ompio/io_ompio_aggregators.h @@ -51,7 +51,8 @@ OMPI_DECLSPEC int mca_io_ompio_set_aggregator_props (struct mca_io_ompio_file_t int num_aggregators, size_t bytes_per_proc); -int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh); +int mca_io_ompio_cart_based_grouping(mca_io_ompio_file_t *ompio_fh, int *num_groups, + mca_io_ompio_contg *contg_groups); int mca_io_ompio_fview_based_grouping(mca_io_ompio_file_t *fh, int *num_groups, 
mca_io_ompio_contg *contg_groups); diff --git a/ompi/mca/io/ompio/io_ompio_component.c b/ompi/mca/io/ompio/io_ompio_component.c index 6a63bce0586..ec05aead4c2 100644 --- a/ompi/mca/io/ompio/io_ompio_component.c +++ b/ompi/mca/io/ompio/io_ompio_component.c @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -38,6 +39,9 @@ int mca_io_ompio_num_aggregators = -1; int mca_io_ompio_record_offset_info = 0; int mca_io_ompio_coll_timing_info = 0; int mca_io_ompio_sharedfp_lazy_open = 0; +int mca_io_ompio_max_aggregators_ratio=8; +int mca_io_ompio_aggregators_cutoff_threshold=3; +int mca_io_ompio_overwrite_amode = 1; int mca_io_ompio_grouping_option=5; @@ -56,11 +60,11 @@ file_query (struct ompi_file_t *file, static int file_unquery(struct ompi_file_t *file, struct mca_io_base_file_t *private_data); -static int delete_query(const char *filename, struct ompi_info_t *info, +static int delete_query(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t **private_data, bool *usable, int *priorty); -static int delete_select(const char *filename, struct ompi_info_t *info, +static int delete_select(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t *private_data); static int register_datarep(const char *, @@ -209,12 +213,47 @@ static int register_component(void) "Option for grouping of processes in the aggregator selection " "1: Data volume based grouping 2: maximizing group size uniformity 3: maximimze " "data contiguity 4: hybrid optimization 5: simple (default) " - "6: skip refinement step", + "6: skip refinement step 7: simple+: grouping based on default file view", MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY, &mca_io_ompio_grouping_option); + mca_io_ompio_max_aggregators_ratio = 8; + 
(void) mca_base_component_var_register(&mca_io_ompio_component.io_version, + "max_aggregators_ratio", + "Maximum number of processes that can be an aggregator expressed as " + "the ratio to the number of processes used to open the file" + " i.e. 1 out of n processes can be an aggregator, with n being specified" + " by this mca parameter.", + MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, + OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_io_ompio_max_aggregators_ratio); + + + mca_io_ompio_aggregators_cutoff_threshold=3; + (void) mca_base_component_var_register(&mca_io_ompio_component.io_version, + "aggregators_cutoff_threshold", + "Relative cutoff threshold for incrementing the number of aggregators " + "in the simple aggregator selection algorithm (5). Lower values " + "for this parameter will lead to a higher no. of aggregators.", + MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, + OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_io_ompio_aggregators_cutoff_threshold); + + mca_io_ompio_overwrite_amode = 1; + (void) mca_base_component_var_register(&mca_io_ompio_component.io_version, + "overwrite_amode", + "Overwrite WRONLY amode to RDWR to enable data sieving " + "1: allow overwrite (default) " + "0: do not overwrite amode provided by application ", + MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, + OPAL_INFO_LVL_9, + MCA_BASE_VAR_SCOPE_READONLY, + &mca_io_ompio_overwrite_amode); + return OMPI_SUCCESS; } @@ -321,7 +360,7 @@ static int file_unquery(struct ompi_file_t *file, } -static int delete_query(const char *filename, struct ompi_info_t *info, +static int delete_query(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t **private_data, bool *usable, int *priority) { @@ -332,7 +371,7 @@ static int delete_query(const char *filename, struct ompi_info_t *info, return OMPI_SUCCESS; } -static int delete_select(const char *filename, struct ompi_info_t *info, +static int delete_select(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t *private_data) {
int ret; diff --git a/ompi/mca/io/ompio/io_ompio_file_open.c b/ompi/mca/io/ompio/io_ompio_file_open.c index b7442263897..59197556a1f 100644 --- a/ompi/mca/io/ompio/io_ompio_file_open.c +++ b/ompi/mca/io/ompio/io_ompio_file_open.c @@ -10,9 +10,10 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -43,7 +44,7 @@ int mca_io_ompio_file_open (ompi_communicator_t *comm, const char *filename, int amode, - ompi_info_t *info, + opal_info_t *info, ompi_file_t *fh) { int ret = OMPI_SUCCESS; @@ -78,7 +79,6 @@ int mca_io_ompio_file_open (ompi_communicator_t *comm, return ret; } - int mca_io_ompio_file_close (ompi_file_t *fh) { int ret = OMPI_SUCCESS; @@ -103,7 +103,7 @@ int mca_io_ompio_file_close (ompi_file_t *fh) } int mca_io_ompio_file_delete (const char *filename, - struct ompi_info_t *info) + struct opal_info_t *info) { int ret = OMPI_SUCCESS; @@ -137,7 +137,7 @@ int mca_io_ompio_file_preallocate (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); tmp = diskspace; ret = data->ompio_fh.f_comm->c_coll->coll_bcast (&tmp, @@ -147,23 +147,23 @@ int mca_io_ompio_file_preallocate (ompi_file_t *fh, data->ompio_fh.f_comm, data->ompio_fh.f_comm->c_coll->coll_bcast_module); if ( OMPI_SUCCESS != ret ) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } if (tmp != diskspace) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } ret = 
data->ompio_fh.f_fs->fs_file_get_size (&data->ompio_fh, &current_size); if ( OMPI_SUCCESS != ret ) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } if ( current_size > diskspace ) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_SUCCESS; } @@ -240,7 +240,7 @@ int mca_io_ompio_file_preallocate (ompi_file_t *fh, if ( diskspace > current_size ) { data->ompio_fh.f_fs->fs_file_set_size (&data->ompio_fh, diskspace); } - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -255,7 +255,7 @@ int mca_io_ompio_file_set_size (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; tmp = size; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = data->ompio_fh.f_comm->c_coll->coll_bcast (&tmp, 1, OMPI_OFFSET_DATATYPE, @@ -264,20 +264,20 @@ int mca_io_ompio_file_set_size (ompi_file_t *fh, data->ompio_fh.f_comm->c_coll->coll_bcast_module); if ( OMPI_SUCCESS != ret ) { opal_output(1, ",mca_io_ompio_file_set_size: error in bcast\n"); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } if (tmp != size) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } ret = data->ompio_fh.f_fs->fs_file_set_size (&data->ompio_fh, size); if ( OMPI_SUCCESS != ret ) { opal_output(1, ",mca_io_ompio_file_set_size: error in fs->set_size\n"); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -285,10 +285,10 @@ int mca_io_ompio_file_set_size (ompi_file_t *fh, data->ompio_fh.f_comm->c_coll->coll_barrier_module); if ( OMPI_SUCCESS != ret ) { opal_output(1, ",mca_io_ompio_file_set_size: error in barrier\n"); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -300,9 +300,9 @@ int mca_io_ompio_file_get_size (ompi_file_t *fh,
mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_get_size(&data->ompio_fh,size); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -322,42 +322,6 @@ int mca_io_ompio_file_get_amode (ompi_file_t *fh, } -int mca_io_ompio_file_set_info (ompi_file_t *fh, - ompi_info_t *info) -{ - int ret = OMPI_SUCCESS; - - if ( MPI_INFO_NULL == fh->f_info ) { - /* OBJ_RELEASE(MPI_INFO_NULL); */ - } - else { - ompi_info_free ( &fh->f_info); - fh->f_info = OBJ_NEW(ompi_info_t); - ret = ompi_info_dup (info, &fh->f_info); - } - - return ret; -} - - -int mca_io_ompio_file_get_info (ompi_file_t *fh, - ompi_info_t ** info_used) -{ - int ret = OMPI_SUCCESS; - ompi_info_t *info=NULL; - - info = OBJ_NEW(ompi_info_t); - if (NULL == info) { - return MPI_ERR_INFO; - } - if (MPI_INFO_NULL != fh->f_info) { - ret = ompi_info_dup (fh->f_info, &info); - } - *info_used = info; - - return ret; -} - int mca_io_ompio_file_get_type_extent (ompi_file_t *fh, struct ompi_datatype_t *datatype, MPI_Aint *extent) @@ -375,7 +339,7 @@ int mca_io_ompio_file_set_atomicity (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); if (flag) { flag = 1; } @@ -390,12 +354,12 @@ int mca_io_ompio_file_set_atomicity (ompi_file_t *fh, data->ompio_fh.f_comm->c_coll->coll_bcast_module); if (tmp != flag) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } data->ompio_fh.f_atomicity = flag; - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_SUCCESS; } @@ -407,9 +371,9 @@ int mca_io_ompio_file_get_atomicity (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); *flag = data->ompio_fh.f_atomicity; - 
OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_SUCCESS; } @@ -421,9 +385,9 @@ int mca_io_ompio_file_sync (ompi_file_t *fh) data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = data->ompio_fh.f_fs->fs_file_sync (&data->ompio_fh); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -439,13 +403,13 @@ int mca_io_ompio_file_seek (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); offset = off * data->ompio_fh.f_etype_size; switch(whence) { case MPI_SEEK_SET: if (offset < 0) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } break; @@ -453,7 +417,7 @@ int mca_io_ompio_file_seek (ompi_file_t *fh, offset += data->ompio_fh.f_position_in_file_view; offset += data->ompio_fh.f_disp; if (offset < 0) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } break; @@ -462,18 +426,18 @@ int mca_io_ompio_file_seek (ompi_file_t *fh, &temp_offset); offset += temp_offset; if (offset < 0 || OMPI_SUCCESS != ret) { - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } break; default: - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_ERROR; } ret = mca_common_ompio_set_explicit_offset (&data->ompio_fh, offset/data->ompio_fh.f_etype_size); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -488,9 +452,9 @@ int mca_io_ompio_file_get_position (ompi_file_t *fd, data = (mca_io_ompio_data_t *) fd->f_io_selected_data; fh = &data->ompio_fh; - OPAL_THREAD_LOCK(&fd->f_mutex); + OPAL_THREAD_LOCK(&fd->f_lock); ret = mca_common_ompio_file_get_position (fh, offset); - OPAL_THREAD_UNLOCK(&fd->f_mutex); + OPAL_THREAD_UNLOCK(&fd->f_lock); return ret; } @@ -506,7 +470,7 @@ int 
mca_io_ompio_file_get_byte_offset (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); temp_offset = data->ompio_fh.f_view_extent * (offset*data->ompio_fh.f_etype_size / data->ompio_fh.f_view_size); @@ -533,7 +497,7 @@ int mca_io_ompio_file_get_byte_offset (ompi_file_t *fh, *disp = data->ompio_fh.f_disp + temp_offset + (OMPI_MPI_OFFSET_TYPE)(intptr_t)data->ompio_fh.f_decoded_iov[index].iov_base + k; - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return OMPI_SUCCESS; } @@ -557,9 +521,9 @@ int mca_io_ompio_file_seek_shared (ompi_file_t *fp, return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_seek(fh,offset,whence); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -582,10 +546,10 @@ int mca_io_ompio_file_get_position_shared (ompi_file_t *fp, opal_output(0, "No shared file pointer component found for this communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_get_position(fh,offset); *offset = *offset / fh->f_etype_size; - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } diff --git a/ompi/mca/io/ompio/io_ompio_file_read.c b/ompi/mca/io/ompio/io_ompio_file_read.c index 4a634572e04..db0a102db0c 100644 --- a/ompi/mca/io/ompio/io_ompio_file_read.c +++ b/ompi/mca/io/ompio/io_ompio_file_read.c @@ -1,20 +1,22 @@ /* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. 
All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * $COPYRIGHT$ + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2016 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2008-2016 University of Houston. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ * - * Additional copyrights may follow + * Additional copyrights may follow * - * $HEADER$ + * $HEADER$ */ #include "ompi_config.h" @@ -59,9 +61,9 @@ int mca_io_ompio_file_read (ompi_file_t *fp, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fp->f_io_selected_data; - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = mca_common_ompio_file_read(&data->ompio_fh,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -77,9 +79,9 @@ int mca_io_ompio_file_read_at (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_read_at(&data->ompio_fh, offset,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -94,9 +96,9 @@ int mca_io_ompio_file_iread (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + 
OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iread(&data->ompio_fh,buf,count,datatype,request); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -113,9 +115,9 @@ int mca_io_ompio_file_iread_at (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iread_at(&data->ompio_fh,offset,buf,count,datatype,request); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -134,14 +136,14 @@ int mca_io_ompio_file_read_all (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = data->ompio_fh. f_fcoll->fcoll_file_read_all (&data->ompio_fh, buf, count, datatype, status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); if ( MPI_STATUS_IGNORE != status ) { size_t size; @@ -165,7 +167,7 @@ int mca_io_ompio_file_iread_all (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; fp = &data->ompio_fh; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); if ( NULL != fp->f_fcoll->fcoll_file_iread_all ) { ret = fp->f_fcoll->fcoll_file_iread_all (&data->ompio_fh, buf, @@ -179,7 +181,7 @@ int mca_io_ompio_file_iread_all (ompi_file_t *fh, individual non-blocking I/O operations. 
*/ ret = mca_common_ompio_file_iread ( fp, buf, count, datatype, request ); } - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -196,9 +198,9 @@ int mca_io_ompio_file_read_at_all (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_read_at_all(&data->ompio_fh,offset,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -214,9 +216,9 @@ int mca_io_ompio_file_iread_at_all (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iread_at_all ( &data->ompio_fh, offset, buf, count, datatype, request ); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -244,9 +246,9 @@ int mca_io_ompio_file_read_shared (ompi_file_t *fp, opal_output(0, "No shared file pointer component found for the given communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_read(fh,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -271,9 +273,9 @@ int mca_io_ompio_file_iread_shared (ompi_file_t *fh, opal_output(0, "No shared file pointer component found for the given communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = shared_fp_base_module->sharedfp_iread(ompio_fh,buf,count,datatype,request); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -298,9 +300,9 @@ int mca_io_ompio_file_read_ordered (ompi_file_t *fh, opal_output(0, "No shared file pointer component found for the given communicator. 
Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = shared_fp_base_module->sharedfp_read_ordered(ompio_fh,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -323,9 +325,9 @@ int mca_io_ompio_file_read_ordered_begin (ompi_file_t *fh, opal_output(0, "No shared file pointer component found for the given communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = shared_fp_base_module->sharedfp_read_ordered_begin(ompio_fh,buf,count,datatype); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -348,9 +350,9 @@ int mca_io_ompio_file_read_ordered_end (ompi_file_t *fh, opal_output(0, "No shared file pointer component found for the given communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = shared_fp_base_module->sharedfp_read_ordered_end(ompio_fh,buf,status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -373,7 +375,7 @@ int mca_io_ompio_file_read_all_begin (ompi_file_t *fh, printf("Only one split collective I/O operation allowed per file handle at any given point in time!\n"); return MPI_ERR_OTHER; } - /* No need for locking fh->f_mutex, that is done in file_iread_all */ + /* No need for locking fh->f_lock, that is done in file_iread_all */ ret = mca_io_ompio_file_iread_all ( fh, buf, count, datatype, &fp->f_split_coll_req ); fp->f_split_coll_in_use = true; @@ -413,9 +415,9 @@ int mca_io_ompio_file_read_at_all_begin (ompi_file_t *fh, printf("Only one split collective I/O operation allowed per file handle at any given point in time!\n"); return MPI_ERR_REQUEST; } - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iread_at_all ( fp, offset, buf, count, datatype, &fp->f_split_coll_req ); 
- OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); fp->f_split_coll_in_use = true; return ret; } diff --git a/ompi/mca/io/ompio/io_ompio_file_set_view.c b/ompi/mca/io/ompio/io_ompio_file_set_view.c index fa14360a9be..3e2a7b3f7ba 100644 --- a/ompi/mca/io/ompio/io_ompio_file_set_view.c +++ b/ompi/mca/io/ompio/io_ompio_file_set_view.c @@ -10,8 +10,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -59,7 +60,7 @@ int mca_io_ompio_file_set_view (ompi_file_t *fp, ompi_datatype_t *etype, ompi_datatype_t *filetype, const char *datarep, - ompi_info_t *info) + opal_info_t *info) { int ret=OMPI_SUCCESS; mca_io_ompio_data_t *data; @@ -73,7 +74,7 @@ int mca_io_ompio_file_set_view (ompi_file_t *fp, */ fh = &data->ompio_fh; - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = mca_common_ompio_set_view(fh, disp, etype, filetype, datarep, info); if ( NULL != fh->f_sharedfp_data) { @@ -81,7 +82,7 @@ int mca_io_ompio_file_set_view (ompi_file_t *fp, ret = mca_common_ompio_set_view(sh, disp, etype, filetype, datarep, info); } - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -97,12 +98,12 @@ int mca_io_ompio_file_get_view (struct ompi_file_t *fp, data = (mca_io_ompio_data_t *) fp->f_io_selected_data; fh = &data->ompio_fh; - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); *disp = fh->f_disp; datatype_duplicate (fh->f_etype, etype); datatype_duplicate (fh->f_orig_filetype, filetype); strcpy (datarep, fh->f_datarep); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + 
OPAL_THREAD_UNLOCK(&fp->f_lock); return OMPI_SUCCESS; } diff --git a/ompi/mca/io/ompio/io_ompio_file_write.c b/ompi/mca/io/ompio/io_ompio_file_write.c index 39620737dfd..a0bdcdaaf2f 100644 --- a/ompi/mca/io/ompio/io_ompio_file_write.c +++ b/ompi/mca/io/ompio/io_ompio_file_write.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2016 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -66,9 +66,9 @@ int mca_io_ompio_file_write (ompi_file_t *fp, data = (mca_io_ompio_data_t *) fp->f_io_selected_data; fh = &data->ompio_fh; - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = mca_common_ompio_file_write(fh,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -84,9 +84,9 @@ int mca_io_ompio_file_write_at (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_write_at (&data->ompio_fh, offset,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -101,9 +101,9 @@ int mca_io_ompio_file_iwrite (ompi_file_t *fp, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fp->f_io_selected_data; - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = mca_common_ompio_file_iwrite(&data->ompio_fh,buf,count,datatype,request); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -120,9 +120,9 @@ int mca_io_ompio_file_iwrite_at (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + 
OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iwrite_at(&data->ompio_fh,offset,buf,count,datatype,request); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -142,14 +142,14 @@ int mca_io_ompio_file_write_all (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = data->ompio_fh. f_fcoll->fcoll_file_write_all (&data->ompio_fh, buf, count, datatype, status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); if ( MPI_STATUS_IGNORE != status ) { size_t size; @@ -171,9 +171,9 @@ int mca_io_ompio_file_write_at_all (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_write_at_all(&data->ompio_fh,offset,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -191,7 +191,7 @@ int mca_io_ompio_file_iwrite_all (ompi_file_t *fh, data = (mca_io_ompio_data_t *) fh->f_io_selected_data; fp = &data->ompio_fh; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); if ( NULL != fp->f_fcoll->fcoll_file_iwrite_all ) { ret = fp->f_fcoll->fcoll_file_iwrite_all (&data->ompio_fh, buf, @@ -205,7 +205,7 @@ int mca_io_ompio_file_iwrite_all (ompi_file_t *fh, individual non-blocking I/O operations. 
*/ ret = mca_common_ompio_file_iwrite ( fp, buf, count, datatype, request ); } - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -222,9 +222,9 @@ int mca_io_ompio_file_iwrite_at_all (ompi_file_t *fh, mca_io_ompio_data_t *data; data = (mca_io_ompio_data_t *) fh->f_io_selected_data; - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iwrite_at_all ( &data->ompio_fh, offset, buf, count, datatype, request ); - OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); return ret; } @@ -253,9 +253,9 @@ int mca_io_ompio_file_write_shared (ompi_file_t *fp, opal_output(0, "No shared file pointer component found for this communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_write(fh,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -280,9 +280,9 @@ int mca_io_ompio_file_iwrite_shared (ompi_file_t *fp, opal_output(0, "No shared file pointer component found for this communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_iwrite(fh,buf,count,datatype,request); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -307,9 +307,9 @@ int mca_io_ompio_file_write_ordered (ompi_file_t *fp, opal_output(0,"No shared file pointer component found for this communicator. 
Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_write_ordered(fh,buf,count,datatype,status); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -333,9 +333,9 @@ int mca_io_ompio_file_write_ordered_begin (ompi_file_t *fp, opal_output(0, "No shared file pointer component found for this communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_write_ordered_begin(fh,buf,count,datatype); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -358,9 +358,9 @@ int mca_io_ompio_file_write_ordered_end (ompi_file_t *fp, opal_output(0, "No shared file pointer component found for this communicator. Can not execute\n"); return OMPI_ERROR; } - OPAL_THREAD_LOCK(&fp->f_mutex); + OPAL_THREAD_LOCK(&fp->f_lock); ret = shared_fp_base_module->sharedfp_write_ordered_end(fh,buf,status); - OPAL_THREAD_UNLOCK(&fp->f_mutex); + OPAL_THREAD_UNLOCK(&fp->f_lock); return ret; } @@ -383,7 +383,7 @@ int mca_io_ompio_file_write_all_begin (ompi_file_t *fh, printf("Only one split collective I/O operation allowed per file handle at any given point in time!\n"); return MPI_ERR_OTHER; } - /* No need for locking fh->f_mutex, that is done in file_iwrite_all */ + /* No need for locking fh->f_lock, that is done in file_iwrite_all */ ret = mca_io_ompio_file_iwrite_all ( fh, buf, count, datatype, &fp->f_split_coll_req ); fp->f_split_coll_in_use = true; @@ -425,9 +425,9 @@ int mca_io_ompio_file_write_at_all_begin (ompi_file_t *fh, printf("Only one split collective I/O operation allowed per file handle at any given point in time!\n"); return MPI_ERR_REQUEST; } - OPAL_THREAD_LOCK(&fh->f_mutex); + OPAL_THREAD_LOCK(&fh->f_lock); ret = mca_common_ompio_file_iwrite_at_all ( fp, offset, buf, count, datatype, &fp->f_split_coll_req ); - 
OPAL_THREAD_UNLOCK(&fh->f_mutex); + OPAL_THREAD_UNLOCK(&fh->f_lock); fp->f_split_coll_in_use = true; return ret; diff --git a/ompi/mca/io/ompio/io_ompio_module.c b/ompi/mca/io/ompio/io_ompio_module.c index cbdaf2e0dd8..109b99c82ef 100644 --- a/ompi/mca/io/ompio/io_ompio_module.c +++ b/ompi/mca/io/ompio/io_ompio_module.c @@ -10,6 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2011 University of Houston. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -35,8 +36,6 @@ mca_io_base_module_2_0_0_t mca_io_ompio_module = { mca_io_ompio_file_preallocate, mca_io_ompio_file_get_size, mca_io_ompio_file_get_amode, - mca_io_ompio_file_set_info, - mca_io_ompio_file_get_info, mca_io_ompio_file_set_view, mca_io_ompio_file_get_view, diff --git a/ompi/mca/io/ompio/io_ompio_request.c b/ompi/mca/io/ompio/io_ompio_request.c index 59271a346ba..a2fdb47ea87 100644 --- a/ompi/mca/io/ompio/io_ompio_request.c +++ b/ompi/mca/io/ompio/io_ompio_request.c @@ -34,6 +34,7 @@ static int mca_io_ompio_request_free ( struct ompi_request_t **req) opal_list_remove_item (&mca_io_ompio_pending_requests, &ompio_req->req_item); OBJ_RELEASE (*req); + *req = MPI_REQUEST_NULL; return OMPI_SUCCESS; } diff --git a/ompi/mca/io/romio314/Makefile.am b/ompi/mca/io/romio314/Makefile.am index 72493f3d1d1..690ddfbd2d7 100644 --- a/ompi/mca/io/romio314/Makefile.am +++ b/ompi/mca/io/romio314/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -49,7 +50,8 @@ libs = romio/libromio_dist.la mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component) mca_io_romio314_la_SOURCES = $(component_sources) -mca_io_romio314_la_LIBADD = $(libs) +mca_io_romio314_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(libs) mca_io_romio314_la_DEPENDENCIES = $(libs) mca_io_romio314_la_LDFLAGS = -module -avoid-version diff --git a/ompi/mca/io/romio314/configure.m4 b/ompi/mca/io/romio314/configure.m4 index 6ebe85263f0..b3c20fdec0d 100644 --- a/ompi/mca/io/romio314/configure.m4 +++ b/ompi/mca/io/romio314/configure.m4 @@ -77,10 +77,10 @@ AC_DEFUN([MCA_ompi_io_romio314_CONFIG],[ [AS_IF([test ! -z $build], [io_romio314_flags="$io_romio314_flags --build=$build"]) AS_IF([test ! -z $host], [io_romio314_flags="$io_romio314_flags --host=$host"]) AS_IF([test ! -z $target], [io_romio314_flags="$io_romio314_flags --target=$target"])]) - io_romio314_flags_define="$io_romio314_flags FROM_OMPI=yes CC='$CC' CFLAGS='$CFLAGS -D__EXTENSIONS__' CPPFLAGS='$CPPFLAGS' FFLAGS='$FFLAGS' LDFLAGS='$LDFLAGS' --$io_romio314_shared-shared --$io_romio314_static-static $io_romio314_flags $io_romio314_prefix_arg --disable-aio --disable-weak-symbols --enable-strict" + io_romio314_flags_define="$io_romio314_flags FROM_OMPI=yes CC='$CC' CFLAGS='$CFLAGS -D__EXTENSIONS__' CPPFLAGS='$CPPFLAGS' FFLAGS='$FFLAGS' LDFLAGS='$LDFLAGS' --$io_romio314_shared-shared --$io_romio314_static-static $io_romio314_flags $io_romio314_prefix_arg --disable-aio --disable-weak-symbols --enable-strict --disable-f77 --disable-f90" AC_DEFINE_UNQUOTED([MCA_io_romio314_COMPLETE_CONFIGURE_FLAGS], ["$io_romio314_flags_define"], [Complete set of command line arguments given to ROMIOs configure script]) - io_romio314_flags="$io_romio314_flags FROM_OMPI=yes CC="'"'"$CC"'"'" CFLAGS="'"'"$CFLAGS -D__EXTENSIONS__"'"'" CPPFLAGS="'"'"$CPPFLAGS"'"'" FFLAGS="'"'"$FFLAGS"'"'" LDFLAGS="'"'"$LDFLAGS"'"'" 
--$io_romio314_shared-shared --$io_romio314_static-static $io_romio314_flags $io_romio314_prefix_arg --disable-aio --disable-weak-symbols --enable-strict" + io_romio314_flags="$io_romio314_flags FROM_OMPI=yes CC="'"'"$CC"'"'" CFLAGS="'"'"$CFLAGS -D__EXTENSIONS__"'"'" CPPFLAGS="'"'"$CPPFLAGS"'"'" FFLAGS="'"'"$FFLAGS"'"'" LDFLAGS="'"'"$LDFLAGS"'"'" --$io_romio314_shared-shared --$io_romio314_static-static $io_romio314_flags $io_romio314_prefix_arg --disable-aio --disable-weak-symbols --enable-strict --disable-f77 --disable-f90" opal_show_subtitle "Configuring ROMIO distribution" OPAL_CONFIG_SUBDIR([ompi/mca/io/romio314/romio], diff --git a/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_read.c b/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_read.c index 0a74dafe989..8e76dd4c279 100644 --- a/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_read.c +++ b/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_read.c @@ -7,79 +7,68 @@ #include "ad_nfs.h" #include "adio_extern.h" +#ifdef HAVE_UNISTD_H +#include +#endif void ADIOI_NFS_ReadContig(ADIO_File fd, void *buf, int count, MPI_Datatype datatype, int file_ptr_type, ADIO_Offset offset, ADIO_Status *status, int *error_code) { - int err=-1; + ssize_t err=-1; MPI_Count datatype_size, len; + ADIO_Offset bytes_xfered=0; + size_t rd_count; static char myname[] = "ADIOI_NFS_READCONTIG"; + char *p; MPI_Type_size_x(datatype, &datatype_size); len = datatype_size * count; - if (file_ptr_type == ADIO_EXPLICIT_OFFSET) { - if (fd->fp_sys_posn != offset) { -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_a, 0, NULL ); -#endif - lseek(fd->fd_sys, offset, SEEK_SET); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_b, 0, NULL ); -#endif - } - if (fd->atomicity) - ADIOI_WRITE_LOCK(fd, offset, SEEK_SET, len); - else ADIOI_READ_LOCK(fd, offset, SEEK_SET, len); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_read_a, 0, NULL ); -#endif - err = read(fd->fd_sys, buf, len); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( 
ADIOI_MPE_read_b, 0, NULL ); -#endif - ADIOI_UNLOCK(fd, offset, SEEK_SET, len); - fd->fp_sys_posn = offset + err; - /* individual file pointer not updated */ - } - else { /* read from curr. location of ind. file pointer */ + if (file_ptr_type == ADIO_INDIVIDUAL) { offset = fd->fp_ind; - if (fd->fp_sys_posn != fd->fp_ind) { -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_a, 0, NULL ); -#endif - lseek(fd->fd_sys, fd->fp_ind, SEEK_SET); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_b, 0, NULL ); -#endif - } + } + + p = buf; + while (bytes_xfered < len ) { + rd_count = len - bytes_xfered; + /* FreeBSD and Darwin workaround: bigger than INT_MAX is an error */ + if (rd_count > INT_MAX) + rd_count = INT_MAX; if (fd->atomicity) - ADIOI_WRITE_LOCK(fd, offset, SEEK_SET, len); - else ADIOI_READ_LOCK(fd, offset, SEEK_SET, len); + ADIOI_WRITE_LOCK(fd, offset+bytes_xfered, SEEK_SET, rd_count); + else ADIOI_READ_LOCK(fd, offset+bytes_xfered, SEEK_SET, rd_count); #ifdef ADIOI_MPE_LOGGING MPE_Log_event( ADIOI_MPE_read_a, 0, NULL ); #endif - err = read(fd->fd_sys, buf, len); + err = pread(fd->fd_sys, p, rd_count, offset+bytes_xfered); + /* --BEGIN ERROR HANDLING-- */ + if (err == -1) { + *error_code = MPIO_Err_create_code(MPI_SUCCESS, + MPIR_ERR_RECOVERABLE, myname, __LINE__, MPI_ERR_IO, + "**io", "**io %s", strerror(errno)); + } + /* --END ERROR HANDLING-- */ #ifdef ADIOI_MPE_LOGGING MPE_Log_event( ADIOI_MPE_read_b, 0, NULL ); #endif - ADIOI_UNLOCK(fd, offset, SEEK_SET, len); - fd->fp_ind += err; - fd->fp_sys_posn = fd->fp_ind; + ADIOI_UNLOCK(fd, offset+bytes_xfered, SEEK_SET, rd_count); + if (err == 0) { + /* end of file */ + break; + } + bytes_xfered += err; + p += err; } - /* --BEGIN ERROR HANDLING-- */ - if (err == -1) { - *error_code = MPIO_Err_create_code(MPI_SUCCESS, MPIR_ERR_RECOVERABLE, - myname, __LINE__, MPI_ERR_IO, - "**io", "**io %s", strerror(errno)); - return; + fd->fp_sys_posn = offset + bytes_xfered; + if (file_ptr_type == ADIO_INDIVIDUAL) { 
+ fd->fp_ind += bytes_xfered; } /* --END ERROR HANDLING-- */ #ifdef HAVE_STATUS_SET_BYTES - MPIR_Status_set_bytes(status, datatype, err); + if (err != -1) MPIR_Status_set_bytes(status, datatype, bytes_xfered); #endif *error_code = MPI_SUCCESS; @@ -168,19 +157,21 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count, /* offset is in units of etype relative to the filetype. */ ADIOI_Flatlist_node *flat_buf, *flat_file; - int i, j, k, err=-1, brd_size, frd_size=0, st_index=0; - int bufsize, num, size, sum, n_etypes_in_filetype, size_in_filetype; + int i, j, k, err=-1, brd_size, st_index=0; + int num, size, sum, n_etypes_in_filetype, size_in_filetype; + MPI_Count bufsize; int n_filetypes, etype_in_filetype; ADIO_Offset abs_off_in_filetype=0; int req_len, partial_read; MPI_Count filetype_size, etype_size, buftype_size; - MPI_Aint filetype_extent, buftype_extent; + MPI_Aint filetype_extent, buftype_extent, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset userbuf_off; ADIO_Offset off, req_off, disp, end_offset=0, readbuf_off, start_off; char *readbuf, *tmp_buf, *value; - int st_frd_size, st_n_filetypes, readbuf_len; - int new_brd_size, new_frd_size, err_flag=0, info_flag, max_bufsize; + int st_n_filetypes, readbuf_len; + ADIO_Offset frd_size=0, new_frd_size, st_frd_size; + int new_brd_size, err_flag=0, info_flag, max_bufsize; static char myname[] = "ADIOI_NFS_READSTRIDED"; @@ -196,9 +187,9 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count, return; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(datatype, &buftype_size); - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); etype_size = fd->etype_size; bufsize = buftype_size * count; @@ -460,12 +451,13 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count, else { /* noncontiguous in memory as well as in file */ + ADIO_Offset i; 
ADIOI_Flatten_datatype(datatype); flat_buf = ADIOI_Flatlist; while (flat_buf->type != datatype) flat_buf = flat_buf->next; k = num = buf_count = 0; - i = (int) (flat_buf->indices[0]); + i = flat_buf->indices[0]; j = st_index; off = offset; n_filetypes = st_n_filetypes; @@ -510,8 +502,8 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count, k = (k + 1)%flat_buf->count; buf_count++; - i = (int) (buftype_extent*(buf_count/flat_buf->count) + - flat_buf->indices[k]); + i = buftype_extent*(buf_count/flat_buf->count) + + flat_buf->indices[k]; new_brd_size = flat_buf->blocklens[k]; if (size != frd_size) { off += size; diff --git a/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_write.c b/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_write.c index b41488036e5..794c06c4887 100644 --- a/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_write.c +++ b/ompi/mca/io/romio314/romio/adio/ad_nfs/ad_nfs_write.c @@ -7,76 +7,64 @@ #include "ad_nfs.h" #include "adio_extern.h" +#ifdef HAVE_UNISTD_H +#include +#endif void ADIOI_NFS_WriteContig(ADIO_File fd, const void *buf, int count, MPI_Datatype datatype, int file_ptr_type, ADIO_Offset offset, ADIO_Status *status, int *error_code) { - int err=-1; + ssize_t err=-1; MPI_Count datatype_size, len; + ADIO_Offset bytes_xfered=0; + size_t wr_count; static char myname[] = "ADIOI_NFS_WRITECONTIG"; + char *p; MPI_Type_size_x(datatype, &datatype_size); - len = datatype_size * count; + len = datatype_size * (ADIO_Offset)count; - if (file_ptr_type == ADIO_EXPLICIT_OFFSET) { - if (fd->fp_sys_posn != offset) { -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_a, 0, NULL ); -#endif - lseek(fd->fd_sys, offset, SEEK_SET); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_b, 0, NULL ); -#endif - } - ADIOI_WRITE_LOCK(fd, offset, SEEK_SET, len); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_write_a, 0, NULL ); -#endif - err = write(fd->fd_sys, buf, len); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_write_b, 0, NULL ); 
-#endif - ADIOI_UNLOCK(fd, offset, SEEK_SET, len); - fd->fp_sys_posn = offset + err; - /* individual file pointer not updated */ - } - else { /* write from curr. location of ind. file pointer */ + if (file_ptr_type == ADIO_INDIVIDUAL) { offset = fd->fp_ind; - if (fd->fp_sys_posn != fd->fp_ind) { -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_a, 0, NULL ); -#endif - lseek(fd->fd_sys, fd->fp_ind, SEEK_SET); -#ifdef ADIOI_MPE_LOGGING - MPE_Log_event( ADIOI_MPE_lseek_b, 0, NULL ); -#endif - } - ADIOI_WRITE_LOCK(fd, offset, SEEK_SET, len); + } + + p = (char *)buf; + while (bytes_xfered < len) { #ifdef ADIOI_MPE_LOGGING MPE_Log_event( ADIOI_MPE_write_a, 0, NULL ); #endif - err = write(fd->fd_sys, buf, len); + wr_count = len - bytes_xfered; + /* work around FreeBSD and OS X defects*/ + if (wr_count > INT_MAX) + wr_count = INT_MAX; + + ADIOI_WRITE_LOCK(fd, offset+bytes_xfered, SEEK_SET, wr_count); + err = pwrite(fd->fd_sys, p, wr_count, offset+bytes_xfered); + /* --BEGIN ERROR HANDLING-- */ + if (err == -1) { + *error_code = MPIO_Err_create_code(MPI_SUCCESS, + MPIR_ERR_RECOVERABLE, + myname, __LINE__, MPI_ERR_IO, "**io", + "**io %s", strerror(errno)); + fd->fp_sys_posn = -1; + return; + } + /* --END ERROR HANDLING-- */ #ifdef ADIOI_MPE_LOGGING MPE_Log_event( ADIOI_MPE_write_b, 0, NULL ); #endif - ADIOI_UNLOCK(fd, offset, SEEK_SET, len); - fd->fp_ind += err; - fd->fp_sys_posn = fd->fp_ind; + ADIOI_UNLOCK(fd, offset+bytes_xfered, SEEK_SET, wr_count); + bytes_xfered += err; + p += err; } - /* --BEGIN ERROR HANDLING-- */ - if (err == -1) { - *error_code = MPIO_Err_create_code(MPI_SUCCESS, MPIR_ERR_RECOVERABLE, - myname, __LINE__, MPI_ERR_IO, - "**io", - "**io %s", strerror(errno)); - return; + if (file_ptr_type == ADIO_INDIVIDUAL) { + fd->fp_ind += bytes_xfered; } - /* --END ERROR HANDLING-- */ #ifdef HAVE_STATUS_SET_BYTES - MPIR_Status_set_bytes(status, datatype, err); + MPIR_Status_set_bytes(status, datatype, bytes_xfered); #endif *error_code = MPI_SUCCESS; @@ 
-272,19 +260,21 @@ void ADIOI_NFS_WriteStrided(ADIO_File fd, const void *buf, int count, /* offset is in units of etype relative to the filetype. */ ADIOI_Flatlist_node *flat_buf, *flat_file; - int i, j, k, err=-1, bwr_size, fwr_size=0, st_index=0; - int bufsize, num, size, sum, n_etypes_in_filetype, size_in_filetype; + int i, j, k, err=-1, bwr_size, st_index=0; + int num, size, sum, n_etypes_in_filetype, size_in_filetype; + MPI_Count bufsize; int n_filetypes, etype_in_filetype; ADIO_Offset abs_off_in_filetype=0; int req_len; MPI_Count filetype_size, etype_size, buftype_size; - MPI_Aint filetype_extent, buftype_extent; + MPI_Aint filetype_extent, buftype_extent, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset userbuf_off; ADIO_Offset off, req_off, disp, end_offset=0, writebuf_off, start_off; char *writebuf=NULL, *value; - int st_fwr_size, st_n_filetypes, writebuf_len, write_sz; - int new_bwr_size, new_fwr_size, err_flag=0, info_flag, max_bufsize; + int st_n_filetypes, writebuf_len, write_sz; + ADIO_Offset fwr_size = 0, new_fwr_size, st_fwr_size; + int new_bwr_size, err_flag=0, info_flag, max_bufsize; static char myname[] = "ADIOI_NFS_WRITESTRIDED"; ADIOI_Datatype_iscontig(datatype, &buftype_is_contig); @@ -299,9 +289,9 @@ void ADIOI_NFS_WriteStrided(ADIO_File fd, const void *buf, int count, return; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(datatype, &buftype_size); - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); etype_size = fd->etype_size; bufsize = buftype_size * count; @@ -565,12 +555,13 @@ void ADIOI_NFS_WriteStrided(ADIO_File fd, const void *buf, int count, else { /* noncontiguous in memory as well as in file */ + ADIO_Offset i; ADIOI_Flatten_datatype(datatype); flat_buf = ADIOI_Flatlist; while (flat_buf->type != datatype) flat_buf = flat_buf->next; k = num = buf_count = 0; - i = (int) 
(flat_buf->indices[0]); + i = flat_buf->indices[0]; j = st_index; off = offset; n_filetypes = st_n_filetypes; @@ -616,8 +607,8 @@ void ADIOI_NFS_WriteStrided(ADIO_File fd, const void *buf, int count, k = (k + 1)%flat_buf->count; buf_count++; - i = (int) (buftype_extent*(buf_count/flat_buf->count) + - flat_buf->indices[k]); + i = buftype_extent*(buf_count/flat_buf->count) + + flat_buf->indices[k]; new_bwr_size = flat_buf->blocklens[k]; if (size != fwr_size) { off += size; diff --git a/ompi/mca/io/romio314/romio/adio/ad_testfs/ad_testfs_seek.c b/ompi/mca/io/romio314/romio/adio/ad_testfs/ad_testfs_seek.c index a6bca984c02..56d10433435 100644 --- a/ompi/mca/io/romio314/romio/adio/ad_testfs/ad_testfs_seek.c +++ b/ompi/mca/io/romio314/romio/adio/ad_testfs/ad_testfs_seek.c @@ -30,7 +30,7 @@ ADIO_Offset ADIOI_TESTFS_SeekIndividual(ADIO_File fd, ADIO_Offset offset, int size_in_filetype; int filetype_is_contig; MPI_Count filetype_size; - MPI_Aint etype_size, filetype_extent; + MPI_Aint etype_size, filetype_extent, lb; *error_code = MPI_SUCCESS; @@ -47,7 +47,7 @@ ADIO_Offset ADIOI_TESTFS_SeekIndividual(ADIO_File fd, ADIO_Offset offset, flat_file = ADIOI_Flatlist; while (flat_file->type != fd->filetype) flat_file = flat_file->next; - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(fd->filetype, &filetype_size); if ( ! 
filetype_size ) { *error_code = MPI_SUCCESS; diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_aggregate_new.c b/ompi/mca/io/romio314/romio/adio/common/ad_aggregate_new.c index a01a41c5034..59fee5b9a9a 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_aggregate_new.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_aggregate_new.c @@ -237,18 +237,18 @@ void ADIOI_Calc_file_realms_fsize (ADIO_File fd, int nprocs_for_coll, void ADIOI_Create_fr_simpletype (int size, int nprocs_for_coll, MPI_Datatype *simpletype) { - int count=2, blocklens[2]; - MPI_Aint indices[2]; - MPI_Datatype old_types[2]; + int count=1, blocklens[1]; + MPI_Aint indices[1]; + MPI_Datatype old_types[1]; + MPI_Datatype inttype; blocklens[0] = size; - blocklens[1] = 1; indices[0] = 0; - indices[1] = size*nprocs_for_coll; old_types[0] = MPI_BYTE; - old_types[1] = MPI_UB; - MPI_Type_struct (count, blocklens, indices, old_types, simpletype); + MPI_Type_create_struct (count, blocklens, indices, old_types, &inttype); + MPI_Type_create_resized (inttype, 0, size*nprocs_for_coll, simpletype); + MPI_Type_free (&inttype); MPI_Type_commit (simpletype); } diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_coll_build_req_new.c b/ompi/mca/io/romio314/romio/adio/common/ad_coll_build_req_new.c index a9be23c28df..2b6f29e6f2a 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_coll_build_req_new.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_coll_build_req_new.c @@ -279,7 +279,7 @@ static inline int get_next_fr_off(ADIO_File fd, ADIO_Offset *fr_next_off_p, ADIO_Offset *fr_max_len_p) { - MPI_Aint fr_extent = -1; + MPI_Aint fr_extent = -1, lb; ADIO_Offset tmp_off, off_rem; ADIOI_Flatlist_node *fr_node_p = ADIOI_Flatlist; int i = -1, fr_dtype_ct = 0; @@ -299,7 +299,7 @@ static inline int get_next_fr_off(ADIO_File fd, /* Calculate how many times to loop through the fr_type * and where the next fr_off is. 
*/ - MPI_Type_extent(*fr_type_p, &fr_extent); + MPI_Type_get_extent(*fr_type_p, &lb, &fr_extent); tmp_off = off - fr_st_off; fr_dtype_ct = tmp_off / fr_extent; off_rem = tmp_off % fr_extent; diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_coll_exch_new.c b/ompi/mca/io/romio314/romio/adio/common/ad_coll_exch_new.c index abe7d74465c..18156166759 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_coll_exch_new.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_coll_exch_new.c @@ -127,7 +127,7 @@ void ADIOI_Exch_file_views(int myrank, int nprocs, int file_ptr_type, MPI_Request *send_req_arr = NULL, *recv_req_arr = NULL; MPI_Status *statuses = NULL; ADIO_Offset disp_off_sz_ext_typesz[6]; - MPI_Aint memtype_extent, filetype_extent; + MPI_Aint memtype_extent, filetype_extent, lb; int ret = -1; /* parameters for datatypes */ @@ -143,7 +143,7 @@ void ADIOI_Exch_file_views(int myrank, int nprocs, int file_ptr_type, * freed in the close and should have been flattened in the file * view. */ MPI_Type_size_x(datatype, &memtype_sz); - MPI_Type_extent(datatype, &memtype_extent); + MPI_Type_get_extent(datatype, &lb, &memtype_extent); if (memtype_sz == memtype_extent) { memtype_is_contig = 1; flat_mem_p = ADIOI_Add_contig_flattened(datatype); @@ -156,7 +156,7 @@ void ADIOI_Exch_file_views(int myrank, int nprocs, int file_ptr_type, flat_mem_p = flat_mem_p->next; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(fd->filetype, &filetype_sz); if (filetype_extent == filetype_sz) { flat_file_p = ADIOI_Add_contig_flattened(fd->filetype); diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_darray.c b/ompi/mca/io/romio314/romio/adio/common/ad_darray.c index 0437db828ef..cb1407bda62 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_darray.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_darray.c @@ -24,11 +24,11 @@ int ADIO_Type_create_darray(int size, int rank, int ndims, int order, MPI_Datatype 
oldtype, MPI_Datatype *newtype) { - MPI_Datatype type_old, type_new=MPI_DATATYPE_NULL, types[3]; - int procs, tmp_rank, i, tmp_size, blklens[3], *coords; - MPI_Aint *st_offsets, orig_extent, disps[3]; + MPI_Datatype type_old, type_new=MPI_DATATYPE_NULL, inttype; + int procs, tmp_rank, i, tmp_size, blklen, *coords; + MPI_Aint *st_offsets, orig_extent, disp, ub, lb; - MPI_Type_extent(oldtype, &orig_extent); + MPI_Type_get_extent(oldtype, &lb, &orig_extent); /* calculate position in Cartesian grid as MPI would (row-major ordering) */ @@ -77,11 +77,11 @@ int ADIO_Type_create_darray(int size, int rank, int ndims, } /* add displacement and UB */ - disps[1] = st_offsets[0]; + disp = st_offsets[0]; tmp_size = 1; for (i=1; i=0; i--) { tmp_size *= array_of_gsizes[i+1]; - disps[1] += (MPI_Aint)tmp_size*st_offsets[i]; + disp += (MPI_Aint)tmp_size*st_offsets[i]; } } - disps[1] *= orig_extent; + disp *= orig_extent; - disps[2] = orig_extent; - for (i=0; idim; i--) stride *= (MPI_Aint)array_of_gsizes[i]; - MPI_Type_hvector(mysize, 1, stride, type_old, type_new); + MPI_Type_create_hvector(mysize, 1, stride, type_old, type_new); } } @@ -217,7 +215,7 @@ static int MPIOI_Type_cyclic(int *array_of_gsizes, int dim, int ndims, int nproc rank = coordinate of this process in dimension dim */ int blksize, i, blklens[3], st_index, end_index, local_size, rem, count; MPI_Aint stride, disps[3]; - MPI_Datatype type_tmp, types[3]; + MPI_Datatype type_tmp, type_tmp1, types[3]; if (darg == MPI_DISTRIBUTE_DFLT_DARG) blksize = 1; else blksize = darg; @@ -246,7 +244,7 @@ static int MPIOI_Type_cyclic(int *array_of_gsizes, int dim, int ndims, int nproc for (i=0; idim; i--) stride *= (MPI_Aint)array_of_gsizes[i]; - MPI_Type_hvector(count, blksize, stride, type_old, type_new); + MPI_Type_create_hvector(count, blksize, stride, type_old, type_new); if (rem) { /* if the last block is of size less than blksize, include @@ -259,7 +257,7 @@ static int MPIOI_Type_cyclic(int *array_of_gsizes, int dim, int ndims, 
int nproc blklens[0] = 1; blklens[1] = rem; - MPI_Type_struct(2, blklens, disps, types, &type_tmp); + MPI_Type_create_struct(2, blklens, disps, types, &type_tmp); MPI_Type_free(type_new); *type_new = type_tmp; @@ -269,14 +267,12 @@ static int MPIOI_Type_cyclic(int *array_of_gsizes, int dim, int ndims, int nproc dimension correctly. */ if ( ((order == MPI_ORDER_FORTRAN) && (dim == 0)) || ((order == MPI_ORDER_C) && (dim == ndims-1)) ) { - types[0] = MPI_LB; - disps[0] = 0; - types[1] = *type_new; - disps[1] = (MPI_Aint)rank * (MPI_Aint)blksize * orig_extent; - types[2] = MPI_UB; - disps[2] = orig_extent * (MPI_Aint)array_of_gsizes[dim]; - blklens[0] = blklens[1] = blklens[2] = 1; - MPI_Type_struct(3, blklens, disps, types, &type_tmp); + types[0] = *type_new; + disps[0] = (MPI_Aint)rank * (MPI_Aint)blksize * orig_extent; + blklens[0] = 1; + MPI_Type_create_struct(1, blklens, disps, types, &type_tmp1); + MPI_Type_create_resized (type_tmp1, 0, orig_extent * (MPI_Aint)array_of_gsizes[dim], &type_tmp); + MPI_Type_free(&type_tmp1); MPI_Type_free(type_new); *type_new = type_tmp; diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_end.c b/ompi/mca/io/romio314/romio/adio/common/ad_end.c index 00725f5f008..b534e0c25e8 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_end.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_end.c @@ -72,13 +72,13 @@ int ADIOI_End_call(MPI_Comm comm, int keyval, void *attribute_val, void ADIOI_UNREFERENCED_ARG(attribute_val); ADIOI_UNREFERENCED_ARG(extra_state); - MPI_Keyval_free(&keyval); + MPI_Comm_free_keyval (&keyval); /* The end call will be called after all possible uses of this keyval, even * if a file was opened with MPI_COMM_SELF. Note, this assumes LIFO * MPI_COMM_SELF attribute destruction behavior mandated by MPI-2.2. 
*/ if (ADIOI_cb_config_list_keyval != MPI_KEYVAL_INVALID) - MPI_Keyval_free(&ADIOI_cb_config_list_keyval); + MPI_Comm_free_keyval (&ADIOI_cb_config_list_keyval); ADIO_End(&error_code); return error_code; diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_io_coll.c b/ompi/mca/io/romio314/romio/adio/common/ad_io_coll.c index 22b2da473de..1f2573eef5b 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_io_coll.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_io_coll.c @@ -54,7 +54,7 @@ void ADIOI_IOStridedColl (ADIO_File fd, void *buf, int count, int rdwr, int interleave_count = 0, i, nprocs, myrank, nprocs_for_coll; int cb_enable; ADIO_Offset bufsize; - MPI_Aint extent; + MPI_Aint extent, lb; #ifdef DEBUG2 MPI_Aint bufextent; #endif @@ -191,7 +191,7 @@ void ADIOI_IOStridedColl (ADIO_File fd, void *buf, int count, int rdwr, return; } - MPI_Type_extent(datatype, &extent); + MPI_Type_get_extent(datatype, &lb, &extent); #ifdef DEBUG2 bufextent = extent * count; #endif @@ -702,7 +702,7 @@ void ADIOI_Calc_bounds (ADIO_File fd, int count, MPI_Datatype buftype, { MPI_Count filetype_size, buftype_size, etype_size; int sum; - MPI_Aint filetype_extent; + MPI_Aint filetype_extent, lb; ADIO_Offset total_io; int filetype_is_contig; ADIO_Offset i, remainder; @@ -726,7 +726,7 @@ void ADIOI_Calc_bounds (ADIO_File fd, int count, MPI_Datatype buftype, ADIOI_Datatype_iscontig (fd->filetype, &filetype_is_contig); MPI_Type_size_x (fd->filetype, &filetype_size); - MPI_Type_extent (fd->filetype, &filetype_extent); + MPI_Type_get_extent (fd->filetype, &lb, &filetype_extent); MPI_Type_size_x (fd->etype, &etype_size); MPI_Type_size_x (buftype, &buftype_size); @@ -884,7 +884,7 @@ void ADIOI_IOFiletype(ADIO_File fd, void *buf, int count, int user_ind_rd_buffer_size; int f_is_contig, m_is_contig; int user_ds_read, user_ds_write; - MPI_Aint f_extent; + MPI_Aint f_extent, lb; MPI_Count f_size; int f_ds_percent; /* size/extent */ @@ -894,7 +894,7 @@ void ADIOI_IOFiletype(ADIO_File fd, void 
*buf, int count, else MPE_Log_event(5008, 0, NULL); #endif - MPI_Type_extent(custom_ftype, &f_extent); + MPI_Type_get_extent(custom_ftype, &lb, &f_extent); MPI_Type_size_x(custom_ftype, &f_size); f_ds_percent = 100 * f_size / f_extent; diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_read_coll.c b/ompi/mca/io/romio314/romio/adio/common/ad_read_coll.c index 60b409d53a9..ea76f452af9 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_read_coll.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_read_coll.c @@ -307,8 +307,7 @@ void ADIOI_Calc_my_off_len(ADIO_File fd, int bufcount, MPI_Datatype ADIOI_Datatype_iscontig(fd->filetype, &filetype_is_contig); MPI_Type_size_x(fd->filetype, &filetype_size); - MPI_Type_extent(fd->filetype, &filetype_extent); - MPI_Type_lb(fd->filetype, &filetype_lb); + MPI_Type_get_extent(fd->filetype, &filetype_lb, &filetype_extent); MPI_Type_size_x(datatype, &buftype_size); etype_size = fd->etype_size; @@ -524,7 +523,7 @@ static void ADIOI_Read_and_exch(ADIO_File fd, void *buf, MPI_Datatype int req_len, flag, rank; MPI_Status status; ADIOI_Flatlist_node *flat_buf=NULL; - MPI_Aint buftype_extent; + MPI_Aint buftype_extent, lb; int coll_bufsize; *error_code = MPI_SUCCESS; /* changed below if error */ @@ -605,7 +604,7 @@ static void ADIOI_Read_and_exch(ADIO_File fd, void *buf, MPI_Datatype flat_buf = ADIOI_Flatlist; while (flat_buf->type != datatype) flat_buf = flat_buf->next; } - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); done = 0; off = st_loc; @@ -685,7 +684,7 @@ static void ADIOI_Read_and_exch(ADIO_File fd, void *buf, MPI_Datatype if (req_off < real_off + real_size) { count[i]++; ADIOI_Assert((((ADIO_Offset)(MPIR_Upint)read_buf)+req_off-real_off) == (ADIO_Offset)(MPIR_Upint)(read_buf+req_off-real_off)); - MPI_Address(read_buf+req_off-real_off, + MPI_Get_address(read_buf+req_off-real_off, &(others_req[i].mem_ptrs[j])); ADIOI_Assert((real_off + real_size - req_off) == 
(int)(real_off + real_size - req_off)); send_size[i] += (int)(ADIOI_MIN(real_off + real_size - req_off, diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_read_str.c b/ompi/mca/io/romio314/romio/adio/common/ad_read_str.c index dc2ea719adb..bad948a3cb2 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_read_str.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_read_str.c @@ -56,7 +56,7 @@ void ADIOI_GEN_ReadStrided(ADIO_File fd, void *buf, int count, ADIO_Offset n_filetypes, etype_in_filetype, st_n_filetypes, size_in_filetype; ADIO_Offset abs_off_in_filetype=0, new_frd_size, frd_size=0, st_frd_size; MPI_Count filetype_size, etype_size, buftype_size, partial_read; - MPI_Aint filetype_extent, buftype_extent; + MPI_Aint filetype_extent, buftype_extent, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset userbuf_off, req_len, sum; ADIO_Offset off, req_off, disp, end_offset=0, readbuf_off, start_off; @@ -94,9 +94,9 @@ void ADIOI_GEN_ReadStrided(ADIO_File fd, void *buf, int count, return; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(datatype, &buftype_size); - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); etype_size = fd->etype_size; ADIOI_Assert((buftype_size * count) == ((ADIO_Offset)(MPI_Count)buftype_size * (ADIO_Offset)count)); diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_read_str_naive.c b/ompi/mca/io/romio314/romio/adio/common/ad_read_str_naive.c index 6ecebda4305..cb574897e71 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_read_str_naive.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_read_str_naive.c @@ -23,7 +23,7 @@ void ADIOI_GEN_ReadStrided_naive(ADIO_File fd, void *buf, int count, ADIO_Offset abs_off_in_filetype=0; MPI_Count bufsize, filetype_size, buftype_size, size_in_filetype; ADIO_Offset etype_size; - MPI_Aint filetype_extent, buftype_extent; + MPI_Aint filetype_extent, 
buftype_extent, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset userbuf_off; ADIO_Offset off, req_off, disp, end_offset=0, start_off; @@ -43,9 +43,9 @@ void ADIOI_GEN_ReadStrided_naive(ADIO_File fd, void *buf, int count, return; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(buftype, &buftype_size); - MPI_Type_extent(buftype, &buftype_extent); + MPI_Type_get_extent(buftype, &lb, &buftype_extent); etype_size = fd->etype_size; ADIOI_Assert((buftype_size * count) == ((ADIO_Offset)buftype_size * (ADIO_Offset)count)); diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_seek.c b/ompi/mca/io/romio314/romio/adio/common/ad_seek.c index b987fe6d023..9a992ddc144 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_seek.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_seek.c @@ -28,7 +28,7 @@ ADIO_Offset ADIOI_GEN_SeekIndividual(ADIO_File fd, ADIO_Offset offset, ADIO_Offset size_in_filetype, sum; MPI_Count filetype_size, etype_size; int filetype_is_contig; - MPI_Aint filetype_extent; + MPI_Aint filetype_extent, lb; ADIOI_UNREFERENCED_ARG(whence); @@ -40,7 +40,7 @@ ADIO_Offset ADIOI_GEN_SeekIndividual(ADIO_File fd, ADIO_Offset offset, flat_file = ADIOI_Flatlist; while (flat_file->type != fd->filetype) flat_file = flat_file->next; - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(fd->filetype, &filetype_size); if ( ! 
filetype_size ) { /* Since offset relative to the filetype size, we can't diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_set_view.c b/ompi/mca/io/romio314/romio/adio/common/ad_set_view.c index 2b8ef46b2d1..86e007c5c08 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_set_view.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_set_view.c @@ -35,14 +35,14 @@ void ADIO_Set_view(ADIO_File fd, ADIO_Offset disp, MPI_Datatype etype, /* set new etypes and filetypes */ - MPI_Type_get_envelope(etype, &i, &j, &k, &combiner); + ADIOI_Type_get_envelope(etype, &i, &j, &k, &combiner); if (combiner == MPI_COMBINER_NAMED) fd->etype = etype; else { MPI_Type_contiguous(1, etype, &copy_etype); MPI_Type_commit(&copy_etype); fd->etype = copy_etype; } - MPI_Type_get_envelope(filetype, &i, &j, &k, &combiner); + ADIOI_Type_get_envelope(filetype, &i, &j, &k, &combiner); if (combiner == MPI_COMBINER_NAMED) fd->filetype = filetype; else { diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_subarray.c b/ompi/mca/io/romio314/romio/adio/common/ad_subarray.c index e7984ac3814..6ae7015de52 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_subarray.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_subarray.c @@ -16,11 +16,11 @@ int ADIO_Type_create_subarray(int ndims, MPI_Datatype oldtype, MPI_Datatype *newtype) { - MPI_Aint extent, disps[3], size; - int i, blklens[3]; - MPI_Datatype tmp1, tmp2, types[3]; + MPI_Aint extent, disp, size, lb, ub; + int i, blklen; + MPI_Datatype tmp1, tmp2, inttype; - MPI_Type_extent(oldtype, &extent); + MPI_Type_get_extent(oldtype, &lb, &extent); if (order == MPI_ORDER_FORTRAN) { /* dimension 0 changes fastest */ @@ -35,18 +35,18 @@ int ADIO_Type_create_subarray(int ndims, size = (MPI_Aint)array_of_sizes[0]*extent; for (i=2; i=0; i--) { size *= (MPI_Aint)array_of_sizes[i+1]; - MPI_Type_hvector(array_of_subsizes[i], 1, size, tmp1, &tmp2); + MPI_Type_create_hvector(array_of_subsizes[i], 1, size, tmp1, &tmp2); MPI_Type_free(&tmp1); tmp1 = tmp2; } } /* add
displacement and UB */ - disps[1] = array_of_starts[ndims-1]; + disp = array_of_starts[ndims-1]; size = 1; for (i=ndims-2; i>=0; i--) { size *= (MPI_Aint)array_of_sizes[i+1]; - disps[1] += size*(MPI_Aint)array_of_starts[i]; + disp += size*(MPI_Aint)array_of_starts[i]; } } - disps[1] *= extent; + disp *= extent; - disps[2] = extent; - for (i=0; itype != datatype) flat_buf = flat_buf->next; } - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); /* I need to check if there are any outstanding nonblocking writes to @@ -468,7 +468,7 @@ static void ADIOI_Exch_and_write(ADIO_File fd, void *buf, MPI_Datatype if (req_off < off + size) { count[i]++; ADIOI_Assert((((ADIO_Offset)(MPIR_Upint)write_buf)+req_off-off) == (ADIO_Offset)(MPIR_Upint)(write_buf+req_off-off)); - MPI_Address(write_buf+req_off-off, + MPI_Get_address(write_buf+req_off-off, &(others_req[i].mem_ptrs[j])); ADIOI_Assert((off + size - req_off) == (int)(off + size - req_off)); recv_size[i] += (int)(ADIOI_MIN(off + size - req_off, diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_write_nolock.c b/ompi/mca/io/romio314/romio/adio/common/ad_write_nolock.c index 42b5ff2d3ab..e3371a77de9 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_write_nolock.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_write_nolock.c @@ -35,7 +35,7 @@ void ADIOI_NOLOCK_WriteStrided(ADIO_File fd, const void *buf, int count, ADIO_Offset n_filetypes, etype_in_filetype, size, sum; ADIO_Offset abs_off_in_filetype=0, size_in_filetype; MPI_Count filetype_size, etype_size, buftype_size; - MPI_Aint filetype_extent, buftype_extent, indx; + MPI_Aint filetype_extent, buftype_extent, indx, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset off, disp; int flag, err_flag=0; @@ -71,9 +71,9 @@ void ADIOI_NOLOCK_WriteStrided(ADIO_File fd, const void *buf, int count, MPI_Comm_size(fd->comm, &nprocs); #endif - MPI_Type_extent(fd->filetype, &filetype_extent); + 
MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(datatype, &buftype_size); - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); etype_size = fd->etype_size; ADIOI_Assert((buftype_size * count) == ((ADIO_Offset)(unsigned)buftype_size * (ADIO_Offset)count)); diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_write_str.c b/ompi/mca/io/romio314/romio/adio/common/ad_write_str.c index 624aeb12285..9e4d7f7fc9e 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_write_str.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_write_str.c @@ -122,7 +122,7 @@ void ADIOI_GEN_WriteStrided(ADIO_File fd, const void *buf, int count, ADIO_Offset num, size, n_filetypes, etype_in_filetype, st_n_filetypes; ADIO_Offset n_etypes_in_filetype, abs_off_in_filetype=0; MPI_Count filetype_size, etype_size, buftype_size; - MPI_Aint filetype_extent, buftype_extent; + MPI_Aint filetype_extent, buftype_extent, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset userbuf_off; ADIO_Offset off, req_off, disp, end_offset=0, writebuf_off, start_off; @@ -164,9 +164,9 @@ void ADIOI_GEN_WriteStrided(ADIO_File fd, const void *buf, int count, return; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(datatype, &buftype_size); - MPI_Type_extent(datatype, &buftype_extent); + MPI_Type_get_extent(datatype, &lb, &buftype_extent); etype_size = fd->etype_size; ADIOI_Assert((buftype_size * count) == ((MPI_Count)buftype_size * (ADIO_Offset)count)); diff --git a/ompi/mca/io/romio314/romio/adio/common/ad_write_str_naive.c b/ompi/mca/io/romio314/romio/adio/common/ad_write_str_naive.c index 591c66f6a96..d72f1168753 100644 --- a/ompi/mca/io/romio314/romio/adio/common/ad_write_str_naive.c +++ b/ompi/mca/io/romio314/romio/adio/common/ad_write_str_naive.c @@ -24,7 +24,7 @@ void ADIOI_GEN_WriteStrided_naive(ADIO_File fd, const void *buf, int count, 
ADIO_Offset size, n_filetypes, etype_in_filetype; ADIO_Offset abs_off_in_filetype=0, req_len; MPI_Count filetype_size, etype_size, buftype_size; - MPI_Aint filetype_extent, buftype_extent; + MPI_Aint filetype_extent, buftype_extent, lb; int buf_count, buftype_is_contig, filetype_is_contig; ADIO_Offset userbuf_off; ADIO_Offset off, req_off, disp, end_offset=0, start_off; @@ -44,9 +44,9 @@ void ADIOI_GEN_WriteStrided_naive(ADIO_File fd, const void *buf, int count, return; } - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); MPI_Type_size_x(buftype, &buftype_size); - MPI_Type_extent(buftype, &buftype_extent); + MPI_Type_get_extent(buftype, &lb, &buftype_extent); etype_size = fd->etype_size; ADIOI_Assert((buftype_size * count) == ((ADIO_Offset)(unsigned)buftype_size * (ADIO_Offset)count)); diff --git a/ompi/mca/io/romio314/romio/adio/common/byte_offset.c b/ompi/mca/io/romio314/romio/adio/common/byte_offset.c index b7350f1faa8..66f83edf5ad 100644 --- a/ompi/mca/io/romio314/romio/adio/common/byte_offset.c +++ b/ompi/mca/io/romio314/romio/adio/common/byte_offset.c @@ -18,7 +18,7 @@ void ADIOI_Get_byte_offset(ADIO_File fd, ADIO_Offset offset, ADIO_Offset *disp) ADIO_Offset n_filetypes, etype_in_filetype, sum, abs_off_in_filetype=0, size_in_filetype; MPI_Count n_etypes_in_filetype, filetype_size, etype_size; int filetype_is_contig; - MPI_Aint filetype_extent; + MPI_Aint filetype_extent, lb; ADIOI_Datatype_iscontig(fd->filetype, &filetype_is_contig); etype_size = fd->etype_size; @@ -46,7 +46,7 @@ void ADIOI_Get_byte_offset(ADIO_File fd, ADIO_Offset offset, ADIO_Offset *disp) } /* abs. 
offset in bytes in the file */ - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); *disp = fd->disp + n_filetypes * ADIOI_AINT_CAST_TO_OFFSET filetype_extent + abs_off_in_filetype; } } diff --git a/ompi/mca/io/romio314/romio/adio/common/cb_config_list.c b/ompi/mca/io/romio314/romio/adio/common/cb_config_list.c index 468105c5ae0..e9f3116224a 100644 --- a/ompi/mca/io/romio314/romio/adio/common/cb_config_list.c +++ b/ompi/mca/io/romio314/romio/adio/common/cb_config_list.c @@ -135,12 +135,12 @@ int ADIOI_cb_gather_name_array(MPI_Comm comm, if (ADIOI_cb_config_list_keyval == MPI_KEYVAL_INVALID) { /* cleaned up by ADIOI_End_call */ - MPI_Keyval_create((MPI_Copy_function *) ADIOI_cb_copy_name_array, - (MPI_Delete_function *) ADIOI_cb_delete_name_array, + MPI_Comm_create_keyval((MPI_Comm_copy_attr_function *) ADIOI_cb_copy_name_array, + (MPI_Comm_delete_attr_function *) ADIOI_cb_delete_name_array, &ADIOI_cb_config_list_keyval, NULL); } else { - MPI_Attr_get(comm, ADIOI_cb_config_list_keyval, (void *) &array, &found); + MPI_Comm_get_attr(comm, ADIOI_cb_config_list_keyval, (void *) &array, &found); if (found) { ADIOI_Assert(array != NULL); *arrayp = array; @@ -255,8 +255,8 @@ int ADIOI_cb_gather_name_array(MPI_Comm comm, * it next time an open is performed on this same comm, and on the * dupcomm, so we can use it in I/O operations. 
*/ - MPI_Attr_put(comm, ADIOI_cb_config_list_keyval, array); - MPI_Attr_put(dupcomm, ADIOI_cb_config_list_keyval, array); + MPI_Comm_set_attr (comm, ADIOI_cb_config_list_keyval, array); + MPI_Comm_set_attr (dupcomm, ADIOI_cb_config_list_keyval, array); *arrayp = array; return 0; } diff --git a/ompi/mca/io/romio314/romio/adio/common/eof_offset.c b/ompi/mca/io/romio314/romio/adio/common/eof_offset.c index 724746317fd..c503eaf94b1 100644 --- a/ompi/mca/io/romio314/romio/adio/common/eof_offset.c +++ b/ompi/mca/io/romio314/romio/adio/common/eof_offset.c @@ -18,7 +18,7 @@ void ADIOI_Get_eof_offset(ADIO_File fd, ADIO_Offset *eof_offset) ADIO_Offset fsize, disp, sum=0, size_in_file, n_filetypes, rem, etype_size; int flag, i; ADIO_Fcntl_t *fcntl_struct; - MPI_Aint filetype_extent; + MPI_Aint filetype_extent, lb; ADIOI_Flatlist_node *flat_file; /* find the eof in bytes */ @@ -45,7 +45,7 @@ void ADIOI_Get_eof_offset(ADIO_File fd, ADIO_Offset *eof_offset) flat_file = flat_file->next; MPI_Type_size_x(fd->filetype, &filetype_size); - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); disp = fd->disp; n_filetypes = -1; diff --git a/ompi/mca/io/romio314/romio/adio/common/flatten.c b/ompi/mca/io/romio314/romio/adio/common/flatten.c index 88590d9719c..89da527e281 100644 --- a/ompi/mca/io/romio314/romio/adio/common/flatten.c +++ b/ompi/mca/io/romio314/romio/adio/common/flatten.c @@ -14,6 +14,90 @@ #define FLATTEN_DEBUG 1 #endif +struct adio_short_int { + short elem_s; + int elem_i; +}; + +struct adio_double_int { + double elem_d; + int elem_i; +}; + +struct adio_long_int { + long elem_l; + int elem_i; +}; + +struct adio_long_double_int { + long double elem_ld; + int elem_i; +}; + +int ADIOI_Type_get_envelope (MPI_Datatype datatype, int *num_integers, + int *num_addresses, int *num_datatypes, int *combiner) +{ + int rc, is_contig; + + ADIOI_Datatype_iscontig(datatype, &is_contig); + + rc = MPI_Type_get_envelope (datatype, 
num_integers, num_addresses, num_datatypes, combiner); + if (MPI_SUCCESS != rc || MPI_COMBINER_NAMED != *combiner || is_contig) { + return rc; + } + + if (MPI_SHORT_INT == datatype || MPI_DOUBLE_INT == datatype || MPI_LONG_DOUBLE_INT == datatype || + MPI_LONG_INT == datatype) { + *num_integers = 2; + *num_addresses = 2; + *num_datatypes = 2; + *combiner = MPI_COMBINER_STRUCT; + } + + return rc; +} + +int ADIOI_Type_get_contents (MPI_Datatype datatype, int max_integers, + int max_addresses, int max_datatypes, int array_of_integers[], + MPI_Aint array_of_addresses[], MPI_Datatype array_of_datatypes[]) +{ + int dontcare, combiner; + int rc; + + rc = MPI_Type_get_envelope (datatype, &dontcare, &dontcare, &dontcare, &combiner); + if (MPI_SUCCESS != rc) { + return rc; + } + + if (MPI_COMBINER_NAMED != combiner) { + return MPI_Type_get_contents (datatype, max_integers, max_addresses, max_datatypes, + array_of_integers, array_of_addresses, array_of_datatypes); + } + + array_of_integers[0] = 1; + array_of_integers[1] = 1; + array_of_addresses[0] = 0; + array_of_datatypes[1] = MPI_INT; + + if (MPI_SHORT_INT == datatype) { + array_of_datatypes[0] = MPI_SHORT; + array_of_addresses[1] = offsetof (struct adio_short_int, elem_i); + } else if (MPI_DOUBLE_INT == datatype) { + array_of_datatypes[0] = MPI_DOUBLE; + array_of_addresses[1] = offsetof (struct adio_double_int, elem_i); + } else if (MPI_LONG_DOUBLE_INT == datatype) { + array_of_datatypes[0] = MPI_LONG_DOUBLE; + array_of_addresses[1] = offsetof (struct adio_long_double_int, elem_i); + } else if (MPI_LONG_INT == datatype) { + array_of_datatypes[0] = MPI_LONG; + array_of_addresses[1] = offsetof (struct adio_long_int, elem_i); + } else { + rc = MPI_ERR_TYPE; + } + + return rc; +} + void ADIOI_Optimize_flattened(ADIOI_Flatlist_node *flat_type); /* flatten datatype and add it to Flatlist */ void ADIOI_Flatten_datatype(MPI_Datatype datatype) @@ -114,15 +198,19 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node 
*flat, avoid >2G integer arithmetic problems */ ADIO_Offset top_count; MPI_Count j, old_size, prev_index, num; - MPI_Aint old_extent;/* Assume extents are non-negative */ + MPI_Aint old_extent, lb;/* Assume extents are non-negative */ int *ints; MPI_Aint *adds; /* Make no assumptions about +/- sign on these */ MPI_Datatype *types; - MPI_Type_get_envelope(datatype, &nints, &nadds, &ntypes, &combiner); + ADIOI_Type_get_envelope(datatype, &nints, &nadds, &ntypes, &combiner); + if (combiner == MPI_COMBINER_NAMED) { + return; /* can't do anything else: calling get_contents on a builtin + type is an error */ + } ints = (int *) ADIOI_Malloc((nints+1)*sizeof(int)); adds = (MPI_Aint *) ADIOI_Malloc((nadds+1)*sizeof(MPI_Aint)); types = (MPI_Datatype *) ADIOI_Malloc((ntypes+1)*sizeof(MPI_Datatype)); - MPI_Type_get_contents(datatype, nints, nadds, ntypes, ints, adds, types); + ADIOI_Type_get_contents(datatype, nints, nadds, ntypes, ints, adds, types); #ifdef FLATTEN_DEBUG DBG_FPRINTF(stderr,"ADIOI_Flatten:: st_offset %#llX, curr_index %#llX\n",st_offset,*curr_index); @@ -153,7 +241,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, #ifdef FLATTEN_DEBUG DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_DUP\n"); #endif - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); if ((old_combiner != MPI_COMBINER_NAMED) && (!old_is_contig)) @@ -218,7 +306,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_CONTIGUOUS\n"); #endif top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -243,7 +331,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, num = *curr_index - 
prev_index; /* The noncontiguous types have to be replicated count times */ - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); for (m=1; mindices[j] = flat->indices[j-num] + ADIOI_AINT_CAST_TO_OFFSET old_extent; @@ -263,7 +351,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_VECTOR\n"); #endif top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -297,7 +385,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, /* The noncontiguous types have to be replicated blocklen times and then strided. Replicate the first one. */ - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); for (m=1; mindices[j] = flat->indices[j-num] + ADIOI_AINT_CAST_TO_OFFSET old_extent; @@ -326,7 +414,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_HVECTOR_INTEGER\n"); #endif top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -360,7 +448,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, /* The noncontiguous types have to be replicated blocklen times and then strided. Replicate the first one. 
*/ - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); for (m=1; mindices[j] = flat->indices[j-num] + ADIOI_AINT_CAST_TO_OFFSET old_extent; @@ -388,10 +476,10 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_INDEXED\n"); #endif top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); prev_index = *curr_index; if ((old_combiner != MPI_COMBINER_NAMED) && (!old_is_contig)) @@ -494,10 +582,10 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_INDEXED_BLOCK\n"); #endif top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); prev_index = *curr_index; if ((old_combiner != MPI_COMBINER_NAMED) && (!old_is_contig)) @@ -545,7 +633,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, if (is_hindexed_block) { /* this is the one place the hindexed case uses the * extent of a type */ - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); } flat->indices[j] = flat->indices[j-num] + ADIOI_AINT_CAST_TO_OFFSET old_extent; @@ -583,7 +671,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, DBG_FPRINTF(stderr,"ADIOI_Flatten:: MPI_COMBINER_HINDEXED_INTEGER\n"); #endif top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, 
&old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -620,7 +708,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, /* The noncontiguous types have to be replicated blocklens[i] times and then strided. Replicate the first one. */ - MPI_Type_extent(types[0], &old_extent); + MPI_Type_get_extent(types[0], &lb, &old_extent); for (m=1; mblocklens[j-num] > 0) { @@ -675,7 +763,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, #endif top_count = ints[0]; for (n=0; n2G integer arithmetic problems */ - if (ints[1+n] > 0 || types[n] == MPI_LB || types[n] == MPI_UB) { + if (ints[1+n] > 0) { ADIO_Offset blocklength = ints[1+n]; j = *curr_index; flat->indices[j] = st_offset + adds[n]; @@ -706,7 +794,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, num = *curr_index - prev_index; /* The current type has to be replicated blocklens[n] times */ - MPI_Type_extent(types[n], &old_extent); + MPI_Type_get_extent(types[n], &lb, &old_extent); for (m=1; mindices[j] = @@ -746,7 +834,7 @@ void ADIOI_Flatten(MPI_Datatype datatype, ADIOI_Flatlist_node *flat, /* handle the datatype */ - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -827,16 +915,20 @@ MPI_Count ADIOI_Count_contiguous_blocks(MPI_Datatype datatype, MPI_Count *curr_i MPI_Aint *adds; /* Make no assumptions about +/- sign on these */ MPI_Datatype *types; - MPI_Type_get_envelope(datatype, &nints, &nadds, &ntypes, &combiner); + ADIOI_Type_get_envelope(datatype, &nints, &nadds, &ntypes, &combiner); + if (combiner == MPI_COMBINER_NAMED) { + return 1; /* builtin types not supposed to be passed to this routine + */ + } ints = (int *) ADIOI_Malloc((nints+1)*sizeof(int)); adds = (MPI_Aint *) ADIOI_Malloc((nadds+1)*sizeof(MPI_Aint)); types = (MPI_Datatype *) ADIOI_Malloc((ntypes+1)*sizeof(MPI_Datatype)); - 
MPI_Type_get_contents(datatype, nints, nadds, ntypes, ints, adds, types); + ADIOI_Type_get_contents(datatype, nints, nadds, ntypes, ints, adds, types); switch (combiner) { #ifdef MPIIMPL_HAVE_MPI_COMBINER_DUP case MPI_COMBINER_DUP: - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); if ((old_combiner != MPI_COMBINER_NAMED) && (!old_is_contig)) @@ -895,7 +987,7 @@ MPI_Count ADIOI_Count_contiguous_blocks(MPI_Datatype datatype, MPI_Count *curr_i #endif case MPI_COMBINER_CONTIGUOUS: top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -919,7 +1011,7 @@ MPI_Count ADIOI_Count_contiguous_blocks(MPI_Datatype datatype, MPI_Count *curr_i case MPI_COMBINER_HVECTOR: case MPI_COMBINER_HVECTOR_INTEGER: top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -954,7 +1046,7 @@ MPI_Count ADIOI_Count_contiguous_blocks(MPI_Datatype datatype, MPI_Count *curr_i case MPI_COMBINER_HINDEXED: case MPI_COMBINER_HINDEXED_INTEGER: top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -990,7 +1082,7 @@ MPI_Count ADIOI_Count_contiguous_blocks(MPI_Datatype datatype, MPI_Count *curr_i #endif case MPI_COMBINER_INDEXED_BLOCK: top_count = ints[0]; - MPI_Type_get_envelope(types[0], &old_nints, &old_nadds, + ADIOI_Type_get_envelope(types[0], &old_nints, &old_nadds, &old_ntypes, &old_combiner); ADIOI_Datatype_iscontig(types[0], &old_is_contig); @@ -1024,7 +1116,7 @@ 
MPI_Count ADIOI_Count_contiguous_blocks(MPI_Datatype datatype, MPI_Count *curr_i top_count = ints[0]; count = 0; for (n=0; nfiletype, &filetype_is_contig); @@ -31,7 +31,7 @@ void ADIOI_Get_position(ADIO_File fd, ADIO_Offset *offset) while (flat_file->type != fd->filetype) flat_file = flat_file->next; MPI_Type_size_x(fd->filetype, &filetype_size); - MPI_Type_extent(fd->filetype, &filetype_extent); + MPI_Type_get_extent(fd->filetype, &lb, &filetype_extent); disp = fd->disp; byte_offset = fd->fp_ind; diff --git a/ompi/mca/io/romio314/romio/adio/include/adioi.h b/ompi/mca/io/romio314/romio/adio/include/adioi.h index 59638ac8b49..e70d015abac 100644 --- a/ompi/mca/io/romio314/romio/adio/include/adioi.h +++ b/ompi/mca/io/romio314/romio/adio/include/adioi.h @@ -324,6 +324,12 @@ typedef struct { /* prototypes for ADIO internal functions */ + +int ADIOI_Type_get_envelope (MPI_Datatype datatype, int *num_integers, + int *num_addresses, int *num_datatypes, int *combiner); +int ADIOI_Type_get_contents (MPI_Datatype datatype, int max_integers, + int max_addresses, int max_datatypes, int array_of_integers[], + MPI_Aint array_of_addresses[], MPI_Datatype array_of_datatypes[]); void ADIOI_SetFunctions(ADIO_File fd); void ADIOI_Flatten_datatype(MPI_Datatype type); void ADIOI_Flatten(MPI_Datatype type, ADIOI_Flatlist_node *flat, diff --git a/ompi/mca/io/romio314/romio/adio/include/mpipr.h b/ompi/mca/io/romio314/romio/adio/include/mpipr.h index 21f208ed01f..bfe8b5ebf64 100644 --- a/ompi/mca/io/romio314/romio/adio/include/mpipr.h +++ b/ompi/mca/io/romio314/romio/adio/include/mpipr.h @@ -12,8 +12,6 @@ #undef MPI_Abort #define MPI_Abort PMPI_Abort -#undef MPI_Address -#define MPI_Address PMPI_Address #undef MPI_Allgather #define MPI_Allgather PMPI_Allgather #undef MPI_Allgatherv @@ -30,8 +28,6 @@ #define MPI_Attr_delete PMPI_Attr_delete #undef MPI_Attr_get #define MPI_Attr_get PMPI_Attr_get -#undef MPI_Attr_put -#define MPI_Attr_put PMPI_Attr_put #undef MPI_Barrier #define MPI_Barrier 
PMPI_Barrier #undef MPI_Bcast @@ -68,10 +64,14 @@ #define MPI_Comm_compare PMPI_Comm_compare #undef MPI_Comm_create #define MPI_Comm_create PMPI_Comm_create +#undef MPI_Comm_create_keyval +#define MPI_Comm_create_keyval PMPI_Comm_create_keyval #undef MPI_Comm_dup #define MPI_Comm_dup PMPI_Comm_dup #undef MPI_Comm_free #define MPI_Comm_free PMPI_Comm_free +#undef MPI_Comm_free_keyval +#define MPI_Comm_free_keyval PMPI_Comm_free_keyval #undef MPI_Comm_group #define MPI_Comm_group PMPI_Comm_group #undef MPI_Comm_rank @@ -80,6 +80,8 @@ #define MPI_Comm_remote_group PMPI_Comm_remote_group #undef MPI_Comm_remote_size #define MPI_Comm_remote_size PMPI_Comm_remote_size +#undef MPI_Comm_set_attr +#define MPI_Comm_set_attr PMPI_Comm_set_attr #undef MPI_Comm_size #define MPI_Comm_size PMPI_Comm_size #undef MPI_Comm_split @@ -106,6 +108,8 @@ #define MPI_Gather PMPI_Gather #undef MPI_Gatherv #define MPI_Gatherv PMPI_Gatherv +#undef MPI_Get_address +#define MPI_Get_address PMPI_Get_address #undef MPI_Get_count #define MPI_Get_count PMPI_Get_count #undef MPI_Get_elements @@ -170,10 +174,6 @@ #define MPI_Isend PMPI_Isend #undef MPI_Issend #define MPI_Issend PMPI_Issend -#undef MPI_Keyval_create -#define MPI_Keyval_create PMPI_Keyval_create -#undef MPI_Keyval_free -#define MPI_Keyval_free PMPI_Keyval_free #undef MPI_Name_get #define MPI_Name_get PMPI_Name_get #undef MPI_Name_put @@ -248,14 +248,22 @@ #define MPI_Type_contiguous PMPI_Type_contiguous #undef MPI_Type_count #define MPI_Type_count PMPI_Type_count +#undef MPI_Type_create_struct +#define MPI_Type_create_struct PMPI_Type_create_struct +#undef MPI_Type_create_resized +#define MPI_Type_create_resized PMPI_Type_create_resized /* #define MPI_Type_create_darray PMPI_Type_create_darray */ #undef MPI_Type_create_indexed_block #define MPI_Type_create_indexed_block PMPI_Type_create_indexed_block +#undef MPI_Type_create_hindexed +#define MPI_Type_create_hindexed PMPI_Type_create_hindexed #undef MPI_Type_create_hindexed_block #define 
MPI_Type_create_hindexed_block PMPI_Type_create_hindexed_block +#undef MPI_Type_create_hvector +#define MPI_Type_create_hvector PMPI_Type_create_hvector /* #define MPI_Type_create_subarray PMPI_Type_create_subarray */ -#undef MPI_Type_extent -#define MPI_Type_extent PMPI_Type_extent +#undef MPI_Type_get_extent +#define MPI_Type_get_extent PMPI_Type_get_extent #undef MPI_Type_free #define MPI_Type_free PMPI_Type_free #undef MPI_Type_get_contents @@ -264,20 +272,10 @@ #define MPI_Type_get_envelope PMPI_Type_get_envelope #undef MPI_Type_get_true_extent #define MPI_Type_get_true_extent PMPI_Type_get_true_extent -#undef MPI_Type_hindexed -#define MPI_Type_hindexed PMPI_Type_hindexed -#undef MPI_Type_hvector -#define MPI_Type_hvector PMPI_Type_hvector #undef MPI_Type_indexed #define MPI_Type_indexed PMPI_Type_indexed -#undef MPI_Type_lb -#define MPI_Type_lb PMPI_Type_lb #undef MPI_Type_size #define MPI_Type_size PMPI_Type_size -#undef MPI_Type_struct -#define MPI_Type_struct PMPI_Type_struct -#undef MPI_Type_ub -#define MPI_Type_ub PMPI_Type_ub #undef MPI_Type_vector #define MPI_Type_vector PMPI_Type_vector #undef MPI_Unpack diff --git a/ompi/mca/io/romio314/romio/mpi-io/get_extent.c b/ompi/mca/io/romio314/romio/mpi-io/get_extent.c index 103f8815c62..8b7ff15004d 100644 --- a/ompi/mca/io/romio314/romio/mpi-io/get_extent.c +++ b/ompi/mca/io/romio314/romio/mpi-io/get_extent.c @@ -42,6 +42,7 @@ int MPI_File_get_type_extent(MPI_File fh, MPI_Datatype datatype, MPI_Aint *exten int error_code; ADIO_File adio_fh; static char myname[] = "MPI_FILE_GET_TYPE_EXTENT"; + MPI_Aint lb; adio_fh = MPIO_File_resolve(fh); @@ -52,7 +53,7 @@ int MPI_File_get_type_extent(MPI_File fh, MPI_Datatype datatype, MPI_Aint *exten /* FIXME: handle other file data representations */ - error_code = MPI_Type_extent(datatype, extent); + error_code = MPI_Type_get_extent(datatype, &lb, extent); fn_exit: return error_code; diff --git a/ompi/mca/io/romio314/romio/mpi-io/mpioimpl.h 
b/ompi/mca/io/romio314/romio/mpi-io/mpioimpl.h index a73561acb2b..ff1372394b9 100644 --- a/ompi/mca/io/romio314/romio/mpi-io/mpioimpl.h +++ b/ompi/mca/io/romio314/romio/mpi-io/mpioimpl.h @@ -58,7 +58,7 @@ struct MPIR_Info { #define MPIR_INFO_COOKIE 5835657 -MPI_Delete_function ADIOI_End_call; +MPI_Comm_delete_attr_function ADIOI_End_call; /* common initialization routine */ void MPIR_MPIOInit(int * error_code); diff --git a/ompi/mca/io/romio314/romio/mpi-io/mpir-mpioinit.c b/ompi/mca/io/romio314/romio/mpi-io/mpir-mpioinit.c index 914b8d53890..50d9d4e08e5 100644 --- a/ompi/mca/io/romio314/romio/mpi-io/mpir-mpioinit.c +++ b/ompi/mca/io/romio314/romio/mpi-io/mpir-mpioinit.c @@ -36,15 +36,15 @@ void MPIR_MPIOInit(int * error_code) { } /* --END ERROR HANDLING-- */ - MPI_Keyval_create(MPI_NULL_COPY_FN, ADIOI_End_call, &ADIO_Init_keyval, - (void *) 0); + MPI_Comm_create_keyval (MPI_COMM_NULL_COPY_FN, ADIOI_End_call, &ADIO_Init_keyval, + (void *) 0); /* put a dummy attribute on MPI_COMM_SELF, because we want the delete function to be called when MPI_COMM_SELF is freed. Clarified in MPI-2 section 4.8, the standard mandates that attributes on MPI_COMM_SELF get cleaned up early in MPI_Finalize */ - MPI_Attr_put(MPI_COMM_SELF, ADIO_Init_keyval, (void *) 0); + MPI_Comm_set_attr (MPI_COMM_SELF, ADIO_Init_keyval, (void *) 0); /* initialize ADIO */ ADIO_Init( (int *)0, (char ***)0, error_code); diff --git a/ompi/mca/io/romio314/src/io_romio314.h b/ompi/mca/io/romio314/src/io_romio314.h index 86fd9b062a7..0ea00dd486a 100644 --- a/ompi/mca/io/romio314/src/io_romio314.h +++ b/ompi/mca/io/romio314/src/io_romio314.h @@ -12,6 +12,7 @@ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -66,11 +67,11 @@ typedef struct mca_io_romio314_data_t mca_io_romio314_data_t; int mca_io_romio314_file_open (struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, ompi_file_t *fh); int mca_io_romio314_file_close (struct ompi_file_t *fh); int mca_io_romio314_file_delete (const char *filename, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_io_romio314_file_set_size (struct ompi_file_t *fh, MPI_Offset size); int mca_io_romio314_file_preallocate (struct ompi_file_t *fh, @@ -80,9 +81,9 @@ int mca_io_romio314_file_get_size (struct ompi_file_t *fh, int mca_io_romio314_file_get_amode (struct ompi_file_t *fh, int *amode); int mca_io_romio314_file_set_info (struct ompi_file_t *fh, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_io_romio314_file_get_info (struct ompi_file_t *fh, - struct ompi_info_t ** info_used); + struct opal_info_t ** info_used); /* Section 9.3 */ int mca_io_romio314_file_set_view (struct ompi_file_t *fh, @@ -90,7 +91,7 @@ int mca_io_romio314_file_set_view (struct ompi_file_t *fh, struct ompi_datatype_t *etype, struct ompi_datatype_t *filetype, const char *datarep, - struct ompi_info_t *info); + struct opal_info_t *info); int mca_io_romio314_file_get_view (struct ompi_file_t *fh, MPI_Offset * disp, struct ompi_datatype_t ** etype, @@ -128,12 +129,24 @@ int mca_io_romio314_file_iread_at (struct ompi_file_t *fh, int count, struct ompi_datatype_t *datatype, ompi_request_t **request); +int mca_io_romio314_file_iread_at_all (struct ompi_file_t *fh, + MPI_Offset offset, + void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request); int mca_io_romio314_file_iwrite_at (struct ompi_file_t *fh, MPI_Offset offset, const void *buf, int count, struct ompi_datatype_t *datatype, ompi_request_t **request); +int mca_io_romio314_file_iwrite_at_all (struct ompi_file_t *fh, + MPI_Offset 
offset, + const void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request); /* Section 9.4.3 */ int mca_io_romio314_file_read (struct ompi_file_t *fh, @@ -161,11 +174,21 @@ int mca_io_romio314_file_iread (struct ompi_file_t *fh, int count, struct ompi_datatype_t *datatype, ompi_request_t **request); +int mca_io_romio314_file_iread_all (struct ompi_file_t *fh, + void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request); int mca_io_romio314_file_iwrite (struct ompi_file_t *fh, const void *buf, int count, struct ompi_datatype_t *datatype, ompi_request_t **request); +int mca_io_romio314_file_iwrite_all (struct ompi_file_t *fh, + const void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request); int mca_io_romio314_file_seek (struct ompi_file_t *fh, MPI_Offset offset, int whence); diff --git a/ompi/mca/io/romio314/src/io_romio314_component.c b/ompi/mca/io/romio314/src/io_romio314_component.c index 60954575760..6d53940b2cd 100644 --- a/ompi/mca/io/romio314/src/io_romio314_component.c +++ b/ompi/mca/io/romio314/src/io_romio314_component.c @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -48,10 +49,10 @@ static const struct mca_io_base_module_2_0_0_t * static int file_unquery(struct ompi_file_t *file, struct mca_io_base_file_t *private_data); -static int delete_query(const char *filename, struct ompi_info_t *info, +static int delete_query(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t **private_data, bool *usable, int *priorty); -static int delete_select(const char *filename, struct ompi_info_t *info, +static int delete_select(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t *private_data); static int register_datarep(const char *, @@ -222,7 +223,7 @@ static int file_unquery(struct ompi_file_t *file, } -static int delete_query(const char *filename, struct ompi_info_t *info, +static int delete_query(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t **private_data, bool *usable, int *priority) { @@ -234,15 +235,25 @@ static int delete_query(const char *filename, struct ompi_info_t *info, } -static int delete_select(const char *filename, struct ompi_info_t *info, +static int delete_select(const char *filename, struct opal_info_t *info, struct mca_io_base_delete_t *private_data) { int ret; +// An opal_info_t isn't a full ompi_info_t. so if we're using an MPI call +// below with an MPI_Info, we need to create an equivalent MPI_Info. This +// isn't ideal but it only happens a few places. 
+ ompi_info_t *ompi_info; + ompi_info = OBJ_NEW(ompi_info_t); + if (!ompi_info) { return(MPI_ERR_NO_MEM); } + opal_info_t *opal_info = &(ompi_info->super); + opal_info_dup (info, &opal_info); + OPAL_THREAD_LOCK (&mca_io_romio314_mutex); - ret = ROMIO_PREFIX(MPI_File_delete)(filename, info); + ret = ROMIO_PREFIX(MPI_File_delete)(filename, ompi_info); OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + ompi_info_free(&ompi_info); return ret; } diff --git a/ompi/mca/io/romio314/src/io_romio314_file_open.c b/ompi/mca/io/romio314/src/io_romio314_file_open.c index b08da7ff0c5..0fdd3841668 100644 --- a/ompi/mca/io/romio314/src/io_romio314_file_open.c +++ b/ompi/mca/io/romio314/src/io_romio314_file_open.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -31,18 +32,28 @@ int mca_io_romio314_file_open (ompi_communicator_t *comm, const char *filename, int amode, - ompi_info_t *info, + opal_info_t *info, ompi_file_t *fh) { int ret; mca_io_romio314_data_t *data; +// An opal_info_t isn't a full ompi_info_t. so if we're using an MPI call +// below with an MPI_Info, we need to create an equivalent MPI_Info. This +// isn't ideal but it only happens a few places. 
+ ompi_info_t *ompi_info; + ompi_info = OBJ_NEW(ompi_info_t); + if (!ompi_info) { return(MPI_ERR_NO_MEM); } + opal_info_t *opal_info = &(ompi_info->super); + opal_info_dup (info, &opal_info); + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; // OPAL_THREAD_LOCK (&mca_io_romio314_mutex); - ret = ROMIO_PREFIX(MPI_File_open)(comm, filename, amode, info, + ret = ROMIO_PREFIX(MPI_File_open)(comm, filename, amode, ompi_info, &data->romio_fh); // OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + ompi_info_free(&ompi_info); return ret; } @@ -149,32 +160,51 @@ mca_io_romio314_file_get_amode (ompi_file_t *fh, int mca_io_romio314_file_set_info (ompi_file_t *fh, - ompi_info_t *info) + opal_info_t *info) { int ret; mca_io_romio314_data_t *data; +// An opal_info_t isn't a full ompi_info_t. so if we're using an MPI call +// below with an MPI_Info, we need to create an equivalent MPI_Info. This +// isn't ideal but it only happens a few places. + ompi_info_t *ompi_info; + ompi_info = OBJ_NEW(ompi_info_t); + if (!ompi_info) { return(MPI_ERR_NO_MEM); } + opal_info_t *opal_info = &(ompi_info->super); + opal_info_dup (info, &opal_info); + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; OPAL_THREAD_LOCK (&mca_io_romio314_mutex); - ret = ROMIO_PREFIX(MPI_File_set_info) (data->romio_fh, info); + ret = ROMIO_PREFIX(MPI_File_set_info) (data->romio_fh, ompi_info); OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + ompi_info_free(&ompi_info); return ret; } int mca_io_romio314_file_get_info (ompi_file_t *fh, - ompi_info_t ** info_used) + opal_info_t ** info_used) { int ret; mca_io_romio314_data_t *data; +// An opal_info_t isn't a full ompi_info_t. so if we're using an MPI call +// below with an MPI_Info, we need to create an equivalent MPI_Info. This +// isn't ideal but it only happens a few places. 
+ ompi_info_t *ompi_info; + ompi_info = OBJ_NEW(ompi_info_t); + if (!ompi_info) { return(MPI_ERR_NO_MEM); } + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; OPAL_THREAD_LOCK (&mca_io_romio314_mutex); - ret = ROMIO_PREFIX(MPI_File_get_info) (data->romio_fh, info_used); + ret = ROMIO_PREFIX(MPI_File_get_info) (data->romio_fh, &ompi_info); OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + opal_info_dup (&(ompi_info->super), info_used); + ompi_info_free(&ompi_info); return ret; } @@ -185,18 +215,28 @@ mca_io_romio314_file_set_view (ompi_file_t *fh, struct ompi_datatype_t *etype, struct ompi_datatype_t *filetype, const char *datarep, - ompi_info_t *info) + opal_info_t *info) { int ret; mca_io_romio314_data_t *data; +// An opal_info_t isn't a full ompi_info_t. so if we're using an MPI call +// below with an MPI_Info, we need to create an equivalent MPI_Info. This +// isn't ideal but it only happens a few places. + ompi_info_t *ompi_info; + ompi_info = OBJ_NEW(ompi_info_t); + if (!ompi_info) { return(MPI_ERR_NO_MEM); } + opal_info_t *opal_info = &(ompi_info->super); + opal_info_dup (info, &opal_info); + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; OPAL_THREAD_LOCK (&mca_io_romio314_mutex); ret = ROMIO_PREFIX(MPI_File_set_view) (data->romio_fh, disp, etype, filetype, - datarep, info); + datarep, ompi_info); OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + ompi_info_free(&ompi_info); return ret; } diff --git a/ompi/mca/io/romio314/src/io_romio314_file_read.c b/ompi/mca/io/romio314/src/io_romio314_file_read.c index df899a50303..fae1421c27c 100644 --- a/ompi/mca/io/romio314/src/io_romio314_file_read.c +++ b/ompi/mca/io/romio314/src/io_romio314_file_read.c @@ -9,6 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -87,6 +88,33 @@ mca_io_romio314_file_iread_at (ompi_file_t *fh, return ret; } +int +mca_io_romio314_file_iread_at_all (ompi_file_t *fh, + MPI_Offset offset, + void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request) +{ + int ret; + mca_io_romio314_data_t *data; + + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; + OPAL_THREAD_LOCK (&mca_io_romio314_mutex); + // ---------------------------------------------------- + // NOTE: If you upgrade ROMIO, replace this with the actual ROMIO call. + // ---------------------------------------------------- + // No support for non-blocking collective I/O operations. + // Fake it with individual non-blocking I/O operations. + // Similar to OMPIO + ret = + ROMIO_PREFIX(MPI_File_iread_at) (data->romio_fh, offset, buf, count, + datatype, request); + OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + + return ret; +} + int mca_io_romio314_file_read (ompi_file_t *fh, @@ -150,6 +178,31 @@ mca_io_romio314_file_iread (ompi_file_t *fh, return ret; } +int +mca_io_romio314_file_iread_all (ompi_file_t *fh, + void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request) +{ + int ret; + mca_io_romio314_data_t *data; + + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; + OPAL_THREAD_LOCK (&mca_io_romio314_mutex); + // ---------------------------------------------------- + // NOTE: If you upgrade ROMIO, replace this with the actual ROMIO call. + // ---------------------------------------------------- + // No support for non-blocking collective I/O operations. + // Fake it with individual non-blocking I/O operations. 
+ // Similar to OMPIO + ret = + ROMIO_PREFIX(MPI_File_iread) (data->romio_fh, buf, count, datatype, + request); + OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + + return ret; +} int mca_io_romio314_file_read_shared (ompi_file_t *fh, diff --git a/ompi/mca/io/romio314/src/io_romio314_file_write.c b/ompi/mca/io/romio314/src/io_romio314_file_write.c index 628cfd2e592..f8cb72e2650 100644 --- a/ompi/mca/io/romio314/src/io_romio314_file_write.c +++ b/ompi/mca/io/romio314/src/io_romio314_file_write.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -92,6 +93,32 @@ mca_io_romio314_file_iwrite_at (ompi_file_t *fh, } +int +mca_io_romio314_file_iwrite_at_all (ompi_file_t *fh, + MPI_Offset offset, + const void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request) +{ + int ret; + mca_io_romio314_data_t *data; + + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; + OPAL_THREAD_LOCK (&mca_io_romio314_mutex); + // ---------------------------------------------------- + // NOTE: If you upgrade ROMIO, replace this with the actual ROMIO call. + // ---------------------------------------------------- + // No support for non-blocking collective I/O operations. + // Fake it with individual non-blocking I/O operations. 
+ // Similar to OMPIO + ret = + ROMIO_PREFIX(MPI_File_iwrite_at) (data->romio_fh, offset, buf, count, + datatype, request); + OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + + return ret; +} @@ -155,6 +182,32 @@ mca_io_romio314_file_iwrite (ompi_file_t *fh, return ret; } +int +mca_io_romio314_file_iwrite_all (ompi_file_t *fh, + const void *buf, + int count, + struct ompi_datatype_t *datatype, + ompi_request_t **request) +{ + int ret; + mca_io_romio314_data_t *data; + + data = (mca_io_romio314_data_t *) fh->f_io_selected_data; + OPAL_THREAD_LOCK (&mca_io_romio314_mutex); + // ---------------------------------------------------- + // NOTE: If you upgrade ROMIO, replace this with the actual ROMIO call. + // ---------------------------------------------------- + // No support for non-blocking collective I/O operations. + // Fake it with individual non-blocking I/O operations. + // Similar to OMPIO + ret = + ROMIO_PREFIX(MPI_File_iwrite) (data->romio_fh, buf, count, datatype, + request); + OPAL_THREAD_UNLOCK (&mca_io_romio314_mutex); + + return ret; +} + int mca_io_romio314_file_write_shared (ompi_file_t *fh, diff --git a/ompi/mca/io/romio314/src/io_romio314_module.c b/ompi/mca/io/romio314/src/io_romio314_module.c index 3a40046cbdf..800c3bd7948 100644 --- a/ompi/mca/io/romio314/src/io_romio314_module.c +++ b/ompi/mca/io/romio314/src/io_romio314_module.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -47,8 +48,6 @@ mca_io_base_module_2_0_0_t mca_io_romio314_module = { mca_io_romio314_file_preallocate, mca_io_romio314_file_get_size, mca_io_romio314_file_get_amode, - mca_io_romio314_file_set_info, - mca_io_romio314_file_get_info, mca_io_romio314_file_set_view, mca_io_romio314_file_get_view, @@ -59,8 +58,8 @@ mca_io_base_module_2_0_0_t mca_io_romio314_module = { mca_io_romio314_file_write_at_all, mca_io_romio314_file_iread_at, mca_io_romio314_file_iwrite_at, - NULL, /* iread_at_all */ - NULL, /* iwrite_at_all */ + mca_io_romio314_file_iread_at_all, + mca_io_romio314_file_iwrite_at_all, /* non-indexed IO operations */ mca_io_romio314_file_read, @@ -69,8 +68,8 @@ mca_io_base_module_2_0_0_t mca_io_romio314_module = { mca_io_romio314_file_write_all, mca_io_romio314_file_iread, mca_io_romio314_file_iwrite, - NULL, /* iread_all */ - NULL, /* iwrite_all */ + mca_io_romio314_file_iread_all, + mca_io_romio314_file_iwrite_all, mca_io_romio314_file_seek, mca_io_romio314_file_get_position, @@ -138,6 +137,14 @@ void ADIOI_Datatype_iscontig(MPI_Datatype datatype, int *flag) * In addition, if the data is contiguous but true_lb differes * from zero, ROMIO will ignore the displacement. Thus, lie! */ + + size_t size; + opal_datatype_type_size (&datatype->super, &size); + if ( 0 == size ) { + *flag = 1; + return; + } + *flag = ompi_datatype_is_contiguous_memory_layout(datatype, 2); if (*flag) { MPI_Aint true_extent, true_lb; diff --git a/ompi/mca/mtl/base/mtl_base_frame.c b/ompi/mca/mtl/base/mtl_base_frame.c index ea5784304a6..757ad93ce7a 100644 --- a/ompi/mca/mtl/base/mtl_base_frame.c +++ b/ompi/mca/mtl/base/mtl_base_frame.c @@ -10,7 +10,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2018 Los Alamos National Security, LLC. 
All rights * reserved. * $COPYRIGHT$ * @@ -108,6 +108,7 @@ ompi_mtl_base_close(void) { /* NTH: Should we be freeing the mtl module here? */ ompi_mtl = NULL; + ompi_mtl_base_selected_component = NULL; /* Close all remaining available modules (may be one if this is a OMPI RTE program, or [possibly] multiple if this is ompi_info) */ diff --git a/ompi/mca/mtl/mtl.h b/ompi/mca/mtl/mtl.h index f703250b4e7..24b2153064d 100644 --- a/ompi/mca/mtl/mtl.h +++ b/ompi/mca/mtl/mtl.h @@ -5,6 +5,7 @@ * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -61,6 +62,9 @@ typedef struct mca_mtl_request_t mca_mtl_request_t; * MTL module flags */ #define MCA_MTL_BASE_FLAG_REQUIRE_WORLD 0x00000001 +#if OPAL_CUDA_SUPPORT +#define MCA_MTL_BASE_FLAG_CUDA_INIT_DISABLE 0x00000002 +#endif /** * Initialization routine for MTL component diff --git a/ompi/mca/mtl/mxm/Makefile.am b/ompi/mca/mtl/mxm/Makefile.am deleted file mode 100644 index fae061a0275..00000000000 --- a/ompi/mca/mtl/mxm/Makefile.am +++ /dev/null @@ -1,49 +0,0 @@ -# -# Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -AM_CPPFLAGS = $(mtl_mxm_CPPFLAGS) - -dist_ompidata_DATA = help-mtl-mxm.txt - -mtl_mxm_sources = \ - mtl_mxm.c \ - mtl_mxm.h \ - mtl_mxm_cancel.c \ - mtl_mxm_component.c \ - mtl_mxm_endpoint.c \ - mtl_mxm_endpoint.h \ - mtl_mxm_probe.c \ - mtl_mxm_recv.c \ - mtl_mxm_request.h \ - mtl_mxm_send.c \ - mtl_mxm_debug.h \ - mtl_mxm_types.h - -# Make the output library in this directory, and name it either -# mca__.la (for DSO builds) or libmca__.la -# (for static builds). 
- -if MCA_BUILD_ompi_mtl_mxm_DSO -component_noinst = -component_install = mca_mtl_mxm.la -else -component_noinst = libmca_mtl_mxm.la -component_install = -endif - -mcacomponentdir = $(ompilibdir) -mcacomponent_LTLIBRARIES = $(component_install) -mca_mtl_mxm_la_SOURCES = $(mtl_mxm_sources) -mca_mtl_mxm_la_LIBADD = $(mtl_mxm_LIBS) -mca_mtl_mxm_la_LDFLAGS = -module -avoid-version $(mtl_mxm_LDFLAGS) - -noinst_LTLIBRARIES = $(component_noinst) -libmca_mtl_mxm_la_SOURCES = $(mtl_mxm_sources) -libmca_mtl_mxm_la_LIBADD = $(mtl_mxm_LIBS) -libmca_mtl_mxm_la_LDFLAGS = -module -avoid-version $(mtl_mxm_LDFLAGS) diff --git a/ompi/mca/mtl/mxm/configure.m4 b/ompi/mca/mtl/mxm/configure.m4 deleted file mode 100644 index 1d3914ff1d6..00000000000 --- a/ompi/mca/mtl/mxm/configure.m4 +++ /dev/null @@ -1,39 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. -# Copyright (c) 2013 Sandia National Laboratories. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# MCA_ompi_mtl_mxm_POST_CONFIG(will_build) -# ---------------------------------------- -# Only require the tag if we're actually going to be built -AC_DEFUN([MCA_ompi_mtl_mxm_POST_CONFIG], [ - AS_IF([test "$1" = "1"], [OMPI_REQUIRE_ENDPOINT_TAG([MTL])]) -])dnl - -# MCA_mtl_mxm_CONFIG([action-if-can-compile], -# [action-if-cant-compile]) -# ------------------------------------------------ -AC_DEFUN([MCA_ompi_mtl_mxm_CONFIG],[ - AC_CONFIG_FILES([ompi/mca/mtl/mxm/Makefile]) - - OMPI_CHECK_MXM([mtl_mxm], - [mtl_mxm_happy="yes"], - [mtl_mxm_happy="no"]) - - AS_IF([test "$mtl_mxm_happy" = "yes"], - [$1], - [$2]) - - # substitute in the things needed to build mxm - AC_SUBST([mtl_mxm_CFLAGS]) - AC_SUBST([mtl_mxm_CPPFLAGS]) - AC_SUBST([mtl_mxm_LDFLAGS]) - AC_SUBST([mtl_mxm_LIBS]) -])dnl - diff --git a/ompi/mca/mtl/mxm/help-mtl-mxm.txt b/ompi/mca/mtl/mxm/help-mtl-mxm.txt deleted file mode 100644 index 32a06782c62..00000000000 --- 
a/ompi/mca/mtl/mxm/help-mtl-mxm.txt +++ /dev/null @@ -1,67 +0,0 @@ -# -# Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - - -[no uuid present] -Error obtaining unique transport key from ORTE (orte_precondition_transports %s -the environment). - - Local host: %s - -[unable to create endpoint] -MXM was unable to create an endpoint. Please make sure that the network link is -active on the node and the hardware is functioning. - - Error: %s - -[unable to extract endpoint ptl address] -MXM was unable to read settings for endpoint - - PTL ID: %d - Error: %s - -[unable to extract endpoint address] -MXM was unable to read settings for endpoint - - Error: %s - -[mxm mq create] -Failed to create MQ for endpoint - - Error: %s - -[errors during mxm_progress] - -Error %s occurred in attempting to make network progress (mxm_progress). - - -[mxm init] -Initialization of MXM library failed. - - Error: %s - -[error posting receive] -Unable to post application receive buffer - - Error: %s - Buffer: %p - Length: %d - -[error posting message receive] -Unable to post application receive buffer - - Error: %s - Buffer: %p - Length: %d - -[error posting send] -Unable to post application send buffer - - Error: %s - diff --git a/ompi/mca/mtl/mxm/mtl_mxm.c b/ompi/mca/mtl/mxm/mtl_mxm.c deleted file mode 100644 index 4482ad4fc25..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm.c +++ /dev/null @@ -1,679 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (C) 2001-2011 Mellanox Technologies Ltd. ALL RIGHTS RESERVED. - * Copyright (c) 2013-2015 Intel, Inc. All rights reserved - * Copyright (c) 2014-2016 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" - -#include "ompi/mca/mtl/mtl.h" -#include "ompi/mca/mtl/base/mtl_base_datatype.h" -#include "ompi/proc/proc.h" -#include "ompi/communicator/communicator.h" -#include "opal/memoryhooks/memory.h" -#include "opal/util/show_help.h" -#include "opal/mca/pmix/pmix.h" - -#include "mtl_mxm.h" -#include "mtl_mxm_types.h" -#include "mtl_mxm_endpoint.h" -#include "mtl_mxm_request.h" - -mca_mtl_mxm_module_t ompi_mtl_mxm = { - { - 0, /* max context id */ - 0, /* max tag value */ - 0, /* request reserve space */ - 0, /* flags */ - ompi_mtl_mxm_add_procs, - ompi_mtl_mxm_del_procs, - ompi_mtl_mxm_finalize, - ompi_mtl_mxm_send, - ompi_mtl_mxm_isend, - ompi_mtl_mxm_irecv, - ompi_mtl_mxm_iprobe, - ompi_mtl_mxm_imrecv, - ompi_mtl_mxm_improbe, - ompi_mtl_mxm_cancel, - ompi_mtl_mxm_add_comm, - ompi_mtl_mxm_del_comm - }, - 0, - 0, - NULL, - NULL -}; - -#if MXM_API < MXM_VERSION(2,0) -static uint32_t ompi_mtl_mxm_get_job_id(void) -{ - uint8_t unique_job_key[16]; - uint32_t job_key; - unsigned long long *uu; - char *generated_key; - - uu = (unsigned long long *) unique_job_key; - - generated_key = getenv(OPAL_MCA_PREFIX"orte_precondition_transports"); - memset(uu, 0, sizeof(unique_job_key)); - - if (!generated_key || (strlen(generated_key) != 33) || sscanf(generated_key, "%016llx-%016llx", &uu[0], &uu[1]) != 2) { - opal_show_help("help-mtl-mxm.txt", "no uuid present", true, - generated_key ? 
"could not be parsed from" : - "not present in", ompi_process_info.nodename); - return 0; - } - - /* - * decode OPAL_MCA_PREFIX"orte_precondition_transports" that looks as - * 000003ca00000000-0000000100000000 - * jobfam-stepid - * to get jobid coded with ORTE_CONSTRUCT_LOCAL_JOBID() - */ - #define GET_LOCAL_JOBID(local, job) \ - ( ((local) & 0xffff0000) | ((job) & 0x0000ffff) ) - job_key = GET_LOCAL_JOBID((uu[0]>>(8 * sizeof(int))) << 16, uu[1]>>(8 * sizeof(int))); - - return job_key; -} -#endif - -int ompi_mtl_mxm_progress(void); -#if MXM_API >= MXM_VERSION(2,0) -static void ompi_mtl_mxm_mem_release_cb(void *buf, size_t length, - void *cbdata, bool from_alloc); -#endif - -#if MXM_API < MXM_VERSION(2,0) -static int ompi_mtl_mxm_get_ep_address(ompi_mtl_mxm_ep_conn_info_t *ep_info, mxm_ptl_id_t ptlid) -{ - size_t addrlen; - mxm_error_t err; - - addrlen = sizeof(ep_info->ptl_addr[ptlid]); - err = mxm_ep_address(ompi_mtl_mxm.ep, ptlid, - (struct sockaddr *) &ep_info->ptl_addr[ptlid], &addrlen); - if (MXM_OK != err) { - opal_show_help("help-mtl-mxm.txt", "unable to extract endpoint ptl address", - true, (int)ptlid, mxm_error_string(err)); - return OMPI_ERROR; - } - - return OMPI_SUCCESS; -} -#else -static int ompi_mtl_mxm_get_ep_address(void **address_p, size_t *address_len_p) -{ - mxm_error_t err; - - *address_len_p = 0; - err = mxm_ep_get_address(ompi_mtl_mxm.ep, NULL, address_len_p); - if (err != MXM_ERR_BUFFER_TOO_SMALL) { - MXM_ERROR("Failed to get ep address length"); - return OMPI_ERROR; - } - - *address_p = malloc(*address_len_p); - if (*address_p == NULL) { - MXM_ERROR("Failed to allocate ep address buffer"); - return OMPI_ERR_OUT_OF_RESOURCE; - } - - err = mxm_ep_get_address(ompi_mtl_mxm.ep, *address_p, address_len_p); - if (MXM_OK != err) { - opal_show_help("help-mtl-mxm.txt", "unable to extract endpoint address", - true, mxm_error_string(err)); - return OMPI_ERROR; - } - - return OMPI_SUCCESS; -} -#endif - -#define max(a,b) ((a)>(b)?(a):(b)) - -static 
mxm_error_t -ompi_mtl_mxm_create_ep(mxm_h ctx, mxm_ep_h *ep, unsigned ptl_bitmap, int lr, - uint32_t jobid, uint64_t mxlr, int nlps) -{ - mxm_error_t err; - -#if MXM_API < MXM_VERSION(2,0) - ompi_mtl_mxm.mxm_ep_opts->job_id = jobid; - ompi_mtl_mxm.mxm_ep_opts->local_rank = lr; - ompi_mtl_mxm.mxm_ep_opts->num_local_procs = nlps; - err = mxm_ep_create(ctx, ompi_mtl_mxm.mxm_ep_opts, ep); -#else - err = mxm_ep_create(ctx, ompi_mtl_mxm.mxm_ep_opts, ep); -#endif - return err; -} - -/* - * send information using modex (in some case there is limitation on data size for example ess/pmi) - * set size of data sent for once - * - */ -static int ompi_mtl_mxm_send_ep_address(void *address, size_t address_len) -{ - char *modex_component_name = mca_base_component_to_string(&mca_mtl_mxm_component.super.mtl_version); - char *modex_name = malloc(strlen(modex_component_name) + 5); - const size_t modex_max_size = 0x60; - unsigned char *modex_buf_ptr; - size_t modex_buf_size; - size_t modex_cur_size; - int modex_name_id = 0; - int rc; - - /* Send address length */ - sprintf(modex_name, "%s-len", modex_component_name); - OPAL_MODEX_SEND_STRING(rc, OPAL_PMIX_GLOBAL, - modex_name, &address_len, sizeof(address_len)); - if (OMPI_SUCCESS != rc) { - MXM_ERROR("failed to send address length"); - goto bail; - } - - /* Send address, in parts. - * modex name looks as mtl.mxm.1.5-18 where mtl.mxm.1.5 is the component and 18 is part index. - */ - modex_buf_size = address_len; - modex_buf_ptr = address; - while (modex_buf_size) { - sprintf(modex_name, "%s-%d", modex_component_name, modex_name_id); - modex_cur_size = (modex_buf_size < modex_max_size) ? 
modex_buf_size : modex_max_size; - OPAL_MODEX_SEND_STRING(rc, OPAL_PMIX_GLOBAL, - modex_name, modex_buf_ptr, modex_cur_size); - if (OMPI_SUCCESS != rc) { - MXM_ERROR("Open MPI couldn't distribute EP connection details"); - goto bail; - } - - modex_name_id++; - modex_buf_ptr += modex_cur_size; - modex_buf_size -= modex_cur_size; - } - - rc = OMPI_SUCCESS; - -bail: - free(modex_component_name); - free(modex_name); - return rc; -} - -/* - * recieve information using modex - */ -static int ompi_mtl_mxm_recv_ep_address(ompi_proc_t *source_proc, void **address_p, - size_t *address_len_p) -{ - char *modex_component_name = mca_base_component_to_string(&mca_mtl_mxm_component.super.mtl_version); - char *modex_name = malloc(strlen(modex_component_name) + 5); - uint8_t *modex_buf_ptr; - int32_t modex_cur_size; - size_t modex_buf_size; - size_t *address_len_buf_ptr; - int modex_name_id = 0; - int rc; - - *address_p = NULL; - *address_len_p = 0; - - /* Receive address length */ - sprintf(modex_name, "%s-len", modex_component_name); - OPAL_MODEX_RECV_STRING(rc, modex_name, &source_proc->super.proc_name, - (uint8_t **)&address_len_buf_ptr, - &modex_cur_size); - if (OMPI_SUCCESS != rc) { - MXM_ERROR("Failed to receive ep address length"); - goto bail; - } - - /* Allocate buffer to hold the address */ - *address_len_p = *address_len_buf_ptr; - *address_p = malloc(*address_len_p); - if (*address_p == NULL) { - MXM_ERROR("Failed to allocate modex receive buffer"); - rc = OMPI_ERR_OUT_OF_RESOURCE; - goto bail; - } - - /* Receive the data, in parts */ - modex_buf_size = 0; - while (modex_buf_size < *address_len_p) { - sprintf(modex_name, "%s-%d", modex_component_name, modex_name_id); - OPAL_MODEX_RECV_STRING(rc, modex_name, &source_proc->super.proc_name, - &modex_buf_ptr, - &modex_cur_size); - if (OMPI_SUCCESS != rc) { - MXM_ERROR("Open MPI couldn't distribute EP connection details"); - free(*address_p); - *address_p = NULL; - *address_len_p = 0; - goto bail; - } - - 
memcpy((char*)(*address_p) + modex_buf_size, modex_buf_ptr, modex_cur_size); - modex_buf_size += modex_cur_size; - modex_name_id++; - } - - rc = OMPI_SUCCESS; -bail: - free(modex_component_name); - free(modex_name); - return rc; -} - -int ompi_mtl_mxm_module_init(void) -{ -#if MXM_API < MXM_VERSION(2,0) - ompi_mtl_mxm_ep_conn_info_t ep_info; -#endif - void *ep_address; - size_t ep_address_len; - mxm_error_t err; - uint32_t jobid; - uint64_t mxlr; - ompi_proc_t **procs; - unsigned ptl_bitmap; - size_t totps, proc; - int lr, nlps; - int rc; - - mxlr = 0; - lr = -1; - jobid = 0; - -#if MXM_API < MXM_VERSION(2,0) - jobid = ompi_mtl_mxm_get_job_id(); - if (0 == jobid) { - MXM_ERROR("Failed to generate jobid"); - return OMPI_ERROR; - } -#endif - - totps = ompi_proc_world_size (); - - if (totps < (size_t)ompi_mtl_mxm.mxm_np) { - MXM_VERBOSE(1, "MXM support will be disabled because of total number " - "of processes (%lu) is less than the minimum set by the " - "mtl_mxm_np MCA parameter (%u)", totps, ompi_mtl_mxm.mxm_np); - return OMPI_ERR_NOT_SUPPORTED; - } - MXM_VERBOSE(1, "MXM support enabled"); - - if (ORTE_NODE_RANK_INVALID == (lr = ompi_process_info.my_node_rank)) { - MXM_ERROR("Unable to obtain local node rank"); - return OMPI_ERROR; - } - nlps = ompi_process_info.num_local_peers + 1; - - /* local procs are always allocated. if that ever changes this will need to - * be modified. */ - procs = ompi_proc_get_allocated (&totps); - if (NULL == procs) { - MXM_ERROR("Unable to obtain process list"); - return OMPI_ERROR; - } - - for (proc = 0; proc < totps; proc++) { - if (OPAL_PROC_ON_LOCAL_NODE(procs[proc]->super.proc_flags)) { - mxlr = max(mxlr, procs[proc]->super.proc_name.vpid); - } - } - free(procs); - - /* Setup the endpoint options and local addresses to bind to. 
*/ -#if MXM_API < MXM_VERSION(2,0) - ptl_bitmap = ompi_mtl_mxm.mxm_ctx_opts->ptl_bitmap; -#else - ptl_bitmap = 0; -#endif - - /* Open MXM endpoint */ - err = ompi_mtl_mxm_create_ep(ompi_mtl_mxm.mxm_context, &ompi_mtl_mxm.ep, - ptl_bitmap, lr, jobid, mxlr, nlps); - if (MXM_OK != err) { - opal_show_help("help-mtl-mxm.txt", "unable to create endpoint", true, - mxm_error_string(err)); - return OMPI_ERROR; - } - - /* - * Get address for each PTL on this endpoint, and share it with other ranks. - */ -#if MXM_API < MXM_VERSION(2,0) - if ((ptl_bitmap & MXM_BIT(MXM_PTL_SELF)) && - OMPI_SUCCESS != ompi_mtl_mxm_get_ep_address(&ep_info, MXM_PTL_SELF)) { - return OMPI_ERROR; - } - if ((ptl_bitmap & MXM_BIT(MXM_PTL_RDMA)) && - OMPI_SUCCESS != ompi_mtl_mxm_get_ep_address(&ep_info, MXM_PTL_RDMA)) { - return OMPI_ERROR; - } - if ((ptl_bitmap & MXM_BIT(MXM_PTL_SHM)) && - OMPI_SUCCESS != ompi_mtl_mxm_get_ep_address(&ep_info, MXM_PTL_SHM)) { - return OMPI_ERROR; - } - - ep_address = &ep_info; - ep_address_len = sizeof(ep_info); -#else - rc = ompi_mtl_mxm_get_ep_address(&ep_address, &ep_address_len); - if (OMPI_SUCCESS != rc) { - return rc; - } -#endif - - rc = ompi_mtl_mxm_send_ep_address(ep_address, ep_address_len); - if (OMPI_SUCCESS != rc) { - MXM_ERROR("Modex session failed."); - return rc; - } - -#if MXM_API >= MXM_VERSION(2,0) - free(ep_address); -#endif - - /* Register the MXM progress function */ - opal_progress_register(ompi_mtl_mxm_progress); - - ompi_mtl_mxm.super.mtl_flags |= MCA_MTL_BASE_FLAG_REQUIRE_WORLD; - - -#if MXM_API >= MXM_VERSION(2,0) - if (ompi_mtl_mxm.using_mem_hooks) { - opal_mem_hooks_register_release(ompi_mtl_mxm_mem_release_cb, NULL); - } -#endif - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_finalize(struct mca_mtl_base_module_t* mtl) -{ -#if MXM_API >= MXM_VERSION(2,0) - if (ompi_mtl_mxm.using_mem_hooks) { - opal_mem_hooks_unregister_release(ompi_mtl_mxm_mem_release_cb); - } -#endif - opal_progress_unregister(ompi_mtl_mxm_progress); - 
mxm_ep_destroy(ompi_mtl_mxm.ep); - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_add_procs(struct mca_mtl_base_module_t *mtl, size_t nprocs, - struct ompi_proc_t** procs) -{ -#if MXM_API < MXM_VERSION(2,0) - ompi_mtl_mxm_ep_conn_info_t *ep_info; - mxm_conn_req_t *conn_reqs; - size_t ep_index = 0; -#endif - void *ep_address = NULL; - size_t ep_address_len; - mxm_error_t err; - size_t i; - int rc; - mca_mtl_mxm_endpoint_t *endpoint; - - assert(mtl == &ompi_mtl_mxm.super); - -#if MXM_API < MXM_VERSION(2,0) - /* Allocate connection requests */ - conn_reqs = calloc(nprocs, sizeof(mxm_conn_req_t)); - ep_info = calloc(nprocs, sizeof(ompi_mtl_mxm_ep_conn_info_t)); - if (NULL == conn_reqs || NULL == ep_info) { - rc = OMPI_ERR_OUT_OF_RESOURCE; - goto bail; - } -#endif - - /* Get the EP connection requests for all the processes from modex */ - for (i = 0; i < nprocs; ++i) { - if (NULL != procs[i]->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]) { - continue; /* already connected to this endpoint */ - } - rc = ompi_mtl_mxm_recv_ep_address(procs[i], &ep_address, &ep_address_len); - if (rc != OMPI_SUCCESS) { - goto bail; - } - -#if MXM_API < MXM_VERSION(2,0) - if (ep_address_len != sizeof(ep_info[i])) { - MXM_ERROR("Invalid endpoint address length"); - free(ep_address); - rc = OMPI_ERROR; - goto bail; - } - - memcpy(&ep_info[i], ep_address, ep_address_len); - free(ep_address); - conn_reqs[ep_index].ptl_addr[MXM_PTL_SELF] = (struct sockaddr *)&(ep_info[i].ptl_addr[MXM_PTL_SELF]); - conn_reqs[ep_index].ptl_addr[MXM_PTL_SHM] = (struct sockaddr *)&(ep_info[i].ptl_addr[MXM_PTL_SHM]); - conn_reqs[ep_index].ptl_addr[MXM_PTL_RDMA] = (struct sockaddr *)&(ep_info[i].ptl_addr[MXM_PTL_RDMA]); - ep_index++; - -#else - endpoint = OBJ_NEW(mca_mtl_mxm_endpoint_t); - endpoint->mtl_mxm_module = &ompi_mtl_mxm; - err = mxm_ep_connect(ompi_mtl_mxm.ep, ep_address, &endpoint->mxm_conn); - free(ep_address); - if (err != MXM_OK) { - MXM_ERROR("MXM returned connect error: %s\n", mxm_error_string(err)); - rc = 
OMPI_ERROR; - goto bail; - } - procs[i]->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL] = endpoint; -#endif - } - -#if MXM_API < MXM_VERSION(2,0) - /* Connect to remote peers */ - err = mxm_ep_connect(ompi_mtl_mxm.ep, conn_reqs, ep_index, -1); - if (MXM_OK != err) { - MXM_ERROR("MXM returned connect error: %s\n", mxm_error_string(err)); - for (i = 0; i < ep_index; ++i) { - if (MXM_OK != conn_reqs[i].error) { - MXM_ERROR("MXM EP connect to %s error: %s\n", - (NULL == procs[i]->super.proc_hostname) ? - "unknown" : procs[i]->proc_hostname, - mxm_error_string(conn_reqs[i].error)); - } - } - rc = OMPI_ERROR; - goto bail; - } - - /* Save returned connections */ - for (i = 0; i < ep_index; ++i) { - endpoint = OBJ_NEW(mca_mtl_mxm_endpoint_t); - endpoint->mtl_mxm_module = &ompi_mtl_mxm; - endpoint->mxm_conn = conn_reqs[i].conn; - procs[i]->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL] = endpoint; - } - -#endif - -#if MXM_API >= MXM_VERSION(3,1) - if (ompi_mtl_mxm.bulk_connect) { - mxm_ep_wireup(ompi_mtl_mxm.ep); - } -#endif - - rc = OMPI_SUCCESS; - -bail: -#if MXM_API < MXM_VERSION(2,0) - free(conn_reqs); - free(ep_info); -#endif - return rc; -} - -int ompi_mtl_add_single_proc(struct mca_mtl_base_module_t *mtl, - struct ompi_proc_t* procs) -{ - void *ep_address = NULL; - size_t ep_address_len; - mxm_error_t err; - int rc; - mca_mtl_mxm_endpoint_t *endpoint; - - assert(mtl == &ompi_mtl_mxm.super); - - if (NULL != procs->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]) { - return OMPI_SUCCESS; - } - rc = ompi_mtl_mxm_recv_ep_address(procs, &ep_address, &ep_address_len); - if (rc != OMPI_SUCCESS) { - return rc; - } - -#if MXM_API < MXM_VERSION(2,0) - ompi_mtl_mxm_ep_conn_info_t ep_info; - mxm_conn_req_t conn_req; - - if (ep_address_len != sizeof(ep_info)) { - MXM_ERROR("Invalid endpoint address length"); - free(ep_address); - return OMPI_ERROR; - } - - memcpy(&ep_info, ep_address, ep_address_len); - free(ep_address); - conn_req.ptl_addr[MXM_PTL_SELF] = (struct sockaddr 
*)&(ep_info.ptl_addr[MXM_PTL_SELF]); - conn_req.ptl_addr[MXM_PTL_SHM] = (struct sockaddr *)&(ep_info.ptl_addr[MXM_PTL_SHM]); - conn_req.ptl_addr[MXM_PTL_RDMA] = (struct sockaddr *)&(ep_info.ptl_addr[MXM_PTL_RDMA]); - - /* Connect to remote peers */ - err = mxm_ep_connect(ompi_mtl_mxm.ep, conn_req, 1, -1); - if (MXM_OK != err) { - MXM_ERROR("MXM returned connect error: %s\n", mxm_error_string(err)); - if (MXM_OK != conn_req.error) { - MXM_ERROR("MXM EP connect to %s error: %s\n", - (NULL == procs->super.proc_hostname) ? - "unknown" : procs->proc_hostname, - mxm_error_string(conn_reqs.error)); - } - return OMPI_ERROR; - } - - /* Save returned connections */ - endpoint = OBJ_NEW(mca_mtl_mxm_endpoint_t); - endpoint->mtl_mxm_module = &ompi_mtl_mxm; - endpoint->mxm_conn = conn_reqs.conn; - procs->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL] = endpoint; -#else - endpoint = OBJ_NEW(mca_mtl_mxm_endpoint_t); - endpoint->mtl_mxm_module = &ompi_mtl_mxm; - err = mxm_ep_connect(ompi_mtl_mxm.ep, ep_address, &endpoint->mxm_conn); - free(ep_address); - if (err != MXM_OK) { - MXM_ERROR("MXM returned connect error: %s\n", mxm_error_string(err)); - return OMPI_ERROR; - } - procs->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL] = endpoint; -#endif - -#if MXM_API >= MXM_VERSION(3,1) - if (ompi_mtl_mxm.bulk_connect) { - mxm_ep_wireup(ompi_mtl_mxm.ep); - } -#endif - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_del_procs(struct mca_mtl_base_module_t *mtl, size_t nprocs, - struct ompi_proc_t** procs) -{ - size_t i; - -#if MXM_API >= MXM_VERSION(3,1) - if (ompi_mtl_mxm.bulk_disconnect && ((int)nprocs) == ompi_proc_world_size ()) { - mxm_ep_powerdown(ompi_mtl_mxm.ep); - } -#endif - - /* XXX: Directly accessing the obj_reference_count is an abstraction - * violation of the object system. 
We know this needs to be fixed, but - * are deferring the fix to a later time as it involves a design issue - * in the way we handle endpoints as objects - */ - for (i = 0; i < nprocs; ++i) { - mca_mtl_mxm_endpoint_t *endpoint = (mca_mtl_mxm_endpoint_t*) - procs[i]->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]; - if (endpoint) { - mxm_ep_disconnect(endpoint->mxm_conn); - OBJ_RELEASE(endpoint); - } - } - opal_pmix.fence(NULL, 0); - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_add_comm(struct mca_mtl_base_module_t *mtl, - struct ompi_communicator_t *comm) -{ - mxm_error_t err; - mxm_mq_h mq; - - assert(mtl == &ompi_mtl_mxm.super); - assert(NULL != ompi_mtl_mxm.mxm_context); - - err = mxm_mq_create(ompi_mtl_mxm.mxm_context, comm->c_contextid, &mq); - if (MXM_OK != err) { - opal_show_help("help-mtl-mxm.txt", "mxm mq create", true, mxm_error_string(err)); - return OMPI_ERROR; - } - - comm->c_pml_comm = (void*)mq; - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_del_comm(struct mca_mtl_base_module_t *mtl, - struct ompi_communicator_t *comm) -{ - assert(mtl == &ompi_mtl_mxm.super); - if (NULL != ompi_mtl_mxm.mxm_context) { - mxm_mq_destroy((mxm_mq_h)comm->c_pml_comm); - } - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_progress(void) -{ - mxm_error_t err; - - err = mxm_progress(ompi_mtl_mxm.mxm_context); - if ((MXM_OK != err) && (MXM_ERR_NO_PROGRESS != err) ) { - opal_show_help("help-mtl-mxm.txt", "errors during mxm_progress", true, mxm_error_string(err)); - } - return 1; -} - -#if MXM_API >= MXM_VERSION(2,0) -static void ompi_mtl_mxm_mem_release_cb(void *buf, size_t length, - void *cbdata, bool from_alloc) -{ - mxm_mem_unmap(ompi_mtl_mxm.mxm_context, buf, length, - from_alloc ? 
MXM_MEM_UNMAP_MARK_INVALID : 0); -} -#endif - -OBJ_CLASS_INSTANCE( - ompi_mtl_mxm_message_t, - opal_free_list_item_t, - NULL, - NULL); diff --git a/ompi/mca/mtl/mxm/mtl_mxm.h b/ompi/mca/mtl/mxm/mtl_mxm.h deleted file mode 100644 index f8492446949..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm.h +++ /dev/null @@ -1,117 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MTL_MXM_H_HAS_BEEN_INCLUDED -#define MTL_MXM_H_HAS_BEEN_INCLUDED - -#include -#include -#include - -#include -#ifndef MXM_VERSION -#define MXM_VERSION(major, minor) (((major)< -#endif - -#include "ompi/mca/pml/pml.h" -#include "ompi/mca/mtl/mtl.h" -#include "ompi/mca/mtl/base/base.h" -#include "opal/class/opal_free_list.h" - -#include "opal/util/output.h" -#include "opal/util/show_help.h" -#include "opal/datatype/opal_convertor.h" - -#include "mtl_mxm_debug.h" - -BEGIN_C_DECLS - -/* MTL interface functions */ -extern int ompi_mtl_mxm_add_procs(struct mca_mtl_base_module_t* mtl, - size_t nprocs, struct ompi_proc_t** procs); -extern int ompi_mtl_add_single_proc(struct mca_mtl_base_module_t *mtl, - struct ompi_proc_t* procs); -extern int ompi_mtl_mxm_del_procs(struct mca_mtl_base_module_t* mtl, - size_t nprocs, struct ompi_proc_t** procs); - -extern int ompi_mtl_mxm_send(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t* comm, int dest, int tag, - struct opal_convertor_t *convertor, - mca_pml_base_send_mode_t mode); - -extern int ompi_mtl_mxm_isend(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t* comm, int dest, - int tag, struct opal_convertor_t *convertor, - mca_pml_base_send_mode_t mode, bool blocking, - mca_mtl_request_t * mtl_request); - -extern int ompi_mtl_mxm_irecv(struct mca_mtl_base_module_t* mtl, 
- struct ompi_communicator_t *comm, int src, - int tag, struct opal_convertor_t *convertor, - struct mca_mtl_request_t *mtl_request); - -extern int ompi_mtl_mxm_iprobe(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t *comm, int src, - int tag, int *flag, - struct ompi_status_public_t *status); - -extern int ompi_mtl_mxm_cancel(struct mca_mtl_base_module_t* mtl, - struct mca_mtl_request_t *mtl_request, int flag); - -extern int ompi_mtl_mxm_imrecv(struct mca_mtl_base_module_t* mtl, - struct opal_convertor_t *convertor, - struct ompi_message_t **message, - struct mca_mtl_request_t *mtl_request); - -extern int ompi_mtl_mxm_improbe(struct mca_mtl_base_module_t *mtl, - struct ompi_communicator_t *comm, - int src, - int tag, - int *matched, - struct ompi_message_t **message, - struct ompi_status_public_t *status); - -extern int ompi_mtl_mxm_add_comm(struct mca_mtl_base_module_t *mtl, - struct ompi_communicator_t *comm); - -extern int ompi_mtl_mxm_del_comm(struct mca_mtl_base_module_t *mtl, - struct ompi_communicator_t *comm); - -extern int ompi_mtl_mxm_finalize(struct mca_mtl_base_module_t* mtl); - -int ompi_mtl_mxm_module_init(void); - -struct ompi_mtl_mxm_message_t { - opal_free_list_item_t super; - - mxm_mq_h mq; - mxm_conn_h conn; - mxm_message_h mxm_msg; - - mxm_tag_t tag; - mxm_tag_t tag_mask; -}; -typedef struct ompi_mtl_mxm_message_t ompi_mtl_mxm_message_t; -OBJ_CLASS_DECLARATION(ompi_mtl_mxm_message_t); - -END_C_DECLS - -#endif - diff --git a/ompi/mca/mtl/mxm/mtl_mxm_cancel.c b/ompi/mca/mtl/mxm/mtl_mxm_cancel.c deleted file mode 100644 index bc6d6b1064b..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_cancel.c +++ /dev/null @@ -1,34 +0,0 @@ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include "mtl_mxm.h" -#include "mtl_mxm_request.h" - -int ompi_mtl_mxm_cancel(struct mca_mtl_base_module_t* mtl, - struct mca_mtl_request_t *mtl_request, int flag) -{ - mca_mtl_mxm_request_t *mtl_mxm_request = (mca_mtl_mxm_request_t*) mtl_request; - mxm_error_t err; - -#if MXM_API >= MXM_VERSION(2,0) - if (mtl_mxm_request->is_send) { - err = mxm_req_cancel_send(&mtl_mxm_request->mxm.send); - } else { - err = mxm_req_cancel_recv(&mtl_mxm_request->mxm.recv); - } -#else - err = mxm_req_cancel(&mtl_mxm_request->mxm.base); -#endif - if ((err != MXM_OK) && (err != MXM_ERR_NO_PROGRESS)) { - return OMPI_ERROR; - } - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/mtl/mxm/mtl_mxm_component.c b/ompi/mca/mtl/mxm/mtl_mxm_component.c deleted file mode 100644 index 2fdea1c5c39..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_component.c +++ /dev/null @@ -1,316 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" - -#include "opal/util/output.h" -#include "opal/util/show_help.h" -#include "ompi/proc/proc.h" -#include "opal/memoryhooks/memory.h" -#include "opal/mca/memory/base/base.h" -#include "ompi/runtime/mpiruntime.h" - -#include "mtl_mxm.h" -#include "mtl_mxm_types.h" -#include "mtl_mxm_request.h" - -#include -#include -#include - -static int ompi_mtl_mxm_component_open(void); -static int ompi_mtl_mxm_component_query(mca_base_module_t **module, int *priority); -static int ompi_mtl_mxm_component_close(void); -static int ompi_mtl_mxm_component_register(void); - -static int param_priority; - -int mca_mtl_mxm_output = -1; - - -static mca_mtl_base_module_t - * ompi_mtl_mxm_component_init(bool enable_progress_threads, - bool enable_mpi_threads); - -mca_mtl_mxm_component_t mca_mtl_mxm_component = { -{ - /* - * First, the mca_base_component_t struct containing meta - * information about the component itself - */ - .mtl_version = { - MCA_MTL_BASE_VERSION_2_0_0, - .mca_component_name = "mxm", - MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, - OMPI_RELEASE_VERSION), - .mca_open_component = ompi_mtl_mxm_component_open, - .mca_close_component = ompi_mtl_mxm_component_close, - .mca_query_component = ompi_mtl_mxm_component_query, - .mca_register_component_params = ompi_mtl_mxm_component_register, - }, - .mtl_data = { - /* The component is not checkpoint ready */ - MCA_BASE_METADATA_PARAM_NONE - }, - .mtl_init = ompi_mtl_mxm_component_init, -} -}; - -static int ompi_mtl_mxm_component_register(void) -{ - mca_base_component_t*c; - -#if MXM_API < MXM_VERSION(3,0) - unsigned long cur_ver; - long major, minor; - char* runtime_version; -#endif - - c = &mca_mtl_mxm_component.super.mtl_version; - - ompi_mtl_mxm.verbose = 0; - (void) mca_base_component_var_register(c, "verbose", - "Verbose level of the MXM component", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - 
OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_LOCAL, - &ompi_mtl_mxm.verbose); - -#if MXM_API > MXM_VERSION(2,0) - ompi_mtl_mxm.mxm_np = 0; -#else - ompi_mtl_mxm.mxm_np = 128; -#endif - (void) mca_base_component_var_register(c, "np", - "[integer] Minimal number of MPI processes in a single job " - "required to activate the MXM transport", - MCA_BASE_VAR_TYPE_INT, NULL,0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - &ompi_mtl_mxm.mxm_np); - - ompi_mtl_mxm.compiletime_version = MXM_VERNO_STRING; - (void) mca_base_component_var_register(c, - MCA_COMPILETIME_VER, - "Version of the libmxm library with which Open MPI was compiled", - MCA_BASE_VAR_TYPE_VERSION_STRING, - NULL, 0, 0, - OPAL_INFO_LVL_3, - MCA_BASE_VAR_SCOPE_READONLY, - &ompi_mtl_mxm.compiletime_version); - -#if MXM_API >= MXM_VERSION(3,0) - ompi_mtl_mxm.runtime_version = (char *)mxm_get_version_string(); -#else - cur_ver = mxm_get_version(); - major = (cur_ver >> MXM_MAJOR_BIT) & 0xff; - minor = (cur_ver >> MXM_MINOR_BIT) & 0xff; - asprintf(&runtime_version, "%ld.%ld", major, minor); - ompi_mtl_mxm.runtime_version = runtime_version; -#endif - - (void) mca_base_component_var_register(c, - MCA_RUNTIME_VER, - "Version of the libmxm library with which Open MPI is running", - MCA_BASE_VAR_TYPE_VERSION_STRING, - NULL, 0, 0, - OPAL_INFO_LVL_3, - MCA_BASE_VAR_SCOPE_READONLY, - &ompi_mtl_mxm.runtime_version); - -#if MXM_API < MXM_VERSION(3,0) - free(runtime_version); -#endif - - /* set high enought to defeat ob1's default */ - param_priority = 30; - (void) mca_base_component_var_register (c, - "priority", "Priority of the MXM MTL component", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - &param_priority); - - -#if MXM_API >= MXM_VERSION(3,1) -{ - unsigned long cur_ver = mxm_get_version(); - - ompi_mtl_mxm.bulk_connect = 0; - - if (cur_ver < MXM_VERSION(3,2)) { - ompi_mtl_mxm.bulk_disconnect = 0; - } else { - ompi_mtl_mxm.bulk_disconnect = 1; - } - - (void)
mca_base_component_var_register(c, "bulk_connect", - "[integer] use bulk connect", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - &ompi_mtl_mxm.bulk_connect); - - (void) mca_base_component_var_register(c, "bulk_disconnect", - "[integer] use bulk disconnect", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - &ompi_mtl_mxm.bulk_disconnect); - - if (cur_ver < MXM_VERSION(3,2) && - (ompi_mtl_mxm.bulk_connect || ompi_mtl_mxm.bulk_disconnect)) { - ompi_mtl_mxm.bulk_connect = 0; - ompi_mtl_mxm.bulk_disconnect = 0; - - MXM_VERBOSE(1, "WARNING: OMPI runs with %s version of MXM that is less than 3.2, " - "so bulk connect/disconnect cannot work properly and will be turn off.", - ompi_mtl_mxm.runtime_version); - } -} -#endif - - return OMPI_SUCCESS; -} - -static int ompi_mtl_mxm_component_open(void) -{ - mxm_error_t err; - unsigned long cur_ver; - int rc; - - mca_mtl_mxm_output = opal_output_open(NULL); - opal_output_set_verbosity(mca_mtl_mxm_output, ompi_mtl_mxm.verbose); - cur_ver = mxm_get_version(); - if (cur_ver != MXM_API) { - MXM_VERBOSE(1, - "WARNING: OMPI was compiled with MXM version %d.%d but version %ld.%ld detected.", - MXM_VERNO_MAJOR, - MXM_VERNO_MINOR, - (cur_ver >> MXM_MAJOR_BIT) & 0xff, - (cur_ver >> MXM_MINOR_BIT) & 0xff); - } - -#if MXM_API >= MXM_VERSION(2,0) - (void)mca_base_framework_open(&opal_memory_base_framework, 0); - /* Register memory hooks */ - if ((OPAL_MEMORY_FREE_SUPPORT | OPAL_MEMORY_MUNMAP_SUPPORT) == - ((OPAL_MEMORY_FREE_SUPPORT | OPAL_MEMORY_MUNMAP_SUPPORT) & - opal_mem_hooks_support_level())) - { - setenv("MXM_MPI_MEM_ON_DEMAND_MAP", "y", 0); - MXM_VERBOSE(1, "Enabling on-demand memory mapping"); - ompi_mtl_mxm.using_mem_hooks = 1; - } else { - MXM_VERBOSE(1, "Disabling on-demand memory mapping"); - ompi_mtl_mxm.using_mem_hooks = 0; - } - setenv("MXM_MPI_SINGLE_THREAD", ompi_mpi_thread_multiple ? 
"n" : "y" , 0); -#endif - -#if MXM_API >= MXM_VERSION(2,1) - if (MXM_OK != mxm_config_read_opts(&ompi_mtl_mxm.mxm_ctx_opts, - &ompi_mtl_mxm.mxm_ep_opts, - "MPI", NULL, 0)) -#else - if ((MXM_OK != mxm_config_read_context_opts(&ompi_mtl_mxm.mxm_ctx_opts)) || - (MXM_OK != mxm_config_read_ep_opts(&ompi_mtl_mxm.mxm_ep_opts))) -#endif - { - MXM_ERROR("Failed to parse MXM configuration"); - return OPAL_ERR_BAD_PARAM; - } - - err = mxm_init(ompi_mtl_mxm.mxm_ctx_opts, &ompi_mtl_mxm.mxm_context); - MXM_VERBOSE(1, "mxm component open"); - - if (MXM_OK != err) { - if (MXM_ERR_NO_DEVICE == err) { - MXM_VERBOSE(1, "No supported device found, disqualifying mxm"); - } else { - opal_show_help("help-mtl-mxm.txt", "mxm init", true, - mxm_error_string(err)); - } - return OPAL_ERR_NOT_AVAILABLE; - } - - OBJ_CONSTRUCT(&mca_mtl_mxm_component.mxm_messages, opal_free_list_t); - rc = opal_free_list_init (&mca_mtl_mxm_component.mxm_messages, - sizeof(ompi_mtl_mxm_message_t), - opal_cache_line_size, - OBJ_CLASS(ompi_mtl_mxm_message_t), - 0, opal_cache_line_size, - 32 /* free list num */, - -1 /* free list max */, - 32 /* free list inc */, - NULL, 0, NULL, NULL, NULL); - if (OMPI_SUCCESS != rc) { - opal_show_help("help-mtl-mxm.txt", "mxm init", true, - mxm_error_string(err)); - return OPAL_ERR_NOT_AVAILABLE; - } - - return OMPI_SUCCESS; -} - -static int ompi_mtl_mxm_component_query(mca_base_module_t **module, int *priority) -{ - - /* - * if we get here it means that mxm is available so give high priority - */ - - ompi_mpi_dynamics_disable("the MXM MTL does not support MPI dynamic process functionality"); - - *priority = param_priority; - *module = (mca_base_module_t *)&ompi_mtl_mxm.super; - return OMPI_SUCCESS; -} - -static int ompi_mtl_mxm_component_close(void) -{ - if (ompi_mtl_mxm.mxm_context != NULL) { - mxm_cleanup(ompi_mtl_mxm.mxm_context); - ompi_mtl_mxm.mxm_context = NULL; - OBJ_DESTRUCT(&mca_mtl_mxm_component.mxm_messages); -#if MXM_API >= MXM_VERSION(2,0) - 
mxm_config_free_ep_opts(ompi_mtl_mxm.mxm_ep_opts); - mxm_config_free_context_opts(ompi_mtl_mxm.mxm_ctx_opts); - mca_base_framework_close(&opal_memory_base_framework); -#else - mxm_config_free(ompi_mtl_mxm.mxm_ep_opts); - mxm_config_free(ompi_mtl_mxm.mxm_ctx_opts); -#endif - } - - return OMPI_SUCCESS; -} - -static mca_mtl_base_module_t* -ompi_mtl_mxm_component_init(bool enable_progress_threads, - bool enable_mpi_threads) -{ - int rc; - - rc = ompi_mtl_mxm_module_init(); - if (OMPI_SUCCESS != rc) { - return NULL; - } - - /* Calculate MTL constraints according to MXM types */ - ompi_mtl_mxm.super.mtl_max_contextid = 1UL << (sizeof(mxm_ctxid_t) * 8); - ompi_mtl_mxm.super.mtl_max_tag = 1UL << (sizeof(mxm_tag_t) * 8 - 2); - ompi_mtl_mxm.super.mtl_request_size = - sizeof(mca_mtl_mxm_request_t) - sizeof(struct mca_mtl_request_t); - return &ompi_mtl_mxm.super; -} diff --git a/ompi/mca/mtl/mxm/mtl_mxm_debug.h b/ompi/mca/mtl/mxm/mtl_mxm_debug.h deleted file mode 100644 index 64f2d7190d2..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_debug.h +++ /dev/null @@ -1,34 +0,0 @@ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MTL_MXM_DEBUG_H -#define MTL_MXM_DEBUG_H -#pragma GCC system_header - -#ifdef __BASE_FILE__ -#define __MXM_FILE__ __BASE_FILE__ -#else -#define __MXM_FILE__ __FILE__ -#endif - -#define MXM_VERBOSE(level, format, ...) \ - opal_output_verbose(level, mca_mtl_mxm_output, "%s:%d - %s() " format, \ - __MXM_FILE__, __LINE__, __FUNCTION__, ## __VA_ARGS__) - -#define MXM_ERROR(format, ... ) \ - opal_output_verbose(0, mca_mtl_mxm_output, "Error: %s:%d - %s() " format, \ - __MXM_FILE__, __LINE__, __FUNCTION__, ## __VA_ARGS__) - - -#define MXM_MODULE_VERBOSE(mxm_module, level, format, ...) 
\ - MXM_VERBOSE(level, "[%d] " format, (mxm_module)->rank, ## __VA_ARGS__) - -extern int mca_mtl_mxm_output; - -#endif diff --git a/ompi/mca/mtl/mxm/mtl_mxm_endpoint.c b/ompi/mca/mtl/mxm/mtl_mxm_endpoint.c deleted file mode 100644 index 6fe543a0002..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_endpoint.c +++ /dev/null @@ -1,42 +0,0 @@ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include -#include -#include "ompi/types.h" - -#include "mtl_mxm.h" -#include "mtl_mxm_types.h" -#include "mtl_mxm_endpoint.h" - -/* - * Initialize state of the endpoint instance. - * - */ - -static void mca_mtl_mxm_endpoint_construct(mca_mtl_mxm_endpoint_t* endpoint) -{ - endpoint->mtl_mxm_module = NULL; -} - -/* - * Destroy a endpoint - * - */ - -static void mca_mtl_mxm_endpoint_destruct(mca_mtl_mxm_endpoint_t* endpoint) -{ -} - -OBJ_CLASS_INSTANCE( - mca_mtl_mxm_endpoint_t, - opal_list_item_t, - mca_mtl_mxm_endpoint_construct, - mca_mtl_mxm_endpoint_destruct); diff --git a/ompi/mca/mtl/mxm/mtl_mxm_endpoint.h b/ompi/mca/mtl/mxm/mtl_mxm_endpoint.h deleted file mode 100644 index 1dfeca87c42..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_endpoint.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MCA_MTL_MXM_ENDPOINT_H -#define MCA_MTL_MXM_ENDPOINT_H -#include "opal/class/opal_list.h" -#include "ompi/mca/mtl/mtl.h" -#include "mtl_mxm.h" - -BEGIN_C_DECLS - -OBJ_CLASS_DECLARATION(mca_mtl_mxm_endpoint_t); - -/** - * An abstraction that represents a connection to a endpoint process. - * An instance of mca_mtl_mxm_endpoint_t is associated w/ each process - * and MTL pair at startup. 
However, connections to the endpoint - * are established dynamically on an as-needed basis: - */ - -struct mca_mtl_mxm_endpoint_t { - opal_list_item_t super; - - struct mca_mtl_mxm_module_t* mtl_mxm_module; - /**< MTL instance that created this connection */ - - mxm_conn_h mxm_conn; - /**< MXM Connection handle*/ -}; - -typedef struct mca_mtl_mxm_endpoint_t mca_mtl_mxm_endpoint_t; -OBJ_CLASS_DECLARATION(mca_mtl_mxm_endpoint); - -END_C_DECLS -#endif diff --git a/ompi/mca/mtl/mxm/mtl_mxm_probe.c b/ompi/mca/mtl/mxm/mtl_mxm_probe.c deleted file mode 100644 index 814dc9c6d91..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_probe.c +++ /dev/null @@ -1,115 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * Copyright (c) 2013 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include "mtl_mxm.h" -#include "mtl_mxm_types.h" - -#include "ompi/message/message.h" -#include "ompi/communicator/communicator.h" - -int ompi_mtl_mxm_iprobe(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t *comm, int src, int tag, - int *flag, struct ompi_status_public_t *status) -{ - mxm_error_t err; - mxm_recv_req_t req; - - req.base.state = MXM_REQ_NEW; - ompi_mtl_mxm_set_recv_envelope(&req, comm, src, tag); - - err = mxm_req_probe(&req); - if (MXM_OK == err) { - *flag = 1; - if (MPI_STATUS_IGNORE != status) { - ompi_mtl_mxm_to_mpi_status(err, status); - status->MPI_SOURCE = req.completion.sender_imm; - status->MPI_TAG = req.completion.sender_tag; - status->_ucount = req.completion.sender_len; - } - return OMPI_SUCCESS; - } else if (MXM_ERR_NO_MESSAGE == err) { - *flag = 0; - return OMPI_SUCCESS; - } else { - return OMPI_ERROR; - } -} - - -int ompi_mtl_mxm_improbe(struct mca_mtl_base_module_t *mtl, - struct ompi_communicator_t *comm, - int src, - int tag, - int *matched, - struct ompi_message_t **message, - struct ompi_status_public_t *status) -{ - mxm_error_t err; - mxm_recv_req_t req; - - opal_free_list_item_t *item; - ompi_mtl_mxm_message_t *msgp; - - item = opal_free_list_wait (&mca_mtl_mxm_component.mxm_messages); - if (OPAL_UNLIKELY(NULL == item)) { - return OMPI_ERR_OUT_OF_RESOURCE; - } - - msgp = (ompi_mtl_mxm_message_t *) item; - - req.base.state = MXM_REQ_NEW; - ompi_mtl_mxm_set_recv_envelope(&req, comm, src, tag); - - msgp->mq = req.base.mq; - msgp->conn = req.base.conn; - msgp->tag = req.tag; - msgp->tag_mask = req.tag_mask; - - err = mxm_req_mprobe(&req, &msgp->mxm_msg); - if (MXM_OK == err) { - if (MPI_STATUS_IGNORE != status) { - *matched = 1; - ompi_mtl_mxm_to_mpi_status(err, status); - status->MPI_SOURCE = req.completion.sender_imm; - status->MPI_TAG = req.completion.sender_tag; - status->_ucount = 
req.completion.sender_len; - } else{ - *matched = 0; - *message = MPI_MESSAGE_NULL; - return OMPI_SUCCESS; - } - } else if (MXM_ERR_NO_MESSAGE == err) { - *matched = 0; - *message = MPI_MESSAGE_NULL; - return OMPI_SUCCESS; - } else { - return OMPI_ERROR; - } - - (*message) = ompi_message_alloc(); - if (OPAL_UNLIKELY(NULL == (*message))) { - *matched = 0; - *message = MPI_MESSAGE_NULL; - return OMPI_ERR_OUT_OF_RESOURCE; - } - - (*message)->comm = comm; - (*message)->req_ptr = msgp; - (*message)->peer = status->MPI_SOURCE; - (*message)->count = status->_ucount; - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/mtl/mxm/mtl_mxm_recv.c b/ompi/mca/mtl/mxm/mtl_mxm_recv.c deleted file mode 100644 index a69e9d7e12a..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_recv.c +++ /dev/null @@ -1,197 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include "ompi/message/message.h" -#include "opal/datatype/opal_convertor.h" -#include "ompi/mca/mtl/base/mtl_base_datatype.h" -#include "opal/util/show_help.h" - -#include "mtl_mxm.h" -#include "mtl_mxm_types.h" -#include "mtl_mxm_request.h" - -static void ompi_mtl_mxm_recv_completion_cb(void *context) -{ - mca_mtl_mxm_request_t *req = (mca_mtl_mxm_request_t *) context; - struct ompi_request_t *ompi_req = req->super.ompi_req; - mxm_recv_req_t *mxm_recv_req = &req->mxm.recv; - - /* Set completion status and envelope */ - ompi_mtl_mxm_to_mpi_status(mxm_recv_req->base.error, &ompi_req->req_status); - ompi_req->req_status.MPI_TAG = mxm_recv_req->completion.sender_tag; - ompi_req->req_status.MPI_SOURCE = mxm_recv_req->completion.sender_imm; - ompi_req->req_status._ucount = mxm_recv_req->completion.actual_len; - - req->super.completion_callback(&req->super); -} - -static size_t ompi_mtl_mxm_stream_unpack(void *buffer, size_t length, - size_t offset, void *context) -{ - struct iovec iov; - uint32_t iov_count = 1; - - mca_mtl_mxm_request_t *mtl_mxm_request = (mca_mtl_mxm_request_t *) context; - opal_convertor_t *convertor = mtl_mxm_request->convertor; - - iov.iov_len = length; - iov.iov_base = buffer; - - opal_convertor_set_position(convertor, &offset); - opal_convertor_unpack(convertor, &iov, &iov_count, &length); - - return length; -} - -static inline __opal_attribute_always_inline__ int - ompi_mtl_mxm_choose_recv_datatype(mca_mtl_mxm_request_t *mtl_mxm_request) -{ - void **buffer = &mtl_mxm_request->buf; - size_t *buffer_len = &mtl_mxm_request->length; - - mxm_recv_req_t *mxm_recv_req = &mtl_mxm_request->mxm.recv; - opal_convertor_t *convertor = mtl_mxm_request->convertor; - - opal_convertor_get_packed_size(convertor, buffer_len); - - if (0 == *buffer_len) { - *buffer = NULL; - *buffer_len = 0; - - mxm_recv_req->base.data_type = MXM_REQ_DATA_BUFFER; - - return 
OMPI_SUCCESS; - } - - if (opal_convertor_need_buffers(convertor)) { - mxm_recv_req->base.data_type = MXM_REQ_DATA_STREAM; - mxm_recv_req->base.data.stream.length = *buffer_len; - mxm_recv_req->base.data.stream.cb = ompi_mtl_mxm_stream_unpack; - - return OMPI_SUCCESS; - } - - mxm_recv_req->base.data_type = MXM_REQ_DATA_BUFFER; - - *buffer = convertor->pBaseBuf + - convertor->use_desc->desc[convertor->use_desc->used].end_loop.first_elem_disp; - - mxm_recv_req->base.data.buffer.ptr = *buffer; - mxm_recv_req->base.data.buffer.length = *buffer_len; - - return OMPI_SUCCESS; -} - -static inline __opal_attribute_always_inline__ int - ompi_mtl_mxm_recv_init(mca_mtl_mxm_request_t *mtl_mxm_request, - opal_convertor_t *convertor, - mxm_recv_req_t *mxm_recv_req) -{ - int ret; - - mtl_mxm_request->convertor = convertor; - ret = ompi_mtl_mxm_choose_recv_datatype(mtl_mxm_request); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return ret; - } - -#if MXM_API >= MXM_VERSION(2,0) - mtl_mxm_request->is_send = 0; -#endif - - mxm_recv_req->base.state = MXM_REQ_NEW; - -#if MXM_API < MXM_VERSION(2,0) - mxm_recv_req->base.flags = 0; -#endif - - mxm_recv_req->base.data.buffer.memh = MXM_INVALID_MEM_HANDLE; - mxm_recv_req->base.context = mtl_mxm_request; - mxm_recv_req->base.completed_cb = ompi_mtl_mxm_recv_completion_cb; - - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_irecv(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t *comm, int src, int tag, - struct opal_convertor_t *convertor, - struct mca_mtl_request_t *mtl_request) -{ - int ret; - mxm_error_t err; - mxm_recv_req_t *mxm_recv_req; - mca_mtl_mxm_request_t *mtl_mxm_request; - - mtl_mxm_request = (mca_mtl_mxm_request_t*) mtl_request; - mxm_recv_req = &mtl_mxm_request->mxm.recv; - - ompi_mtl_mxm_set_recv_envelope(mxm_recv_req, comm, src, tag); - - /* prepare a receive request embedded in the MTL request */ - ret = ompi_mtl_mxm_recv_init(mtl_mxm_request, convertor, mxm_recv_req); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - 
return ret; - } - - /* post-recv */ - err = mxm_req_recv(mxm_recv_req); - if (OPAL_UNLIKELY(MXM_OK != err)) { - opal_show_help("help-mtl-mxm.txt", "error posting receive", true, - mxm_error_string(err), mtl_mxm_request->buf, mtl_mxm_request->length); - return OMPI_ERROR; - } - - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_imrecv(struct mca_mtl_base_module_t* mtl, - struct opal_convertor_t *convertor, - struct ompi_message_t **message, - struct mca_mtl_request_t *mtl_request) -{ - int ret; - mxm_error_t err; - mxm_recv_req_t *mxm_recv_req; - mca_mtl_mxm_request_t *mtl_mxm_request; - - ompi_mtl_mxm_message_t *msgp = - (ompi_mtl_mxm_message_t *) (*message)->req_ptr; - - mtl_mxm_request = (mca_mtl_mxm_request_t*) mtl_request; - mxm_recv_req = &mtl_mxm_request->mxm.recv; - - /* prepare a receive request embedded in the MTL request */ - ret = ompi_mtl_mxm_recv_init(mtl_mxm_request, convertor, mxm_recv_req); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return ret; - } - - mxm_recv_req->tag = msgp->tag; - mxm_recv_req->tag_mask = msgp->tag_mask; - mxm_recv_req->base.mq = msgp->mq; - mxm_recv_req->base.conn = msgp->conn; - - err = mxm_message_recv(mxm_recv_req, msgp->mxm_msg); - if (OPAL_UNLIKELY(MXM_OK != err)) { - opal_show_help("help-mtl-mxm.txt", "error posting message receive", true, - mxm_error_string(err), mtl_mxm_request->buf, mtl_mxm_request->length); - return OMPI_ERROR; - } - - opal_free_list_return (&mca_mtl_mxm_component.mxm_messages, (opal_free_list_item_t *) msgp); - - ompi_message_return(*message); - (*message) = MPI_MESSAGE_NULL; - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/mtl/mxm/mtl_mxm_request.h b/ompi/mca/mtl/mxm/mtl_mxm_request.h deleted file mode 100644 index a3c103996a4..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_request.h +++ /dev/null @@ -1,35 +0,0 @@ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OMPI_MTL_MXM_REQUEST_H -#define OMPI_MTL_MXM_REQUEST_H - -#include "opal/datatype/opal_convertor.h" -#include "mtl_mxm.h" - - -struct mca_mtl_mxm_request_t { - struct mca_mtl_request_t super; - union { - mxm_req_base_t base; - mxm_send_req_t send; - mxm_recv_req_t recv; - } mxm; -#if MXM_API >= MXM_VERSION(2,0) - int is_send; -#endif - /* mxm_segment_t mxm_segment[1]; */ - void *buf; - size_t length; - struct opal_convertor_t *convertor; - bool free_after; -}; -typedef struct mca_mtl_mxm_request_t mca_mtl_mxm_request_t; - -#endif diff --git a/ompi/mca/mtl/mxm/mtl_mxm_send.c b/ompi/mca/mtl/mxm/mtl_mxm_send.c deleted file mode 100644 index 0f5c0c9b42f..00000000000 --- a/ompi/mca/mtl/mxm/mtl_mxm_send.c +++ /dev/null @@ -1,238 +0,0 @@ -/* * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "ompi_config.h" -#include "ompi/mca/pml/pml.h" -#include "opal/datatype/opal_convertor.h" -#include "opal/util/show_help.h" - -#include "mtl_mxm.h" -#include "mtl_mxm_types.h" -#include "mtl_mxm_request.h" -#include "ompi/mca/mtl/base/mtl_base_datatype.h" - -static inline __opal_attribute_always_inline__ - size_t ompi_mtl_mxm_stream_pack(opal_convertor_t *convertor, void *buffer, - size_t length, size_t offset) -{ - struct iovec iov; - uint32_t iov_count = 1; - - iov.iov_len = length; - iov.iov_base = buffer; - - opal_convertor_set_position(convertor, &offset); - opal_convertor_pack(convertor, &iov, &iov_count, &length); - - return length; -} - -static size_t ompi_mtl_mxm_stream_isend(void *buffer, size_t length, size_t offset, void *context) -{ - mca_mtl_mxm_request_t *mtl_mxm_request = (mca_mtl_mxm_request_t *) context; - opal_convertor_t *convertor = mtl_mxm_request->convertor; - - return ompi_mtl_mxm_stream_pack(convertor, buffer, length, offset); -} - -static size_t 
ompi_mtl_mxm_stream_send(void *buffer, size_t length, size_t offset, void *context) -{ - opal_convertor_t *convertor = (opal_convertor_t *) context; - - return ompi_mtl_mxm_stream_pack(convertor, buffer, length, offset); -} - -static inline __opal_attribute_always_inline__ int - ompi_mtl_mxm_choose_send_datatype(mxm_send_req_t *mxm_send_req, - opal_convertor_t *convertor, - mxm_stream_cb_t stream_cb) -{ - struct iovec iov; - uint32_t iov_count = 1; - - size_t *buffer_len = &mxm_send_req->base.data.buffer.length; - -#if !(OPAL_ENABLE_HETEROGENEOUS_SUPPORT) - if (convertor->pDesc && - opal_datatype_is_contiguous_memory_layout(convertor->pDesc, - convertor->count)) { - mxm_send_req->base.data.buffer.ptr = convertor->pBaseBuf; - mxm_send_req->base.data.buffer.length = convertor->local_size; - mxm_send_req->base.data_type = MXM_REQ_DATA_BUFFER; - return OMPI_SUCCESS; - } -#endif - - opal_convertor_get_packed_size(convertor, buffer_len); - if (0 == *buffer_len) { - mxm_send_req->base.data.buffer.ptr = NULL; - mxm_send_req->base.data_type = MXM_REQ_DATA_BUFFER; - - return OMPI_SUCCESS; - } - - if (opal_convertor_need_buffers(convertor)) { - mxm_send_req->base.data_type = MXM_REQ_DATA_STREAM; - mxm_send_req->base.data.stream.length = *buffer_len; - mxm_send_req->base.data.stream.cb = stream_cb; - - return OMPI_SUCCESS; - } - - mxm_send_req->base.data_type = MXM_REQ_DATA_BUFFER; - - iov.iov_base = NULL; - iov.iov_len = *buffer_len; - - opal_convertor_pack(convertor, &iov, &iov_count, buffer_len); - mxm_send_req->base.data.buffer.ptr = iov.iov_base; - - return OMPI_SUCCESS; -} - -static void ompi_mtl_mxm_send_completion_cb(void *context) -{ - mca_mtl_mxm_request_t *mtl_mxm_request = context; - - ompi_mtl_mxm_to_mpi_status(mtl_mxm_request->mxm.base.error, - &mtl_mxm_request->super.ompi_req->req_status); - mtl_mxm_request->super.completion_callback(&mtl_mxm_request->super); -} - -static void ompi_mtl_mxm_send_progress_cb(void *user_data) -{ - opal_progress(); -} - -int 
ompi_mtl_mxm_send(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t* comm, int dest, int tag, - struct opal_convertor_t *convertor, - mca_pml_base_send_mode_t mode) -{ - mxm_send_req_t mxm_send_req; - mxm_wait_t wait; - mxm_error_t err; - int ret; - - /* prepare local send request */ - mxm_send_req.base.state = MXM_REQ_NEW; - mxm_send_req.base.mq = ompi_mtl_mxm_mq_lookup(comm); - mxm_send_req.base.conn = ompi_mtl_mxm_conn_lookup(comm, dest); - mxm_send_req.base.context = convertor; - mxm_send_req.base.completed_cb = NULL; - - ret = ompi_mtl_mxm_choose_send_datatype(&mxm_send_req, convertor, - ompi_mtl_mxm_stream_send); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return ret; - } - - mxm_send_req.base.data.buffer.memh = MXM_INVALID_MEM_HANDLE; - mxm_send_req.op.send.tag = tag; - mxm_send_req.op.send.imm_data = ompi_comm_rank(comm); - -#if MXM_API < MXM_VERSION(2,0) - mxm_send_req.base.flags = MXM_REQ_FLAG_BLOCKING; - mxm_send_req.opcode = MXM_REQ_OP_SEND; - if (mode == MCA_PML_BASE_SEND_SYNCHRONOUS) { - mxm_send_req.base.flags |= MXM_REQ_FLAG_SEND_SYNC; - } -#else - mxm_send_req.flags = MXM_REQ_SEND_FLAG_BLOCKING; - if (mode == MCA_PML_BASE_SEND_SYNCHRONOUS) { - mxm_send_req.opcode = MXM_REQ_OP_SEND_SYNC; - } else { - mxm_send_req.opcode = MXM_REQ_OP_SEND; - } -#endif - - /* post-send */ - err = mxm_req_send(&mxm_send_req); - if (MXM_OK != err) { - opal_show_help("help-mtl-mxm.txt", "error posting send", true, 0, mxm_error_string(err)); - return OMPI_ERROR; - } - - /* wait for request completion */ - wait.req = &mxm_send_req.base; - wait.state = MXM_REQ_COMPLETED; - wait.progress_cb = ompi_mtl_mxm_send_progress_cb; - wait.progress_arg = NULL; - mxm_wait(&wait); - - return OMPI_SUCCESS; -} - -int ompi_mtl_mxm_isend(struct mca_mtl_base_module_t* mtl, - struct ompi_communicator_t* comm, int dest, int tag, - struct opal_convertor_t *convertor, - mca_pml_base_send_mode_t mode, bool blocking, - mca_mtl_request_t * mtl_request) -{ - mca_mtl_mxm_request_t 
*mtl_mxm_request = (mca_mtl_mxm_request_t *) mtl_request; - mxm_send_req_t *mxm_send_req; - mxm_error_t err; - int ret; - - assert(mtl == &ompi_mtl_mxm.super); - - mtl_mxm_request->convertor = convertor; - - mxm_send_req = &mtl_mxm_request->mxm.send; -#if MXM_API >= MXM_VERSION(2,0) - mtl_mxm_request->is_send = 1; -#endif - - /* prepare a send request embedded in the MTL request */ - mxm_send_req->base.state = MXM_REQ_NEW; - mxm_send_req->base.mq = ompi_mtl_mxm_mq_lookup(comm); - mxm_send_req->base.conn = ompi_mtl_mxm_conn_lookup(comm, dest); - - ret = ompi_mtl_mxm_choose_send_datatype(mxm_send_req, convertor, - ompi_mtl_mxm_stream_isend); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return ret; - } - - mtl_mxm_request->buf = mxm_send_req->base.data.buffer.ptr; - mtl_mxm_request->length = mxm_send_req->base.data.buffer.length; - - mxm_send_req->base.data.buffer.memh = MXM_INVALID_MEM_HANDLE; - mxm_send_req->base.context = mtl_mxm_request; - mxm_send_req->base.completed_cb = ompi_mtl_mxm_send_completion_cb; - -#if MXM_API < MXM_VERSION(2,0) - mxm_send_req->base.flags = 0; - mxm_send_req->opcode = MXM_REQ_OP_SEND; - if (mode == MCA_PML_BASE_SEND_SYNCHRONOUS) { - mxm_send_req->base.flags |= MXM_REQ_FLAG_SEND_SYNC; - } -#else -#if defined(MXM_REQ_SEND_FLAG_REENTRANT) - mxm_send_req->flags = MXM_REQ_SEND_FLAG_REENTRANT; -#else - mxm_send_req->flags = 0; -#endif - if (mode == MCA_PML_BASE_SEND_SYNCHRONOUS) { - mxm_send_req->opcode = MXM_REQ_OP_SEND_SYNC; - } else { - mxm_send_req->opcode = MXM_REQ_OP_SEND; - } -#endif - mxm_send_req->op.send.tag = tag; - mxm_send_req->op.send.imm_data = ompi_comm_rank(comm); - - /* post-send */ - err = mxm_req_send(mxm_send_req); - if (MXM_OK != err) { - opal_show_help("help-mtl-mxm.txt", "error posting send", true, 1, mxm_error_string(err)); - return OMPI_ERROR; - } - - return OMPI_SUCCESS; -} diff --git a/ompi/mca/mtl/mxm/mtl_mxm_types.h b/ompi/mca/mtl/mxm/mtl_mxm_types.h deleted file mode 100644 index 6e5749c733c..00000000000 --- 
a/ompi/mca/mtl/mxm/mtl_mxm_types.h +++ /dev/null @@ -1,123 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MTL_MXM_TYPES_H_HAS_BEEN_INCLUDED -#define MTL_MXM_TYPES_H_HAS_BEEN_INCLUDED - -#include "ompi_config.h" -#include "mtl_mxm.h" - -#include "ompi/mca/mtl/mtl.h" -#include "ompi/mca/mtl/base/base.h" -#include "ompi/communicator/communicator.h" -#include "mtl_mxm_endpoint.h" - - -BEGIN_C_DECLS - -/** - * MTL Module Interface - */ -typedef struct mca_mtl_mxm_module_t { - mca_mtl_base_module_t super; /**< base MTL interface */ - int verbose; - int mxm_np; - mxm_h mxm_context; - mxm_ep_h ep; - mxm_context_opts_t *mxm_ctx_opts; - mxm_ep_opts_t *mxm_ep_opts; -#if MXM_API >= MXM_VERSION(2,0) - int using_mem_hooks; -#endif -#if MXM_API >= MXM_VERSION(3,1) - int bulk_connect; /* use bulk connect */ - int bulk_disconnect; /* use bulk disconnect */ -#endif - char* runtime_version; - char* compiletime_version; -} mca_mtl_mxm_module_t; - - -#if MXM_API < MXM_VERSION(2,0) -typedef struct ompi_mtl_mxm_ep_conn_info_t { - struct sockaddr_storage ptl_addr[MXM_PTL_LAST]; -} ompi_mtl_mxm_ep_conn_info_t; -#endif - -extern mca_mtl_mxm_module_t ompi_mtl_mxm; - -typedef struct mca_mtl_mxm_component_t { - mca_mtl_base_component_2_0_0_t super; /**< base MTL component */ - opal_free_list_t mxm_messages; /* will be used for MPI_Mprobe and MPI_Mrecv calls */ -} mca_mtl_mxm_component_t; - - -OMPI_DECLSPEC mca_mtl_mxm_component_t mca_mtl_mxm_component; - - -static inline mxm_conn_h ompi_mtl_mxm_conn_lookup(struct ompi_communicator_t* comm, int rank) { - ompi_proc_t* ompi_proc = ompi_comm_peer_lookup(comm, rank); - mca_mtl_mxm_endpoint_t *endpoint = (mca_mtl_mxm_endpoint_t*) 
ompi_proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]; - - if (endpoint != NULL) { - return endpoint->mxm_conn; - } - - MXM_VERBOSE(80, "First communication with [%s:%s]: set endpoint connection.", - ompi_proc->super.proc_hostname, OPAL_NAME_PRINT(ompi_proc->super.proc_name)); - ompi_mtl_add_single_proc(ompi_mtl, ompi_proc); - endpoint = (mca_mtl_mxm_endpoint_t*) ompi_proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]; - - return endpoint->mxm_conn; -} - -static inline mxm_mq_h ompi_mtl_mxm_mq_lookup(struct ompi_communicator_t* comm) { - return (mxm_mq_h)comm->c_pml_comm; -} - -static inline void ompi_mtl_mxm_to_mpi_status(mxm_error_t status, ompi_status_public_t *ompi_status) { - switch (status) { - case MXM_OK: - ompi_status->MPI_ERROR = OMPI_SUCCESS; - break; - case MXM_ERR_CANCELED: - ompi_status->_cancelled = true; - break; - case MXM_ERR_MESSAGE_TRUNCATED: - ompi_status->MPI_ERROR = MPI_ERR_TRUNCATE; - break; - default: - ompi_status->MPI_ERROR = MPI_ERR_INTERN; - break; - } -} - -static inline void ompi_mtl_mxm_set_recv_envelope(mxm_recv_req_t *req, - struct ompi_communicator_t *comm, - int src, int tag) { - req->base.mq = (mxm_mq_h)comm->c_pml_comm; - req->base.conn = (src == MPI_ANY_SOURCE) - ? NULL - : ompi_mtl_mxm_conn_lookup(comm, src); - if (tag == MPI_ANY_TAG) { - req->tag = 0; - req->tag_mask = 0x80000000U; /* MPI_ANY_TAG should not match against negative tags */ - } else { - req->tag = tag; - req->tag_mask = 0xffffffffU; - } -} - -END_C_DECLS - -#endif - diff --git a/ompi/mca/mtl/mxm/owner.txt b/ompi/mca/mtl/mxm/owner.txt deleted file mode 100644 index 8dacea65a6d..00000000000 --- a/ompi/mca/mtl/mxm/owner.txt +++ /dev/null @@ -1,7 +0,0 @@ -# -# owner/status file -# owner: institution that is responsible for this package -# status: e.g. 
active, maintenance, unmaintained -# -owner: MELLANOX -status: active diff --git a/ompi/mca/mtl/ofi/Makefile.am b/ompi/mca/mtl/ofi/Makefile.am index 7f81b4545fa..3fbb0fd52bf 100644 --- a/ompi/mca/mtl/ofi/Makefile.am +++ b/ompi/mca/mtl/ofi/Makefile.am @@ -2,6 +2,9 @@ # Copyright (c) 2013-2015 Intel, Inc. All rights reserved # # Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -11,7 +14,7 @@ EXTRA_DIST = post_configure.sh -AM_CPPFLAGS = $(ompi_mtl_ofi_CPPFLAGS) $(opal_common_libfabric_CPPFLAGS) +AM_CPPFLAGS = $(ompi_mtl_ofi_CPPFLAGS) $(opal_common_ofi_CPPFLAGS) dist_ompidata_DATA = help-mtl-ofi.txt @@ -43,8 +46,9 @@ mca_mtl_ofi_la_SOURCES = $(mtl_ofi_sources) mca_mtl_ofi_la_LDFLAGS = \ $(ompi_mtl_ofi_LDFLAGS) \ -module -avoid-version -mca_mtl_ofi_la_LIBADD = $(ompi_mtl_ofi_LIBS) \ - $(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib@OPAL_LIB_PREFIX@mca_common_libfabric.la +mca_mtl_ofi_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(ompi_mtl_ofi_LIBS) \ + $(OPAL_TOP_BUILDDIR)/opal/mca/common/ofi/lib@OPAL_LIB_PREFIX@mca_common_ofi.la noinst_LTLIBRARIES = $(component_noinst) libmca_mtl_ofi_la_SOURCES = $(mtl_ofi_sources) diff --git a/ompi/mca/mtl/ofi/configure.m4 b/ompi/mca/mtl/ofi/configure.m4 index 627298dcda6..772cd75cfa4 100644 --- a/ompi/mca/mtl/ofi/configure.m4 +++ b/ompi/mca/mtl/ofi/configure.m4 @@ -3,6 +3,8 @@ # Copyright (c) 2013-2014 Intel, Inc. All rights reserved # # Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -23,10 +25,10 @@ AC_DEFUN([MCA_ompi_mtl_ofi_POST_CONFIG], [ AC_DEFUN([MCA_ompi_mtl_ofi_CONFIG],[ AC_CONFIG_FILES([ompi/mca/mtl/ofi/Makefile]) - # ensure we already ran the common libfabric config - AC_REQUIRE([MCA_opal_common_libfabric_CONFIG]) + # ensure we already ran the common OFI/libfabric config + AC_REQUIRE([MCA_opal_common_ofi_CONFIG]) - AS_IF([test "$opal_common_libfabric_happy" = "yes"], + AS_IF([test "$opal_common_ofi_happy" = "yes"], [$1], [$2]) ])dnl diff --git a/ompi/mca/mtl/ofi/help-mtl-ofi.txt b/ompi/mca/mtl/ofi/help-mtl-ofi.txt index 2338d548f01..8131766ae00 100644 --- a/ompi/mca/mtl/ofi/help-mtl-ofi.txt +++ b/ompi/mca/mtl/ofi/help-mtl-ofi.txt @@ -1,10 +1,18 @@ # -*- text -*- # -# Copyright (c) 2013-2015 Intel, Inc. All rights reserved +# Copyright (c) 2013-2017 Intel, Inc. All rights reserved # +# Copyright (c) 2017 Cisco Systems, Inc. All rights reserved # $COPYRIGHT$ # # Additional copyrights may follow # # $HEADER$ # +[OFI call fail] +Open MPI failed an OFI Libfabric library call (%s). This is highly +unusual; your job may behave unpredictably (and/or abort) after this. + + Local host: %s + Location: %s:%d + Error: %s (%zd) diff --git a/ompi/mca/mtl/ofi/mtl_ofi.h b/ompi/mca/mtl/ofi/mtl_ofi.h index 1128aca3d26..f4c5f7c3f9a 100644 --- a/ompi/mca/mtl/ofi/mtl_ofi.h +++ b/ompi/mca/mtl/ofi/mtl_ofi.h @@ -1,5 +1,7 @@ /* - * Copyright (c) 2013-2016 Intel, Inc. All rights reserved + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. 
* * $COPYRIGHT$ * @@ -14,6 +16,7 @@ #include "ompi/mca/mtl/mtl.h" #include "ompi/mca/mtl/base/base.h" #include "opal/datatype/opal_convertor.h" +#include "opal/util/show_help.h" #include #include @@ -60,8 +63,7 @@ __opal_attribute_always_inline__ static inline int ompi_mtl_ofi_progress(void) { ssize_t ret; - int count = 0; - struct fi_cq_tagged_entry wc = { 0 }; + int count = 0, i, events_read; struct fi_cq_err_entry error = { 0 }; ompi_mtl_ofi_request_t *ofi_req = NULL; @@ -71,21 +73,26 @@ ompi_mtl_ofi_progress(void) * Call the request's callback. */ while (true) { - ret = fi_cq_read(ompi_mtl_ofi.cq, (void *)&wc, 1); + ret = fi_cq_read(ompi_mtl_ofi.cq, ompi_mtl_ofi.progress_entries, + ompi_mtl_ofi.ofi_progress_event_count); if (ret > 0) { - count++; - if (NULL != wc.op_context) { - ofi_req = TO_OFI_REQ(wc.op_context); - assert(ofi_req); - ret = ofi_req->event_callback(&wc, ofi_req); - if (OMPI_SUCCESS != ret) { - opal_output(ompi_mtl_base_framework.framework_output, - "Error returned by request event callback: %zd", - ret); - abort(); + count+= ret; + events_read = ret; + for (i = 0; i < events_read; i++) { + if (NULL != ompi_mtl_ofi.progress_entries[i].op_context) { + ofi_req = TO_OFI_REQ(ompi_mtl_ofi.progress_entries[i].op_context); + assert(ofi_req); + ret = ofi_req->event_callback(&ompi_mtl_ofi.progress_entries[i], ofi_req); + if (OMPI_SUCCESS != ret) { + opal_output(0, "%s:%d: Error returned by request event callback: %zd.\n" + "*** The Open MPI OFI MTL is aborting the MPI job (via exit(3)).\n", + __FILE__, __LINE__, ret); + fflush(stderr); + exit(1); + } } } - } else if (ret == -FI_EAVAIL) { + } else if (OPAL_UNLIKELY(ret == -FI_EAVAIL)) { /** * An error occured and is being reported via the CQ. * Read the error and forward it to the upper layer. 
@@ -94,9 +101,11 @@ ompi_mtl_ofi_progress(void) &error, 0); if (0 > ret) { - opal_output(ompi_mtl_base_framework.framework_output, - "Error returned from fi_cq_readerr: %zd", ret); - abort(); + opal_output(0, "%s:%d: Error returned from fi_cq_readerr: %s(%zd).\n" + "*** The Open MPI OFI MTL is aborting the MPI job (via exit(3)).\n", + __FILE__, __LINE__, fi_strerror(-ret), ret); + fflush(stderr); + exit(1); } assert(error.op_context); @@ -104,16 +113,22 @@ ompi_mtl_ofi_progress(void) assert(ofi_req); ret = ofi_req->error_callback(&error, ofi_req); if (OMPI_SUCCESS != ret) { - opal_output(ompi_mtl_base_framework.framework_output, - "Error returned by request error callback: %zd", - ret); - abort(); + opal_output(0, "%s:%d: Error returned by request error callback: %zd.\n" + "*** The Open MPI OFI MTL is aborting the MPI job (via exit(3)).\n", + __FILE__, __LINE__, ret); + fflush(stderr); + exit(1); } } else { - /** - * The CQ is empty. Return. - */ - break; + if (ret == -FI_EAGAIN || ret == -EINTR) { + break; + } else { + opal_output(0, "%s:%d: Error returned from fi_cq_read: %s(%zd).\n" + "*** The Open MPI OFI MTL is aborting the MPI job (via exit(3)).\n", + __FILE__, __LINE__, fi_strerror(-ret), ret); + fflush(stderr); + exit(1); + } } } return count; @@ -676,8 +691,8 @@ ompi_mtl_ofi_imrecv(struct mca_mtl_base_module_t *mtl, msg.desc = NULL; msg.iov_count = 1; msg.addr = 0; - msg.tag = 0; - msg.ignore = 0; + msg.tag = ofi_req->match_bits; + msg.ignore = ofi_req->mask_bits; msg.context = (void *)&ofi_req->ctx; msg.data = 0; @@ -858,6 +873,7 @@ ompi_mtl_ofi_improbe(struct mca_mtl_base_module_t *mtl, ofi_req->error_callback = ompi_mtl_ofi_probe_error_callback; ofi_req->completion_count = 1; ofi_req->match_state = 0; + ofi_req->mask_bits = mask_bits; MTL_OFI_RETRY_UNTIL_DONE(fi_trecvmsg(ompi_mtl_ofi.ep, &msg, msgflags)); if (-FI_ENOMSG == ret) { diff --git a/ompi/mca/mtl/ofi/mtl_ofi_component.c b/ompi/mca/mtl/ofi/mtl_ofi_component.c index 9ab7c1405d4..662fb38e796 
100644 --- a/ompi/mca/mtl/ofi/mtl_ofi_component.c +++ b/ompi/mca/mtl/ofi/mtl_ofi_component.c @@ -1,8 +1,8 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2013-2016 Intel, Inc. All rights reserved + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved * - * Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2014-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2015-2016 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ @@ -14,6 +14,7 @@ #include "mtl_ofi.h" #include "opal/util/argv.h" +#include "opal/util/show_help.h" static int ompi_mtl_ofi_component_open(void); static int ompi_mtl_ofi_component_query(mca_base_module_t **module, int *priority); @@ -38,18 +39,20 @@ static int av_type; enum { MTL_OFI_PROG_AUTO=1, MTL_OFI_PROG_MANUAL, - MTL_OFI_PROG_UNKNOWN, + MTL_OFI_PROG_UNSPEC, }; mca_base_var_enum_value_t control_prog_type[] = { {MTL_OFI_PROG_AUTO, "auto"}, {MTL_OFI_PROG_MANUAL, "manual"}, + {MTL_OFI_PROG_UNSPEC, "unspec"}, {0, NULL} }; mca_base_var_enum_value_t data_prog_type[] = { {MTL_OFI_PROG_AUTO, "auto"}, {MTL_OFI_PROG_MANUAL, "manual"}, + {MTL_OFI_PROG_UNSPEC, "unspec"}, {0, NULL} }; @@ -95,6 +98,7 @@ ompi_mtl_ofi_component_register(void) { int ret; mca_base_var_enum_t *new_enum = NULL; + char *desc; param_priority = 25; /* for now give a lower priority than the psm mtl */ mca_base_component_var_register(&mca_mtl_ofi_component.super.mtl_version, @@ -122,15 +126,27 @@ ompi_mtl_ofi_component_register(void) MCA_BASE_VAR_SCOPE_READONLY, &prov_exclude); + ompi_mtl_ofi.ofi_progress_event_count = 100; + asprintf(&desc, "Max number of events to read each call to OFI progress (default: %d events will be read per OFI progress call)", ompi_mtl_ofi.ofi_progress_event_count); + mca_base_component_var_register(&mca_mtl_ofi_component.super.mtl_version, + "progress_event_cnt", + desc, + MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, + OPAL_INFO_LVL_6, + 
MCA_BASE_VAR_SCOPE_READONLY, + &ompi_mtl_ofi.ofi_progress_event_count); + + free(desc); + ret = mca_base_var_enum_create ("control_prog_type", control_prog_type, &new_enum); if (OPAL_SUCCESS != ret) { return ret; } - control_progress = MTL_OFI_PROG_MANUAL; + control_progress = MTL_OFI_PROG_UNSPEC; mca_base_component_var_register (&mca_mtl_ofi_component.super.mtl_version, "control_progress", - "Specify control progress model (default: manual). Set to auto for auto progress.", + "Specify control progress model (default: unspecified, use provider's default). Set to auto or manual for auto or manual progress respectively.", MCA_BASE_VAR_TYPE_INT, new_enum, 0, 0, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_READONLY, @@ -142,10 +158,10 @@ ompi_mtl_ofi_component_register(void) return ret; } - data_progress = MTL_OFI_PROG_AUTO; + data_progress = MTL_OFI_PROG_UNSPEC; mca_base_component_var_register(&mca_mtl_ofi_component.super.mtl_version, "data_progress", - "Specify data progress model (default: auto). Set to manual for manual progress.", + "Specify data progress model (default: unspecified, use provider's default).
Set to auto or manual for auto or manual progress respectively.", MCA_BASE_VAR_TYPE_INT, new_enum, 0, 0, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_READONLY, @@ -231,7 +247,7 @@ is_in_list(char **list, char *item) } while (NULL != list[i]) { - if (0 == strncmp(item, list[i], strlen(item))) { + if (0 == strncmp(item, list[i], strlen(list[i]))) { return 1; } else { i++; @@ -325,16 +341,26 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, hints->domain_attr->threading = FI_THREAD_UNSPEC; - if (MTL_OFI_PROG_AUTO == control_progress) { - hints->domain_attr->control_progress = FI_PROGRESS_AUTO; - } else { + switch (control_progress) { + case MTL_OFI_PROG_AUTO: + hints->domain_attr->control_progress = FI_PROGRESS_AUTO; + break; + case MTL_OFI_PROG_MANUAL: hints->domain_attr->control_progress = FI_PROGRESS_MANUAL; + break; + default: + hints->domain_attr->control_progress = FI_PROGRESS_UNSPEC; } - if (MTL_OFI_PROG_MANUAL == data_progress) { + switch (data_progress) { + case MTL_OFI_PROG_AUTO: + hints->domain_attr->data_progress = FI_PROGRESS_AUTO; + break; + case MTL_OFI_PROG_MANUAL: hints->domain_attr->data_progress = FI_PROGRESS_MANUAL; - } else { - hints->domain_attr->data_progress = FI_PROGRESS_AUTO; + break; + default: + hints->domain_attr->data_progress = FI_PROGRESS_UNSPEC; } if (MTL_OFI_AV_TABLE == av_type) { @@ -361,12 +387,16 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, NULL, /* Optional name or fabric to resolve */ NULL, /* Optional service name or port to request */ 0ULL, /* Optional flag */ - hints, /* In: Hints to filter providers */ + hints, /* In: Hints to filter providers */ &providers); /* Out: List of matching providers */ - if (0 != ret) { - opal_output_verbose(1, ompi_mtl_base_framework.framework_output, - "%s:%d: fi_getinfo failed: %s\n", - __FILE__, __LINE__, fi_strerror(-ret)); + if (FI_ENODATA == -ret) { + // It is not an error if no information is returned. 
+ goto error; + } else if (0 != ret) { + opal_show_help("help-mtl-ofi.txt", "OFI call fail", true, + "fi_getinfo", + ompi_process_info.nodename, __FILE__, __LINE__, + fi_strerror(-ret), -ret); goto error; } @@ -392,9 +422,10 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, &ompi_mtl_ofi.fabric, /* Out: Fabric handle */ NULL); /* Optional context for fabric events */ if (0 != ret) { - opal_output_verbose(1, ompi_mtl_base_framework.framework_output, - "%s:%d: fi_fabric failed: %s\n", - __FILE__, __LINE__, fi_strerror(-ret)); + opal_show_help("help-mtl-ofi.txt", "OFI call fail", true, + "fi_fabric", + ompi_process_info.nodename, __FILE__, __LINE__, + fi_strerror(-ret), -ret); goto error; } @@ -408,9 +439,10 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, &ompi_mtl_ofi.domain, /* Out: Domain oject */ NULL); /* Optional context for domain events */ if (0 != ret) { - opal_output_verbose(1, ompi_mtl_base_framework.framework_output, - "%s:%d: fi_domain failed: %s\n", - __FILE__, __LINE__, fi_strerror(-ret)); + opal_show_help("help-mtl-ofi.txt", "OFI call fail", true, + "fi_domain", + ompi_process_info.nodename, __FILE__, __LINE__, + fi_strerror(-ret), -ret); goto error; } @@ -426,9 +458,10 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, &ompi_mtl_ofi.ep, /* Out: Endpoint object */ NULL); /* Optional context */ if (0 != ret) { - opal_output_verbose(1, ompi_mtl_base_framework.framework_output, - "%s:%d: fi_endpoint failed: %s\n", - __FILE__, __LINE__, fi_strerror(-ret)); + opal_show_help("help-mtl-ofi.txt", "OFI call fail", true, + "fi_endpoint", + ompi_process_info.nodename, __FILE__, __LINE__, + fi_strerror(-ret), -ret); goto error; } @@ -445,6 +478,19 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, * - dynamic memory-spanning memory region */ cq_attr.format = FI_CQ_FORMAT_TAGGED; + + /** + * If a user has set an ofi_progress_event_count > the default, then + * the CQ size hint is set to the user's desired value such that 
+ * the CQ created will have enough slots to store up to + * ofi_progress_event_count events. If a user has not set the + * ofi_progress_event_count, then the provider is trusted to set a + * default high CQ size and the CQ size hint is left unspecified. + */ + if (ompi_mtl_ofi.ofi_progress_event_count > 100) { + cq_attr.size = ompi_mtl_ofi.ofi_progress_event_count; + } + ret = fi_cq_open(ompi_mtl_ofi.domain, &cq_attr, &ompi_mtl_ofi.cq, NULL); if (ret) { opal_output_verbose(1, ompi_mtl_base_framework.framework_output, @@ -453,6 +499,17 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, goto error; } + /** + * Allocate memory for storing the CQ events read in OFI progress. + */ + ompi_mtl_ofi.progress_entries = calloc(ompi_mtl_ofi.ofi_progress_event_count, sizeof(struct fi_cq_tagged_entry)); + if (OPAL_UNLIKELY(!ompi_mtl_ofi.progress_entries)) { + opal_output_verbose(1, ompi_mtl_base_framework.framework_output, + "%s:%d: alloc of CQ event storage failed: %s\n", + __FILE__, __LINE__, strerror(errno)); + goto error; + } + /** * The remote fi_addr will be stored in the ofi_endpoint struct. 
*/ @@ -575,44 +632,52 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads, if (ompi_mtl_ofi.fabric) { (void) fi_close((fid_t)ompi_mtl_ofi.fabric); } + if (ompi_mtl_ofi.progress_entries) { + free(ompi_mtl_ofi.progress_entries); + } + return NULL; } int ompi_mtl_ofi_finalize(struct mca_mtl_base_module_t *mtl) { + ssize_t ret; + opal_progress_unregister(ompi_mtl_ofi_progress_no_inline); - /** - * * Close all the OFI objects - * */ - if (fi_close((fid_t)ompi_mtl_ofi.ep)) { - opal_output(ompi_mtl_base_framework.framework_output, - "fi_close failed: %s", strerror(errno)); - abort(); - } - if (fi_close((fid_t)ompi_mtl_ofi.cq)) { - opal_output(ompi_mtl_base_framework.framework_output, - "fi_close failed: %s", strerror(errno)); - abort(); - } - if (fi_close((fid_t)ompi_mtl_ofi.av)) { - opal_output(ompi_mtl_base_framework.framework_output, - "fi_close failed: %s", strerror(errno)); - abort(); - } - if (fi_close((fid_t)ompi_mtl_ofi.domain)) { - opal_output(ompi_mtl_base_framework.framework_output, - "fi_close failed: %s", strerror(errno)); - abort(); - } - if (fi_close((fid_t)ompi_mtl_ofi.fabric)) { - opal_output(ompi_mtl_base_framework.framework_output, - "fi_close failed: %s", strerror(errno)); - abort(); + /* Close all the OFI objects */ + if ((ret = fi_close((fid_t)ompi_mtl_ofi.ep))) { + goto finalize_err; } + if ((ret = fi_close((fid_t)ompi_mtl_ofi.cq))) { + goto finalize_err; + } + + if ((ret = fi_close((fid_t)ompi_mtl_ofi.av))) { + goto finalize_err; + } + + if ((ret = fi_close((fid_t)ompi_mtl_ofi.domain))) { + goto finalize_err; + } + + if ((ret = fi_close((fid_t)ompi_mtl_ofi.fabric))) { + goto finalize_err; + } + + free(ompi_mtl_ofi.progress_entries); + return OMPI_SUCCESS; + +finalize_err: + opal_show_help("help-mtl-ofi.txt", "OFI call fail", true, + "fi_close", + ompi_process_info.nodename, __FILE__, __LINE__, + fi_strerror(-ret), -ret); + + return OMPI_ERROR; } diff --git a/ompi/mca/mtl/ofi/mtl_ofi_request.h b/ompi/mca/mtl/ofi/mtl_ofi_request.h index 
5e2faad6456..15bbd2b0148 100644 --- a/ompi/mca/mtl/ofi/mtl_ofi_request.h +++ b/ompi/mca/mtl/ofi/mtl_ofi_request.h @@ -1,5 +1,7 @@ /* * Copyright (c) 2013-2016 Intel, Inc. All rights reserved + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. * * $COPYRIGHT$ * @@ -71,9 +73,12 @@ struct ompi_mtl_ofi_request_t { /** Flag to prevent MPI_Cancel from cancelling a started Recv request */ volatile bool req_started; - /** Request's tag used in case of an error. */ + /** Request's tag used in case of an error. Also for FI_CLAIM requests. */ uint64_t match_bits; + /** Used to build msg for fi_trecvmsg with FI_CLAIM */ + uint64_t mask_bits; + /** Remote OFI address used when a Recv needs to be ACKed */ fi_addr_t remote_addr; diff --git a/ompi/mca/mtl/ofi/mtl_ofi_types.h b/ompi/mca/mtl/ofi/mtl_ofi_types.h index 1b1bdb1e1c5..0b6a1fcc715 100644 --- a/ompi/mca/mtl/ofi/mtl_ofi_types.h +++ b/ompi/mca/mtl/ofi/mtl_ofi_types.h @@ -49,6 +49,12 @@ typedef struct mca_mtl_ofi_module_t { /** Maximum inject size */ size_t max_inject_size; + /** Maximum number of CQ events to read in OFI Progress */ + int ofi_progress_event_count; + + /** CQ event storage */ + struct fi_cq_tagged_entry *progress_entries; + } mca_mtl_ofi_module_t; extern mca_mtl_ofi_module_t ompi_mtl_ofi; diff --git a/ompi/mca/mtl/portals4/Makefile.am b/ompi/mca/mtl/portals4/Makefile.am index 1693ff435d7..df3f13a5586 100644 --- a/ompi/mca/mtl/portals4/Makefile.am +++ b/ompi/mca/mtl/portals4/Makefile.am @@ -12,6 +12,7 @@ # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2010-2012 Sandia National Laboratories. All rights reserved. # Copyright (c) 2014 Intel, Inc. All rights reserved +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -59,7 +60,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_mtl_portals4_la_SOURCES = $(local_sources) -mca_mtl_portals4_la_LIBADD = $(mtl_portals4_LIBS) +mca_mtl_portals4_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(mtl_portals4_LIBS) mca_mtl_portals4_la_LDFLAGS = -module -avoid-version $(mtl_portals4_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/mtl/portals4/mtl_portals4.c b/ompi/mca/mtl/portals4/mtl_portals4.c index 2d25c8db7dd..5371a8be4dc 100644 --- a/ompi/mca/mtl/portals4/mtl_portals4.c +++ b/ompi/mca/mtl/portals4/mtl_portals4.c @@ -548,8 +548,10 @@ ompi_mtl_portals4_del_procs(struct mca_mtl_base_module_t *mtl, int ompi_mtl_portals4_finalize(struct mca_mtl_base_module_t *mtl) { - opal_progress_unregister(ompi_mtl_portals4_progress); - while (0 != ompi_mtl_portals4_progress()) { } + if (0 == ompi_mtl_portals4.need_init) { + opal_progress_unregister(ompi_mtl_portals4_progress); + while (0 != ompi_mtl_portals4_progress()) { } + } #if OMPI_MTL_PORTALS4_FLOW_CONTROL ompi_mtl_portals4_flowctl_fini(); diff --git a/ompi/mca/mtl/portals4/mtl_portals4.h b/ompi/mca/mtl/portals4/mtl_portals4.h index 82975f6219d..52b21b9354d 100644 --- a/ompi/mca/mtl/portals4/mtl_portals4.h +++ b/ompi/mca/mtl/portals4/mtl_portals4.h @@ -71,6 +71,10 @@ struct mca_mtl_portals4_module_t { /* free list of message for matched probe */ opal_free_list_t fl_message; + /* free list of rendezvous get fragments */ + opal_free_list_t fl_rndv_get_frag; + int get_retransmit_timeout; + /** Network interface handle for matched interface */ ptl_handle_ni_t ni_h; /** Limit given by portals after NIInit */ diff --git a/ompi/mca/mtl/portals4/mtl_portals4_component.c b/ompi/mca/mtl/portals4/mtl_portals4_component.c index 3509efa03be..915e3e2fc74 100644 --- a/ompi/mca/mtl/portals4/mtl_portals4_component.c +++ b/ompi/mca/mtl/portals4/mtl_portals4_component.c @@ -75,6 
+75,10 @@ static mca_base_var_enum_value_t long_protocol_values[] = { {0, NULL} }; +OBJ_CLASS_INSTANCE(ompi_mtl_portals4_rndv_get_frag_t, + opal_free_list_item_t, + NULL, NULL); + static int ompi_mtl_portals4_component_register(void) { @@ -198,6 +202,16 @@ ompi_mtl_portals4_component_register(void) MCA_BASE_VAR_SCOPE_READONLY, &ompi_mtl_portals4.max_msg_size_mtl); + ompi_mtl_portals4.get_retransmit_timeout=10000; + (void) mca_base_component_var_register(&mca_mtl_portals4_component.mtl_version, + "get_retransmit_timeout", + "PtlGET retransmission timeout in usec", + MCA_BASE_VAR_TYPE_INT, + NULL, 0, 0, + OPAL_INFO_LVL_5, + MCA_BASE_VAR_SCOPE_READONLY, + &ompi_mtl_portals4.get_retransmit_timeout); + OBJ_RELEASE(new_enum); if (0 > ret) { return OMPI_ERR_NOT_SUPPORTED; @@ -251,6 +265,13 @@ ompi_mtl_portals4_component_open(void) OBJ_CLASS(ompi_mtl_portals4_message_t), 0, 0, 1, -1, 1, NULL, 0, NULL, NULL, NULL); + OBJ_CONSTRUCT(&ompi_mtl_portals4.fl_rndv_get_frag, opal_free_list_t); + opal_free_list_init(&ompi_mtl_portals4.fl_rndv_get_frag, + sizeof(ompi_mtl_portals4_rndv_get_frag_t), + opal_cache_line_size, + OBJ_CLASS(ompi_mtl_portals4_rndv_get_frag_t), + 0, 0, 1, -1, 1, NULL, 0, NULL, NULL, NULL); + ompi_mtl_portals4.ni_h = PTL_INVALID_HANDLE; ompi_mtl_portals4.send_eq_h = PTL_INVALID_HANDLE; ompi_mtl_portals4.recv_eq_h = PTL_INVALID_HANDLE; @@ -478,6 +499,7 @@ ompi_mtl_portals4_progress(void) unsigned int which; ptl_event_t ev; ompi_mtl_portals4_base_request_t *ptl_request; + ompi_mtl_portals4_rndv_get_frag_t *rndv_get_frag; while (true) { ret = PtlEQPoll(ompi_mtl_portals4.eqs_h, 2, 0, &ev, &which); @@ -489,7 +511,6 @@ ompi_mtl_portals4_progress(void) case PTL_EVENT_GET: case PTL_EVENT_PUT: case PTL_EVENT_PUT_OVERFLOW: - case PTL_EVENT_REPLY: case PTL_EVENT_SEND: case PTL_EVENT_ACK: case PTL_EVENT_AUTO_FREE: @@ -507,6 +528,18 @@ ompi_mtl_portals4_progress(void) } break; + case PTL_EVENT_REPLY: + if (NULL != ev.user_ptr) { + rndv_get_frag = ev.user_ptr; + ret = 
rndv_get_frag->event_callback(&ev, rndv_get_frag); + if (OMPI_SUCCESS != ret) { + opal_output(ompi_mtl_base_framework.framework_output, + "Error returned from target event callback: %d", ret); + abort(); + } + } + break; + case PTL_EVENT_PT_DISABLED: #if OMPI_MTL_PORTALS4_FLOW_CONTROL OPAL_OUTPUT_VERBOSE((10, ompi_mtl_base_framework.framework_output, diff --git a/ompi/mca/mtl/portals4/mtl_portals4_flowctl.c b/ompi/mca/mtl/portals4/mtl_portals4_flowctl.c index ee9d055d8ac..19d3b600b36 100644 --- a/ompi/mca/mtl/portals4/mtl_portals4_flowctl.c +++ b/ompi/mca/mtl/portals4/mtl_portals4_flowctl.c @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -296,9 +296,10 @@ ompi_mtl_portals4_flowctl_add_procs(size_t me, int ompi_mtl_portals4_flowctl_trigger(void) { + int32_t _tmp_value = 0; int ret; - if (true == OPAL_ATOMIC_CMPSET_32(&ompi_mtl_portals4.flowctl.flowctl_active, false, true)) { + if (true == OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_32(&ompi_mtl_portals4.flowctl.flowctl_active, &_tmp_value, 1)) { /* send trigger to root */ ret = PtlPut(ompi_mtl_portals4.zero_md_h, 0, @@ -346,7 +347,7 @@ start_recover(void) int64_t epoch_counter; ompi_mtl_portals4.flowctl.flowctl_active = true; - epoch_counter = opal_atomic_add_64(&ompi_mtl_portals4.flowctl.epoch_counter, 1); + epoch_counter = opal_atomic_add_fetch_64(&ompi_mtl_portals4.flowctl.epoch_counter, 1); opal_output_verbose(1, ompi_mtl_base_framework.framework_output, "Entering flowctl_start_recover %ld", diff --git a/ompi/mca/mtl/portals4/mtl_portals4_recv.c b/ompi/mca/mtl/portals4/mtl_portals4_recv.c index 387aa53be02..f2737428e26 100644 --- a/ompi/mca/mtl/portals4/mtl_portals4_recv.c +++ b/ompi/mca/mtl/portals4/mtl_portals4_recv.c @@ -27,6 +27,7 @@ 
#include "ompi/mca/mtl/base/base.h" #include "ompi/mca/mtl/base/mtl_base_datatype.h" #include "ompi/message/message.h" +#include "opal/mca/timer/base/base.h" #include "mtl_portals4.h" #include "mtl_portals4_endpoint.h" @@ -34,45 +35,74 @@ #include "mtl_portals4_recv_short.h" #include "mtl_portals4_message.h" + +static int +ompi_mtl_portals4_recv_progress(ptl_event_t *ev, + ompi_mtl_portals4_base_request_t* ptl_base_request); +static int +ompi_mtl_portals4_rndv_get_frag_progress(ptl_event_t *ev, + ompi_mtl_portals4_rndv_get_frag_t* rndv_get_frag); + static int read_msg(void *start, ptl_size_t length, ptl_process_t target, ptl_match_bits_t match_bits, ptl_size_t remote_offset, ompi_mtl_portals4_recv_request_t *request) { int ret, i; - ptl_size_t rest = length, asked = 0, frag_size; - int32_t pending_reply; + ptl_size_t rest = length, asked = 0; + int32_t frag_count; #if OMPI_MTL_PORTALS4_FLOW_CONTROL - while (OPAL_UNLIKELY(OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, -1) < 0)) { - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + while (OPAL_UNLIKELY(OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, -1) < 0)) { + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); ompi_mtl_portals4_progress(); } #endif - request->pending_reply = (length + ompi_mtl_portals4.max_msg_size_mtl - 1) / ompi_mtl_portals4.max_msg_size_mtl; - pending_reply = request->pending_reply; + frag_count = (length + ompi_mtl_portals4.max_msg_size_mtl - 1) / ompi_mtl_portals4.max_msg_size_mtl; + ret = OPAL_THREAD_ADD_FETCH32(&(request->pending_reply), frag_count); + + for (i = 0 ; i < frag_count ; i++) { + opal_free_list_item_t *tmp; + ompi_mtl_portals4_rndv_get_frag_t* frag; + + tmp = opal_free_list_get (&ompi_mtl_portals4.fl_rndv_get_frag); + if (NULL == tmp) return OMPI_ERR_OUT_OF_RESOURCE; + + frag = (ompi_mtl_portals4_rndv_get_frag_t*) tmp; + + frag->request = request; +#if OPAL_ENABLE_DEBUG + frag->frag_num = i; +#endif + frag->frag_start = 
(char*)start + i * ompi_mtl_portals4.max_msg_size_mtl; + frag->frag_length = (OPAL_UNLIKELY(rest > ompi_mtl_portals4.max_msg_size_mtl)) ? ompi_mtl_portals4.max_msg_size_mtl : rest; + frag->frag_target = target; + frag->frag_match_bits = match_bits; + frag->frag_remote_offset = remote_offset + i * ompi_mtl_portals4.max_msg_size_mtl; + + frag->event_callback = ompi_mtl_portals4_rndv_get_frag_progress; + frag->frag_abs_timeout_usec = 0; + + OPAL_OUTPUT_VERBOSE((90, ompi_mtl_base_framework.framework_output, "GET (fragment %d/%d, size %ld) send", + i + 1, frag_count, frag->frag_length)); - for (i = 0 ; i < pending_reply ; i++) { - OPAL_OUTPUT_VERBOSE((90, ompi_mtl_base_framework.framework_output, "GET (fragment %d/%d) send", - i + 1, pending_reply)); - frag_size = (OPAL_UNLIKELY(rest > ompi_mtl_portals4.max_msg_size_mtl)) ? ompi_mtl_portals4.max_msg_size_mtl : rest; ret = PtlGet(ompi_mtl_portals4.send_md_h, - (ptl_size_t) start + i * ompi_mtl_portals4.max_msg_size_mtl, - frag_size, - target, + (ptl_size_t) frag->frag_start, + frag->frag_length, + frag->frag_target, ompi_mtl_portals4.read_idx, - match_bits, - remote_offset + i * ompi_mtl_portals4.max_msg_size_mtl, - request); + frag->frag_match_bits, + frag->frag_remote_offset, + frag); if (OPAL_UNLIKELY(PTL_OK != ret)) { opal_output_verbose(1, ompi_mtl_base_framework.framework_output, "%s:%d: PtlGet failed: %d", __FILE__, __LINE__, ret); return OMPI_ERR_OUT_OF_RESOURCE; } - rest -= frag_size; - asked += frag_size; + rest -= frag->frag_length; + asked += frag->frag_length; } return OMPI_SUCCESS; @@ -134,9 +164,8 @@ ompi_mtl_portals4_recv_progress(ptl_event_t *ev, /* If it's not a short message and we're doing rndv and the message is not complete, we only have the first part of the message. Issue the get to pull the second part of the message. */ - ret = read_msg((char*) ptl_request->delivery_ptr + ev->mlength, - ((msg_length > ptl_request->delivery_len) ? 
- ptl_request->delivery_len : msg_length) - ev->mlength, + ret = read_msg((char*)ptl_request->delivery_ptr + ev->mlength, + ((msg_length > ptl_request->delivery_len) ? ptl_request->delivery_len : msg_length) - ev->mlength, ev->initiator, ev->hdr_data, ev->mlength, @@ -165,54 +194,6 @@ ompi_mtl_portals4_recv_progress(ptl_event_t *ev, } break; - case PTL_EVENT_REPLY: - OPAL_OUTPUT_VERBOSE((50, ompi_mtl_base_framework.framework_output, - "Recv %lu (0x%lx) got reply event", - ptl_request->opcount, ptl_request->hdr_data)); - - if (OPAL_UNLIKELY(ev->ni_fail_type != PTL_NI_OK)) { - opal_output_verbose(1, ompi_mtl_base_framework.framework_output, - "%s:%d: PTL_EVENT_REPLY with ni_fail_type: %d", - __FILE__, __LINE__, ev->ni_fail_type); - ret = PTL_FAIL; - goto callback_error; - } - - /* set the received length in the status, now that we know - exactly how much data was sent. */ - ptl_request->super.super.ompi_req->req_status._ucount += ev->mlength; - - ret = OPAL_THREAD_ADD32(&(ptl_request->pending_reply), -1); - if (ret > 0) { - return OMPI_SUCCESS; - } - assert(ptl_request->pending_reply == 0); - -#if OMPI_MTL_PORTALS4_FLOW_CONTROL - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); -#endif - - /* make sure the data is in the right place. Use _ucount for - the total length because it will be set correctly for all - three protocols. mlength is only correct for eager, and - delivery_len is the length of the buffer, not the length of - the send. 
*/ - ret = ompi_mtl_datatype_unpack(ptl_request->convertor, - ptl_request->delivery_ptr, - ptl_request->super.super.ompi_req->req_status._ucount); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - opal_output_verbose(1, ompi_mtl_base_framework.framework_output, - "%s:%d: ompi_mtl_datatype_unpack failed: %d", - __FILE__, __LINE__, ret); - ptl_request->super.super.ompi_req->req_status.MPI_ERROR = ret; - } - - OPAL_OUTPUT_VERBOSE((50, ompi_mtl_base_framework.framework_output, - "Recv %lu (0x%lx) completed , reply (pending_reply: %d)", - ptl_request->opcount, ptl_request->hdr_data, ptl_request->pending_reply)); - ptl_request->super.super.completion_callback(&ptl_request->super.super); - break; - case PTL_EVENT_PUT_OVERFLOW: OPAL_OUTPUT_VERBOSE((50, ompi_mtl_base_framework.framework_output, "Recv %lu (0x%lx) got put_overflow event", @@ -301,9 +282,8 @@ ompi_mtl_portals4_recv_progress(ptl_event_t *ev, /* For long messages in the overflow list, ev->mlength = 0 */ ptl_request->super.super.ompi_req->req_status._ucount = 0; - ret = read_msg((char*) ptl_request->delivery_ptr, - (msg_length > ptl_request->delivery_len) ? - ptl_request->delivery_len : msg_length, + ret = read_msg((char*)ptl_request->delivery_ptr, + (msg_length > ptl_request->delivery_len) ? 
ptl_request->delivery_len : msg_length, ev->initiator, ev->hdr_data, 0, @@ -336,6 +316,115 @@ ompi_mtl_portals4_recv_progress(ptl_event_t *ev, } +static int +ompi_mtl_portals4_rndv_get_frag_progress(ptl_event_t *ev, + ompi_mtl_portals4_rndv_get_frag_t* rndv_get_frag) +{ + int ret; + ompi_mtl_portals4_recv_request_t* ptl_request = + (ompi_mtl_portals4_recv_request_t*) rndv_get_frag->request; + + assert(PTL_EVENT_REPLY == ev->type); + + OPAL_OUTPUT_VERBOSE((50, ompi_mtl_base_framework.framework_output, + "Recv %lu (0x%lx) got reply event", + ptl_request->opcount, ptl_request->hdr_data)); + + + if (OPAL_UNLIKELY(ev->ni_fail_type != PTL_NI_OK)) { + opal_output_verbose(1, ompi_mtl_base_framework.framework_output, + "%s:%d: PTL_EVENT_REPLY with ni_fail_type: %d", + __FILE__, __LINE__, ev->ni_fail_type); + + if (OPAL_UNLIKELY(ev->ni_fail_type != PTL_NI_DROPPED)) { + opal_output_verbose(1, ompi_mtl_base_framework.framework_output, + "PTL_EVENT_REPLY with ni_fail_type: %u => cannot retry", + (uint32_t)ev->ni_fail_type); + ret = PTL_FAIL; + goto callback_error; + } + + if (0 == rndv_get_frag->frag_abs_timeout_usec) { + /* this is the first retry of the frag. start the timer. */ + /* instead of recording the start time, record the end time + * and avoid addition on each retry. 
*/ + rndv_get_frag->frag_abs_timeout_usec = opal_timer_base_get_usec() + ompi_mtl_portals4.get_retransmit_timeout; + opal_output_verbose(1, ompi_mtl_base_framework.framework_output, + "setting frag timeout at %lu", + rndv_get_frag->frag_abs_timeout_usec); + } else if (opal_timer_base_get_usec() >= rndv_get_frag->frag_abs_timeout_usec) { + opal_output_verbose(1, ompi_mtl_base_framework.framework_output, + "timeout retrying GET"); + ret = PTL_FAIL; + goto callback_error; + } + + OPAL_OUTPUT_VERBOSE((50, ompi_mtl_base_framework.framework_output, + "Rendezvous Get Failed: Reissuing frag #%u", rndv_get_frag->frag_num)); + + ret = PtlGet(ompi_mtl_portals4.send_md_h, + (ptl_size_t) rndv_get_frag->frag_start, + rndv_get_frag->frag_length, + rndv_get_frag->frag_target, + ompi_mtl_portals4.read_idx, + rndv_get_frag->frag_match_bits, + rndv_get_frag->frag_remote_offset, + rndv_get_frag); + if (OPAL_UNLIKELY(PTL_OK != ret)) { + if (NULL != ptl_request->buffer_ptr) free(ptl_request->buffer_ptr); + goto callback_error; + } + return OMPI_SUCCESS; + } + + /* set the received length in the status, now that we know + exactly how much data was sent. */ + ptl_request->super.super.ompi_req->req_status._ucount += ev->mlength; + + /* this frag is complete. return to freelist. */ + opal_free_list_return (&ompi_mtl_portals4.fl_rndv_get_frag, + &rndv_get_frag->super); + + ret = OPAL_THREAD_ADD_FETCH32(&(ptl_request->pending_reply), -1); + if (ret > 0) { + return OMPI_SUCCESS; + } + assert(ptl_request->pending_reply == 0); + +#if OMPI_MTL_PORTALS4_FLOW_CONTROL + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); +#endif + + /* make sure the data is in the right place. Use _ucount for + the total length because it will be set correctly for all + three protocols. mlength is only correct for eager, and + delivery_len is the length of the buffer, not the length of + the send. 
*/ + ret = ompi_mtl_datatype_unpack(ptl_request->convertor, + ptl_request->delivery_ptr, + ptl_request->super.super.ompi_req->req_status._ucount); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + opal_output_verbose(1, ompi_mtl_base_framework.framework_output, + "%s:%d: ompi_mtl_datatype_unpack failed: %d", + __FILE__, __LINE__, ret); + ptl_request->super.super.ompi_req->req_status.MPI_ERROR = ret; + } + + OPAL_OUTPUT_VERBOSE((50, ompi_mtl_base_framework.framework_output, + "Recv %lu (0x%lx) completed , reply (pending_reply: %d)", + ptl_request->opcount, ptl_request->hdr_data, ptl_request->pending_reply)); + ptl_request->super.super.completion_callback(&ptl_request->super.super); + + return OMPI_SUCCESS; + + callback_error: + ptl_request->super.super.ompi_req->req_status.MPI_ERROR = + ompi_mtl_portals4_get_error(ret); + ptl_request->super.super.completion_callback(&ptl_request->super.super); + return OMPI_SUCCESS; +} + + int ompi_mtl_portals4_irecv(struct mca_mtl_base_module_t* mtl, struct ompi_communicator_t *comm, @@ -379,7 +468,7 @@ ompi_mtl_portals4_irecv(struct mca_mtl_base_module_t* mtl, ptl_request->super.type = portals4_req_recv; ptl_request->super.event_callback = ompi_mtl_portals4_recv_progress; #if OPAL_ENABLE_DEBUG - ptl_request->opcount = OPAL_THREAD_ADD64((int64_t*) &ompi_mtl_portals4.recv_opcount, 1); + ptl_request->opcount = OPAL_THREAD_ADD_FETCH64((int64_t*) &ompi_mtl_portals4.recv_opcount, 1); ptl_request->hdr_data = 0; #endif ptl_request->buffer_ptr = (free_after) ? 
start : NULL; @@ -460,7 +549,7 @@ ompi_mtl_portals4_imrecv(struct mca_mtl_base_module_t* mtl, } #if OPAL_ENABLE_DEBUG - ptl_request->opcount = OPAL_THREAD_ADD64((int64_t*) &ompi_mtl_portals4.recv_opcount, 1); + ptl_request->opcount = OPAL_THREAD_ADD_FETCH64((int64_t*) &ompi_mtl_portals4.recv_opcount, 1); ptl_request->hdr_data = 0; #endif ptl_request->super.type = portals4_req_recv; diff --git a/ompi/mca/mtl/portals4/mtl_portals4_request.h b/ompi/mca/mtl/portals4/mtl_portals4_request.h index e187bce765e..c7e3c31e47a 100644 --- a/ompi/mca/mtl/portals4/mtl_portals4_request.h +++ b/ompi/mca/mtl/portals4/mtl_portals4_request.h @@ -22,6 +22,7 @@ #include "opal/datatype/opal_convertor.h" #include "ompi/mca/mtl/mtl.h" +#include "opal/mca/timer/base/base.h" struct ompi_mtl_portals4_message_t; struct ompi_mtl_portals4_pending_request_t; @@ -83,6 +84,28 @@ struct ompi_mtl_portals4_recv_request_t { }; typedef struct ompi_mtl_portals4_recv_request_t ompi_mtl_portals4_recv_request_t; +struct ompi_mtl_portals4_rndv_get_frag_t { + opal_free_list_item_t super; + /* the recv request that's composed of these frags */ + ompi_mtl_portals4_recv_request_t *request; + /* info extracted from the put_overflow event that is required to retry the rndv-get */ + void *frag_start; + ptl_size_t frag_length; + ptl_process_t frag_target; + ptl_hdr_data_t frag_match_bits; + ptl_size_t frag_remote_offset; + /* the absolute time at which this frag times out */ + opal_timer_t frag_abs_timeout_usec; + + int (*event_callback)(ptl_event_t *ev, struct ompi_mtl_portals4_rndv_get_frag_t*); + +#if OPAL_ENABLE_DEBUG + uint32_t frag_num; +#endif +}; +typedef struct ompi_mtl_portals4_rndv_get_frag_t ompi_mtl_portals4_rndv_get_frag_t; +OBJ_CLASS_DECLARATION(ompi_mtl_portals4_rndv_get_frag_t); + struct ompi_mtl_portals4_recv_short_request_t { ompi_mtl_portals4_base_request_t super; diff --git a/ompi/mca/mtl/portals4/mtl_portals4_send.c b/ompi/mca/mtl/portals4/mtl_portals4_send.c index 6393b9a465b..27291eed559 
100644 --- a/ompi/mca/mtl/portals4/mtl_portals4_send.c +++ b/ompi/mca/mtl/portals4/mtl_portals4_send.c @@ -45,7 +45,7 @@ ompi_mtl_portals4_callback(ptl_event_t *ev, (ompi_mtl_portals4_isend_request_t*) ptl_base_request; if (PTL_EVENT_GET == ev->type) { - ret = OPAL_THREAD_ADD32(&(ptl_request->pending_get), -1); + ret = OPAL_THREAD_ADD_FETCH32(&(ptl_request->pending_get), -1); if (ret > 0) { /* wait for other gets */ OPAL_OUTPUT_VERBOSE((90, ompi_mtl_base_framework.framework_output, "PTL_EVENT_GET received now pending_get=%d",ret)); @@ -94,7 +94,7 @@ ompi_mtl_portals4_callback(ptl_event_t *ev, opal_list_append(&ompi_mtl_portals4.flowctl.pending_sends, &pending->super.super); - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); ompi_mtl_portals4_flowctl_trigger(); return OMPI_SUCCESS; @@ -124,7 +124,7 @@ ompi_mtl_portals4_callback(ptl_event_t *ev, if ((eager == ompi_mtl_portals4.protocol) || (ptl_request->length % ompi_mtl_portals4.max_msg_size_mtl <= ompi_mtl_portals4.eager_limit)) { - val = OPAL_THREAD_ADD32(&(ptl_request->pending_get), -1); + val = OPAL_THREAD_ADD_FETCH32(&(ptl_request->pending_get), -1); } if (0 == val) { add = 2; /* We haven't to wait for any get, so we have to add an extra count to cause the message to complete */ @@ -161,7 +161,7 @@ ompi_mtl_portals4_callback(ptl_event_t *ev, ptl_request->me_h = PTL_INVALID_HANDLE; add++; } - val = OPAL_THREAD_ADD32((int32_t*)&ptl_request->event_count, add); + val = OPAL_THREAD_ADD_FETCH32((int32_t*)&ptl_request->event_count, add); assert(val <= 3); if (val == 3) { @@ -174,7 +174,7 @@ ompi_mtl_portals4_callback(ptl_event_t *ev, *complete = true; #if OMPI_MTL_PORTALS4_FLOW_CONTROL - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); opal_free_list_return (&ompi_mtl_portals4.flowctl.pending_fl, &ptl_request->pending->super); @@ -422,15 +422,15 @@ 
ompi_mtl_portals4_pending_list_progress() while ((!ompi_mtl_portals4.flowctl.flowctl_active) && (0 != opal_list_get_size(&ompi_mtl_portals4.flowctl.pending_sends))) { - val = OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, -1); + val = OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, -1); if (val < 0) { - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); return; } item = opal_list_remove_first(&ompi_mtl_portals4.flowctl.pending_sends); if (OPAL_UNLIKELY(NULL == item)) { - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); return; } @@ -456,7 +456,7 @@ ompi_mtl_portals4_pending_list_progress() if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { opal_list_prepend(&ompi_mtl_portals4.flowctl.pending_sends, &pending->super.super); - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); } } } @@ -492,7 +492,7 @@ ompi_mtl_portals4_send_start(struct mca_mtl_base_module_t* mtl, ret = ompi_mtl_datatype_pack(convertor, &start, &length, &free_after); if (OMPI_SUCCESS != ret) return ret; - ptl_request->opcount = OPAL_THREAD_ADD64((int64_t*)&ompi_mtl_portals4.opcount, 1); + ptl_request->opcount = OPAL_THREAD_ADD_FETCH64((int64_t*)&ompi_mtl_portals4.opcount, 1); ptl_request->buffer_ptr = (free_after) ? 
start : NULL; ptl_request->length = length; ptl_request->event_count = 0; @@ -520,15 +520,15 @@ ompi_mtl_portals4_send_start(struct mca_mtl_base_module_t* mtl, pending->ptl_proc = ptl_proc; pending->ptl_request = ptl_request; - if (OPAL_UNLIKELY(OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, -1) < 0)) { - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + if (OPAL_UNLIKELY(OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, -1) < 0)) { + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); opal_list_append(&ompi_mtl_portals4.flowctl.pending_sends, &pending->super.super); return OMPI_SUCCESS; } if (OPAL_UNLIKELY(0 != opal_list_get_size(&ompi_mtl_portals4.flowctl.pending_sends))) { - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); opal_list_append(&ompi_mtl_portals4.flowctl.pending_sends, &pending->super.super); ompi_mtl_portals4_pending_list_progress(); @@ -536,7 +536,7 @@ ompi_mtl_portals4_send_start(struct mca_mtl_base_module_t* mtl, } if (OPAL_UNLIKELY(ompi_mtl_portals4.flowctl.flowctl_active)) { - OPAL_THREAD_ADD32(&ompi_mtl_portals4.flowctl.send_slots, 1); + OPAL_THREAD_ADD_FETCH32(&ompi_mtl_portals4.flowctl.send_slots, 1); opal_list_append(&ompi_mtl_portals4.flowctl.pending_sends, &pending->super.super); return OMPI_SUCCESS; diff --git a/ompi/mca/mtl/psm/Makefile.am b/ompi/mca/mtl/psm/Makefile.am index 816309f753b..6ebbb895dda 100644 --- a/ompi/mca/mtl/psm/Makefile.am +++ b/ompi/mca/mtl/psm/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2006 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -51,7 +52,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_mtl_psm_la_SOURCES = $(mtl_psm_sources) -mca_mtl_psm_la_LIBADD = $(mtl_psm_LIBS) +mca_mtl_psm_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(mtl_psm_LIBS) mca_mtl_psm_la_LDFLAGS = -module -avoid-version $(mtl_psm_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/mtl/psm/help-mtl-psm.txt b/ompi/mca/mtl/psm/help-mtl-psm.txt index 9572b48ca47..8fe48cb2313 100644 --- a/ompi/mca/mtl/psm/help-mtl-psm.txt +++ b/ompi/mca/mtl/psm/help-mtl-psm.txt @@ -37,7 +37,10 @@ Unable to post application receive buffer (psm_mq_irecv). Error: %s Buffer: %p - Length: %d + Length: %llu # [path query mechanism unknown] Unknown path record query mechanism %s. Supported mechanisms are %s. +# +[message too big] +Message size %llu bigger than supported by PSM API. Max = %llu diff --git a/ompi/mca/mtl/psm/mtl_psm_recv.c b/ompi/mca/mtl/psm/mtl_psm_recv.c index b345ae19aa9..acf5137ab1d 100644 --- a/ompi/mca/mtl/psm/mtl_psm_recv.c +++ b/ompi/mca/mtl/psm/mtl_psm_recv.c @@ -50,6 +50,13 @@ ompi_mtl_psm_irecv(struct mca_mtl_base_module_t* mtl, if (OMPI_SUCCESS != ret) return ret; + if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } + mtl_psm_request->length = length; mtl_psm_request->convertor = convertor; mtl_psm_request->type = OMPI_MTL_PSM_IRECV; diff --git a/ompi/mca/mtl/psm/mtl_psm_send.c b/ompi/mca/mtl/psm/mtl_psm_send.c index c30801b1fbd..8f2e95a956b 100644 --- a/ompi/mca/mtl/psm/mtl_psm_send.c +++ b/ompi/mca/mtl/psm/mtl_psm_send.c @@ -24,6 +24,7 @@ #include "ompi/mca/pml/pml.h" #include "ompi/communicator/communicator.h" #include "opal/datatype/opal_convertor.h" +#include "opal/util/show_help.h" #include "mtl_psm.h" #include "mtl_psm_types.h" @@ -56,13 +57,19 @@ 
ompi_mtl_psm_send(struct mca_mtl_base_module_t* mtl, &length, &mtl_psm_request.free_after); + if (OMPI_SUCCESS != ret) return ret; + + if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } mtl_psm_request.length = length; mtl_psm_request.convertor = convertor; mtl_psm_request.type = OMPI_MTL_PSM_ISEND; - if (OMPI_SUCCESS != ret) return ret; - if (mode == MCA_PML_BASE_SEND_SYNCHRONOUS) flags |= PSM_MQ_FLAG_SENDSYNC; @@ -109,12 +116,20 @@ ompi_mtl_psm_isend(struct mca_mtl_base_module_t* mtl, &length, &mtl_psm_request->free_after); + + if (OMPI_SUCCESS != ret) return ret; + + if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } + mtl_psm_request->length= length; mtl_psm_request->convertor = convertor; mtl_psm_request->type = OMPI_MTL_PSM_ISEND; - if (OMPI_SUCCESS != ret) return ret; - if (mode == MCA_PML_BASE_SEND_SYNCHRONOUS) flags |= PSM_MQ_FLAG_SENDSYNC; diff --git a/ompi/mca/mtl/psm2/Makefile.am b/ompi/mca/mtl/psm2/Makefile.am index fa3c5201bb6..741c65e4638 100644 --- a/ompi/mca/mtl/psm2/Makefile.am +++ b/ompi/mca/mtl/psm2/Makefile.am @@ -11,6 +11,10 @@ # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2015 Intel, Inc. All rights reserved +# Copyright (c) 2017 Los Alamos National Security, LLC. +# All rights reserved. +# +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -35,6 +39,7 @@ mtl_psm2_sources = \ mtl_psm2_recv.c \ mtl_psm2_request.h \ mtl_psm2_send.c \ + mtl_psm2_stats.c \ mtl_psm2_types.h # Make the output library in this directory, and name it either @@ -52,7 +57,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_mtl_psm2_la_SOURCES = $(mtl_psm2_sources) -mca_mtl_psm2_la_LIBADD = $(mtl_psm2_LIBS) +mca_mtl_psm2_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(mtl_psm2_LIBS) mca_mtl_psm2_la_LDFLAGS = -module -avoid-version $(mtl_psm2_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/mtl/psm2/help-mtl-psm2.txt b/ompi/mca/mtl/psm2/help-mtl-psm2.txt index 16c5116a2f9..7728e4d7a37 100644 --- a/ompi/mca/mtl/psm2/help-mtl-psm2.txt +++ b/ompi/mca/mtl/psm2/help-mtl-psm2.txt @@ -1,7 +1,7 @@ # -*- text -*- # # Copyright (C) 2009. QLogic Corporation. All rights reserved. -# Copyright (c) 2013-2015 Intel, Inc. All rights reserved. +# Copyright (c) 2013-2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -38,7 +38,26 @@ Unable to post application receive buffer (psm2_mq_irecv or psm2_mq_imrecv). Error: %s Buffer: %p - Length: %d + Length: %llu # [path query mechanism unknown] Unknown path record query mechanism %s. Supported mechanisms are %s. +# +[message too big] +Message size %llu bigger than supported by PSM2 API. Max = %llu +# +[no psm2 cuda env] +Warning: Open MPI has detected that you are running in an environment with CUDA +devices present and that you are using Intel(r) Omni-Path networking. However, +the environment variable PSM2_CUDA was not set, meaning that the PSM2 Omni-Path +networking library was not told how to handle CUDA support. + +If your application uses CUDA buffers, you should set the environment variable +PSM2_CUDA to 1; otherwise, set it to 0.
Setting the variable to the wrong value +can have performance implications on your application, or even cause it to +crash. + +Since it was not set, Open MPI has defaulted to setting the PSM2_CUDA +environment variable to 1. + +Local hostname: %s diff --git a/ompi/mca/mtl/psm2/mtl_psm2.c b/ompi/mca/mtl/psm2/mtl_psm2.c index eac986c3fc7..4b5fc9cfd9a 100644 --- a/ompi/mca/mtl/psm2/mtl_psm2.c +++ b/ompi/mca/mtl/psm2/mtl_psm2.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006 QLogic Corporation. All rights reserved. - * Copyright (c) 2013-2015 Intel, Inc. All rights reserved + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Research Organization for Information Science @@ -173,6 +173,10 @@ int ompi_mtl_psm2_module_init(int local_rank, int num_local_procs) { /* register the psm2 progress function */ opal_progress_register(ompi_mtl_psm2_progress); +#if OPAL_CUDA_SUPPORT + ompi_mtl_psm2.super.mtl_flags |= MCA_MTL_BASE_FLAG_CUDA_INIT_DISABLE; +#endif + return OMPI_SUCCESS; } @@ -402,58 +406,62 @@ int ompi_mtl_psm2_progress( void ) { int completed = 1; do { + OPAL_THREAD_LOCK(&mtl_psm2_mq_mutex); err = psm2_mq_ipeek2(ompi_mtl_psm2.mq, &req, NULL); - if (err == PSM2_MQ_INCOMPLETE) { - return completed; - } else if (err != PSM2_OK) { - goto error; - } + if (err == PSM2_MQ_INCOMPLETE) { + OPAL_THREAD_UNLOCK(&mtl_psm2_mq_mutex); + return completed; + } else if (OPAL_UNLIKELY(err != PSM2_OK)) { + OPAL_THREAD_UNLOCK(&mtl_psm2_mq_mutex); + goto error; + } - completed++; + err = psm2_mq_test2(&req, &psm2_status); + OPAL_THREAD_UNLOCK(&mtl_psm2_mq_mutex); - err = psm2_mq_test2(&req, &psm2_status); - if (err != PSM2_OK) { - goto error; - } + if (OPAL_UNLIKELY (err != PSM2_OK)) { + goto error; + } + + completed++; mtl_psm2_request = (mca_mtl_psm2_request_t*) psm2_status.context; - if 
(mtl_psm2_request->type == OMPI_mtl_psm2_IRECV) { + if (mtl_psm2_request->type == OMPI_mtl_psm2_IRECV) { - mtl_psm2_request->super.ompi_req->req_status.MPI_SOURCE = - psm2_status.msg_tag.tag1; - mtl_psm2_request->super.ompi_req->req_status.MPI_TAG = - psm2_status.msg_tag.tag0; + mtl_psm2_request->super.ompi_req->req_status.MPI_SOURCE = + psm2_status.msg_tag.tag1; + mtl_psm2_request->super.ompi_req->req_status.MPI_TAG = + psm2_status.msg_tag.tag0; mtl_psm2_request->super.ompi_req->req_status._ucount = psm2_status.nbytes; ompi_mtl_datatype_unpack(mtl_psm2_request->convertor, - mtl_psm2_request->buf, - psm2_status.msg_length); - } - - if(mtl_psm2_request->type == OMPI_mtl_psm2_ISEND) { - if (mtl_psm2_request->free_after) { - free(mtl_psm2_request->buf); - } - } + mtl_psm2_request->buf, + psm2_status.msg_length); + } - switch (psm2_status.error_code) { - case PSM2_OK: - mtl_psm2_request->super.ompi_req->req_status.MPI_ERROR = - OMPI_SUCCESS; - break; - case PSM2_MQ_TRUNCATION: - mtl_psm2_request->super.ompi_req->req_status.MPI_ERROR = - MPI_ERR_TRUNCATE; - break; - default: - mtl_psm2_request->super.ompi_req->req_status.MPI_ERROR = - MPI_ERR_INTERN; - } + if(mtl_psm2_request->type == OMPI_mtl_psm2_ISEND) { + if (mtl_psm2_request->free_after) { + free(mtl_psm2_request->buf); + } + } - mtl_psm2_request->super.completion_callback(&mtl_psm2_request->super); + switch (psm2_status.error_code) { + case PSM2_OK: + mtl_psm2_request->super.ompi_req->req_status.MPI_ERROR = + OMPI_SUCCESS; + break; + case PSM2_MQ_TRUNCATION: + mtl_psm2_request->super.ompi_req->req_status.MPI_ERROR = + MPI_ERR_TRUNCATE; + break; + default: + mtl_psm2_request->super.ompi_req->req_status.MPI_ERROR = + MPI_ERR_INTERN; + } + mtl_psm2_request->super.completion_callback(&mtl_psm2_request->super); } while (1); diff --git a/ompi/mca/mtl/psm2/mtl_psm2.h b/ompi/mca/mtl/psm2/mtl_psm2.h index 44152656bf2..3b62d8c1937 100644 --- a/ompi/mca/mtl/psm2/mtl_psm2.h +++ b/ompi/mca/mtl/psm2/mtl_psm2.h @@ -12,7 +12,7 @@ 
* All rights reserved. * Copyright (c) 2006 QLogic Corporation. All rights reserved. * Copyright (c) 2015 Intel, Inc. All rights reserved - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -34,6 +34,8 @@ BEGIN_C_DECLS +/* MPI_THREAD_MULTIPLE_SUPPORT */ +extern opal_mutex_t mtl_psm2_mq_mutex; /* MTL interface functions */ extern int ompi_mtl_psm2_add_procs(struct mca_mtl_base_module_t* mtl, @@ -103,6 +105,7 @@ extern int ompi_mtl_psm2_finalize(struct mca_mtl_base_module_t* mtl); int ompi_mtl_psm2_module_init(int local_rank, int num_local_procs); +extern int ompi_mtl_psm2_register_pvars(void); END_C_DECLS diff --git a/ompi/mca/mtl/psm2/mtl_psm2_component.c b/ompi/mca/mtl/psm2/mtl_psm2_component.c index c16acb6e3cb..0785193b401 100644 --- a/ompi/mca/mtl/psm2/mtl_psm2_component.c +++ b/ompi/mca/mtl/psm2/mtl_psm2_component.c @@ -11,9 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2010 QLogic Corporation. All rights reserved. - * Copyright (c) 2012-2015 Los Alamos National Security, LLC. - * All rights reserved. - * Copyright (c) 2013-2015 Intel, Inc. All rights reserved + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights + * reserved. + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved * Copyright (c) 2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ @@ -28,6 +28,7 @@ #include "opal/mca/event/event.h" #include "opal/util/output.h" #include "opal/util/show_help.h" +#include "opal/util/opal_environ.h" #include "ompi/proc/proc.h" #include "mtl_psm2.h" @@ -42,6 +43,12 @@ #include static int param_priority; +/* MPI_THREAD_MULTIPLE_SUPPORT */ +opal_mutex_t mtl_psm2_mq_mutex = OPAL_MUTEX_STATIC_INIT; + +#if OPAL_CUDA_SUPPORT +static bool cuda_envvar_set = false; +#endif static int ompi_mtl_psm2_component_open(void); static int ompi_mtl_psm2_component_close(void); @@ -77,9 +84,150 @@ mca_mtl_psm2_component_t mca_mtl_psm2_component = { } }; +struct ompi_mtl_psm2_shadow_variable { + int variable_type; + void *storage; + mca_base_var_storage_t default_value; + const char *env_name; + mca_base_var_info_lvl_t info_level; + const char *mca_name; + const char *description; + mca_base_var_flag_t flags; +}; + +struct ompi_mtl_psm2_shadow_variable ompi_mtl_psm2_shadow_variables[] = { + {MCA_BASE_VAR_TYPE_STRING, &ompi_mtl_psm2.psm2_devices, {.stringval = "self,shm,hfi"}, "PSM2_DEVICES", OPAL_INFO_LVL_3, + "devices", + "Comma-delimited list of PSM2 devices. Valid values: self, shm, hfi (default: self,shm,hfi. Reduced to self,shm in single node jobs)",0}, + {MCA_BASE_VAR_TYPE_STRING, &ompi_mtl_psm2.psm2_memory, {.stringval = "normal"}, "PSM2_MEMORY", OPAL_INFO_LVL_9, + "memory_model", "PSM2 memory usage mode. 
Valid values: min, normal, large (default: normal)", 0}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_mq_sendreqs_max, {.ulval = 0}, "PSM2_MQ_SENDREQS_MAX", OPAL_INFO_LVL_3, + "mq_sendreqs_max", "PSM2 maximum number of isend requests in flight (default: unset, let libpsm2 use its default)", MCA_BASE_VAR_FLAG_DEF_UNSET}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_mq_recvreqs_max, {.ulval = 0}, "PSM2_MQ_RECVREQS_MAX", OPAL_INFO_LVL_3, + "mq_recvreqs_max", "PSM2 maximum number of irecv requests in flight (default: unset, let libpsm2 use its default)", MCA_BASE_VAR_FLAG_DEF_UNSET}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_mq_rndv_hfi_threshold, {.ulval = 0}, "PSM2_MQ_RNDV_HFI_THRESH", OPAL_INFO_LVL_3, + "hfi_eager_limit", "PSM2 eager to rendezvous threshold (default: unset, let libpsm2 use its defaults)", MCA_BASE_VAR_FLAG_DEF_UNSET}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_mq_rndv_shm_threshold, {.ulval = 0}, "PSM2_MQ_RNDV_SHM_THRESH", OPAL_INFO_LVL_3, + "shm_eager_limit", "PSM2 shared memory eager to rendezvous threshold (default: unset, let libpsm2 use its default)", MCA_BASE_VAR_FLAG_DEF_UNSET}, + {MCA_BASE_VAR_TYPE_BOOL, &ompi_mtl_psm2.psm2_recvthread, {.boolval = true}, "PSM2_RCVTHREAD", OPAL_INFO_LVL_3, + "use_receive_thread", "Use PSM2 progress thread (default: true)"}, + {MCA_BASE_VAR_TYPE_BOOL, &ompi_mtl_psm2.psm2_shared_contexts, {.boolval = true}, "PSM2_SHAREDCONTEXTS", OPAL_INFO_LVL_6, + "use_shared_contexts", "Share PSM contexts between MPI processes (default: true)"}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_max_contexts_per_job, {.ulval = 0}, "PSM2_MAX_CONTEXTS_PER_JOB", OPAL_INFO_LVL_9, + "max_contexts_per_job", "Maximum number of contexts available on a node (default: unset, let libpsm2 use its default)", MCA_BASE_VAR_FLAG_DEF_UNSET}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_tracemask, {.ulval = 1}, "PSM2_TRACEMASK", OPAL_INFO_LVL_9, + "trace_mask", "PSM2 tracemask 
value. See PSM2 documentation for accepted values in 0x (default: 1)"}, + {MCA_BASE_VAR_TYPE_UNSIGNED_LONG, &ompi_mtl_psm2.psm2_opa_sl, {.ulval = 0}, "HFI_SL", OPAL_INFO_LVL_9, + "opa_service_level", "HFI Service Level (default: unset, let libpsm2 use its defaults)", MCA_BASE_VAR_FLAG_DEF_UNSET}, + {-1}, +}; + +static void ompi_mtl_psm2_set_shadow_env (struct ompi_mtl_psm2_shadow_variable *variable) +{ + mca_base_var_storage_t *storage = variable->storage; + char *env_value; + int ret = 0; + int var_index = 0; + const mca_base_var_t *mca_base_var; + + var_index = mca_base_var_find("ompi", "mtl", "psm2", variable->mca_name); + ret = mca_base_var_get (var_index,&mca_base_var); + /* Something is fundamentally broken if registered variables are + * not found */ + if (OPAL_SUCCESS != ret) { + fprintf (stderr, "ERROR setting PSM2 environment variable: %s\n", variable->env_name); + return; + } + + /** Skip setting variables for which the default behavior is "unset" */ + if ((mca_base_var->mbv_flags & MCA_BASE_VAR_FLAG_DEF_UNSET) && + (MCA_BASE_VAR_SOURCE_DEFAULT == mca_base_var->mbv_source)){ + return ; + } + + switch (variable->variable_type) { + case MCA_BASE_VAR_TYPE_BOOL: + ret = asprintf (&env_value, "%s=%d", variable->env_name, storage->boolval ? 1 : 0); + break; + case MCA_BASE_VAR_TYPE_UNSIGNED_LONG: + if (0 == strcmp (variable->env_name, "PSM2_TRACEMASK")) { + /* PSM2 documentation shows the tracemask as a hexadecimal number. To be consistent + * use hexadecimal here.
*/ + ret = asprintf (&env_value, "%s=0x%lx", variable->env_name, storage->ulval); + } else { + ret = asprintf (&env_value, "%s=%lu", variable->env_name, storage->ulval); + } + break; + case MCA_BASE_VAR_TYPE_STRING: + ret = asprintf (&env_value, "%s=%s", variable->env_name, storage->stringval); + break; + } + + if (0 > ret) { + fprintf (stderr, "ERROR setting PSM2 environment variable: %s\n", variable->env_name); + } else { + putenv (env_value); + } +} + +static void ompi_mtl_psm2_register_shadow_env (struct ompi_mtl_psm2_shadow_variable *variable) +{ + mca_base_var_storage_t *storage = variable->storage; + char *env_value; + + env_value = getenv (variable->env_name); + switch (variable->variable_type) { + case MCA_BASE_VAR_TYPE_BOOL: + if (env_value) { + int tmp; + (void) mca_base_var_enum_bool.value_from_string (&mca_base_var_enum_bool, env_value, &tmp); + storage->boolval = !!tmp; + } else { + storage->boolval = variable->default_value.boolval; + } + break; + case MCA_BASE_VAR_TYPE_UNSIGNED_LONG: + if (env_value) { + storage->ulval = strtol (env_value, NULL, 0); + } else { + storage->ulval = variable->default_value.ulval; + } + break; + case MCA_BASE_VAR_TYPE_STRING: + if (env_value) { + storage->stringval = env_value; + } else { + storage->stringval = variable->default_value.stringval; + } + break; + } + + (void) mca_base_component_var_register (&mca_mtl_psm2_component.super.mtl_version, variable->mca_name, variable->description, + variable->variable_type, NULL, 0, variable->flags, variable->info_level, MCA_BASE_VAR_SCOPE_READONLY, + variable->storage); +} + +static int +get_num_total_procs(int *out_ntp) +{ + *out_ntp = (int)ompi_process_info.num_procs; + return OMPI_SUCCESS; +} + +static int +get_num_local_procs(int *out_nlp) +{ + /* num_local_peers does not include us in + * its calculation, so adjust for that */ + *out_nlp = (int)(1 + ompi_process_info.num_local_peers); + return OMPI_SUCCESS; +} + static int ompi_mtl_psm2_component_register(void) { + int 
num_local_procs, num_total_procs; + ompi_mtl_psm2.connect_timeout = 180; (void) mca_base_component_var_register(&mca_mtl_psm2_component.super.mtl_version, "connect_timeout", @@ -89,8 +237,22 @@ ompi_mtl_psm2_component_register(void) MCA_BASE_VAR_SCOPE_READONLY, &ompi_mtl_psm2.connect_timeout); + + (void) get_num_local_procs(&num_local_procs); + (void) get_num_total_procs(&num_total_procs); + /* set priority high enough to beat ob1's default (also set higher than psm) */ - param_priority = 40; + if ((num_local_procs == num_total_procs) && (1 < num_total_procs)) { + /* Disable hfi if all processes are local. However, if running only one + * process assume it is ompi_info or this is most likely going to spawn, for + * which all PSM2 devices are needed */ + setenv("PSM2_DEVICES", "self,shm", 0); + /* ob1 is much faster than psm2 with shared memory */ + param_priority = 10; + } else { + param_priority = 40; + } + (void) mca_base_component_var_register (&mca_mtl_psm2_component.super.mtl_version, "priority", "Priority of the PSM2 MTL component", MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, @@ -98,6 +260,12 @@ ompi_mtl_psm2_component_register(void) MCA_BASE_VAR_SCOPE_READONLY, ¶m_priority); + for (int i = 0 ; ompi_mtl_psm2_shadow_variables[i].variable_type >= 0 ; ++i) { + ompi_mtl_psm2_register_shadow_env (ompi_mtl_psm2_shadow_variables + i); + } + + ompi_mtl_psm2_register_pvars(); + return OMPI_SUCCESS; } @@ -105,17 +273,16 @@ static int ompi_mtl_psm2_component_open(void) { int res; - glob_t globbuf; - globbuf.gl_offs = 0; + glob_t globbuf = {0}; /* Component available only if Omni-Path hardware is present */ res = glob("/dev/hfi1_[0-9]", GLOB_DOOFFS, NULL, &globbuf); - if (0 == res || GLOB_NOMATCH == res) { + if (globbuf.gl_pathc > 0) { globfree(&globbuf); } if (0 != res) { res = glob("/dev/hfi1_[0-9][0-9]", GLOB_APPEND, NULL, &globbuf); - if (0 == res || GLOB_NOMATCH == res) { + if (globbuf.gl_pathc > 0) { globfree(&globbuf); } if (0 != res) { @@ -169,22 +336,11 @@ 
ompi_mtl_psm2_component_query(mca_base_module_t **module, int *priority) static int ompi_mtl_psm2_component_close(void) { - return OMPI_SUCCESS; -} - -static int -get_num_total_procs(int *out_ntp) -{ - *out_ntp = (int)ompi_process_info.num_procs; - return OMPI_SUCCESS; -} - -static int -get_num_local_procs(int *out_nlp) -{ - /* num_local_peers does not include us in - * its calculation, so adjust for that */ - *out_nlp = (int)(1 + ompi_process_info.num_local_peers); +#if OPAL_CUDA_SUPPORT + if (cuda_envvar_set) { + opal_unsetenv("PSM2_CUDA", &environ); + } +#endif return OMPI_SUCCESS; } @@ -211,7 +367,11 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads, int verno_major = PSM2_VERNO_MAJOR; int verno_minor = PSM2_VERNO_MINOR; int local_rank = -1, num_local_procs = 0; - int num_total_procs = 0; +#if OPAL_CUDA_SUPPORT + int ret; + char *cuda_env; + glob_t globbuf = {0}; +#endif /* Compute the total number of processes on this host and our local rank * on that node. We need to provide PSM2 with these values so it can @@ -226,11 +386,6 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads, opal_output(0, "Cannot determine local rank. Cannot continue.\n"); return NULL; } - if (OMPI_SUCCESS != get_num_total_procs(&num_total_procs)) { - opal_output(0, "Cannot determine total number of processes. " - "Cannot continue.\n"); - return NULL; - } err = psm2_error_register_handler(NULL /* no ep */, PSM2_ERRHANDLER_NOP); @@ -240,9 +395,30 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads, return NULL; } - if (num_local_procs == num_total_procs) { - setenv("PSM2_DEVICES", "self,shm", 0); + for (int i = 0 ; ompi_mtl_psm2_shadow_variables[i].variable_type >= 0 ; ++i) { + ompi_mtl_psm2_set_shadow_env (ompi_mtl_psm2_shadow_variables + i); + } + +#if OPAL_CUDA_SUPPORT + /* + * If using CUDA enabled Open MPI, the user likely intends to + * run with CUDA buffers. So, force-set the envvar here if user failed + * to set it. 
+ */ + ret = glob("/sys/module/nvidia", GLOB_DOOFFS, NULL, &globbuf); + if (globbuf.gl_pathc > 0) { + globfree(&globbuf); + } + + cuda_env = getenv("PSM2_CUDA"); + if (!cuda_env && (0 == ret)) { + opal_show_help("help-mtl-psm2.txt", + "no psm2 cuda env", true, + ompi_process_info.nodename); + opal_setenv("PSM2_CUDA", "1", false, &environ); + cuda_envvar_set = true; } +#endif err = psm2_init(&verno_major, &verno_minor); if (err) { diff --git a/ompi/mca/mtl/psm2/mtl_psm2_recv.c b/ompi/mca/mtl/psm2/mtl_psm2_recv.c index a62e3db3bb6..ff5c54067ce 100644 --- a/ompi/mca/mtl/psm2/mtl_psm2_recv.c +++ b/ompi/mca/mtl/psm2/mtl_psm2_recv.c @@ -52,6 +52,13 @@ ompi_mtl_psm2_irecv(struct mca_mtl_base_module_t* mtl, if (OMPI_SUCCESS != ret) return ret; + if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm2.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } + mtl_psm2_request->length = length; mtl_psm2_request->convertor = convertor; mtl_psm2_request->type = OMPI_mtl_psm2_IRECV; @@ -102,6 +109,13 @@ ompi_mtl_psm2_imrecv(struct mca_mtl_base_module_t* mtl, if (OMPI_SUCCESS != ret) return ret; + if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm2.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } + mtl_psm2_request->length = length; mtl_psm2_request->convertor = convertor; mtl_psm2_request->type = OMPI_mtl_psm2_IRECV; diff --git a/ompi/mca/mtl/psm2/mtl_psm2_send.c b/ompi/mca/mtl/psm2/mtl_psm2_send.c index d4ed8136bf6..6acb30cf6d2 100644 --- a/ompi/mca/mtl/psm2/mtl_psm2_send.c +++ b/ompi/mca/mtl/psm2/mtl_psm2_send.c @@ -22,6 +22,7 @@ #include "ompi/mca/pml/pml.h" #include "ompi/communicator/communicator.h" #include "opal/datatype/opal_convertor.h" +#include "opal/util/show_help.h" #include "mtl_psm2.h" #include "mtl_psm2_types.h" @@ -54,6 +55,12 @@ ompi_mtl_psm2_send(struct mca_mtl_base_module_t* mtl, &length, &mtl_psm2_request.free_after); 
+ if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm2.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } mtl_psm2_request.length = length; mtl_psm2_request.convertor = convertor; @@ -107,6 +114,13 @@ ompi_mtl_psm2_isend(struct mca_mtl_base_module_t* mtl, &length, &mtl_psm2_request->free_after); + if (length >= 1ULL << sizeof(uint32_t) * 8) { + opal_show_help("help-mtl-psm2.txt", + "message too big", false, + length, 1ULL << sizeof(uint32_t) * 8); + return OMPI_ERROR; + } + mtl_psm2_request->length= length; mtl_psm2_request->convertor = convertor; mtl_psm2_request->type = OMPI_mtl_psm2_ISEND; diff --git a/ompi/mca/mtl/psm2/mtl_psm2_stats.c b/ompi/mca/mtl/psm2/mtl_psm2_stats.c new file mode 100644 index 00000000000..ad3d879a3b1 --- /dev/null +++ b/ompi/mca/mtl/psm2/mtl_psm2_stats.c @@ -0,0 +1,98 @@ +/* + * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2010 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2006 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2006 QLogic Corporation. All rights reserved. + * Copyright (c) 2006-2017 Los Alamos National Security, LLC. All rights + * reserved. + * Copyright (c) 2013-2015 Intel, Inc. 
All rights reserved + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" +#include "mtl_psm2.h" +#include "mtl_psm2_types.h" +#include "psm2.h" +#include "ompi/communicator/communicator.h" +#include "ompi/message/message.h" + +#include "opal/mca/base/mca_base_pvar.h" + +struct ompi_mtl_psm2_name_descs +{ + char *name; + char *desc; + ptrdiff_t offset; +}; + +const struct ompi_mtl_psm2_name_descs name_descs[PSM2_MQ_NUM_STATS] = +{ + { "rx_user_bytes", "Bytes received into a matched user buffer", + offsetof(struct psm2_mq_stats, rx_user_bytes) }, + { "rx_user_num", "Messages received into a matched user buffer", + offsetof(struct psm2_mq_stats, rx_user_num) }, + { "rx_sys_bytes", "Bytes received into an unmatched system buffer", + offsetof(struct psm2_mq_stats, rx_sys_bytes) }, + { "rx_sys_num", "Messages received into an unmatched system buffer", + offsetof(struct psm2_mq_stats, rx_sys_num) }, + { "tx_num", "Total Messages transmitted (shm and hfi)", + offsetof(struct psm2_mq_stats, tx_num) }, + { "tx_eager_num", "Messages transmitted eagerly", + offsetof(struct psm2_mq_stats, tx_eager_num) }, + { "tx_eager_bytes", "Bytes transmitted eagerly", + offsetof(struct psm2_mq_stats, tx_eager_bytes) }, + { "tx_rndv_num", "Messages transmitted using expected TID mechanism", + offsetof(struct psm2_mq_stats, tx_rndv_num) }, + { "tx_rndv_bytes", "Bytes transmitted using expected TID mechanism", + offsetof(struct psm2_mq_stats, tx_rndv_bytes) }, + { "tx_shm_num", "Messages transmitted (shm only)", + offsetof(struct psm2_mq_stats, tx_shm_num) }, + { "rx_shm_num", "Messages received through shm", + offsetof(struct psm2_mq_stats, rx_shm_num) }, + { "rx_sysbuf_num", "Number of system buffers allocated", + offsetof(struct psm2_mq_stats, rx_sysbuf_num) }, + { "rx_sysbuf_bytes", "Bytes allocated for system buffers", + offsetof(struct psm2_mq_stats, rx_sysbuf_bytes) }, +}; + +static int mca_mtl_psm2_get_stats(const mca_base_pvar_t
*pvar, void *value, void *obj) +{ + psm2_mq_stats_t stats; + int index = (int)(intptr_t) pvar->ctx; + + psm2_mq_get_stats(ompi_mtl_psm2.mq, &stats); + + *(uint64_t *)value = *(uint64_t *)((uint8_t *)&stats + name_descs[index].offset); + + return OMPI_SUCCESS; +} + + +int ompi_mtl_psm2_register_pvars(void) +{ + int i; + + /* PSM2 MQ performance variables */ + for (i = 0 ; i < PSM2_MQ_NUM_STATS; ++i) { + (void) mca_base_component_pvar_register (&mca_mtl_psm2_component.super.mtl_version, + name_descs[i].name, name_descs[i].desc, + OPAL_INFO_LVL_4, MCA_BASE_PVAR_CLASS_COUNTER, + MCA_BASE_VAR_TYPE_UNSIGNED_LONG, + NULL, MCA_BASE_VAR_BIND_NO_OBJECT, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_CONTINUOUS, + mca_mtl_psm2_get_stats, NULL, NULL, + (void *) (intptr_t) i); + } + return OMPI_SUCCESS; +} diff --git a/ompi/mca/mtl/psm2/mtl_psm2_types.h b/ompi/mca/mtl/psm2/mtl_psm2_types.h index 31f0deb7ca1..20c404129f4 100644 --- a/ompi/mca/mtl/psm2/mtl_psm2_types.h +++ b/ompi/mca/mtl/psm2/mtl_psm2_types.h @@ -1,3 +1,4 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology @@ -10,8 +11,8 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006 QLogic Corporation. All rights reserved. - * Copyright (c) 2011 Los Alamos National Security, LLC. - * All rights reserved. + * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights + * reserved. * Copyright (c) 2013-2015 Intel, Inc. 
All rights reserved * $COPYRIGHT$ * @@ -49,6 +50,17 @@ struct mca_mtl_psm2_module_t { psm2_mq_t mq; psm2_epid_t epid; psm2_epaddr_t epaddr; + char *psm2_devices; + char *psm2_memory; + unsigned long psm2_mq_sendreqs_max; + unsigned long psm2_mq_recvreqs_max; + unsigned long psm2_mq_rndv_hfi_threshold; + unsigned long psm2_mq_rndv_shm_threshold; + unsigned long psm2_max_contexts_per_job; + unsigned long psm2_tracemask; + bool psm2_recvthread; + bool psm2_shared_contexts; + unsigned long psm2_opa_sl; }; typedef struct mca_mtl_psm2_module_t mca_mtl_psm2_module_t; diff --git a/ompi/mca/op/example/Makefile.am b/ompi/mca/op/example/Makefile.am index 62626c10976..e0990e52716 100644 --- a/ompi/mca/op/example/Makefile.am +++ b/ompi/mca/op/example/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -70,6 +71,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component) mca_op_example_la_SOURCES = $(component_sources) mca_op_example_la_LDFLAGS = -module -avoid-version +mca_op_example_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la # Specific information for static builds. # diff --git a/ompi/mca/osc/base/base.h b/ompi/mca/osc/base/base.h index bb368be82b9..1445510ee65 100644 --- a/ompi/mca/osc/base/base.h +++ b/ompi/mca/osc/base/base.h @@ -7,6 +7,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -41,7 +42,7 @@ int ompi_osc_base_select(ompi_win_t *win, size_t size, int disp_unit, ompi_communicator_t *comm, - ompi_info_t *info, + opal_info_t *info, int flavor, int *model); diff --git a/ompi/mca/osc/base/osc_base_init.c b/ompi/mca/osc/base/osc_base_init.c index 1e0cba6629a..7d1aaaf6a5f 100644 --- a/ompi/mca/osc/base/osc_base_init.c +++ b/ompi/mca/osc/base/osc_base_init.c @@ -10,6 +10,7 @@ * All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -34,7 +35,7 @@ ompi_osc_base_select(ompi_win_t *win, size_t size, int disp_unit, ompi_communicator_t *comm, - ompi_info_t *info, + opal_info_t *info, int flavor, int *model) { diff --git a/ompi/mca/osc/base/osc_base_obj_convert.c b/ompi/mca/osc/base/osc_base_obj_convert.c index d91d4d30801..e396258ce2b 100644 --- a/ompi/mca/osc/base/osc_base_obj_convert.c +++ b/ompi/mca/osc/base/osc_base_obj_convert.c @@ -11,7 +11,7 @@ * Copyright (c) 2007-2015 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Intel, Inc. All rights reserved. 
@@ -105,16 +105,20 @@ int ompi_osc_base_process_op (void *outbuf, void *inbuf, size_t inbuflen, struct iovec iov[OMPI_OSC_BASE_DECODE_MAX]; uint32_t iov_count; size_t size, primitive_size; - OPAL_PTRDIFF_TYPE lb, extent; + ptrdiff_t lb, extent; bool done; primitive_datatype = ompi_datatype_get_single_predefined_type_from_args(datatype); + ompi_datatype_type_size (primitive_datatype, &primitive_size); + if (ompi_datatype_is_contiguous_memory_layout (datatype, count) && 1 == datatype->super.desc.used) { /* NTH: the datatype is made up of a contiguous block of the primitive * datatype. fast path. do not set up a convertor to deal with the * datatype. */ - count *= datatype->super.desc.desc[0].elem.count; + (void)ompi_datatype_type_size(datatype, &size); + count *= (size / primitive_size); + assert( 0 == (size % primitive_size) ); /* in case it is possible for the datatype to have a non-zero lb in this case. * remove me if this is not possible */ @@ -125,8 +129,6 @@ int ompi_osc_base_process_op (void *outbuf, void *inbuf, size_t inbuflen, return OMPI_SUCCESS; } - ompi_datatype_type_size (primitive_datatype, &primitive_size); - /* create convertor */ OBJ_CONSTRUCT(&convertor, opal_convertor_t); opal_convertor_copy_and_prepare_for_recv(ompi_mpi_local_convertor, &datatype->super, diff --git a/ompi/mca/osc/monitoring/Makefile.am b/ompi/mca/osc/monitoring/Makefile.am new file mode 100644 index 00000000000..a90ce38c6e3 --- /dev/null +++ b/ompi/mca/osc/monitoring/Makefile.am @@ -0,0 +1,41 @@ +# +# Copyright (c) 2016-2018 Inria. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
+# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +monitoring_sources = \ + osc_monitoring.h \ + osc_monitoring_comm.h \ + osc_monitoring_component.c \ + osc_monitoring_accumulate.h \ + osc_monitoring_passive_target.h \ + osc_monitoring_active_target.h \ + osc_monitoring_dynamic.h \ + osc_monitoring_module.h \ + osc_monitoring_template.h + +if MCA_BUILD_ompi_osc_monitoring_DSO +component_noinst = +component_install = mca_osc_monitoring.la +else +component_noinst = libmca_osc_monitoring.la +component_install = +endif + +mcacomponentdir = $(ompilibdir) +mcacomponent_LTLIBRARIES = $(component_install) +mca_osc_monitoring_la_SOURCES = $(monitoring_sources) +mca_osc_monitoring_la_LDFLAGS = -module -avoid-version +mca_osc_monitoring_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(OMPI_TOP_BUILDDIR)/ompi/mca/common/monitoring/libmca_common_monitoring.la + +noinst_LTLIBRARIES = $(component_noinst) +libmca_osc_monitoring_la_SOURCES = $(monitoring_sources) +libmca_osc_monitoring_la_LDFLAGS = -module -avoid-version + +DISTCLEANFILES = osc_monitoring_template_gen.h diff --git a/ompi/mca/osc/monitoring/configure.m4 b/ompi/mca/osc/monitoring/configure.m4 new file mode 100644 index 00000000000..a22f8cb1a62 --- /dev/null +++ b/ompi/mca/osc/monitoring/configure.m4 @@ -0,0 +1,100 @@ +dnl -*- shell-script -*- +dnl +dnl Copyright (c) 2016-2018 Inria. All rights reserved. +dnl $COPYRIGHT$ +dnl +dnl Additional copyrights may follow +dnl +dnl $HEADER$ +dnl + +# mca_ompi_osc_monitoring_generate_templates +# +# Overwrite $1. $1 is where the different templates are brought +# together and compose an array of components by listing component +# names in $2. 
+# +# $1 = filename +# $2 = osc component names +# +AC_DEFUN( + [MCA_OMPI_OSC_MONITORING_GENERATE_TEMPLATES], + [m4_ifval( + [$1], + [AC_CONFIG_COMMANDS( + [$1], + [filename="$1" + components=`echo "$2" | sed -e 's/,/ /g' -e 's/monitoring//'` + cat <$filename +/* $filename + * + * This file was generated from ompi/mca/osc/monitoring/configure.m4 + * + * DO NOT EDIT THIS FILE. + * + */ +/* + * Copyright (c) 2017-2018 Inria. All rights reserved. + * \$COPYRIGHT$ + * + * Additional copyrights may follow + * + * \$HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_GEN_TEMPLATE_H +#define MCA_OSC_MONITORING_GEN_TEMPLATE_H + +#include +#include +#include + +/************************************************************/ +/* Include template generating macros and inlined functions */ + +EOF + # Generate each case in order to register the proper template functions + for comp in $components + do + echo "OSC_MONITORING_MODULE_TEMPLATE_GENERATE(${comp})" >>$filename + done + cat <>$filename + +/************************************************************/ + +typedef struct { + const char * name; + ompi_osc_base_module_t * (*fct) (ompi_osc_base_module_t *); +} osc_monitoring_components_list_t; + +static const osc_monitoring_components_list_t osc_monitoring_components_list[[]] = { +EOF + for comp in $components + do + echo " { .name = \"${comp}\", .fct = OSC_MONITORING_SET_TEMPLATE_FCT_NAME(${comp}) }," >>$filename + done + cat <>$filename + { .name = NULL, .fct = NULL } +}; + +#endif /* MCA_OSC_MONITORING_GEN_TEMPLATE_H */ +EOF + unset filename components + ]) + ])dnl + ])dnl + +# MCA_ompi_osc_monitoring_CONFIG() +# ------------------------------------------------ +AC_DEFUN( + [MCA_ompi_osc_monitoring_CONFIG], + [AC_CONFIG_FILES([ompi/mca/osc/monitoring/Makefile]) + + AS_IF([test "$MCA_BUILD_ompi_common_monitoring_DSO_TRUE" = ''], + [$1], + [$2]) + + MCA_OMPI_OSC_MONITORING_GENERATE_TEMPLATES( + [ompi/mca/osc/monitoring/osc_monitoring_template_gen.h], + 
[mca_ompi_osc_m4_config_component_list, mca_ompi_osc_no_config_component_list])dnl + ])dnl diff --git a/ompi/mca/osc/monitoring/osc_monitoring.h b/ompi/mca/osc/monitoring/osc_monitoring.h new file mode 100644 index 00000000000..8a223e459e4 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring.h @@ -0,0 +1,29 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_H +#define MCA_OSC_MONITORING_H + +BEGIN_C_DECLS + +#include +#include +#include + +struct ompi_osc_monitoring_component_t { + ompi_osc_base_component_t super; + int priority; +}; +typedef struct ompi_osc_monitoring_component_t ompi_osc_monitoring_component_t; + +OMPI_DECLSPEC extern ompi_osc_monitoring_component_t mca_osc_monitoring_component; + +END_C_DECLS + +#endif /* MCA_OSC_MONITORING_H */ diff --git a/ompi/mca/osc/monitoring/osc_monitoring_accumulate.h b/ompi/mca/osc/monitoring/osc_monitoring_accumulate.h new file mode 100644 index 00000000000..259a496f73a --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_accumulate.h @@ -0,0 +1,175 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_ACCUMULATE_H +#define MCA_OSC_MONITORING_ACCUMULATE_H + +#include +#include +#include + +#define OSC_MONITORING_GENERATE_TEMPLATE_ACCUMULATE(template) \ + \ + static int ompi_osc_monitoring_## template ##_compare_and_swap (const void *origin_addr, \ + const void *compare_addr, \ + void *result_addr, \ + ompi_datatype_t *dt, \ + int target_rank, \ + ptrdiff_t target_disp, \ + ompi_win_t *win) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size; \ + ompi_datatype_type_size(dt, &type_size); \ + mca_common_monitoring_record_osc(world_rank, type_size, SEND); \ + mca_common_monitoring_record_osc(world_rank, type_size, RECV); \ + OPAL_MONITORING_PRINT_INFO("MPI_Compare_and_swap to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_compare_and_swap(origin_addr, compare_addr, result_addr, dt, target_rank, target_disp, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_get_accumulate (const void *origin_addr, \ + int origin_count, \ + ompi_datatype_t*origin_datatype, \ + void *result_addr, \ + int result_count, \ + ompi_datatype_t*result_datatype, \ + int target_rank, \ + MPI_Aint target_disp, \ + int target_count, \ + ompi_datatype_t*target_datatype, \ + ompi_op_t *op, ompi_win_t*win) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \
+ data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, SEND); \ + ompi_datatype_type_size(result_datatype, &type_size); \ + data_size = result_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, RECV); \ + OPAL_MONITORING_PRINT_INFO("MPI_Get_accumulate to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_get_accumulate(origin_addr, origin_count, origin_datatype, result_addr, result_count, result_datatype, target_rank, target_disp, target_count, target_datatype, op, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_rget_accumulate (const void *origin_addr, \ + int origin_count, \ + ompi_datatype_t *origin_datatype, \ + void *result_addr, \ + int result_count, \ + ompi_datatype_t *result_datatype, \ + int target_rank, \ + MPI_Aint target_disp, \ + int target_count, \ + ompi_datatype_t*target_datatype, \ + ompi_op_t *op, \ + ompi_win_t *win, \ + ompi_request_t **request) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, SEND); \ + ompi_datatype_type_size(result_datatype, &type_size); \ + data_size = result_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, RECV); \ + OPAL_MONITORING_PRINT_INFO("MPI_Rget_accumulate to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_rget_accumulate(origin_addr, origin_count, origin_datatype, result_addr, result_count, result_datatype, target_rank, target_disp, target_count, target_datatype, op, win, request); \ + }
\ + \ + static int ompi_osc_monitoring_## template ##_raccumulate (const void *origin_addr, \ + int origin_count, \ + ompi_datatype_t *origin_datatype, \ + int target_rank, \ + ptrdiff_t target_disp, \ + int target_count, \ + ompi_datatype_t *target_datatype, \ + ompi_op_t *op, ompi_win_t *win, \ + ompi_request_t **request) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, SEND); \ + OPAL_MONITORING_PRINT_INFO("MPI_Raccumulate to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_raccumulate(origin_addr, origin_count, origin_datatype, target_rank, target_disp, target_count, target_datatype, op, win, request); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_accumulate (const void *origin_addr, \ + int origin_count, \ + ompi_datatype_t *origin_datatype, \ + int target_rank, \ + ptrdiff_t target_disp, \ + int target_count, \ + ompi_datatype_t *target_datatype, \ + ompi_op_t *op, ompi_win_t *win) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, SEND); \ + OPAL_MONITORING_PRINT_INFO("MPI_Accumulate to %d intercepted", world_rank); \ + } \ + return
OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_accumulate(origin_addr, origin_count, origin_datatype, target_rank, target_disp, target_count, target_datatype, op, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_fetch_and_op (const void *origin_addr, \ + void *result_addr, \ + ompi_datatype_t *dt, \ + int target_rank, \ + ptrdiff_t target_disp, \ + ompi_op_t *op, ompi_win_t *win) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size; \ + ompi_datatype_type_size(dt, &type_size); \ + mca_common_monitoring_record_osc(world_rank, type_size, SEND); \ + mca_common_monitoring_record_osc(world_rank, type_size, RECV); \ + OPAL_MONITORING_PRINT_INFO("MPI_Fetch_and_op to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_fetch_and_op(origin_addr, result_addr, dt, target_rank, target_disp, op, win); \ + } + +#endif /* MCA_OSC_MONITORING_ACCUMULATE_H */ diff --git a/ompi/mca/osc/monitoring/osc_monitoring_active_target.h b/ompi/mca/osc/monitoring/osc_monitoring_active_target.h new file mode 100644 index 00000000000..3420bf60dc6 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_active_target.h @@ -0,0 +1,48 @@ +/* + * Copyright (c) 2016 Inria. All rights reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_ACTIVE_TARGET_H +#define MCA_OSC_MONITORING_ACTIVE_TARGET_H + +#include +#include + +#define OSC_MONITORING_GENERATE_TEMPLATE_ACTIVE_TARGET(template) \ + \ + static int ompi_osc_monitoring_## template ##_post (ompi_group_t *group, int assert, ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_post(group, assert, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_start (ompi_group_t *group, int assert, ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_start(group, assert, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_complete (ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_complete(win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_wait (ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_wait(win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_test (ompi_win_t *win, int *flag) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_test(win, flag); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_fence (int assert, ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_fence(assert, win); \ + } + +#endif /* MCA_OSC_MONITORING_ACTIVE_TARGET_H */ diff --git a/ompi/mca/osc/monitoring/osc_monitoring_comm.h b/ompi/mca/osc/monitoring/osc_monitoring_comm.h new file mode 100644 index 00000000000..c98e0509558 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_comm.h @@ -0,0 +1,118 @@ +/* + * Copyright (c) 2016-2018 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_COMM_H +#define MCA_OSC_MONITORING_COMM_H + +#include +#include +#include + +#define OSC_MONITORING_GENERATE_TEMPLATE_COMM(template) \ + \ + static int ompi_osc_monitoring_## template ##_put (const void *origin_addr, \ + int origin_count, \ + ompi_datatype_t *origin_datatype, \ + int target_rank, \ + ptrdiff_t target_disp, \ + int target_count, \ + ompi_datatype_t *target_datatype, \ + ompi_win_t *win) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, data_size, SEND); \ + OPAL_MONITORING_PRINT_INFO("MPI_Put to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_put(origin_addr, origin_count, origin_datatype, target_rank, target_disp, target_count, target_datatype, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_rput (const void *origin_addr, \ + int origin_count, \ + ompi_datatype_t *origin_datatype, \ + int target_rank, \ + ptrdiff_t target_disp, \ + int target_count, \ + ompi_datatype_t *target_datatype, \ + ompi_win_t *win, \ + ompi_request_t **request) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(target_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \
mca_common_monitoring_record_osc(world_rank, data_size, SEND); \ + OPAL_MONITORING_PRINT_INFO("MPI_Rput to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_rput(origin_addr, origin_count, origin_datatype, target_rank, target_disp, target_count, target_datatype, win, request); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_get (void *origin_addr, int origin_count, \ + ompi_datatype_t *origin_datatype, \ + int source_rank, \ + ptrdiff_t source_disp, \ + int source_count, \ + ompi_datatype_t *source_datatype, \ + ompi_win_t *win) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(source_rank, win->w_group, &world_rank)) { \ + size_t type_size, data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, 0, SEND); \ + mca_common_monitoring_record_osc(world_rank, data_size, RECV); \ + OPAL_MONITORING_PRINT_INFO("MPI_Get to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_get(origin_addr, origin_count, origin_datatype, source_rank, source_disp, source_count, source_datatype, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_rget (void *origin_addr, int origin_count, \ + ompi_datatype_t *origin_datatype, \ + int source_rank, \ + ptrdiff_t source_disp, \ + int source_count, \ + ompi_datatype_t *source_datatype, \ + ompi_win_t *win, \ + ompi_request_t **request) \ + { \ + int world_rank; \ + /** \ + * If this fails the destination is not part of my MPI_COMM_WORLD \ + * Look up its name in the rank hashtable to get its MPI_COMM_WORLD rank \ + */ \ + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(source_rank, win->w_group, &world_rank)) { \ + size_t type_size,
data_size; \ + ompi_datatype_type_size(origin_datatype, &type_size); \ + data_size = origin_count*type_size; \ + mca_common_monitoring_record_osc(world_rank, 0, SEND); \ + mca_common_monitoring_record_osc(world_rank, data_size, RECV); \ + OPAL_MONITORING_PRINT_INFO("MPI_Rget to %d intercepted", world_rank); \ + } \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_rget(origin_addr, origin_count, origin_datatype, source_rank, source_disp, source_count, source_datatype, win, request); \ + } + +#endif /* MCA_OSC_MONITORING_COMM_H */ + diff --git a/ompi/mca/osc/monitoring/osc_monitoring_component.c b/ompi/mca/osc/monitoring/osc_monitoring_component.c new file mode 100644 index 00000000000..39247e179ee --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_component.c @@ -0,0 +1,141 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include "osc_monitoring.h" +#include +#include +#include +#include +#include +#include +#include +#include + +/**************************************************/ +/* Include templated macros and inlined functions */ + +#include "osc_monitoring_template_gen.h" + +/**************************************************/ + +static int mca_osc_monitoring_component_init(bool enable_progress_threads, + bool enable_mpi_threads) +{ + OPAL_MONITORING_PRINT_INFO("osc_component_init"); + return mca_common_monitoring_init(); +} + +static int mca_osc_monitoring_component_finish(void) +{ + OPAL_MONITORING_PRINT_INFO("osc_component_finish"); + mca_common_monitoring_finalize(); + return OMPI_SUCCESS; +} + +static int mca_osc_monitoring_component_register(void) +{ + return OMPI_SUCCESS; +} + +static int mca_osc_monitoring_component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, + struct ompi_communicator_t *comm, struct opal_info_t *info, + int flavor) +{ + OPAL_MONITORING_PRINT_INFO("osc_component_query"); + 
return mca_osc_monitoring_component.priority; +} + +static inline int +ompi_mca_osc_monitoring_set_template(ompi_osc_base_component_t *best_component, + ompi_osc_base_module_t *module) +{ + osc_monitoring_components_list_t comp = osc_monitoring_components_list[0]; + for (unsigned i = 0; NULL != comp.name; comp = osc_monitoring_components_list[++i]) { + if ( 0 == strcmp(comp.name, best_component->osc_version.mca_component_name) ) { + comp.fct(module); + return OMPI_SUCCESS; + } + } + return OMPI_ERR_NOT_SUPPORTED; +} + +static int mca_osc_monitoring_component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, + struct ompi_communicator_t *comm, struct opal_info_t *info, + int flavor, int *model) +{ + OPAL_MONITORING_PRINT_INFO("osc_component_select"); + opal_list_item_t *item; + ompi_osc_base_component_t *best_component = NULL; + int best_priority = -1, priority, ret = OMPI_SUCCESS; + + /* Redo the select loop to add our layer in the middle */ + for (item = opal_list_get_first(&ompi_osc_base_framework.framework_components) ; + item != opal_list_get_end(&ompi_osc_base_framework.framework_components) ; + item = opal_list_get_next(item)) { + ompi_osc_base_component_t *component = (ompi_osc_base_component_t*) + ((mca_base_component_list_item_t*) item)->cli_component; + + if( component == (ompi_osc_base_component_t*)(&mca_osc_monitoring_component) ) + continue; /* skip self */ + + priority = component->osc_query(win, base, size, disp_unit, comm, info, flavor); + if (priority < 0) { + if (MPI_WIN_FLAVOR_SHARED == flavor && OMPI_ERR_RMA_SHARED == priority) { + /* NTH: quick fix to return OMPI_ERR_RMA_SHARED */ + return OMPI_ERR_RMA_SHARED; + } + continue; + } + + if (priority > best_priority) { + best_component = component; + best_priority = priority; + } + } + + if (NULL == best_component) return OMPI_ERR_NOT_SUPPORTED; + OPAL_MONITORING_PRINT_INFO("osc: chosen one: %s", best_component->osc_version.mca_component_name); + ret = 
best_component->osc_select(win, base, size, disp_unit, comm, info, flavor, model); + if( OMPI_SUCCESS == ret ) { + /* Intercept module functions with ours, based on selected component */ + ret = ompi_mca_osc_monitoring_set_template(best_component, win->w_osc_module); + if (OMPI_ERR_NOT_SUPPORTED == ret) { + OPAL_MONITORING_PRINT_WARN("osc: monitoring disabled: no module for this component " + "(%s)", best_component->osc_version.mca_component_name); + return OMPI_SUCCESS; + } + } + return ret; +} + +ompi_osc_monitoring_component_t mca_osc_monitoring_component = { + .super = { + /* First, the mca_base_component_t struct containing meta + information about the component itself */ + .osc_version = { + OMPI_OSC_BASE_VERSION_3_0_0, + + .mca_component_name = "monitoring", /* MCA component name */ + MCA_MONITORING_MAKE_VERSION, + .mca_register_component_params = mca_osc_monitoring_component_register + }, + .osc_data = { + /* The component is checkpoint ready */ + MCA_BASE_METADATA_PARAM_CHECKPOINT + }, + + .osc_init = mca_osc_monitoring_component_init, /* component init */ + .osc_finalize = mca_osc_monitoring_component_finish, /* component finalize */ + .osc_query = mca_osc_monitoring_component_query, + .osc_select = mca_osc_monitoring_component_select + }, + .priority = INT_MAX +}; diff --git a/ompi/mca/osc/monitoring/osc_monitoring_dynamic.h b/ompi/mca/osc/monitoring/osc_monitoring_dynamic.h new file mode 100644 index 00000000000..5a8101ea200 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_dynamic.h @@ -0,0 +1,27 @@ +/* + * Copyright (c) 2016 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_DYNAMIC_H +#define MCA_OSC_MONITORING_DYNAMIC_H + +#include + +#define OSC_MONITORING_GENERATE_TEMPLATE_DYNAMIC(template) \ + \ + static int ompi_osc_monitoring_## template ##_attach (struct ompi_win_t *win, void *base, size_t len) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_win_attach(win, base, len); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_detach (struct ompi_win_t *win, const void *base) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_win_detach(win, base); \ + } + +#endif /* MCA_OSC_MONITORING_DYNAMIC_H */ diff --git a/ompi/mca/osc/monitoring/osc_monitoring_module.h b/ompi/mca/osc/monitoring/osc_monitoring_module.h new file mode 100644 index 00000000000..9c7dfb12e70 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_module.h @@ -0,0 +1,109 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights + * reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_MODULE_H +#define MCA_OSC_MONITORING_MODULE_H + +#include +#include +#include + +/* Define once and for all the module_template variable name */ +#define OMPI_OSC_MONITORING_MODULE_VARIABLE(template) \ + ompi_osc_monitoring_module_## template ##_template + +/* Define once and for all the + * ompi_osc_monitoring_## template ##_set_template function name + */ +#define OSC_MONITORING_SET_TEMPLATE_FCT_NAME(template) \ + ompi_osc_monitoring_## template ##_set_template + +/* Define the ompi_osc_monitoring_module_## template ##_template + * variable + */ +#define OMPI_OSC_MONITORING_MODULE_GENERATE(template) \ + /* Define the ompi_osc_monitoring_module_## template ##_template */ \ + static ompi_osc_base_module_t OMPI_OSC_MONITORING_MODULE_VARIABLE(template); + +#define OSC_MONITORING_GENERATE_TEMPLATE_MODULE(template) \ + \ + static int ompi_osc_monitoring_## template ##_free(ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_free(win); \ + } + +#define MCA_OSC_MONITORING_MODULE_TEMPLATE_GENERATE(template) \ + /* Generate template specific module initialization function: \ + * ompi_osc_monitoring_## template ##_set_template(ompi_osc_base_module_t*module) \ + */ \ + static inline ompi_osc_base_module_t * \ + OSC_MONITORING_SET_TEMPLATE_FCT_NAME(template) (ompi_osc_base_module_t*module) \ + { \ + /* Define the ompi_osc_monitoring_module_## template ##_init_done variable */ \ + static int32_t init_done = 0; \ + /* Define and set the ompi_osc_monitoring_## template \ + * ##_template variable. The functions recorded here are \ + * linked to the original functions of the original \ + * {template} module that was replaced. 
\ + */ \ + static const ompi_osc_base_module_t module_specific_interception_layer = { \ + .osc_win_attach = ompi_osc_monitoring_## template ##_attach, \ + .osc_win_detach = ompi_osc_monitoring_## template ##_detach, \ + .osc_free = ompi_osc_monitoring_## template ##_free, \ + \ + .osc_put = ompi_osc_monitoring_## template ##_put, \ + .osc_get = ompi_osc_monitoring_## template ##_get, \ + .osc_accumulate = ompi_osc_monitoring_## template ##_accumulate, \ + .osc_compare_and_swap = ompi_osc_monitoring_## template ##_compare_and_swap, \ + .osc_fetch_and_op = ompi_osc_monitoring_## template ##_fetch_and_op, \ + .osc_get_accumulate = ompi_osc_monitoring_## template ##_get_accumulate, \ + \ + .osc_rput = ompi_osc_monitoring_## template ##_rput, \ + .osc_rget = ompi_osc_monitoring_## template ##_rget, \ + .osc_raccumulate = ompi_osc_monitoring_## template ##_raccumulate, \ + .osc_rget_accumulate = ompi_osc_monitoring_## template ##_rget_accumulate, \ + \ + .osc_fence = ompi_osc_monitoring_## template ##_fence, \ + \ + .osc_start = ompi_osc_monitoring_## template ##_start, \ + .osc_complete = ompi_osc_monitoring_## template ##_complete, \ + .osc_post = ompi_osc_monitoring_## template ##_post, \ + .osc_wait = ompi_osc_monitoring_## template ##_wait, \ + .osc_test = ompi_osc_monitoring_## template ##_test, \ + \ + .osc_lock = ompi_osc_monitoring_## template ##_lock, \ + .osc_unlock = ompi_osc_monitoring_## template ##_unlock, \ + .osc_lock_all = ompi_osc_monitoring_## template ##_lock_all, \ + .osc_unlock_all = ompi_osc_monitoring_## template ##_unlock_all, \ + \ + .osc_sync = ompi_osc_monitoring_## template ##_sync, \ + .osc_flush = ompi_osc_monitoring_## template ##_flush, \ + .osc_flush_all = ompi_osc_monitoring_## template ##_flush_all, \ + .osc_flush_local = ompi_osc_monitoring_## template ##_flush_local, \ + .osc_flush_local_all = ompi_osc_monitoring_## template ##_flush_local_all, \ + }; \ + if ( 1 == opal_atomic_add_fetch_32(&init_done, 1) ) { \ + /* Saves the 
original module functions in \ + * ompi_osc_monitoring_module_## template ##_template \ + */ \ + memcpy(&OMPI_OSC_MONITORING_MODULE_VARIABLE(template), \ + module, sizeof(ompi_osc_base_module_t)); \ + } \ + /* Replace the original functions with our generated ones */ \ + memcpy(module, &module_specific_interception_layer, \ + sizeof(ompi_osc_base_module_t)); \ + return module; \ + } + +#endif /* MCA_OSC_MONITORING_MODULE_H */ + diff --git a/ompi/mca/osc/monitoring/osc_monitoring_passive_target.h b/ompi/mca/osc/monitoring/osc_monitoring_passive_target.h new file mode 100644 index 00000000000..9e91b3f6e76 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_passive_target.h @@ -0,0 +1,63 @@ +/* + * Copyright (c) 2016 Inria. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_PASSIVE_TARGET_H +#define MCA_OSC_MONITORING_PASSIVE_TARGET_H + +#include + +#define OSC_MONITORING_GENERATE_TEMPLATE_PASSIVE_TARGET(template) \ + \ + static int ompi_osc_monitoring_## template ##_sync (struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_sync(win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_flush (int target, struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_flush(target, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_flush_all (struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_flush_all(win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_flush_local (int target, struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_flush_local(target, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_flush_local_all (struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_flush_local_all(win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_lock 
(int lock_type, int target, int assert, ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_lock(lock_type, target, assert, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_unlock (int target, ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_unlock(target, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_lock_all (int assert, struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_lock_all(assert, win); \ + } \ + \ + static int ompi_osc_monitoring_## template ##_unlock_all (struct ompi_win_t *win) \ + { \ + return OMPI_OSC_MONITORING_MODULE_VARIABLE(template).osc_unlock_all(win); \ + } + +#endif /* MCA_OSC_MONITORING_PASSIVE_TARGET_H */ + diff --git a/ompi/mca/osc/monitoring/osc_monitoring_template.h b/ompi/mca/osc/monitoring/osc_monitoring_template.h new file mode 100644 index 00000000000..f78a678b8d6 --- /dev/null +++ b/ompi/mca/osc/monitoring/osc_monitoring_template.h @@ -0,0 +1,53 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * Copyright (c) 2017 Amazon.com, Inc. or its affiliates. All Rights + * reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef MCA_OSC_MONITORING_TEMPLATE_H +#define MCA_OSC_MONITORING_TEMPLATE_H + +#include +#include +#include +#include +#include "osc_monitoring_accumulate.h" +#include "osc_monitoring_active_target.h" +#include "osc_monitoring_comm.h" +#include "osc_monitoring_dynamic.h" +#include "osc_monitoring_module.h" +#include "osc_monitoring_passive_target.h" + +/* The magic used here is that for a given module type (given with the + * {template} parameter), we generate the full set of functions defined + * in ompi_osc_base_module_t, plus the ompi_osc_monitoring_module_## + * template ##_template variable that records the original set of + * functions; the generated set of functions is stored as a + * static variable inside the initialization function. When a function + * is called from the original module, we route the call to our + * generated function that does the monitoring, and then we call the + * original function that had been saved in the + * ompi_osc_monitoring_module_## template ##_template variable.
+ */ +#define OSC_MONITORING_MODULE_TEMPLATE_GENERATE(template) \ + /* Generate the proper symbol for the \ + ompi_osc_monitoring_module_## template ##_template variable */ \ + OMPI_OSC_MONITORING_MODULE_GENERATE(template) \ + /* Generate each module-specific function */ \ + OSC_MONITORING_GENERATE_TEMPLATE_ACCUMULATE(template) \ + OSC_MONITORING_GENERATE_TEMPLATE_ACTIVE_TARGET(template) \ + OSC_MONITORING_GENERATE_TEMPLATE_COMM(template) \ + OSC_MONITORING_GENERATE_TEMPLATE_DYNAMIC(template) \ + OSC_MONITORING_GENERATE_TEMPLATE_MODULE(template) \ + OSC_MONITORING_GENERATE_TEMPLATE_PASSIVE_TARGET(template) \ + /* Generate template specific module initialization function: \ + * ompi_osc_monitoring_## template ##_set_template(ompi_osc_base_module_t*module) \ + */ \ + MCA_OSC_MONITORING_MODULE_TEMPLATE_GENERATE(template) + +#endif /* MCA_OSC_MONITORING_TEMPLATE_H */ diff --git a/ompi/mca/osc/osc.h b/ompi/mca/osc/osc.h index 61ae2880036..a7892dfc7b0 100644 --- a/ompi/mca/osc/osc.h +++ b/ompi/mca/osc/osc.h @@ -11,8 +11,9 @@ * Copyright (c) 2007-2015 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -45,7 +46,7 @@ BEGIN_C_DECLS struct ompi_win_t; -struct ompi_info_t; +struct opal_info_t; struct ompi_communicator_t; struct ompi_group_t; struct ompi_datatype_t; @@ -116,7 +117,7 @@ typedef int (*ompi_osc_base_component_query_fn_t)(struct ompi_win_t *win, size_t size, int disp_unit, struct ompi_communicator_t *comm, - struct ompi_info_t *info, + struct opal_info_t *info, int flavor); /** @@ -148,7 +149,7 @@ typedef int (*ompi_osc_base_component_select_fn_t)(struct ompi_win_t *win, size_t size, int disp_unit, struct ompi_communicator_t *comm, - struct ompi_info_t *info, + struct opal_info_t *info, int flavor, int *model); @@ -207,7 +208,7 @@ typedef int (*ompi_osc_base_module_put_fn_t)(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -217,7 +218,7 @@ typedef int (*ompi_osc_base_module_get_fn_t)(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -227,7 +228,7 @@ typedef int (*ompi_osc_base_module_accumulate_fn_t)(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -238,14 +239,14 @@ typedef int (*ompi_osc_base_module_compare_and_swap_fn_t)(const void *origin_add void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_win_t *win); typedef int (*ompi_osc_base_module_fetch_and_op_fn_t)(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, 
struct ompi_op_t *op, struct ompi_win_t *win); @@ -256,7 +257,7 @@ typedef int (*ompi_osc_base_module_get_accumulate_fn_t)(const void *origin_addr, int result_count, struct ompi_datatype_t *result_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_datatype, struct ompi_op_t *op, @@ -266,7 +267,7 @@ typedef int (*ompi_osc_base_module_rput_fn_t)(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -276,7 +277,7 @@ typedef int (*ompi_osc_base_module_rget_fn_t)(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -287,7 +288,7 @@ typedef int (*ompi_osc_base_module_raccumulate_fn_t)(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -301,7 +302,7 @@ typedef int (*ompi_osc_base_module_rget_accumulate_fn_t)(const void *origin_addr int result_count, struct ompi_datatype_t *result_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_datatype, struct ompi_op_t *op, @@ -352,9 +353,6 @@ typedef int (*ompi_osc_base_module_flush_local_fn_t)(int target, struct ompi_win_t *win); typedef int (*ompi_osc_base_module_flush_local_all_fn_t)(struct ompi_win_t *win); -typedef int (*ompi_osc_base_module_set_info_fn_t)(struct ompi_win_t *win, struct ompi_info_t *info); -typedef int (*ompi_osc_base_module_get_info_fn_t)(struct ompi_win_t *win, struct ompi_info_t **info_used); - /* 
******************************************************************** */ @@ -406,9 +404,6 @@ struct ompi_osc_base_module_3_0_0_t { ompi_osc_base_module_flush_all_fn_t osc_flush_all; ompi_osc_base_module_flush_local_fn_t osc_flush_local; ompi_osc_base_module_flush_local_all_fn_t osc_flush_local_all; - - ompi_osc_base_module_set_info_fn_t osc_set_info; - ompi_osc_base_module_get_info_fn_t osc_get_info; }; typedef struct ompi_osc_base_module_3_0_0_t ompi_osc_base_module_3_0_0_t; typedef ompi_osc_base_module_3_0_0_t ompi_osc_base_module_t; diff --git a/ompi/mca/osc/portals4/Makefile.am b/ompi/mca/osc/portals4/Makefile.am index 73b7ed9d5ff..a7f5a061254 100644 --- a/ompi/mca/osc/portals4/Makefile.am +++ b/ompi/mca/osc/portals4/Makefile.am @@ -1,5 +1,6 @@ # # Copyright (c) 2011 Sandia National Laboratories. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -35,7 +36,8 @@ endif mcacomponentdir = $(pkglibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_osc_portals4_la_SOURCES = $(portals4_sources) -mca_osc_portals4_la_LIBADD = $(osc_portals4_LIBS) +mca_osc_portals4_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(osc_portals4_LIBS) mca_osc_portals4_la_LDFLAGS = -module -avoid-version $(osc_portals4_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/osc/portals4/osc_portals4.h b/ompi/mca/osc/portals4/osc_portals4.h index b35c0ed9053..682003b40fe 100644 --- a/ompi/mca/osc/portals4/osc_portals4.h +++ b/ompi/mca/osc/portals4/osc_portals4.h @@ -3,8 +3,9 @@ * Copyright (c) 2011-2017 Sandia National Laboratories. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. 
All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -143,7 +144,7 @@ int ompi_osc_portals4_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -152,7 +153,7 @@ int ompi_osc_portals4_get(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -161,7 +162,7 @@ int ompi_osc_portals4_accumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -172,14 +173,14 @@ int ompi_osc_portals4_compare_and_swap(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_win_t *win); int ompi_osc_portals4_fetch_and_op(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_op_t *op, struct ompi_win_t *win); @@ -190,7 +191,7 @@ int ompi_osc_portals4_get_accumulate(const void *origin_addr, int result_count, struct ompi_datatype_t *result_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_datatype, struct ompi_op_t *op, @@ -200,7 +201,7 @@ int ompi_osc_portals4_rput(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -210,7 +211,7 @@ int ompi_osc_portals4_rget(void *origin_addr, int origin_count, struct 
ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -220,7 +221,7 @@ int ompi_osc_portals4_raccumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -234,7 +235,7 @@ int ompi_osc_portals4_rget_accumulate(const void *origin_addr, int result_count, struct ompi_datatype_t *result_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_datatype, struct ompi_op_t *op, @@ -281,8 +282,8 @@ int ompi_osc_portals4_flush_local(int target, struct ompi_win_t *win); int ompi_osc_portals4_flush_local_all(struct ompi_win_t *win); -int ompi_osc_portals4_set_info(struct ompi_win_t *win, struct ompi_info_t *info); -int ompi_osc_portals4_get_info(struct ompi_win_t *win, struct ompi_info_t **info_used); +int ompi_osc_portals4_set_info(struct ompi_win_t *win, struct opal_info_t *info); +int ompi_osc_portals4_get_info(struct ompi_win_t *win, struct opal_info_t **info_used); static inline int ompi_osc_portals4_complete_all(ompi_osc_portals4_module_t *module) diff --git a/ompi/mca/osc/portals4/osc_portals4_active_target.c b/ompi/mca/osc/portals4/osc_portals4_active_target.c index e2bd9a9da20..23a763efe8e 100644 --- a/ompi/mca/osc/portals4/osc_portals4_active_target.c +++ b/ompi/mca/osc/portals4/osc_portals4_active_target.c @@ -99,7 +99,7 @@ ompi_osc_portals4_complete(struct ompi_win_t *win) PTL_SUM, PTL_INT32_T); if (ret != OMPI_SUCCESS) return ret; - OPAL_THREAD_ADD64(&module->opcount, 1); + OPAL_THREAD_ADD_FETCH64(&module->opcount, 1); } ret = ompi_osc_portals4_complete_all(module); @@ -144,7 +144,7 @@ ompi_osc_portals4_post(struct ompi_group_t *group, PTL_SUM, PTL_INT32_T); if (ret != OMPI_SUCCESS) return ret; 
- OPAL_THREAD_ADD64(&module->opcount, 1); + OPAL_THREAD_ADD_FETCH64(&module->opcount, 1); } } else { module->post_group = NULL; diff --git a/ompi/mca/osc/portals4/osc_portals4_comm.c b/ompi/mca/osc/portals4/osc_portals4_comm.c index 3b197f9708c..b125f2aee50 100644 --- a/ompi/mca/osc/portals4/osc_portals4_comm.c +++ b/ompi/mca/osc/portals4/osc_portals4_comm.c @@ -3,7 +3,7 @@ * Copyright (c) 2014 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -206,7 +206,7 @@ segmentedPut(int64_t *opcount, ptl_size_t bytes_put = 0; do { - opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); ptl_size_t frag_length = MIN(put_length, segment_length); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, @@ -222,7 +222,7 @@ segmentedPut(int64_t *opcount, user_ptr, hdr_data); if (PTL_OK != ret) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlPut failed with return value %d", __FUNCTION__, __LINE__, ret); @@ -251,7 +251,7 @@ segmentedGet(int64_t *opcount, ptl_size_t bytes_gotten = 0; do { - opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); ptl_size_t frag_length = MIN(get_length, segment_length); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, @@ -266,7 +266,7 @@ segmentedGet(int64_t *opcount, target_offset + bytes_gotten, user_ptr); if (PTL_OK != ret) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlGet failed with return value %d", __FUNCTION__, __LINE__, ret); @@ -297,7 +297,7 @@ segmentedAtomic(int64_t *opcount, ptl_size_t sent = 0; do { - 
opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); ptl_size_t frag_length = MIN(length, segment_length); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, @@ -315,7 +315,7 @@ segmentedAtomic(int64_t *opcount, ptl_op, ptl_dt); if (PTL_OK != ret) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlAtomic failed with return value %d", __FUNCTION__, __LINE__, ret); @@ -348,7 +348,7 @@ segmentedFetchAtomic(int64_t *opcount, ptl_size_t sent = 0; do { - opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); ptl_size_t frag_length = MIN(length, segment_length); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, @@ -367,7 +367,7 @@ segmentedFetchAtomic(int64_t *opcount, ptl_op, ptl_dt); if (PTL_OK != ret) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlFetchAtomic failed with return value %d", __FUNCTION__, __LINE__, ret); @@ -399,7 +399,7 @@ segmentedSwap(int64_t *opcount, ptl_size_t sent = 0; do { - opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); ptl_size_t frag_length = MIN(length, segment_length); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, @@ -419,7 +419,7 @@ segmentedSwap(int64_t *opcount, PTL_SWAP, ptl_dt); if (PTL_OK != ret) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlSwap failed with return value %d", __FUNCTION__, __LINE__, ret); @@ -501,7 +501,7 @@ get_to_iovec(ompi_osc_portals4_module_t *module, { int ret; size_t size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; ptl_md_t md; if (module->origin_iovec_md_h != PTL_INVALID_HANDLE) { @@ -547,7 +547,7 @@ 
get_to_iovec(ompi_osc_portals4_module_t *module, return ret; } - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d Get(origin_count=%d, origin_lb=%lu, target_count=%d, target_lb=%lu, size=%lu, length=%lu, offset=%lu, op_count=%ld)", @@ -564,7 +564,7 @@ get_to_iovec(ompi_osc_portals4_module_t *module, OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d PtlGet() failed: ret = %d", __FUNCTION__, __LINE__, ret)); - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -588,7 +588,7 @@ atomic_get_to_iovec(ompi_osc_portals4_module_t *module, { int ret; size_t size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; ptl_md_t md; if (module->origin_iovec_md_h != PTL_INVALID_HANDLE) { @@ -670,7 +670,7 @@ put_from_iovec(ompi_osc_portals4_module_t *module, { int ret; size_t size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; ptl_md_t md; if (module->origin_iovec_md_h != PTL_INVALID_HANDLE) { @@ -716,7 +716,7 @@ put_from_iovec(ompi_osc_portals4_module_t *module, return ret; } - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d Put(origin_count=%d, origin_lb=%lu, target_count=%d, target_lb=%lu, size=%lu, length=%lu, offset=%lu, op_count=%ld)", @@ -735,7 +735,7 @@ put_from_iovec(ompi_osc_portals4_module_t *module, OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d PtlPut() failed: ret = %d", __FUNCTION__, __LINE__, ret)); - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -759,7 +759,7 @@ atomic_put_from_iovec(ompi_osc_portals4_module_t *module, { int ret; size_t size; - 
OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; ptl_md_t md; if (module->origin_iovec_md_h != PTL_INVALID_HANDLE) { @@ -844,7 +844,7 @@ atomic_from_iovec(ompi_osc_portals4_module_t *module, { int ret; size_t size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; ptl_md_t md; ptl_op_t ptl_op; ptl_datatype_t ptl_dt; @@ -944,7 +944,7 @@ swap_to_iovec(ompi_osc_portals4_module_t *module, int ret; size_t size; ptl_size_t iovec_count=0; - OPAL_PTRDIFF_TYPE length, result_lb, origin_lb, target_lb, extent; + ptrdiff_t length, result_lb, origin_lb, target_lb, extent; ptl_md_t md; ptl_datatype_t ptl_dt; @@ -1069,7 +1069,7 @@ fetch_atomic_to_iovec(ompi_osc_portals4_module_t *module, int ret; size_t size; ptl_size_t iovec_count=0; - OPAL_PTRDIFF_TYPE length, result_lb, origin_lb, target_lb, extent; + ptrdiff_t length, result_lb, origin_lb, target_lb, extent; ptl_md_t md; ptl_op_t ptl_op; ptl_datatype_t ptl_dt; @@ -1252,7 +1252,7 @@ put_to_noncontig(int64_t *opcount, /* determine how much to transfer in this operation */ rdma_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), max_rdma_len); - opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing rdma on contiguous region. 
local: %p, remote: %p, len: %lu", @@ -1270,7 +1270,7 @@ put_to_noncontig(int64_t *opcount, user_ptr, 0); if (OPAL_UNLIKELY(PTL_OK != ret)) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); return ret; } @@ -1361,7 +1361,7 @@ atomic_put_to_noncontig(ompi_osc_portals4_module_t *module, /* determine how much to transfer in this operation */ rdma_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), max_rdma_len); - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing rdma on contiguous region. local: %p, remote: %p, len: %lu", @@ -1379,7 +1379,7 @@ atomic_put_to_noncontig(ompi_osc_portals4_module_t *module, user_ptr, 0); if (OPAL_UNLIKELY(PTL_OK != ret)) { - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -1479,7 +1479,7 @@ atomic_to_noncontig(ompi_osc_portals4_module_t *module, /* determine how much to transfer in this operation */ atomic_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), module->atomic_max); - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing rdma on contiguous region. 
local: %p, remote: %p, len: %lu", @@ -1501,7 +1501,7 @@ atomic_to_noncontig(ompi_osc_portals4_module_t *module, ptl_op, ptl_dt); if (OPAL_UNLIKELY(PTL_OK != ret)) { - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -1586,7 +1586,7 @@ get_from_noncontig(int64_t *opcount, /* determine how much to transfer in this operation */ rdma_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), max_rdma_len); - opal_atomic_add_64(opcount, 1); + opal_atomic_add_fetch_64(opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing rdma on contiguous region. local: %p, remote: %p, len: %lu", @@ -1602,7 +1602,7 @@ get_from_noncontig(int64_t *opcount, offset + (ptl_size_t)target_iovec[target_iov_index].iov_base, user_ptr); if (OPAL_UNLIKELY(PTL_OK != ret)) { - opal_atomic_add_64(opcount, -1); + opal_atomic_add_fetch_64(opcount, -1); return ret; } @@ -1687,7 +1687,7 @@ atomic_get_from_noncontig(ompi_osc_portals4_module_t *module, /* determine how much to transfer in this operation */ rdma_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), max_rdma_len); - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing rdma on contiguous region. 
local: %p, remote: %p, len: %lu", @@ -1703,7 +1703,7 @@ atomic_get_from_noncontig(ompi_osc_portals4_module_t *module, offset + (ptl_size_t)target_iovec[target_iov_index].iov_base, user_ptr); if (OPAL_UNLIKELY(PTL_OK != ret)) { - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -1817,7 +1817,7 @@ swap_from_noncontig(ompi_osc_portals4_module_t *module, /* determine how much to transfer in this operation */ rdma_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), max_rdma_len); - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing swap on contiguous region. result: %p origin: %p, target: %p, len: %lu", @@ -1844,7 +1844,7 @@ swap_from_noncontig(ompi_osc_portals4_module_t *module, opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlSwap failed with return value %d", __FUNCTION__, __LINE__, ret); - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -1969,7 +1969,7 @@ fetch_atomic_from_noncontig(ompi_osc_portals4_module_t *module, /* determine how much to transfer in this operation */ rdma_len = MIN(MIN(origin_iovec[origin_iov_index].iov_len, target_iovec[target_iov_index].iov_len), max_rdma_len); - opal_atomic_add_64(&module->opcount, 1); + opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "performing swap on contiguous region. 
result: %p origin: %p, target: %p, len: %lu", @@ -1995,7 +1995,7 @@ fetch_atomic_from_noncontig(ompi_osc_portals4_module_t *module, opal_output_verbose(1, ompi_osc_base_framework.framework_output, "%s:%d PtlFetchAtomic failed with return value %d", __FUNCTION__, __LINE__, ret); - opal_atomic_add_64(&module->opcount, -1); + opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } @@ -2021,7 +2021,7 @@ ompi_osc_portals4_rput(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -2033,7 +2033,7 @@ ompi_osc_portals4_rput(const void *origin_addr, (ompi_osc_portals4_module_t*) win->w_osc_module; ptl_process_t peer = ompi_osc_portals4_get_peer(module, target); size_t size, offset; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "rput: 0x%lx, %d, %s, %d, %lu, %d, %s, 0x%lx", @@ -2133,7 +2133,7 @@ ompi_osc_portals4_rget(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -2145,7 +2145,7 @@ ompi_osc_portals4_rget(void *origin_addr, (ompi_osc_portals4_module_t*) win->w_osc_module; ptl_process_t peer = ompi_osc_portals4_get_peer(module, target); size_t offset, size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "rget: 0x%lx, %d, %s, %d, %lu, %d, %s, 0x%lx", @@ -2238,7 +2238,7 @@ ompi_osc_portals4_raccumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct 
ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -2253,7 +2253,7 @@ ompi_osc_portals4_raccumulate(const void *origin_addr, size_t offset, size; ptl_op_t ptl_op; ptl_datatype_t ptl_dt; - OPAL_PTRDIFF_TYPE sent, length, origin_lb, target_lb, extent; + ptrdiff_t sent, length, origin_lb, target_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "raccumulate: 0x%lx, %d, %s, %d, %lu, %d, %s, %s 0x%lx", @@ -2411,7 +2411,7 @@ ompi_osc_portals4_raccumulate(const void *origin_addr, do { size_t msg_length = MIN(module->atomic_max, length - sent); - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d Atomic", __FUNCTION__, __LINE__)); @@ -2428,7 +2428,7 @@ ompi_osc_portals4_raccumulate(const void *origin_addr, ptl_op, ptl_dt); if (OMPI_SUCCESS != ret) { - (void)opal_atomic_add_64(&module->opcount, -1); + (void)opal_atomic_add_fetch_64(&module->opcount, -1); OMPI_OSC_PORTALS4_REQUEST_RETURN(request); return ret; } @@ -2449,7 +2449,7 @@ ompi_osc_portals4_rget_accumulate(const void *origin_addr, int result_count, struct ompi_datatype_t *result_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -2464,7 +2464,7 @@ ompi_osc_portals4_rget_accumulate(const void *origin_addr, size_t target_offset, size; ptl_op_t ptl_op; ptl_datatype_t ptl_dt; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, result_lb, extent; + ptrdiff_t length, origin_lb, target_lb, result_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "rget_accumulate: 0x%lx, %d, %s, 0x%lx, %d, %s, %d, %lu, %d, %s, %s, 0x%lx", @@ -2798,7 +2798,7 @@ ompi_osc_portals4_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct 
ompi_datatype_t *target_dt, struct ompi_win_t *win) @@ -2808,7 +2808,7 @@ ompi_osc_portals4_put(const void *origin_addr, (ompi_osc_portals4_module_t*) win->w_osc_module; ptl_process_t peer = ompi_osc_portals4_get_peer(module, target); size_t offset, size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "put: 0x%lx, %d, %s, %d, %lu, %d, %s, 0x%lx", @@ -2897,7 +2897,7 @@ ompi_osc_portals4_get(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win) @@ -2907,7 +2907,7 @@ ompi_osc_portals4_get(void *origin_addr, (ompi_osc_portals4_module_t*) win->w_osc_module; ptl_process_t peer = ompi_osc_portals4_get_peer(module, target); size_t offset, size; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, extent; + ptrdiff_t length, origin_lb, target_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "get: 0x%lx, %d, %s, %d, %lu, %d, %s, 0x%lx", @@ -2993,7 +2993,7 @@ ompi_osc_portals4_accumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -3006,7 +3006,7 @@ ompi_osc_portals4_accumulate(const void *origin_addr, size_t offset, size; ptl_op_t ptl_op; ptl_datatype_t ptl_dt; - OPAL_PTRDIFF_TYPE sent, length, origin_lb, target_lb, extent; + ptrdiff_t sent, length, origin_lb, target_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "accumulate: 0x%lx, %d, %s, %d, %lu, %d, %s, %s, 0x%lx", @@ -3149,7 +3149,7 @@ ompi_osc_portals4_accumulate(const void *origin_addr, do { size_t msg_length = MIN(module->atomic_max, length - sent); - (void)opal_atomic_add_64(&module->opcount, 
1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d Atomic", __FUNCTION__, __LINE__)); @@ -3166,7 +3166,7 @@ ompi_osc_portals4_accumulate(const void *origin_addr, ptl_op, ptl_dt); if (OMPI_SUCCESS != ret) { - (void)opal_atomic_add_64(&module->opcount, -1); + (void)opal_atomic_add_fetch_64(&module->opcount, -1); return ret; } sent += msg_length; @@ -3186,7 +3186,7 @@ ompi_osc_portals4_get_accumulate(const void *origin_addr, int result_count, struct ompi_datatype_t *result_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -3199,7 +3199,7 @@ ompi_osc_portals4_get_accumulate(const void *origin_addr, size_t target_offset, size; ptl_op_t ptl_op; ptl_datatype_t ptl_dt; - OPAL_PTRDIFF_TYPE length, origin_lb, target_lb, result_lb, extent; + ptrdiff_t length, origin_lb, target_lb, result_lb, extent; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "get_accumulate: 0x%lx, %d, %s, 0x%lx, %d, %s, %d, %lu, %d, %s, %s, 0x%lx", @@ -3504,7 +3504,7 @@ ompi_osc_portals4_compare_and_swap(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_win_t *win) { int ret; @@ -3541,7 +3541,7 @@ ompi_osc_portals4_compare_and_swap(const void *origin_addr, result_md_offset = (ptl_size_t) result_addr; origin_md_offset = (ptl_size_t) origin_addr; - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90,ompi_osc_base_framework.framework_output, "%s,%d Swap", __FUNCTION__, __LINE__)); @@ -3572,7 +3572,7 @@ ompi_osc_portals4_fetch_and_op(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_op_t *op, struct ompi_win_t *win) { @@ -3613,7 
+3613,7 @@ ompi_osc_portals4_fetch_and_op(const void *origin_addr, result_md_offset = (ptl_size_t) result_addr; origin_md_offset = (ptl_size_t) origin_addr; - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d Swap", __FUNCTION__, __LINE__)); ret = PtlSwap(module->md_h, @@ -3635,7 +3635,7 @@ ompi_osc_portals4_fetch_and_op(const void *origin_addr, md_offset = (ptl_size_t) result_addr; - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); OPAL_OUTPUT_VERBOSE((90, ompi_osc_base_framework.framework_output, "%s,%d Get", __FUNCTION__, __LINE__)); ret = PtlGet(module->md_h, @@ -3648,7 +3648,7 @@ ompi_osc_portals4_fetch_and_op(const void *origin_addr, NULL); } else { ptl_size_t result_md_offset, origin_md_offset; - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); ret = ompi_osc_portals4_get_op(op, &ptl_op); if (OMPI_SUCCESS != ret) { diff --git a/ompi/mca/osc/portals4/osc_portals4_component.c b/ompi/mca/osc/portals4/osc_portals4_component.c index 60fd088c51a..8a4781e3af6 100644 --- a/ompi/mca/osc/portals4/osc_portals4_component.c +++ b/ompi/mca/osc/portals4/osc_portals4_component.c @@ -8,6 +8,7 @@ * Copyright (c) 2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -30,10 +31,10 @@ static int component_register(void); static int component_init(bool enable_progress_threads, bool enable_mpi_threads); static int component_finalize(void); static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor); static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model); @@ -108,14 +109,14 @@ ompi_osc_portals4_module_t ompi_osc_portals4_module_template = { looks in the info structure passed by the user, then through mca parameters. */ static bool -check_config_value_bool(char *key, ompi_info_t *info) +check_config_value_bool(char *key, opal_info_t *info) { char *value_string; int value_len, ret, flag, param; const bool *flag_value; bool result; - ret = ompi_info_get_valuelen(info, key, &value_len, &flag); + ret = opal_info_get_valuelen(info, key, &value_len, &flag); if (OMPI_SUCCESS != ret) goto info_not_found; if (flag == 0) goto info_not_found; value_len++; @@ -123,13 +124,13 @@ check_config_value_bool(char *key, ompi_info_t *info) value_string = (char*)malloc(sizeof(char) * value_len + 1); /* Should malloc 1 char for NUL-termination */ if (NULL == value_string) goto info_not_found; - ret = ompi_info_get(info, key, value_len, value_string, &flag); + ret = opal_info_get(info, key, value_len, value_string, &flag); if (OMPI_SUCCESS != ret) { free(value_string); goto info_not_found; } assert(flag != 0); - ret = ompi_info_value_to_bool(value_string, &result); + ret = opal_info_value_to_bool(value_string, &result); free(value_string); if (OMPI_SUCCESS != ret) goto info_not_found; return result; @@ -146,14 +147,14 @@ check_config_value_bool(char *key, 
ompi_info_t *info) static bool -check_config_value_equal(char *key, ompi_info_t *info, char *value) +check_config_value_equal(char *key, opal_info_t *info, char *value) { char *value_string; int value_len, ret, flag, param; const bool *flag_value; bool result = false; - ret = ompi_info_get_valuelen(info, key, &value_len, &flag); + ret = opal_info_get_valuelen(info, key, &value_len, &flag); if (OMPI_SUCCESS != ret) goto info_not_found; if (flag == 0) goto info_not_found; value_len++; @@ -161,7 +162,7 @@ check_config_value_equal(char *key, ompi_info_t *info, char *value) value_string = (char*)malloc(sizeof(char) * value_len + 1); /* Should malloc 1 char for NUL-termination */ if (NULL == value_string) goto info_not_found; - ret = ompi_info_get(info, key, value_len, value_string, &flag); + ret = opal_info_get(info, key, value_len, value_string, &flag); if (OMPI_SUCCESS != ret) { free(value_string); goto info_not_found; @@ -229,8 +230,8 @@ progress_callback(void) } req = (ompi_osc_portals4_request_t*) ev.user_ptr; - opal_atomic_add_size_t(&req->super.req_status._ucount, ev.mlength); - ops = opal_atomic_add_32(&req->ops_committed, 1); + opal_atomic_add_fetch_size_t(&req->super.req_status._ucount, ev.mlength); + ops = opal_atomic_add_fetch_32(&req->ops_committed, 1); if (ops == req->ops_expected) { ompi_request_complete(&req->super, true); } @@ -382,7 +383,7 @@ component_finalize(void) static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor) { int ret; @@ -403,7 +404,7 @@ component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model) { 
ompi_osc_portals4_module_t *module = NULL; @@ -684,7 +685,7 @@ ompi_osc_portals4_free(struct ompi_win_t *win) int -ompi_osc_portals4_set_info(struct ompi_win_t *win, struct ompi_info_t *info) +ompi_osc_portals4_set_info(struct ompi_win_t *win, struct opal_info_t *info) { ompi_osc_portals4_module_t *module = (ompi_osc_portals4_module_t*) win->w_osc_module; @@ -696,19 +697,19 @@ ompi_osc_portals4_set_info(struct ompi_win_t *win, struct ompi_info_t *info) int -ompi_osc_portals4_get_info(struct ompi_win_t *win, struct ompi_info_t **info_used) +ompi_osc_portals4_get_info(struct ompi_win_t *win, struct opal_info_t **info_used) { ompi_osc_portals4_module_t *module = (ompi_osc_portals4_module_t*) win->w_osc_module; - ompi_info_t *info = OBJ_NEW(ompi_info_t); + opal_info_t *info = OBJ_NEW(opal_info_t); if (NULL == info) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; - ompi_info_set(info, "no_locks", (module->state.lock == LOCK_ILLEGAL) ? "true" : "false"); + opal_info_set(info, "no_locks", (module->state.lock == LOCK_ILLEGAL) ? 
"true" : "false"); if (module->atomic_max < mca_osc_portals4_component.matching_atomic_max) { - ompi_info_set(info, "accumulate_ordering", "none"); + opal_info_set(info, "accumulate_ordering", "none"); } else { - ompi_info_set(info, "accumulate_ordering", "rar,war,raw,waw"); + opal_info_set(info, "accumulate_ordering", "rar,war,raw,waw"); } *info_used = info; diff --git a/ompi/mca/osc/portals4/osc_portals4_passive_target.c b/ompi/mca/osc/portals4/osc_portals4_passive_target.c index b39d4d904fe..b9baeea6f1c 100644 --- a/ompi/mca/osc/portals4/osc_portals4_passive_target.c +++ b/ompi/mca/osc/portals4/osc_portals4_passive_target.c @@ -43,7 +43,7 @@ lk_cas64(ompi_osc_portals4_module_t *module, int ret; size_t offset = offsetof(ompi_osc_portals4_node_state_t, lock); - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); ret = PtlSwap(module->md_h, (ptl_size_t) result_val, @@ -76,7 +76,7 @@ lk_write64(ompi_osc_portals4_module_t *module, int ret; size_t offset = offsetof(ompi_osc_portals4_node_state_t, lock); - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); ret = PtlPut(module->md_h, (ptl_size_t) &write_val, @@ -106,7 +106,7 @@ lk_add64(ompi_osc_portals4_module_t *module, int ret; size_t offset = offsetof(ompi_osc_portals4_node_state_t, lock); - (void)opal_atomic_add_64(&module->opcount, 1); + (void)opal_atomic_add_fetch_64(&module->opcount, 1); ret = PtlFetchAtomic(module->md_h, (ptl_size_t) result_val, diff --git a/ompi/mca/osc/pt2pt/Makefile.am b/ompi/mca/osc/pt2pt/Makefile.am index 17d08ff50e1..244f9b7d2c2 100644 --- a/ompi/mca/osc/pt2pt/Makefile.am +++ b/ompi/mca/osc/pt2pt/Makefile.am @@ -11,6 +11,7 @@ # Copyright (c) 2014 Los Alamos National Security, LLC. All rights # reserved. # Copyright (c) 2015 Intel, Inc. All rights reserved +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -18,6 +19,8 @@ # $HEADER$ # +dist_ompidata_DATA = help-osc-pt2pt.txt + pt2pt_sources = \ osc_pt2pt.h \ osc_pt2pt_module.c \ @@ -52,6 +55,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_osc_pt2pt_la_SOURCES = $(pt2pt_sources) mca_osc_pt2pt_la_LDFLAGS = -module -avoid-version +mca_osc_pt2pt_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_osc_pt2pt_la_SOURCES = $(pt2pt_sources) diff --git a/ompi/mca/osc/pt2pt/help-osc-pt2pt.txt b/ompi/mca/osc/pt2pt/help-osc-pt2pt.txt new file mode 100644 index 00000000000..9b57ac20b72 --- /dev/null +++ b/ompi/mca/osc/pt2pt/help-osc-pt2pt.txt @@ -0,0 +1,15 @@ +# -*- text -*- +# +# Copyright (c) 2016 Los Alamos National Security, LLC. All rights +# reserved. +# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# +[mpi-thread-multiple-not-supported] +The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release. +Workarounds are to run on a single node, or to use a system with an RDMA +capable network such as Infiniband. diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt.h b/ompi/mca/osc/pt2pt/osc_pt2pt.h index 5901aa2e1a0..4b1a423ded1 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt.h +++ b/ompi/mca/osc/pt2pt/osc_pt2pt.h @@ -8,13 +8,14 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2007-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* Copyright (c) 2016 FUJITSU LIMITED. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -144,15 +145,11 @@ static inline bool ompi_osc_pt2pt_peer_eager_active (ompi_osc_pt2pt_peer_t *peer static inline void ompi_osc_pt2pt_peer_set_flag (ompi_osc_pt2pt_peer_t *peer, int32_t flag, bool value) { - int32_t peer_flags, new_flags; - do { - peer_flags = peer->flags; - if (value) { - new_flags = peer_flags | flag; - } else { - new_flags = peer_flags & ~flag; - } - } while (!OPAL_ATOMIC_CMPSET_32 (&peer->flags, peer_flags, new_flags)); + if (value) { + OPAL_ATOMIC_OR_FETCH32 (&peer->flags, flag); + } else { + OPAL_ATOMIC_AND_FETCH32 (&peer->flags, ~flag); + } } static inline void ompi_osc_pt2pt_peer_set_locked (ompi_osc_pt2pt_peer_t *peer, bool value) @@ -328,7 +325,7 @@ int ompi_osc_pt2pt_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -337,7 +334,7 @@ int ompi_osc_pt2pt_accumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -347,7 +344,7 @@ int ompi_osc_pt2pt_get(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -357,14 +354,14 @@ int ompi_osc_pt2pt_compare_and_swap(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_win_t *win); int ompi_osc_pt2pt_fetch_and_op(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - 
OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_op_t *op, struct ompi_win_t *win); @@ -385,7 +382,7 @@ int ompi_osc_pt2pt_rput(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -395,7 +392,7 @@ int ompi_osc_pt2pt_rget(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -405,7 +402,7 @@ int ompi_osc_pt2pt_raccumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -470,8 +467,8 @@ int ompi_osc_pt2pt_flush_local(int target, struct ompi_win_t *win); int ompi_osc_pt2pt_flush_local_all(struct ompi_win_t *win); -int ompi_osc_pt2pt_set_info(struct ompi_win_t *win, struct ompi_info_t *info); -int ompi_osc_pt2pt_get_info(struct ompi_win_t *win, struct ompi_info_t **info_used); +int ompi_osc_pt2pt_set_info(struct ompi_win_t *win, struct opal_info_t *info); +int ompi_osc_pt2pt_get_info(struct ompi_win_t *win, struct opal_info_t **info_used); int ompi_osc_pt2pt_component_irecv(ompi_osc_pt2pt_module_t *module, void *buf, @@ -517,7 +514,7 @@ static inline void mark_incoming_completion (ompi_osc_pt2pt_module_t *module, in OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "mark_incoming_completion marking active incoming complete. 
module %p, count = %d", (void *) module, (int) module->active_incoming_frag_count + 1)); - new_value = OPAL_THREAD_ADD32(&module->active_incoming_frag_count, 1); + new_value = OPAL_THREAD_ADD_FETCH32(&module->active_incoming_frag_count, 1); if (new_value >= 0) { OPAL_THREAD_LOCK(&module->lock); opal_condition_broadcast(&module->cond); @@ -529,7 +526,7 @@ static inline void mark_incoming_completion (ompi_osc_pt2pt_module_t *module, in OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "mark_incoming_completion marking passive incoming complete. module %p, source = %d, count = %d", (void *) module, source, (int) peer->passive_incoming_frag_count + 1)); - new_value = OPAL_THREAD_ADD32((int32_t *) &peer->passive_incoming_frag_count, 1); + new_value = OPAL_THREAD_ADD_FETCH32((int32_t *) &peer->passive_incoming_frag_count, 1); if (0 == new_value) { OPAL_THREAD_LOCK(&module->lock); opal_condition_broadcast(&module->cond); @@ -553,7 +550,7 @@ static inline void mark_incoming_completion (ompi_osc_pt2pt_module_t *module, in */ static inline void mark_outgoing_completion (ompi_osc_pt2pt_module_t *module) { - int32_t new_value = OPAL_THREAD_ADD32((int32_t *) &module->outgoing_frag_count, 1); + int32_t new_value = OPAL_THREAD_ADD_FETCH32((int32_t *) &module->outgoing_frag_count, 1); OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "mark_outgoing_completion: outgoing_frag_count = %d", new_value)); if (new_value >= 0) { @@ -577,12 +574,12 @@ static inline void mark_outgoing_completion (ompi_osc_pt2pt_module_t *module) */ static inline void ompi_osc_signal_outgoing (ompi_osc_pt2pt_module_t *module, int target, int count) { - OPAL_THREAD_ADD32((int32_t *) &module->outgoing_frag_count, -count); + OPAL_THREAD_ADD_FETCH32((int32_t *) &module->outgoing_frag_count, -count); if (MPI_PROC_NULL != target) { OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "ompi_osc_signal_outgoing_passive: target = %d, count = %d, total = %d", target, 
count, module->epoch_outgoing_frag_count[target] + count)); - OPAL_THREAD_ADD32((int32_t *) (module->epoch_outgoing_frag_count + target), count); + OPAL_THREAD_ADD_FETCH32((int32_t *) (module->epoch_outgoing_frag_count + target), count); } } @@ -720,7 +717,7 @@ static inline int get_tag(ompi_osc_pt2pt_module_t *module) /* the LSB of the tag is used be the receiver to determine if the message is a passive or active target (ie, where to mark completion). */ - int32_t tmp = OPAL_THREAD_ADD32((volatile int32_t *) &module->tag_counter, 4); + int32_t tmp = OPAL_THREAD_ADD_FETCH32((volatile int32_t *) &module->tag_counter, 4); return (tmp & OSC_PT2PT_FRAG_MASK) | !!(module->passive_target_access_epoch); } diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_active_target.c b/ompi/mca/osc/pt2pt/osc_pt2pt_active_target.c index 501c126fd14..33df9440a62 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt_active_target.c +++ b/ompi/mca/osc/pt2pt/osc_pt2pt_active_target.c @@ -183,7 +183,7 @@ int ompi_osc_pt2pt_fence(int assert, ompi_win_t *win) incoming_reqs)); /* set our complete condition for incoming requests */ - OPAL_THREAD_ADD32(&module->active_incoming_frag_count, -incoming_reqs); + OPAL_THREAD_ADD_FETCH32(&module->active_incoming_frag_count, -incoming_reqs); /* wait for completion */ while (module->outgoing_frag_count < 0 || module->active_incoming_frag_count < 0) { @@ -272,7 +272,7 @@ int ompi_osc_pt2pt_start (ompi_group_t *group, int assert, ompi_win_t *win) OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "found unexpected post from %d", peer->rank)); - OPAL_THREAD_ADD32 (&sync->sync_expected, -1); + OPAL_THREAD_ADD_FETCH32 (&sync->sync_expected, -1); ompi_osc_pt2pt_peer_set_unex (peer, false); } } @@ -574,12 +574,12 @@ void osc_pt2pt_incoming_complete (ompi_osc_pt2pt_module_t *module, int source, i frag_count, module->active_incoming_frag_count, module->num_complete_msgs)); /* the current fragment is not part of the frag_count so we need to add it here */ - 
OPAL_THREAD_ADD32(&module->active_incoming_frag_count, -frag_count); + OPAL_THREAD_ADD_FETCH32(&module->active_incoming_frag_count, -frag_count); /* make sure the signal count is written before changing the complete message count */ opal_atomic_wmb (); - if (0 == OPAL_THREAD_ADD32(&module->num_complete_msgs, 1)) { + if (0 == OPAL_THREAD_ADD_FETCH32(&module->num_complete_msgs, 1)) { OPAL_THREAD_LOCK(&module->lock); opal_condition_broadcast (&module->cond); OPAL_THREAD_UNLOCK(&module->lock); diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c b/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c index f0935273442..bfe67ea3d8f 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c +++ b/ompi/mca/osc/pt2pt/osc_pt2pt_comm.c @@ -12,7 +12,7 @@ * reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 FUJITSU LIMITED. All rights reserved. * Copyright (c) 2016 IBM Corporation. All rights reserved. 
@@ -62,7 +62,7 @@ static int ompi_osc_pt2pt_req_comm_complete (ompi_request_t *request) /* update the cbdata for ompi_osc_pt2pt_comm_complete */ request->req_complete_cb_data = pt2pt_request->module; - if (0 == OPAL_THREAD_ADD32(&pt2pt_request->outstanding_requests, -1)) { + if (0 == OPAL_THREAD_ADD_FETCH32(&pt2pt_request->outstanding_requests, -1)) { ompi_osc_pt2pt_request_complete (pt2pt_request, request->req_status.MPI_ERROR); } @@ -108,7 +108,7 @@ static int ompi_osc_pt2pt_dt_send_complete (ompi_request_t *request) /* self communication optimizations */ static inline int ompi_osc_pt2pt_put_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, const void *source, int source_count, - ompi_datatype_t *source_datatype, OPAL_PTRDIFF_TYPE target_disp, int target_count, + ompi_datatype_t *source_datatype, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_osc_pt2pt_module_t *module, ompi_osc_pt2pt_request_t *request) { @@ -133,7 +133,7 @@ static inline int ompi_osc_pt2pt_put_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, co } static inline int ompi_osc_pt2pt_get_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, void *target, int target_count, ompi_datatype_t *target_datatype, - OPAL_PTRDIFF_TYPE source_disp, int source_count, ompi_datatype_t *source_datatype, + ptrdiff_t source_disp, int source_count, ompi_datatype_t *source_datatype, ompi_osc_pt2pt_module_t *module, ompi_osc_pt2pt_request_t *request) { void *source = (unsigned char*) module->baseptr + @@ -157,7 +157,7 @@ static inline int ompi_osc_pt2pt_get_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, vo } static inline int ompi_osc_pt2pt_cas_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, const void *source, const void *compare, void *result, - ompi_datatype_t *datatype, OPAL_PTRDIFF_TYPE target_disp, ompi_osc_pt2pt_module_t *module) + ompi_datatype_t *datatype, ptrdiff_t target_disp, ompi_osc_pt2pt_module_t *module) { void *target = (unsigned char*) module->baseptr + ((unsigned long) target_disp * module->disp_unit); @@ 
-179,7 +179,7 @@ static inline int ompi_osc_pt2pt_cas_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, co } static inline int ompi_osc_pt2pt_acc_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, const void *source, int source_count, ompi_datatype_t *source_datatype, - OPAL_PTRDIFF_TYPE target_disp, int target_count, ompi_datatype_t *target_datatype, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_osc_pt2pt_module_t *module, ompi_osc_pt2pt_request_t *request) { void *target = (unsigned char*) module->baseptr + @@ -214,7 +214,7 @@ static inline int ompi_osc_pt2pt_acc_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, co static inline int ompi_osc_pt2pt_gacc_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, const void *source, int source_count, ompi_datatype_t *source_datatype, void *result, int result_count, ompi_datatype_t *result_datatype, - OPAL_PTRDIFF_TYPE target_disp, int target_count, ompi_datatype_t *target_datatype, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_osc_pt2pt_module_t *module, ompi_osc_pt2pt_request_t *request) { void *target = (unsigned char*) module->baseptr + @@ -267,7 +267,7 @@ static inline int ompi_osc_pt2pt_gacc_self (ompi_osc_pt2pt_sync_t *pt2pt_sync, c static inline int ompi_osc_pt2pt_put_w_req (const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, ompi_win_t *win, ompi_osc_pt2pt_request_t *request) { @@ -418,7 +418,7 @@ static inline int ompi_osc_pt2pt_put_w_req (const void *origin_addr, int origin_ int ompi_osc_pt2pt_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, ompi_win_t *win) { @@ -431,7 +431,7 @@ ompi_osc_pt2pt_put(const void *origin_addr, int 
origin_count, static int ompi_osc_pt2pt_accumulate_w_req (const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, ompi_win_t *win, @@ -593,7 +593,7 @@ ompi_osc_pt2pt_accumulate_w_req (const void *origin_addr, int origin_count, int ompi_osc_pt2pt_accumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, ompi_win_t *win) @@ -605,7 +605,7 @@ ompi_osc_pt2pt_accumulate(const void *origin_addr, int origin_count, int ompi_osc_pt2pt_compare_and_swap (const void *origin_addr, const void *compare_addr, void *result_addr, struct ompi_datatype_t *dt, - int target, OPAL_PTRDIFF_TYPE target_disp, + int target, ptrdiff_t target_disp, struct ompi_win_t *win) { ompi_osc_pt2pt_module_t *module = GET_MODULE(win); @@ -697,7 +697,7 @@ int ompi_osc_pt2pt_compare_and_swap (const void *origin_addr, const void *compar int ompi_osc_pt2pt_fetch_and_op(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, struct ompi_op_t *op, + ptrdiff_t target_disp, struct ompi_op_t *op, struct ompi_win_t *win) { return ompi_osc_pt2pt_get_accumulate(origin_addr, 1, dt, result_addr, 1, dt, @@ -706,7 +706,7 @@ int ompi_osc_pt2pt_fetch_and_op(const void *origin_addr, void *result_addr, int ompi_osc_pt2pt_rput(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, struct ompi_request_t **request) { @@ -746,7 +746,7 @@ int ompi_osc_pt2pt_rput(const void *origin_addr, int origin_count, static inline int 
ompi_osc_pt2pt_rget_internal (void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, bool release_req, @@ -879,7 +879,7 @@ static inline int ompi_osc_pt2pt_rget_internal (void *origin_addr, int origin_co } int ompi_osc_pt2pt_rget (void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, struct ompi_request_t **request) { @@ -890,7 +890,7 @@ int ompi_osc_pt2pt_rget (void *origin_addr, int origin_count, struct ompi_dataty int ompi_osc_pt2pt_get (void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target, ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win) { ompi_request_t *request; @@ -901,7 +901,7 @@ int ompi_osc_pt2pt_get (void *origin_addr, int origin_count, struct ompi_datatyp int ompi_osc_pt2pt_raccumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, int target_count, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, struct ompi_win_t *win, struct ompi_request_t **request) { diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_component.c b/ompi/mca/osc/pt2pt/osc_pt2pt_component.c index 10fd18137c6..acb08fee54c 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt_component.c +++ b/ompi/mca/osc/pt2pt/osc_pt2pt_component.c @@ -16,6 +16,7 @@ * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. * Copyright (c) 2015-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. 
All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -24,6 +25,7 @@ */ #include "ompi_config.h" +#include "opal/util/show_help.h" #include @@ -38,10 +40,10 @@ static int component_register(void); static int component_init(bool enable_progress_threads, bool enable_mpi_threads); static int component_finalize(void); static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor); static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model); ompi_osc_pt2pt_component_t mca_osc_pt2pt_component = { @@ -103,22 +105,20 @@ ompi_osc_pt2pt_module_t ompi_osc_pt2pt_module_template = { ompi_osc_pt2pt_flush_all, ompi_osc_pt2pt_flush_local, ompi_osc_pt2pt_flush_local_all, - - ompi_osc_pt2pt_set_info, - ompi_osc_pt2pt_get_info } }; bool ompi_osc_pt2pt_no_locks = false; +static bool using_thread_multiple = false; /* look up parameters for configuring this window. The code first looks in the info structure passed by the user, then through mca parameters. 
*/ -static bool check_config_value_bool(char *key, ompi_info_t *info, bool result) +static bool check_config_value_bool(char *key, opal_info_t *info, bool result) { int flag; - (void) ompi_info_get_bool (info, key, &result, &flag); + (void) opal_info_get_bool (info, key, &result, &flag); return result; } @@ -208,6 +208,10 @@ component_init(bool enable_progress_threads, { int ret; + if (enable_mpi_threads) { + using_thread_multiple = true; + } + OBJ_CONSTRUCT(&mca_osc_pt2pt_component.lock, opal_mutex_t); OBJ_CONSTRUCT(&mca_osc_pt2pt_component.pending_operations, opal_list_t); OBJ_CONSTRUCT(&mca_osc_pt2pt_component.pending_operations_lock, opal_mutex_t); @@ -282,7 +286,7 @@ component_finalize(void) static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor) { if (MPI_WIN_FLAVOR_SHARED == flavor) return -1; @@ -293,7 +297,7 @@ component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model) { ompi_osc_pt2pt_module_t *module = NULL; @@ -304,6 +308,15 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit component */ if (MPI_WIN_FLAVOR_SHARED == flavor) return OMPI_ERR_NOT_SUPPORTED; + /* + * workaround for issue https://github.com/open-mpi/ompi/issues/2614 + * The following check needs to be removed once 2614 is addressed. 
+ */ + if (using_thread_multiple) { + opal_show_help("help-osc-pt2pt.txt", "mpi-thread-multiple-not-supported", true); + return OMPI_ERR_NOT_SUPPORTED; + } + /* create module structure with all fields initialized to zero */ module = (ompi_osc_pt2pt_module_t*) calloc(1, sizeof(ompi_osc_pt2pt_module_t)); @@ -442,7 +455,7 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit int -ompi_osc_pt2pt_set_info(struct ompi_win_t *win, struct ompi_info_t *info) +ompi_osc_pt2pt_set_info(struct ompi_win_t *win, struct opal_info_t *info) { ompi_osc_pt2pt_module_t *module = (ompi_osc_pt2pt_module_t*) win->w_osc_module; @@ -454,9 +467,9 @@ ompi_osc_pt2pt_set_info(struct ompi_win_t *win, struct ompi_info_t *info) int -ompi_osc_pt2pt_get_info(struct ompi_win_t *win, struct ompi_info_t **info_used) +ompi_osc_pt2pt_get_info(struct ompi_win_t *win, struct opal_info_t **info_used) { - ompi_info_t *info = OBJ_NEW(ompi_info_t); + opal_info_t *info = OBJ_NEW(opal_info_t); if (NULL == info) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; *info_used = info; diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c b/ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c index 8aef87566f9..6a4205499bd 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c +++ b/ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c @@ -667,7 +667,7 @@ static int accumulate_cb (ompi_request_t *request) rank = acc_data->peer; } - if (0 == OPAL_THREAD_ADD32(&acc_data->request_count, -1)) { + if (0 == OPAL_THREAD_ADD_FETCH32(&acc_data->request_count, -1)) { /* no more requests needed before the buffer can be accumulated */ if (acc_data->source) { @@ -716,9 +716,9 @@ static int ompi_osc_pt2pt_acc_op_queue (ompi_osc_pt2pt_module_t *module, ompi_os /* NTH: ensure we don't leave wait/process_flush/etc until this * accumulate operation is complete. 
*/ if (active_target) { - OPAL_THREAD_ADD32(&module->active_incoming_frag_count, -1); + OPAL_THREAD_ADD_FETCH32(&module->active_incoming_frag_count, -1); } else { - OPAL_THREAD_ADD32(&peer->passive_incoming_frag_count, -1); + OPAL_THREAD_ADD_FETCH32(&peer->passive_incoming_frag_count, -1); } pending_acc->active_target = active_target; @@ -1353,7 +1353,7 @@ static inline int process_flush (ompi_osc_pt2pt_module_t *module, int source, "process_flush header = {.frag_count = %d}", flush_header->frag_count)); /* increase signal count by incoming frags */ - OPAL_THREAD_ADD32(&peer->passive_incoming_frag_count, -(int32_t) flush_header->frag_count); + OPAL_THREAD_ADD_FETCH32(&peer->passive_incoming_frag_count, -(int32_t) flush_header->frag_count); OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, "%d: process_flush: received message from %d. passive_incoming_frag_count = %d", @@ -1372,7 +1372,7 @@ static inline int process_flush (ompi_osc_pt2pt_module_t *module, int source, } /* signal incomming will increment this counter */ - OPAL_THREAD_ADD32(&peer->passive_incoming_frag_count, -1); + OPAL_THREAD_ADD_FETCH32(&peer->passive_incoming_frag_count, -1); return sizeof (*flush_header); } @@ -1387,7 +1387,7 @@ static inline int process_unlock (ompi_osc_pt2pt_module_t *module, int source, "process_unlock header = {.frag_count = %d}", unlock_header->frag_count)); /* increase signal count by incoming frags */ - OPAL_THREAD_ADD32(&peer->passive_incoming_frag_count, -(int32_t) unlock_header->frag_count); + OPAL_THREAD_ADD_FETCH32(&peer->passive_incoming_frag_count, -(int32_t) unlock_header->frag_count); OPAL_OUTPUT_VERBOSE((25, ompi_osc_base_framework.framework_output, "osc pt2pt: processing unlock request from %d. 
frag count = %d, processed_count = %d", @@ -1406,7 +1406,7 @@ static inline int process_unlock (ompi_osc_pt2pt_module_t *module, int source, } /* signal incoming will increment this counter */ - OPAL_THREAD_ADD32(&peer->passive_incoming_frag_count, -1); + OPAL_THREAD_ADD_FETCH32(&peer->passive_incoming_frag_count, -1); return sizeof (*unlock_header); } diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_frag.c b/ompi/mca/osc/pt2pt/osc_pt2pt_frag.c index 63208da8772..c14afeb2572 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt_frag.c +++ b/ompi/mca/osc/pt2pt/osc_pt2pt_frag.c @@ -1,10 +1,11 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017-2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -104,8 +105,8 @@ static int ompi_osc_pt2pt_flush_active_frag (ompi_osc_pt2pt_module_t *module, om "osc pt2pt: flushing active fragment to target %d. 
pending: %d", active_frag->target, active_frag->pending)); - if (opal_atomic_cmpset (&peer->active_frag, active_frag, NULL)) { - if (0 != OPAL_THREAD_ADD32(&active_frag->pending, -1)) { + if (opal_atomic_compare_exchange_strong_ptr (&peer->active_frag, &active_frag, NULL)) { + if (0 != OPAL_THREAD_ADD_FETCH32(&active_frag->pending, -1)) { /* communication going on while synchronizing; this is an rma usage bug */ return OMPI_ERR_RMA_SYNC; } @@ -138,7 +139,7 @@ int ompi_osc_pt2pt_frag_flush_pending (ompi_osc_pt2pt_module_t *module, int targ int ompi_osc_pt2pt_frag_flush_pending_all (ompi_osc_pt2pt_module_t *module) { - int ret; + int ret = OPAL_SUCCESS; for (int i = 0 ; i < ompi_comm_size (module->comm) ; ++i) { ret = ompi_osc_pt2pt_frag_flush_pending (module, i); @@ -153,7 +154,6 @@ int ompi_osc_pt2pt_frag_flush_pending_all (ompi_osc_pt2pt_module_t *module) int ompi_osc_pt2pt_frag_flush_target (ompi_osc_pt2pt_module_t *module, int target) { ompi_osc_pt2pt_peer_t *peer = ompi_osc_pt2pt_peer_lookup (module, target); - ompi_osc_pt2pt_frag_t *frag; int ret = OMPI_SUCCESS; OPAL_OUTPUT_VERBOSE((50, ompi_osc_base_framework.framework_output, diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_frag.h b/ompi/mca/osc/pt2pt/osc_pt2pt_frag.h index f4e05a12ad8..4ed38930d5a 100644 --- a/ompi/mca/osc/pt2pt/osc_pt2pt_frag.h +++ b/ompi/mca/osc/pt2pt/osc_pt2pt_frag.h @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. 
  * $COPYRIGHT$
  *
@@ -51,7 +51,7 @@ static inline int ompi_osc_pt2pt_frag_finish (ompi_osc_pt2pt_module_t *module,
                                             ompi_osc_pt2pt_frag_t* buffer)
 {
     opal_atomic_wmb ();
-    if (0 == OPAL_THREAD_ADD32(&buffer->pending, -1)) {
+    if (0 == OPAL_THREAD_ADD_FETCH32(&buffer->pending, -1)) {
         opal_atomic_mb ();
         return ompi_osc_pt2pt_frag_start(module, buffer);
     }
@@ -67,7 +67,7 @@ static inline ompi_osc_pt2pt_frag_t *ompi_osc_pt2pt_frag_alloc_non_buffered (omp
     /* to ensure ordering flush the buffer on the peer */
     curr = peer->active_frag;
-    if (NULL != curr && opal_atomic_cmpset (&peer->active_frag, curr, NULL)) {
+    if (NULL != curr && opal_atomic_compare_exchange_strong_ptr (&peer->active_frag, &curr, NULL)) {
         /* If there's something pending, the pending finish will start the buffer.
            Otherwise, we need to start it now. */
         int ret = ompi_osc_pt2pt_frag_finish (module, curr);
@@ -142,11 +142,11 @@ static inline int _ompi_osc_pt2pt_frag_alloc (ompi_osc_pt2pt_module_t *module, i
             curr->pending_long_sends = long_send;
             peer->active_frag = curr;
         } else {
-            OPAL_THREAD_ADD32(&curr->header->num_ops, 1);
+            OPAL_THREAD_ADD_FETCH32(&curr->header->num_ops, 1);
             curr->pending_long_sends += long_send;
         }
-        OPAL_THREAD_ADD32(&curr->pending, 1);
+        OPAL_THREAD_ADD_FETCH32(&curr->pending, 1);
     } else {
         curr = ompi_osc_pt2pt_frag_alloc_non_buffered (module, peer, request_len);
         if (OPAL_UNLIKELY(NULL == curr)) {
@@ -172,8 +172,12 @@ static inline int ompi_osc_pt2pt_frag_alloc (ompi_osc_pt2pt_module_t *module, in
 {
     int ret;
+    if (request_len > mca_osc_pt2pt_component.buffer_size) {
+        return OMPI_ERR_OUT_OF_RESOURCE;
+    }
+
     do {
-        ret = ompi_osc_pt2pt_frag_alloc (module, target, request_len , buffer, ptr, long_send, buffered);
+        ret = _ompi_osc_pt2pt_frag_alloc (module, target, request_len , buffer, ptr, long_send, buffered);
         if (OPAL_LIKELY(OMPI_SUCCESS == ret || OMPI_ERR_OUT_OF_RESOURCE != ret)) {
             break;
         }
diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c b/ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c
index 819e7376dac..091757511f3 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c
@@ -8,7 +8,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2007-2016 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2007-2017 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2010-2016 IBM Corporation. All rights reserved.
  * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved.
@@ -64,7 +64,7 @@ static inline int ompi_osc_pt2pt_lock_self (ompi_osc_pt2pt_module_t *module, omp
     assert (lock->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK);
-    (void) OPAL_THREAD_ADD32(&lock->sync_expected, 1);
+    (void) OPAL_THREAD_ADD_FETCH32(&lock->sync_expected, 1);
     acquired = ompi_osc_pt2pt_lock_try_acquire (module, my_rank, lock_type, (uint64_t) (uintptr_t) lock);
     if (!acquired) {
@@ -91,7 +91,7 @@ static inline void ompi_osc_pt2pt_unlock_self (ompi_osc_pt2pt_module_t *module,
     ompi_osc_pt2pt_peer_t *peer = ompi_osc_pt2pt_peer_lookup (module, my_rank);
     int lock_type = lock->sync.lock.type;
-    (void) OPAL_THREAD_ADD32(&lock->sync_expected, 1);
+    (void) OPAL_THREAD_ADD_FETCH32(&lock->sync_expected, 1);
     assert (lock->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK);
@@ -99,9 +99,9 @@ static inline void ompi_osc_pt2pt_unlock_self (ompi_osc_pt2pt_module_t *module,
                          "ompi_osc_pt2pt_unlock_self: unlocking myself. lock state = %d", module->lock_status));
     if (MPI_LOCK_EXCLUSIVE == lock_type) {
-        OPAL_THREAD_ADD32(&module->lock_status, 1);
+        OPAL_THREAD_ADD_FETCH32(&module->lock_status, 1);
         ompi_osc_pt2pt_activate_next_lock (module);
-    } else if (0 == OPAL_THREAD_ADD32(&module->lock_status, -1)) {
+    } else if (0 == OPAL_THREAD_ADD_FETCH32(&module->lock_status, -1)) {
         ompi_osc_pt2pt_activate_next_lock (module);
     }
@@ -128,7 +128,7 @@ int ompi_osc_pt2pt_lock_remote (ompi_osc_pt2pt_module_t *module, int target, omp
         return OMPI_SUCCESS;
     }
-    (void) OPAL_THREAD_ADD32(&lock->sync_expected, 1);
+    (void) OPAL_THREAD_ADD_FETCH32(&lock->sync_expected, 1);
     assert (lock->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK);
@@ -145,7 +145,7 @@ int ompi_osc_pt2pt_lock_remote (ompi_osc_pt2pt_module_t *module, int target, omp
     ret = ompi_osc_pt2pt_control_send_unbuffered (module, target, &lock_req, sizeof (lock_req));
     if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
-        OPAL_THREAD_ADD32(&lock->sync_expected, -1);
+        OPAL_THREAD_ADD_FETCH32(&lock->sync_expected, -1);
     } else {
         ompi_osc_pt2pt_peer_set_locked (peer, true);
     }
@@ -163,7 +163,7 @@ static inline int ompi_osc_pt2pt_unlock_remote (ompi_osc_pt2pt_module_t *module,
     ompi_osc_pt2pt_header_unlock_t unlock_req;
     int ret;
-    (void) OPAL_THREAD_ADD32(&lock->sync_expected, 1);
+    (void) OPAL_THREAD_ADD_FETCH32(&lock->sync_expected, 1);
     assert (lock->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK);
@@ -207,7 +207,7 @@ static inline int ompi_osc_pt2pt_flush_remote (ompi_osc_pt2pt_module_t *module,
     int32_t frag_count = opal_atomic_swap_32 ((int32_t *) module->epoch_outgoing_frag_count + target, -1);
     int ret;
-    (void) OPAL_THREAD_ADD32(&lock->sync_expected, 1);
+    (void) OPAL_THREAD_ADD_FETCH32(&lock->sync_expected, 1);
     assert (lock->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK);
@@ -633,6 +633,9 @@ int ompi_osc_pt2pt_flush_local (int target, struct ompi_win_t *win)
     }
     OPAL_THREAD_UNLOCK(&module->lock);
+    /* make some progress */
+    opal_progress ();
+
     return OMPI_SUCCESS;
 }
@@ -659,6 +662,9 @@ int ompi_osc_pt2pt_flush_local_all (struct ompi_win_t *win)
     }
     OPAL_THREAD_UNLOCK(&module->lock);
+    /* make some progress */
+    opal_progress ();
+
     return OMPI_SUCCESS;
 }
@@ -738,14 +744,13 @@ static bool ompi_osc_pt2pt_lock_try_acquire (ompi_osc_pt2pt_module_t* module, in
                 break;
             }
-            if (opal_atomic_cmpset_32 (&module->lock_status, lock_status, lock_status + 1)) {
+            if (opal_atomic_compare_exchange_strong_32 (&module->lock_status, &lock_status, lock_status + 1)) {
                 break;
             }
-
-            lock_status = module->lock_status;
         } while (1);
     } else {
-        queue = !opal_atomic_cmpset_32 (&module->lock_status, 0, -1);
+        int32_t _tmp_value = 0;
+        queue = !opal_atomic_compare_exchange_strong_32 (&module->lock_status, &_tmp_value, -1);
     }
     if (queue) {
@@ -903,9 +908,9 @@ int ompi_osc_pt2pt_process_unlock (ompi_osc_pt2pt_module_t *module, int source,
     }
     if (-1 == module->lock_status) {
-        OPAL_THREAD_ADD32(&module->lock_status, 1);
+        OPAL_THREAD_ADD_FETCH32(&module->lock_status, 1);
         ompi_osc_pt2pt_activate_next_lock (module);
-    } else if (0 == OPAL_THREAD_ADD32(&module->lock_status, -1)) {
+    } else if (0 == OPAL_THREAD_ADD_FETCH32(&module->lock_status, -1)) {
         ompi_osc_pt2pt_activate_next_lock (module);
     }
diff --git a/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h b/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h
index 10398926e84..fe359bf6cf9 100644
--- a/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h
+++ b/ompi/mca/osc/pt2pt/osc_pt2pt_sync.h
@@ -166,7 +166,7 @@ static inline void ompi_osc_pt2pt_sync_wait_expected (ompi_osc_pt2pt_sync_t *syn
 static inline void ompi_osc_pt2pt_sync_expected (ompi_osc_pt2pt_sync_t *sync)
 {
-    int32_t new_value = OPAL_THREAD_ADD32 (&sync->sync_expected, -1);
+    int32_t new_value = OPAL_THREAD_ADD_FETCH32 (&sync->sync_expected, -1);
     if (0 == new_value) {
         OPAL_THREAD_LOCK(&sync->lock);
         if (!(sync->type == OMPI_OSC_PT2PT_SYNC_TYPE_LOCK && sync->num_peers > 1)) {
diff --git a/ompi/mca/osc/rdma/Makefile.am b/ompi/mca/osc/rdma/Makefile.am
index 80082a0e711..e52d0087743 100644
--- a/ompi/mca/osc/rdma/Makefile.am
+++ b/ompi/mca/osc/rdma/Makefile.am
@@ -10,6 +10,7 @@
 # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
 # Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights
 # reserved.
+# Copyright (c) 2017 IBM Corporation. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -58,6 +59,7 @@ mcacomponentdir = $(ompilibdir)
 mcacomponent_LTLIBRARIES = $(component_install)
 mca_osc_rdma_la_SOURCES = $(rdma_sources)
 mca_osc_rdma_la_LDFLAGS = -module -avoid-version
+mca_osc_rdma_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
 noinst_LTLIBRARIES = $(component_noinst)
 libmca_osc_rdma_la_SOURCES = $(rdma_sources)
diff --git a/ompi/mca/osc/rdma/osc_rdma.h b/ompi/mca/osc/rdma/osc_rdma.h
index 6f344553a3b..a33e0f332f8 100644
--- a/ompi/mca/osc/rdma/osc_rdma.h
+++ b/ompi/mca/osc/rdma/osc_rdma.h
@@ -8,11 +8,11 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2007-2017 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2007-2018 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved.
- * Copyright (c) 2016 Intel, Inc. All rights reserved.
+ * Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -50,6 +50,11 @@
 #include "opal_stdint.h"
+enum {
+    OMPI_OSC_RDMA_LOCKING_TWO_LEVEL,
+    OMPI_OSC_RDMA_LOCKING_ON_DEMAND,
+};
+
 /**
  * @brief osc rdma component structure
  */
@@ -87,6 +92,9 @@ struct ompi_osc_rdma_component_t {
     /** Default value of the no_locks info key for new windows */
     bool no_locks;
+    /** Locking mode to use as the default for all windows */
+    int locking_mode;
+
     /** Accumulate operations will only operate on a single intrinsic datatype */
     bool acc_single_intrinsic;
@@ -119,6 +127,8 @@ struct ompi_osc_rdma_module_t {
     /** Mutex lock protecting module data */
     opal_mutex_t lock;
+    /** locking mode to use */
+    int locking_mode;
     /* window configuration */
@@ -128,6 +138,9 @@ struct ompi_osc_rdma_module_t {
     /** value of same_size info key for this window */
     bool same_size;
+    /** CPU atomics can be used */
+    bool use_cpu_atomics;
+
     /** passive-target synchronization will not be used in this window */
     bool no_locks;
@@ -144,10 +157,12 @@ struct ompi_osc_rdma_module_t {
     /** Local displacement unit. */
     int disp_unit;
-    /** global leader */
     ompi_osc_rdma_peer_t *leader;
+    /** my peer structure */
+    ompi_osc_rdma_peer_t *my_peer;
+
     /** pointer to free on cleanup (may be NULL) */
     void *free_after;
@@ -273,6 +288,16 @@ int ompi_osc_rdma_free (struct ompi_win_t *win);
  */
 int ompi_osc_module_add_peer (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer);
+/**
+ * @brief demand lock a peer
+ *
+ * @param[in] module osc rdma module
+ * @param[in] peer peer to lock
+ *
+ * @returns OMPI_SUCCESS on success
+ */
+int ompi_osc_rdma_demand_lock_peer (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer);
+
 /**
  * @brief check if a peer object is cached for a remote rank
  *
@@ -446,10 +471,18 @@ static inline ompi_osc_rdma_sync_t *ompi_osc_rdma_module_sync_lookup (ompi_osc_r
         }
         return NULL;
-    case OMPI_OSC_RDMA_SYNC_TYPE_FENCE:
     case OMPI_OSC_RDMA_SYNC_TYPE_LOCK:
-        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "found fence/lock_all access epoch for target %d", target);
+        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "found lock_all access epoch for target %d", target);
+
+        *peer = ompi_osc_rdma_module_peer (module, target);
+        if (OPAL_UNLIKELY(OMPI_OSC_RDMA_LOCKING_ON_DEMAND == module->locking_mode &&
+                          !ompi_osc_rdma_peer_is_demand_locked (*peer))) {
+            ompi_osc_rdma_demand_lock_peer (module, *peer);
+        }
+        return &module->all_sync;
+    case OMPI_OSC_RDMA_SYNC_TYPE_FENCE:
+        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "found fence access epoch for target %d", target);
         /* fence epoch is now active */
         module->all_sync.epoch_active = true;
         *peer = ompi_osc_rdma_module_peer (module, target);
@@ -467,6 +500,62 @@ static inline ompi_osc_rdma_sync_t *ompi_osc_rdma_module_sync_lookup (ompi_osc_r
     return NULL;
 }
+static bool ompi_osc_rdma_use_btl_flush (ompi_osc_rdma_module_t *module)
+{
+#if defined(BTL_VERSION) && (BTL_VERSION >= 310)
+    return !!(module->selected_btl->btl_flush);
+#else
+    return false;
+#endif
+}
+
+/**
+ * @brief increment the outstanding rdma operation counter (atomic)
+ *
+ * @param[in] rdma_sync osc rdma synchronization object
+ */
+static inline void ompi_osc_rdma_sync_rdma_inc_always (ompi_osc_rdma_sync_t *rdma_sync)
+{
+    ompi_osc_rdma_counter_add (&rdma_sync->outstanding_rdma.counter, 1);
+
+    OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "inc: there are %ld outstanding rdma operations",
+                     (unsigned long) rdma_sync->outstanding_rdma.counter);
+}
+
+static inline void ompi_osc_rdma_sync_rdma_inc (ompi_osc_rdma_sync_t *rdma_sync)
+{
+#if defined(BTL_VERSION) && (BTL_VERSION >= 310)
+    if (ompi_osc_rdma_use_btl_flush (rdma_sync->module)) {
+        return;
+    }
+#endif
+    ompi_osc_rdma_sync_rdma_inc_always (rdma_sync);
+}
+
+/**
+ * @brief decrement the outstanding rdma operation counter (atomic)
+ *
+ * @param[in] rdma_sync osc rdma synchronization object
+ */
+static inline void ompi_osc_rdma_sync_rdma_dec_always (ompi_osc_rdma_sync_t *rdma_sync)
+{
+    opal_atomic_wmb ();
+    ompi_osc_rdma_counter_add (&rdma_sync->outstanding_rdma.counter, -1);
+
+    OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "dec: there are %ld outstanding rdma operations",
+                     (unsigned long) rdma_sync->outstanding_rdma.counter);
+}
+
+static inline void ompi_osc_rdma_sync_rdma_dec (ompi_osc_rdma_sync_t *rdma_sync)
+{
+#if defined(BTL_VERSION) && (BTL_VERSION >= 310)
+    if (ompi_osc_rdma_use_btl_flush (rdma_sync->module)) {
+        return;
+    }
+#endif
+    ompi_osc_rdma_sync_rdma_dec_always (rdma_sync);
+}
+
 /**
  * @brief complete all outstanding rdma operations to all peers
  *
@@ -474,18 +563,31 @@ static inline ompi_osc_rdma_sync_t *ompi_osc_rdma_module_sync_lookup (ompi_osc_r
  */
 static inline void ompi_osc_rdma_sync_rdma_complete (ompi_osc_rdma_sync_t *sync)
 {
-    ompi_osc_rdma_aggregation_t *aggregation, *next;
-
     if (opal_list_get_size (&sync->aggregations)) {
+        ompi_osc_rdma_aggregation_t *aggregation, *next;
+
         OPAL_THREAD_SCOPED_LOCK(&sync->lock,
                                 OPAL_LIST_FOREACH_SAFE(aggregation, next, &sync->aggregations, ompi_osc_rdma_aggregation_t) {
+                                    fprintf (stderr, "Flushing aggregation %p, peer %p\n", (void*)aggregation, (void*)aggregation->peer);
                                     ompi_osc_rdma_peer_aggregate_flush (aggregation->peer);
                                 });
     }
+#if !defined(BTL_VERSION) || (BTL_VERSION < 310)
     do {
         opal_progress ();
-    } while (sync->outstanding_rdma);
+    } while (ompi_osc_rdma_sync_get_count (sync));
+#else
+    mca_btl_base_module_t *btl_module = sync->module->selected_btl;
+
+    do {
+        if (!ompi_osc_rdma_use_btl_flush (sync->module)) {
+            opal_progress ();
+        } else {
+            btl_module->btl_flush (btl_module, NULL);
+        }
+    } while (ompi_osc_rdma_sync_get_count (sync) || (sync->module->rdma_frag && (sync->module->rdma_frag->pending > 1)));
+#endif
 }
 /**
diff --git a/ompi/mca/osc/rdma/osc_rdma_accumulate.c b/ompi/mca/osc/rdma/osc_rdma_accumulate.c
index 0fd2bbdd6ef..53fdeb889de 100644
--- a/ompi/mca/osc/rdma/osc_rdma_accumulate.c
+++ b/ompi/mca/osc/rdma/osc_rdma_accumulate.c
@@ -1,10 +1,10 @@
 /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
- * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights
  * reserved.
- * Copyright (c) 2016 Research Organization for Information Science
+ * Copyright (c) 2016-2017 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
- * Copyright (c) 2016 Intel, Inc. All rights reserved.
+ * Copyright (c) 2016-2018 Intel, Inc. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -18,22 +18,115 @@
 #include "ompi/mca/osc/base/osc_base_obj_convert.h"
+static inline void ompi_osc_rdma_peer_accumulate_cleanup (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, bool lock_acquired)
+{
+    if (lock_acquired) {
+        (void) ompi_osc_rdma_lock_release_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
+    }
+
+    /* clear out the accumulation flag */
+    ompi_osc_rdma_peer_clear_flag (peer, OMPI_OSC_RDMA_PEER_ACCUMULATING);
+}
+
+enum ompi_osc_rdma_event_type_t {
+    OMPI_OSC_RDMA_EVENT_TYPE_PUT,
+};
+
+typedef enum ompi_osc_rdma_event_type_t ompi_osc_rdma_event_type_t;
+
+struct ompi_osc_rdma_event_t {
+    opal_event_t super;
+    ompi_osc_rdma_module_t *module;
+    struct mca_btl_base_endpoint_t *endpoint;
+    void *local_address;
+    mca_btl_base_registration_handle_t *local_handle;
+    uint64_t remote_address;
+    mca_btl_base_registration_handle_t *remote_handle;
+    uint64_t length;
+    mca_btl_base_rdma_completion_fn_t cbfunc;
+    void *cbcontext;
+    void *cbdata;
+};
+
+typedef struct ompi_osc_rdma_event_t ompi_osc_rdma_event_t;
+
+#if 0
+static void *ompi_osc_rdma_event_put (int fd, int flags, void *context)
+{
+    ompi_osc_rdma_event_t *event = (ompi_osc_rdma_event_t *) context;
+    int ret;
+
+    ret = event->module->selected_btl->btl_put (event->module->selected_btl, event->endpoint, event->local_address,
+                                                event->remote_address, event->local_handle, event->remote_handle,
+                                                event->length, 0, MCA_BTL_NO_ORDER, event->cbfunc, event->cbcontext,
+                                                event->cbdata);
+    if (OPAL_LIKELY(OPAL_SUCCESS == ret)) {
+        /* done with this event */
+        opal_event_del (&event->super);
+        free (event);
+    } else {
+        /* re-activate the event */
+        opal_event_active (&event->super, OPAL_EV_READ, 1);
+    }
+
+    return NULL;
+}
+
+static int ompi_osc_rdma_event_queue (ompi_osc_rdma_module_t *module, struct mca_btl_base_endpoint_t *endpoint,
+                                      ompi_osc_rdma_event_type_t event_type, void *local_address, mca_btl_base_registration_handle_t *local_handle,
+                                      uint64_t remote_address, mca_btl_base_registration_handle_t *remote_handle,
+                                      uint64_t length, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext,
+                                      void *cbdata)
+{
+    ompi_osc_rdma_event_t *event = malloc (sizeof (*event));
+    void *(*event_func) (int, int, void *);
+
+    OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "queueing event type %d", event_type);
+
+    if (OPAL_UNLIKELY(NULL == event)) {
+        return OMPI_ERR_OUT_OF_RESOURCE;
+    }
+
+    event->module = module;
+    event->endpoint = endpoint;
+    event->local_address = local_address;
+    event->local_handle = local_handle;
+    event->remote_address = remote_address;
+    event->remote_handle = remote_handle;
+    event->length = length;
+    event->cbfunc = cbfunc;
+    event->cbcontext = cbcontext;
+    event->cbdata = cbdata;
+
+    switch (event_type) {
+    case OMPI_OSC_RDMA_EVENT_TYPE_PUT:
+        event_func = ompi_osc_rdma_event_put;
+        break;
+    default:
+        opal_output(0, "osc/rdma: cannot queue unknown event type %d", event_type);
+        abort ();
+    }
+
+    opal_event_set (opal_sync_event_base, &event->super, -1, OPAL_EV_READ,
+                    event_func, event);
+    opal_event_active (&event->super, OPAL_EV_READ, 1);
+
+    return OMPI_SUCCESS;
+}
+#endif
+
 static int ompi_osc_rdma_gacc_local (const void *source_buffer, int source_count, ompi_datatype_t *source_datatype,
                                      void *result_buffer, int result_count, ompi_datatype_t *result_datatype,
                                      ompi_osc_rdma_peer_t *peer, uint64_t target_address,
                                      mca_btl_base_registration_handle_t *target_handle, int target_count,
                                      ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_osc_rdma_module_t *module,
-                                     ompi_osc_rdma_request_t *request)
+                                     ompi_osc_rdma_request_t *request, bool lock_acquired)
 {
     int ret = OMPI_SUCCESS;
     do {
         OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "performing accumulate with local region(s)");
-        if (!ompi_osc_rdma_peer_is_exclusive (peer)) {
-            (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
-        }
-
         if (NULL != result_buffer) {
             /* get accumulate */
@@ -54,12 +147,10 @@ static int ompi_osc_rdma_gacc_local (const void *source_buffer, int source_count
                                           target_count, target_datatype);
             }
         }
-
-        if (!ompi_osc_rdma_peer_is_exclusive (peer)) {
-            (void) ompi_osc_rdma_lock_release_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
-        }
     } while (0);
+    ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired);
+
     if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
         OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_ERROR, "local accumulate failed with ompi error code %d", ret);
         return ret;
@@ -76,200 +167,91 @@ static int ompi_osc_rdma_gacc_local (const void *source_buffer, int source_count
 static inline int ompi_osc_rdma_cas_local (const void *source_addr, const void *compare_addr, void *result_addr,
                                            ompi_datatype_t *datatype, ompi_osc_rdma_peer_t *peer,
                                            uint64_t target_address, mca_btl_base_registration_handle_t *target_handle,
-                                           ompi_osc_rdma_module_t *module)
+                                           ompi_osc_rdma_module_t *module, bool lock_acquired)
 {
     OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "performing compare-and-swap with local regions");
-    ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
-
     memcpy (result_addr, (void *) (uintptr_t) target_address, datatype->super.size);
     if (0 == memcmp (compare_addr, result_addr, datatype->super.size)) {
         memcpy ((void *) (uintptr_t) target_address, source_addr, datatype->super.size);
     }
-    ompi_osc_rdma_lock_release_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
+    ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired);
     return OMPI_SUCCESS;
 }
-/* completion of an accumulate put */
-static void ompi_osc_rdma_acc_put_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
-                                            void *local_address, mca_btl_base_registration_handle_t *local_handle,
-                                            void *context, void *data, int status)
-{
-    ompi_osc_rdma_request_t *request = (ompi_osc_rdma_request_t *) context;
-    ompi_osc_rdma_sync_t *sync = request->sync;
-    ompi_osc_rdma_peer_t *peer = request->peer;
-
-    OSC_RDMA_VERBOSE(status ? MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_TRACE, "remote accumulate (put/get) complete on "
-                     "sync %p. local address %p. opal status %d", (void *) sync, local_address, status);
-
-    ompi_osc_rdma_frag_complete (request->frag);
-    ompi_osc_rdma_request_complete (request, status);
-
-    if (!ompi_osc_rdma_peer_is_exclusive (peer)) {
-        (void) ompi_osc_rdma_lock_release_exclusive (sync->module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
-    }
-
-    ompi_osc_rdma_sync_rdma_dec (sync);
-    peer->flags &= ~OMPI_OSC_RDMA_PEER_ACCUMULATING;
-}
-
-/* completion of an accumulate get operation */
-static void ompi_osc_rdma_acc_get_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
-                                            void *local_address, mca_btl_base_registration_handle_t *local_handle,
-                                            void *context, void *data, int status)
-{
-    ompi_osc_rdma_request_t *request = (ompi_osc_rdma_request_t *) context;
-    intptr_t source = (intptr_t) local_address + request->offset;
-    ompi_osc_rdma_sync_t *sync = request->sync;
-    ompi_osc_rdma_module_t *module = sync->module;
-
-    assert (OMPI_SUCCESS == status);
-
-    OSC_RDMA_VERBOSE(status ? MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_TRACE, "remote accumulate get complete on sync %p. "
-                     "status %d. request type %d", (void *) sync, status, request->type);
-
-    if (OMPI_SUCCESS == status && OMPI_OSC_RDMA_TYPE_GET_ACC == request->type) {
-        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "unpacking get accumulate result into user buffer");
-        if (NULL == request->result_addr) {
-            /* result buffer is not necessarily contiguous. use the opal datatype engine to
-             * copy the data over in this case */
-            struct iovec iov = {.iov_base = (void *) source, request->len};
-            uint32_t iov_count = 1;
-            size_t size = request->len;
-
-            opal_convertor_unpack (&request->convertor, &iov, &iov_count, &size);
-            opal_convertor_cleanup (&request->convertor);
-        } else {
-            /* copy contiguous data to the result buffer */
-            ompi_datatype_sndrcv ((void *) source, request->len, MPI_BYTE, request->result_addr,
-                                  request->result_count, request->result_dt);
-        }
-
-        if (&ompi_mpi_op_no_op.op == request->op) {
-            /* this is a no-op. nothing more to do except release resources and the accumulate lock */
-            ompi_osc_rdma_acc_put_complete (btl, endpoint, local_address, local_handle, context, data, status);
-
-            return;
-        }
-    }
-
-    /* accumulate the data */
-    if (&ompi_mpi_op_replace.op != request->op) {
-        ompi_op_reduce (request->op, request->origin_addr, (void *) source, request->origin_count, request->origin_dt);
-    } else {
-        memcpy ((void *) source, request->origin_addr, request->len);
-    }
-
-    OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "putting locally accumulated result into target window");
-
-    /* initiate the put of the accumulated data */
-    status = module->selected_btl->btl_put (module->selected_btl, endpoint, (void *) source,
-                                            request->target_address, local_handle,
-                                            (mca_btl_base_registration_handle_t *) request->ctx,
-                                            request->len, 0, MCA_BTL_NO_ORDER, ompi_osc_rdma_acc_put_complete,
-                                            request, NULL);
-    /* TODO -- we can do better. probably should queue up the next step and handle it in progress */
-    assert (OPAL_SUCCESS == status);
-}
-
-static inline int ompi_osc_rdma_gacc_contig (ompi_osc_rdma_sync_t *sync, const void *source, int source_count, ompi_datatype_t *source_datatype,
-                                             void *result, int result_count, ompi_datatype_t *result_datatype,
-                                             ompi_osc_rdma_peer_t *peer, uint64_t target_address,
+static inline int ompi_osc_rdma_gacc_contig (ompi_osc_rdma_sync_t *sync, const void *source, int source_count,
+                                             ompi_datatype_t *source_datatype, void *result, int result_count,
+                                             ompi_datatype_t *result_datatype, ompi_osc_rdma_peer_t *peer, uint64_t target_address,
                                              mca_btl_base_registration_handle_t *target_handle, int target_count,
                                              ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_osc_rdma_request_t *request)
 {
     ompi_osc_rdma_module_t *module = sync->module;
-    const size_t btl_alignment_mask = ALIGNMENT_MASK(module->selected_btl->btl_get_alignment);
     unsigned long len = target_count * target_datatype->super.size;
-    ompi_osc_rdma_frag_t *frag = NULL;
-    unsigned long aligned_len, offset;
     char *ptr = NULL;
     int ret;
     OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating accumulate on contiguous region of %lu bytes to remote address %" PRIx64 ", sync %p", len, target_address, (void *) sync);
-    offset = target_address & btl_alignment_mask;;
-    aligned_len = (len + offset + btl_alignment_mask) & ~btl_alignment_mask;
-
-    ret = ompi_osc_rdma_frag_alloc (module, aligned_len, &frag, &ptr);
-    if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
-        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_WARN, "could not allocate a temporary buffer for accumulate");
-        return OMPI_ERR_OUT_OF_RESOURCE;
-    }
-
-    OPAL_THREAD_LOCK(&module->lock);
-    /* to ensure order wait until the previous accumulate completes */
-    while (ompi_osc_rdma_peer_is_accumulating (peer)) {
-        OPAL_THREAD_UNLOCK(&module->lock);
-        ompi_osc_rdma_progress (module);
-        OPAL_THREAD_LOCK(&module->lock);
-    }
+    if (&ompi_mpi_op_replace.op != op || OMPI_OSC_RDMA_TYPE_GET_ACC == request->type) {
+        ptr = malloc (len);
+        if (OPAL_UNLIKELY(NULL == ptr)) {
+            OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_WARN, "could not allocate a temporary buffer for accumulate");
+            return OMPI_ERR_OUT_OF_RESOURCE;
+        }
-    peer->flags |= OMPI_OSC_RDMA_PEER_ACCUMULATING;
-    OPAL_THREAD_UNLOCK(&module->lock);
+        /* set up the request */
+        request->to_free = ptr;
-    if (!ompi_osc_rdma_peer_is_exclusive (peer)) {
-        (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
-    }
+        ret = ompi_osc_get_data_blocking (module, peer->data_endpoint, target_address, target_handle, ptr, len);
+        if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
+            return ret;
+        }
-    /* set up the request */
-    request->frag = frag;
-    request->origin_addr = (void *) source;
-    request->origin_dt = source_datatype;
-    request->origin_count = source_count;
-    request->ctx = (void *) target_handle;
-    request->result_addr = result;
-    request->result_count = result_count;
-    request->result_dt = result_datatype;
-    request->offset = (ptrdiff_t) target_address & btl_alignment_mask;
-    request->target_address = target_address;
-    request->len = len;
-    request->op = op;
-    request->sync = sync;
-
-    ompi_osc_rdma_sync_rdma_inc (sync);
+        if (OMPI_OSC_RDMA_TYPE_GET_ACC == request->type) {
+            if (NULL == result) {
+                /* result buffer is not necessarily contiguous. use the opal datatype engine to
+                 * copy the data over in this case */
+                struct iovec iov = {.iov_base = ptr, len};
+                uint32_t iov_count = 1;
+                size_t size = request->len;
-    if (&ompi_mpi_op_replace.op != op || OMPI_OSC_RDMA_TYPE_GET_ACC == request->type) {
-        /* align the target address */
-        target_address = target_address & ~btl_alignment_mask;
+                opal_convertor_unpack (&request->convertor, &iov, &iov_count, &size);
+                opal_convertor_cleanup (&request->convertor);
+            } else {
+                /* copy contiguous data to the result buffer */
+                ompi_datatype_sndrcv (ptr, len, MPI_BYTE, result, result_count, result_datatype);
+            }
+        }
-        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating btl get. local: %p (handle %p), remote: 0x%" PRIx64
-                         " (handle %p)", (void*)ptr, (void *) frag->handle, target_address, (void *) target_handle);
+        if (&ompi_mpi_op_replace.op == op) {
+            return ompi_osc_rdma_put_contig (sync, peer, target_address, target_handle, (void *) source, len, request);
+        }
-        ret = module->selected_btl->btl_get (module->selected_btl, peer->data_endpoint, ptr,
-                                             target_address, frag->handle, target_handle, aligned_len,
-                                             0, MCA_BTL_NO_ORDER, ompi_osc_rdma_acc_get_complete,
-                                             request, NULL);
-    } else {
-        /* copy the put accumulate data */
-        memcpy (ptr, source, len);
+        if (&ompi_mpi_op_no_op.op != op) {
+            /* NTH: need to cast away const for the source buffer. the buffer will not be modified by this call */
+            ompi_op_reduce (op, (void *) source, ptr, source_count, source_datatype);
-        OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating btl put. local: %p (handle %p), remote: 0x%" PRIx64
-                         " (handle %p)", (void*)ptr, (void *) frag->handle, target_address, (void *) target_handle);
+            return ompi_osc_rdma_put_contig (sync, peer, target_address, target_handle, ptr, len, request);
+        }
-        ret = module->selected_btl->btl_put (module->selected_btl, peer->data_endpoint, ptr,
-                                             target_address, frag->handle, target_handle, len, 0,
-                                             MCA_BTL_NO_ORDER, ompi_osc_rdma_acc_put_complete,
-                                             request, NULL);
-    }
+        if (request) {
+            /* nothing more to do for this request */
+            ompi_osc_rdma_request_complete (request, MPI_SUCCESS);
+        }
-    if (OPAL_UNLIKELY(OMPI_SUCCESS == ret)) {
         return OMPI_SUCCESS;
     }
-    OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "accumulate btl operation failed with opal error code %d", ret);
-
-    if (!ompi_osc_rdma_peer_is_exclusive (peer)) {
-        (void) ompi_osc_rdma_lock_release_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock));
-    }
-
-    ompi_osc_rdma_cleanup_rdma (sync, frag, NULL, NULL);
+    return ompi_osc_rdma_put_contig (sync, peer, target_address, target_handle, (void *) source, len, request);
+}
-    return ret;
+static void ompi_osc_rdma_gacc_master_cleanup (ompi_osc_rdma_request_t *request)
+{
+    ompi_osc_rdma_peer_accumulate_cleanup (request->module, request->peer, !ompi_osc_rdma_peer_is_exclusive (request->peer));
 }
 static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const void *source_addr, int source_count,
@@ -294,6 +276,14 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
     int ret, acc_len;
     bool done;
+    if (!request) {
+        OMPI_OSC_RDMA_REQUEST_ALLOC(module, peer, request);
+        request->internal = true;
+    }
+
+    request->cleanup = ompi_osc_rdma_gacc_master_cleanup;
+    request->type = result_datatype ? OMPI_OSC_RDMA_TYPE_GET_ACC : OMPI_OSC_RDMA_TYPE_ACC;
+
     (void) ompi_datatype_get_extent (target_datatype, &lb, &extent);
     target_address += lb;
@@ -302,13 +292,6 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
                   ompi_datatype_is_predefined (target_datatype) &&
                   (!result_count || ompi_datatype_is_predefined (result_datatype)) &&
                   (target_datatype->super.size * target_count <= acc_limit))) {
-        if (NULL == request) {
-            OMPI_OSC_RDMA_REQUEST_ALLOC(module, peer, request);
-            request->internal = true;
-        }
-
-        request->type = result_datatype ? OMPI_OSC_RDMA_TYPE_GET_ACC : OMPI_OSC_RDMA_TYPE_ACC;
-
         if (source_datatype) {
             (void) ompi_datatype_get_extent (source_datatype, &lb, &extent);
             source_addr = (void *)((intptr_t) source_addr + lb);
@@ -384,14 +367,13 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
         return ret;
     }
-    if (request) {
-        /* keep the request from completing until all the transfers have started */
-        request->outstanding_requests = 1;
-    }
+    /* keep the request from completing until all the transfers have started */
+    request->outstanding_requests = 1;
     target_iov_index = 0;
     target_iov_count = 0;
     result_position = 0;
+    subreq = NULL;
     do {
         /* decode segments of the source data */
@@ -424,11 +406,11 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
         acc_len = min((size_t) acc_len, acc_limit);
         /* execute the get */
-        OMPI_OSC_RDMA_REQUEST_ALLOC(module, peer, subreq);
-        subreq->internal = true;
-        subreq->parent_request = request;
-        if (request) {
-            (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, 1);
+        if (!subreq) {
+            OMPI_OSC_RDMA_REQUEST_ALLOC(module, peer, subreq);
+            subreq->internal = true;
+            subreq->parent_request = request;
+            (void) OPAL_THREAD_ADD_FETCH32 (&request->outstanding_requests, 1);
         }
         if (result_datatype) {
@@ -442,10 +424,13 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
         }
         ret = ompi_osc_rdma_gacc_contig (sync, source_iovec[source_iov_index].iov_base, acc_len / target_primitive->super.size,
-                                         target_primitive, NULL, 0, NULL, peer, (uint64_t) (intptr_t) target_iovec[target_iov_index].iov_base,
-                                         target_handle, acc_len / target_primitive->super.size, target_primitive, op, subreq);
+                                         target_primitive, NULL, 0, NULL, peer,
+                                         (uint64_t) (intptr_t) target_iovec[target_iov_index].iov_base, target_handle,
+                                         acc_len / target_primitive->super.size, target_primitive, op, subreq);
         if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
             if (OPAL_UNLIKELY(OMPI_ERR_OUT_OF_RESOURCE != ret)) {
+                OMPI_OSC_RDMA_REQUEST_RETURN(subreq);
+                (void) OPAL_THREAD_ADD_FETCH32 (&request->outstanding_requests, -1);
                 /* something bad happened. need to figure out how to handle these errors */
                 return ret;
             }
@@ -455,6 +440,8 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
             continue;
         }
+        subreq = NULL;
+
         /* adjust io vectors */
         target_iovec[target_iov_index].iov_len -= acc_len;
         source_iovec[source_iov_index].iov_len -= acc_len;
@@ -467,10 +454,8 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
         }
     } while (!done);
-    if (request) {
-        /* release our reference so the request can complete */
-        (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, -1);
-    }
+    /* release our reference so the request can complete */
+    ompi_osc_rdma_request_deref (request);
     if (source_datatype) {
         opal_convertor_cleanup (&source_convertor);
@@ -485,35 +470,15 @@ static inline int ompi_osc_rdma_gacc_master (ompi_osc_rdma_sync_t *sync, const v
     return OMPI_SUCCESS;
 }
-static void ompi_osc_rdma_cas_atomic_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint,
-                                               void *local_address, mca_btl_base_registration_handle_t *local_handle,
-                                               void *context, void *data, int status)
-{
-    ompi_osc_rdma_sync_t *sync = (ompi_osc_rdma_sync_t *) context;
-    ompi_osc_rdma_frag_t *frag = (ompi_osc_rdma_frag_t *) data;
-    void *result_addr = (void *)(intptr_t) ((int64_t *) local_address)[1];
-    size_t size = ((int64_t *) local_address)[2];
-
-    OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "atomic compare-and-swap complete. result: 0x%" PRIx64,
-                     *((int64_t *) local_address));
-
-    /* copy the result */
-    memcpy (result_addr, local_address, size);
-
-    ompi_osc_rdma_sync_rdma_dec (sync);
-    ompi_osc_rdma_frag_complete (frag);
-}
-
 static inline int ompi_osc_rdma_cas_atomic (ompi_osc_rdma_sync_t *sync, const void *source_addr, const void *compare_addr,
                                             void *result_addr, ompi_datatype_t *datatype, ompi_osc_rdma_peer_t *peer,
-                                            uint64_t target_address, mca_btl_base_registration_handle_t *target_handle)
+                                            uint64_t target_address, mca_btl_base_registration_handle_t *target_handle,
+                                            bool lock_acquired)
 {
     ompi_osc_rdma_module_t *module = sync->module;
     const size_t size = datatype->super.size;
-    ompi_osc_rdma_frag_t *frag = NULL;
     int64_t compare, source;
     int ret, flags;
-    char *ptr;
     if (8 != size && !(4 == size && (MCA_BTL_ATOMIC_SUPPORTS_32BIT & module->selected_btl->btl_flags))) {
         return OMPI_ERR_NOT_SUPPORTED;
@@ -526,65 +491,16 @@ static inline int ompi_osc_rdma_cas_atomic (ompi_osc_rdma_sync_t *sync, const vo
     OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating compare-and-swap using %d-bit btl atomics.
compare: 0x%" PRIx64 ", origin: 0x%" PRIx64, (int) size * 8, *((int64_t *) compare_addr), *((int64_t *) source_addr)); - ret = ompi_osc_rdma_frag_alloc (module, 24, &frag, &ptr); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return ret; - } - - /* store the destination and size in the temporary buffer */ - ((int64_t *) ptr)[1] = (intptr_t) result_addr; - ((int64_t *) ptr)[2] = size; - - ompi_osc_rdma_sync_rdma_inc (sync); - - do { - ret = module->selected_btl->btl_atomic_cswap (module->selected_btl, peer->data_endpoint, ptr, target_address, - frag->handle, target_handle, compare, source, flags, MCA_BTL_NO_ORDER, - ompi_osc_rdma_cas_atomic_complete, sync, frag); - - ompi_osc_rdma_progress (module); - } while (OPAL_UNLIKELY(OMPI_ERR_OUT_OF_RESOURCE == ret || OPAL_ERR_TEMP_OUT_OF_RESOURCE == ret)); - - if (OPAL_SUCCESS != ret) { - ompi_osc_rdma_sync_rdma_dec (sync); - - if (1 == ret) { - memcpy (result_addr, ptr, size); - ret = OMPI_SUCCESS; - } - - ompi_osc_rdma_frag_complete (frag); + ret = ompi_osc_rdma_btl_cswap (module, peer->data_endpoint, target_address, target_handle, compare, source, flags, + result_addr); + if (OPAL_LIKELY(OMPI_SUCCESS == ret)) { + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); } return ret; } -static inline void ompi_osc_rdma_fetch_and_op_atomic_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, - void *local_address, mca_btl_base_registration_handle_t *local_handle, - void *context, void *data, int status) -{ - ompi_osc_rdma_sync_t *sync = (ompi_osc_rdma_sync_t *) context; - ompi_osc_rdma_frag_t *frag = (ompi_osc_rdma_frag_t *) data; - void *result_addr = (void *)(intptr_t) ((int64_t *) local_address)[1]; - ompi_osc_rdma_request_t *req = (ompi_osc_rdma_request_t *) (intptr_t) ((int64_t *) local_address)[2]; - size_t size = ((int64_t *) local_address)[3]; - - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "atomic fetch-and-op complete. 
result: 0x%" PRIx64, - *((int64_t *) local_address)); - - /* copy the result */ - if (result_addr) { - memcpy (result_addr, local_address, size); - } - - ompi_osc_rdma_sync_rdma_dec (sync); - ompi_osc_rdma_frag_complete (frag); - if (req) { - ompi_osc_rdma_request_complete (req, status); - } -} - -static int ompi_osc_rdma_op_mapping[OMPI_OP_NUM_OF_TYPES] = { +static int ompi_osc_rdma_op_mapping[OMPI_OP_NUM_OF_TYPES + 1] = { [OMPI_OP_MAX] = MCA_BTL_ATOMIC_MAX, [OMPI_OP_MIN] = MCA_BTL_ATOMIC_MIN, [OMPI_OP_SUM] = MCA_BTL_ATOMIC_ADD, @@ -599,13 +515,12 @@ static int ompi_osc_rdma_op_mapping[OMPI_OP_NUM_OF_TYPES] = { static int ompi_osc_rdma_fetch_and_op_atomic (ompi_osc_rdma_sync_t *sync, const void *origin_addr, void *result_addr, ompi_datatype_t *dt, ptrdiff_t extent, ompi_osc_rdma_peer_t *peer, uint64_t target_address, - mca_btl_base_registration_handle_t *target_handle, ompi_op_t *op, ompi_osc_rdma_request_t *req) + mca_btl_base_registration_handle_t *target_handle, ompi_op_t *op, ompi_osc_rdma_request_t *req, + bool lock_acquired) { ompi_osc_rdma_module_t *module = sync->module; int32_t atomic_flags = module->selected_btl->btl_atomic_flags; - ompi_osc_rdma_frag_t *frag = NULL; int ret, btl_op, flags; - char *ptr = NULL; int64_t origin; if ((8 != extent && !((MCA_BTL_ATOMIC_SUPPORTS_32BIT & atomic_flags) && 4 == extent)) || @@ -614,51 +529,30 @@ static int ompi_osc_rdma_fetch_and_op_atomic (ompi_osc_rdma_sync_t *sync, const return OMPI_ERR_NOT_SUPPORTED; } + btl_op = ompi_osc_rdma_op_mapping[op->op_type]; + if (0 == btl_op) { + return OMPI_ERR_NOT_SUPPORTED; + } + flags = (4 == extent) ? MCA_BTL_ATOMIC_FLAG_32BIT : 0; if (OMPI_DATATYPE_FLAG_DATA_FLOAT & dt->super.flags) { flags |= MCA_BTL_ATOMIC_FLAG_FLOAT; } - btl_op = ompi_osc_rdma_op_mapping[op->op_type]; - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating fetch-and-op using %d-bit btl atomics. origin: 0x%" PRIx64, (4 == extent) ? 
32 : 64, *((int64_t *) origin_addr)); - ret = ompi_osc_rdma_frag_alloc (module, 32, &frag, &ptr); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return ret; - } - origin = (8 == extent) ? ((int64_t *) origin_addr)[0] : ((int32_t *) origin_addr)[0]; - /* store the destination, request, and extent in the temporary buffer for the callback */ - ((int64_t *) ptr)[1] = (intptr_t) result_addr; - ((int64_t *) ptr)[2] = (intptr_t) req; - ((int64_t *) ptr)[3] = extent; + ret = ompi_osc_rdma_btl_fop (module, peer->data_endpoint, target_address, target_handle, btl_op, origin, flags, + result_addr, true, NULL, NULL, NULL); + if (OPAL_SUCCESS == ret) { + /* done. release the lock */ + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); - ompi_osc_rdma_sync_rdma_inc (sync); - - do { - ret = module->selected_btl->btl_atomic_fop (module->selected_btl, peer->data_endpoint, ptr, target_address, - frag->handle, target_handle, btl_op, origin, flags, - MCA_BTL_NO_ORDER, ompi_osc_rdma_fetch_and_op_atomic_complete, - sync, frag); - - ompi_osc_rdma_progress (module); - } while (OPAL_UNLIKELY(OMPI_ERR_OUT_OF_RESOURCE == ret || OPAL_ERR_TEMP_OUT_OF_RESOURCE == ret)); - - if (OPAL_SUCCESS != ret) { - ompi_osc_rdma_sync_rdma_dec (sync); - - if (OPAL_LIKELY(1 == ret)) { - memcpy (result_addr, ptr, extent); - if (req) { - ompi_osc_rdma_request_complete (req, OMPI_SUCCESS); - } - ret = OPAL_SUCCESS; + if (req) { + ompi_osc_rdma_request_complete (req, MPI_SUCCESS); } - - ompi_osc_rdma_frag_complete (frag); } return ret; @@ -666,12 +560,11 @@ static int ompi_osc_rdma_fetch_and_op_atomic (ompi_osc_rdma_sync_t *sync, const static int ompi_osc_rdma_fetch_and_op_cas (ompi_osc_rdma_sync_t *sync, const void *origin_addr, void *result_addr, ompi_datatype_t *dt, ptrdiff_t extent, ompi_osc_rdma_peer_t *peer, uint64_t target_address, - mca_btl_base_registration_handle_t *target_handle, ompi_op_t *op, ompi_osc_rdma_request_t *req) + mca_btl_base_registration_handle_t *target_handle, ompi_op_t 
*op, ompi_osc_rdma_request_t *req, + bool lock_acquired) { ompi_osc_rdma_module_t *module = sync->module; - ompi_osc_rdma_frag_t *frag = NULL; - uint64_t address, offset; - char *ptr = NULL; + uint64_t address, offset, new_value, old_value; int ret; if (extent > 8) { @@ -682,81 +575,51 @@ static int ompi_osc_rdma_fetch_and_op_cas (ompi_osc_rdma_sync_t *sync, const voi address = target_address & ~7; offset = target_address & ~address; - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating fetch-and-op using compare-and-swap. origin: 0x%" PRIx64, - *((int64_t *) origin_addr)); + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating fetch-and-op using compare-and-swap"); - ret = ompi_osc_rdma_frag_alloc (module, 16, &frag, &ptr); + ret = ompi_osc_get_data_blocking (module, peer->data_endpoint, address, target_handle, &old_value, 8); if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { return ret; } /* store the destination in the temporary buffer */ do { - volatile bool complete = false; - - ret = ompi_osc_get_data_blocking (module, peer->data_endpoint, address, target_handle, ptr, 8); - if (OMPI_SUCCESS != ret) { - ompi_osc_rdma_frag_complete (frag); - return ret; - } - - ((int64_t *) ptr)[1] = ((int64_t *) ptr)[0]; + new_value = old_value; - if (&ompi_mpi_op_no_op.op == op) { - memcpy (ptr + offset, origin_addr, extent); - } else { - ompi_op_reduce (op, (void *) origin_addr, ptr + offset, 1, dt); + if (&ompi_mpi_op_replace.op == op) { + memcpy ((void *)((intptr_t) &new_value + offset), origin_addr, extent); + } else if (&ompi_mpi_op_no_op.op != op) { + ompi_op_reduce (op, (void *) origin_addr, (void*)((intptr_t) &new_value + offset), 1, dt); } - do { - ret = module->selected_btl->btl_atomic_cswap (module->selected_btl, peer->data_endpoint, ptr, address, - frag->handle, target_handle, ((int64_t *) ptr)[1], - ((int64_t *) ptr)[0], 0, MCA_BTL_NO_ORDER, - ompi_osc_rdma_atomic_complete, (void *) &complete, NULL); - - ompi_osc_rdma_progress (module); - } while 
(OPAL_UNLIKELY(OPAL_ERR_OUT_OF_RESOURCE == ret || OPAL_ERR_TEMP_OUT_OF_RESOURCE == ret)); - - if (OPAL_UNLIKELY(OPAL_SUCCESS != ret)) { + ret = ompi_osc_rdma_btl_cswap (module, peer->data_endpoint, address, target_handle, + old_value, new_value, 0, (int64_t*)&new_value); + if (OPAL_SUCCESS != ret || new_value == old_value) { break; } - while (!complete) { - ompi_osc_rdma_progress (module); - } - - if (((int64_t *) ptr)[1] == ((int64_t *) ptr)[0]) { - break; - } + old_value = new_value; } while (1); if (result_addr) { - memcpy (result_addr, ptr + 8 + offset, extent); + memcpy (result_addr, (void *)((intptr_t) &new_value + offset), extent); } - ompi_osc_rdma_frag_complete (frag); - - return ret; -} - -static void ompi_osc_rdma_acc_single_atomic_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, - void *local_address, mca_btl_base_registration_handle_t *local_handle, - void *context, void *data, int status) -{ - ompi_osc_rdma_sync_t *sync = (ompi_osc_rdma_sync_t *) context; - ompi_osc_rdma_request_t *req = (ompi_osc_rdma_request_t *) data; - - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "atomic accumulate complete"); + if (OPAL_SUCCESS == ret) { + /* done. 
release the lock */ + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); - ompi_osc_rdma_sync_rdma_dec (sync); - if (req) { - ompi_osc_rdma_request_complete (req, status); + if (req) { + ompi_osc_rdma_request_complete (req, MPI_SUCCESS); + } } + + return ret; } static int ompi_osc_rdma_acc_single_atomic (ompi_osc_rdma_sync_t *sync, const void *origin_addr, ompi_datatype_t *dt, ptrdiff_t extent, ompi_osc_rdma_peer_t *peer, uint64_t target_address, mca_btl_base_registration_handle_t *target_handle, - ompi_op_t *op, ompi_osc_rdma_request_t *req) + ompi_op_t *op, ompi_osc_rdma_request_t *req, bool lock_acquired) { ompi_osc_rdma_module_t *module = sync->module; int32_t atomic_flags = module->selected_btl->btl_atomic_flags; @@ -765,7 +628,8 @@ static int ompi_osc_rdma_acc_single_atomic (ompi_osc_rdma_sync_t *sync, const vo if (!(module->selected_btl->btl_flags & MCA_BTL_FLAGS_ATOMIC_OPS)) { /* btl put atomics not supported or disabled. fall back on fetch-and-op */ - return ompi_osc_rdma_fetch_and_op_atomic (sync, origin_addr, NULL, dt, extent, peer, target_address, target_handle, op, req); + return ompi_osc_rdma_fetch_and_op_atomic (sync, origin_addr, NULL, dt, extent, peer, target_address, target_handle, + op, req, lock_acquired); } if ((8 != extent && !((MCA_BTL_ATOMIC_SUPPORTS_32BIT & atomic_flags) && 4 == extent)) || @@ -787,23 +651,15 @@ static int ompi_osc_rdma_acc_single_atomic (ompi_osc_rdma_sync_t *sync, const vo OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating accumulate using 64-bit btl atomics. 
origin: 0x%" PRIx64, *((int64_t *) origin_addr)); - ompi_osc_rdma_sync_rdma_inc (sync); - - do { - ret = module->selected_btl->btl_atomic_op (module->selected_btl, peer->data_endpoint, target_address, - target_handle, btl_op, origin, flags, MCA_BTL_NO_ORDER, - ompi_osc_rdma_acc_single_atomic_complete, sync, req); + /* if we locked the peer it's best to wait for completion before returning */ + ret = ompi_osc_rdma_btl_op (module, peer->data_endpoint, target_address, target_handle, btl_op, origin, + flags, true, NULL, NULL, NULL); + if (OPAL_SUCCESS == ret) { + /* done. release the lock */ + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); - ompi_osc_rdma_progress (module); - } while (OPAL_UNLIKELY(OMPI_ERR_OUT_OF_RESOURCE == ret || OPAL_ERR_TEMP_OUT_OF_RESOURCE == ret)); - - if (OPAL_SUCCESS != ret) { - ompi_osc_rdma_sync_rdma_dec (sync); - if (1 == ret) { - if (req) { - ompi_osc_rdma_request_complete (req, OMPI_SUCCESS); - } - ret = OMPI_SUCCESS; + if (req) { + ompi_osc_rdma_request_complete (req, MPI_SUCCESS); } } @@ -814,152 +670,103 @@ static int ompi_osc_rdma_acc_single_atomic (ompi_osc_rdma_sync_t *sync, const vo * ompi_osc_rdma_cas_get_complete: * Note: This function will not work as is in a heterogeneous environment. 
*/ -static void ompi_osc_rdma_cas_get_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, +static void ompi_osc_rdma_cas_put_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, void *local_address, mca_btl_base_registration_handle_t *local_handle, void *context, void *data, int status) { - ompi_osc_rdma_request_t *request = (ompi_osc_rdma_request_t *) context; - ompi_osc_rdma_sync_t *sync = request->sync; - ompi_osc_rdma_module_t *module = sync->module; - intptr_t source = (intptr_t) local_address + request->offset; - ompi_osc_rdma_frag_t *frag = request->frag; - ompi_osc_rdma_peer_t *peer = request->peer; - int ret; - - OSC_RDMA_VERBOSE(status ?
MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_TRACE, "remote compare-and-swap get complete on sync %p. " - "status %d", (void *) sync, status); - - if (OPAL_UNLIKELY(OMPI_SUCCESS != status)) { - return; - } - - /* copy data to the user buffer (for gacc) */ - memcpy (request->result_addr, (void *) source, request->len); - - if (0 == memcmp ((void *) source, request->compare_addr, request->len)) { - /* the target and compare buffers match. write the source to the target */ - memcpy ((void *) source, request->origin_addr, request->len); - - ret = module->selected_btl->btl_put (module->selected_btl, peer->data_endpoint, local_address, - request->target_address, local_handle, - (mca_btl_base_registration_handle_t *) request->ctx, - request->len, 0, MCA_BTL_NO_ORDER, - ompi_osc_rdma_acc_put_complete, request, NULL); - if (OPAL_UNLIKELY(OPAL_SUCCESS != ret)) { - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_ERROR, "could not start put to complete accumulate operation. opal return code " - "%d", ret); - } - - /* TODO -- we can do better. probably should queue up the next step and handle it in progress */ - assert (OPAL_SUCCESS == ret); - - return; - } - - /* this is a no-op. nothing more to do except release the accumulate lock */ - ompi_osc_rdma_frag_complete (frag); - - if (!ompi_osc_rdma_peer_is_exclusive (peer)) { - (void) ompi_osc_rdma_lock_release_exclusive (module, request->peer, - offsetof (ompi_osc_rdma_state_t, accumulate_lock)); - } - - /* the request is now complete and the outstanding rdma operation is complete */ - ompi_osc_rdma_request_complete (request, status); + bool *complete = (bool *) context; - ompi_osc_rdma_sync_rdma_dec (sync); - peer->flags &= ~OMPI_OSC_RDMA_PEER_ACCUMULATING; + *complete = true; } +/** + * @brief Support for compare-and-swap on arbitrary-sized datatypes + * + * This function is necessary to support compare-and-swap on types larger + * than 64-bits. As of MPI-3.1 this can include MPI_INTEGER16 and possibly + * MPI_LONG_LONG_INT.
The former is a 128-bit value and the latter *may* + * be, depending on the platform, compiler, etc. This function currently + * blocks until the operation is complete. + */ static inline int cas_rdma (ompi_osc_rdma_sync_t *sync, const void *source_addr, const void *compare_addr, void *result_addr, ompi_datatype_t *datatype, ompi_osc_rdma_peer_t *peer, uint64_t target_address, - mca_btl_base_registration_handle_t *target_handle) + mca_btl_base_registration_handle_t *target_handle, bool lock_acquired) { ompi_osc_rdma_module_t *module = sync->module; - const size_t btl_alignment_mask = ALIGNMENT_MASK(module->selected_btl->btl_get_alignment); - unsigned long offset, aligned_len, len = datatype->super.size; + unsigned long len = datatype->super.size; + mca_btl_base_registration_handle_t *local_handle = NULL; ompi_osc_rdma_frag_t *frag = NULL; - ompi_osc_rdma_request_t *request; - char *ptr = NULL; + volatile bool complete = false; + /* drop the const. this code will not attempt to change the value */ + char *ptr = (char *) source_addr; int ret; OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "initiating compare-and-swap using RMDA on %lu bytes to remote address %" PRIx64 ", sync %p", len, target_address, (void *) sync); - OMPI_OSC_RDMA_REQUEST_ALLOC(module, peer, request); + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "RDMA compare-and-swap initiating blocking btl get..."); + ret = ompi_osc_get_data_blocking (module, peer->data_endpoint, target_address, target_handle, result_addr, len); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + return ret; + } - request->internal = true; - request->type = OMPI_OSC_RDMA_TYPE_CSWAP; - request->sync = sync; + if (0 != memcmp (result_addr, compare_addr, len)) { + /* value does not match compare value, nothing more to do */ + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); + return OMPI_SUCCESS; + } - OPAL_THREAD_LOCK(&module->lock); - /* to ensure order wait until the previous accumulate completes */ - while
(ompi_osc_rdma_peer_is_accumulating (peer)) { - OPAL_THREAD_UNLOCK(&module->lock); - ompi_osc_rdma_progress (module); - OPAL_THREAD_LOCK(&module->lock); + if (module->selected_btl->btl_register_mem && len > module->selected_btl->btl_put_local_registration_threshold) { + do { + ret = ompi_osc_rdma_frag_alloc (module, len, &frag, &ptr); + if (OPAL_UNLIKELY(OMPI_SUCCESS == ret)) { + break; + } + + ompi_osc_rdma_progress (module); + } while (1); + + memcpy (ptr, source_addr, len); + local_handle = frag->handle; } - peer->flags |= OMPI_OSC_RDMA_PEER_ACCUMULATING; - OPAL_THREAD_UNLOCK(&module->lock); - offset = target_address & btl_alignment_mask;; - aligned_len = (len + offset + btl_alignment_mask) & ~btl_alignment_mask; + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "RDMA compare-and-swap initiating blocking btl put..."); do { - ret = ompi_osc_rdma_frag_alloc (module, aligned_len, &frag, &ptr); - if (OPAL_UNLIKELY(OMPI_SUCCESS == ret)) { + ret = module->selected_btl->btl_put (module->selected_btl, peer->data_endpoint, ptr, target_address, + local_handle, target_handle, len, 0, MCA_BTL_NO_ORDER, + ompi_osc_rdma_cas_put_complete, (void *) &complete, NULL); + if (OPAL_SUCCESS == ret || (OPAL_ERR_OUT_OF_RESOURCE != ret && OPAL_ERR_TEMP_OUT_OF_RESOURCE != ret)) { break; } - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_WARN, "could not allocate an rdma fragment for compare-and-swap"); + /* spin a bit on progress */ ompi_osc_rdma_progress (module); } while (1); - if (!ompi_osc_rdma_peer_is_exclusive (peer)) { - (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock)); + if (OPAL_SUCCESS != ret) { + /* something went horribly wrong */ + return ret; } - /* set up the request */ - request->frag = frag; - request->origin_addr = (void *) source_addr; - request->ctx = (void *) target_handle; - request->result_addr = result_addr; - request->compare_addr = compare_addr; - request->result_dt = datatype; - request->offset = (ptrdiff_t) offset; - 
request->target_address = target_address; - request->len = len; - - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "RDMA compare-and-swap initiating btl get"); - - do { - ret = module->selected_btl->btl_get (module->selected_btl, peer->data_endpoint, ptr, - target_address, frag->handle, target_handle, - aligned_len, 0, MCA_BTL_NO_ORDER, - ompi_osc_rdma_cas_get_complete, request, NULL); - if (OPAL_LIKELY(OPAL_SUCCESS == ret)) { - break; - } + while (!complete) { + ompi_osc_rdma_progress (module); + } - if (OPAL_UNLIKELY(OPAL_ERR_OUT_OF_RESOURCE != ret && OPAL_ERR_TEMP_OUT_OF_RESOURCE != ret)) { - if (!ompi_osc_rdma_peer_is_exclusive (peer)) { - (void) ompi_osc_rdma_lock_release_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock)); - } - ompi_osc_rdma_frag_complete (frag); - return ret; - } + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "RDMA compare-and-swap compare-and-swap complete"); - ompi_osc_rdma_progress (module); - } while (1); + if (frag) { + ompi_osc_rdma_frag_complete (frag); + } - ompi_osc_rdma_sync_rdma_inc (sync); + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); - return OMPI_SUCCESS; + return ret; } int ompi_osc_rdma_compare_and_swap (const void *origin_addr, const void *compare_addr, void *result_addr, - ompi_datatype_t *dt, int target_rank, OPAL_PTRDIFF_TYPE target_disp, + ompi_datatype_t *dt, int target_rank, ptrdiff_t target_disp, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); @@ -967,6 +774,8 @@ int ompi_osc_rdma_compare_and_swap (const void *origin_addr, const void *compare mca_btl_base_registration_handle_t *target_handle; ompi_osc_rdma_sync_t *sync; uint64_t target_address; + ptrdiff_t true_lb, true_extent; + bool lock_acquired = false; int ret; OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "cswap: 0x%lx, 0x%lx, 0x%lx, %s, %d, %d, %s", @@ -978,29 +787,59 @@ int ompi_osc_rdma_compare_and_swap (const void *origin_addr, const void *compare return OMPI_ERR_RMA_SYNC; } - ret = 
osc_rdma_get_remote_segment (module, peer, target_disp, dt->super.size, &target_address, &target_handle); + ret = ompi_datatype_get_true_extent(dt, &true_lb, &true_extent); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + return ret; + } + + ret = osc_rdma_get_remote_segment (module, peer, target_disp, true_lb+true_extent, &target_address, &target_handle); if (OPAL_UNLIKELY(OPAL_SUCCESS != ret)) { return ret; } - if (win->w_acc_ops <= OMPI_WIN_ACCUMULATE_OPS_SAME_OP) { - /* the user has indicated that they will only use the same op (or same op and no op) - * for operations on overlapping memory ranges. that indicates it is safe to go ahead - * and use network atomic operations. */ - ret = ompi_osc_rdma_cas_atomic (sync, origin_addr, compare_addr, result_addr, dt, - peer, target_address, target_handle); - if (OMPI_SUCCESS == ret) { - return OMPI_SUCCESS; - } + /* to ensure order wait until the previous accumulate completes */ + while (!ompi_osc_rdma_peer_test_set_flag (peer, OMPI_OSC_RDMA_PEER_ACCUMULATING)) { + ompi_osc_rdma_progress (module); + } + + /* get an exclusive lock on the peer */ + if (!ompi_osc_rdma_peer_is_exclusive (peer) && !(module->acc_single_intrinsic || win->w_acc_ops <= OMPI_WIN_ACCUMULATE_OPS_SAME_OP)) { + (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock)); + lock_acquired = true; + } + + /* either we have an exclusive lock (via MPI_Win_lock() or the accumulate lock) or the + * user has indicated that they will only use the same op (or same op and no op) for + * operations on overlapping memory ranges. that indicates it is safe to go ahead and + * use network atomic operations.
*/ + ret = ompi_osc_rdma_cas_atomic (sync, origin_addr, compare_addr, result_addr, dt, + peer, target_address, target_handle, lock_acquired); + if (OMPI_SUCCESS == ret) { + return OMPI_SUCCESS; + } + + if (!(lock_acquired || ompi_osc_rdma_peer_is_exclusive (peer))) { + (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock)); + lock_acquired = true; } if (ompi_osc_rdma_peer_local_base (peer)) { - return ompi_osc_rdma_cas_local (origin_addr, compare_addr, result_addr, dt, - peer, target_address, target_handle, module); + ret = ompi_osc_rdma_cas_local (origin_addr, compare_addr, result_addr, dt, + peer, target_address, target_handle, module, + lock_acquired); + } else { + ret = cas_rdma (sync, origin_addr, compare_addr, result_addr, dt, peer, target_address, + target_handle, lock_acquired); + } + + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + /* operation failed. the application will most likely abort but we still want to leave the window + * in working state if possible. on successful completion the above calls will clear the lock + * and accumulate state */ + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); } - return cas_rdma (sync, origin_addr, compare_addr, result_addr, dt, peer, target_address, - target_handle); + return ret; } @@ -1015,7 +854,8 @@ int ompi_osc_rdma_rget_accumulate_internal (ompi_osc_rdma_sync_t *sync, const vo ompi_osc_rdma_module_t *module = sync->module; mca_btl_base_registration_handle_t *target_handle; uint64_t target_address; - ptrdiff_t lb, extent; + ptrdiff_t lb, origin_extent, target_span; + bool lock_acquired = false; int ret; /* short-circuit case.
note that origin_count may be 0 if op is MPI_NO_OP */ @@ -1027,21 +867,39 @@ int ompi_osc_rdma_rget_accumulate_internal (ompi_osc_rdma_sync_t *sync, const vo return OMPI_SUCCESS; } - (void) ompi_datatype_get_extent (origin_datatype, &lb, &extent); + target_span = opal_datatype_span(&target_datatype->super, target_count, &lb); - ret = osc_rdma_get_remote_segment (module, peer, target_disp, extent * target_count, &target_address, &target_handle); + // a buffer defined by (buf, count, dt) + // will have data starting at buf+offset and ending len bytes later: + ret = osc_rdma_get_remote_segment (module, peer, target_disp, target_span+lb, &target_address, &target_handle); if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { return ret; } - if (module->acc_single_intrinsic && extent <= 8) { + (void) ompi_datatype_get_extent (origin_datatype, &lb, &origin_extent); + + /* to ensure order wait until the previous accumulate completes */ + while (!ompi_osc_rdma_peer_test_set_flag (peer, OMPI_OSC_RDMA_PEER_ACCUMULATING)) { + ompi_osc_rdma_progress (module); + } + + /* get an exclusive lock on the peer if needed */ + if (!ompi_osc_rdma_peer_is_exclusive (peer) && !module->acc_single_intrinsic) { + lock_acquired = true; + (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock)); + } + + /* if the datatype is small enough (and the count is 1) then try to directly use the hardware to execute + * the atomic operation. 
this should be safe in all cases as either 1) the user has assured us they will + * never use atomics with count > 1, 2) we have the accumulate lock, or 3) we have an exclusive lock */ + if (origin_extent <= 8 && 1 == origin_count) { if (module->acc_use_amo && ompi_datatype_is_predefined (origin_datatype)) { if (NULL == result_addr) { - ret = ompi_osc_rdma_acc_single_atomic (sync, origin_addr, origin_datatype, extent, peer, target_address, - target_handle, op, request); + ret = ompi_osc_rdma_acc_single_atomic (sync, origin_addr, origin_datatype, origin_extent, peer, target_address, + target_handle, op, request, lock_acquired); } else { - ret = ompi_osc_rdma_fetch_and_op_atomic (sync, origin_addr, result_addr, origin_datatype, extent, peer, target_address, - target_handle, op, request); + ret = ompi_osc_rdma_fetch_and_op_atomic (sync, origin_addr, result_addr, origin_datatype, origin_extent, peer, target_address, + target_handle, op, request, lock_acquired); } if (OMPI_SUCCESS == ret) { @@ -1049,23 +907,37 @@ int ompi_osc_rdma_rget_accumulate_internal (ompi_osc_rdma_sync_t *sync, const vo } } - ret = ompi_osc_rdma_fetch_and_op_cas (sync, origin_addr, result_addr, origin_datatype, extent, peer, target_address, - target_handle, op, request); + ret = ompi_osc_rdma_fetch_and_op_cas (sync, origin_addr, result_addr, origin_datatype, origin_extent, peer, target_address, + target_handle, op, request, lock_acquired); if (OMPI_SUCCESS == ret) { return OMPI_SUCCESS; } } + /* could not use network atomics. acquire the lock if needed and continue. 
*/ + if (!lock_acquired && !ompi_osc_rdma_peer_is_exclusive (peer)) { + lock_acquired = true; + (void) ompi_osc_rdma_lock_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, accumulate_lock)); + } + if (ompi_osc_rdma_peer_local_base (peer)) { /* local/self optimization */ - return ompi_osc_rdma_gacc_local (origin_addr, origin_count, origin_datatype, result_addr, result_count, + ret = ompi_osc_rdma_gacc_local (origin_addr, origin_count, origin_datatype, result_addr, result_count, + result_datatype, peer, target_address, target_handle, target_count, + target_datatype, op, module, request, lock_acquired); + } else { + /* do not need to pass the lock acquired flag to this function. the value of the flag can be obtained + * just by calling ompi_osc_rdma_peer_is_exclusive() in this case. */ + ret = ompi_osc_rdma_gacc_master (sync, origin_addr, origin_count, origin_datatype, result_addr, result_count, result_datatype, peer, target_address, target_handle, target_count, - target_datatype, op, module, request); + target_datatype, op, request); } - return ompi_osc_rdma_gacc_master (sync, origin_addr, origin_count, origin_datatype, result_addr, result_count, - result_datatype, peer, target_address, target_handle, target_count, - target_datatype, op, request); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + ompi_osc_rdma_peer_accumulate_cleanup (module, peer, lock_acquired); + } + + return ret; } int ompi_osc_rdma_get_accumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, @@ -1133,7 +1005,7 @@ int ompi_osc_rdma_rget_accumulate (const void *origin_addr, int origin_count, om } int ompi_osc_rdma_raccumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_win_t *win, ompi_request_t **request) { 
ompi_osc_rdma_module_t *module = GET_MODULE(win); @@ -1167,7 +1039,7 @@ int ompi_osc_rdma_raccumulate (const void *origin_addr, int origin_count, ompi_d } int ompi_osc_rdma_accumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); @@ -1190,7 +1062,7 @@ int ompi_osc_rdma_accumulate (const void *origin_addr, int origin_count, ompi_da int ompi_osc_rdma_fetch_and_op (const void *origin_addr, void *result_addr, ompi_datatype_t *dt, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, ompi_op_t *op, ompi_win_t *win) + ptrdiff_t target_disp, ompi_op_t *op, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); ompi_osc_rdma_peer_t *peer; diff --git a/ompi/mca/osc/rdma/osc_rdma_accumulate.h b/ompi/mca/osc/rdma/osc_rdma_accumulate.h index 7ab370ab2b8..74f41abf6ef 100644 --- a/ompi/mca/osc/rdma/osc_rdma_accumulate.h +++ b/ompi/mca/osc/rdma/osc_rdma_accumulate.h @@ -2,6 +2,8 @@ /* * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2016-2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -15,15 +17,15 @@ #include "osc_rdma.h" int ompi_osc_rdma_compare_and_swap (const void *origin_addr, const void *compare_addr, void *result_addr, - ompi_datatype_t *dt, int target_rank, OPAL_PTRDIFF_TYPE target_disp, + ompi_datatype_t *dt, int target_rank, ptrdiff_t target_disp, ompi_win_t *win); int ompi_osc_rdma_accumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_win_t *win); int ompi_osc_rdma_fetch_and_op (const void *origin_addr, void *result_addr, ompi_datatype_t *dt, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, ompi_op_t *op, ompi_win_t *win); + ptrdiff_t target_disp, ompi_op_t *op, ompi_win_t *win); int ompi_osc_rdma_get_accumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, void *result_addr, int result_count, ompi_datatype_t *result_datatype, @@ -31,7 +33,7 @@ int ompi_osc_rdma_get_accumulate (const void *origin_addr, int origin_count, omp ompi_op_t *op, ompi_win_t *win); int ompi_osc_rdma_raccumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, int target_rank, - OPAL_PTRDIFF_TYPE target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_op_t *op, ompi_win_t *win, ompi_request_t **request); int ompi_osc_rdma_rget_accumulate (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, diff --git a/ompi/mca/osc/rdma/osc_rdma_active_target.c b/ompi/mca/osc/rdma/osc_rdma_active_target.c index ed773346325..dd52e4938e8 100644 --- a/ompi/mca/osc/rdma/osc_rdma_active_target.c +++ b/ompi/mca/osc/rdma/osc_rdma_active_target.c @@ -8,7 +8,7 @@ * University of Stuttgart. 
All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2007-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2010 IBM Corporation. All rights reserved. * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. @@ -16,6 +16,7 @@ * Copyright (c) 2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. + * Copyright (c) 2017-2018 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -48,6 +49,28 @@ typedef struct ompi_osc_rdma_pending_post_t ompi_osc_rdma_pending_post_t; static OBJ_CLASS_INSTANCE(ompi_osc_rdma_pending_post_t, opal_list_item_t, NULL, NULL); +static void ompi_osc_rdma_pending_op_construct (ompi_osc_rdma_pending_op_t *pending_op) +{ + pending_op->op_frag = NULL; + pending_op->op_buffer = NULL; + pending_op->op_result = NULL; + pending_op->op_complete = false; + pending_op->cbfunc = NULL; +} + +static void ompi_osc_rdma_pending_op_destruct (ompi_osc_rdma_pending_op_t *pending_op) +{ + if (NULL != pending_op->op_frag) { + ompi_osc_rdma_frag_complete (pending_op->op_frag); + } + + ompi_osc_rdma_pending_op_construct (pending_op); +} + +OBJ_CLASS_INSTANCE(ompi_osc_rdma_pending_op_t, opal_list_item_t, + ompi_osc_rdma_pending_op_construct, + ompi_osc_rdma_pending_op_destruct); + /** * Dummy completion function for atomic operations */ @@ -55,11 +78,25 @@ void ompi_osc_rdma_atomic_complete (mca_btl_base_module_t *btl, struct mca_btl_b void *local_address, mca_btl_base_registration_handle_t *local_handle, void *context, void *data, int status) { - volatile bool *atomic_complete = (volatile bool *) context; + ompi_osc_rdma_pending_op_t *pending_op = (ompi_osc_rdma_pending_op_t *) context; + + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "pending atomic %p complete with status %d", 
(void*)pending_op, status); + + if (pending_op->op_result) { + memmove (pending_op->op_result, pending_op->op_buffer, pending_op->op_size); + } + + if (NULL != pending_op->cbfunc) { + pending_op->cbfunc (pending_op->cbdata, pending_op->cbcontext, status); + } - if (atomic_complete) { - *atomic_complete = true; + if (NULL != pending_op->op_frag) { + ompi_osc_rdma_frag_complete (pending_op->op_frag); + pending_op->op_frag = NULL; } + + pending_op->op_complete = true; + OBJ_RELEASE(pending_op); } /** @@ -164,7 +201,8 @@ static void ompi_osc_rdma_handle_post (ompi_osc_rdma_module_t *module, int rank, if (rank == peers[j]->rank) { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "got expected post from %d. still expecting posts from %d processes", rank, (int) (npeers - state->num_post_msgs - 1)); - ++state->num_post_msgs; + /* an atomic is not really necessary as this function is currently used but it doesn't hurt */ + ompi_osc_rdma_counter_add (&state->num_post_msgs, 1); return; } } @@ -176,16 +214,90 @@ static void ompi_osc_rdma_handle_post (ompi_osc_rdma_module_t *module, int rank, OPAL_THREAD_SCOPED_LOCK(&module->lock, opal_list_append (&module->pending_posts, &pending_post->super)); } +static void ompi_osc_rdma_check_posts (ompi_osc_rdma_module_t *module) +{ + ompi_osc_rdma_state_t *state = module->state; + ompi_osc_rdma_sync_t *sync = &module->all_sync; + int count = 0; + + if (OMPI_OSC_RDMA_SYNC_TYPE_PSCW == sync->type) { + count = sync->num_peers; + } + + for (int i = 0 ; i < OMPI_OSC_RDMA_POST_PEER_MAX ; ++i) { + /* no post at this index (yet) */ + if (0 == state->post_peers[i]) { + continue; + } + + ompi_osc_rdma_handle_post (module, state->post_peers[i] - 1, sync->peer_list.peers, count); + state->post_peers[i] = 0; + } +} + +static int ompi_osc_rdma_post_peer (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer) +{ + uint64_t target = (uint64_t) (intptr_t) peer->state + offsetof (ompi_osc_rdma_state_t, post_index); + ompi_osc_rdma_lock_t post_index, result, 
_tmp_value; + int my_rank = ompi_comm_rank (module->comm); + int ret; + + if (peer->rank == my_rank) { + ompi_osc_rdma_handle_post (module, my_rank, NULL, 0); + return OMPI_SUCCESS; + } + + /* get a post index */ + if (!ompi_osc_rdma_peer_local_state (peer)) { + ret = ompi_osc_rdma_lock_btl_fop (module, peer, target, MCA_BTL_ATOMIC_ADD, 1, &post_index, true); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + return ret; + } + } else { + post_index = ompi_osc_rdma_counter_add ((osc_rdma_counter_t *) (intptr_t) target, 1) - 1; + } + + post_index &= OMPI_OSC_RDMA_POST_PEER_MAX - 1; + + target = (uint64_t) (intptr_t) peer->state + offsetof (ompi_osc_rdma_state_t, post_peers) + + sizeof (osc_rdma_counter_t) * post_index; + + do { + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "attempting to post to index %d @ rank %d", (int)post_index, peer->rank); + + _tmp_value = 0; + + /* try to post. if the value isn't 0 then another rank is occupying this index */ + if (!ompi_osc_rdma_peer_local_state (peer)) { + ret = ompi_osc_rdma_lock_btl_cswap (module, peer, target, 0, 1 + (int64_t) my_rank, &result); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + return ret; + } + } else { + result = !ompi_osc_rdma_lock_compare_exchange ((osc_rdma_counter_t *) target, &_tmp_value, + 1 + (osc_rdma_counter_t) my_rank); + } + + if (OPAL_LIKELY(0 == result)) { + break; + } + + /* prevent circular wait by checking for post messages received */ + ompi_osc_rdma_check_posts (module); + + /* zzzzzzzzzzzzz */ + nanosleep (&(struct timespec) {.tv_sec = 0, .tv_nsec = 100}, NULL); + } while (1); + + return OMPI_SUCCESS; +} + int ompi_osc_rdma_post_atomic (ompi_group_t *group, int assert, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); ompi_osc_rdma_peer_t **peers; - int my_rank = ompi_comm_rank (module->comm); ompi_osc_rdma_state_t *state = module->state; - volatile bool atomic_complete; - ompi_osc_rdma_frag_t *frag = NULL; - osc_rdma_counter_t *temp = NULL; - int ret; + int ret = OMPI_SUCCESS; 
OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "post: %p, %d, %s", (void*) group, assert, win->w_name); @@ -212,21 +324,13 @@ int ompi_osc_rdma_post_atomic (ompi_group_t *group, int assert, ompi_win_t *win) state->num_complete_msgs = 0; OPAL_THREAD_UNLOCK(&module->lock); - /* allocate a temporary buffer for atomic response */ - ret = ompi_osc_rdma_frag_alloc (module, 8, &frag, (char **) &temp); - if ((assert & MPI_MODE_NOCHECK) || 0 == ompi_group_size (group)) { return OMPI_SUCCESS; } - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - return OMPI_ERR_OUT_OF_RESOURCE; - } - /* translate group ranks into the communicator */ peers = ompi_osc_rdma_get_peers (module, module->pw_group); if (OPAL_UNLIKELY(NULL == peers)) { - ompi_osc_rdma_frag_complete (frag); return OMPI_ERR_OUT_OF_RESOURCE; } @@ -234,92 +338,17 @@ int ompi_osc_rdma_post_atomic (ompi_group_t *group, int assert, ompi_win_t *win) /* send a hello counter to everyone in group */ for (int i = 0 ; i < ompi_group_size(module->pw_group) ; ++i) { - ompi_osc_rdma_peer_t *peer = peers[i]; - uint64_t target = (uint64_t) (intptr_t) peer->state + offsetof (ompi_osc_rdma_state_t, post_index); - int post_index; - - if (peer->rank == my_rank) { - ompi_osc_rdma_handle_post (module, my_rank, NULL, 0); - continue; - } - - /* get a post index */ - atomic_complete = false; - if (!ompi_osc_rdma_peer_local_state (peer)) { - do { - ret = module->selected_btl->btl_atomic_fop (module->selected_btl, peer->state_endpoint, temp, target, frag->handle, - peer->state_handle, MCA_BTL_ATOMIC_ADD, 1, 0, MCA_BTL_NO_ORDER, - ompi_osc_rdma_atomic_complete, (void *) &atomic_complete, NULL); - assert (OPAL_SUCCESS >= ret); - - if (OMPI_SUCCESS == ret) { - while (!atomic_complete) { - ompi_osc_rdma_progress (module); - } - - break; - } - - ompi_osc_rdma_progress (module); - } while (1); - } else { - *temp = ompi_osc_rdma_counter_add ((osc_rdma_counter_t *) (intptr_t) target, 1) - 1; + ret = ompi_osc_rdma_post_peer (module, peers[i]); + if 
(OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + break; } - post_index = (*temp) & (OMPI_OSC_RDMA_POST_PEER_MAX - 1); - - target = (uint64_t) (intptr_t) peer->state + offsetof (ompi_osc_rdma_state_t, post_peers) + - sizeof (osc_rdma_counter_t) * post_index; - - do { - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "attempting to post to index %d @ rank %d", post_index, peer->rank); - - /* try to post. if the value isn't 0 then another rank is occupying this index */ - if (!ompi_osc_rdma_peer_local_state (peer)) { - atomic_complete = false; - ret = module->selected_btl->btl_atomic_cswap (module->selected_btl, peer->state_endpoint, temp, target, frag->handle, peer->state_handle, - 0, 1 + (int64_t) my_rank, 0, MCA_BTL_NO_ORDER, ompi_osc_rdma_atomic_complete, - (void *) &atomic_complete, NULL); - assert (OPAL_SUCCESS >= ret); - - if (OMPI_SUCCESS == ret) { - while (!atomic_complete) { - ompi_osc_rdma_progress (module); - } - } else { - ompi_osc_rdma_progress (module); - continue; - } - - } else { - *temp = !ompi_osc_rdma_lock_cmpset ((osc_rdma_counter_t *) target, 0, 1 + (osc_rdma_counter_t) my_rank); - } - - if (OPAL_LIKELY(0 == *temp)) { - break; - } - - /* prevent circular wait by checking for post messages received */ - for (int j = 0 ; j < OMPI_OSC_RDMA_POST_PEER_MAX ; ++j) { - /* no post at this index (yet) */ - if (0 == state->post_peers[j]) { - continue; - } - - ompi_osc_rdma_handle_post (module, state->post_peers[j] - 1, NULL, 0); - state->post_peers[j] = 0; - } - - usleep (100); - } while (1); } - ompi_osc_rdma_frag_complete (frag); - ompi_osc_rdma_release_peers (peers, ompi_group_size(module->pw_group)); OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "post complete"); - return OMPI_SUCCESS; + return ret; } int ompi_osc_rdma_start_atomic (ompi_group_t *group, int assert, ompi_win_t *win) @@ -385,8 +414,7 @@ int ompi_osc_rdma_start_atomic (ompi_group_t *group, int assert, ompi_win_t *win "from %d processes", peer->rank, (int) (group_size - state->num_post_msgs - 1)); 
opal_list_remove_item (&module->pending_posts, &pending_post->super); OBJ_RELEASE(pending_post); - /* only one thread can process post messages so there is no need of atomics here */ - ++state->num_post_msgs; + ompi_osc_rdma_counter_add (&state->num_post_msgs, 1); break; } } @@ -396,16 +424,7 @@ int ompi_osc_rdma_start_atomic (ompi_group_t *group, int assert, ompi_win_t *win while (state->num_post_msgs != group_size) { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "waiting for post messages. have %d of %d", (int) state->num_post_msgs, group_size); - for (int i = 0 ; i < OMPI_OSC_RDMA_POST_PEER_MAX ; ++i) { - /* no post at this index (yet) */ - if (0 == state->post_peers[i]) { - continue; - } - - ompi_osc_rdma_handle_post (module, state->post_peers[i] - 1, sync->peer_list.peers, group_size); - state->post_peers[i] = 0; - } - + ompi_osc_rdma_check_posts (module); ompi_osc_rdma_progress (module); } } else { @@ -422,9 +441,7 @@ int ompi_osc_rdma_complete_atomic (ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); ompi_osc_rdma_sync_t *sync = &module->all_sync; - ompi_osc_rdma_frag_t *frag = NULL; ompi_osc_rdma_peer_t **peers; - void *scratch_lock = NULL; ompi_group_t *group; int group_size, ret; @@ -459,45 +476,19 @@ int ompi_osc_rdma_complete_atomic (ompi_win_t *win) ompi_osc_rdma_sync_rdma_complete (sync); - if (!(MCA_BTL_FLAGS_ATOMIC_OPS & module->selected_btl->btl_flags)) { - /* need a temporary buffer for performing fetching atomics */ - ret = ompi_osc_rdma_frag_alloc (module, 8, &frag, (char **) &scratch_lock); - if (OPAL_UNLIKELY(OPAL_SUCCESS != ret)) { - return ret; - } - } - /* for each process in the group increment their number of complete messages */ for (int i = 0 ; i < group_size ; ++i) { ompi_osc_rdma_peer_t *peer = peers[i]; intptr_t target = (intptr_t) peer->state + offsetof (ompi_osc_rdma_state_t, num_complete_msgs); if (!ompi_osc_rdma_peer_local_state (peer)) { - do { - if (MCA_BTL_FLAGS_ATOMIC_OPS & module->selected_btl->btl_flags) { - 
ret = module->selected_btl->btl_atomic_op (module->selected_btl, peer->state_endpoint, target, peer->state_handle, - MCA_BTL_ATOMIC_ADD, 1, 0, MCA_BTL_NO_ORDER, - ompi_osc_rdma_atomic_complete, NULL, NULL); - } else { - /* don't care about the read value so use the scratch lock */ - ret = module->selected_btl->btl_atomic_fop (module->selected_btl, peer->state_endpoint, scratch_lock, - target, frag->handle, peer->state_handle, MCA_BTL_ATOMIC_ADD, 1, - 0, MCA_BTL_NO_ORDER, ompi_osc_rdma_atomic_complete, NULL, NULL); - } - - if (OPAL_LIKELY(OMPI_SUCCESS == ret)) { - break; - } - } while (1); + ret = ompi_osc_rdma_lock_btl_op (module, peer, target, MCA_BTL_ATOMIC_ADD, 1, true); + assert (OMPI_SUCCESS == ret); } else { (void) ompi_osc_rdma_counter_add ((osc_rdma_counter_t *) target, 1); } } - if (frag) { - ompi_osc_rdma_frag_complete (frag); - } - /* release our reference to peers in this group */ ompi_osc_rdma_release_peers (peers, group_size); @@ -534,7 +525,6 @@ int ompi_osc_rdma_wait_atomic (ompi_win_t *win) } OPAL_THREAD_LOCK(&module->lock); - state->num_complete_msgs = 0; group = module->pw_group; module->pw_group = NULL; OPAL_THREAD_UNLOCK(&module->lock); @@ -585,6 +575,8 @@ int ompi_osc_rdma_test_atomic (ompi_win_t *win, int *flag) OBJ_RELEASE(group); + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "test complete. returning flag: true"); + return OMPI_SUCCESS; } @@ -601,6 +593,8 @@ int ompi_osc_rdma_fence_atomic (int assert, ompi_win_t *win) return OMPI_ERR_RMA_SYNC; } + /* NTH: locking here isn't really needed per se but it may make user synchronization errors more + * predictable. if the user is using RMA correctly then there will be no contention on this lock.
*/ OPAL_THREAD_LOCK(&module->lock); /* active sends are now active (we will close the epoch if NOSUCCEED is specified) */ @@ -612,22 +606,17 @@ int ompi_osc_rdma_fence_atomic (int assert, ompi_win_t *win) } /* technically it is possible to enter a lock epoch (which will close the fence epoch) if - * no communication has occurred. this flag will be set on the next put, get, accumulate, etc. */ + * no communication has occurred. this flag will be set to true on the next put, get, + * accumulate, etc if no other synchronization call is made. yay fence */ module->all_sync.epoch_active = false; - /* short-circuit the noprecede case */ - if (0 != (assert & MPI_MODE_NOPRECEDE)) { - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "fence complete (short circuit)"); - /* no communication can occur until a peer has entered the same fence epoch. for now - * a barrier is used to ensure this is the case. */ - ret = module->comm->c_coll->coll_barrier(module->comm, module->comm->c_coll->coll_barrier_module); - OPAL_THREAD_UNLOCK(&module->lock); - return ret; - } + /* there really is no practical difference between NOPRECEDE and the normal case. in both cases there + * may be local stores that will not be visible as they should if we do not barrier. since that is the + * case there is no optimization for NOPRECEDE */ ompi_osc_rdma_sync_rdma_complete (&module->all_sync); - /* ensure all writes to my memory are complete */ + /* ensure all writes to my memory are complete (both local stores, and RMA operations) */ ret = module->comm->c_coll->coll_barrier(module->comm, module->comm->c_coll->coll_barrier_module); if (assert & MPI_MODE_NOSUCCEED) { diff --git a/ompi/mca/osc/rdma/osc_rdma_comm.c b/ompi/mca/osc/rdma/osc_rdma_comm.c index d4daad37b6f..fda90e91221 100644 --- a/ompi/mca/osc/rdma/osc_rdma_comm.c +++ b/ompi/mca/osc/rdma/osc_rdma_comm.c @@ -1,8 +1,11 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2014-2016 Los Alamos National Security, LLC. 
All rights + * Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2018 Intel, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -18,6 +21,27 @@ #include "ompi/mca/osc/base/osc_base_obj_convert.h" #include "opal/align.h" +/* helper functions */ +static inline void ompi_osc_rdma_cleanup_rdma (ompi_osc_rdma_sync_t *sync, bool dec_always, ompi_osc_rdma_frag_t *frag, + mca_btl_base_registration_handle_t *handle, ompi_osc_rdma_request_t *request) +{ + if (frag) { + ompi_osc_rdma_frag_complete (frag); + } else { + ompi_osc_rdma_deregister (sync->module, handle); + } + + if (request) { + (void) OPAL_THREAD_ADD_FETCH32 (&request->outstanding_requests, -1); + } + + if (dec_always) { + ompi_osc_rdma_sync_rdma_dec_always (sync); + } else { + ompi_osc_rdma_sync_rdma_dec (sync); + } +} + static int ompi_osc_rdma_get_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_peer_t *peer, uint64_t source_address, mca_btl_base_registration_handle_t *source_handle, void *target_buffer, size_t size, ompi_osc_rdma_request_t *request); @@ -34,17 +58,30 @@ int ompi_osc_get_data_blocking (ompi_osc_rdma_module_t *module, struct mca_btl_b uint64_t source_address, mca_btl_base_registration_handle_t *source_handle, void *data, size_t len) { + const size_t btl_alignment_mask = ALIGNMENT_MASK(module->selected_btl->btl_get_alignment); mca_btl_base_registration_handle_t *local_handle = NULL; ompi_osc_rdma_frag_t *frag = NULL; volatile bool read_complete = false; + size_t aligned_len, offset; + uint64_t aligned_addr = (source_address + btl_alignment_mask) & ~btl_alignment_mask; char *ptr = data; int ret; - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "reading state data from endpoint %p. 
source: 0x%" PRIx64 ", len: %lu", - (void *) endpoint, source_address, (unsigned long) len); + offset = source_address & btl_alignment_mask; + aligned_len = (len + offset + btl_alignment_mask) & ~btl_alignment_mask; + + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "reading data from endpoint %p. source: 0x%" PRIx64 " (aligned: 0x%" PRIx64 + "), len: %lu (aligned: %lu)", (void *) endpoint, source_address, aligned_addr, (unsigned long) len, + (unsigned long) aligned_len); if (module->selected_btl->btl_register_mem && len >= module->selected_btl->btl_get_local_registration_threshold) { - ret = ompi_osc_rdma_frag_alloc (module, len, &frag, &ptr); + do { + ret = ompi_osc_rdma_frag_alloc (module, aligned_len, &frag, &ptr); + if (OPAL_UNLIKELY(OMPI_ERR_OUT_OF_RESOURCE == ret)) { + ompi_osc_rdma_progress (module); + } + } while (OMPI_ERR_OUT_OF_RESOURCE == ret); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_ERROR, "error allocating temporary buffer"); return ret; @@ -58,10 +95,10 @@ int ompi_osc_get_data_blocking (ompi_osc_rdma_module_t *module, struct mca_btl_b assert (!(source_address & ALIGNMENT_MASK(module->selected_btl->btl_get_alignment))); do { - ret = module->selected_btl->btl_get (module->selected_btl, endpoint, ptr, source_address, - local_handle, source_handle, len, 0, MCA_BTL_NO_ORDER, + ret = module->selected_btl->btl_get (module->selected_btl, endpoint, ptr, aligned_addr, + local_handle, source_handle, aligned_len, 0, MCA_BTL_NO_ORDER, ompi_osc_get_data_complete, (void *) &read_complete, NULL); - if (OPAL_LIKELY(OMPI_ERR_OUT_OF_RESOURCE != ret)) { + if (!ompi_osc_rdma_oor (ret)) { break; } @@ -88,7 +125,7 @@ int ompi_osc_get_data_blocking (ompi_osc_rdma_module_t *module, struct mca_btl_b opal_memchecker_base_mem_defined (ptr, len); if (frag) { - memcpy (data, ptr, len); + memcpy (data, ptr + offset, len); /* done with the fragment */ ompi_osc_rdma_frag_complete (frag); @@ -157,7 +194,7 @@ static int ompi_osc_rdma_master_noncontig 
(ompi_osc_rdma_sync_t *sync, void *loc subreq = NULL; - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "scheduling rdma on non-contiguous datatype(s)"); + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "scheduling rdma on non-contiguous datatype(s) or large region"); /* prepare convertors for the source and target. these convertors will be used to determine the * contiguous segments within the source and target. */ @@ -188,7 +225,7 @@ static int ompi_osc_rdma_master_noncontig (ompi_osc_rdma_sync_t *sync, void *loc remote_iov_count = OMPI_OSC_RDMA_DECODE_MAX; remote_iov_index = 0; - /* opal_convertor_raw returns done when it has reached the end of the data */ + /* opal_convertor_raw returns true when it has reached the end of the data */ done = opal_convertor_raw (&remote_convertor, remote_iovec, &remote_iov_count, &remote_size); /* loop on the target segments until we have exhaused the decoded source data */ @@ -214,7 +251,7 @@ static int ompi_osc_rdma_master_noncontig (ompi_osc_rdma_sync_t *sync, void *loc subreq->parent_request = request; if (request) { - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, 1); + (void) OPAL_THREAD_ADD_FETCH32 (&request->outstanding_requests, 1); } } else if (!alloc_reqs) { subreq = request; @@ -229,7 +266,7 @@ static int ompi_osc_rdma_master_noncontig (ompi_osc_rdma_sync_t *sync, void *loc if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { if (OPAL_UNLIKELY(OMPI_ERR_OUT_OF_RESOURCE != ret)) { if (request) { - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, -1); + ompi_osc_rdma_request_deref (request); } if (alloc_reqs) { @@ -259,11 +296,7 @@ static int ompi_osc_rdma_master_noncontig (ompi_osc_rdma_sync_t *sync, void *loc if (request) { /* release our reference so the request can complete */ - if (1 == request->outstanding_requests) { - ompi_osc_rdma_request_complete (request, OMPI_SUCCESS); - } - - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, -1); + ompi_osc_rdma_request_deref (request); } 
OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_TRACE, "finished scheduling rdma on non-contiguous datatype(s)"); @@ -350,14 +383,12 @@ static void ompi_osc_rdma_put_complete (struct mca_btl_base_module_t *btl, struc void *context, void *data, int status) { ompi_osc_rdma_sync_t *sync = (ompi_osc_rdma_sync_t *) context; - ompi_osc_rdma_frag_t *frag = (ompi_osc_rdma_frag_t *) data; - ompi_osc_rdma_request_t *request = NULL; assert (OPAL_SUCCESS == status); /* the lowest bit is used as a flag indicating this put operation has a request */ if ((intptr_t) context & 0x1) { - request = (ompi_osc_rdma_request_t *) ((intptr_t) context & ~1); + ompi_osc_rdma_request_t *request = (ompi_osc_rdma_request_t *) ((intptr_t) context & ~1); sync = request->sync; /* NTH -- TODO: better error handling */ @@ -367,15 +398,42 @@ static void ompi_osc_rdma_put_complete (struct mca_btl_base_module_t *btl, struc OSC_RDMA_VERBOSE(status ? MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_TRACE, "btl put complete on sync %p. local " "address %p.
opal status %d", (void *) sync, local_address, status); - if (frag) { - ompi_osc_rdma_frag_complete (frag); - } else { + if (data) { + ompi_osc_rdma_frag_complete ((ompi_osc_rdma_frag_t *) data); + } else if (local_handle) { ompi_osc_rdma_deregister (sync->module, local_handle); } ompi_osc_rdma_sync_rdma_dec (sync); } +static void ompi_osc_rdma_put_complete_flush (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, + void *local_address, mca_btl_base_registration_handle_t *local_handle, + void *context, void *data, int status) +{ + ompi_osc_rdma_module_t *module = (ompi_osc_rdma_module_t *) context; + + assert (OPAL_SUCCESS == status); + + /* the lowest bit is used as a flag indicating this put operation has a request */ + if ((intptr_t) context & 0x1) { + ompi_osc_rdma_request_t *request = (ompi_osc_rdma_request_t *) ((intptr_t) context & ~1); + module = request->module; + + /* NTH -- TODO: better error handling */ + ompi_osc_rdma_request_complete (request, status); + } + + OSC_RDMA_VERBOSE(status ? MCA_BASE_VERBOSE_ERROR : MCA_BASE_VERBOSE_TRACE, "btl put complete on module %p. local " + "address %p.
opal status %d", (void *) module, local_address, status); + + if (data) { + ompi_osc_rdma_frag_complete ((ompi_osc_rdma_frag_t *) data); + } else if (local_handle) { + ompi_osc_rdma_deregister (module, local_handle); + } +} + static void ompi_osc_rdma_aggregate_put_complete (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, void *local_address, mca_btl_base_registration_handle_t *local_handle, void *context, void *data, int status) @@ -421,14 +479,12 @@ static int ompi_osc_rdma_put_real (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_pee ++module->put_retry_count; - if (OPAL_ERR_OUT_OF_RESOURCE != ret && OPAL_ERR_TEMP_OUT_OF_RESOURCE != ret) { + if (!ompi_osc_rdma_oor (ret)) { break; } /* spin a bit on progress */ - for (int i = 0 ; i < 10 ; ++i) { - ompi_osc_rdma_progress (module); - } + ompi_osc_rdma_progress (module); } while (1); OSC_RDMA_VERBOSE(10, "btl put failed with opal error code %d", ret); @@ -436,6 +492,7 @@ static int ompi_osc_rdma_put_real (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_pee return ret; } +#if 0 static void ompi_osc_rdma_aggregate_append (ompi_osc_rdma_aggregation_t *aggregation, ompi_osc_rdma_request_t *request, void *source_buffer, size_t size) { @@ -494,19 +551,24 @@ static int ompi_osc_rdma_aggregate_alloc (ompi_osc_rdma_sync_t *sync, ompi_osc_r return OMPI_SUCCESS; } +#endif -static int ompi_osc_rdma_put_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_peer_t *peer, uint64_t target_address, - mca_btl_base_registration_handle_t *target_handle, void *source_buffer, size_t size, - ompi_osc_rdma_request_t *request) +int ompi_osc_rdma_put_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_peer_t *peer, uint64_t target_address, + mca_btl_base_registration_handle_t *target_handle, void *source_buffer, size_t size, + ompi_osc_rdma_request_t *request) { ompi_osc_rdma_module_t *module = sync->module; +#if 0 ompi_osc_rdma_aggregation_t *aggregation = peer->aggregate; +#endif mca_btl_base_registration_handle_t *local_handle 
= NULL; + mca_btl_base_rdma_completion_fn_t cbfunc = NULL; ompi_osc_rdma_frag_t *frag = NULL; char *ptr = source_buffer; void *cbcontext; int ret; +#if 0 if (aggregation) { if (size <= (aggregation->buffer_size - aggregation->buffer_used) && (target_handle == aggregation->target_handle) && (target_address == aggregation->target_address + aggregation->buffer_used)) { @@ -532,6 +594,7 @@ static int ompi_osc_rdma_put_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_p return ret; } } +#endif if (module->selected_btl->btl_register_mem && size > module->selected_btl->btl_put_local_registration_threshold) { ret = ompi_osc_rdma_frag_alloc (module, size, &frag, &ptr); @@ -546,23 +609,36 @@ static int ompi_osc_rdma_put_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_p } } + if (ompi_osc_rdma_use_btl_flush (module)) { + /* NTH: when using the btl_flush function there is no guarantee that the callback will happen + * before the flush is complete. because of this there is a chance that the sync object will be + * released before there is a callback. to handle this case we call a different callback that doesn't + * use the sync object. it's possible the btl semantics will change in the future and the callback + * will happen *before* flush is considered complete.
if that is the case this workaround can be + * removed */ + cbcontext = (void *) module; + if (request || local_handle || frag) { + cbfunc = ompi_osc_rdma_put_complete_flush; + } + /* else the callback function is a no-op so do not bother specifying one */ + } else { + cbcontext = (void *) sync; + cbfunc = ompi_osc_rdma_put_complete; + } + /* increment the outstanding request counter in the request object */ if (request) { - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, 1); + (void) OPAL_THREAD_ADD_FETCH32 (&request->outstanding_requests, 1); cbcontext = (void *) ((intptr_t) request | 1); request->sync = sync; - } else { - cbcontext = (void *) sync; } - ret = ompi_osc_rdma_put_real (sync, peer, target_address, target_handle, ptr, local_handle, size, ompi_osc_rdma_put_complete, + ret = ompi_osc_rdma_put_real (sync, peer, target_address, target_handle, ptr, local_handle, size, cbfunc, cbcontext, frag); - if (OPAL_UNLIKELY(OMPI_SUCCESS == ret)) { - return OMPI_SUCCESS; + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { + ompi_osc_rdma_cleanup_rdma (sync, false, frag, local_handle, request); } - ompi_osc_rdma_cleanup_rdma (sync, frag, local_handle, request); - return ret; } @@ -581,20 +657,26 @@ static void ompi_osc_rdma_get_complete (struct mca_btl_base_module_t *btl, struc assert (OPAL_SUCCESS == status); - if (request->buffer || NULL != frag) { + if (request->buffer || frag) { if (OPAL_LIKELY(OMPI_SUCCESS == status)) { memcpy (origin_addr, (void *) source, request->len); } } + if (NULL == request->buffer) { + /* completion detection can handle this case without the counter when using btl_flush */ + ompi_osc_rdma_sync_rdma_dec (sync); + } else { + /* the counter was needed to keep track of the number of outstanding operations */ + ompi_osc_rdma_sync_rdma_dec_always (sync); + } + if (NULL != frag) { ompi_osc_rdma_frag_complete (frag); } else { ompi_osc_rdma_deregister (sync->module, local_handle); } - ompi_osc_rdma_sync_rdma_dec (sync); - 
ompi_osc_rdma_request_complete (request, status); } @@ -621,7 +703,7 @@ int ompi_osc_rdma_peer_aggregate_flush (ompi_osc_rdma_peer_t *peer) return OMPI_SUCCESS; } - ompi_osc_rdma_cleanup_rdma (aggregation->sync, aggregation->frag, NULL, NULL); + ompi_osc_rdma_cleanup_rdma (aggregation->sync, false, aggregation->frag, NULL, NULL); ompi_osc_rdma_aggregation_return (aggregation); @@ -640,12 +722,12 @@ static int ompi_osc_rdma_get_partial (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_ subreq->internal = true; subreq->type = OMPI_OSC_RDMA_TYPE_RDMA; subreq->parent_request = request; - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, 1); + (void) OPAL_THREAD_ADD_FETCH32 (&request->outstanding_requests, 1); ret = ompi_osc_rdma_get_contig (sync, peer, source_address, source_handle, target_buffer, size, subreq); if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { OMPI_OSC_RDMA_REQUEST_RETURN(subreq); - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, -1); + ompi_osc_rdma_request_deref (request); } return ret; @@ -662,6 +744,7 @@ static int ompi_osc_rdma_get_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_p osc_rdma_size_t aligned_len; osc_rdma_base_t aligned_source_base, aligned_source_bound; char *ptr = target_buffer; + bool counter_needs_inc = false; int ret; aligned_source_base = source_address & ~btl_alignment_mask; @@ -743,19 +826,31 @@ static int ompi_osc_rdma_get_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_p request->origin_addr = target_buffer; request->sync = sync; - ompi_osc_rdma_sync_rdma_inc (sync); + if (request->buffer) { + /* always increment the outstanding RDMA counter as the btl_flush function does not guarantee callback completion, + * just operation completion. 
*/ + counter_needs_inc = true; + ompi_osc_rdma_sync_rdma_inc_always (sync); + } else { + /* if this operation is being buffered with a frag then ompi_osc_rdma_sync_rdma_complete() can use the number + * of pending operations on the rdma_frag as an indicator as to whether the operation is complete. this can + * only be done since there is only one rdma frag per module. if that changes this logic will need to be changed + * as well. this path also covers the case where the get operation is not buffered. */ + ompi_osc_rdma_sync_rdma_inc (sync); + } do { - ret = module->selected_btl->btl_get (module->selected_btl, peer->data_endpoint, ptr, aligned_source_base, local_handle, - source_handle, aligned_len, 0, MCA_BTL_NO_ORDER, ompi_osc_rdma_get_complete, + ret = module->selected_btl->btl_get (module->selected_btl, peer->data_endpoint, ptr, + aligned_source_base, local_handle, source_handle, + aligned_len, 0, MCA_BTL_NO_ORDER, ompi_osc_rdma_get_complete, request, frag); - if (OPAL_UNLIKELY(OMPI_SUCCESS == ret)) { + if (OPAL_LIKELY(OMPI_SUCCESS == ret)) { return OMPI_SUCCESS; } ++module->get_retry_count; - if (OPAL_ERR_OUT_OF_RESOURCE != ret && OPAL_ERR_TEMP_OUT_OF_RESOURCE != ret) { + if (!ompi_osc_rdma_oor (ret)) { break; } @@ -767,14 +862,14 @@ static int ompi_osc_rdma_get_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_p OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_ERROR, "btl get failed with opal error code %d", ret); - ompi_osc_rdma_cleanup_rdma (sync, frag, local_handle, request); + ompi_osc_rdma_cleanup_rdma (sync, counter_needs_inc, frag, local_handle, request); return ret; } static inline int ompi_osc_rdma_put_w_req (ompi_osc_rdma_sync_t *sync, const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, ompi_osc_rdma_peer_t *peer, - OPAL_PTRDIFF_TYPE target_disp, int target_count, + ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_osc_rdma_request_t *request) { ompi_osc_rdma_module_t *module = sync->module; @@ -791,7 +886,14
@@ static inline int ompi_osc_rdma_put_w_req (ompi_osc_rdma_sync_t *sync, const voi return OMPI_SUCCESS; } - ret = osc_rdma_get_remote_segment (module, peer, target_disp, target_datatype->super.size * target_count, + ptrdiff_t len, offset; + // a buffer defined by (buf, count, dt) + // will have data starting at buf+offset and ending len bytes later: + len = opal_datatype_span(&target_datatype->super, target_count, &offset); + + // the below function wants arg4 to be the number of bytes after + // target_disp that the data ends, which is offset+len + ret = osc_rdma_get_remote_segment (module, peer, target_disp, offset+len, &target_address, &target_handle); if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { return ret; @@ -809,12 +911,13 @@ static inline int ompi_osc_rdma_put_w_req (ompi_osc_rdma_sync_t *sync, const voi } static inline int ompi_osc_rdma_get_w_req (ompi_osc_rdma_sync_t *sync, void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, - ompi_osc_rdma_peer_t *peer, OPAL_PTRDIFF_TYPE source_disp, int source_count, + ompi_osc_rdma_peer_t *peer, ptrdiff_t source_disp, int source_count, ompi_datatype_t *source_datatype, ompi_osc_rdma_request_t *request) { ompi_osc_rdma_module_t *module = sync->module; mca_btl_base_registration_handle_t *source_handle; uint64_t source_address; + ptrdiff_t source_span, source_lb; int ret; /* short-circuit case */ @@ -826,7 +929,11 @@ static inline int ompi_osc_rdma_get_w_req (ompi_osc_rdma_sync_t *sync, void *ori return OMPI_SUCCESS; } - ret = osc_rdma_get_remote_segment (module, peer, source_disp, source_datatype->super.size * source_count, + // a buffer defined by (buf, count, dt) + // will have data starting at buf+offset and ending len bytes later: + source_span = opal_datatype_span(&source_datatype->super, source_count, &source_lb); + + ret = osc_rdma_get_remote_segment (module, peer, source_disp, source_span+source_lb, &source_address, &source_handle); if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { return ret; @@ -843,7
+950,7 @@ static inline int ompi_osc_rdma_get_w_req (ompi_osc_rdma_sync_t *sync, void *ori module->selected_btl->btl_get_limit, ompi_osc_rdma_get_contig, true); } int ompi_osc_rdma_put (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, - int target_rank, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target_rank, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); @@ -864,7 +971,7 @@ int ompi_osc_rdma_put (const void *origin_addr, int origin_count, ompi_datatype_ } int ompi_osc_rdma_rput (const void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, - int target_rank, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target_rank, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_datatype, ompi_win_t *win, ompi_request_t **request) { @@ -899,7 +1006,7 @@ int ompi_osc_rdma_rput (const void *origin_addr, int origin_count, ompi_datatype } int ompi_osc_rdma_get (void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, - int source_rank, OPAL_PTRDIFF_TYPE source_disp, int source_count, + int source_rank, ptrdiff_t source_disp, int source_count, ompi_datatype_t *source_datatype, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); @@ -920,7 +1027,7 @@ int ompi_osc_rdma_get (void *origin_addr, int origin_count, ompi_datatype_t *ori } int ompi_osc_rdma_rget (void *origin_addr, int origin_count, ompi_datatype_t *origin_datatype, - int source_rank, OPAL_PTRDIFF_TYPE source_disp, int source_count, + int source_rank, ptrdiff_t source_disp, int source_count, ompi_datatype_t *source_datatype, ompi_win_t *win, ompi_request_t **request) { diff --git a/ompi/mca/osc/rdma/osc_rdma_comm.h b/ompi/mca/osc/rdma/osc_rdma_comm.h index e9b048c56ee..0f3d9f19c59 100644 --- a/ompi/mca/osc/rdma/osc_rdma_comm.h +++ b/ompi/mca/osc/rdma/osc_rdma_comm.h @@ -2,6 +2,8 @@ /* * Copyright (c) 2014-2015 Los Alamos 
National Security, LLC. All rights * reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -22,23 +24,6 @@ #define min(a,b) ((a) < (b) ? (a) : (b)) #define ALIGNMENT_MASK(x) ((x) ? (x) - 1 : 0) -/* helper functions */ -static inline void ompi_osc_rdma_cleanup_rdma (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_frag_t *frag, - mca_btl_base_registration_handle_t *handle, ompi_osc_rdma_request_t *request) -{ - if (frag) { - ompi_osc_rdma_frag_complete (frag); - } else { - ompi_osc_rdma_deregister (sync->module, handle); - } - - if (request) { - (void) OPAL_THREAD_ADD32 (&request->outstanding_requests, -1); - } - - ompi_osc_rdma_sync_rdma_dec (sync); -} - /** * @brief find a remote segment associate with the memory region * @@ -53,7 +38,7 @@ static inline void ompi_osc_rdma_cleanup_rdma (ompi_osc_rdma_sync_t *sync, ompi_ * @returns OMPI_ERR_RMA_RANGE if the address range is not valid at the remote window * @returns other OMPI error on error */ -static inline int osc_rdma_get_remote_segment (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, OPAL_PTRDIFF_TYPE target_disp, +static inline int osc_rdma_get_remote_segment (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, ptrdiff_t target_disp, size_t length, uint64_t *remote_address, mca_btl_base_registration_handle_t **remote_handle) { ompi_osc_rdma_region_t *region; @@ -97,20 +82,20 @@ static inline int osc_rdma_get_remote_segment (ompi_osc_rdma_module_t *module, o /* prototypes for implementations of MPI RMA window functions. 
these will be called from the * mpi interface (ompi/mpi/c) */ int ompi_osc_rdma_put (const void *origin_addr, int origin_count, ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_dt, ompi_win_t *win); int ompi_osc_rdma_get (void *origin_addr, int origin_count, ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_dt, ompi_win_t *win); int ompi_osc_rdma_rput (const void *origin_addr, int origin_count, ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_dt, ompi_win_t *win, ompi_request_t **request); int ompi_osc_rdma_rget (void *origin_addr, int origin_count, ompi_datatype_t *origin_dt, - int target, OPAL_PTRDIFF_TYPE target_disp, int target_count, + int target, ptrdiff_t target_disp, int target_count, ompi_datatype_t *target_dt, ompi_win_t *win, ompi_request_t **request); @@ -132,4 +117,8 @@ int ompi_osc_get_data_blocking (ompi_osc_rdma_module_t *module, struct mca_btl_b uint64_t source_address, mca_btl_base_registration_handle_t *source_handle, void *data, size_t len); +int ompi_osc_rdma_put_contig (ompi_osc_rdma_sync_t *sync, ompi_osc_rdma_peer_t *peer, uint64_t target_address, + mca_btl_base_registration_handle_t *target_handle, void *source_buffer, size_t size, + ompi_osc_rdma_request_t *request); + #endif /* OMPI_OSC_RDMA_COMM_H */ diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c index 36afaed33e4..b14539589be 100644 --- a/ompi/mca/osc/rdma/osc_rdma_component.c +++ b/ompi/mca/osc/rdma/osc_rdma_component.c @@ -9,13 +9,15 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
- * Copyright (c) 2007-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2007-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2006-2008 University of Houston. All rights reserved. * Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012-2015 Sandia National Laboratories. All rights reserved. * Copyright (c) 2015 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. + * Copyright (c) 2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -43,6 +45,7 @@ #if OPAL_CUDA_SUPPORT #include "opal/datatype/opal_datatype_cuda.h" #endif /* OPAL_CUDA_SUPPORT */ +#include "opal/util/info_subscriber.h" #include "ompi/info/info.h" #include "ompi/communicator/communicator.h" @@ -53,23 +56,34 @@ #include "opal/mca/btl/base/base.h" #include "opal/mca/base/mca_base_pvar.h" #include "ompi/mca/bml/base/base.h" +#include "ompi/mca/mtl/base/base.h" static int ompi_osc_rdma_component_register (void); static int ompi_osc_rdma_component_init (bool enable_progress_threads, bool enable_mpi_threads); static int ompi_osc_rdma_component_finalize (void); static int ompi_osc_rdma_component_query (struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor); static int ompi_osc_rdma_component_select (struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model); - -static int ompi_osc_rdma_set_info (struct ompi_win_t *win, struct ompi_info_t *info); -static int ompi_osc_rdma_get_info (struct ompi_win_t *win, struct 
ompi_info_t **info_used); - +#if 0 // stale code? +static int ompi_osc_rdma_set_info (struct ompi_win_t *win, struct opal_info_t *info); +static int ompi_osc_rdma_get_info (struct ompi_win_t *win, struct opal_info_t **info_used); +#endif static int ompi_osc_rdma_query_btls (ompi_communicator_t *comm, struct mca_btl_base_module_t **btl); +static int ompi_osc_rdma_query_mtls (void); + +static char* ompi_osc_rdma_set_no_lock_info(opal_infosubscriber_t *obj, char *key, char *value); static char *ompi_osc_rdma_btl_names; +static char *ompi_osc_rdma_mtl_names; + +static const mca_base_var_enum_value_t ompi_osc_rdma_locking_modes[] = { + {.value = OMPI_OSC_RDMA_LOCKING_TWO_LEVEL, .string = "two_level"}, + {.value = OMPI_OSC_RDMA_LOCKING_ON_DEMAND, .string = "on_demand"}, + {.string = NULL}, +}; ompi_osc_rdma_component_t mca_osc_rdma_component = { .super = { @@ -126,21 +140,18 @@ ompi_osc_base_module_t ompi_osc_rdma_module_rdma_template = { .osc_flush_all = ompi_osc_rdma_flush_all, .osc_flush_local = ompi_osc_rdma_flush_local, .osc_flush_local_all = ompi_osc_rdma_flush_local_all, - - .osc_set_info = ompi_osc_rdma_set_info, - .osc_get_info = ompi_osc_rdma_get_info }; /* look up parameters for configuring this window. The code first looks in the info structure passed by the user, then it checks for a matching MCA variable. 
*/ -static bool check_config_value_bool (char *key, ompi_info_t *info) +static bool check_config_value_bool (char *key, opal_info_t *info) { int ret, flag, param; bool result = false; const bool *flag_value = &result; - ret = ompi_info_get_bool (info, key, &result, &flag); + ret = opal_info_get_bool (info, key, &result, &flag); if (OMPI_SUCCESS == ret && flag) { return result; } @@ -166,61 +177,96 @@ static int ompi_osc_rdma_pvar_read (const struct mca_base_pvar_t *pvar, void *va static int ompi_osc_rdma_component_register (void) { + char *description_str; + mca_base_var_enum_t *new_enum; + mca_osc_rdma_component.no_locks = false; - (void) mca_base_component_var_register(&mca_osc_rdma_component.super.osc_version, - "no_locks", "Enable optimizations available only if MPI_LOCK is " - "not used. Info key of same name overrides this value (default: false)", + asprintf(&description_str, "Enable optimizations available only if MPI_LOCK is " + "not used. Info key of same name overrides this value (default: %s)", + mca_osc_rdma_component.no_locks ? "true" : "false"); + (void) mca_base_component_var_register(&mca_osc_rdma_component.super.osc_version, "no_locks", description_str, MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.no_locks); + free(description_str); mca_osc_rdma_component.acc_single_intrinsic = false; + asprintf(&description_str, "Enable optimizations for MPI_Fetch_and_op, MPI_Accumulate, etc for codes " + "that will not use anything more than a single predefined datatype (default: %s)", + mca_osc_rdma_component.acc_single_intrinsic ? 
"true" : "false"); (void) mca_base_component_var_register(&mca_osc_rdma_component.super.osc_version, "acc_single_intrinsic", - "Enable optimizations for MPI_Fetch_and_op, MPI_Accumulate, etc for codes " - "that will not use anything more than a single predefined datatype (default: false)", - MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, OPAL_INFO_LVL_5, + description_str, MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.acc_single_intrinsic); + free(description_str); mca_osc_rdma_component.acc_use_amo = true; - (void) mca_base_component_var_register(&mca_osc_rdma_component.super.osc_version, "acc_use_amo", - "Enable the use of network atomic memory operations when using single " - "intrinsic optimizations. If not set network compare-and-swap will be " - "used instread (default: true)", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, OPAL_INFO_LVL_5, - MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.acc_use_amo); + asprintf(&description_str, "Enable the use of network atomic memory operations when using single " + "intrinsic optimizations. If not set, network compare-and-swap will be " + "used instead (default: %s)", mca_osc_rdma_component.acc_use_amo ? 
"true" : "false"); + (void) mca_base_component_var_register(&mca_osc_rdma_component.super.osc_version, "acc_use_amo", description_str, + MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_GROUP, + &mca_osc_rdma_component.acc_use_amo); + free(description_str); mca_osc_rdma_component.buffer_size = 32768; - (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "buffer_size", - "Size of temporary buffers (default: 32k)", MCA_BASE_VAR_TYPE_UNSIGNED_INT, - NULL, 0, 0, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_LOCAL, - &mca_osc_rdma_component.buffer_size); + asprintf(&description_str, "Size of temporary buffers (default: %d)", mca_osc_rdma_component.buffer_size); + (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "buffer_size", description_str, + MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, OPAL_INFO_LVL_3, + MCA_BASE_VAR_SCOPE_LOCAL, &mca_osc_rdma_component.buffer_size); + free(description_str); mca_osc_rdma_component.max_attach = 32; - (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "max_attach", - "Maximum number of buffers that can be attached to a dynamic window. " - "Keep in mind that each attached buffer will use a potentially limited " - "resource (default: 32)", MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, - OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.max_attach); + asprintf(&description_str, "Maximum number of buffers that can be attached to a dynamic window. 
" + "Keep in mind that each attached buffer will use a potentially limited " + "resource (default: %d)", mca_osc_rdma_component.max_attach); + (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "max_attach", description_str, + MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, OPAL_INFO_LVL_3, + MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.max_attach); + free(description_str); mca_osc_rdma_component.aggregation_limit = 1024; + asprintf(&description_str, "Maximum size of an aggregated put/get. Messages are aggregated for consecutive " + "put and get operations. In some cases this may lead to higher latency but " + "should also lead to higher bandwidth utilization. Set to 0 to disable (default: %d)", + mca_osc_rdma_component.aggregation_limit); (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "aggregation_limit", - "Maximum size of an aggregated put/get. Messages are aggregated for consecutive" - "put and get operations. In some cases this may lead to higher latency but " - "should also lead to higher bandwidth utilization.
Set to 0 to disable (default:" - " 1k)", MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, OPAL_INFO_LVL_3, + description_str, MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.aggregation_limit); + free(description_str); - mca_osc_rdma_component.priority = 90; - (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "priority", - "Priority of the osc/rdma component (default: 90)", + mca_osc_rdma_component.priority = 101; + asprintf(&description_str, "Priority of the osc/rdma component (default: %d)", + mca_osc_rdma_component.priority); + (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "priority", description_str, MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.priority); - - ompi_osc_rdma_btl_names = "openib,ugni"; - (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "btls", - "Comma-delimited list of BTL component names to allow without verifying " - "connectivity. Do not add a BTL to to this list unless it can reach all " - "processes in any communicator used with an MPI window (default: openib,ugni)", + free(description_str); + + (void) mca_base_var_enum_create ("osc_rdma_locking_mode", ompi_osc_rdma_locking_modes, &new_enum); + + mca_osc_rdma_component.locking_mode = OMPI_OSC_RDMA_LOCKING_TWO_LEVEL; + (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "locking_mode", + "Locking mode to use for passive-target synchronization (default: two_level)", + MCA_BASE_VAR_TYPE_INT, new_enum, 0, 0, OPAL_INFO_LVL_3, + MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_rdma_component.locking_mode); + OBJ_RELEASE(new_enum); + + ompi_osc_rdma_btl_names = "openib,ugni,uct,ucp"; + asprintf(&description_str, "Comma-delimited list of BTL component names to allow without verifying " + "connectivity. 
Do not add a BTL to this list unless it can reach all " + "processes in any communicator used with an MPI window (default: %s)", + ompi_osc_rdma_btl_names); + (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "btls", description_str, MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &ompi_osc_rdma_btl_names); + free(description_str); + + ompi_osc_rdma_mtl_names = "psm2"; + asprintf(&description_str, "Comma-delimited list of MTL component names for which the priority of the rdma " + "osc component is lowered in favor of the pt2pt osc component (default: %s)", ompi_osc_rdma_mtl_names); + (void) mca_base_component_var_register (&mca_osc_rdma_component.super.osc_version, "mtls", description_str, + MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_3, + MCA_BASE_VAR_SCOPE_GROUP, &ompi_osc_rdma_mtl_names); + free(description_str); /* register performance variables */ @@ -322,7 +368,7 @@ int ompi_osc_rdma_component_finalize (void) static int ompi_osc_rdma_component_query (struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor) { @@ -339,6 +385,10 @@ static int ompi_osc_rdma_component_query (struct ompi_win_t *win, void **base, s } #endif /* OPAL_CUDA_SUPPORT */ + if (OMPI_SUCCESS == ompi_osc_rdma_query_mtls ()) { + return 5; /* this has to be lower than the osc pt2pt default priority */ + } + if (OMPI_SUCCESS != ompi_osc_rdma_query_btls (comm, NULL)) { return -1; } @@ -393,7 +443,8 @@ static int allocate_state_single (ompi_osc_rdma_module_t *module, void **base, s /* allocate anything that will be accessed remotely in the same region. this cuts down on the number of * registration handles needed to access this data.
*/ - total_size = module->state_size + local_rank_array_size + leader_peer_data_size; + total_size = local_rank_array_size + module->region_size + + module->state_size + leader_peer_data_size; if (MPI_WIN_FLAVOR_ALLOCATE == module->flavor) { total_size += size; @@ -408,7 +459,11 @@ static int allocate_state_single (ompi_osc_rdma_module_t *module, void **base, s return OMPI_ERR_OUT_OF_RESOURCE; } - module->state_offset = local_rank_array_size; +// Note, the extra module->region_size space added after local_rank_array_size +// is unused but is there to match what happens in allocate_state_shared(). +// This allows module->state_offset to be uniform across the ranks which +// is part of how they pull peer info from each other. + module->state_offset = local_rank_array_size + module->region_size; module->state = (ompi_osc_rdma_state_t *) ((intptr_t) module->rank_array + module->state_offset); module->node_comm_info = (unsigned char *) ((intptr_t) module->state + module->state_size); @@ -442,11 +497,12 @@ static int allocate_state_single (ompi_osc_rdma_module_t *module, void **base, s return ret; } + module->my_peer = my_peer; module->free_after = module->rank_array; my_peer->flags |= OMPI_OSC_RDMA_PEER_LOCAL_BASE; my_peer->state = (uint64_t) (uintptr_t) module->state; - if (module->selected_btl->btl_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB) { + if (module->use_cpu_atomics) { /* all peers are local or it is safe to mix cpu and nic atomics */ my_peer->flags |= OMPI_OSC_RDMA_PEER_LOCAL_STATE; } else { @@ -464,8 +520,13 @@ static int allocate_state_single (ompi_osc_rdma_module_t *module, void **base, s ex_peer->size = size; } - if (MPI_WIN_FLAVOR_ALLOCATE == module->flavor) { - ex_peer->super.base_handle = module->state_handle; + if (!module->use_cpu_atomics) { + if (MPI_WIN_FLAVOR_ALLOCATE == module->flavor) { + /* base is local and cpu atomics are available */ + ex_peer->super.base_handle = module->state_handle; + } else { + ex_peer->super.base_handle = module->base_handle; +
} } } @@ -483,11 +544,10 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void **base, s unsigned long offset, total_size; unsigned long state_base, data_base; int local_rank, local_size, ret; - size_t local_rank_array_size, leader_peer_data_size; + size_t local_rank_array_size, leader_peer_data_size, my_base_offset = 0; int my_rank = ompi_comm_rank (module->comm); int global_size = ompi_comm_size (module->comm); ompi_osc_rdma_region_t *state_region; - int my_base_offset = 0; struct _local_data *temp; char *data_file; @@ -496,6 +556,9 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void **base, s local_rank = ompi_comm_rank (shared_comm); local_size = ompi_comm_size (shared_comm); + /* CPU atomics can be used if every process is on the same node or the NIC allows mixing CPU and NIC atomics */ + module->use_cpu_atomics = local_size == global_size || (module->selected_btl->btl_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB); + if (1 == local_size) { /* no point using a shared segment if there are no other processes on this node */ return allocate_state_single (module, base, size); @@ -625,13 +688,15 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void **base, s } } - /* barrier to make sure all ranks have attached */ + /* barrier to make sure all ranks have set up their region data */ shared_comm->c_coll->coll_barrier(shared_comm, shared_comm->c_coll->coll_barrier_module); offset = data_base; for (int i = 0 ; i < local_size ; ++i) { + /* local pointer to peer's state */ + ompi_osc_rdma_state_t *peer_state = (ompi_osc_rdma_state_t *) ((uintptr_t) module->segment_base + state_base + module->state_size * i); + ompi_osc_rdma_region_t *peer_region = (ompi_osc_rdma_region_t *) peer_state->regions; ompi_osc_rdma_peer_extended_t *ex_peer; - ompi_osc_rdma_state_t *peer_state; ompi_osc_rdma_peer_t *peer; int peer_rank = temp[i].rank; @@ -642,13 +707,12 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void 
**base, s ex_peer = (ompi_osc_rdma_peer_extended_t *) peer; - /* peer state local pointer */ - peer_state = (ompi_osc_rdma_state_t *) ((uintptr_t) module->segment_base + state_base + module->state_size * i); - - if (local_size == global_size || (module->selected_btl->btl_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB)) { + /* set up peer state */ + if (module->use_cpu_atomics) { /* all peers are local or it is safe to mix cpu and nic atomics */ peer->flags |= OMPI_OSC_RDMA_PEER_LOCAL_STATE; peer->state = (osc_rdma_counter_t) peer_state; + peer->state_endpoint = NULL; } else { /* use my endpoint handle to modify the peer's state */ if (module->selected_btl->btl_register_mem) { @@ -658,38 +722,43 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void **base, s peer->state_endpoint = ompi_osc_rdma_peer_btl_endpoint (module, temp[0].rank); } - /* finish setting up the local peer structure */ - if (MPI_WIN_FLAVOR_DYNAMIC != module->flavor) { - if (!module->same_disp_unit) { - ex_peer->disp_unit = peer_state->disp_unit; - } + if (my_rank == peer_rank) { + module->my_peer = peer; + } - if (!module->same_size) { - ex_peer->size = temp[i].size; - } + if (MPI_WIN_FLAVOR_DYNAMIC == module->flavor || MPI_WIN_FLAVOR_CREATE == module->flavor) { + /* use the peer's BTL endpoint directly */ + peer->data_endpoint = ompi_osc_rdma_peer_btl_endpoint (module, peer_rank); + } else if (!module->use_cpu_atomics && temp[i].size) { + /* use the local leader's endpoint */ + peer->data_endpoint = ompi_osc_rdma_peer_btl_endpoint (module, temp[0].rank); + } - if (my_rank == peer_rank) { - peer->flags |= OMPI_OSC_RDMA_PEER_LOCAL_BASE; - } + ompi_osc_module_add_peer (module, peer); - if (MPI_WIN_FLAVOR_ALLOCATE == module->flavor) { - if (temp[i].size) { - ex_peer->super.base = state_region->base + offset; - offset += temp[i].size; - } else { - ex_peer->super.base = 0; - } - } + if (MPI_WIN_FLAVOR_DYNAMIC == module->flavor || 0 == temp[i].size) { + /* nothing more to do */ + continue; + } 
- ompi_osc_rdma_region_t *peer_region = (ompi_osc_rdma_region_t *) peer_state->regions; + /* finish setting up the local peer structure for win allocate/create */ + if (!(module->same_disp_unit && module->same_size)) { + ex_peer->disp_unit = peer_state->disp_unit; + ex_peer->size = temp[i].size; + } + if (module->use_cpu_atomics && MPI_WIN_FLAVOR_ALLOCATE == module->flavor) { + /* base is local and cpu atomics are available */ + ex_peer->super.base = (uintptr_t) module->segment_base + offset; + peer->flags |= OMPI_OSC_RDMA_PEER_LOCAL_BASE; + offset += temp[i].size; + } else { ex_peer->super.base = peer_region->base; + if (module->selected_btl->btl_register_mem) { ex_peer->super.base_handle = (mca_btl_base_registration_handle_t *) peer_region->btl_handle_data; } } - - ompi_osc_module_add_peer (module, peer); } } while (0); @@ -698,6 +767,23 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void **base, s return ret; } +static int ompi_osc_rdma_query_mtls (void) +{ + char **mtls_to_use; + + mtls_to_use = opal_argv_split (ompi_osc_rdma_mtl_names, ','); + if (mtls_to_use && ompi_mtl_base_selected_component) { + for (int i = 0 ; mtls_to_use[i] ; ++i) { + if (0 == strcmp (mtls_to_use[i], ompi_mtl_base_selected_component->mtl_version.mca_component_name)) { + opal_argv_free(mtls_to_use); + return OMPI_SUCCESS; + } + } + } + opal_argv_free(mtls_to_use); + return -1; +} + static int ompi_osc_rdma_query_btls (ompi_communicator_t *comm, struct mca_btl_base_module_t **btl) { struct mca_btl_base_module_t **possible_btls = NULL; @@ -810,11 +896,18 @@ static int ompi_osc_rdma_query_btls (ompi_communicator_t *comm, struct mca_btl_b } for (int i = 0 ; i < max_btls ; ++i) { + int btl_count = btl_counts[i]; + if (NULL == possible_btls[i]) { break; } - if (btl_counts[i] == comm_size && possible_btls[i]->btl_latency < selected_latency) { + if (possible_btls[i]->btl_atomic_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB) { + /* do not need to use the btl for self communication */ 
+ btl_count++; + } + + if (btl_count >= comm_size && possible_btls[i]->btl_latency < selected_latency) { selected_btl = possible_btls[i]; selected_latency = possible_btls[i]->btl_latency; } @@ -1014,7 +1107,7 @@ static int ompi_osc_rdma_check_parameters (ompi_osc_rdma_module_t *module, int d static int ompi_osc_rdma_component_select (struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model) { ompi_osc_rdma_module_t *module = NULL; @@ -1045,7 +1138,8 @@ static int ompi_osc_rdma_component_select (struct ompi_win_t *win, void **base, module->same_disp_unit = check_config_value_bool ("same_disp_unit", info); module->same_size = check_config_value_bool ("same_size", info); module->no_locks = check_config_value_bool ("no_locks", info); - module->acc_single_intrinsic = check_config_value_bool ("ompi_single_accumulate", info); + module->locking_mode = mca_osc_rdma_component.locking_mode; + module->acc_single_intrinsic = check_config_value_bool ("acc_single_intrinsic", info); module->acc_use_amo = mca_osc_rdma_component.acc_use_amo; module->all_sync.module = module; @@ -1117,6 +1211,15 @@ static int ompi_osc_rdma_component_select (struct ompi_win_t *win, void **base, } else { module->state_size += mca_osc_rdma_component.max_attach * module->region_size; } +/* + * These are the info keys that this module is interested in + */ + opal_infosubscribe_subscribe(&win->super, "no_locks", "false", ompi_osc_rdma_set_no_lock_info); + +/* + * TODO: same_size, same_disp_unit have w_flag entries, but do not appear + * to be used anywhere.
If that changes, they should be subscribed + */ /* fill in the function pointer part */ memcpy(&module->super, &ompi_osc_rdma_module_rdma_template, sizeof(module->super)); @@ -1201,7 +1304,43 @@ static int ompi_osc_rdma_component_select (struct ompi_win_t *win, void **base, } -static int ompi_osc_rdma_set_info (struct ompi_win_t *win, struct ompi_info_t *info) +static char* ompi_osc_rdma_set_no_lock_info(opal_infosubscriber_t *obj, char *key, char *value) +{ + + struct ompi_win_t *win = (struct ompi_win_t*) obj; + ompi_osc_rdma_module_t *module = GET_MODULE(win); + bool temp; + + temp = opal_str_to_bool(value); + if (temp && !module->no_locks) { + /* clean up the lock hash. it is up to the user to ensure no lock is + * outstanding from this process when setting the info key */ + OBJ_DESTRUCT(&module->outstanding_locks); + OBJ_CONSTRUCT(&module->outstanding_locks, opal_hash_table_t); + + module->no_locks = true; + } else if (!temp && module->no_locks) { + int world_size = ompi_comm_size (module->comm); + int init_limit = world_size > 256 ? 256 : world_size; + int ret; + + ret = opal_hash_table_init (&module->outstanding_locks, init_limit); + if (OPAL_SUCCESS != ret) { + module->no_locks = true; + } + + module->no_locks = false; + } + /* enforce collectiveness... */ + module->comm->c_coll->coll_barrier(module->comm, module->comm->c_coll->coll_barrier_module); +/* + * Accept any value + */ + return module->no_locks ? "true" : "false"; +} + +#if 0 // stale code? 
+static int ompi_osc_rdma_set_info (struct ompi_win_t *win, struct opal_info_t *info) { ompi_osc_rdma_module_t *module = GET_MODULE(win); bool temp; @@ -1235,9 +1374,9 @@ static int ompi_osc_rdma_set_info (struct ompi_win_t *win, struct ompi_info_t *i } -static int ompi_osc_rdma_get_info (struct ompi_win_t *win, struct ompi_info_t **info_used) +static int ompi_osc_rdma_get_info (struct ompi_win_t *win, struct opal_info_t **info_used) { - ompi_info_t *info = OBJ_NEW(ompi_info_t); + opal_info_t *info = OBJ_NEW(opal_info_t); if (NULL == info) { return OMPI_ERR_TEMP_OUT_OF_RESOURCE; @@ -1247,5 +1386,5 @@ static int ompi_osc_rdma_get_info (struct ompi_win_t *win, struct ompi_info_t ** return OMPI_SUCCESS; } - +#endif OBJ_CLASS_INSTANCE(ompi_osc_rdma_aggregation_t, opal_list_item_t, NULL, NULL); diff --git a/ompi/mca/osc/rdma/osc_rdma_frag.h b/ompi/mca/osc/rdma/osc_rdma_frag.h index e9636a24d25..beecce93be3 100644 --- a/ompi/mca/osc/rdma/osc_rdma_frag.h +++ b/ompi/mca/osc/rdma/osc_rdma_frag.h @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -16,34 +16,18 @@ #include "osc_rdma.h" #include "opal/align.h" -/** Communication buffer for packing messages */ -struct ompi_osc_rdma_frag_t { - opal_free_list_item_t super; - - /* start of unused space */ - unsigned char *top; - - /* space remaining in buffer */ - uint32_t remain_len; - /* Number of operations which have started writing into the frag, but not yet completed doing so */ - int32_t pending; - - ompi_osc_rdma_module_t *module; - mca_btl_base_registration_handle_t *handle; -}; -typedef struct ompi_osc_rdma_frag_t ompi_osc_rdma_frag_t; -OBJ_CLASS_DECLARATION(ompi_osc_rdma_frag_t); - - static inline void ompi_osc_rdma_frag_complete (ompi_osc_rdma_frag_t *frag) { - if (0 == OPAL_THREAD_ADD32(&frag->pending, -1)) { + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "returning frag. pending = %d", frag->pending); + if (0 == OPAL_THREAD_ADD_FETCH32(&frag->pending, -1)) { opal_atomic_rmb (); - ompi_osc_rdma_deregister (frag->module, frag->handle); - frag->handle = NULL; - - opal_free_list_return (&mca_osc_rdma_component.frags, (opal_free_list_item_t *) frag); + (void) opal_atomic_swap_32 (&frag->pending, 1); +#if OPAL_HAVE_ATOMIC_MATH_64 + (void) opal_atomic_swap_64 (&frag->curr_index, 0); +#else + (void) opal_atomic_swap_32 (&frag->curr_index, 0); +#endif } } @@ -53,7 +37,8 @@ static inline void ompi_osc_rdma_frag_complete (ompi_osc_rdma_frag_t *frag) static inline int ompi_osc_rdma_frag_alloc (ompi_osc_rdma_module_t *module, size_t request_len, ompi_osc_rdma_frag_t **buffer, char **ptr) { - ompi_osc_rdma_frag_t *curr; + ompi_osc_rdma_frag_t *curr = module->rdma_frag; + int64_t my_index; int ret; /* ensure all buffers are 8-byte aligned */ @@ -63,60 +48,60 @@ static inline int ompi_osc_rdma_frag_alloc (ompi_osc_rdma_module_t *module, size return OMPI_ERR_VALUE_OUT_OF_BOUNDS; } - OPAL_THREAD_LOCK(&module->lock); - curr = module->rdma_frag; - if (OPAL_UNLIKELY(NULL == curr || curr->remain_len < request_len)) { - if (NULL == curr || (NULL != curr && 
curr->pending > 1)) { - opal_free_list_item_t *item = NULL; + if (NULL == curr) { + opal_free_list_item_t *item = NULL; - /* release the initial reference to the buffer */ - module->rdma_frag = NULL; + item = opal_free_list_get (&mca_osc_rdma_component.frags); + if (OPAL_UNLIKELY(NULL == item)) { + OPAL_THREAD_UNLOCK(&module->lock); + return OMPI_ERR_OUT_OF_RESOURCE; + } - if (curr) { - ompi_osc_rdma_frag_complete (curr); - } + curr = (ompi_osc_rdma_frag_t *) item; + + curr->handle = NULL; + curr->pending = 1; + curr->module = module; + curr->curr_index = 0; - item = opal_free_list_get (&mca_osc_rdma_component.frags); - if (OPAL_UNLIKELY(NULL == item)) { - OPAL_THREAD_UNLOCK(&module->lock); + if (module->selected_btl->btl_register_mem) { + ret = ompi_osc_rdma_register (module, MCA_BTL_ENDPOINT_ANY, curr->super.ptr, mca_osc_rdma_component.buffer_size, + MCA_BTL_REG_FLAG_ACCESS_ANY, &curr->handle); + if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { return OMPI_ERR_OUT_OF_RESOURCE; } + } - curr = module->rdma_frag = (ompi_osc_rdma_frag_t *) item; - + if (!opal_atomic_compare_exchange_strong_ptr (&module->rdma_frag, &(void *){NULL}, curr)) { + ompi_osc_rdma_deregister (module, curr->handle); curr->handle = NULL; - curr->pending = 1; - curr->module = module; - } - curr->top = curr->super.ptr; - curr->remain_len = mca_osc_rdma_component.buffer_size; + opal_free_list_return (&mca_osc_rdma_component.frags, &curr->super); - if (curr->remain_len < request_len) { - OPAL_THREAD_UNLOCK(&module->lock); - return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + curr = module->rdma_frag; } } - if (!curr->handle && module->selected_btl->btl_register_mem) { - ret = ompi_osc_rdma_register (module, MCA_BTL_ENDPOINT_ANY, curr->super.ptr, mca_osc_rdma_component.buffer_size, - MCA_BTL_REG_FLAG_ACCESS_ANY, &curr->handle); - if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { - OPAL_THREAD_UNLOCK(&module->lock); - return ret; + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "allocating frag. 
pending = %d", curr->pending); + OPAL_THREAD_ADD_FETCH32(&curr->pending, 1); + +#if OPAL_HAVE_ATOMIC_MATH_64 + my_index = opal_atomic_fetch_add_64 (&curr->curr_index, request_len); +#else + my_index = opal_atomic_fetch_add_32 (&curr->curr_index, request_len); +#endif + if (my_index + request_len > mca_osc_rdma_component.buffer_size) { + if (my_index <= mca_osc_rdma_component.buffer_size) { + /* this thread caused the buffer to spill over */ + ompi_osc_rdma_frag_complete (curr); } + ompi_osc_rdma_frag_complete (curr); + return OPAL_ERR_OUT_OF_RESOURCE; } - - *ptr = (char *) curr->top; + *ptr = (void *) ((intptr_t) curr->super.ptr + my_index); *buffer = curr; - curr->top += request_len; - curr->remain_len -= request_len; - OPAL_THREAD_ADD32(&curr->pending, 1); - - OPAL_THREAD_UNLOCK(&module->lock); - return OMPI_SUCCESS; } diff --git a/ompi/mca/osc/rdma/osc_rdma_lock.h b/ompi/mca/osc/rdma/osc_rdma_lock.h index e06e9742d5f..bbc56bc182b 100644 --- a/ompi/mca/osc/rdma/osc_rdma_lock.h +++ b/ompi/mca/osc/rdma/osc_rdma_lock.h @@ -17,7 +17,8 @@ static inline int ompi_osc_rdma_trylock_local (volatile ompi_osc_rdma_lock_t *lock) { - return !ompi_osc_rdma_lock_cmpset (lock, 0, OMPI_OSC_RDMA_LOCK_EXCLUSIVE); + ompi_osc_rdma_lock_t _tmp_value = 0; + return !ompi_osc_rdma_lock_compare_exchange (lock, &_tmp_value, OMPI_OSC_RDMA_LOCK_EXCLUSIVE); } static inline void ompi_osc_rdma_unlock_local (volatile ompi_osc_rdma_lock_t *lock) @@ -33,24 +34,41 @@ void ompi_osc_rdma_atomic_complete (mca_btl_base_module_t *btl, struct mca_btl_b void *context, void *data, int status); __opal_attribute_always_inline__ -static inline int ompi_osc_rdma_lock_btl_fop (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, uint64_t address, - int op, ompi_osc_rdma_lock_t operand, ompi_osc_rdma_lock_t *result) +static inline int ompi_osc_rdma_btl_fop (ompi_osc_rdma_module_t *module, struct mca_btl_base_endpoint_t *endpoint, + uint64_t address, mca_btl_base_registration_handle_t *address_handle, int 
op, + int64_t operand, int flags, int64_t *result, const bool wait_for_completion, + ompi_osc_rdma_pending_op_cb_fn_t cbfunc, void *cbdata, void *cbcontext) { - volatile bool atomic_complete = false; - ompi_osc_rdma_frag_t *frag = NULL; - ompi_osc_rdma_lock_t *temp = NULL; + ompi_osc_rdma_pending_op_t *pending_op; int ret; + pending_op = OBJ_NEW(ompi_osc_rdma_pending_op_t); + assert (NULL != pending_op); + + if (wait_for_completion) { + OBJ_RETAIN(pending_op); + } + + pending_op->op_result = (void *) result; + pending_op->op_size = (MCA_BTL_ATOMIC_FLAG_32BIT & flags) ? 4 : 8; + OBJ_RETAIN(pending_op); + if (cbfunc) { + pending_op->cbfunc = cbfunc; + pending_op->cbdata = cbdata; + pending_op->cbcontext = cbcontext; + } + /* spin until the btl has accepted the operation */ do { - if (NULL == frag) { - ret = ompi_osc_rdma_frag_alloc (module, 8, &frag, (char **) &temp); + if (NULL == pending_op->op_frag) { + ret = ompi_osc_rdma_frag_alloc (module, 8, &pending_op->op_frag, (char **) &pending_op->op_buffer); } - if (NULL != frag) { - ret = module->selected_btl->btl_atomic_fop (module->selected_btl, peer->state_endpoint, temp, (intptr_t) address, - frag->handle, peer->state_handle, op, operand, 0, - MCA_BTL_NO_ORDER, ompi_osc_rdma_atomic_complete, (void *) &atomic_complete, - NULL); + + if (NULL != pending_op->op_frag) { + ret = module->selected_btl->btl_atomic_fop (module->selected_btl, endpoint, pending_op->op_buffer, + (intptr_t) address, pending_op->op_frag->handle, address_handle, + op, operand, flags, MCA_BTL_NO_ORDER, ompi_osc_rdma_atomic_complete, + (void *) pending_op, NULL); } if (OPAL_LIKELY(!ompi_osc_rdma_oor(ret))) { @@ -59,40 +77,64 @@ static inline int ompi_osc_rdma_lock_btl_fop (ompi_osc_rdma_module_t *module, om ompi_osc_rdma_progress (module); } while (1); - if (OPAL_SUCCESS == ret) { - while (!atomic_complete) { - ompi_osc_rdma_progress (module); + if (OPAL_SUCCESS != ret) { + if (OPAL_LIKELY(1 == ret)) { + *result = ((int64_t *) 
pending_op->op_buffer)[0]; + ret = OMPI_SUCCESS; + ompi_osc_rdma_atomic_complete (module->selected_btl, endpoint, pending_op->op_buffer, + pending_op->op_frag->handle, (void *) pending_op, NULL, OPAL_SUCCESS); } - } else if (1 == ret) { - ret = OMPI_SUCCESS; - } - if (NULL != frag) { - if (*result) { - *result = *temp; + /* need to release here because ompi_osc_rdma_atomic_complete was not called */ + OBJ_RELEASE(pending_op); + } else if (wait_for_completion) { + while (!pending_op->op_complete) { + ompi_osc_rdma_progress (module); } - ompi_osc_rdma_frag_complete (frag); } + OBJ_RELEASE(pending_op); + return ret; } __opal_attribute_always_inline__ -static inline int ompi_osc_rdma_lock_btl_op (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, uint64_t address, - int op, ompi_osc_rdma_lock_t operand) +static inline int ompi_osc_rdma_lock_btl_fop (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, uint64_t address, + int op, ompi_osc_rdma_lock_t operand, ompi_osc_rdma_lock_t *result, + const bool wait_for_completion) +{ + return ompi_osc_rdma_btl_fop (module, peer->state_endpoint, address, peer->state_handle, op, operand, 0, result, + wait_for_completion, NULL, NULL, NULL); +} + +__opal_attribute_always_inline__ +static inline int ompi_osc_rdma_btl_op (ompi_osc_rdma_module_t *module, struct mca_btl_base_endpoint_t *endpoint, + uint64_t address, mca_btl_base_registration_handle_t *address_handle, + int op, int64_t operand, int flags, const bool wait_for_completion, + ompi_osc_rdma_pending_op_cb_fn_t cbfunc, void *cbdata, void *cbcontext) { - volatile bool atomic_complete = false; + ompi_osc_rdma_pending_op_t *pending_op; int ret; if (!(module->selected_btl->btl_flags & MCA_BTL_FLAGS_ATOMIC_OPS)) { - return ompi_osc_rdma_lock_btl_fop (module, peer, address, op, operand, NULL); + return ompi_osc_rdma_btl_fop (module, endpoint, address, address_handle, op, operand, flags, NULL, wait_for_completion, + cbfunc, cbdata, cbcontext); + } + + pending_op =
OBJ_NEW(ompi_osc_rdma_pending_op_t); + assert (NULL != pending_op); + OBJ_RETAIN(pending_op); + if (cbfunc) { + pending_op->cbfunc = cbfunc; + pending_op->cbdata = cbdata; + pending_op->cbcontext = cbcontext; } /* spin until the btl has accepted the operation */ do { - ret = module->selected_btl->btl_atomic_op (module->selected_btl, peer->state_endpoint, (intptr_t) address, peer->state_handle, - op, operand, 0, MCA_BTL_NO_ORDER, ompi_osc_rdma_atomic_complete, - (void *) &atomic_complete, NULL); + ret = module->selected_btl->btl_atomic_op (module->selected_btl, endpoint, (intptr_t) address, address_handle, + op, operand, flags, MCA_BTL_NO_ORDER, ompi_osc_rdma_atomic_complete, + (void *) pending_op, NULL); if (OPAL_LIKELY(!ompi_osc_rdma_oor(ret))) { break; @@ -100,35 +142,60 @@ static inline int ompi_osc_rdma_lock_btl_op (ompi_osc_rdma_module_t *module, omp ompi_osc_rdma_progress (module); } while (1); - if (OPAL_SUCCESS == ret) { - while (!atomic_complete) { + if (OPAL_SUCCESS != ret) { + /* need to release here because ompi_osc_rdma_atomic_complete was not called */ + OBJ_RELEASE(pending_op); + if (OPAL_LIKELY(1 == ret)) { + if (cbfunc) { + cbfunc (cbdata, cbcontext, OMPI_SUCCESS); + } + ret = OMPI_SUCCESS; + } + } else if (wait_for_completion) { + while (!pending_op->op_complete) { ompi_osc_rdma_progress (module); } - } else if (1 == ret) { - ret = OMPI_SUCCESS; } + OBJ_RELEASE(pending_op); + return ret; } __opal_attribute_always_inline__ -static inline int ompi_osc_rdma_lock_btl_cswap (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, uint64_t address, - ompi_osc_rdma_lock_t compare, ompi_osc_rdma_lock_t value, ompi_osc_rdma_lock_t *result) +static inline int ompi_osc_rdma_lock_btl_op (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, uint64_t address, + int op, ompi_osc_rdma_lock_t operand, const bool wait_for_completion) { - volatile bool atomic_complete = false; - ompi_osc_rdma_frag_t *frag = NULL; - ompi_osc_rdma_lock_t *temp = NULL; +
return ompi_osc_rdma_btl_op (module, peer->state_endpoint, address, peer->state_handle, op, operand, 0, wait_for_completion, + NULL, NULL, NULL); +} + +__opal_attribute_always_inline__ +static inline int ompi_osc_rdma_btl_cswap (ompi_osc_rdma_module_t *module, struct mca_btl_base_endpoint_t *endpoint, + uint64_t address, mca_btl_base_registration_handle_t *address_handle, + int64_t compare, int64_t value, int flags, int64_t *result) +{ + ompi_osc_rdma_pending_op_t *pending_op; int ret; + pending_op = OBJ_NEW(ompi_osc_rdma_pending_op_t); + assert (NULL != pending_op); + + OBJ_RETAIN(pending_op); + + pending_op->op_result = (void *) result; + pending_op->op_size = (MCA_BTL_ATOMIC_FLAG_32BIT & flags) ? 4 : 8; + /* spin until the btl has accepted the operation */ do { - if (NULL == frag) { - ret = ompi_osc_rdma_frag_alloc (module, 8, &frag, (char **) &temp); + if (NULL == pending_op->op_frag) { + ret = ompi_osc_rdma_frag_alloc (module, 8, &pending_op->op_frag, (char **) &pending_op->op_buffer); } - if (NULL != frag) { - ret = module->selected_btl->btl_atomic_cswap (module->selected_btl, peer->state_endpoint, temp, address, frag->handle, - peer->state_handle, compare, value, 0, 0, ompi_osc_rdma_atomic_complete, - (void *) &atomic_complete, NULL); + if (NULL != pending_op->op_frag) { + ret = module->selected_btl->btl_atomic_cswap (module->selected_btl, endpoint, pending_op->op_buffer, + address, pending_op->op_frag->handle, address_handle, compare, + value, flags, 0, ompi_osc_rdma_atomic_complete, (void *) pending_op, + NULL); } if (OPAL_LIKELY(!ompi_osc_rdma_oor(ret))) { @@ -137,24 +204,32 @@ static inline int ompi_osc_rdma_lock_btl_cswap (ompi_osc_rdma_module_t *module, ompi_osc_rdma_progress (module); } while (1); - if (OPAL_SUCCESS == ret) { - while (!atomic_complete) { - ompi_osc_rdma_progress (module); + if (OPAL_SUCCESS != ret) { + if (OPAL_LIKELY(1 == ret)) { + *result = ((int64_t *) pending_op->op_buffer)[0]; + ret = OMPI_SUCCESS; } - } else if (1 == ret) { - 
ret = OMPI_SUCCESS; - } - if (NULL != frag) { - if (*result) { - *result = *temp; + /* need to release here because ompi_osc_rdma_atomic_complete was not called */ + OBJ_RELEASE(pending_op); + } else { + while (!pending_op->op_complete) { + ompi_osc_rdma_progress (module); } - ompi_osc_rdma_frag_complete (frag); } + OBJ_RELEASE(pending_op); + return ret; } +__opal_attribute_always_inline__ +static inline int ompi_osc_rdma_lock_btl_cswap (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, uint64_t address, + ompi_osc_rdma_lock_t compare, ompi_osc_rdma_lock_t value, ompi_osc_rdma_lock_t *result) +{ + return ompi_osc_rdma_btl_cswap (module, peer->state_endpoint, address, peer->state_handle, compare, value, 0, result); +} + /** * ompi_osc_rdma_lock_acquire_shared: * @@ -178,7 +253,7 @@ static inline int ompi_osc_rdma_lock_release_shared (ompi_osc_rdma_module_t *mod peer->rank, (unsigned long) value); if (!ompi_osc_rdma_peer_local_state (peer)) { - return ompi_osc_rdma_lock_btl_op (module, peer, lock, MCA_BTL_ATOMIC_ADD, value); + return ompi_osc_rdma_lock_btl_op (module, peer, lock, MCA_BTL_ATOMIC_ADD, value, false); } (void) ompi_osc_rdma_lock_add ((volatile ompi_osc_rdma_lock_t *) lock, value); @@ -215,7 +290,7 @@ static inline int ompi_osc_rdma_lock_acquire_shared (ompi_osc_rdma_module_t *mod /* spin until the lock has been acquired */ if (!ompi_osc_rdma_peer_local_state (peer)) { do { - ret = ompi_osc_rdma_lock_btl_fop (module, peer, lock, MCA_BTL_ATOMIC_ADD, value, &lock_state); + ret = ompi_osc_rdma_lock_btl_fop (module, peer, lock, MCA_BTL_ATOMIC_ADD, value, &lock_state, true); if (OPAL_UNLIKELY(OPAL_SUCCESS != ret)) { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "failed to increment shared lock. 
opal error code %d", ret); return ret; @@ -285,7 +360,8 @@ static inline int ompi_osc_rdma_lock_try_acquire_exclusive (ompi_osc_rdma_module if (0 == lock_state) { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "exclusive lock acquired"); } else { - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "could not acquire exclusive lock"); + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "could not acquire exclusive lock. lock state 0x%" PRIx64, + (uint64_t) lock_state); } #endif @@ -311,7 +387,7 @@ static inline int ompi_osc_rdma_lock_acquire_exclusive (ompi_osc_rdma_module_t * { int ret; - while (1 != (ret = ompi_osc_rdma_lock_try_acquire_exclusive (module, peer, offset))) { + while (1 == (ret = ompi_osc_rdma_lock_try_acquire_exclusive (module, peer, offset))) { ompi_osc_rdma_progress (module); } @@ -336,10 +412,14 @@ static inline int ompi_osc_rdma_lock_release_exclusive (ompi_osc_rdma_module_t * uint64_t lock = (uint64_t) (intptr_t) peer->state + offset; int ret = OMPI_SUCCESS; - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "releasing exclusive lock %" PRIx64 " on peer %d", lock, peer->rank); + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "releasing exclusive lock %" PRIx64 " on peer %d\n", lock, peer->rank); if (!ompi_osc_rdma_peer_local_state (peer)) { - ret = ompi_osc_rdma_lock_btl_op (module, peer, lock, MCA_BTL_ATOMIC_ADD, -OMPI_OSC_RDMA_LOCK_EXCLUSIVE); + ret = ompi_osc_rdma_lock_btl_op (module, peer, lock, MCA_BTL_ATOMIC_ADD, -OMPI_OSC_RDMA_LOCK_EXCLUSIVE, + false); + if (OMPI_SUCCESS != ret) { + abort (); + } } else { ompi_osc_rdma_unlock_local ((volatile ompi_osc_rdma_lock_t *)(intptr_t) lock); } diff --git a/ompi/mca/osc/rdma/osc_rdma_passive_target.c b/ompi/mca/osc/rdma/osc_rdma_passive_target.c index 6358020f984..dc11c5e31df 100644 --- a/ompi/mca/osc/rdma/osc_rdma_passive_target.c +++ b/ompi/mca/osc/rdma/osc_rdma_passive_target.c @@ -8,10 +8,11 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. 
* All rights reserved. - * Copyright (c) 2007-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2007-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2010 IBM Corporation. All rights reserved. * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. + * Copyright (c) 2018 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -113,23 +114,29 @@ int ompi_osc_rdma_flush_local_all (struct ompi_win_t *win) static inline int ompi_osc_rdma_lock_atomic_internal (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, ompi_osc_rdma_sync_t *lock) { + const int locking_mode = module->locking_mode; int ret; if (MPI_LOCK_EXCLUSIVE == lock->sync.lock.type) { do { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "incrementing global exclusive lock"); - /* lock the master lock. this requires no rank has a global shared lock */ - ret = ompi_osc_rdma_lock_acquire_shared (module, module->leader, 1, offsetof (ompi_osc_rdma_state_t, global_lock), 0xffffffff00000000L); - if (OMPI_SUCCESS != ret) { - ompi_osc_rdma_progress (module); - continue; + if (OMPI_OSC_RDMA_LOCKING_TWO_LEVEL == locking_mode) { + /* lock the master lock. 
this requires no rank has a global shared lock */ + ret = ompi_osc_rdma_lock_acquire_shared (module, module->leader, 1, offsetof (ompi_osc_rdma_state_t, global_lock), + 0xffffffff00000000L); + if (OMPI_SUCCESS != ret) { + ompi_osc_rdma_progress (module); + continue; + } } OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "acquiring exclusive lock on peer"); ret = ompi_osc_rdma_lock_try_acquire_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, local_lock)); if (ret) { /* release the global lock */ - ompi_osc_rdma_lock_release_shared (module, module->leader, -1, offsetof (ompi_osc_rdma_state_t, global_lock)); + if (OMPI_OSC_RDMA_LOCKING_TWO_LEVEL == locking_mode) { + ompi_osc_rdma_lock_release_shared (module, module->leader, -1, offsetof (ompi_osc_rdma_state_t, global_lock)); + } ompi_osc_rdma_progress (module); continue; } @@ -157,20 +164,48 @@ static inline int ompi_osc_rdma_lock_atomic_internal (ompi_osc_rdma_module_t *mo static inline int ompi_osc_rdma_unlock_atomic_internal (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer, ompi_osc_rdma_sync_t *lock) { + const int locking_mode = module->locking_mode; + if (MPI_LOCK_EXCLUSIVE == lock->sync.lock.type) { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "releasing exclusive lock on peer"); ompi_osc_rdma_lock_release_exclusive (module, peer, offsetof (ompi_osc_rdma_state_t, local_lock)); - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "decrementing global exclusive lock"); - ompi_osc_rdma_lock_release_shared (module, module->leader, -1, offsetof (ompi_osc_rdma_state_t, global_lock)); + + if (OMPI_OSC_RDMA_LOCKING_TWO_LEVEL == locking_mode) { + OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "decrementing global exclusive lock"); + ompi_osc_rdma_lock_release_shared (module, module->leader, -1, offsetof (ompi_osc_rdma_state_t, global_lock)); + } + peer->flags &= ~OMPI_OSC_RDMA_PEER_EXCLUSIVE; } else { OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_DEBUG, "decrementing global shared lock"); ompi_osc_rdma_lock_release_shared (module, peer, -1, 
offsetof (ompi_osc_rdma_state_t, local_lock)); + peer->flags &= ~OMPI_OSC_RDMA_PEER_DEMAND_LOCKED; } return OMPI_SUCCESS; } +int ompi_osc_rdma_demand_lock_peer (ompi_osc_rdma_module_t *module, ompi_osc_rdma_peer_t *peer) +{ + ompi_osc_rdma_sync_t *lock = &module->all_sync; + int ret = OMPI_SUCCESS; + + /* check for bad usage */ + assert (OMPI_OSC_RDMA_SYNC_TYPE_LOCK == lock->type); + + OPAL_THREAD_SCOPED_LOCK(&peer->lock, + do { + if (!ompi_osc_rdma_peer_is_demand_locked (peer)) { + ret = ompi_osc_rdma_lock_atomic_internal (module, peer, lock); + OPAL_THREAD_SCOPED_LOCK(&lock->lock, opal_list_append (&lock->demand_locked_peers, &peer->super)); + peer->flags |= OMPI_OSC_RDMA_PEER_DEMAND_LOCKED; + } + } while (0); + ); + + return ret; +} + int ompi_osc_rdma_lock_atomic (int lock_type, int target, int assert, ompi_win_t *win) { ompi_osc_rdma_module_t *module = GET_MODULE(win); @@ -315,9 +350,14 @@ int ompi_osc_rdma_lock_all_atomic (int assert, struct ompi_win_t *win) if (0 == (assert & MPI_MODE_NOCHECK)) { /* increment the global shared lock */ - ret = ompi_osc_rdma_lock_acquire_shared (module, module->leader, 0x0000000100000000UL, - offsetof(ompi_osc_rdma_state_t, global_lock), - 0x00000000ffffffffUL); + if (OMPI_OSC_RDMA_LOCKING_TWO_LEVEL == module->locking_mode) { + ret = ompi_osc_rdma_lock_acquire_shared (module, module->leader, 0x0000000100000000UL, + offsetof(ompi_osc_rdma_state_t, global_lock), + 0x00000000ffffffffUL); + } else { + /* always lock myself */ + ret = ompi_osc_rdma_demand_lock_peer (module, module->my_peer); + } } if (OPAL_LIKELY(OMPI_SUCCESS != ret)) { @@ -357,8 +397,19 @@ int ompi_osc_rdma_unlock_all_atomic (struct ompi_win_t *win) ompi_osc_rdma_sync_rdma_complete (lock); if (0 == (lock->sync.lock.assert & MPI_MODE_NOCHECK)) { - /* decrement the master lock shared count */ - (void) ompi_osc_rdma_lock_release_shared (module, module->leader, -0x0000000100000000UL, offsetof (ompi_osc_rdma_state_t, global_lock)); + if (OMPI_OSC_RDMA_LOCKING_ON_DEMAND 
== module->locking_mode) { + ompi_osc_rdma_peer_t *peer, *next; + + /* drop all on-demand locks */ + OPAL_LIST_FOREACH_SAFE(peer, next, &lock->demand_locked_peers, ompi_osc_rdma_peer_t) { + (void) ompi_osc_rdma_unlock_atomic_internal (module, peer, lock); + opal_list_remove_item (&lock->demand_locked_peers, &peer->super); + } + } else { + /* decrement the master lock shared count */ + (void) ompi_osc_rdma_lock_release_shared (module, module->leader, -0x0000000100000000UL, + offsetof (ompi_osc_rdma_state_t, global_lock)); + } } lock->type = OMPI_OSC_RDMA_SYNC_TYPE_NONE; diff --git a/ompi/mca/osc/rdma/osc_rdma_peer.c b/ompi/mca/osc/rdma/osc_rdma_peer.c index 80085124034..81ed0c2d16e 100644 --- a/ompi/mca/osc/rdma/osc_rdma_peer.c +++ b/ompi/mca/osc/rdma/osc_rdma_peer.c @@ -61,7 +61,8 @@ int ompi_osc_rdma_new_peer (struct ompi_osc_rdma_module_t *module, int peer_id, *peer_out = NULL; endpoint = ompi_osc_rdma_peer_btl_endpoint (module, peer_id); - if (OPAL_UNLIKELY(NULL == endpoint)) { + if (OPAL_UNLIKELY(NULL == endpoint && !((module->selected_btl->btl_atomic_flags & MCA_BTL_ATOMIC_SUPPORTS_GLOB) && + peer_id == ompi_comm_rank (module->comm)))) { return OMPI_ERR_UNREACH; } @@ -302,7 +303,7 @@ static void ompi_osc_rdma_peer_destruct (ompi_osc_rdma_peer_t *peer) } } -OBJ_CLASS_INSTANCE(ompi_osc_rdma_peer_t, opal_object_t, +OBJ_CLASS_INSTANCE(ompi_osc_rdma_peer_t, opal_list_item_t, ompi_osc_rdma_peer_construct, ompi_osc_rdma_peer_destruct); diff --git a/ompi/mca/osc/rdma/osc_rdma_peer.h b/ompi/mca/osc/rdma/osc_rdma_peer.h index 6716733a43a..0e46ec6dfc4 100644 --- a/ompi/mca/osc/rdma/osc_rdma_peer.h +++ b/ompi/mca/osc/rdma/osc_rdma_peer.h @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -22,7 +22,7 @@ struct ompi_osc_rdma_module_t; * This object is used as a cache for information associated with a peer. */ struct ompi_osc_rdma_peer_t { - opal_object_t super; + opal_list_item_t super; /** rdma data endpoint for this peer */ struct mca_btl_base_endpoint_t *data_endpoint; @@ -36,11 +36,14 @@ struct ompi_osc_rdma_peer_t { /** registration handle associated with the state */ mca_btl_base_registration_handle_t *state_handle; + /** lock to protect peer structure */ + opal_mutex_t lock; + /** rank of this peer in the window */ int rank; /** peer flags */ - int flags; + volatile int32_t flags; /** aggregation support */ ompi_osc_rdma_aggregation_t *aggregate; @@ -134,6 +137,8 @@ enum { OMPI_OSC_RDMA_PEER_STATE_FREE = 0x20, /** peer base handle should be freed */ OMPI_OSC_RDMA_PEER_BASE_FREE = 0x40, + /** peer was demand locked as part of lock-all (when in demand locking mode) */ + OMPI_OSC_RDMA_PEER_DEMAND_LOCKED = 0x80, }; /** @@ -188,13 +193,40 @@ static inline bool ompi_osc_rdma_peer_is_exclusive (ompi_osc_rdma_peer_t *peer) } /** - * @brief check if this process is currently accumulating on a peer + * @brief try to set a flag on a peer object * - * @param[in] peer peer object to check + * @param[in] peer peer object to modify + * @param[in] flag flag to set + * + * @returns true if the flag was not already set + * @returns false otherwise */ -static inline bool ompi_osc_rdma_peer_is_accumulating (ompi_osc_rdma_peer_t *peer) +static inline bool ompi_osc_rdma_peer_test_set_flag (ompi_osc_rdma_peer_t *peer, int flag) { - return !!(peer->flags & OMPI_OSC_RDMA_PEER_ACCUMULATING); + int32_t flags; + + opal_atomic_mb (); + flags = peer->flags; + + do { + if (flags & flag) { + return false; + } + } while (!OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_32 (&peer->flags, &flags, flags | flag)); + + return true; +} + +/** + * @brief clear a flag from a peer object + * + * @param[in] peer peer object to modify + * @param[in] flag flag to clear + */ +static
inline void ompi_osc_rdma_peer_clear_flag (ompi_osc_rdma_peer_t *peer, int flag) +{ + OPAL_ATOMIC_AND_FETCH32(&peer->flags, ~flag); + opal_atomic_mb (); } /** @@ -221,5 +253,15 @@ static inline bool ompi_osc_rdma_peer_local_state (ompi_osc_rdma_peer_t *peer) return !!(peer->flags & OMPI_OSC_RDMA_PEER_LOCAL_STATE); } +/** + * @brief check if the peer has been demand locked as part of the current epoch + * + * @param[in] peer peer object to check + * + */ +static inline bool ompi_osc_rdma_peer_is_demand_locked (ompi_osc_rdma_peer_t *peer) +{ + return !!(peer->flags & OMPI_OSC_RDMA_PEER_DEMAND_LOCKED); +} #endif /* OMPI_OSC_RDMA_PEER_H */ diff --git a/ompi/mca/osc/rdma/osc_rdma_request.c b/ompi/mca/osc/rdma/osc_rdma_request.c index 625b4d380ed..44fe9a5e8c5 100644 --- a/ompi/mca/osc/rdma/osc_rdma_request.c +++ b/ompi/mca/osc/rdma/osc_rdma_request.c @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2011-2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 The University of Tennessee and The University * of Tennessee Research Foundation. 
All rights @@ -44,27 +44,17 @@ static int request_free(struct ompi_request_t **ompi_req) return OMPI_SUCCESS; } -static int request_complete (struct ompi_request_t *request) -{ - ompi_osc_rdma_request_t *parent_request = ((ompi_osc_rdma_request_t *) request)->parent_request; - - if (parent_request && 0 == OPAL_THREAD_ADD32 (&parent_request->outstanding_requests, -1)) { - ompi_osc_rdma_request_complete (parent_request, OMPI_SUCCESS); - } - - return OMPI_SUCCESS; -} - static void request_construct(ompi_osc_rdma_request_t *request) { request->super.req_type = OMPI_REQUEST_WIN; request->super.req_status._cancelled = 0; request->super.req_free = request_free; request->super.req_cancel = request_cancel; - request->super.req_complete_cb = request_complete; request->parent_request = NULL; + request->to_free = NULL; request->buffer = NULL; request->internal = false; + request->cleanup = NULL; request->outstanding_requests = 0; OBJ_CONSTRUCT(&request->convertor, opal_convertor_t); } diff --git a/ompi/mca/osc/rdma/osc_rdma_request.h b/ompi/mca/osc/rdma/osc_rdma_request.h index 3cec365a7aa..ad052e172cb 100644 --- a/ompi/mca/osc/rdma/osc_rdma_request.h +++ b/ompi/mca/osc/rdma/osc_rdma_request.h @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -25,26 +25,22 @@ enum ompi_osc_rdma_request_type_t { }; typedef enum ompi_osc_rdma_request_type_t ompi_osc_rdma_request_type_t; +struct ompi_osc_rdma_request_t; + +typedef void (*ompi_osc_rdma_request_cleanup_fn_t) (struct ompi_osc_rdma_request_t *); + struct ompi_osc_rdma_request_t { ompi_request_t super; ompi_osc_rdma_peer_t *peer; + ompi_osc_rdma_request_cleanup_fn_t cleanup; ompi_osc_rdma_request_type_t type; + void *to_free; void *origin_addr; - int origin_count; - struct ompi_datatype_t *origin_dt; - - void *result_addr; - int result_count; - struct ompi_datatype_t *result_dt; - - const void *compare_addr; - - ompi_op_t *op; ompi_osc_rdma_module_t *module; - int32_t outstanding_requests; + volatile int32_t outstanding_requests; bool internal; ptrdiff_t offset; @@ -69,35 +65,45 @@ OBJ_CLASS_DECLARATION(ompi_osc_rdma_request_t); rdma_rget, etc.), so it's ok to spin here... */ #define OMPI_OSC_RDMA_REQUEST_ALLOC(rmodule, rpeer, req) \ do { \ - opal_free_list_item_t *item; \ - do { \ - item = opal_free_list_get (&mca_osc_rdma_component.requests); \ - if (NULL == item) { \ - ompi_osc_rdma_progress (rmodule); \ - } \ - } while (NULL == item); \ - req = (ompi_osc_rdma_request_t*) item; \ - OMPI_REQUEST_INIT(&req->super, false); \ - req->super.req_mpi_object.win = module->win; \ - req->super.req_state = OMPI_REQUEST_ACTIVE; \ - req->module = rmodule; \ - req->peer = (rpeer); \ + (req) = OBJ_NEW(ompi_osc_rdma_request_t); \ + OMPI_REQUEST_INIT(&(req)->super, false); \ + (req)->super.req_mpi_object.win = (rmodule)->win; \ + (req)->super.req_state = OMPI_REQUEST_ACTIVE; \ + (req)->module = rmodule; \ + (req)->peer = (rpeer); \ } while (0) #define OMPI_OSC_RDMA_REQUEST_RETURN(req) \ do { \ OMPI_REQUEST_FINI(&(req)->super); \ free ((req)->buffer); \ - (req)->buffer = NULL; \ - (req)->parent_request = NULL; \ - (req)->internal = false; \ - (req)->outstanding_requests = 0; \ - opal_free_list_return (&mca_osc_rdma_component.requests, \ - 
(opal_free_list_item_t *) (req)); \ + free (req); \ } while (0) +static inline void ompi_osc_rdma_request_complete (ompi_osc_rdma_request_t *request, int mpi_error); + + +static inline void ompi_osc_rdma_request_deref (ompi_osc_rdma_request_t *request) +{ + if (1 == OPAL_THREAD_FETCH_ADD32 (&request->outstanding_requests, -1)) { + ompi_osc_rdma_request_complete (request, OMPI_SUCCESS); + } +} + static inline void ompi_osc_rdma_request_complete (ompi_osc_rdma_request_t *request, int mpi_error) { + ompi_osc_rdma_request_t *parent_request = request->parent_request; + + if (request->cleanup) { + request->cleanup (request); + } + + free (request->to_free); + + if (parent_request) { + ompi_osc_rdma_request_deref (parent_request); + } + if (!request->internal) { request->super.req_status.MPI_ERROR = mpi_error; diff --git a/ompi/mca/osc/rdma/osc_rdma_sync.c b/ompi/mca/osc/rdma/osc_rdma_sync.c index dca7e328d89..f07ea4f7839 100644 --- a/ompi/mca/osc/rdma/osc_rdma_sync.c +++ b/ompi/mca/osc/rdma/osc_rdma_sync.c @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2018 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -16,15 +16,17 @@ static void ompi_osc_rdma_sync_constructor (ompi_osc_rdma_sync_t *rdma_sync) { rdma_sync->type = OMPI_OSC_RDMA_SYNC_TYPE_NONE; rdma_sync->epoch_active = false; - rdma_sync->outstanding_rdma = 0; + rdma_sync->outstanding_rdma.counter = 0; OBJ_CONSTRUCT(&rdma_sync->aggregations, opal_list_t); OBJ_CONSTRUCT(&rdma_sync->lock, opal_mutex_t); + OBJ_CONSTRUCT(&rdma_sync->demand_locked_peers, opal_list_t); } static void ompi_osc_rdma_sync_destructor (ompi_osc_rdma_sync_t *rdma_sync) { OBJ_DESTRUCT(&rdma_sync->aggregations); OBJ_DESTRUCT(&rdma_sync->lock); + OBJ_DESTRUCT(&rdma_sync->demand_locked_peers); } OBJ_CLASS_INSTANCE(ompi_osc_rdma_sync_t, opal_object_t, ompi_osc_rdma_sync_constructor, diff --git a/ompi/mca/osc/rdma/osc_rdma_sync.h b/ompi/mca/osc/rdma/osc_rdma_sync.h index c4ffbbd4c3c..e33b32d4371 100644 --- a/ompi/mca/osc/rdma/osc_rdma_sync.h +++ b/ompi/mca/osc/rdma/osc_rdma_sync.h @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2018 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -33,6 +33,13 @@ typedef enum ompi_osc_rdma_sync_type_t ompi_osc_rdma_sync_type_t; struct ompi_osc_rdma_module_t; +struct ompi_osc_rdma_sync_aligned_counter_t { + volatile osc_rdma_counter_t counter; + /* pad out to next cache line */ + uint64_t padding[7]; +}; +typedef struct ompi_osc_rdma_sync_aligned_counter_t ompi_osc_rdma_sync_aligned_counter_t; + /** * @brief synchronization object * @@ -78,6 +85,9 @@ struct ompi_osc_rdma_sync_t { struct ompi_osc_rdma_peer_t *peer; } peer_list; + /** demand locked peers (lock-all) */ + opal_list_t demand_locked_peers; + /** number of peers */ int num_peers; @@ -85,7 +95,7 @@ struct ompi_osc_rdma_sync_t { bool epoch_active; /** outstanding rdma operations on epoch */ - osc_rdma_counter_t outstanding_rdma; + ompi_osc_rdma_sync_aligned_counter_t outstanding_rdma __opal_attribute_aligned__(64); /** aggregated operations in this epoch */ opal_list_t aggregations; @@ -129,30 +139,10 @@ void ompi_osc_rdma_sync_return (ompi_osc_rdma_sync_t *rdma_sync); */ bool ompi_osc_rdma_sync_pscw_peer (struct ompi_osc_rdma_module_t *module, int target, struct ompi_osc_rdma_peer_t **peer); -/** - * @brief increment the outstanding rdma operation counter (atomic) - * - * @param[in] rdma_sync osc rdma synchronization object - */ -static inline void ompi_osc_rdma_sync_rdma_inc (ompi_osc_rdma_sync_t *rdma_sync) -{ - ompi_osc_rdma_counter_add (&rdma_sync->outstanding_rdma, 1); - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "inc: there are %ld outstanding rdma operations", - (unsigned long) rdma_sync->outstanding_rdma); -} - -/** - * @brief decrement the outstanding rdma operation counter (atomic) - * - * @param[in] rdma_sync osc rdma synchronization object - */ -static inline void ompi_osc_rdma_sync_rdma_dec (ompi_osc_rdma_sync_t *rdma_sync) +static inline int64_t ompi_osc_rdma_sync_get_count (ompi_osc_rdma_sync_t *rdma_sync) { - ompi_osc_rdma_counter_add (&rdma_sync->outstanding_rdma, -1); - - OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "dec: 
there are %ld outstanding rdma operations", - (unsigned long) rdma_sync->outstanding_rdma); + return rdma_sync->outstanding_rdma.counter; } #endif /* OSC_RDMA_SYNC_H */ diff --git a/ompi/mca/osc/rdma/osc_rdma_types.h b/ompi/mca/osc/rdma/osc_rdma_types.h index 123238d0209..81d04a55ebf 100644 --- a/ompi/mca/osc/rdma/osc_rdma_types.h +++ b/ompi/mca/osc/rdma/osc_rdma_types.h @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -25,7 +25,7 @@ typedef int64_t osc_rdma_base_t; typedef int64_t osc_rdma_size_t; typedef int64_t osc_rdma_counter_t; -#define ompi_osc_rdma_counter_add opal_atomic_add_64 +#define ompi_osc_rdma_counter_add opal_atomic_add_fetch_64 #else @@ -33,7 +33,7 @@ typedef int32_t osc_rdma_base_t; typedef int32_t osc_rdma_size_t; typedef int32_t osc_rdma_counter_t; -#define ompi_osc_rdma_counter_add opal_atomic_add_32 +#define ompi_osc_rdma_counter_add opal_atomic_add_fetch_32 #endif @@ -48,18 +48,18 @@ static inline int64_t ompi_osc_rdma_lock_add (volatile int64_t *p, int64_t value int64_t new; opal_atomic_mb (); - new = opal_atomic_add_64 (p, value) - value; + new = opal_atomic_add_fetch_64 (p, value) - value; opal_atomic_mb (); return new; } -static inline int ompi_osc_rdma_lock_cmpset (volatile int64_t *p, int64_t comp, int64_t value) +static inline int ompi_osc_rdma_lock_compare_exchange (volatile int64_t *p, int64_t *comp, int64_t value) { int ret; opal_atomic_mb (); - ret = opal_atomic_cmpset_64 (p, comp, value); + ret = opal_atomic_compare_exchange_strong_64 (p, comp, value); opal_atomic_mb (); return ret; @@ -76,19 +76,19 @@ static inline int32_t ompi_osc_rdma_lock_add (volatile int32_t *p, int32_t value int32_t new; opal_atomic_mb (); - /* opal_atomic_add_32 differs from normal atomics in that is returns the new value */ - new = 
opal_atomic_add_32 (p, value) - value; + /* opal_atomic_add_fetch_32 differs from normal atomics in that it returns the new value */ + new = opal_atomic_add_fetch_32 (p, value) - value; opal_atomic_mb (); return new; } -static inline int ompi_osc_rdma_lock_cmpset (volatile int32_t *p, int32_t comp, int32_t value) +static inline int ompi_osc_rdma_lock_compare_exchange (volatile int32_t *p, int32_t *comp, int32_t value) { int ret; opal_atomic_mb (); - ret = opal_atomic_cmpset_32 (p, comp, value); + ret = opal_atomic_compare_exchange_strong_32 (p, comp, value); opal_atomic_mb (); return ret; @@ -205,6 +205,42 @@ typedef struct ompi_osc_rdma_aggregation_t ompi_osc_rdma_aggregation_t; OBJ_CLASS_DECLARATION(ompi_osc_rdma_aggregation_t); +typedef void (*ompi_osc_rdma_pending_op_cb_fn_t) (void *, void *, int); + +struct ompi_osc_rdma_pending_op_t { + opal_list_item_t super; + struct ompi_osc_rdma_frag_t *op_frag; + void *op_buffer; + void *op_result; + size_t op_size; + volatile bool op_complete; + ompi_osc_rdma_pending_op_cb_fn_t cbfunc; + void *cbdata; + void *cbcontext; +}; + +typedef struct ompi_osc_rdma_pending_op_t ompi_osc_rdma_pending_op_t; + +OBJ_CLASS_DECLARATION(ompi_osc_rdma_pending_op_t); + +/** Communication buffer for packing messages */ +struct ompi_osc_rdma_frag_t { + opal_free_list_item_t super; + + /* Number of operations which have started writing into the frag, but not yet completed doing so */ + volatile int32_t pending; +#if OPAL_HAVE_ATOMIC_MATH_64 + volatile int64_t curr_index; +#else + volatile int32_t curr_index; +#endif + + struct ompi_osc_rdma_module_t *module; + mca_btl_base_registration_handle_t *handle; +}; +typedef struct ompi_osc_rdma_frag_t ompi_osc_rdma_frag_t; +OBJ_CLASS_DECLARATION(ompi_osc_rdma_frag_t); + #define OSC_RDMA_VERBOSE(x, ...)
OPAL_OUTPUT_VERBOSE((x, ompi_osc_base_framework.framework_output, __VA_ARGS__)) #endif /* OMPI_OSC_RDMA_TYPES_H */ diff --git a/ompi/mca/osc/sm/Makefile.am b/ompi/mca/osc/sm/Makefile.am index 8a5a8284d2c..01d230d7bf3 100644 --- a/ompi/mca/osc/sm/Makefile.am +++ b/ompi/mca/osc/sm/Makefile.am @@ -1,5 +1,6 @@ # # Copyright (c) 2011 Sandia National Laboratories. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -33,7 +34,8 @@ endif mcacomponentdir = $(pkglibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_osc_sm_la_SOURCES = $(sm_sources) -mca_osc_sm_la_LIBADD = $(osc_sm_LIBS) +mca_osc_sm_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(osc_sm_LIBS) mca_osc_sm_la_LDFLAGS = -module -avoid-version $(osc_sm_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/osc/sm/osc_sm.h b/ompi/mca/osc/sm/osc_sm.h index 7c058465b07..b27aa83365c 100644 --- a/ompi/mca/osc/sm/osc_sm.h +++ b/ompi/mca/osc/sm/osc_sm.h @@ -3,8 +3,9 @@ * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -17,6 +18,20 @@ #include "opal/mca/shmem/base/base.h" +#if OPAL_HAVE_ATOMIC_MATH_64 + +typedef uint64_t osc_sm_post_type_t; +#define OSC_SM_POST_BITS 6 +#define OSC_SM_POST_MASK 0x3f + +#else + +typedef uint32_t osc_sm_post_type_t; +#define OSC_SM_POST_BITS 5 +#define OSC_SM_POST_MASK 0x1f + +#endif + /* data shared across all peers */ struct ompi_osc_sm_global_state_t { int use_barrier_for_fence; @@ -80,7 +95,8 @@ struct ompi_osc_sm_module_t { ompi_osc_sm_global_state_t *global_state; ompi_osc_sm_node_state_t *my_node_state; ompi_osc_sm_node_state_t *node_states; - uint64_t **posts; + + osc_sm_post_type_t ** volatile posts; opal_mutex_t lock; }; @@ -97,7 +113,7 @@ int ompi_osc_sm_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -106,7 +122,7 @@ int ompi_osc_sm_get(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win); @@ -115,7 +131,7 @@ int ompi_osc_sm_accumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -126,14 +142,14 @@ int ompi_osc_sm_compare_and_swap(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_win_t *win); int ompi_osc_sm_fetch_and_op(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_op_t *op, struct ompi_win_t *win); @@ -154,7 +170,7 @@ int ompi_osc_sm_rput(const void 
*origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -164,7 +180,7 @@ int ompi_osc_sm_rget(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -174,7 +190,7 @@ int ompi_osc_sm_raccumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -235,7 +251,7 @@ int ompi_osc_sm_flush_local(int target, struct ompi_win_t *win); int ompi_osc_sm_flush_local_all(struct ompi_win_t *win); -int ompi_osc_sm_set_info(struct ompi_win_t *win, struct ompi_info_t *info); -int ompi_osc_sm_get_info(struct ompi_win_t *win, struct ompi_info_t **info_used); +int ompi_osc_sm_set_info(struct ompi_win_t *win, struct opal_info_t *info); +int ompi_osc_sm_get_info(struct ompi_win_t *win, struct opal_info_t **info_used); #endif diff --git a/ompi/mca/osc/sm/osc_sm_active_target.c b/ompi/mca/osc/sm/osc_sm_active_target.c index 5a67454725a..ab0f73f87c6 100644 --- a/ompi/mca/osc/sm/osc_sm_active_target.c +++ b/ompi/mca/osc/sm/osc_sm_active_target.c @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014-2017 The University of Tennessee and The University * of Tennessee Research Foundation. 
All rights @@ -130,10 +130,11 @@ ompi_osc_sm_start(struct ompi_group_t *group, ompi_osc_sm_module_t *module = (ompi_osc_sm_module_t*) win->w_osc_module; int my_rank = ompi_comm_rank (module->comm); + void *_tmp_ptr = NULL; OBJ_RETAIN(group); - if (!OPAL_ATOMIC_CMPSET_PTR(&module->start_group, NULL, group)) { + if (!OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&module->start_group, (void *) &_tmp_ptr, group)) { OBJ_RELEASE(group); return OMPI_ERR_RMA_SYNC; } @@ -149,8 +150,8 @@ ompi_osc_sm_start(struct ompi_group_t *group, size = ompi_group_size(module->start_group); for (int i = 0 ; i < size ; ++i) { - int rank_byte = ranks[i] >> 6; - uint64_t old, rank_bit = ((uint64_t) 1) << (ranks[i] & 0x3f); + int rank_byte = ranks[i] >> OSC_SM_POST_BITS; + osc_sm_post_type_t rank_bit = ((osc_sm_post_type_t) 1) << (ranks[i] & OSC_SM_POST_MASK); /* wait for rank to post */ while (!(module->posts[my_rank][rank_byte] & rank_bit)) { @@ -160,9 +161,11 @@ ompi_osc_sm_start(struct ompi_group_t *group, opal_atomic_rmb (); - do { - old = module->posts[my_rank][rank_byte]; - } while (!opal_atomic_cmpset_64 ((int64_t *) module->posts[my_rank] + rank_byte, old, old ^ rank_bit)); +#if OPAL_HAVE_ATOMIC_MATH_64 + (void) opal_atomic_fetch_xor_64 ((volatile int64_t *) module->posts[my_rank] + rank_byte, rank_bit); +#else + (void) opal_atomic_fetch_xor_32 ((volatile int32_t *) module->posts[my_rank] + rank_byte, rank_bit); +#endif } free (ranks); @@ -185,7 +188,7 @@ ompi_osc_sm_complete(struct ompi_win_t *win) opal_atomic_mb(); group = module->start_group; - if (NULL == group || !OPAL_ATOMIC_CMPSET_PTR(&module->start_group, group, NULL)) { + if (NULL == group || !OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&module->start_group, &group, NULL)) { return OMPI_ERR_RMA_SYNC; } @@ -198,7 +201,7 @@ ompi_osc_sm_complete(struct ompi_win_t *win) gsize = ompi_group_size(group); for (int i = 0 ; i < gsize ; ++i) { - (void) opal_atomic_add_32(&module->node_states[ranks[i]].complete_count, 1); + (void)
opal_atomic_add_fetch_32(&module->node_states[ranks[i]].complete_count, 1); } free (ranks); @@ -244,7 +247,7 @@ ompi_osc_sm_post(struct ompi_group_t *group, gsize = ompi_group_size(module->post_group); for (int i = 0 ; i < gsize ; ++i) { - (void) opal_atomic_add_64 ((int64_t *) module->posts[ranks[i]] + my_byte, my_bit); + opal_atomic_add ((volatile osc_sm_post_type_t *) module->posts[ranks[i]] + my_byte, my_bit); } opal_atomic_wmb (); diff --git a/ompi/mca/osc/sm/osc_sm_comm.c b/ompi/mca/osc/sm/osc_sm_comm.c index e6f3da44e68..b6094dd16eb 100644 --- a/ompi/mca/osc/sm/osc_sm_comm.c +++ b/ompi/mca/osc/sm/osc_sm_comm.c @@ -3,7 +3,7 @@ * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -25,7 +25,7 @@ ompi_osc_sm_rput(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -65,7 +65,7 @@ ompi_osc_sm_rget(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win, @@ -105,7 +105,7 @@ ompi_osc_sm_raccumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -210,7 +210,7 @@ ompi_osc_sm_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct 
ompi_datatype_t *target_dt, struct ompi_win_t *win) @@ -241,7 +241,7 @@ ompi_osc_sm_get(void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_win_t *win) @@ -272,7 +272,7 @@ ompi_osc_sm_accumulate(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, int target_count, struct ompi_datatype_t *target_dt, struct ompi_op_t *op, @@ -365,7 +365,7 @@ ompi_osc_sm_compare_and_swap(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_win_t *win) { ompi_osc_sm_module_t *module = @@ -404,7 +404,7 @@ ompi_osc_sm_fetch_and_op(const void *origin_addr, void *result_addr, struct ompi_datatype_t *dt, int target, - OPAL_PTRDIFF_TYPE target_disp, + ptrdiff_t target_disp, struct ompi_op_t *op, struct ompi_win_t *win) { diff --git a/ompi/mca/osc/sm/osc_sm_component.c b/ompi/mca/osc/sm/osc_sm_component.c index 22c1d8dd5f3..f7211cd93cc 100644 --- a/ompi/mca/osc/sm/osc_sm_component.c +++ b/ompi/mca/osc/sm/osc_sm_component.c @@ -5,11 +5,12 @@ * reserved. * Copyright (c) 2014 Intel, Inc. All rights reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015-2017 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -25,6 +26,7 @@ #include "ompi/request/request.h" #include "opal/util/sys_limits.h" #include "opal/include/opal/align.h" +#include "opal/util/info_subscriber.h" #include "osc_sm.h" @@ -32,11 +34,13 @@ static int component_open(void); static int component_init(bool enable_progress_threads, bool enable_mpi_threads); static int component_finalize(void); static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor); static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model); +static char* component_set_blocking_fence_info(opal_infosubscriber_t *obj, char *key, char *val); +static char* component_set_alloc_shared_noncontig_info(opal_infosubscriber_t *obj, char *key, char *val); ompi_osc_sm_component_t mca_osc_sm_component = { @@ -98,9 +102,6 @@ ompi_osc_sm_module_t ompi_osc_sm_module_template = { .osc_flush_all = ompi_osc_sm_flush_all, .osc_flush_local = ompi_osc_sm_flush_local, .osc_flush_local_all = ompi_osc_sm_flush_local_all, - - .osc_set_info = ompi_osc_sm_set_info, - .osc_get_info = ompi_osc_sm_get_info } }; @@ -146,7 +147,7 @@ check_win_ok(ompi_communicator_t *comm, int flavor) static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor) { int ret; @@ -163,7 +164,7 @@ component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, - struct ompi_communicator_t *comm, struct ompi_info_t *info, + 
struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor, int *model) { ompi_osc_sm_module_t *module = NULL; @@ -179,8 +180,14 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit calloc(1, sizeof(ompi_osc_sm_module_t)); if (NULL == module) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + win->w_osc_module = &module->super; + OBJ_CONSTRUCT(&module->lock, opal_mutex_t); + ret = opal_infosubscribe_subscribe(&(win->super), "alloc_shared_contig", "false", component_set_alloc_shared_noncontig_info); + + if (OPAL_SUCCESS != ret) goto error; + /* fill in the function pointer part */ memcpy(module, &ompi_osc_sm_module_template, sizeof(ompi_osc_base_module_t)); @@ -207,9 +214,9 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit if (NULL == module->global_state) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; module->node_states = malloc(sizeof(ompi_osc_sm_node_state_t)); if (NULL == module->node_states) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; - module->posts = calloc (1, sizeof(module->posts[0]) + sizeof (uint64_t)); + module->posts = calloc (1, sizeof(module->posts[0]) + sizeof (module->posts[0][0])); if (NULL == module->posts) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; - module->posts[0] = (uint64_t *) (module->posts + 1); + module->posts[0] = (osc_sm_post_type_t *) (module->posts + 1); } else { unsigned long total, *rbuf; int i, flag; @@ -227,7 +234,7 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit if (NULL == rbuf) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; module->noncontig = false; - if (OMPI_SUCCESS != ompi_info_get_bool(info, "alloc_shared_noncontig", + if (OMPI_SUCCESS != opal_info_get_bool(info, "alloc_shared_noncontig", &module->noncontig, &flag)) { goto error; } @@ -251,7 +258,7 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit /* user opal/shmem directly to create a shared memory segment */ state_size = sizeof(ompi_osc_sm_global_state_t) + 
sizeof(ompi_osc_sm_node_state_t) * comm_size; state_size += OPAL_ALIGN_PAD_AMOUNT(state_size, 64); - posts_size = comm_size * post_size * sizeof (uint64_t); + posts_size = comm_size * post_size * sizeof (module->posts[0][0]); posts_size += OPAL_ALIGN_PAD_AMOUNT(posts_size, 64); if (0 == ompi_comm_rank (module->comm)) { char *data_file; @@ -288,7 +295,7 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit if (NULL == module->posts) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; /* set module->posts[0] first to ensure 64-bit alignment */ - module->posts[0] = (uint64_t *) (module->segment_base); + module->posts[0] = (osc_sm_post_type_t *) (module->segment_base); module->global_state = (ompi_osc_sm_global_state_t *) (module->posts[0] + comm_size * post_size); module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1); @@ -315,7 +322,7 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit *base = module->bases[ompi_comm_rank(module->comm)]; - opal_atomic_init(&module->my_node_state->accumulate_lock, OPAL_ATOMIC_UNLOCKED); + opal_atomic_lock_init(&module->my_node_state->accumulate_lock, OPAL_ATOMIC_LOCK_UNLOCKED); /* share everyone's displacement units. 
*/ module->disp_units = malloc(sizeof(int) * comm_size); @@ -344,7 +351,7 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit bool blocking_fence=false; int flag; - if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence", + if (OMPI_SUCCESS != opal_info_get_bool(info, "blocking_fence", &blocking_fence, &flag)) { goto error; } @@ -378,18 +385,20 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit #endif } + ret = opal_infosubscribe_subscribe(&(win->super), "blocking_fence", "false", + component_set_blocking_fence_info); + + if (OPAL_SUCCESS != ret) goto error; + ret = module->comm->c_coll->coll_barrier(module->comm, module->comm->c_coll->coll_barrier_module); if (OMPI_SUCCESS != ret) goto error; *model = MPI_WIN_UNIFIED; - win->w_osc_module = &module->super; - return OMPI_SUCCESS; error: - win->w_osc_module = &module->super; ompi_osc_sm_free (win); return ret; @@ -476,7 +485,9 @@ ompi_osc_sm_free(struct ompi_win_t *win) } else { free(module->node_states); free(module->global_state); - free(module->bases[0]); + if (NULL != module->bases) { + free(module->bases[0]); + } } free(module->disp_units); free(module->outstanding_locks); @@ -497,7 +508,7 @@ ompi_osc_sm_free(struct ompi_win_t *win) int -ompi_osc_sm_set_info(struct ompi_win_t *win, struct ompi_info_t *info) +ompi_osc_sm_set_info(struct ompi_win_t *win, struct opal_info_t *info) { ompi_osc_sm_module_t *module = (ompi_osc_sm_module_t*) win->w_osc_module; @@ -508,19 +519,42 @@ ompi_osc_sm_set_info(struct ompi_win_t *win, struct ompi_info_t *info) } +static char* +component_set_blocking_fence_info(opal_infosubscriber_t *obj, char *key, char *val) +{ + ompi_osc_sm_module_t *module = (ompi_osc_sm_module_t*) ((struct ompi_win_t*) obj)->w_osc_module; +/* + * Assuming that you can't change the default. + */ + return module->global_state->use_barrier_for_fence ? 
"true" : "false"; +} + + +static char* +component_set_alloc_shared_noncontig_info(opal_infosubscriber_t *obj, char *key, char *val) +{ + + ompi_osc_sm_module_t *module = (ompi_osc_sm_module_t*) ((struct ompi_win_t*) obj)->w_osc_module; +/* + * Assuming that you can't change the default. + */ + return module->noncontig ? "true" : "false"; +} + + int -ompi_osc_sm_get_info(struct ompi_win_t *win, struct ompi_info_t **info_used) +ompi_osc_sm_get_info(struct ompi_win_t *win, struct opal_info_t **info_used) { ompi_osc_sm_module_t *module = (ompi_osc_sm_module_t*) win->w_osc_module; - ompi_info_t *info = OBJ_NEW(ompi_info_t); + opal_info_t *info = OBJ_NEW(opal_info_t); if (NULL == info) return OMPI_ERR_TEMP_OUT_OF_RESOURCE; if (module->flavor == MPI_WIN_FLAVOR_SHARED) { - ompi_info_set(info, "blocking_fence", + opal_info_set(info, "blocking_fence", (1 == module->global_state->use_barrier_for_fence) ? "true" : "false"); - ompi_info_set(info, "alloc_shared_noncontig", + opal_info_set(info, "alloc_shared_noncontig", (module->noncontig) ? 
"true" : "false"); } diff --git a/ompi/mca/osc/sm/osc_sm_passive_target.c b/ompi/mca/osc/sm/osc_sm_passive_target.c index 889ac829dd1..a3388b776a4 100644 --- a/ompi/mca/osc/sm/osc_sm_passive_target.c +++ b/ompi/mca/osc/sm/osc_sm_passive_target.c @@ -26,9 +26,9 @@ lk_fetch_add32(ompi_osc_sm_module_t *module, size_t offset, uint32_t delta) { - /* opal_atomic_add_32 is an add then fetch so delta needs to be subtracted out to get the + /* opal_atomic_add_fetch_32 is an add then fetch so delta needs to be subtracted out to get the * old value */ - return opal_atomic_add_32((int32_t*) ((char*) &module->node_states[target].lock + offset), + return opal_atomic_add_fetch_32((int32_t*) ((char*) &module->node_states[target].lock + offset), delta) - delta; } @@ -39,7 +39,7 @@ lk_add32(ompi_osc_sm_module_t *module, size_t offset, uint32_t delta) { - opal_atomic_add_32((int32_t*) ((char*) &module->node_states[target].lock + offset), + opal_atomic_add_fetch_32((int32_t*) ((char*) &module->node_states[target].lock + offset), delta); } diff --git a/ompi/mca/osc/ucx/Makefile.am b/ompi/mca/osc/ucx/Makefile.am new file mode 100644 index 00000000000..e301686c3b2 --- /dev/null +++ b/ompi/mca/osc/ucx/Makefile.am @@ -0,0 +1,44 @@ +# +# Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. +# Copyright (c) 2017 IBM Corporation. All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +ucx_sources = \ + osc_ucx.h \ + osc_ucx_request.h \ + osc_ucx_comm.c \ + osc_ucx_component.c \ + osc_ucx_request.c \ + osc_ucx_active_target.c \ + osc_ucx_passive_target.c + +AM_CPPFLAGS = $(osc_ucx_CPPFLAGS) + +# Make the output library in this directory, and name it either +# mca__.la (for DSO builds) or libmca__.la +# (for static builds). 
+ +if MCA_BUILD_ompi_osc_ucx_DSO +component_noinst = +component_install = mca_osc_ucx.la +else +component_noinst = libmca_osc_ucx.la +component_install = +endif + +mcacomponentdir = $(pkglibdir) +mcacomponent_LTLIBRARIES = $(component_install) +mca_osc_ucx_la_SOURCES = $(ucx_sources) +mca_osc_ucx_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(osc_ucx_LIBS) +mca_osc_ucx_la_LDFLAGS = -module -avoid-version $(osc_ucx_LDFLAGS) + +noinst_LTLIBRARIES = $(component_noinst) +libmca_osc_ucx_la_SOURCES = $(ucx_sources) +libmca_osc_ucx_la_LIBADD = $(osc_ucx_LIBS) +libmca_osc_ucx_la_LDFLAGS = -module -avoid-version $(osc_ucx_LDFLAGS) diff --git a/ompi/mca/osc/ucx/configure.m4 b/ompi/mca/osc/ucx/configure.m4 new file mode 100644 index 00000000000..72f5527d97b --- /dev/null +++ b/ompi/mca/osc/ucx/configure.m4 @@ -0,0 +1,36 @@ +# -*- shell-script -*- +# +# Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# MCA_ompi_osc_ucx_POST_CONFIG(will_build) +# ---------------------------------------- +# Only require the tag if we're actually going to be built +AC_DEFUN([MCA_ompi_osc_ucx_POST_CONFIG], [ + AS_IF([test "$1" = "1"], [OMPI_REQUIRE_ENDPOINT_TAG([UCX])]) +])dnl + +# MCA_osc_ucx_CONFIG(action-if-can-compile, +# [action-if-cant-compile]) +# ------------------------------------------------ +AC_DEFUN([MCA_ompi_osc_ucx_CONFIG],[ + AC_CONFIG_FILES([ompi/mca/osc/ucx/Makefile]) + + OMPI_CHECK_UCX([osc_ucx], + [osc_ucx_happy="yes"], + [osc_ucx_happy="no"]) + + AS_IF([test "$osc_ucx_happy" = "yes"], + [$1], + [$2]) + + # substitute in the things needed to build ucx + AC_SUBST([osc_ucx_CPPFLAGS]) + AC_SUBST([osc_ucx_LDFLAGS]) + AC_SUBST([osc_ucx_LIBS]) +])dnl diff --git a/ompi/mca/osc/ucx/osc_ucx.h b/ompi/mca/osc/ucx/osc_ucx.h new file mode 100644 index 00000000000..5bba9080ba5 --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx.h @@ -0,0 +1,215 @@ +/* + * Copyright (C) Mellanox 
Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef OMPI_OSC_UCX_H +#define OMPI_OSC_UCX_H + +#include <ucp/api/ucp.h> + +#include "ompi/group/group.h" +#include "ompi/communicator/communicator.h" + +#define OMPI_OSC_UCX_POST_PEER_MAX 32 +#define OMPI_OSC_UCX_ATTACH_MAX 32 +#define OMPI_OSC_UCX_RKEY_BUF_MAX 1024 + +typedef struct ompi_osc_ucx_win_info { + ucp_rkey_h rkey; + uint64_t addr; + bool rkey_init; +} ompi_osc_ucx_win_info_t; + +typedef struct ompi_osc_ucx_component { + ompi_osc_base_component_t super; + ucp_context_h ucp_context; + ucp_worker_h ucp_worker; + bool enable_mpi_threads; + opal_free_list_t requests; /* request free list for the r* communication variants */ + int num_incomplete_req_ops; + unsigned int priority; +} ompi_osc_ucx_component_t; + +OMPI_DECLSPEC extern ompi_osc_ucx_component_t mca_osc_ucx_component; + +typedef enum ompi_osc_ucx_epoch { + NONE_EPOCH, + FENCE_EPOCH, + POST_WAIT_EPOCH, + START_COMPLETE_EPOCH, + PASSIVE_EPOCH, + PASSIVE_ALL_EPOCH +} ompi_osc_ucx_epoch_t; + +typedef struct ompi_osc_ucx_epoch_type { + ompi_osc_ucx_epoch_t access; + ompi_osc_ucx_epoch_t exposure; +} ompi_osc_ucx_epoch_type_t; + +#define TARGET_LOCK_UNLOCKED ((uint64_t)(0x0000000000000000ULL)) +#define TARGET_LOCK_EXCLUSIVE ((uint64_t)(0x0000000100000000ULL)) + +#define OSC_UCX_IOVEC_MAX 128 +#define OSC_UCX_OPS_THRESHOLD 1000000 + +#define OSC_UCX_STATE_LOCK_OFFSET 0 +#define OSC_UCX_STATE_REQ_FLAG_OFFSET sizeof(uint64_t) +#define OSC_UCX_STATE_ACC_LOCK_OFFSET (sizeof(uint64_t) * 2) +#define OSC_UCX_STATE_COMPLETE_COUNT_OFFSET (sizeof(uint64_t) * 3) +#define OSC_UCX_STATE_POST_INDEX_OFFSET (sizeof(uint64_t) * 4) +#define OSC_UCX_STATE_POST_STATE_OFFSET (sizeof(uint64_t) * 5) +#define OSC_UCX_STATE_DYNAMIC_WIN_CNT_OFFSET (sizeof(uint64_t) * (5 + OMPI_OSC_UCX_POST_PEER_MAX)) + +typedef struct ompi_osc_dynamic_win_info { + uint64_t base; + size_t size; + char
rkey_buffer[OMPI_OSC_UCX_RKEY_BUF_MAX]; +} ompi_osc_dynamic_win_info_t; + +typedef struct ompi_osc_local_dynamic_win_info { + ucp_mem_h memh; + int refcnt; +} ompi_osc_local_dynamic_win_info_t; + +typedef struct ompi_osc_ucx_state { + volatile uint64_t lock; + volatile uint64_t req_flag; + volatile uint64_t acc_lock; + volatile uint64_t complete_count; /* # msgs received from complete processes */ + volatile uint64_t post_index; + volatile uint64_t post_state[OMPI_OSC_UCX_POST_PEER_MAX]; + volatile uint64_t dynamic_win_count; + volatile ompi_osc_dynamic_win_info_t dynamic_wins[OMPI_OSC_UCX_ATTACH_MAX]; +} ompi_osc_ucx_state_t; + +typedef struct ompi_osc_ucx_module { + ompi_osc_base_module_t super; + struct ompi_communicator_t *comm; + ucp_mem_h memh; /* remote accessible memory */ + int flavor; + size_t size; + ucp_mem_h state_memh; + ompi_osc_ucx_win_info_t *win_info_array; + ompi_osc_ucx_win_info_t *state_info_array; + int disp_unit; /* if disp_unit >= 0, then everyone has the same + * disp unit size; if disp_unit == -1, then we + * need to look at disp_units */ + int *disp_units; + + ompi_osc_ucx_state_t state; /* remote accessible flags */ + ompi_osc_local_dynamic_win_info_t local_dynamic_win_info[OMPI_OSC_UCX_ATTACH_MAX]; + ompi_osc_ucx_epoch_type_t epoch_type; + ompi_group_t *start_group; + ompi_group_t *post_group; + opal_hash_table_t outstanding_locks; + opal_list_t pending_posts; + int lock_count; + int post_count; + int global_ops_num; + int *per_target_ops_nums; + uint64_t req_result; + int *start_grp_ranks; + bool lock_all_is_nocheck; +} ompi_osc_ucx_module_t; + +typedef enum locktype { + LOCK_EXCLUSIVE, + LOCK_SHARED +} lock_type_t; + +typedef struct ompi_osc_ucx_lock { + opal_object_t super; + int target_rank; + lock_type_t type; + bool is_nocheck; +} ompi_osc_ucx_lock_t; + +#define OSC_UCX_GET_EP(comm_, rank_) (ompi_comm_peer_lookup(comm_, rank_)->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_UCX]) +#define OSC_UCX_GET_DISP(module_, rank_) 
((module_->disp_unit < 0) ? module_->disp_units[rank_] : module_->disp_unit) + +int ompi_osc_ucx_win_attach(struct ompi_win_t *win, void *base, size_t len); +int ompi_osc_ucx_win_detach(struct ompi_win_t *win, const void *base); +int ompi_osc_ucx_free(struct ompi_win_t *win); + +int ompi_osc_ucx_put(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_win_t *win); +int ompi_osc_ucx_get(void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_win_t *win); +int ompi_osc_ucx_accumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, + struct ompi_op_t *op, struct ompi_win_t *win); +int ompi_osc_ucx_compare_and_swap(const void *origin_addr, const void *compare_addr, + void *result_addr, struct ompi_datatype_t *dt, + int target, ptrdiff_t target_disp, + struct ompi_win_t *win); +int ompi_osc_ucx_fetch_and_op(const void *origin_addr, void *result_addr, + struct ompi_datatype_t *dt, int target, + ptrdiff_t target_disp, struct ompi_op_t *op, + struct ompi_win_t *win); +int ompi_osc_ucx_get_accumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_datatype, + void *result_addr, int result_count, + struct ompi_datatype_t *result_datatype, + int target_rank, ptrdiff_t target_disp, + int target_count, struct ompi_datatype_t *target_datatype, + struct ompi_op_t *op, struct ompi_win_t *win); +int ompi_osc_ucx_rput(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, + struct ompi_win_t *win, struct ompi_request_t **request); +int ompi_osc_ucx_rget(void 
*origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_win_t *win, + struct ompi_request_t **request); +int ompi_osc_ucx_raccumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_op_t *op, + struct ompi_win_t *win, struct ompi_request_t **request); +int ompi_osc_ucx_rget_accumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_datatype, + void *result_addr, int result_count, + struct ompi_datatype_t *result_datatype, + int target_rank, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_datatype, + struct ompi_op_t *op, struct ompi_win_t *win, + struct ompi_request_t **request); + +int ompi_osc_ucx_fence(int assert, struct ompi_win_t *win); +int ompi_osc_ucx_start(struct ompi_group_t *group, int assert, struct ompi_win_t *win); +int ompi_osc_ucx_complete(struct ompi_win_t *win); +int ompi_osc_ucx_post(struct ompi_group_t *group, int assert, struct ompi_win_t *win); +int ompi_osc_ucx_wait(struct ompi_win_t *win); +int ompi_osc_ucx_test(struct ompi_win_t *win, int *flag); + +int ompi_osc_ucx_lock(int lock_type, int target, int assert, struct ompi_win_t *win); +int ompi_osc_ucx_unlock(int target, struct ompi_win_t *win); +int ompi_osc_ucx_lock_all(int assert, struct ompi_win_t *win); +int ompi_osc_ucx_unlock_all(struct ompi_win_t *win); +int ompi_osc_ucx_sync(struct ompi_win_t *win); +int ompi_osc_ucx_flush(int target, struct ompi_win_t *win); +int ompi_osc_ucx_flush_all(struct ompi_win_t *win); +int ompi_osc_ucx_flush_local(int target, struct ompi_win_t *win); +int ompi_osc_ucx_flush_local_all(struct ompi_win_t *win); + +int ompi_osc_find_attached_region_position(ompi_osc_dynamic_win_info_t *dynamic_wins, + int min_index, int max_index, + uint64_t base, size_t len, 
int *insert); + +void req_completion(void *request, ucs_status_t status); +void internal_req_init(void *request); + +#endif /* OMPI_OSC_UCX_H */ diff --git a/ompi/mca/osc/ucx/osc_ucx_active_target.c b/ompi/mca/osc/ucx/osc_ucx_active_target.c new file mode 100644 index 00000000000..50eebdb19ff --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx_active_target.c @@ -0,0 +1,360 @@ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University. + * All rights reserved. + * Copyright (c) 2004-2005 The Trustees of the University of Tennessee. + * All rights reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2007-2015 Los Alamos National Security, LLC. All rights + * reserved. + * Copyright (c) 2010 IBM Corporation. All rights reserved. + * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. + * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "ompi/mca/osc/osc.h" +#include "ompi/mca/osc/base/base.h" +#include "ompi/mca/osc/base/osc_base_obj_convert.h" + +#include "osc_ucx.h" + +typedef struct ompi_osc_ucx_pending_post { + opal_list_item_t super; + int rank; +} ompi_osc_ucx_pending_post_t; + +OBJ_CLASS_INSTANCE(ompi_osc_ucx_pending_post_t, opal_list_item_t, NULL, NULL); + +static inline void ompi_osc_ucx_handle_incoming_post(ompi_osc_ucx_module_t *module, volatile uint64_t *post_ptr, int ranks_in_win_grp[], int grp_size) { + int i, post_rank = (*post_ptr) - 1; + ompi_osc_ucx_pending_post_t *pending_post = NULL; + + (*post_ptr) = 0; + + for (i = 0; i < grp_size; i++) { + if (post_rank == ranks_in_win_grp[i]) { + module->post_count++; + return; + } + } + + /* post does not belong to this start epoch. save it for later */ + pending_post = OBJ_NEW(ompi_osc_ucx_pending_post_t); + pending_post->rank = post_rank; + opal_list_append(&module->pending_posts, &pending_post->super); +} + +int ompi_osc_ucx_fence(int assert, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucs_status_t status; + + if (module->epoch_type.access != NONE_EPOCH && + module->epoch_type.access != FENCE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + if (assert & MPI_MODE_NOSUCCEED) { + module->epoch_type.access = NONE_EPOCH; + } else { + module->epoch_type.access = FENCE_EPOCH; + } + + if (!(assert & MPI_MODE_NOPRECEDE)) { + status = ucp_worker_flush(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } + + module->global_ops_num = 0; + memset(module->per_target_ops_nums, 0, + sizeof(int) * ompi_comm_size(module->comm)); + + return module->comm->c_coll->coll_barrier(module->comm, + 
+ module->comm->c_coll->coll_barrier_module); +} + +int ompi_osc_ucx_start(struct ompi_group_t *group, int assert, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int i, size, *ranks_in_grp = NULL, *ranks_in_win_grp = NULL; + ompi_group_t *win_group = NULL; + int ret = OMPI_SUCCESS; + + if (module->epoch_type.access != NONE_EPOCH && + module->epoch_type.access != FENCE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + module->epoch_type.access = START_COMPLETE_EPOCH; + + OBJ_RETAIN(group); + module->start_group = group; + size = ompi_group_size(module->start_group); + + ranks_in_grp = malloc(sizeof(int) * size); + ranks_in_win_grp = malloc(sizeof(int) * ompi_comm_size(module->comm)); + + for (i = 0; i < size; i++) { + ranks_in_grp[i] = i; + } + + ret = ompi_comm_group(module->comm, &win_group); + if (ret != OMPI_SUCCESS) { + free(ranks_in_grp); + free(ranks_in_win_grp); + return OMPI_ERROR; + } + + ret = ompi_group_translate_ranks(module->start_group, size, ranks_in_grp, + win_group, ranks_in_win_grp); + if (ret != OMPI_SUCCESS) { + free(ranks_in_grp); + free(ranks_in_win_grp); + ompi_group_free(&win_group); + return OMPI_ERROR; + } + + if ((assert & MPI_MODE_NOCHECK) == 0) { + ompi_osc_ucx_pending_post_t *pending_post, *next; + + /* first look through the pending list */ + OPAL_LIST_FOREACH_SAFE(pending_post, next, &module->pending_posts, ompi_osc_ucx_pending_post_t) { + for (i = 0; i < size; i++) { + if (pending_post->rank == ranks_in_win_grp[i]) { + opal_list_remove_item(&module->pending_posts, &pending_post->super); + OBJ_RELEASE(pending_post); + module->post_count++; + break; + } + } + } + + /* wait for the remaining post messages to arrive */ + while (module->post_count != size) { + for (i = 0; i < OMPI_OSC_UCX_POST_PEER_MAX; i++) { + if (0 == module->state.post_state[i]) { + continue; + } + + ompi_osc_ucx_handle_incoming_post(module, &(module->state.post_state[i]), ranks_in_win_grp, size); + } + 
ucp_worker_progress(mca_osc_ucx_component.ucp_worker); + } + + module->post_count = 0; + } + + free(ranks_in_grp); +
ompi_group_free(&win_group); + + module->start_grp_ranks = ranks_in_win_grp; + + return ret; +} + +int ompi_osc_ucx_complete(struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucs_status_t status; + int i, size; + int ret = OMPI_SUCCESS; + + if (module->epoch_type.access != START_COMPLETE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + module->epoch_type.access = NONE_EPOCH; + + status = ucp_worker_flush(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + module->global_ops_num = 0; + memset(module->per_target_ops_nums, 0, + sizeof(int) * ompi_comm_size(module->comm)); + + size = ompi_group_size(module->start_group); + for (i = 0; i < size; i++) { + uint64_t remote_addr = (module->state_info_array)[module->start_grp_ranks[i]].addr + OSC_UCX_STATE_COMPLETE_COUNT_OFFSET; /* write to state.complete_count on remote side */ + ucp_rkey_h rkey = (module->state_info_array)[module->start_grp_ranks[i]].rkey; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, module->start_grp_ranks[i]); + + status = ucp_atomic_post(ep, UCP_ATOMIC_POST_OP_ADD, 1, + sizeof(uint64_t), remote_addr, rkey); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_post failed: %d\n", + __FILE__, __LINE__, status); + } + + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + } + } + + OBJ_RELEASE(module->start_group); + module->start_group = NULL; + free(module->start_grp_ranks); + + return ret; +} + +int ompi_osc_ucx_post(struct ompi_group_t *group, int assert, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int ret = OMPI_SUCCESS; 
+ + if (module->epoch_type.exposure != NONE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + OBJ_RETAIN(group); + module->post_group = group; + + if ((assert & MPI_MODE_NOCHECK) == 0) { + int i, j, size; + ompi_group_t *win_group = NULL; + int *ranks_in_grp = NULL, *ranks_in_win_grp = NULL; + int myrank = ompi_comm_rank(module->comm); + ucs_status_t status; + + size = ompi_group_size(module->post_group); + ranks_in_grp = malloc(sizeof(int) * size); + ranks_in_win_grp = malloc(sizeof(int) * ompi_comm_size(module->comm)); + + for (i = 0; i < size; i++) { + ranks_in_grp[i] = i; + } + + ret = ompi_comm_group(module->comm, &win_group); + if (ret != OMPI_SUCCESS) { + free(ranks_in_grp); + free(ranks_in_win_grp); + return OMPI_ERROR; + } + + ret = ompi_group_translate_ranks(module->post_group, size, ranks_in_grp, + win_group, ranks_in_win_grp); + if (ret != OMPI_SUCCESS) { + free(ranks_in_grp); + free(ranks_in_win_grp); + ompi_group_free(&win_group); + return OMPI_ERROR; + } + + for (i = 0; i < size; i++) { + uint64_t remote_addr = (module->state_info_array)[ranks_in_win_grp[i]].addr + OSC_UCX_STATE_POST_INDEX_OFFSET; /* write to state.post_index on remote side */ + ucp_rkey_h rkey = (module->state_info_array)[ranks_in_win_grp[i]].rkey; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, ranks_in_win_grp[i]); + uint64_t curr_idx = 0, result = 0; + + /* fetch-and-add on the target's post_index first to reserve a post slot */ + status = ucp_atomic_fadd64(ep, 1, remote_addr, rkey, &result); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_fadd64 failed: %d\n", + __FILE__, __LINE__, status); + } + + /* the mask acts as a modulo only because OMPI_OSC_UCX_POST_PEER_MAX is a power of two */ + curr_idx = result & (OMPI_OSC_UCX_POST_PEER_MAX - 1); + + remote_addr = (module->state_info_array)[ranks_in_win_grp[i]].addr + OSC_UCX_STATE_POST_STATE_OFFSET + sizeof(uint64_t) * curr_idx; + + /* compare-and-swap (myrank + 1) 
into the reserved slot to deliver the post message; 0 means the slot is free */ + do { + status = ucp_atomic_cswap64(ep, 0, (uint64_t)myrank + 1, + remote_addr, rkey, &result); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_cswap64 failed: %d\n", + __FILE__, __LINE__, status);
+ } + + if (result == 0) + break; + + /* prevent circular wait by checking for post messages received */ + for (j = 0; j < OMPI_OSC_UCX_POST_PEER_MAX; j++) { + /* no post at this index (yet) */ + if (0 == module->state.post_state[j]) { + continue; + } + + ompi_osc_ucx_handle_incoming_post(module, &(module->state.post_state[j]), NULL, 0); + } + + usleep(100); + } while (1); + } + + free(ranks_in_grp); + free(ranks_in_win_grp); + ompi_group_free(&win_group); + } + + module->epoch_type.exposure = POST_WAIT_EPOCH; + + return ret; +} + +int ompi_osc_ucx_wait(struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int size; + + if (module->epoch_type.exposure != POST_WAIT_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + size = ompi_group_size(module->post_group); + + while (module->state.complete_count != (uint64_t)size) { + /* progress the worker in case remote updates to complete_count need local progress to land */ + ucp_worker_progress(mca_osc_ucx_component.ucp_worker); + } + + module->state.complete_count = 0; + + OBJ_RELEASE(module->post_group); + module->post_group = NULL; + + module->epoch_type.exposure = NONE_EPOCH; + + return OMPI_SUCCESS; +} + +int ompi_osc_ucx_test(struct ompi_win_t *win, int *flag) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int size; + + if (module->epoch_type.exposure != POST_WAIT_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + size = ompi_group_size(module->post_group); + + opal_progress(); + + if (module->state.complete_count == (uint64_t)size) { + OBJ_RELEASE(module->post_group); + module->post_group = NULL; + + module->state.complete_count = 0; + + module->epoch_type.exposure = NONE_EPOCH; + *flag = 1; + } else { + *flag = 0; + } + + return OMPI_SUCCESS; +} diff --git a/ompi/mca/osc/ucx/osc_ucx_comm.c b/ompi/mca/osc/ucx/osc_ucx_comm.c new file mode 100644 index 00000000000..22f4ce1e943 --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx_comm.c @@ -0,0 +1,1069 @@ +/* + * Copyright (C) 
Mellanox Technologies Ltd. 2001-2017.
ALL RIGHTS RESERVED. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "ompi/mca/osc/osc.h" +#include "ompi/mca/osc/base/base.h" +#include "ompi/mca/osc/base/osc_base_obj_convert.h" + +#include "osc_ucx.h" +#include "osc_ucx_request.h" + +typedef struct ucx_iovec { + void *addr; + size_t len; +} ucx_iovec_t; + +static inline int check_sync_state(ompi_osc_ucx_module_t *module, int target, + bool is_req_ops) { + if (is_req_ops == false) { + if (module->epoch_type.access == NONE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } else if (module->epoch_type.access == START_COMPLETE_EPOCH) { + int i, size = ompi_group_size(module->start_group); + for (i = 0; i < size; i++) { + if (module->start_grp_ranks[i] == target) { + break; + } + } + if (i == size) { + return OMPI_ERR_RMA_SYNC; + } + } else if (module->epoch_type.access == PASSIVE_EPOCH) { + ompi_osc_ucx_lock_t *item = NULL; + opal_hash_table_get_value_uint32(&module->outstanding_locks, (uint32_t) target, (void **) &item); + if (item == NULL) { + return OMPI_ERR_RMA_SYNC; + } + } + } else { + if (module->epoch_type.access != PASSIVE_EPOCH && + module->epoch_type.access != PASSIVE_ALL_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } else if (module->epoch_type.access == PASSIVE_EPOCH) { + ompi_osc_ucx_lock_t *item = NULL; + opal_hash_table_get_value_uint32(&module->outstanding_locks, (uint32_t) target, (void **) &item); + if (item == NULL) { + return OMPI_ERR_RMA_SYNC; + } + } + } + return OMPI_SUCCESS; +} + +static inline int incr_and_check_ops_num(ompi_osc_ucx_module_t *module, int target, + ucp_ep_h ep) { + ucs_status_t status; + + module->global_ops_num++; + module->per_target_ops_nums[target]++; + if (module->global_ops_num >= OSC_UCX_OPS_THRESHOLD) { + /* TODO: ucp_ep_flush needs to be replaced with its non-blocking counterpart + * when it is implemented in UCX */ + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, 
ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + module->global_ops_num -= module->per_target_ops_nums[target]; + module->per_target_ops_nums[target] = 0; + } + return OMPI_SUCCESS; +} + +static inline int create_iov_list(const void *addr, int count, ompi_datatype_t *datatype, + ucx_iovec_t **ucx_iov, uint32_t *ucx_iov_count) { + int ret = OMPI_SUCCESS; + size_t size; + bool done = false; + opal_convertor_t convertor; + uint32_t iov_count, iov_idx; + struct iovec iov[OSC_UCX_IOVEC_MAX]; + uint32_t ucx_iov_idx; + ucx_iovec_t *new_iov = NULL; + + OBJ_CONSTRUCT(&convertor, opal_convertor_t); + ret = opal_convertor_copy_and_prepare_for_send(ompi_mpi_local_convertor, + &datatype->super, count, + addr, 0, &convertor); + if (ret != OMPI_SUCCESS) { + OBJ_DESTRUCT(&convertor); + return ret; + } + + (*ucx_iov_count) = 0; + ucx_iov_idx = 0; + + do { + iov_count = OSC_UCX_IOVEC_MAX; + iov_idx = 0; + + done = opal_convertor_raw(&convertor, iov, &iov_count, &size); + + (*ucx_iov_count) += iov_count; + /* realloc into a temporary so the old list is not leaked on failure */ + new_iov = (ucx_iovec_t *)realloc((*ucx_iov), (*ucx_iov_count) * sizeof(ucx_iovec_t)); + if (new_iov == NULL) { + free(*ucx_iov); + (*ucx_iov) = NULL; + (*ucx_iov_count) = 0; + opal_convertor_cleanup(&convertor); + OBJ_DESTRUCT(&convertor); + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + (*ucx_iov) = new_iov; + + while (iov_idx != iov_count) { + (*ucx_iov)[ucx_iov_idx].addr = iov[iov_idx].iov_base; + (*ucx_iov)[ucx_iov_idx].len = iov[iov_idx].iov_len; + ucx_iov_idx++; + iov_idx++; + } + + assert((*ucx_iov_count) == ucx_iov_idx); + + } while (!done); + + opal_convertor_cleanup(&convertor); + OBJ_DESTRUCT(&convertor); + + return ret; +} + +static inline int ddt_put_get(ompi_osc_ucx_module_t *module, + const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + bool is_origin_contig, ptrdiff_t origin_lb, + int target, 
ucp_ep_h ep, uint64_t remote_addr, ucp_rkey_h rkey, + int target_count, struct ompi_datatype_t *target_dt, + bool is_target_contig, ptrdiff_t target_lb, bool is_get) { + ucx_iovec_t *origin_ucx_iov = NULL, *target_ucx_iov = NULL; + uint32_t origin_ucx_iov_count = 0,
target_ucx_iov_count = 0; + uint32_t origin_ucx_iov_idx = 0, target_ucx_iov_idx = 0; + ucs_status_t status; + int ret = OMPI_SUCCESS; + + if (!is_origin_contig) { + ret = create_iov_list(origin_addr, origin_count, origin_dt, + &origin_ucx_iov, &origin_ucx_iov_count); + if (ret != OMPI_SUCCESS) { + return ret; + } + } + + if (!is_target_contig) { + ret = create_iov_list(NULL, target_count, target_dt, + &target_ucx_iov, &target_ucx_iov_count); + if (ret != OMPI_SUCCESS) { + return ret; + } + } + + if (!is_origin_contig && !is_target_contig) { + size_t curr_len = 0; + while (origin_ucx_iov_idx < origin_ucx_iov_count) { + curr_len = MIN(origin_ucx_iov[origin_ucx_iov_idx].len, + target_ucx_iov[target_ucx_iov_idx].len); + + if (!is_get) { + status = ucp_put_nbi(ep, origin_ucx_iov[origin_ucx_iov_idx].addr, curr_len, + remote_addr + (uint64_t)(target_ucx_iov[target_ucx_iov_idx].addr), rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_put_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } else { + status = ucp_get_nbi(ep, origin_ucx_iov[origin_ucx_iov_idx].addr, curr_len, + remote_addr + (uint64_t)(target_ucx_iov[target_ucx_iov_idx].addr), rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_get_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } + + ret = incr_and_check_ops_num(module, target, ep); + if (ret != OMPI_SUCCESS) { + return ret; + } + + origin_ucx_iov[origin_ucx_iov_idx].addr = (void *)((intptr_t)origin_ucx_iov[origin_ucx_iov_idx].addr + curr_len); + target_ucx_iov[target_ucx_iov_idx].addr = (void *)((intptr_t)target_ucx_iov[target_ucx_iov_idx].addr + curr_len); + + origin_ucx_iov[origin_ucx_iov_idx].len -= curr_len; + if (origin_ucx_iov[origin_ucx_iov_idx].len == 0) { + origin_ucx_iov_idx++; + } + 
target_ucx_iov[target_ucx_iov_idx].len -= curr_len; + if (target_ucx_iov[target_ucx_iov_idx].len == 0) { + target_ucx_iov_idx++; + } + } + + assert(origin_ucx_iov_idx == origin_ucx_iov_count && + target_ucx_iov_idx == target_ucx_iov_count); + + } else if (!is_origin_contig) { + size_t prev_len = 0; + while (origin_ucx_iov_idx < origin_ucx_iov_count) { + if (!is_get) { + status = ucp_put_nbi(ep, origin_ucx_iov[origin_ucx_iov_idx].addr, + origin_ucx_iov[origin_ucx_iov_idx].len, + remote_addr + target_lb + prev_len, rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_put_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } else { + status = ucp_get_nbi(ep, origin_ucx_iov[origin_ucx_iov_idx].addr, + origin_ucx_iov[origin_ucx_iov_idx].len, + remote_addr + target_lb + prev_len, rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_get_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } + + ret = incr_and_check_ops_num(module, target, ep); + if (ret != OMPI_SUCCESS) { + return ret; + } + + prev_len += origin_ucx_iov[origin_ucx_iov_idx].len; + origin_ucx_iov_idx++; + } + } else { + size_t prev_len = 0; + while (target_ucx_iov_idx < target_ucx_iov_count) { + if (!is_get) { + status = ucp_put_nbi(ep, (void *)((intptr_t)origin_addr + origin_lb + prev_len), + target_ucx_iov[target_ucx_iov_idx].len, + remote_addr + (uint64_t)(target_ucx_iov[target_ucx_iov_idx].addr), rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_put_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } else { + status = ucp_get_nbi(ep, (void *)((intptr_t)origin_addr + origin_lb + prev_len), + target_ucx_iov[target_ucx_iov_idx].len, + remote_addr + 
(uint64_t)(target_ucx_iov[target_ucx_iov_idx].addr), rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_get_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } + + ret = incr_and_check_ops_num(module, target, ep); + if (ret != OMPI_SUCCESS) { + return ret; + } + + prev_len += target_ucx_iov[target_ucx_iov_idx].len; + target_ucx_iov_idx++; + } + } + + if (origin_ucx_iov != NULL) { + free(origin_ucx_iov); + } + if (target_ucx_iov != NULL) { + free(target_ucx_iov); + } + + return ret; +} + +static inline int start_atomicity(ompi_osc_ucx_module_t *module, ucp_ep_h ep, int target) { + uint64_t result_value = -1; + ucp_rkey_h rkey = (module->state_info_array)[target].rkey; + uint64_t remote_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_ACC_LOCK_OFFSET; + ucs_status_t status; + + while (result_value != TARGET_LOCK_UNLOCKED) { + status = ucp_atomic_cswap64(ep, TARGET_LOCK_UNLOCKED, + TARGET_LOCK_EXCLUSIVE, + remote_addr, rkey, &result_value); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_cswap64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } + + return OMPI_SUCCESS; +} + +static inline int end_atomicity(ompi_osc_ucx_module_t *module, ucp_ep_h ep, int target) { + uint64_t result_value = 0; + ucp_rkey_h rkey = (module->state_info_array)[target].rkey; + uint64_t remote_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_ACC_LOCK_OFFSET; + ucs_status_t status; + + status = ucp_atomic_swap64(ep, TARGET_LOCK_UNLOCKED, + remote_addr, rkey, &result_value); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_swap64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + assert(result_value == TARGET_LOCK_EXCLUSIVE); + + return OMPI_SUCCESS; +} + +static inline int 
get_dynamic_win_info(uint64_t remote_addr, ompi_osc_ucx_module_t *module, + ucp_ep_h ep, int target) { + ucp_rkey_h state_rkey = (module->state_info_array)[target].rkey; + uint64_t remote_state_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_DYNAMIC_WIN_CNT_OFFSET; + size_t len = sizeof(uint64_t) + sizeof(ompi_osc_dynamic_win_info_t) * OMPI_OSC_UCX_ATTACH_MAX; + char *temp_buf = malloc(len); + ompi_osc_dynamic_win_info_t *temp_dynamic_wins; + int win_count, contain, insert = -1; + uint64_t remote_win_count = 0; + ucs_status_t status; + + if (temp_buf == NULL) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + if ((module->win_info_array[target]).rkey_init == true) { + ucp_rkey_destroy((module->win_info_array[target]).rkey); + (module->win_info_array[target]).rkey_init = false; + } + + status = ucp_get_nbi(ep, (void *)temp_buf, len, remote_state_addr, state_rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_get_nbi failed: %d\n", + __FILE__, __LINE__, status); + free(temp_buf); + return OMPI_ERROR; + } + + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + free(temp_buf); + return OMPI_ERROR; + } + + /* the remote window count is 64 bits wide; read it into a uint64_t, not an int */ + memcpy(&remote_win_count, temp_buf, sizeof(uint64_t)); + win_count = (int)remote_win_count; + assert(win_count > 0 && win_count <= OMPI_OSC_UCX_ATTACH_MAX); + + temp_dynamic_wins = (ompi_osc_dynamic_win_info_t *)(temp_buf + sizeof(uint64_t)); + contain = ompi_osc_find_attached_region_position(temp_dynamic_wins, 0, win_count, + remote_addr, 1, &insert); + assert(contain >= 0 && contain < win_count); + + status = ucp_ep_rkey_unpack(ep, temp_dynamic_wins[contain].rkey_buffer, + &((module->win_info_array[target]).rkey)); + if (status != UCS_OK) { + 
opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_rkey_unpack failed: %d\n", + __FILE__, __LINE__, status); + free(temp_buf); + return OMPI_ERROR; + } + + (module->win_info_array[target]).rkey_init = true; + + free(temp_buf); + + return OMPI_SUCCESS; +} + +int
ompi_osc_ucx_put(const void *origin_addr, int origin_count, struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + uint64_t remote_addr = (module->win_info_array[target]).addr + target_disp * OSC_UCX_GET_DISP(module, target); + ucp_rkey_h rkey; + bool is_origin_contig = false, is_target_contig = false; + ptrdiff_t origin_lb, origin_extent, target_lb, target_extent; + ucs_status_t status; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, false); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (module->flavor == MPI_WIN_FLAVOR_DYNAMIC) { + status = get_dynamic_win_info(remote_addr, module, ep, target); + if (status != UCS_OK) { + return OMPI_ERROR; + } + } + + rkey = (module->win_info_array[target]).rkey; + + ompi_datatype_get_true_extent(origin_dt, &origin_lb, &origin_extent); + ompi_datatype_get_true_extent(target_dt, &target_lb, &target_extent); + + is_origin_contig = ompi_datatype_is_contiguous_memory_layout(origin_dt, origin_count); + is_target_contig = ompi_datatype_is_contiguous_memory_layout(target_dt, target_count); + + if (is_origin_contig && is_target_contig) { + /* fast path */ + size_t origin_len; + + ompi_datatype_type_size(origin_dt, &origin_len); + origin_len *= origin_count; + + status = ucp_put_nbi(ep, (void *)((intptr_t)origin_addr + origin_lb), origin_len, + remote_addr + target_lb, rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_put_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + return incr_and_check_ops_num(module, target, ep); + } else { + return ddt_put_get(module, origin_addr, origin_count, origin_dt, is_origin_contig, + origin_lb, target, ep, remote_addr, rkey, 
target_count, target_dt, + is_target_contig, target_lb, false); + } +} + +int ompi_osc_ucx_get(void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + uint64_t remote_addr = (module->win_info_array[target]).addr + target_disp * OSC_UCX_GET_DISP(module, target); + ucp_rkey_h rkey; + ptrdiff_t origin_lb, origin_extent, target_lb, target_extent; + bool is_origin_contig = false, is_target_contig = false; + ucs_status_t status; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, false); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (module->flavor == MPI_WIN_FLAVOR_DYNAMIC) { + status = get_dynamic_win_info(remote_addr, module, ep, target); + if (status != UCS_OK) { + return OMPI_ERROR; + } + } + + rkey = (module->win_info_array[target]).rkey; + + ompi_datatype_get_true_extent(origin_dt, &origin_lb, &origin_extent); + ompi_datatype_get_true_extent(target_dt, &target_lb, &target_extent); + + is_origin_contig = ompi_datatype_is_contiguous_memory_layout(origin_dt, origin_count); + is_target_contig = ompi_datatype_is_contiguous_memory_layout(target_dt, target_count); + + if (is_origin_contig && is_target_contig) { + /* fast path */ + size_t origin_len; + + ompi_datatype_type_size(origin_dt, &origin_len); + origin_len *= origin_count; + + status = ucp_get_nbi(ep, (void *)((intptr_t)origin_addr + origin_lb), origin_len, + remote_addr + target_lb, rkey); + if (status != UCS_OK && status != UCS_INPROGRESS) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_get_nbi failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + return incr_and_check_ops_num(module, target, ep); + } else { + return ddt_put_get(module, origin_addr, origin_count, 
origin_dt, is_origin_contig, + origin_lb, target, ep, remote_addr, rkey, target_count, target_dt, + is_target_contig, target_lb, true); + } +} + +int ompi_osc_ucx_accumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, + struct ompi_op_t *op, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, false); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (op == &ompi_mpi_op_no_op.op) { + return ret; + } + + ret = start_atomicity(module, ep, target); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (op == &ompi_mpi_op_replace.op) { + ret = ompi_osc_ucx_put(origin_addr, origin_count, origin_dt, target, + target_disp, target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + } else { + void *temp_addr = NULL; + uint32_t temp_count; + ompi_datatype_t *temp_dt; + ptrdiff_t temp_lb, temp_extent; + ucs_status_t status; + bool is_origin_contig = ompi_datatype_is_contiguous_memory_layout(origin_dt, origin_count); + + if (ompi_datatype_is_predefined(target_dt)) { + temp_dt = target_dt; + temp_count = target_count; + } else { + ret = ompi_osc_base_get_primitive_type_info(target_dt, &temp_dt, &temp_count); + if (ret != OMPI_SUCCESS) { + return ret; + } + } + ompi_datatype_get_true_extent(temp_dt, &temp_lb, &temp_extent); + temp_addr = malloc(temp_extent * temp_count); + if (temp_addr == NULL) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = ompi_osc_ucx_get(temp_addr, (int)temp_count, temp_dt, + target, target_disp, target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: 
%d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + if (ompi_datatype_is_predefined(origin_dt) || is_origin_contig) { + ompi_op_reduce(op, (void *)origin_addr, temp_addr, (int)temp_count, temp_dt); + } else { + ucx_iovec_t *origin_ucx_iov = NULL; + uint32_t origin_ucx_iov_count = 0; + uint32_t origin_ucx_iov_idx = 0; + + ret = create_iov_list(origin_addr, origin_count, origin_dt, + &origin_ucx_iov, &origin_ucx_iov_count); + if (ret != OMPI_SUCCESS) { + free(temp_addr); + return ret; + } + + if ((op != &ompi_mpi_op_maxloc.op && op != &ompi_mpi_op_minloc.op) || + ompi_datatype_is_contiguous_memory_layout(temp_dt, temp_count)) { + size_t temp_size; + /* walk the temporary buffer with a separate cursor so that temp_addr + * still points at the start of the buffer for the put and free below */ + char *curr_temp_addr = (char *)temp_addr; + ompi_datatype_type_size(temp_dt, &temp_size); + while (origin_ucx_iov_idx < origin_ucx_iov_count) { + int curr_count = origin_ucx_iov[origin_ucx_iov_idx].len / temp_size; + ompi_op_reduce(op, origin_ucx_iov[origin_ucx_iov_idx].addr, + curr_temp_addr, curr_count, temp_dt); + curr_temp_addr += curr_count * temp_size; + origin_ucx_iov_idx++; + } + } else { + int i; + void *curr_origin_addr = origin_ucx_iov[origin_ucx_iov_idx].addr; + for (i = 0; i < (int)temp_count; i++) { + ompi_op_reduce(op, curr_origin_addr, + (void *)((char *)temp_addr + i * temp_extent), + 1, temp_dt); + curr_origin_addr = (void *)((char *)curr_origin_addr + temp_extent); + /* advance to the next iovec entry only once the current one is exhausted */ + if (origin_ucx_iov_idx + 1 < origin_ucx_iov_count && + curr_origin_addr >= (void *)((char *)origin_ucx_iov[origin_ucx_iov_idx].addr + origin_ucx_iov[origin_ucx_iov_idx].len)) { + origin_ucx_iov_idx++; + curr_origin_addr = origin_ucx_iov[origin_ucx_iov_idx].addr; + } + } + } + + free(origin_ucx_iov); + } + + ret = ompi_osc_ucx_put(temp_addr, (int)temp_count, temp_dt, target, target_disp, + target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + 
return ret; + } + + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + 
free(temp_addr); + } + + ret = end_atomicity(module, ep, target); + + return ret; +} + +int ompi_osc_ucx_compare_and_swap(const void *origin_addr, const void *compare_addr, + void *result_addr, struct ompi_datatype_t *dt, + int target, ptrdiff_t target_disp, + struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t *)win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + uint64_t remote_addr = (module->win_info_array[target]).addr + target_disp * OSC_UCX_GET_DISP(module, target); + ucp_rkey_h rkey; + size_t dt_bytes; + ompi_osc_ucx_internal_request_t *req = NULL; + int ret = OMPI_SUCCESS; + ucs_status_t status; + + ret = check_sync_state(module, target, false); + if (ret != OMPI_SUCCESS) { + return ret; + } + + ret = start_atomicity(module, ep, target); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (module->flavor == MPI_WIN_FLAVOR_DYNAMIC) { + status = get_dynamic_win_info(remote_addr, module, ep, target); + if (status != UCS_OK) { + return OMPI_ERROR; + } + } + + rkey = (module->win_info_array[target]).rkey; + + ompi_datatype_type_size(dt, &dt_bytes); + memcpy(result_addr, origin_addr, dt_bytes); + req = ucp_atomic_fetch_nb(ep, UCP_ATOMIC_FETCH_OP_CSWAP, *(uint64_t *)compare_addr, + result_addr, dt_bytes, remote_addr, rkey, req_completion); + if (UCS_PTR_IS_PTR(req)) { + ucp_request_release(req); + } + + ret = incr_and_check_ops_num(module, target, ep); + if (ret != OMPI_SUCCESS) { + return ret; + } + + return end_atomicity(module, ep, target); +} + +int ompi_osc_ucx_fetch_and_op(const void *origin_addr, void *result_addr, + struct ompi_datatype_t *dt, int target, + ptrdiff_t target_disp, struct ompi_op_t *op, + struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, false); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (op == &ompi_mpi_op_no_op.op || op == &ompi_mpi_op_replace.op || + 
op == &ompi_mpi_op_sum.op) { + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + uint64_t remote_addr = (module->win_info_array[target]).addr + target_disp * OSC_UCX_GET_DISP(module, target); + ucp_rkey_h rkey; + uint64_t value = *(uint64_t *)origin_addr; + ucp_atomic_fetch_op_t opcode; + size_t dt_bytes; + ompi_osc_ucx_internal_request_t *req = NULL; + ucs_status_t status; + + ret = start_atomicity(module, ep, target); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (module->flavor == MPI_WIN_FLAVOR_DYNAMIC) { + status = get_dynamic_win_info(remote_addr, module, ep, target); + if (status != UCS_OK) { + return OMPI_ERROR; + } + } + + rkey = (module->win_info_array[target]).rkey; + + ompi_datatype_type_size(dt, &dt_bytes); + + if (op == &ompi_mpi_op_replace.op) { + opcode = UCP_ATOMIC_FETCH_OP_SWAP; + } else { + opcode = UCP_ATOMIC_FETCH_OP_FADD; + if (op == &ompi_mpi_op_no_op.op) { + value = 0; + } + } + + req = ucp_atomic_fetch_nb(ep, opcode, value, result_addr, + dt_bytes, remote_addr, rkey, req_completion); + if (UCS_PTR_IS_PTR(req)) { + ucp_request_release(req); + } + + ret = incr_and_check_ops_num(module, target, ep); + if (ret != OMPI_SUCCESS) { + return ret; + } + + return end_atomicity(module, ep, target); + } else { + return ompi_osc_ucx_get_accumulate(origin_addr, 1, dt, result_addr, 1, dt, + target, target_disp, 1, dt, op, win); + } +} + +int ompi_osc_ucx_get_accumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + void *result_addr, int result_count, + struct ompi_datatype_t *result_dt, + int target, ptrdiff_t target_disp, + int target_count, struct ompi_datatype_t *target_dt, + struct ompi_op_t *op, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, false); + if (ret != OMPI_SUCCESS) { + return ret; + } + + ret = 
start_atomicity(module, ep, target); + if (ret != OMPI_SUCCESS) { + return ret; + } + + ret = ompi_osc_ucx_get(result_addr, result_count, result_dt, target, + target_disp, target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if (op != &ompi_mpi_op_no_op.op) { + if (op == &ompi_mpi_op_replace.op) { + ret = ompi_osc_ucx_put(origin_addr, origin_count, origin_dt, + target, target_disp, target_count, + target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + } else { + void *temp_addr = NULL; + uint32_t temp_count; + ompi_datatype_t *temp_dt; + ptrdiff_t temp_lb, temp_extent; + ucs_status_t status; + bool is_origin_contig = ompi_datatype_is_contiguous_memory_layout(origin_dt, origin_count); + + if (ompi_datatype_is_predefined(target_dt)) { + temp_dt = target_dt; + temp_count = target_count; + } else { + ret = ompi_osc_base_get_primitive_type_info(target_dt, &temp_dt, &temp_count); + if (ret != OMPI_SUCCESS) { + return ret; + } + } + ompi_datatype_get_true_extent(temp_dt, &temp_lb, &temp_extent); + temp_addr = malloc(temp_extent * temp_count); + if (temp_addr == NULL) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = ompi_osc_ucx_get(temp_addr, (int)temp_count, temp_dt, + target, target_disp, target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + if (ompi_datatype_is_predefined(origin_dt) || is_origin_contig) { + ompi_op_reduce(op, (void *)origin_addr, temp_addr, (int)temp_count, temp_dt); + } else { + ucx_iovec_t *origin_ucx_iov = NULL; + uint32_t origin_ucx_iov_count = 0; + uint32_t origin_ucx_iov_idx = 0; + + ret = create_iov_list(origin_addr, origin_count, origin_dt, + &origin_ucx_iov, &origin_ucx_iov_count); + if (ret != OMPI_SUCCESS) { + return ret; + } + + if ((op != 
&ompi_mpi_op_maxloc.op && op != &ompi_mpi_op_minloc.op) || + ompi_datatype_is_contiguous_memory_layout(temp_dt, temp_count)) { + size_t temp_size; + /* walk the temporary buffer with a separate cursor so that temp_addr + * still points at the start of the buffer for the put and free below */ + char *curr_temp_addr = (char *)temp_addr; + ompi_datatype_type_size(temp_dt, &temp_size); + while (origin_ucx_iov_idx < origin_ucx_iov_count) { + int curr_count = origin_ucx_iov[origin_ucx_iov_idx].len / temp_size; + ompi_op_reduce(op, origin_ucx_iov[origin_ucx_iov_idx].addr, + curr_temp_addr, curr_count, temp_dt); + curr_temp_addr += curr_count * temp_size; + origin_ucx_iov_idx++; + } + } else { + int i; + void *curr_origin_addr = origin_ucx_iov[origin_ucx_iov_idx].addr; + for (i = 0; i < (int)temp_count; i++) { + ompi_op_reduce(op, curr_origin_addr, + (void *)((char *)temp_addr + i * temp_extent), + 1, temp_dt); + curr_origin_addr = (void *)((char *)curr_origin_addr + temp_extent); + /* advance to the next iovec entry only once the current one is exhausted */ + if (origin_ucx_iov_idx + 1 < origin_ucx_iov_count && + curr_origin_addr >= (void *)((char *)origin_ucx_iov[origin_ucx_iov_idx].addr + origin_ucx_iov[origin_ucx_iov_idx].len)) { + origin_ucx_iov_idx++; + curr_origin_addr = origin_ucx_iov[origin_ucx_iov_idx].addr; + } + } + } + free(origin_ucx_iov); + } + + ret = ompi_osc_ucx_put(temp_addr, (int)temp_count, temp_dt, target, target_disp, + target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + free(temp_addr); + } + } + + ret = end_atomicity(module, ep, target); + + return ret; +} + +int ompi_osc_ucx_rput(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, + 
struct ompi_win_t *win, struct ompi_request_t **request) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + uint64_t remote_addr = 
(module->state_info_array[target]).addr + OSC_UCX_STATE_REQ_FLAG_OFFSET; + /* the request flag lives in the target's state region, so the completion + * atomic below must use the state rkey rather than the window rkey; + * ompi_osc_ucx_put() resolves dynamic-window rkeys by itself */ + ucp_rkey_h rkey = (module->state_info_array[target]).rkey; + ompi_osc_ucx_request_t *ucx_req = NULL; + ompi_osc_ucx_internal_request_t *internal_req = NULL; + ucs_status_t status; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, true); + if (ret != OMPI_SUCCESS) { + return ret; + } + + OMPI_OSC_UCX_REQUEST_ALLOC(win, ucx_req); + if (NULL == ucx_req) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = ompi_osc_ucx_put(origin_addr, origin_count, origin_dt, target, target_disp, + target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + status = ucp_worker_fence(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_fence failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + internal_req = ucp_atomic_fetch_nb(ep, UCP_ATOMIC_FETCH_OP_FADD, 0, + &(module->req_result), sizeof(uint64_t), + remote_addr, rkey, req_completion); + + if (UCS_PTR_IS_PTR(internal_req)) { + internal_req->external_req = ucx_req; + mca_osc_ucx_component.num_incomplete_req_ops++; + } else { + ompi_request_complete(&ucx_req->super, true); + } + + *request = &ucx_req->super; + + return incr_and_check_ops_num(module, target, ep); +} + +int ompi_osc_ucx_rget(void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_win_t *win, + struct ompi_request_t **request) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + uint64_t remote_addr = (module->state_info_array[target]).addr + 
OSC_UCX_STATE_REQ_FLAG_OFFSET; + /* the request flag lives in the target's state region, so the completion + * atomic below must use the state rkey rather than the window rkey; + * ompi_osc_ucx_get() resolves dynamic-window rkeys by itself */ + ucp_rkey_h rkey = (module->state_info_array[target]).rkey; + ompi_osc_ucx_request_t *ucx_req = NULL; + ompi_osc_ucx_internal_request_t *internal_req = NULL; + ucs_status_t status; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, true); + if (ret != OMPI_SUCCESS) { + return ret; + } + + OMPI_OSC_UCX_REQUEST_ALLOC(win, ucx_req); + if (NULL == ucx_req) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = ompi_osc_ucx_get(origin_addr, origin_count, origin_dt, target, target_disp, + target_count, target_dt, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + status = ucp_worker_fence(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_fence failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + internal_req = ucp_atomic_fetch_nb(ep, UCP_ATOMIC_FETCH_OP_FADD, 0, + &(module->req_result), sizeof(uint64_t), + remote_addr, rkey, req_completion); + + if (UCS_PTR_IS_PTR(internal_req)) { + internal_req->external_req = ucx_req; + mca_osc_ucx_component.num_incomplete_req_ops++; + } else { + ompi_request_complete(&ucx_req->super, true); + } + + *request = &ucx_req->super; + + return incr_and_check_ops_num(module, target, ep); +} + +int ompi_osc_ucx_raccumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_dt, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_dt, struct ompi_op_t *op, + struct ompi_win_t *win, struct ompi_request_t **request) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ompi_osc_ucx_request_t *ucx_req = NULL; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, true); + if (ret != 
OMPI_SUCCESS) 
{ + return ret; + } + + OMPI_OSC_UCX_REQUEST_ALLOC(win, ucx_req); + if (NULL == ucx_req) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = ompi_osc_ucx_accumulate(origin_addr, origin_count, origin_dt, target, target_disp, + target_count, target_dt, op, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + ompi_request_complete(&ucx_req->super, true); + *request = &ucx_req->super; + + return ret; +} + +int ompi_osc_ucx_rget_accumulate(const void *origin_addr, int origin_count, + struct ompi_datatype_t *origin_datatype, + void *result_addr, int result_count, + struct ompi_datatype_t *result_datatype, + int target, ptrdiff_t target_disp, int target_count, + struct ompi_datatype_t *target_datatype, + struct ompi_op_t *op, struct ompi_win_t *win, + struct ompi_request_t **request) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ompi_osc_ucx_request_t *ucx_req = NULL; + int ret = OMPI_SUCCESS; + + ret = check_sync_state(module, target, true); + if (ret != OMPI_SUCCESS) { + return ret; + } + + OMPI_OSC_UCX_REQUEST_ALLOC(win, ucx_req); + if (NULL == ucx_req) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = ompi_osc_ucx_get_accumulate(origin_addr, origin_count, origin_datatype, + result_addr, result_count, result_datatype, + target, target_disp, target_count, + target_datatype, op, win); + if (ret != OMPI_SUCCESS) { + return ret; + } + + ompi_request_complete(&ucx_req->super, true); + + *request = &ucx_req->super; + + return ret; +} diff --git a/ompi/mca/osc/ucx/osc_ucx_component.c b/ompi/mca/osc/ucx/osc_ucx_component.c new file mode 100644 index 00000000000..0c518e371e4 --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx_component.c @@ -0,0 +1,846 @@ +/* + * Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "ompi/mca/osc/osc.h" +#include "ompi/mca/osc/base/base.h" +#include "ompi/mca/osc/base/osc_base_obj_convert.h" + +#include "osc_ucx.h" +#include "osc_ucx_request.h" + +static int component_open(void); +static int component_register(void); +static int component_init(bool enable_progress_threads, bool enable_mpi_threads); +static int component_finalize(void); +static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor); +static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, + struct ompi_communicator_t *comm, struct opal_info_t *info, + int flavor, int *model); + +ompi_osc_ucx_component_t mca_osc_ucx_component = { + { /* ompi_osc_base_component_t */ + .osc_version = { + OMPI_OSC_BASE_VERSION_3_0_0, + .mca_component_name = "ucx", + MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, + OMPI_RELEASE_VERSION), + .mca_open_component = component_open, + .mca_register_component_params = component_register, + }, + .osc_data = { + /* The component is not checkpoint ready */ + MCA_BASE_METADATA_PARAM_NONE + }, + .osc_init = component_init, + .osc_query = component_query, + .osc_select = component_select, + .osc_finalize = component_finalize, + } +}; + +ompi_osc_ucx_module_t ompi_osc_ucx_module_template = { + { + .osc_win_attach = ompi_osc_ucx_win_attach, + .osc_win_detach = ompi_osc_ucx_win_detach, + .osc_free = ompi_osc_ucx_free, + + .osc_put = ompi_osc_ucx_put, + .osc_get = ompi_osc_ucx_get, + .osc_accumulate = ompi_osc_ucx_accumulate, + .osc_compare_and_swap = ompi_osc_ucx_compare_and_swap, + .osc_fetch_and_op = ompi_osc_ucx_fetch_and_op, + .osc_get_accumulate = ompi_osc_ucx_get_accumulate, + + .osc_rput = ompi_osc_ucx_rput, + .osc_rget = ompi_osc_ucx_rget, + .osc_raccumulate = 
ompi_osc_ucx_raccumulate, + .osc_rget_accumulate = ompi_osc_ucx_rget_accumulate, + + .osc_fence = ompi_osc_ucx_fence, + + .osc_start = ompi_osc_ucx_start, + .osc_complete = ompi_osc_ucx_complete, + .osc_post = ompi_osc_ucx_post, + .osc_wait = ompi_osc_ucx_wait, + .osc_test = ompi_osc_ucx_test, + + .osc_lock = ompi_osc_ucx_lock, + .osc_unlock = ompi_osc_ucx_unlock, + .osc_lock_all = ompi_osc_ucx_lock_all, + .osc_unlock_all = ompi_osc_ucx_unlock_all, + + .osc_sync = ompi_osc_ucx_sync, + .osc_flush = ompi_osc_ucx_flush, + .osc_flush_all = ompi_osc_ucx_flush_all, + .osc_flush_local = ompi_osc_ucx_flush_local, + .osc_flush_local_all = ompi_osc_ucx_flush_local_all, + } +}; + +static int component_open(void) { + return OMPI_SUCCESS; +} + +static int component_register(void) { + char *description_str; + mca_osc_ucx_component.priority = 0; + asprintf(&description_str, "Priority of the osc/ucx component (default: %d)", + mca_osc_ucx_component.priority); + (void) mca_base_component_var_register(&mca_osc_ucx_component.super.osc_version, "priority", description_str, + MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, OPAL_INFO_LVL_3, + MCA_BASE_VAR_SCOPE_GROUP, &mca_osc_ucx_component.priority); + free(description_str); + + return OMPI_SUCCESS; +} + +static int progress_callback(void) { + if (mca_osc_ucx_component.ucp_worker != NULL && + mca_osc_ucx_component.num_incomplete_req_ops > 0) { + ucp_worker_progress(mca_osc_ucx_component.ucp_worker); + } + return 0; +} + +static int component_init(bool enable_progress_threads, bool enable_mpi_threads) { + ucp_config_t *config = NULL; + ucp_params_t context_params; + bool progress_registered = false, requests_created = false; + int ret = OMPI_SUCCESS; + ucs_status_t status; + + mca_osc_ucx_component.ucp_context = NULL; + mca_osc_ucx_component.ucp_worker = NULL; + mca_osc_ucx_component.enable_mpi_threads = enable_mpi_threads; + + status = ucp_config_read("MPI", NULL, &config); + if (UCS_OK != status) { + opal_output_verbose(1, 
ompi_osc_base_framework.framework_output, + "%s:%d: ucp_config_read failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + OBJ_CONSTRUCT(&mca_osc_ucx_component.requests, opal_free_list_t); + requests_created = true; + ret = opal_free_list_init (&mca_osc_ucx_component.requests, + sizeof(ompi_osc_ucx_request_t), + opal_cache_line_size, + OBJ_CLASS(ompi_osc_ucx_request_t), + 0, 0, 8, 0, 8, NULL, 0, NULL, NULL, NULL); + if (OMPI_SUCCESS != ret) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: opal_free_list_init failed: %d\n", + __FILE__, __LINE__, ret); + ucp_config_release(config); + goto error; + } + + mca_osc_ucx_component.num_incomplete_req_ops = 0; + + ret = opal_progress_register(progress_callback); + if (OMPI_SUCCESS != ret) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: opal_progress_register failed: %d\n", + __FILE__, __LINE__, ret); + ucp_config_release(config); + goto error; + } + progress_registered = true; + + /* initialize UCP context */ + + memset(&context_params, 0, sizeof(context_params)); + context_params.field_mask = UCP_PARAM_FIELD_FEATURES | + UCP_PARAM_FIELD_MT_WORKERS_SHARED | + UCP_PARAM_FIELD_ESTIMATED_NUM_EPS | + UCP_PARAM_FIELD_REQUEST_INIT | + UCP_PARAM_FIELD_REQUEST_SIZE; + context_params.features = UCP_FEATURE_RMA | UCP_FEATURE_AMO32 | UCP_FEATURE_AMO64; + context_params.mt_workers_shared = 0; + context_params.estimated_num_eps = ompi_proc_world_size(); + context_params.request_init = internal_req_init; + context_params.request_size = sizeof(ompi_osc_ucx_internal_request_t); + + status = ucp_init(&context_params, config, &mca_osc_ucx_component.ucp_context); + ucp_config_release(config); + if (UCS_OK != status) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_init failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + return ret; + error: + if (progress_registered) opal_progress_unregister(progress_callback); + if (requests_created) 
OBJ_DESTRUCT(&mca_osc_ucx_component.requests); + if (mca_osc_ucx_component.ucp_context) ucp_cleanup(mca_osc_ucx_component.ucp_context); + return ret; +} + +static int component_finalize(void) { + int i; + for (i = 0; i < ompi_proc_world_size(); i++) { + ucp_ep_h ep = OSC_UCX_GET_EP(&(ompi_mpi_comm_world.comm), i); + if (ep != NULL) { + ucp_ep_destroy(ep); + } + } + + if (mca_osc_ucx_component.ucp_worker != NULL) { + ucp_worker_destroy(mca_osc_ucx_component.ucp_worker); + } + + assert(mca_osc_ucx_component.num_incomplete_req_ops == 0); + OBJ_DESTRUCT(&mca_osc_ucx_component.requests); + opal_progress_unregister(progress_callback); + ucp_cleanup(mca_osc_ucx_component.ucp_context); + return OMPI_SUCCESS; +} + +static int component_query(struct ompi_win_t *win, void **base, size_t size, int disp_unit, + struct ompi_communicator_t *comm, struct opal_info_t *info, int flavor) { + if (MPI_WIN_FLAVOR_SHARED == flavor) return -1; + return mca_osc_ucx_component.priority; +} + +static inline int allgather_len_and_info(void *my_info, int my_info_len, char **recv_info, + int *disps, struct ompi_communicator_t *comm) { + int ret = OMPI_SUCCESS; + int comm_size = ompi_comm_size(comm); + int lens[comm_size]; + int total_len, i; + + ret = comm->c_coll->coll_allgather(&my_info_len, 1, MPI_INT, + lens, 1, MPI_INT, comm, + comm->c_coll->coll_allgather_module); + if (OMPI_SUCCESS != ret) { + return ret; + } + + total_len = 0; + for (i = 0; i < comm_size; i++) { + disps[i] = total_len; + total_len += lens[i]; + } + + (*recv_info) = (char *)malloc(total_len); + if (NULL == (*recv_info)) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + ret = comm->c_coll->coll_allgatherv(my_info, my_info_len, MPI_BYTE, + (void *)(*recv_info), lens, disps, MPI_BYTE, + comm, comm->c_coll->coll_allgatherv_module); + + return ret; +} + +static inline int mem_map(void **base, size_t size, ucp_mem_h *memh_ptr, + ompi_osc_ucx_module_t *module, int flavor) { + ucp_mem_map_params_t mem_params; + ucp_mem_attr_t mem_attrs; + ucs_status_t 
status; + int ret = OMPI_SUCCESS; + + if (!(flavor == MPI_WIN_FLAVOR_ALLOCATE || flavor == MPI_WIN_FLAVOR_CREATE) + || size == 0) { + return ret; + } + + memset(&mem_params, 0, sizeof(ucp_mem_map_params_t)); + mem_params.field_mask = UCP_MEM_MAP_PARAM_FIELD_ADDRESS | + UCP_MEM_MAP_PARAM_FIELD_LENGTH | + UCP_MEM_MAP_PARAM_FIELD_FLAGS; + mem_params.length = size; + if (flavor == MPI_WIN_FLAVOR_ALLOCATE) { + mem_params.address = NULL; + mem_params.flags = UCP_MEM_MAP_ALLOCATE; + } else { + mem_params.address = (*base); + } + + /* memory map */ + + status = ucp_mem_map(mca_osc_ucx_component.ucp_context, &mem_params, memh_ptr); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_mem_map failed: %d\n", + __FILE__, __LINE__, status); + /* nothing was mapped, so there is no handle to unmap */ + return OMPI_ERROR; + } + + mem_attrs.field_mask = UCP_MEM_ATTR_FIELD_ADDRESS | UCP_MEM_ATTR_FIELD_LENGTH; + status = ucp_mem_query((*memh_ptr), &mem_attrs); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_mem_query failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + assert(mem_attrs.length >= size); + if (flavor == MPI_WIN_FLAVOR_CREATE) { + assert(mem_attrs.address == (*base)); + } else { + (*base) = mem_attrs.address; + } + + return ret; + error: + ucp_mem_unmap(mca_osc_ucx_component.ucp_context, (*memh_ptr)); + return ret; +} + +static int component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit, + struct ompi_communicator_t *comm, struct opal_info_t *info, + int flavor, int *model) { + ompi_osc_ucx_module_t *module = NULL; + char *name = NULL; + long values[2]; + int ret = OMPI_SUCCESS; + ucs_status_t status; + int i, comm_size = ompi_comm_size(comm); + int is_eps_ready; + bool eps_created = false, worker_created = false; + ucp_address_t *my_addr = NULL; + size_t my_addr_len; + char *recv_buf = NULL; + void *rkey_buffer = NULL, *state_rkey_buffer = 
NULL; + size_t rkey_buffer_size, state_rkey_buffer_size; + void *state_base = NULL; + void * my_info = NULL; + size_t my_info_len; + int disps[comm_size]; + int rkey_sizes[comm_size]; + uint64_t zero = 0; + + /* the osc/sm component is the exclusive provider for support for + * shared memory windows */ + if (flavor == MPI_WIN_FLAVOR_SHARED) { + return OMPI_ERR_NOT_SUPPORTED; + } + + /* if UCP worker has never been initialized before, init it first */ + if (mca_osc_ucx_component.ucp_worker == NULL) { + ucp_worker_params_t worker_params; + ucp_worker_attr_t worker_attr; + + memset(&worker_params, 0, sizeof(worker_params)); + worker_params.field_mask = UCP_WORKER_PARAM_FIELD_THREAD_MODE; + worker_params.thread_mode = (mca_osc_ucx_component.enable_mpi_threads == true) + ? UCS_THREAD_MODE_MULTI : UCS_THREAD_MODE_SINGLE; + status = ucp_worker_create(mca_osc_ucx_component.ucp_context, &worker_params, + &(mca_osc_ucx_component.ucp_worker)); + if (UCS_OK != status) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_create failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + /* query UCP worker attributes */ + worker_attr.field_mask = UCP_WORKER_ATTR_FIELD_THREAD_MODE; + status = ucp_worker_query(mca_osc_ucx_component.ucp_worker, &worker_attr); + if (UCS_OK != status) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_query failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + if (mca_osc_ucx_component.enable_mpi_threads == true && + worker_attr.thread_mode != UCS_THREAD_MODE_MULTI) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucx does not support multithreading\n", + __FILE__, __LINE__); + ret = OMPI_ERROR; + goto error; + } + + worker_created = true; + } + + /* create module structure */ + module = (ompi_osc_ucx_module_t *)calloc(1, sizeof(ompi_osc_ucx_module_t)); + if (module == NULL) { + ret = 
OMPI_ERR_TEMP_OUT_OF_RESOURCE; + goto error; + } + + /* fill in the function pointer part */ + memcpy(module, &ompi_osc_ucx_module_template, sizeof(ompi_osc_base_module_t)); + + ret = ompi_comm_dup(comm, &module->comm); + if (ret != OMPI_SUCCESS) { + goto error; + } + + *model = MPI_WIN_UNIFIED; + asprintf(&name, "ucx window %d", ompi_comm_get_cid(module->comm)); + ompi_win_set_name(win, name); + free(name); + + module->flavor = flavor; + module->size = size; + + /* share everyone's displacement units. Only do an allgather if + strictly necessary, since it requires O(p) state. */ + values[0] = disp_unit; + values[1] = -disp_unit; + + ret = module->comm->c_coll->coll_allreduce(MPI_IN_PLACE, values, 2, MPI_LONG, + MPI_MIN, module->comm, + module->comm->c_coll->coll_allreduce_module); + if (OMPI_SUCCESS != ret) { + goto error; + } + + if (values[0] == -values[1]) { /* everyone has the same disp_unit, we do not need O(p) space */ + module->disp_unit = disp_unit; + } else { /* different disp_unit sizes, allocate O(p) space to store them */ + module->disp_unit = -1; + module->disp_units = calloc(comm_size, sizeof(int)); + if (module->disp_units == NULL) { + ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE; + goto error; + } + + ret = module->comm->c_coll->coll_allgather(&disp_unit, 1, MPI_INT, + module->disp_units, 1, MPI_INT, + module->comm, + module->comm->c_coll->coll_allgather_module); + if (OMPI_SUCCESS != ret) { + goto error; + } + } + + /* exchange endpoints if necessary */ + is_eps_ready = 1; + for (i = 0; i < comm_size; i++) { + if (OSC_UCX_GET_EP(module->comm, i) == NULL) { + is_eps_ready = 0; + break; + } + } + + ret = module->comm->c_coll->coll_allreduce(MPI_IN_PLACE, &is_eps_ready, 1, MPI_INT, + MPI_LAND, + module->comm, + module->comm->c_coll->coll_allreduce_module); + if (OMPI_SUCCESS != ret) { + goto error; + } + + if (!is_eps_ready) { + status = ucp_worker_get_address(mca_osc_ucx_component.ucp_worker, + &my_addr, &my_addr_len); + if (status != UCS_OK) { + 
opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_get_address failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + ret = allgather_len_and_info(my_addr, (int)my_addr_len, + &recv_buf, disps, module->comm); + if (ret != OMPI_SUCCESS) { + goto error; + } + + for (i = 0; i < comm_size; i++) { + if (OSC_UCX_GET_EP(module->comm, i) == NULL) { + ucp_ep_params_t ep_params; + ucp_ep_h ep; + memset(&ep_params, 0, sizeof(ucp_ep_params_t)); + ep_params.field_mask = UCP_EP_PARAM_FIELD_REMOTE_ADDRESS; + ep_params.address = (ucp_address_t *)&(recv_buf[disps[i]]); + status = ucp_ep_create(mca_osc_ucx_component.ucp_worker, &ep_params, &ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_create failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + ompi_comm_peer_lookup(module->comm, i)->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_UCX] = ep; + } + } + + ucp_worker_release_address(mca_osc_ucx_component.ucp_worker, my_addr); + my_addr = NULL; + free(recv_buf); + recv_buf = NULL; + + eps_created = true; + } + + ret = mem_map(base, size, &(module->memh), module, flavor); + if (ret != OMPI_SUCCESS) { + goto error; + } + + state_base = (void *)&(module->state); + ret = mem_map(&state_base, sizeof(ompi_osc_ucx_state_t), &(module->state_memh), + module, MPI_WIN_FLAVOR_CREATE); + if (ret != OMPI_SUCCESS) { + goto error; + } + + module->win_info_array = calloc(comm_size, sizeof(ompi_osc_ucx_win_info_t)); + if (module->win_info_array == NULL) { + ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE; + goto error; + } + + module->state_info_array = calloc(comm_size, sizeof(ompi_osc_ucx_win_info_t)); + if (module->state_info_array == NULL) { + ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE; + goto error; + } + + if (size > 0 && (flavor == MPI_WIN_FLAVOR_ALLOCATE || flavor == MPI_WIN_FLAVOR_CREATE)) { + status = ucp_rkey_pack(mca_osc_ucx_component.ucp_context, 
module->memh, + &rkey_buffer, &rkey_buffer_size); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_rkey_pack failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + } else { + rkey_buffer_size = 0; + } + + status = ucp_rkey_pack(mca_osc_ucx_component.ucp_context, module->state_memh, + &state_rkey_buffer, &state_rkey_buffer_size); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_rkey_pack failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + + my_info_len = 2 * sizeof(uint64_t) + rkey_buffer_size + state_rkey_buffer_size; + my_info = malloc(my_info_len); + if (my_info == NULL) { + ret = OMPI_ERR_TEMP_OUT_OF_RESOURCE; + goto error; + } + + if (flavor == MPI_WIN_FLAVOR_ALLOCATE || flavor == MPI_WIN_FLAVOR_CREATE) { + memcpy(my_info, base, sizeof(uint64_t)); + } else { + memcpy(my_info, &zero, sizeof(uint64_t)); + } + memcpy((void *)((char *)my_info + sizeof(uint64_t)), &state_base, sizeof(uint64_t)); + memcpy((void *)((char *)my_info + 2 * sizeof(uint64_t)), rkey_buffer, rkey_buffer_size); + memcpy((void *)((char *)my_info + 2 * sizeof(uint64_t) + rkey_buffer_size), + state_rkey_buffer, state_rkey_buffer_size); + + ret = allgather_len_and_info(my_info, (int)my_info_len, &recv_buf, disps, module->comm); + if (ret != OMPI_SUCCESS) { + goto error; + } + + ret = comm->c_coll->coll_allgather((void *)&rkey_buffer_size, 1, MPI_INT, + rkey_sizes, 1, MPI_INT, comm, + comm->c_coll->coll_allgather_module); + if (OMPI_SUCCESS != ret) { + goto error; + } + + for (i = 0; i < comm_size; i++) { + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, i); + assert(ep != NULL); + + memcpy(&(module->win_info_array[i]).addr, &recv_buf[disps[i]], sizeof(uint64_t)); + memcpy(&(module->state_info_array[i]).addr, &recv_buf[disps[i] + sizeof(uint64_t)], + sizeof(uint64_t)); + + (module->win_info_array[i]).rkey_init = false; + if 
(size > 0 && (flavor == MPI_WIN_FLAVOR_ALLOCATE || flavor == MPI_WIN_FLAVOR_CREATE)) { + status = ucp_ep_rkey_unpack(ep, &(recv_buf[disps[i] + 2 * sizeof(uint64_t)]), + &((module->win_info_array[i]).rkey)); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_rkey_unpack failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + (module->win_info_array[i]).rkey_init = true; + } + + status = ucp_ep_rkey_unpack(ep, &(recv_buf[disps[i] + 2 * sizeof(uint64_t) + rkey_sizes[i]]), + &((module->state_info_array[i]).rkey)); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_rkey_unpack failed: %d\n", + __FILE__, __LINE__, status); + ret = OMPI_ERROR; + goto error; + } + (module->state_info_array[i]).rkey_init = true; + } + + free(my_info); + free(recv_buf); + + if (rkey_buffer_size != 0) { + ucp_rkey_buffer_release(rkey_buffer); + } + ucp_rkey_buffer_release(state_rkey_buffer); + + module->state.lock = TARGET_LOCK_UNLOCKED; + module->state.post_index = 0; + memset((void *)module->state.post_state, 0, sizeof(uint64_t) * OMPI_OSC_UCX_POST_PEER_MAX); + module->state.complete_count = 0; + module->state.req_flag = 0; + module->state.acc_lock = TARGET_LOCK_UNLOCKED; + module->state.dynamic_win_count = 0; + for (i = 0; i < OMPI_OSC_UCX_ATTACH_MAX; i++) { + module->local_dynamic_win_info[i].refcnt = 0; + } + module->epoch_type.access = NONE_EPOCH; + module->epoch_type.exposure = NONE_EPOCH; + module->lock_count = 0; + module->post_count = 0; + module->start_group = NULL; + module->post_group = NULL; + OBJ_CONSTRUCT(&module->outstanding_locks, opal_hash_table_t); + OBJ_CONSTRUCT(&module->pending_posts, opal_list_t); + module->global_ops_num = 0; + module->per_target_ops_nums = calloc(comm_size, sizeof(int)); + module->start_grp_ranks = NULL; + module->lock_all_is_nocheck = false; + + ret = opal_hash_table_init(&module->outstanding_locks, 
comm_size); + if (ret != OPAL_SUCCESS) { + goto error; + } + + win->w_osc_module = &module->super; + + /* sync with everyone */ + + ret = module->comm->c_coll->coll_barrier(module->comm, + module->comm->c_coll->coll_barrier_module); + if (ret != OMPI_SUCCESS) { + goto error; + } + + return ret; + + error: + if (my_addr) ucp_worker_release_address(mca_osc_ucx_component.ucp_worker, my_addr); + if (recv_buf) free(recv_buf); + if (my_info) free(my_info); + for (i = 0; i < comm_size; i++) { + if ((module->win_info_array[i]).rkey != NULL) { + ucp_rkey_destroy((module->win_info_array[i]).rkey); + } + if ((module->state_info_array[i]).rkey != NULL) { + ucp_rkey_destroy((module->state_info_array[i]).rkey); + } + } + if (rkey_buffer) ucp_rkey_buffer_release(rkey_buffer); + if (state_rkey_buffer) ucp_rkey_buffer_release(state_rkey_buffer); + if (module->win_info_array) free(module->win_info_array); + if (module->state_info_array) free(module->state_info_array); + if (module->disp_units) free(module->disp_units); + if (module->comm) ompi_comm_free(&module->comm); + if (module->per_target_ops_nums) free(module->per_target_ops_nums); + if (eps_created) { + for (i = 0; i < comm_size; i++) { + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, i); + ucp_ep_destroy(ep); + } + } + if (worker_created) ucp_worker_destroy(mca_osc_ucx_component.ucp_worker); + if (module) free(module); + return ret; +} + +int ompi_osc_find_attached_region_position(ompi_osc_dynamic_win_info_t *dynamic_wins, + int min_index, int max_index, + uint64_t base, size_t len, int *insert) { + int mid_index = (max_index + min_index) >> 1; + + if (min_index > max_index) { + (*insert) = min_index; + return -1; + } + + if (dynamic_wins[mid_index].base > base) { + return ompi_osc_find_attached_region_position(dynamic_wins, min_index, mid_index-1, + base, len, insert); + } else if (base + len < dynamic_wins[mid_index].base + dynamic_wins[mid_index].size) { + return mid_index; + } else { + return 
ompi_osc_find_attached_region_position(dynamic_wins, mid_index+1, max_index, + base, len, insert); + } +} + +int ompi_osc_ucx_win_attach(struct ompi_win_t *win, void *base, size_t len) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int insert_index = -1, contain_index; + void *rkey_buffer; + size_t rkey_buffer_size; + int ret = OMPI_SUCCESS; + ucs_status_t status; + + if (module->state.dynamic_win_count >= OMPI_OSC_UCX_ATTACH_MAX) { + return OMPI_ERR_TEMP_OUT_OF_RESOURCE; + } + + if (module->state.dynamic_win_count > 0) { + contain_index = ompi_osc_find_attached_region_position((ompi_osc_dynamic_win_info_t *)module->state.dynamic_wins, + 0, (int)module->state.dynamic_win_count, + (uint64_t)base, len, &insert_index); + if (contain_index >= 0) { + module->local_dynamic_win_info[contain_index].refcnt++; + return ret; + } + + assert(insert_index >= 0 && insert_index < module->state.dynamic_win_count); + + memmove((void *)&module->local_dynamic_win_info[insert_index+1], + (void *)&module->local_dynamic_win_info[insert_index], + (OMPI_OSC_UCX_ATTACH_MAX - (insert_index + 1)) * sizeof(ompi_osc_local_dynamic_win_info_t)); + memmove((void *)&module->state.dynamic_wins[insert_index+1], + (void *)&module->state.dynamic_wins[insert_index], + (OMPI_OSC_UCX_ATTACH_MAX - (insert_index + 1)) * sizeof(ompi_osc_dynamic_win_info_t)); + } else { + insert_index = 0; + } + + ret = mem_map(&base, len, &(module->local_dynamic_win_info[insert_index].memh), + module, MPI_WIN_FLAVOR_CREATE); + if (ret != OMPI_SUCCESS) { + return ret; + } + + module->state.dynamic_wins[insert_index].base = (uint64_t)base; + module->state.dynamic_wins[insert_index].size = len; + + status = ucp_rkey_pack(mca_osc_ucx_component.ucp_context, + module->local_dynamic_win_info[insert_index].memh, + &rkey_buffer, (size_t *)&rkey_buffer_size); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_rkey_pack failed: %d\n", + __FILE__, 
__LINE__, status); + return OMPI_ERROR; + } + + assert(rkey_buffer_size <= OMPI_OSC_UCX_RKEY_BUF_MAX); + memcpy((char *)(module->state.dynamic_wins[insert_index].rkey_buffer), + (char *)rkey_buffer, rkey_buffer_size); + + module->local_dynamic_win_info[insert_index].refcnt++; + module->state.dynamic_win_count++; + + ucp_rkey_buffer_release(rkey_buffer); + + return ret; +} + +int ompi_osc_ucx_win_detach(struct ompi_win_t *win, const void *base) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int insert, contain; + + assert(module->state.dynamic_win_count > 0); + + contain = ompi_osc_find_attached_region_position((ompi_osc_dynamic_win_info_t *)module->state.dynamic_wins, + 0, (int)module->state.dynamic_win_count, + (uint64_t)base, 1, &insert); + assert(contain >= 0 && contain < module->state.dynamic_win_count); + + module->local_dynamic_win_info[contain].refcnt--; + if (module->local_dynamic_win_info[contain].refcnt == 0) { + ucp_mem_unmap(mca_osc_ucx_component.ucp_context, + module->local_dynamic_win_info[contain].memh); + memmove((void *)&(module->local_dynamic_win_info[contain]), + (void *)&(module->local_dynamic_win_info[contain+1]), + (OMPI_OSC_UCX_ATTACH_MAX - (contain + 1)) * sizeof(ompi_osc_local_dynamic_win_info_t)); + memmove((void *)&module->state.dynamic_wins[contain], + (void *)&module->state.dynamic_wins[contain+1], + (OMPI_OSC_UCX_ATTACH_MAX - (contain + 1)) * sizeof(ompi_osc_dynamic_win_info_t)); + + module->state.dynamic_win_count--; + } + + return OMPI_SUCCESS; +} + +int ompi_osc_ucx_free(struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int i, ret = OMPI_SUCCESS; + + if ((module->epoch_type.access != NONE_EPOCH && module->epoch_type.access != FENCE_EPOCH) + || module->epoch_type.exposure != NONE_EPOCH) { + ret = OMPI_ERR_RMA_SYNC; + } + + if (module->start_group != NULL || module->post_group != NULL) { + ret = OMPI_ERR_RMA_SYNC; + } + + 
assert(module->global_ops_num == 0); + assert(module->lock_count == 0); + assert(opal_list_is_empty(&module->pending_posts) == true); + OBJ_DESTRUCT(&module->outstanding_locks); + OBJ_DESTRUCT(&module->pending_posts); + + while (module->state.lock != TARGET_LOCK_UNLOCKED) { + /* not sure if this is required */ + ucp_worker_progress(mca_osc_ucx_component.ucp_worker); + } + + ret = module->comm->c_coll->coll_barrier(module->comm, + module->comm->c_coll->coll_barrier_module); + + for (i = 0; i < ompi_comm_size(module->comm); i++) { + if ((module->win_info_array[i]).rkey_init == true) { + ucp_rkey_destroy((module->win_info_array[i]).rkey); + (module->win_info_array[i]).rkey_init = false; + } + ucp_rkey_destroy((module->state_info_array[i]).rkey); + } + free(module->win_info_array); + free(module->state_info_array); + + free(module->per_target_ops_nums); + + if ((module->flavor == MPI_WIN_FLAVOR_ALLOCATE || module->flavor == MPI_WIN_FLAVOR_CREATE) + && module->size > 0) { + ucp_mem_unmap(mca_osc_ucx_component.ucp_context, module->memh); + } + ucp_mem_unmap(mca_osc_ucx_component.ucp_context, module->state_memh); + + if (module->disp_units) free(module->disp_units); + ompi_comm_free(&module->comm); + + free(module); + + return ret; +} diff --git a/ompi/mca/osc/ucx/osc_ucx_passive_target.c b/ompi/mca/osc/ucx/osc_ucx_passive_target.c new file mode 100644 index 00000000000..9f2fe98b638 --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx_passive_target.c @@ -0,0 +1,365 @@ +/* + * Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "ompi/mca/osc/osc.h" +#include "ompi/mca/osc/base/base.h" +#include "ompi/mca/osc/base/osc_base_obj_convert.h" + +#include "osc_ucx.h" + +OBJ_CLASS_INSTANCE(ompi_osc_ucx_lock_t, opal_object_t, NULL, NULL); + +static inline int start_shared(ompi_osc_ucx_module_t *module, int target) { + uint64_t result_value = -1; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + ucp_rkey_h rkey = (module->state_info_array)[target].rkey; + uint64_t remote_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_LOCK_OFFSET; + ucs_status_t status; + + while (true) { + status = ucp_atomic_fadd64(ep, 1, remote_addr, rkey, &result_value); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_fadd64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + assert(result_value >= 0); + if (result_value >= TARGET_LOCK_EXCLUSIVE) { + status = ucp_atomic_add64(ep, (-1), remote_addr, rkey); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_add64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } else { + break; + } + } + + return OMPI_SUCCESS; +} + +static inline int end_shared(ompi_osc_ucx_module_t *module, int target) { + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + ucp_rkey_h rkey = (module->state_info_array)[target].rkey; + uint64_t remote_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_LOCK_OFFSET; + ucs_status_t status; + + status = ucp_atomic_add64(ep, (-1), remote_addr, rkey); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_add64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + return OMPI_SUCCESS; +} + +static inline int start_exclusive(ompi_osc_ucx_module_t *module, int target) { 
+ uint64_t result_value = -1; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + ucp_rkey_h rkey = (module->state_info_array)[target].rkey; + uint64_t remote_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_LOCK_OFFSET; + ucs_status_t status; + + while (result_value != TARGET_LOCK_UNLOCKED) { + status = ucp_atomic_cswap64(ep, TARGET_LOCK_UNLOCKED, + TARGET_LOCK_EXCLUSIVE, + remote_addr, rkey, &result_value); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_cswap64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + } + + return OMPI_SUCCESS; +} + +static inline int end_exclusive(ompi_osc_ucx_module_t *module, int target) { + uint64_t result_value = 0; + ucp_ep_h ep = OSC_UCX_GET_EP(module->comm, target); + ucp_rkey_h rkey = (module->state_info_array)[target].rkey; + uint64_t remote_addr = (module->state_info_array)[target].addr + OSC_UCX_STATE_LOCK_OFFSET; + ucs_status_t status; + + status = ucp_atomic_swap64(ep, TARGET_LOCK_UNLOCKED, + remote_addr, rkey, &result_value); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_atomic_swap64 failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + assert(result_value >= TARGET_LOCK_EXCLUSIVE); + + return OMPI_SUCCESS; +} + +int ompi_osc_ucx_lock(int lock_type, int target, int assert, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t *)win->w_osc_module; + ompi_osc_ucx_lock_t *lock = NULL; + ompi_osc_ucx_epoch_t original_epoch = module->epoch_type.access; + int ret = OMPI_SUCCESS; + + if (module->lock_count == 0) { + if (module->epoch_type.access != NONE_EPOCH && + module->epoch_type.access != FENCE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + } else { + ompi_osc_ucx_lock_t *item = NULL; + assert(module->epoch_type.access == PASSIVE_EPOCH); + opal_hash_table_get_value_uint32(&module->outstanding_locks, (uint32_t) 
target, (void **) &item); + if (item != NULL) { + return OMPI_ERR_RMA_SYNC; + } + } + + module->epoch_type.access = PASSIVE_EPOCH; + module->lock_count++; + assert(module->lock_count <= ompi_comm_size(module->comm)); + + lock = OBJ_NEW(ompi_osc_ucx_lock_t); + lock->target_rank = target; + + if ((assert & MPI_MODE_NOCHECK) == 0) { + lock->is_nocheck = false; + if (lock_type == MPI_LOCK_EXCLUSIVE) { + ret = start_exclusive(module, target); + lock->type = LOCK_EXCLUSIVE; + } else { + ret = start_shared(module, target); + lock->type = LOCK_SHARED; + } + } else { + lock->is_nocheck = true; + } + + if (ret == OMPI_SUCCESS) { + opal_hash_table_set_value_uint32(&module->outstanding_locks, (uint32_t)target, (void *)lock); + } else { + OBJ_RELEASE(lock); + module->epoch_type.access = original_epoch; + } + + return ret; +} + +int ompi_osc_ucx_unlock(int target, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t *)win->w_osc_module; + ompi_osc_ucx_lock_t *lock = NULL; + ucs_status_t status; + int ret = OMPI_SUCCESS; + ucp_ep_h ep; + + if (module->epoch_type.access != PASSIVE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + opal_hash_table_get_value_uint32(&module->outstanding_locks, (uint32_t) target, (void **) &lock); + if (lock == NULL) { + return OMPI_ERR_RMA_SYNC; + } + + opal_hash_table_remove_value_uint32(&module->outstanding_locks, + (uint32_t)target); + + ep = OSC_UCX_GET_EP(module->comm, target); + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + module->global_ops_num -= module->per_target_ops_nums[target]; + module->per_target_ops_nums[target] = 0; + + if (lock->is_nocheck == false) { + if (lock->type == LOCK_EXCLUSIVE) { + ret = end_exclusive(module, target); + } else { + ret = end_shared(module, target); + } + } + + OBJ_RELEASE(lock); + + module->lock_count--; + 
assert(module->lock_count >= 0); + if (module->lock_count == 0) { + module->epoch_type.access = NONE_EPOCH; + assert(module->global_ops_num == 0); + } + + return ret; +} + +int ompi_osc_ucx_lock_all(int assert, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + int ret = OMPI_SUCCESS; + + if (module->epoch_type.access != NONE_EPOCH && + module->epoch_type.access != FENCE_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + module->epoch_type.access = PASSIVE_ALL_EPOCH; + + if (0 == (assert & MPI_MODE_NOCHECK)) { + int i, comm_size; + module->lock_all_is_nocheck = false; + comm_size = ompi_comm_size(module->comm); + for (i = 0; i < comm_size; i++) { + ret = start_shared(module, i); + if (ret != OMPI_SUCCESS) { + int j; + for (j = 0; j < i; j++) { + end_shared(module, j); + } + break; + } + } + } else { + module->lock_all_is_nocheck = true; + } + + if (ret != OMPI_SUCCESS) { + module->epoch_type.access = NONE_EPOCH; + } + + return ret; +} + +int ompi_osc_ucx_unlock_all(struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*)win->w_osc_module; + int comm_size = ompi_comm_size(module->comm); + ucs_status_t status; + int ret = OMPI_SUCCESS; + + if (module->epoch_type.access != PASSIVE_ALL_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + assert(module->lock_count == 0); + + status = ucp_worker_flush(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + module->global_ops_num = 0; + memset(module->per_target_ops_nums, 0, sizeof(int) * comm_size); + + if (!module->lock_all_is_nocheck) { + int i; + for (i = 0; i < comm_size; i++) { + ret |= end_shared(module, i); + } + } + + module->epoch_type.access = NONE_EPOCH; + + return ret; +} + +int ompi_osc_ucx_sync(struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module =
(ompi_osc_ucx_module_t *)win->w_osc_module; + ucs_status_t status; + + if (module->epoch_type.access != PASSIVE_EPOCH && + module->epoch_type.access != PASSIVE_ALL_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + opal_atomic_mb(); + + status = ucp_worker_fence(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_fence failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + return OMPI_SUCCESS; +} + +int ompi_osc_ucx_flush(int target, struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t*) win->w_osc_module; + ucp_ep_h ep; + ucs_status_t status; + + if (module->epoch_type.access != PASSIVE_EPOCH && + module->epoch_type.access != PASSIVE_ALL_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + ep = OSC_UCX_GET_EP(module->comm, target); + status = ucp_ep_flush(ep); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_ep_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + module->global_ops_num -= module->per_target_ops_nums[target]; + module->per_target_ops_nums[target] = 0; + + return OMPI_SUCCESS; +} + +int ompi_osc_ucx_flush_all(struct ompi_win_t *win) { + ompi_osc_ucx_module_t *module = (ompi_osc_ucx_module_t *)win->w_osc_module; + ucs_status_t status; + + if (module->epoch_type.access != PASSIVE_EPOCH && + module->epoch_type.access != PASSIVE_ALL_EPOCH) { + return OMPI_ERR_RMA_SYNC; + } + + status = ucp_worker_flush(mca_osc_ucx_component.ucp_worker); + if (status != UCS_OK) { + opal_output_verbose(1, ompi_osc_base_framework.framework_output, + "%s:%d: ucp_worker_flush failed: %d\n", + __FILE__, __LINE__, status); + return OMPI_ERROR; + } + + module->global_ops_num = 0; + memset(module->per_target_ops_nums, 0, + sizeof(int) * ompi_comm_size(module->comm)); + + return OMPI_SUCCESS; +} + +int ompi_osc_ucx_flush_local(int target, struct ompi_win_t *win) { 
+ /* TODO: currently identical to ompi_osc_ucx_flush; should find a way + * to implement local completion */ + return ompi_osc_ucx_flush(target, win); +} + +int ompi_osc_ucx_flush_local_all(struct ompi_win_t *win) { + /* TODO: currently identical to ompi_osc_ucx_flush_all; should find a way + * to implement local completion */ + return ompi_osc_ucx_flush_all(win); +} diff --git a/ompi/mca/osc/ucx/osc_ucx_request.c b/ompi/mca/osc/ucx/osc_ucx_request.c new file mode 100644 index 00000000000..efbd9c38cc6 --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx_request.c @@ -0,0 +1,65 @@ +/* + * Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "ompi_config.h" + +#include "ompi/request/request.h" +#include "ompi/mca/osc/osc.h" +#include "ompi/mca/osc/base/base.h" +#include "ompi/mca/osc/base/osc_base_obj_convert.h" + +#include "osc_ucx.h" +#include "osc_ucx_request.h" + +static int request_cancel(struct ompi_request_t *request, int complete) +{ + return MPI_ERR_REQUEST; +} + +static int request_free(struct ompi_request_t **ompi_req) +{ + ompi_osc_ucx_request_t *request = (ompi_osc_ucx_request_t*) *ompi_req; + + if (true != (bool)(request->super.req_complete)) { + return MPI_ERR_REQUEST; + } + + OMPI_OSC_UCX_REQUEST_RETURN(request); + + *ompi_req = MPI_REQUEST_NULL; + + return OMPI_SUCCESS; +} + +static void request_construct(ompi_osc_ucx_request_t *request) +{ + request->super.req_type = OMPI_REQUEST_WIN; + request->super.req_status._cancelled = 0; + request->super.req_free = request_free; + request->super.req_cancel = request_cancel; +} + +void internal_req_init(void *request) { + ompi_osc_ucx_internal_request_t *req = (ompi_osc_ucx_internal_request_t *)request; + req->external_req = NULL; +} + +void req_completion(void *request, ucs_status_t status) { + ompi_osc_ucx_internal_request_t *req = (ompi_osc_ucx_internal_request_t *)request; + + if(req->external_req != NULL) {
ompi_request_complete(&(req->external_req->super), true); + ucp_request_release(req); + mca_osc_ucx_component.num_incomplete_req_ops--; + assert(mca_osc_ucx_component.num_incomplete_req_ops >= 0); + } +} + +OBJ_CLASS_INSTANCE(ompi_osc_ucx_request_t, ompi_request_t, + request_construct, NULL); diff --git a/ompi/mca/osc/ucx/osc_ucx_request.h b/ompi/mca/osc/ucx/osc_ucx_request.h new file mode 100644 index 00000000000..b33bc54c2de --- /dev/null +++ b/ompi/mca/osc/ucx/osc_ucx_request.h @@ -0,0 +1,56 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2011-2013 Sandia National Laboratories. All rights reserved. + * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * reserved. + * Copyright (C) Mellanox Technologies Ltd. 2001-2017. ALL RIGHTS RESERVED. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef OMPI_OSC_UCX_REQUEST_H +#define OMPI_OSC_UCX_REQUEST_H + +#include "ompi/request/request.h" + +typedef struct ompi_osc_ucx_request { + ompi_request_t super; +} ompi_osc_ucx_request_t; + +OBJ_CLASS_DECLARATION(ompi_osc_ucx_request_t); + +typedef struct ompi_osc_ucx_internal_request { + ompi_osc_ucx_request_t *external_req; +} ompi_osc_ucx_internal_request_t; + +#define OMPI_OSC_UCX_REQUEST_ALLOC(win, req) \ + do { \ + opal_free_list_item_t *item; \ + do { \ + item = opal_free_list_get(&mca_osc_ucx_component.requests); \ + if (item == NULL) { \ + if (mca_osc_ucx_component.ucp_worker != NULL && \ + mca_osc_ucx_component.num_incomplete_req_ops > 0) { \ + ucp_worker_progress(mca_osc_ucx_component.ucp_worker); \ + } \ + } \ + } while (item == NULL); \ + req = (ompi_osc_ucx_request_t*) item; \ + OMPI_REQUEST_INIT(&req->super, false); \ + req->super.req_mpi_object.win = win; \ + req->super.req_complete = false; \ + req->super.req_state = OMPI_REQUEST_ACTIVE; \ + req->super.req_status.MPI_ERROR = MPI_SUCCESS; \ + } while (0) + +#define OMPI_OSC_UCX_REQUEST_RETURN(req) \ + do { \ + 
OMPI_REQUEST_FINI(&(req)->super); \ + opal_free_list_return (&mca_osc_ucx_component.requests, \ + (opal_free_list_item_t*) req); \ + } while (0) + +#endif /* OMPI_OSC_UCX_REQUEST_H */ diff --git a/ompi/mca/pml/base/pml_base_bsend.c b/ompi/mca/pml/base/pml_base_bsend.c index f683570f708..ef6be82599a 100644 --- a/ompi/mca/pml/base/pml_base_bsend.c +++ b/ompi/mca/pml/base/pml_base_bsend.c @@ -81,7 +81,7 @@ int mca_pml_base_bsend_init(bool thread_safe) { size_t tmp; - if(OPAL_THREAD_ADD32(&mca_pml_bsend_init, 1) > 1) + if(OPAL_THREAD_ADD_FETCH32(&mca_pml_bsend_init, 1) > 1) return OMPI_SUCCESS; /* initialize static objects */ @@ -109,7 +109,7 @@ int mca_pml_base_bsend_init(bool thread_safe) */ int mca_pml_base_bsend_fini(void) { - if(OPAL_THREAD_ADD32(&mca_pml_bsend_init,-1) > 0) + if(OPAL_THREAD_ADD_FETCH32(&mca_pml_bsend_init,-1) > 0) return OMPI_SUCCESS; if(NULL != mca_pml_bsend_allocator) diff --git a/ompi/mca/pml/base/pml_base_frame.c b/ompi/mca/pml/base/pml_base_frame.c index 64f82224a25..bf35186ef73 100644 --- a/ompi/mca/pml/base/pml_base_frame.c +++ b/ompi/mca/pml/base/pml_base_frame.c @@ -213,6 +213,7 @@ static int mca_pml_base_open(mca_base_open_flag_t flags) 0 == strlen(default_pml[0])) || (default_pml[0][0] == '^') ) { opal_pointer_array_add(&mca_pml_base_pml, strdup("ob1")); opal_pointer_array_add(&mca_pml_base_pml, strdup("yalla")); + opal_pointer_array_add(&mca_pml_base_pml, strdup("ucx")); opal_pointer_array_add(&mca_pml_base_pml, strdup("cm")); } else { opal_pointer_array_add(&mca_pml_base_pml, strdup(default_pml[0])); diff --git a/ompi/mca/pml/base/pml_base_sendreq.h b/ompi/mca/pml/base/pml_base_sendreq.h index 1e85d8044ad..3f6cce1e578 100644 --- a/ompi/mca/pml/base/pml_base_sendreq.h +++ b/ompi/mca/pml/base/pml_base_sendreq.h @@ -15,6 +15,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -115,8 +116,9 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION( mca_pml_base_send_request_t ); #define MCA_PML_BASE_SEND_REQUEST_RESET(request) \ if ((request)->req_bytes_packed > 0) { \ + size_t cnt = 0; \ opal_convertor_set_position(&(sendreq)->req_send.req_base.req_convertor, \ - &(size_t){0}); \ + &cnt); \ } /** @@ -153,4 +155,3 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION( mca_pml_base_send_request_t ); END_C_DECLS #endif - diff --git a/ompi/mca/pml/bfo/Makefile.am b/ompi/mca/pml/bfo/Makefile.am index 5df9be74924..7565d84c13e 100644 --- a/ompi/mca/pml/bfo/Makefile.am +++ b/ompi/mca/pml/bfo/Makefile.am @@ -12,6 +12,7 @@ # Copyright (c) 2009-2010 Oracle and/or its affiliates. All rights reserved. # Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. # +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -70,6 +71,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_pml_bfo_la_SOURCES = $(bfo_sources) mca_pml_bfo_la_LDFLAGS = -module -avoid-version +mca_pml_bfo_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_pml_bfo_la_SOURCES = $(bfo_sources) diff --git a/ompi/mca/pml/bfo/pml_bfo_failover.h b/ompi/mca/pml/bfo/pml_bfo_failover.h index d1b97807adb..ea4f70fdc48 100644 --- a/ompi/mca/pml/bfo/pml_bfo_failover.h +++ b/ompi/mca/pml/bfo/pml_bfo_failover.h @@ -261,7 +261,7 @@ extern void mca_pml_bfo_recv_frag_callback_recverrnotify( mca_btl_base_module_t */ #define MCA_PML_BFO_VERIFY_SENDREQ_REQ_STATE_VALUE(sendreq) \ if (sendreq->req_state == -1) { \ - OPAL_THREAD_ADD32(&sendreq->req_state, 1); \ + OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, 1); \ } /* Now check the error state. 
This request can be in error if the
diff --git a/ompi/mca/pml/bfo/pml_bfo_recvfrag.c b/ompi/mca/pml/bfo/pml_bfo_recvfrag.c
index ce6827d5385..c7216c0d538 100644
--- a/ompi/mca/pml/bfo/pml_bfo_recvfrag.c
+++ b/ompi/mca/pml/bfo/pml_bfo_recvfrag.c
@@ -328,7 +328,7 @@ void mca_pml_bfo_recv_frag_callback_ack(mca_btl_base_module_t* btl,
 * protocol has req_state == 0 and as such should not be
 * decremented.
 */
- OPAL_THREAD_ADD32(&sendreq->req_state, -1);
+ OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, -1);
 }
 if(send_request_pml_complete_check(sendreq) == false)
diff --git a/ompi/mca/pml/bfo/pml_bfo_recvreq.c b/ompi/mca/pml/bfo/pml_bfo_recvreq.c
index 2cf1534b64d..c0658f10ef3 100644
--- a/ompi/mca/pml/bfo/pml_bfo_recvreq.c
+++ b/ompi/mca/pml/bfo/pml_bfo_recvreq.c
@@ -154,6 +154,7 @@ static int mca_pml_bfo_recv_request_cancel(struct ompi_request_t* ompi_request,
 static void mca_pml_bfo_recv_request_construct(mca_pml_bfo_recv_request_t* request)
 {
 request->req_recv.req_base.req_type = MCA_PML_REQUEST_RECV;
+ request->req_recv.req_base.req_ompi.req_start = mca_pml_bfo_start;
 request->req_recv.req_base.req_ompi.req_free = mca_pml_bfo_recv_request_free;
 request->req_recv.req_base.req_ompi.req_cancel = mca_pml_bfo_recv_request_cancel;
 request->req_rdma_cnt = 0;
@@ -205,7 +206,7 @@ static void mca_pml_bfo_put_completion( mca_btl_base_module_t* btl,
 (void *) des->des_remote, des->des_remote_count, 0);
 }
- OPAL_THREAD_SUB_SIZE_T(&recvreq->req_pipeline_depth, 1);
+ OPAL_THREAD_SUB_FETCH_SIZE_T(&recvreq->req_pipeline_depth, 1);
 #if PML_BFO
 btl->btl_free(btl, des);
@@ -216,7 +217,7 @@ static void mca_pml_bfo_put_completion( mca_btl_base_module_t* btl,
 #endif /* PML_BFO */
 /* check completion status */
- OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, bytes_received);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, bytes_received);
 if(recv_request_pml_complete_check(recvreq) == false &&
 recvreq->req_rdma_offset < recvreq->req_send_offset) {
 /* schedule additional rdma operations */
@@ -387,7 +388,7 @@ static void mca_pml_bfo_rget_completion( mca_btl_base_module_t* btl,
 #endif /* PML_BFO */
 /* is receive request complete */
- OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
 recv_request_pml_complete_check(recvreq);
 MCA_PML_BFO_RDMA_FRAG_RETURN(frag);
@@ -505,7 +506,7 @@ void mca_pml_bfo_recv_request_progress_frag( mca_pml_bfo_recv_request_t* recvreq
 recvreq->req_recv.req_base.req_datatype);
 );
- OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, bytes_received);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, bytes_received);
 /* check completion status */
 if(recv_request_pml_complete_check(recvreq) == false &&
 recvreq->req_rdma_offset < recvreq->req_send_offset) {
@@ -667,7 +668,7 @@ void mca_pml_bfo_recv_request_progress_rndv( mca_pml_bfo_recv_request_t* recvreq
 recvreq->req_recv.req_base.req_datatype);
 );
 }
- OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, bytes_received);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, bytes_received);
 /* check completion status */
 if(recv_request_pml_complete_check(recvreq) == false &&
 recvreq->req_rdma_offset < recvreq->req_send_offset) {
@@ -902,7 +903,7 @@ int mca_pml_bfo_recv_request_schedule_once( mca_pml_bfo_recv_request_t* recvreq,
 #endif /* PML_BFO */
 /* update request state */
 recvreq->req_rdma_offset += size;
- OPAL_THREAD_ADD_SIZE_T(&recvreq->req_pipeline_depth, 1);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_pipeline_depth, 1);
 recvreq->req_rdma[rdma_idx].length -= size;
 bytes_remaining -= size;
 } else {
diff --git a/ompi/mca/pml/bfo/pml_bfo_recvreq.h b/ompi/mca/pml/bfo/pml_bfo_recvreq.h
index 9c3f53989a4..7b3a6db6271 100644
--- a/ompi/mca/pml/bfo/pml_bfo_recvreq.h
+++ b/ompi/mca/pml/bfo/pml_bfo_recvreq.h
@@ -70,12 +70,12 @@ OBJ_CLASS_DECLARATION(mca_pml_bfo_recv_request_t);
 static inline bool lock_recv_request(mca_pml_bfo_recv_request_t *recvreq)
 {
- return OPAL_THREAD_ADD32(&recvreq->req_lock, 1) == 1;
+ return OPAL_THREAD_ADD_FETCH32(&recvreq->req_lock, 1) == 1;
 }
 static inline bool unlock_recv_request(mca_pml_bfo_recv_request_t *recvreq)
 {
- return OPAL_THREAD_ADD32(&recvreq->req_lock, -1) == 0;
+ return OPAL_THREAD_ADD_FETCH32(&recvreq->req_lock, -1) == 0;
 }
 /**
diff --git a/ompi/mca/pml/bfo/pml_bfo_sendreq.c b/ompi/mca/pml/bfo/pml_bfo_sendreq.c
index 815097ef78c..176eadf4f6e 100644
--- a/ompi/mca/pml/bfo/pml_bfo_sendreq.c
+++ b/ompi/mca/pml/bfo/pml_bfo_sendreq.c
@@ -131,6 +131,7 @@ static int mca_pml_bfo_send_request_cancel(struct ompi_request_t* request, int c
 static void mca_pml_bfo_send_request_construct(mca_pml_bfo_send_request_t* req)
 {
 req->req_send.req_base.req_type = MCA_PML_REQUEST_SEND;
+ req->req_send.req_base.req_ompi.req_start = mca_pml_bfo_start;
 req->req_send.req_base.req_ompi.req_free = mca_pml_bfo_send_request_free;
 req->req_send.req_base.req_ompi.req_cancel = mca_pml_bfo_send_request_cancel;
 req->req_rdma_cnt = 0;
@@ -206,10 +207,10 @@ mca_pml_bfo_rndv_completion_request( mca_bml_base_btl_t* bml_btl,
 &(sendreq->req_send.req_base), PERUSE_SEND );
 }
- OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered);
 /* advance the request */
- OPAL_THREAD_ADD32(&sendreq->req_state, -1);
+ OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, -1);
 send_request_pml_complete_check(sendreq);
@@ -286,7 +287,7 @@ mca_pml_bfo_rget_completion( mca_btl_base_module_t* btl,
 (void *) des->des_local, des->des_local_count, 0);
 if (OPAL_LIKELY(0 < req_bytes_delivered)) {
- OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered);
 }
 send_request_pml_complete_check(sendreq);
@@ -359,8 +360,8 @@ mca_pml_bfo_frag_completion( mca_btl_base_module_t* btl,
 des->des_local_count,
 sizeof(mca_pml_bfo_frag_hdr_t));
- OPAL_THREAD_SUB_SIZE_T(&sendreq->req_pipeline_depth, 1);
- OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered);
+ OPAL_THREAD_SUB_FETCH_SIZE_T(&sendreq->req_pipeline_depth, 1);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered);
 #if PML_BFO
 MCA_PML_BFO_FRAG_COMPLETION_SENDREQ_ERROR_CHECK(sendreq, status, btl,
@@ -1163,7 +1164,7 @@ mca_pml_bfo_send_request_schedule_once(mca_pml_bfo_send_request_t* sendreq)
 range->range_btls[btl_idx].length -= size;
 range->range_send_length -= size;
 range->range_send_offset += size;
- OPAL_THREAD_ADD_SIZE_T(&sendreq->req_pipeline_depth, 1);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_pipeline_depth, 1);
 if(range->range_send_length == 0) {
 range = get_next_send_range(sendreq, range);
 prev_bytes_remaining = 0;
@@ -1225,7 +1226,7 @@ static void mca_pml_bfo_put_completion( mca_btl_base_module_t* btl,
 #endif /* PML_BFO */
 /* check for request completion */
- OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, frag->rdma_length);
+ OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, frag->rdma_length);
 send_request_pml_complete_check(sendreq);
@@ -1334,7 +1335,7 @@ void mca_pml_bfo_send_request_put( mca_pml_bfo_send_request_t* sendreq,
 size_t i, size = 0;
 if(hdr->hdr_common.hdr_flags & MCA_PML_BFO_HDR_TYPE_ACK) {
- OPAL_THREAD_ADD32(&sendreq->req_state, -1);
+ OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, -1);
 }
 #if PML_BFO
 MCA_PML_BFO_VERIFY_SENDREQ_REQ_STATE_VALUE(sendreq);
diff --git a/ompi/mca/pml/bfo/pml_bfo_sendreq.h b/ompi/mca/pml/bfo/pml_bfo_sendreq.h
index 37f15af4578..170512ffe3e 100644
--- a/ompi/mca/pml/bfo/pml_bfo_sendreq.h
+++ b/ompi/mca/pml/bfo/pml_bfo_sendreq.h
@@ -78,12 +78,12 @@ OBJ_CLASS_DECLARATION(mca_pml_bfo_send_range_t);
 static inline bool lock_send_request(mca_pml_bfo_send_request_t *sendreq)
 {
- return OPAL_THREAD_ADD32(&sendreq->req_lock, 1) == 1;
+ return OPAL_THREAD_ADD_FETCH32(&sendreq->req_lock, 1) == 1;
 }
 static inline bool unlock_send_request(mca_pml_bfo_send_request_t *sendreq)
 {
- return OPAL_THREAD_ADD32(&sendreq->req_lock, -1) == 0;
+ return OPAL_THREAD_ADD_FETCH32(&sendreq->req_lock, -1) == 0;
 }
 static inline void
@@ -445,7 +445,7 @@ mca_pml_bfo_send_request_start( mca_pml_bfo_send_request_t* sendreq )
 sendreq->req_pipeline_depth = 0;
 sendreq->req_bytes_delivered = 0;
 sendreq->req_pending = MCA_PML_BFO_SEND_PENDING_NONE;
- sendreq->req_send.req_base.req_sequence = OPAL_THREAD_ADD32(
+ sendreq->req_send.req_base.req_sequence = OPAL_THREAD_ADD_FETCH32(
 &comm->procs[sendreq->req_send.req_base.req_peer].send_sequence,1);
 #if PML_BFO
 sendreq->req_restartseq = 0; /* counts up restarts */
diff --git a/ompi/mca/pml/cm/Makefile.am b/ompi/mca/pml/cm/Makefile.am
index 28ad04fb5dc..d1a43fe8841 100644
--- a/ompi/mca/pml/cm/Makefile.am
+++ b/ompi/mca/pml/cm/Makefile.am
@@ -4,6 +4,7 @@
 # Copyright (c) 2009 High Performance Computing Center Stuttgart,
 # University of Stuttgart. All rights reserved.
 # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2017 IBM Corporation. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -42,7 +43,8 @@ local_sources = \
 mcacomponentdir = $(ompilibdir)
 mcacomponent_LTLIBRARIES = $(component_install)
 mca_pml_cm_la_SOURCES = $(local_sources)
-mca_pml_cm_la_LIBADD = $(pml_cm_LIBS)
+mca_pml_cm_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \
+ $(pml_cm_LIBS)
 mca_pml_cm_la_LDFLAGS = -module -avoid-version $(pml_cm_LDFLAGS)
 noinst_LTLIBRARIES = $(component_noinst)
diff --git a/ompi/mca/pml/cm/pml_cm.h b/ompi/mca/pml/cm/pml_cm.h
index ba055c474ea..b3c06eb83bf 100644
--- a/ompi/mca/pml/cm/pml_cm.h
+++ b/ompi/mca/pml/cm/pml_cm.h
@@ -6,6 +6,7 @@
 * reserved.
 * Copyright (c) 2015 Research Organization for Information Science
 * and Technology (RIST). All rights reserved.
+ * Copyright (c) 2017 Intel, Inc. All rights reserved
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -79,6 +80,7 @@ mca_pml_cm_irecv_init(void *addr,
 struct ompi_request_t **request)
 {
 mca_pml_cm_hvy_recv_request_t *recvreq;
+ uint32_t flags = 0;
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
 ompi_proc_t* ompi_proc;
 #endif
@@ -87,7 +89,7 @@ mca_pml_cm_irecv_init(void *addr,
 if( OPAL_UNLIKELY(NULL == recvreq) ) return OMPI_ERR_OUT_OF_RESOURCE;
 MCA_PML_CM_HVY_RECV_REQUEST_INIT(recvreq, ompi_proc, comm, tag, src,
- datatype, addr, count, true);
+ datatype, addr, count, flags, true);
 *request = (ompi_request_t*) recvreq;
@@ -104,6 +106,7 @@ mca_pml_cm_irecv(void *addr,
 struct ompi_request_t **request)
 {
 int ret;
+ uint32_t flags = 0;
 mca_pml_cm_thin_recv_request_t *recvreq;
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
 ompi_proc_t* ompi_proc = NULL;
@@ -118,7 +121,8 @@ mca_pml_cm_irecv(void *addr,
 src,
 datatype,
 addr,
- count);
+ count,
+ flags);
 MCA_PML_CM_THIN_RECV_REQUEST_START(recvreq, comm, tag, src, ret);
@@ -145,6 +149,7 @@ mca_pml_cm_recv(void *addr,
 ompi_status_public_t * status)
 {
 int ret;
+ uint32_t flags = 0;
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
 ompi_proc_t *ompi_proc;
 #endif
@@ -173,20 +178,24 @@ mca_pml_cm_recv(void *addr,
 ompi_proc = ompi_comm_peer_lookup( comm, src );
 }
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count);
+
 opal_convertor_copy_and_prepare_for_recv(
 ompi_proc->super.proc_convertor,
 &(datatype->super),
 count,
 addr,
- 0,
+ flags,
 &convertor );
 #else
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count);
+
 opal_convertor_copy_and_prepare_for_recv(
 ompi_mpi_local_convertor,
 &(datatype->super),
 count,
 addr,
- 0,
+ flags,
 &convertor );
 #endif
@@ -222,6 +231,7 @@ mca_pml_cm_isend_init(const void* buf,
 ompi_request_t** request)
 {
 mca_pml_cm_hvy_send_request_t *sendreq;
+ uint32_t flags = 0;
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
 ompi_proc_t* ompi_proc;
 #endif
@@ -230,7 +240,7 @@ mca_pml_cm_isend_init(const void* buf,
 if (OPAL_UNLIKELY(NULL == sendreq)) return OMPI_ERR_OUT_OF_RESOURCE;
 MCA_PML_CM_HVY_SEND_REQUEST_INIT(sendreq, ompi_proc, comm, tag, dst,
- datatype, sendmode, true, false, buf, count);
+ datatype, sendmode, true, false, buf, count, flags);
 /* Work around a leak in start by marking this request as complete. The
 * problem occured because we do not have a way to differentiate an
@@ -254,6 +264,7 @@ mca_pml_cm_isend(const void* buf,
 ompi_request_t** request)
 {
 int ret;
+ uint32_t flags = 0;
 if(sendmode == MCA_PML_BASE_SEND_BUFFERED ) {
 mca_pml_cm_hvy_send_request_t* sendreq;
@@ -274,7 +285,8 @@ mca_pml_cm_isend(const void* buf,
 false,
 false,
 buf,
- count);
+ count,
+ flags);
 MCA_PML_CM_HVY_SEND_REQUEST_START( sendreq, ret);
@@ -296,7 +308,8 @@ mca_pml_cm_isend(const void* buf,
 datatype,
 sendmode,
 buf,
- count);
+ count,
+ flags);
 MCA_PML_CM_THIN_SEND_REQUEST_START(
 sendreq,
@@ -324,6 +337,7 @@ mca_pml_cm_send(const void *buf,
 ompi_communicator_t* comm)
 {
 int ret = OMPI_ERROR;
+ uint32_t flags = 0;
 ompi_proc_t * ompi_proc;
 if(sendmode == MCA_PML_BASE_SEND_BUFFERED) {
@@ -342,7 +356,8 @@ mca_pml_cm_send(const void *buf,
 false,
 false,
 buf,
- count);
+ count,
+ flags);
 MCA_PML_CM_HVY_SEND_REQUEST_START(sendreq, ret);
 if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
 MCA_PML_CM_HVY_SEND_REQUEST_RETURN(sendreq);
@@ -368,9 +383,12 @@ mca_pml_cm_send(const void *buf,
 #endif
 {
 ompi_proc = ompi_comm_peer_lookup(comm, dst);
+
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count);
+
 opal_convertor_copy_and_prepare_for_send(
 ompi_proc->super.proc_convertor,
- &datatype->super, count, buf, 0,
+ &datatype->super, count, buf, flags,
 &convertor);
 }
@@ -459,6 +477,7 @@ mca_pml_cm_imrecv(void *buf,
 struct ompi_request_t **request)
 {
 int ret;
+ uint32_t flags = 0;
 mca_pml_cm_thin_recv_request_t *recvreq;
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
 ompi_proc_t* ompi_proc;
@@ -474,7 +493,8 @@ mca_pml_cm_imrecv(void *buf,
 (*message)->peer,
 datatype,
 buf,
- count);
+ count,
+ flags);
 MCA_PML_CM_THIN_RECV_REQUEST_MATCHED_START(recvreq,
 message, ret);
@@ -491,6 +511,7 @@ mca_pml_cm_mrecv(void *buf,
 ompi_status_public_t* status)
 {
 int ret;
+ uint32_t flags = 0;
 mca_pml_cm_thin_recv_request_t *recvreq;
 #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
 ompi_proc_t* ompi_proc;
@@ -506,7 +527,8 @@ mca_pml_cm_mrecv(void *buf,
 (*message)->peer,
 datatype,
 buf,
- count);
+ count,
+ flags);
 MCA_PML_CM_THIN_RECV_REQUEST_MATCHED_START(recvreq,
 message, ret);
diff --git a/ompi/mca/pml/cm/pml_cm_recvreq.c b/ompi/mca/pml/cm/pml_cm_recvreq.c
index 707666c6aac..ccece912117 100644
--- a/ompi/mca/pml/cm/pml_cm_recvreq.c
+++ b/ompi/mca/pml/cm/pml_cm_recvreq.c
@@ -56,6 +56,7 @@ void mca_pml_cm_recv_request_completion(struct mca_mtl_request_t *mtl_request)
 static void
 mca_pml_cm_recv_request_construct(mca_pml_cm_thin_recv_request_t* recvreq)
 {
+ recvreq->req_base.req_ompi.req_start = mca_pml_cm_start;
 recvreq->req_base.req_ompi.req_free = mca_pml_cm_recv_request_free;
 recvreq->req_base.req_ompi.req_cancel = mca_pml_cm_cancel;
 OBJ_CONSTRUCT( &(recvreq->req_base.req_convertor), opal_convertor_t );
diff --git a/ompi/mca/pml/cm/pml_cm_recvreq.h b/ompi/mca/pml/cm/pml_cm_recvreq.h
index d0774bac1c0..6729cac886e 100644
--- a/ompi/mca/pml/cm/pml_cm_recvreq.h
+++ b/ompi/mca/pml/cm/pml_cm_recvreq.h
@@ -13,6 +13,7 @@
 * Copyright (c) 2012 Sandia National Laboratories. All rights reserved.
 * Copyright (c) 2015 Los Alamos National Security, LLC. All rights
 * reserved.
+ * Copyright (c) 2017 Intel, Inc. All rights reserved
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -92,7 +93,8 @@ do { \
 src, \
 datatype, \
 addr, \
- count ) \
+ count, \
+ flags ) \
 do { \
 OMPI_REQUEST_INIT(&(request)->req_base.req_ompi, false); \
 (request)->req_base.req_ompi.req_mpi_object.comm = comm; \
@@ -108,12 +110,13 @@ do { \
 } else { \
 ompi_proc = ompi_comm_peer_lookup( comm, src ); \
 } \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_recv( \
 ompi_proc->super.proc_convertor, \
 &(datatype->super), \
 count, \
 addr, \
- 0, \
+ flags, \
 &(request)->req_base.req_convertor ); \
 } while(0)
 #else
@@ -123,7 +126,8 @@ do { \
 src, \
 datatype, \
 addr, \
- count ) \
+ count, \
+ flags ) \
 do { \
 OMPI_REQUEST_INIT(&(request)->req_base.req_ompi, false); \
 (request)->req_base.req_ompi.req_mpi_object.comm = comm; \
@@ -134,12 +138,13 @@ do { \
 OBJ_RETAIN(comm); \
 OMPI_DATATYPE_RETAIN(datatype); \
 \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_recv( \
 ompi_mpi_local_convertor, \
 &(datatype->super), \
 count, \
 addr, \
- 0, \
+ flags, \
 &(request)->req_base.req_convertor ); \
 } while(0)
 #endif
@@ -153,6 +158,7 @@ do { \
 datatype, \
 addr, \
 count, \
+ flags, \
 persistent) \
 do { \
 OMPI_REQUEST_INIT(&(request)->req_base.req_ompi, persistent); \
@@ -173,12 +179,13 @@ do { \
 } else { \
 ompi_proc = ompi_comm_peer_lookup( comm, src ); \
 } \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_recv( \
 ompi_proc->super.proc_convertor, \
 &(datatype->super), \
 count, \
 addr, \
- 0, \
+ flags, \
 &(request)->req_base.req_convertor ); \
 } while(0)
 #else
@@ -190,6 +197,7 @@ do { \
 datatype, \
 addr, \
 count, \
+ flags, \
 persistent) \
 do { \
 OMPI_REQUEST_INIT(&(request)->req_base.req_ompi, persistent); \
@@ -205,12 +213,13 @@ do { \
 OBJ_RETAIN(comm); \
 OMPI_DATATYPE_RETAIN(datatype); \
 \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_recv( \
 ompi_mpi_local_convertor, \
 &(datatype->super), \
 count, \
 addr, \
- 0, \
+ flags, \
 &(request)->req_base.req_convertor ); \
 } while(0)
 #endif
diff --git a/ompi/mca/pml/cm/pml_cm_request.h b/ompi/mca/pml/cm/pml_cm_request.h
index f0605f94a12..8f2f9b6547a 100644
--- a/ompi/mca/pml/cm/pml_cm_request.h
+++ b/ompi/mca/pml/cm/pml_cm_request.h
@@ -9,6 +9,7 @@
 * University of Stuttgart. All rights reserved.
 * Copyright (c) 2004-2006 The Regents of the University of California.
 * All rights reserved.
+ * Copyright (c) 2017 Intel, Inc. All rights reserved
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -53,4 +54,20 @@ struct mca_pml_cm_request_t {
 typedef struct mca_pml_cm_request_t mca_pml_cm_request_t;
 OBJ_CLASS_DECLARATION(mca_pml_cm_request_t);
+/*
+ * Avoid CUDA convertor inits only for contiguous memory and if indicated by
+ * the MTL. For non-contiguous memory, do not skip CUDA convertor init phases.
+ */
+#if OPAL_CUDA_SUPPORT
+#define MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count) \
+ { \
+ if (opal_datatype_is_contiguous_memory_layout(&datatype->super, count) \
+ && (ompi_mtl->mtl_flags & MCA_MTL_BASE_FLAG_CUDA_INIT_DISABLE)) { \
+ flags |= CONVERTOR_SKIP_CUDA_INIT; \
+ } \
+ }
+#else
+#define MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count)
+#endif
+
 #endif
diff --git a/ompi/mca/pml/cm/pml_cm_sendreq.c b/ompi/mca/pml/cm/pml_cm_sendreq.c
index 8d0f3bad90f..6d156286f45 100644
--- a/ompi/mca/pml/cm/pml_cm_sendreq.c
+++ b/ompi/mca/pml/cm/pml_cm_sendreq.c
@@ -63,6 +63,7 @@ mca_pml_cm_send_request_completion(struct mca_mtl_request_t *mtl_request)
 static void mca_pml_cm_send_request_construct(mca_pml_cm_hvy_send_request_t* sendreq)
 {
 /* no need to reinit for every send -- never changes */
+ sendreq->req_send.req_base.req_ompi.req_start = mca_pml_cm_start;
 sendreq->req_send.req_base.req_ompi.req_free = mca_pml_cm_send_request_free;
 sendreq->req_send.req_base.req_ompi.req_cancel = mca_pml_cm_cancel;
 }
diff --git a/ompi/mca/pml/cm/pml_cm_sendreq.h b/ompi/mca/pml/cm/pml_cm_sendreq.h
index e03eebf092b..3560270b99f 100644
--- a/ompi/mca/pml/cm/pml_cm_sendreq.h
+++ b/ompi/mca/pml/cm/pml_cm_sendreq.h
@@ -10,10 +10,11 @@
 * University of Stuttgart. All rights reserved.
 * Copyright (c) 2004-2006 The Regents of the University of California.
 * All rights reserved.
- * Copyright (c) 2015 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2015-2017 Los Alamos National Security, LLC. All rights
 * reserved.
 * Copyright (c) 2015 Research Organization for Information Science
 * and Technology (RIST). All rights reserved.
+ * Copyright (c) 2017 Intel, Inc. All rights reserved
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
@@ -125,18 +126,20 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count) \
+ count, \
+ flags ) \
 { \
 OBJ_RETAIN(comm); \
 OMPI_DATATYPE_RETAIN(datatype); \
 (req_send)->req_base.req_comm = comm; \
 (req_send)->req_base.req_datatype = datatype; \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_send( \
 ompi_proc->super.proc_convertor, \
 &(datatype->super), \
 count, \
 buf, \
- 0, \
+ flags, \
 &(req_send)->req_base.req_convertor ); \
 (req_send)->req_base.req_ompi.req_mpi_object.comm = comm; \
 (req_send)->req_base.req_ompi.req_status.MPI_SOURCE = \
@@ -154,18 +157,20 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count) \
+ count, \
+ flags ) \
 { \
 OBJ_RETAIN(comm); \
 OMPI_DATATYPE_RETAIN(datatype); \
 (req_send)->req_base.req_comm = comm; \
 (req_send)->req_base.req_datatype = datatype; \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_send( \
 ompi_mpi_local_convertor, \
 &(datatype->super), \
 count, \
 buf, \
- 0, \
+ flags, \
 &(req_send)->req_base.req_convertor ); \
 (req_send)->req_base.req_ompi.req_mpi_object.comm = comm; \
 (req_send)->req_base.req_ompi.req_status.MPI_SOURCE = \
@@ -185,18 +190,20 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count) \
+ count, \
+ flags ) \
 { \
 OBJ_RETAIN(comm); \
 OMPI_DATATYPE_RETAIN(datatype); \
 (req_send)->req_base.req_comm = comm; \
 (req_send)->req_base.req_datatype = datatype; \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_send( \
 ompi_proc->super.proc_convertor, \
 &(datatype->super), \
 count, \
 buf, \
- 0, \
+ flags, \
 &(req_send)->req_base.req_convertor ); \
 (req_send)->req_base.req_ompi.req_mpi_object.comm = comm; \
 (req_send)->req_base.req_ompi.req_status.MPI_SOURCE = \
@@ -215,7 +222,8 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count) \
+ count, \
+ flags ) \
 { \
 OBJ_RETAIN(comm); \
 OMPI_DATATYPE_RETAIN(datatype); \
@@ -235,12 +243,13 @@ do { \
 (req_send)->req_base.req_convertor.count = count; \
 (req_send)->req_base.req_convertor.pDesc = &datatype->super; \
 } else { \
+ MCA_PML_CM_SWITCH_CUDA_CONVERTOR_OFF(flags, datatype, count); \
 opal_convertor_copy_and_prepare_for_send( \
 ompi_mpi_local_convertor, \
 &(datatype->super), \
 count, \
 buf, \
- 0, \
+ flags, \
 &(req_send)->req_base.req_convertor ); \
 } \
 (req_send)->req_base.req_ompi.req_mpi_object.comm = comm; \
@@ -263,7 +272,8 @@ do { \
 persistent, \
 blocking, \
 buf, \
- count) \
+ count, \
+ flags ) \
 do { \
 OMPI_REQUEST_INIT(&(sendreq->req_send.req_base.req_ompi), \
 persistent); \
@@ -278,7 +288,8 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count); \
+ count, \
+ flags ) \
 opal_convertor_get_packed_size( \
 &sendreq->req_send.req_base.req_convertor, \
 &sendreq->req_count ); \
@@ -297,7 +308,8 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count) \
+ count, \
+ flags ) \
 do { \
 OMPI_REQUEST_INIT(&(sendreq->req_send.req_base.req_ompi), \
 false); \
@@ -308,7 +320,8 @@ do { \
 datatype, \
 sendmode, \
 buf, \
- count); \
+ count, \
+ flags); \
 sendreq->req_send.req_base.req_pml_complete = false; \
 } while(0)
@@ -369,28 +382,31 @@ do { \
 } while(0);
-#define MCA_PML_CM_HVY_SEND_REQUEST_START(sendreq, ret) \
-do { \
- ret = OMPI_SUCCESS; \
- MCA_PML_CM_SEND_REQUEST_START_SETUP(&(sendreq)->req_send); \
- if (sendreq->req_send.req_send_mode == MCA_PML_BASE_SEND_BUFFERED) { \
- MCA_PML_CM_HVY_SEND_REQUEST_BSEND_ALLOC(sendreq, ret); \
- } \
- if (OMPI_SUCCESS == ret) { \
- ret = OMPI_MTL_CALL(isend(ompi_mtl, \
- sendreq->req_send.req_base.req_comm, \
- sendreq->req_peer, \
- sendreq->req_tag, \
- &sendreq->req_send.req_base.req_convertor, \
- sendreq->req_send.req_send_mode, \
- sendreq->req_blocking, \
- &sendreq->req_mtl)); \
- if(OMPI_SUCCESS == ret && \
- sendreq->req_send.req_send_mode == MCA_PML_BASE_SEND_BUFFERED) { \
- sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR = 0; \
- ompi_request_complete(&(sendreq)->req_send.req_base.req_ompi, true); \
- } \
- } \
+#define MCA_PML_CM_HVY_SEND_REQUEST_START(sendreq, ret) \
+do { \
+ ret = OMPI_SUCCESS; \
+ MCA_PML_CM_SEND_REQUEST_START_SETUP(&(sendreq)->req_send); \
+ if (sendreq->req_send.req_send_mode == MCA_PML_BASE_SEND_BUFFERED) { \
+ MCA_PML_CM_HVY_SEND_REQUEST_BSEND_ALLOC(sendreq, ret); \
+ } \
+ if (OMPI_SUCCESS == ret) { \
+ ret = OMPI_MTL_CALL(isend(ompi_mtl, \
+ sendreq->req_send.req_base.req_comm, \
+ sendreq->req_peer, \
+ sendreq->req_tag, \
+ &sendreq->req_send.req_base.req_convertor, \
+ sendreq->req_send.req_send_mode, \
+ sendreq->req_blocking, \
+ &sendreq->req_mtl)); \
+ if(OMPI_SUCCESS == ret && \
+ sendreq->req_send.req_send_mode == MCA_PML_BASE_SEND_BUFFERED) { \
+ sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR = 0; \
+ if(!REQUEST_COMPLETE(&sendreq->req_send.req_base.req_ompi)) { \
+ /* request may have already been marked complete by the MTL */ \
+ ompi_request_complete(&(sendreq)->req_send.req_base.req_ompi, true); \
+ } \
+ } \
+ } \
 } while (0)
 /*
@@ -410,7 +426,7 @@ do {
 } \
 \
 if( !REQUEST_COMPLETE(&sendreq->req_send.req_base.req_ompi)) { \
- /* Should only be called for long messages (maybe synchronous) */ \
+ /* the request may have already been marked complete by the MTL */ \
 ompi_request_complete(&(sendreq->req_send.req_base.req_ompi), true); \
 } \
 sendreq->req_send.req_base.req_pml_complete = true; \
diff --git a/ompi/mca/pml/crcpw/Makefile.am b/ompi/mca/pml/crcpw/Makefile.am
index 41cf2db5c82..7eaef6bf1e2 100644
--- a/ompi/mca/pml/crcpw/Makefile.am
+++ b/ompi/mca/pml/crcpw/Makefile.am
@@ -11,6 +11,7 @@
 # All rights reserved.
 #
 # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2017 IBM Corporation. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -36,6 +37,7 @@ mcacomponentdir = $(ompilibdir)
 mcacomponent_LTLIBRARIES = $(component_install)
 mca_pml_crcpw_la_SOURCES = $(crcpw_sources)
 mca_pml_crcpw_la_LDFLAGS = -module -avoid-version
+mca_pml_crcpw_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
 noinst_LTLIBRARIES = $(component_noinst)
 libmca_pml_crcpw_la_SOURCES = $(crcpw_sources)
diff --git a/ompi/mca/pml/example/Makefile.am b/ompi/mca/pml/example/Makefile.am
index b1cb203e84b..4c3588848a9 100644
--- a/ompi/mca/pml/example/Makefile.am
+++ b/ompi/mca/pml/example/Makefile.am
@@ -8,6 +8,7 @@
 # Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 # University of Stuttgart. All rights reserved.
 # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2017 IBM Corporation. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -52,6 +53,7 @@ mcacomponentdir = $(ompilibdir)
 mcacomponent_LTLIBRARIES = $(component_install)
 mca_pml_example_la_SOURCES = $(local_sources)
 mca_pml_example_la_LDFLAGS = -module -avoid-version
+mca_pml_example_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
 noinst_LTLIBRARIES = $(component_noinst)
 libmca_pml_example_la_SOURCES = $(local_sources)
diff --git a/ompi/mca/pml/monitoring/Makefile.am b/ompi/mca/pml/monitoring/Makefile.am
index 517af90c0fd..431f5be9ba9 100644
--- a/ompi/mca/pml/monitoring/Makefile.am
+++ b/ompi/mca/pml/monitoring/Makefile.am
@@ -3,6 +3,7 @@
 # of Tennessee Research Foundation. All rights
 # reserved.
 # Copyright (c) 2013-2015 Inria. All rights reserved.
+# Copyright (c) 2017 IBM Corporation. All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -11,7 +12,6 @@
 #
 monitoring_sources = \
- pml_monitoring.c \
 pml_monitoring.h \
 pml_monitoring_comm.c \
 pml_monitoring_component.c \
@@ -32,6 +32,8 @@ mcacomponentdir = $(ompilibdir)
 mcacomponent_LTLIBRARIES = $(component_install)
 mca_pml_monitoring_la_SOURCES = $(monitoring_sources)
 mca_pml_monitoring_la_LDFLAGS = -module -avoid-version
+mca_pml_monitoring_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \
+ $(OMPI_TOP_BUILDDIR)/ompi/mca/common/monitoring/libmca_common_monitoring.la
 noinst_LTLIBRARIES = $(component_noinst)
 libmca_pml_monitoring_la_SOURCES = $(monitoring_sources)
diff --git a/ompi/mca/pml/monitoring/configure.m4 b/ompi/mca/pml/monitoring/configure.m4
new file mode 100644
index 00000000000..27815f22957
--- /dev/null
+++ b/ompi/mca/pml/monitoring/configure.m4
@@ -0,0 +1,24 @@
+# -*- shell-script -*-
+#
+# Copyright (c) 2017 The University of Tennessee and The University
+# of Tennessee Research Foundation. All rights
+# reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+# MCA_ompi_coll_monitoring_CONFIG([action-if-can-compile],
+# [action-if-cant-compile])
+# ------------------------------------------------
+AC_DEFUN([MCA_ompi_pml_monitoring_CONFIG],[
+ AC_CONFIG_FILES([ompi/mca/pml/monitoring/Makefile])
+
+ AS_IF([test "$MCA_BUILD_ompi_common_monitoring_DSO_TRUE" = ''],
+ [$1],
+ [$2])
+])dnl
+
+
diff --git a/ompi/mca/pml/monitoring/pml_monitoring.c b/ompi/mca/pml/monitoring/pml_monitoring.c
deleted file mode 100644
index 5fc7bee32a0..00000000000
--- a/ompi/mca/pml/monitoring/pml_monitoring.c
+++ /dev/null
@@ -1,258 +0,0 @@
-/*
- * Copyright (c) 2013-2016 The University of Tennessee and The University
- * of Tennessee Research Foundation. All rights
- * reserved.
- * Copyright (c) 2013-2015 Inria. All rights reserved.
- * Copyright (c) 2015 Bull SAS. All rights reserved.
- * Copyright (c) 2016 Research Organization for Information Science
- * and Technology (RIST). All rights reserved.
- * $COPYRIGHT$
- *
- * Additional copyrights may follow
- *
- * $HEADER$
- */
-
-#include
-#include
-#include "opal/class/opal_hash_table.h"
-
-/* array for stroring monitoring data*/
-uint64_t* sent_data = NULL;
-uint64_t* messages_count = NULL;
-uint64_t* filtered_sent_data = NULL;
-uint64_t* filtered_messages_count = NULL;
-
-static int init_done = 0;
-static int nbprocs = -1;
-static int my_rank = -1;
-opal_hash_table_t *translation_ht = NULL;
-
-
-mca_pml_monitoring_module_t mca_pml_monitoring = {
- mca_pml_monitoring_add_procs,
- mca_pml_monitoring_del_procs,
- mca_pml_monitoring_enable,
- NULL,
- mca_pml_monitoring_add_comm,
- mca_pml_monitoring_del_comm,
- mca_pml_monitoring_irecv_init,
- mca_pml_monitoring_irecv,
- mca_pml_monitoring_recv,
- mca_pml_monitoring_isend_init,
- mca_pml_monitoring_isend,
- mca_pml_monitoring_send,
- mca_pml_monitoring_iprobe,
- mca_pml_monitoring_probe,
- mca_pml_monitoring_start,
- mca_pml_monitoring_improbe,
- mca_pml_monitoring_mprobe,
- mca_pml_monitoring_imrecv,
- mca_pml_monitoring_mrecv,
- mca_pml_monitoring_dump,
- NULL,
- 65535,
- INT_MAX
-};
-
-/**
- * This PML monitors only the processes in the MPI_COMM_WORLD. As OMPI is now lazily
- * adding peers on the first call to add_procs we need to check how many processes
- * are in the MPI_COMM_WORLD to create the storage with the right size.
- */
-int mca_pml_monitoring_add_procs(struct ompi_proc_t **procs,
- size_t nprocs)
-{
- opal_process_name_t tmp, wp_name;
- size_t i, peer_rank, nprocs_world;
- uint64_t key;
-
- if(NULL == translation_ht) {
- translation_ht = OBJ_NEW(opal_hash_table_t);
- opal_hash_table_init(translation_ht, 2048);
- /* get my rank in the MPI_COMM_WORLD */
- my_rank = ompi_comm_rank((ompi_communicator_t*)&ompi_mpi_comm_world);
- }
-
- nprocs_world = ompi_comm_size((ompi_communicator_t*)&ompi_mpi_comm_world);
- /* For all procs in the same MPI_COMM_WORLD we need to add them to the hash table */
- for( i = 0; i < nprocs; i++ ) {
-
- /* Extract the peer procname from the procs array */
- if( ompi_proc_is_sentinel(procs[i]) ) {
- tmp = ompi_proc_sentinel_to_name((uintptr_t)procs[i]);
- } else {
- tmp = procs[i]->super.proc_name;
- }
- if( tmp.jobid != ompi_proc_local_proc->super.proc_name.jobid )
- continue;
-
- for( peer_rank = 0; peer_rank < nprocs_world; peer_rank++ ) {
- wp_name = ompi_group_get_proc_name(((ompi_communicator_t*)&ompi_mpi_comm_world)->c_remote_group, peer_rank);
- if( 0 != opal_compare_proc( tmp, wp_name) )
- continue;
-
- /* Find the rank of the peer in MPI_COMM_WORLD */
- key = *((uint64_t*)&tmp);
- /* store the rank (in COMM_WORLD) of the process
- with its name (a uniq opal ID) as key in the hash table*/
- if( OPAL_SUCCESS != opal_hash_table_set_value_uint64(translation_ht,
- key, (void*)(uintptr_t)peer_rank) ) {
- return OMPI_ERR_OUT_OF_RESOURCE; /* failed to allocate memory or growing the hash table */
- }
- break;
- }
- }
- return pml_selected_module.pml_add_procs(procs, nprocs);
-}
-
-/**
- * Pass the information down the PML stack.
- */
-int mca_pml_monitoring_del_procs(struct ompi_proc_t **procs,
- size_t nprocs)
-{
- return pml_selected_module.pml_del_procs(procs, nprocs);
-}
-
-int mca_pml_monitoring_dump(struct ompi_communicator_t* comm,
- int verbose)
-{
- return pml_selected_module.pml_dump(comm, verbose);
-}
-
-
-void finalize_monitoring( void )
-{
- free(filtered_sent_data);
- free(filtered_messages_count);
- free(sent_data);
- free(messages_count);
- opal_hash_table_remove_all( translation_ht );
- free(translation_ht);
-}
-
-/**
- * We have delayed the initialization until the first send so that we know that
- * the MPI_COMM_WORLD (which is the only communicator we are interested on at
- * this point) is correctly initialized.
- */
-static void initialize_monitoring( void )
-{
- nbprocs = ompi_comm_size((ompi_communicator_t*)&ompi_mpi_comm_world);
- sent_data = (uint64_t*)calloc(nbprocs, sizeof(uint64_t));
- messages_count = (uint64_t*)calloc(nbprocs, sizeof(uint64_t));
- filtered_sent_data = (uint64_t*)calloc(nbprocs, sizeof(uint64_t));
- filtered_messages_count = (uint64_t*)calloc(nbprocs, sizeof(uint64_t));
-
- init_done = 1;
-}
-
-void mca_pml_monitoring_reset( void )
-{
- if( !init_done ) return;
- memset(sent_data, 0, nbprocs * sizeof(uint64_t));
- memset(messages_count, 0, nbprocs * sizeof(uint64_t));
- memset(filtered_sent_data, 0, nbprocs * sizeof(uint64_t));
- memset(filtered_messages_count, 0, nbprocs * sizeof(uint64_t));
-}
-
-void monitor_send_data(int world_rank, size_t data_size, int tag)
-{
- if( 0 == filter_monitoring() ) return; /* right now the monitoring is not started */
-
- if ( !init_done )
- initialize_monitoring();
-
- /* distinguishses positive and negative tags if requested */
- if( (tag < 0) && (1 == filter_monitoring()) ) {
- filtered_sent_data[world_rank] += data_size;
- filtered_messages_count[world_rank]++;
- } else { /* if filtered monitoring is not activated data is aggregated indifferently */
- sent_data[world_rank] += data_size;
- messages_count[world_rank]++;
- }
-}
-
-int mca_pml_monitoring_get_messages_count(const struct mca_base_pvar_t *pvar,
- void *value,
- void *obj_handle)
-{
- ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle;
- int i, comm_size = ompi_comm_size (comm);
- uint64_t *values = (uint64_t*) value;
-
- if(comm != &ompi_mpi_comm_world.comm || NULL == messages_count)
- return OMPI_ERROR;
-
- for (i = 0 ; i < comm_size ; ++i) {
- values[i] = messages_count[i];
- }
-
- return OMPI_SUCCESS;
-}
-
-int mca_pml_monitoring_get_messages_size(const struct mca_base_pvar_t *pvar,
- void *value,
- void *obj_handle)
-{
- ompi_communicator_t *comm = (ompi_communicator_t *) obj_handle;
- int comm_size = ompi_comm_size (comm);
- uint64_t *values = (uint64_t*) value;
- int i;
-
- if(comm != &ompi_mpi_comm_world.comm || NULL == sent_data)
- return OMPI_ERROR;
-
- for (i = 0 ; i < comm_size ; ++i) {
- values[i] = sent_data[i];
- }
-
- return OMPI_SUCCESS;
-}
-
-static void output_monitoring( FILE *pf )
-{
- if( 0 == filter_monitoring() ) return; /* if disabled do nothing */
-
- for (int i = 0 ; i < nbprocs ; i++) {
- if(sent_data[i] > 0) {
- fprintf(pf, "I\t%d\t%d\t%" PRIu64 " bytes\t%" PRIu64 " msgs sent\n",
- my_rank, i, sent_data[i], messages_count[i]);
- }
- }
-
- if( 1 == filter_monitoring() ) return;
-
- for (int i = 0 ; i < nbprocs ; i++) {
- if(filtered_sent_data[i] > 0) {
- fprintf(pf, "E\t%d\t%d\t%" PRIu64 " bytes\t%" PRIu64 " msgs sent\n",
- my_rank, i, filtered_sent_data[i], filtered_messages_count[i]);
- }
- }
-}
-
-
-/*
- Flushes the monitoring into filename
- Useful for phases (see example in test/monitoring)
-*/
-int ompi_mca_pml_monitoring_flush(char* filename)
-{
- FILE *pf = stderr;
-
- if ( !init_done ) return -1;
-
- if( NULL != filename )
- pf = fopen(filename, "w");
-
- if(!pf)
- return -1;
-
- fprintf(stderr, "Proc %d flushing monitoring to: %s\n", my_rank, filename);
- output_monitoring( pf );
-
- if( NULL != filename )
- fclose(pf);
- return 0;
-}
diff --git a/ompi/mca/pml/monitoring/pml_monitoring.h b/ompi/mca/pml/monitoring/pml_monitoring.h
index efd9a5b0686..db9fe725476 100644
--- a/ompi/mca/pml/monitoring/pml_monitoring.h
+++ b/ompi/mca/pml/monitoring/pml_monitoring.h
@@ -2,7 +2,7 @@
 * Copyright (c) 2013-2015 The University of Tennessee and The University
 * of Tennessee Research Foundation. All rights
 * reserved.
- * Copyright (c) 2013-2015 Inria. All rights reserved.
+ * Copyright (c) 2013-2017 Inria. All rights reserved.
 * Copyright (c) 2015 Bull SAS. All rights reserved.
 * $COPYRIGHT$
 *
@@ -20,14 +20,15 @@ BEGIN_C_DECLS
 #include
 #include
 #include
-#include
+#include
+#include
 #include
 typedef mca_pml_base_module_t mca_pml_monitoring_module_t;
 extern mca_pml_base_component_t pml_selected_component;
 extern mca_pml_base_module_t pml_selected_module;
-extern mca_pml_monitoring_module_t mca_pml_monitoring;
+extern mca_pml_monitoring_module_t mca_pml_monitoring_module;
 OMPI_DECLSPEC extern mca_pml_base_component_2_0_0_t mca_pml_monitoring_component;
 /*
@@ -38,11 +39,9 @@ extern int mca_pml_monitoring_add_comm(struct ompi_communicator_t* comm);
 extern int mca_pml_monitoring_del_comm(struct ompi_communicator_t* comm);
-extern int mca_pml_monitoring_add_procs(struct ompi_proc_t **procs,
- size_t nprocs);
+extern int mca_pml_monitoring_add_procs(struct ompi_proc_t **procs, size_t nprocs);
-extern int mca_pml_monitoring_del_procs(struct ompi_proc_t **procs,
- size_t nprocs);
+extern int mca_pml_monitoring_del_procs(struct ompi_proc_t **procs, size_t nprocs);
 extern int mca_pml_monitoring_enable(bool enable);
@@ -138,20 +137,6 @@ extern int mca_pml_monitoring_dump(struct ompi_communicator_t* comm,
 extern int mca_pml_monitoring_start(size_t count, ompi_request_t** requests);
-int mca_pml_monitoring_get_messages_count (const struct mca_base_pvar_t *pvar,
- void *value,
- void *obj_handle);
-
-int mca_pml_monitoring_get_messages_size (const struct mca_base_pvar_t *pvar,
- void *value,
- void *obj_handle);
-
-void
finalize_monitoring( void ); -int filter_monitoring( void ); -void mca_pml_monitoring_reset( void ); -int ompi_mca_pml_monitoring_flush(char* filename); -void monitor_send_data(int world_rank, size_t data_size, int tag); - END_C_DECLS #endif /* MCA_PML_MONITORING_H */ diff --git a/ompi/mca/pml/monitoring/pml_monitoring_comm.c b/ompi/mca/pml/monitoring/pml_monitoring_comm.c index 1200f7ad714..b689ef637ed 100644 --- a/ompi/mca/pml/monitoring/pml_monitoring_comm.c +++ b/ompi/mca/pml/monitoring/pml_monitoring_comm.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -11,7 +11,7 @@ */ #include -#include +#include "pml_monitoring.h" int mca_pml_monitoring_add_comm(struct ompi_communicator_t* comm) { @@ -20,5 +20,6 @@ int mca_pml_monitoring_add_comm(struct ompi_communicator_t* comm) int mca_pml_monitoring_del_comm(struct ompi_communicator_t* comm) { + mca_common_monitoring_coll_cache_name(comm); return pml_selected_module.pml_del_comm(comm); } diff --git a/ompi/mca/pml/monitoring/pml_monitoring_component.c b/ompi/mca/pml/monitoring/pml_monitoring_component.c index 540d414dca0..fed3bd6955d 100644 --- a/ompi/mca/pml/monitoring/pml_monitoring_component.c +++ b/ompi/mca/pml/monitoring/pml_monitoring_component.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2016 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. * Copyright (c) 2015 Bull SAS. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -14,123 +14,81 @@ */ #include -#include +#include "pml_monitoring.h" #include #include +#include #include -static int mca_pml_monitoring_enabled = 0; static int mca_pml_monitoring_active = 0; -static int mca_pml_monitoring_current_state = 0; -static char* mca_pml_monitoring_current_filename = NULL; + mca_pml_base_component_t pml_selected_component = {{0}}; mca_pml_base_module_t pml_selected_module = {0}; -/* Return the current status of the monitoring system 0 if off, 1 if the - * seperation between internal tags and external tags is enabled. Any other - * positive value if the segregation between point-to-point and collective is - * disabled. - */ -int filter_monitoring( void ) -{ - return mca_pml_monitoring_current_state; -} - -static int -mca_pml_monitoring_set_flush(struct mca_base_pvar_t *pvar, const void *value, void *obj) -{ - if( NULL != mca_pml_monitoring_current_filename ) - free(mca_pml_monitoring_current_filename); - if( NULL == value ) /* No more output */ - mca_pml_monitoring_current_filename = NULL; - else { - mca_pml_monitoring_current_filename = strdup((char*)value); - if( NULL == mca_pml_monitoring_current_filename ) - return OMPI_ERROR; - } - return OMPI_SUCCESS; -} +mca_pml_monitoring_module_t mca_pml_monitoring_module = { + mca_pml_monitoring_add_procs, + mca_pml_monitoring_del_procs, + mca_pml_monitoring_enable, + NULL, + mca_pml_monitoring_add_comm, + mca_pml_monitoring_del_comm, + mca_pml_monitoring_irecv_init, + mca_pml_monitoring_irecv, + mca_pml_monitoring_recv, + mca_pml_monitoring_isend_init, + mca_pml_monitoring_isend, + mca_pml_monitoring_send, + mca_pml_monitoring_iprobe, + mca_pml_monitoring_probe, + mca_pml_monitoring_start, + mca_pml_monitoring_improbe, + mca_pml_monitoring_mprobe, + mca_pml_monitoring_imrecv, + mca_pml_monitoring_mrecv, + mca_pml_monitoring_dump, + NULL, + 65535, + INT_MAX +}; -static int -mca_pml_monitoring_get_flush(const struct mca_base_pvar_t *pvar, void *value, void *obj) +/** + * This PML monitors only 
the processes in the MPI_COMM_WORLD. As OMPI is now lazily + * adding peers on the first call to add_procs we need to check how many processes + * are in the MPI_COMM_WORLD to create the storage with the right size. + */ +int mca_pml_monitoring_add_procs(struct ompi_proc_t **procs, + size_t nprocs) { - return OMPI_SUCCESS; + int ret = mca_common_monitoring_add_procs(procs, nprocs); + if( OMPI_SUCCESS == ret ) + ret = pml_selected_module.pml_add_procs(procs, nprocs); + return ret; } -static int -mca_pml_monitoring_notify_flush(struct mca_base_pvar_t *pvar, mca_base_pvar_event_t event, - void *obj, int *count) +/** + * Pass the information down the PML stack. + */ +int mca_pml_monitoring_del_procs(struct ompi_proc_t **procs, + size_t nprocs) { - switch (event) { - case MCA_BASE_PVAR_HANDLE_BIND: - mca_pml_monitoring_reset(); - *count = (NULL == mca_pml_monitoring_current_filename ? 0 : strlen(mca_pml_monitoring_current_filename)); - case MCA_BASE_PVAR_HANDLE_UNBIND: - return OMPI_SUCCESS; - case MCA_BASE_PVAR_HANDLE_START: - mca_pml_monitoring_current_state = mca_pml_monitoring_enabled; - return OMPI_SUCCESS; - case MCA_BASE_PVAR_HANDLE_STOP: - if( 0 == ompi_mca_pml_monitoring_flush(mca_pml_monitoring_current_filename) ) - return OMPI_SUCCESS; - } - return OMPI_ERROR; + return pml_selected_module.pml_del_procs(procs, nprocs); } -static int -mca_pml_monitoring_messages_notify(mca_base_pvar_t *pvar, - mca_base_pvar_event_t event, - void *obj_handle, - int *count) +int mca_pml_monitoring_dump(struct ompi_communicator_t* comm, + int verbose) { - switch (event) { - case MCA_BASE_PVAR_HANDLE_BIND: - /* Return the size of the communicator as the number of values */ - *count = ompi_comm_size ((ompi_communicator_t *) obj_handle); - case MCA_BASE_PVAR_HANDLE_UNBIND: - return OMPI_SUCCESS; - case MCA_BASE_PVAR_HANDLE_START: - mca_pml_monitoring_current_state = mca_pml_monitoring_enabled; - return OMPI_SUCCESS; - case MCA_BASE_PVAR_HANDLE_STOP: - mca_pml_monitoring_current_state 
= 0; - return OMPI_SUCCESS; - } - - return OMPI_ERROR; + return pml_selected_module.pml_dump(comm, verbose); } int mca_pml_monitoring_enable(bool enable) { - /* If we reach this point we were succesful at hijacking the interface of - * the real PML, and we are now correctly interleaved between the upper - * layer and the real PML. - */ - (void)mca_base_pvar_register("ompi", "pml", "monitoring", "flush", "Flush the monitoring information" - "in the provided file", OPAL_INFO_LVL_1, MCA_BASE_PVAR_CLASS_GENERIC, - MCA_BASE_VAR_TYPE_STRING, NULL, MPI_T_BIND_NO_OBJECT, - 0, - mca_pml_monitoring_get_flush, mca_pml_monitoring_set_flush, - mca_pml_monitoring_notify_flush, &mca_pml_monitoring_component); - - (void)mca_base_pvar_register("ompi", "pml", "monitoring", "messages_count", "Number of messages " - "sent to each peer in a communicator", OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, - MCA_BASE_VAR_TYPE_UNSIGNED_LONG, NULL, MPI_T_BIND_MPI_COMM, - MCA_BASE_PVAR_FLAG_READONLY, - mca_pml_monitoring_get_messages_count, NULL, mca_pml_monitoring_messages_notify, NULL); - - (void)mca_base_pvar_register("ompi", "pml", "monitoring", "messages_size", "Size of messages " - "sent to each peer in a communicator", OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, - MCA_BASE_VAR_TYPE_UNSIGNED_LONG, NULL, MPI_T_BIND_MPI_COMM, - MCA_BASE_PVAR_FLAG_READONLY, - mca_pml_monitoring_get_messages_size, NULL, mca_pml_monitoring_messages_notify, NULL); - return pml_selected_module.pml_enable(enable); } static int mca_pml_monitoring_component_open(void) { - if( mca_pml_monitoring_enabled ) { + /* CF: What if we are the only PML available ?? 
*/ + if( mca_common_monitoring_enabled ) { opal_pointer_array_add(&mca_pml_base_pml, strdup(mca_pml_monitoring_component.pmlm_version.mca_component_name)); } @@ -139,22 +97,15 @@ static int mca_pml_monitoring_component_open(void) static int mca_pml_monitoring_component_close(void) { - if( NULL != mca_pml_monitoring_current_filename ) { - free(mca_pml_monitoring_current_filename); - mca_pml_monitoring_current_filename = NULL; - } - if( !mca_pml_monitoring_enabled ) - return OMPI_SUCCESS; + if( !mca_common_monitoring_enabled ) return OMPI_SUCCESS; /** - * If this component is already active, then we are currently monitoring the execution - * and this close if the one from MPI_Finalize. Do the clean up and release the extra - * reference on ourselves. + * If this component is already active, then we are currently monitoring + * the execution and this call to close if the one from MPI_Finalize. + * Clean up and release the extra reference on ourselves. */ if( mca_pml_monitoring_active ) { /* Already active, turn off */ pml_selected_component.pmlm_version.mca_close_component(); - memset(&pml_selected_component, 0, sizeof(mca_pml_base_component_t)); - memset(&pml_selected_module, 0, sizeof(mca_pml_base_module_t)); mca_base_component_repository_release((mca_base_component_t*)&mca_pml_monitoring_component); mca_pml_monitoring_active = 0; return OMPI_SUCCESS; @@ -175,12 +126,13 @@ static int mca_pml_monitoring_component_close(void) pml_selected_module = mca_pml; /* Install our interception layer */ mca_pml_base_selected_component = mca_pml_monitoring_component; - mca_pml = mca_pml_monitoring; - /* Restore some of the original valued: progress, flags, tags and context id */ + mca_pml = mca_pml_monitoring_module; + /* Restore some of the original values: progress, flags, tags and context id */ mca_pml.pml_progress = pml_selected_module.pml_progress; mca_pml.pml_max_contextid = pml_selected_module.pml_max_contextid; mca_pml.pml_max_tag = pml_selected_module.pml_max_tag; - 
mca_pml.pml_flags = pml_selected_module.pml_flags; + /* Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to ensure the hashtable is properly initialized */ + mca_pml.pml_flags = pml_selected_module.pml_flags | MCA_PML_BASE_FLAG_REQUIRE_WORLD; mca_pml_monitoring_active = 1; @@ -192,44 +144,36 @@ mca_pml_monitoring_component_init(int* priority, bool enable_progress_threads, bool enable_mpi_threads) { - if( mca_pml_monitoring_enabled ) { + if( (OMPI_SUCCESS == mca_common_monitoring_init()) && + mca_common_monitoring_enabled ) { *priority = 0; /* I'm up but don't select me */ - return &mca_pml_monitoring; + return &mca_pml_monitoring_module; } return NULL; } static int mca_pml_monitoring_component_finish(void) { - if( mca_pml_monitoring_enabled && mca_pml_monitoring_active ) { + if( mca_common_monitoring_enabled && mca_pml_monitoring_active ) { /* Free internal data structure */ - finalize_monitoring(); - /* Call the original PML and then close */ - mca_pml_monitoring_active = 0; - mca_pml_monitoring_enabled = 0; + mca_common_monitoring_finalize(); /* Restore the original PML */ mca_pml_base_selected_component = pml_selected_component; mca_pml = pml_selected_module; /* Redirect the close call to the original PML */ pml_selected_component.pmlm_finalize(); /** - * We should never release the last ref on the current component or face forever punishement. + * We should never release the last ref on the current + * component or face forever punishement. */ - /* mca_base_component_repository_release(&mca_pml_monitoring_component.pmlm_version); */ + /* mca_base_component_repository_release(&mca_common_monitoring_component.pmlm_version); */ } return OMPI_SUCCESS; } static int mca_pml_monitoring_component_register(void) { - (void)mca_base_component_var_register(&mca_pml_monitoring_component.pmlm_version, "enable", - "Enable the monitoring at the PML level. A value of 0 will disable the monitoring (default). 
" - "A value of 1 will aggregate all monitoring information (point-to-point and collective). " - "Any other value will enable filtered monitoring", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_4, - MCA_BASE_VAR_SCOPE_READONLY, &mca_pml_monitoring_enabled); - + mca_common_monitoring_register(&mca_pml_monitoring_component); return OMPI_SUCCESS; } @@ -242,9 +186,7 @@ mca_pml_base_component_2_0_0_t mca_pml_monitoring_component = { MCA_PML_BASE_VERSION_2_0_0, .mca_component_name = "monitoring", /* MCA component name */ - .mca_component_major_version = OMPI_MAJOR_VERSION, /* MCA component major version */ - .mca_component_minor_version = OMPI_MINOR_VERSION, /* MCA component minor version */ - .mca_component_release_version = OMPI_RELEASE_VERSION, /* MCA component release version */ + MCA_MONITORING_MAKE_VERSION, .mca_open_component = mca_pml_monitoring_component_open, /* component open */ .mca_close_component = mca_pml_monitoring_component_close, /* component close */ .mca_register_component_params = mca_pml_monitoring_component_register @@ -256,6 +198,5 @@ mca_pml_base_component_2_0_0_t mca_pml_monitoring_component = { .pmlm_init = mca_pml_monitoring_component_init, /* component init */ .pmlm_finalize = mca_pml_monitoring_component_finish /* component finalize */ - }; diff --git a/ompi/mca/pml/monitoring/pml_monitoring_iprobe.c b/ompi/mca/pml/monitoring/pml_monitoring_iprobe.c index ec34cb5d27c..42bc7ba257c 100644 --- a/ompi/mca/pml/monitoring/pml_monitoring_iprobe.c +++ b/ompi/mca/pml/monitoring/pml_monitoring_iprobe.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -11,7 +11,7 @@ */ #include -#include +#include "pml_monitoring.h" /* EJ: nothing to do here */ diff --git a/ompi/mca/pml/monitoring/pml_monitoring_irecv.c b/ompi/mca/pml/monitoring/pml_monitoring_irecv.c index 91b247c7c53..7c3fa8aa4d2 100644 --- a/ompi/mca/pml/monitoring/pml_monitoring_irecv.c +++ b/ompi/mca/pml/monitoring/pml_monitoring_irecv.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -11,7 +11,7 @@ */ #include -#include +#include "pml_monitoring.h" /* EJ: loging is done on the sender. Nothing to do here */ diff --git a/ompi/mca/pml/monitoring/pml_monitoring_isend.c b/ompi/mca/pml/monitoring/pml_monitoring_isend.c index 1c88fd268bf..6b167db1fb2 100644 --- a/ompi/mca/pml/monitoring/pml_monitoring_isend.c +++ b/ompi/mca/pml/monitoring/pml_monitoring_isend.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2018 Inria. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -11,9 +11,7 @@ */ #include -#include - -extern opal_hash_table_t *translation_ht; +#include "pml_monitoring.h" int mca_pml_monitoring_isend_init(const void *buf, size_t count, @@ -37,22 +35,16 @@ int mca_pml_monitoring_isend(const void *buf, struct ompi_communicator_t* comm, struct ompi_request_t **request) { - - /* find the processor of teh destination */ - ompi_proc_t *proc = ompi_group_get_proc_ptr(comm->c_remote_group, dst, true); int world_rank; - - /* find its name*/ - uint64_t key = *((uint64_t*)&(proc->super.proc_name)); /** * If this fails the destination is not part of my MPI_COM_WORLD * Lookup its name in the rank hastable to get its MPI_COMM_WORLD rank */ - if(OPAL_SUCCESS == opal_hash_table_get_value_uint64(translation_ht, key, (void *)&world_rank)) { + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(dst, comm->c_remote_group, &world_rank)) { size_t type_size, data_size; ompi_datatype_type_size(datatype, &type_size); data_size = count*type_size; - monitor_send_data(world_rank, data_size, tag); + mca_common_monitoring_record_pml(world_rank, data_size, tag); } return pml_selected_module.pml_isend(buf, count, datatype, @@ -67,19 +59,15 @@ int mca_pml_monitoring_send(const void *buf, mca_pml_base_send_mode_t mode, struct ompi_communicator_t* comm) { - ompi_proc_t *proc = ompi_group_get_proc_ptr(comm->c_remote_group, dst, true); int world_rank; - uint64_t key = *((uint64_t*) &(proc->super.proc_name)); - /* Are we sending to a peer from my own MPI_COMM_WORLD? 
*/ - if(OPAL_SUCCESS == opal_hash_table_get_value_uint64(translation_ht, key, (void *)&world_rank)) { + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(dst, comm->c_remote_group, &world_rank)) { size_t type_size, data_size; ompi_datatype_type_size(datatype, &type_size); data_size = count*type_size; - monitor_send_data(world_rank, data_size, tag); + mca_common_monitoring_record_pml(world_rank, data_size, tag); } return pml_selected_module.pml_send(buf, count, datatype, dst, tag, mode, comm); } - diff --git a/ompi/mca/pml/monitoring/pml_monitoring_start.c b/ompi/mca/pml/monitoring/pml_monitoring_start.c index fbdebac1c27..903aec805e3 100644 --- a/ompi/mca/pml/monitoring/pml_monitoring_start.c +++ b/ompi/mca/pml/monitoring/pml_monitoring_start.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2018 Inria. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -11,12 +11,9 @@ */ #include -#include -#include +#include "pml_monitoring.h" #include -extern opal_hash_table_t *translation_ht; - /* manage persistant requests*/ int mca_pml_monitoring_start(size_t count, ompi_request_t** requests) @@ -25,7 +22,6 @@ int mca_pml_monitoring_start(size_t count, for( i = 0; i < count; i++ ) { mca_pml_base_request_t *pml_request = (mca_pml_base_request_t*)requests[i]; - ompi_proc_t *proc; int world_rank; if(NULL == pml_request) { @@ -38,18 +34,16 @@ int mca_pml_monitoring_start(size_t count, continue; } - proc = ompi_group_get_proc_ptr(pml_request->req_comm->c_remote_group, pml_request->req_peer, true); - uint64_t key = *((uint64_t*) &(proc->super.proc_name)); - - /** * If this fails the destination is not part of my MPI_COM_WORLD */ - if(OPAL_SUCCESS == opal_hash_table_get_value_uint64(translation_ht, key, (void *)&world_rank)) { + if(OPAL_SUCCESS == mca_common_monitoring_get_world_rank(pml_request->req_peer, + pml_request->req_comm->c_remote_group, + &world_rank)) { size_t type_size, data_size; ompi_datatype_type_size(pml_request->req_datatype, &type_size); data_size = pml_request->req_count * type_size; - monitor_send_data(world_rank, data_size, 1); + mca_common_monitoring_record_pml(world_rank, data_size, 1); } } return pml_selected_module.pml_start(count, requests); diff --git a/ompi/mca/pml/ob1/Makefile.am b/ompi/mca/pml/ob1/Makefile.am index 4609a29484e..d0044bb6b6a 100644 --- a/ompi/mca/pml/ob1/Makefile.am +++ b/ompi/mca/pml/ob1/Makefile.am @@ -12,6 +12,7 @@ # Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. # Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2012 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -68,7 +69,7 @@ mca_pml_ob1_la_SOURCES = $(ob1_sources) mca_pml_ob1_la_LDFLAGS = -module -avoid-version if OPAL_cuda_support -mca_pml_ob1_la_LIBADD = \ +mca_pml_ob1_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ $(OMPI_TOP_BUILDDIR)/opal/mca/common/cuda/lib@OPAL_LIB_PREFIX@mca_common_cuda.la endif diff --git a/ompi/mca/pml/ob1/pml_ob1.c b/ompi/mca/pml/ob1/pml_ob1.c index fc941df0716..5adf19028a8 100644 --- a/ompi/mca/pml/ob1/pml_ob1.c +++ b/ompi/mca/pml/ob1/pml_ob1.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2012 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -206,6 +206,9 @@ int mca_pml_ob1_add_comm(ompi_communicator_t* comm) return OMPI_ERR_OUT_OF_RESOURCE; } + ompi_comm_assert_subscribe (comm, OMPI_COMM_ASSERT_NO_ANY_SOURCE); + ompi_comm_assert_subscribe (comm, OMPI_COMM_ASSERT_ALLOW_OVERTAKE); + mca_pml_ob1_comm_init_size(pml_comm, comm->c_remote_group->grp_proc_count); comm->c_pml_comm = pml_comm; @@ -223,8 +226,6 @@ int mca_pml_ob1_add_comm(ompi_communicator_t* comm) opal_list_remove_item (&mca_pml_ob1.non_existing_communicator_pending, (opal_list_item_t *) frag); - add_fragment_to_unexpected: - /* We generate the MSG_ARRIVED event as soon as the PML is aware * of a matching fragment arrival. Independing if it is received * on the correct order or not. 
This will allow the tools to @@ -242,7 +243,16 @@ int mca_pml_ob1_add_comm(ompi_communicator_t* comm) */ pml_proc = mca_pml_ob1_peer_lookup(comm, hdr->hdr_src); - if( ((uint16_t)hdr->hdr_seq) == ((uint16_t)pml_proc->expected_sequence) ) { + if (OMPI_COMM_CHECK_ASSERT_ALLOW_OVERTAKE(comm)) { + opal_list_append( &pml_proc->unexpected_frags, (opal_list_item_t*)frag ); + PERUSE_TRACE_MSG_EVENT(PERUSE_COMM_MSG_INSERT_IN_UNEX_Q, comm, + hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); + continue; + } + + if (((uint16_t)hdr->hdr_seq) == ((uint16_t)pml_proc->expected_sequence) ) { + + add_fragment_to_unexpected: /* We're now expecting the next sequence number. */ pml_proc->expected_sequence++; opal_list_append( &pml_proc->unexpected_frags, (opal_list_item_t*)frag ); @@ -254,19 +264,16 @@ int mca_pml_ob1_add_comm(ompi_communicator_t* comm) * situation as the cant_match is only checked when a new fragment is received from * the network. */ - for(frag = (mca_pml_ob1_recv_frag_t *)opal_list_get_first(&pml_proc->frags_cant_match); - frag != (mca_pml_ob1_recv_frag_t *)opal_list_get_end(&pml_proc->frags_cant_match); - frag = (mca_pml_ob1_recv_frag_t *)opal_list_get_next(frag)) { - hdr = &frag->hdr.hdr_match; - /* If the message has the next expected seq from that proc... */ - if(hdr->hdr_seq != pml_proc->expected_sequence) - continue; - - opal_list_remove_item(&pml_proc->frags_cant_match, (opal_list_item_t*)frag); - goto add_fragment_to_unexpected; - } + if( NULL != pml_proc->frags_cant_match ) { + frag = check_cantmatch_for_match(pml_proc); + if( NULL != frag ) { + hdr = &frag->hdr.hdr_match; + goto add_fragment_to_unexpected; + } + } } else { - opal_list_append( &pml_proc->frags_cant_match, (opal_list_item_t*)frag ); + append_frag_to_ordered_list(&pml_proc->frags_cant_match, frag, + pml_proc->expected_sequence); } } return OMPI_SUCCESS; @@ -333,7 +340,7 @@ int mca_pml_ob1_add_procs(ompi_proc_t** procs, size_t nprocs) expose all currently in use btls. 
*/ OPAL_LIST_FOREACH(sm, &mca_btl_base_modules_initialized, mca_btl_base_selected_module_t) { - if (sm->btl_module->btl_eager_limit < sizeof(mca_pml_ob1_hdr_t)) { + if ((MCA_BTL_FLAGS_SEND & sm->btl_module->btl_flags) && sm->btl_module->btl_eager_limit < sizeof(mca_pml_ob1_hdr_t)) { opal_show_help("help-mpi-pml-ob1.txt", "eager_limit_too_small", true, sm->btl_component->btl_version.mca_component_name, @@ -553,6 +560,23 @@ static void mca_pml_ob1_dump_frag_list(opal_list_t* queue, bool is_req) } } +void mca_pml_ob1_dump_cant_match(mca_pml_ob1_recv_frag_t* queue) +{ + mca_pml_ob1_recv_frag_t* item = queue; + + do { + mca_pml_ob1_dump_hdr( &item->hdr ); + if( NULL != item->range ) { + mca_pml_ob1_recv_frag_t* frag = item->range; + do { + mca_pml_ob1_dump_hdr( &frag->hdr ); + frag = (mca_pml_ob1_recv_frag_t*)frag->super.super.opal_list_next; + } while( frag != item->range ); + } + item = (mca_pml_ob1_recv_frag_t*)item->super.super.opal_list_next; + } while( item != queue ); +} + int mca_pml_ob1_dump(struct ompi_communicator_t* comm, int verbose) { struct mca_pml_comm_t* pml_comm = comm->c_pml_comm; @@ -588,9 +612,9 @@ int mca_pml_ob1_dump(struct ompi_communicator_t* comm, int verbose) opal_output(0, "expected specific receives\n"); mca_pml_ob1_dump_frag_list(&proc->specific_receives, true); } - if( opal_list_get_size(&proc->frags_cant_match) ) { + if( NULL != proc->frags_cant_match ) { opal_output(0, "out of sequence\n"); - mca_pml_ob1_dump_frag_list(&proc->frags_cant_match, false); + mca_pml_ob1_dump_cant_match(proc->frags_cant_match); } if( opal_list_get_size(&proc->unexpected_frags) ) { opal_output(0, "unexpected frag\n"); diff --git a/ompi/mca/pml/ob1/pml_ob1.h b/ompi/mca/pml/ob1/pml_ob1.h index 10162916c6a..1f4bfbb5899 100644 --- a/ompi/mca/pml/ob1/pml_ob1.h +++ b/ompi/mca/pml/ob1/pml_ob1.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -55,8 +55,8 @@ struct mca_pml_ob1_t { int free_list_num; /* initial size of free list */ int free_list_max; /* maximum size of free list */ int free_list_inc; /* number of elements to grow free list */ - size_t send_pipeline_depth; - size_t recv_pipeline_depth; + int32_t send_pipeline_depth; + int32_t recv_pipeline_depth; size_t rdma_retries_limit; int max_rdma_per_request; int max_send_per_range; diff --git a/ompi/mca/pml/ob1/pml_ob1_comm.c b/ompi/mca/pml/ob1/pml_ob1_comm.c index 40c54771a8f..510704849da 100644 --- a/ompi/mca/pml/ob1/pml_ob1_comm.c +++ b/ompi/mca/pml/ob1/pml_ob1_comm.c @@ -2,7 +2,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -29,7 +29,7 @@ static void mca_pml_ob1_comm_proc_construct(mca_pml_ob1_comm_proc_t* proc) proc->ompi_proc = NULL; proc->expected_sequence = 1; proc->send_sequence = 0; - OBJ_CONSTRUCT(&proc->frags_cant_match, opal_list_t); + proc->frags_cant_match = NULL; OBJ_CONSTRUCT(&proc->specific_receives, opal_list_t); OBJ_CONSTRUCT(&proc->unexpected_frags, opal_list_t); } @@ -37,7 +37,7 @@ static void mca_pml_ob1_comm_proc_construct(mca_pml_ob1_comm_proc_t* proc) static void mca_pml_ob1_comm_proc_destruct(mca_pml_ob1_comm_proc_t* proc) { - OBJ_DESTRUCT(&proc->frags_cant_match); + assert(NULL == proc->frags_cant_match); OBJ_DESTRUCT(&proc->specific_receives); OBJ_DESTRUCT(&proc->unexpected_frags); if (proc->ompi_proc) { diff --git a/ompi/mca/pml/ob1/pml_ob1_comm.h b/ompi/mca/pml/ob1/pml_ob1_comm.h index 33f16955193..a6f32153250 100644 --- a/ompi/mca/pml/ob1/pml_ob1_comm.h +++ b/ompi/mca/pml/ob1/pml_ob1_comm.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -40,7 +40,7 @@ struct mca_pml_ob1_comm_proc_t { #else int32_t send_sequence; /**< send side sequence number */ #endif - opal_list_t frags_cant_match; /**< out-of-order fragment queues */ + struct mca_pml_ob1_recv_frag_t* frags_cant_match; /**< out-of-order fragment queues */ opal_list_t specific_receives; /**< queues of unmatched specific receives */ opal_list_t unexpected_frags; /**< unexpected fragment queues */ }; diff --git a/ompi/mca/pml/ob1/pml_ob1_component.c b/ompi/mca/pml/ob1/pml_ob1_component.c index 6557bc20371..60345cab68c 100644 --- a/ompi/mca/pml/ob1/pml_ob1_component.c +++ b/ompi/mca/pml/ob1/pml_ob1_component.c @@ -110,6 +110,7 @@ static inline unsigned int mca_pml_ob1_param_register_uint( return *storage; } +#if 0 static inline size_t mca_pml_ob1_param_register_sizet( const char* param_name, size_t default_value, @@ -122,6 +123,7 @@ static inline size_t mca_pml_ob1_param_register_sizet( MCA_BASE_VAR_SCOPE_READONLY, storage); return *storage; } +#endif static int mca_pml_ob1_comm_size_notify (mca_base_pvar_t *pvar, mca_base_pvar_event_t event, void *obj_handle, int *count) { @@ -184,8 +186,8 @@ static int mca_pml_ob1_component_register(void) mca_pml_ob1_param_register_int("free_list_max", -1, &mca_pml_ob1.free_list_max); mca_pml_ob1_param_register_int("free_list_inc", 64, &mca_pml_ob1.free_list_inc); mca_pml_ob1_param_register_int("priority", 20, &mca_pml_ob1.priority); - mca_pml_ob1_param_register_sizet("send_pipeline_depth", 3, &mca_pml_ob1.send_pipeline_depth); - mca_pml_ob1_param_register_sizet("recv_pipeline_depth", 4, &mca_pml_ob1.recv_pipeline_depth); + mca_pml_ob1_param_register_int("send_pipeline_depth", 3, &mca_pml_ob1.send_pipeline_depth); + mca_pml_ob1_param_register_int("recv_pipeline_depth", 4, &mca_pml_ob1.recv_pipeline_depth); /* NTH: we can get into a live-lock situation in the RDMA failure path so disable RDMA retries for now. 
Falling back to send may suck but it is better than @@ -209,18 +211,19 @@ static int mca_pml_ob1_component_register(void) "Name of allocator component for unexpected messages", MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY, &mca_pml_ob1.allocator_name); - - (void) mca_base_pvar_register ("ompi", "pml", "ob1", "unexpected_msgq_length", "Number of unexpected messages " - "received by each peer in a communicator", OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, - MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, MPI_T_BIND_MPI_COMM, - MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_CONTINUOUS, - mca_pml_ob1_get_unex_msgq_size, NULL, mca_pml_ob1_comm_size_notify, NULL); - - (void) mca_base_pvar_register ("ompi", "pml", "ob1", "posted_recvq_length", "Number of unmatched receives " - "posted for each peer in a communicator", OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, - MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, MPI_T_BIND_MPI_COMM, - MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_CONTINUOUS, - mca_pml_ob1_get_posted_recvq_size, NULL, mca_pml_ob1_comm_size_notify, NULL); + (void)mca_base_component_pvar_register(&mca_pml_ob1_component.pmlm_version, + "unexpected_msgq_length", "Number of unexpected messages " + "received by each peer in a communicator", OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_CONTINUOUS, + mca_pml_ob1_get_unex_msgq_size, NULL, mca_pml_ob1_comm_size_notify, NULL); + + (void)mca_base_component_pvar_register(&mca_pml_ob1_component.pmlm_version, + "posted_recvq_length", "Number of unmatched receives " + "posted for each peer in a communicator", OPAL_INFO_LVL_4, MPI_T_PVAR_CLASS_SIZE, + MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, MPI_T_BIND_MPI_COMM, + MCA_BASE_PVAR_FLAG_READONLY | MCA_BASE_PVAR_FLAG_CONTINUOUS, + mca_pml_ob1_get_posted_recvq_size, NULL, mca_pml_ob1_comm_size_notify, NULL); return OMPI_SUCCESS; } diff --git 
a/ompi/mca/pml/ob1/pml_ob1_irecv.c b/ompi/mca/pml/ob1/pml_ob1_irecv.c index 71eb9dd8aa6..37c0ce9e9e8 100644 --- a/ompi/mca/pml/ob1/pml_ob1_irecv.c +++ b/ompi/mca/pml/ob1/pml_ob1_irecv.c @@ -15,7 +15,7 @@ * Copyright (c) 2010-2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2016 Research Organization for Information Science + * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -134,6 +134,16 @@ int mca_pml_ob1_recv(void *addr, MCA_PML_OB1_RECV_REQUEST_START(recvreq); ompi_request_wait_completion(&recvreq->req_recv.req_base.req_ompi); + if( true == recvreq->req_recv.req_base.req_pml_complete ) { + /* make buffer defined when the request is completed */ + MEMCHECKER( + memchecker_call(&opal_memchecker_base_mem_defined, + recvreq->req_recv.req_base.req_addr, + recvreq->req_recv.req_base.req_count, + recvreq->req_recv.req_base.req_datatype); + ); + } + if (NULL != status) { /* return status */ *status = recvreq->req_recv.req_base.req_ompi.req_status; } diff --git a/ompi/mca/pml/ob1/pml_ob1_isend.c b/ompi/mca/pml/ob1/pml_ob1_isend.c index 90edc34e188..be673382761 100644 --- a/ompi/mca/pml/ob1/pml_ob1_isend.c +++ b/ompi/mca/pml/ob1/pml_ob1_isend.c @@ -143,14 +143,16 @@ int mca_pml_ob1_isend(const void *buf, mca_pml_ob1_send_request_t *sendreq = NULL; ompi_proc_t *dst_proc = ob1_proc->ompi_proc; mca_bml_base_endpoint_t* endpoint = mca_bml_base_get_endpoint (dst_proc); - int16_t seqn; + int16_t seqn = 0; int rc; if (OPAL_UNLIKELY(NULL == endpoint)) { return OMPI_ERR_UNREACH; } - seqn = (uint16_t) OPAL_THREAD_ADD32(&ob1_proc->send_sequence, 1); + if (!OMPI_COMM_CHECK_ASSERT_ALLOW_OVERTAKE(comm)) { + seqn = (uint16_t) OPAL_THREAD_ADD_FETCH32(&ob1_proc->send_sequence, 1); + } if (MCA_PML_BASE_SEND_SYNCHRONOUS != sendmode) { rc =
mca_pml_ob1_send_inline (buf, count, datatype, dst, tag, seqn, dst_proc, @@ -196,7 +198,7 @@ int mca_pml_ob1_send(const void *buf, ompi_proc_t *dst_proc = ob1_proc->ompi_proc; mca_bml_base_endpoint_t* endpoint = mca_bml_base_get_endpoint (dst_proc); mca_pml_ob1_send_request_t *sendreq = NULL; - int16_t seqn; + int16_t seqn = 0; int rc; if (OPAL_UNLIKELY(NULL == endpoint)) { @@ -217,7 +219,9 @@ int mca_pml_ob1_send(const void *buf, return OMPI_SUCCESS; } - seqn = (uint16_t) OPAL_THREAD_ADD32(&ob1_proc->send_sequence, 1); + if (!OMPI_COMM_CHECK_ASSERT_ALLOW_OVERTAKE(comm)) { + seqn = (uint16_t) OPAL_THREAD_ADD_FETCH32(&ob1_proc->send_sequence, 1); + } /** * The immediate send will not have a request, so they are diff --git a/ompi/mca/pml/ob1/pml_ob1_progress.c b/ompi/mca/pml/ob1/pml_ob1_progress.c index 96935b60215..e1f84e796b4 100644 --- a/ompi/mca/pml/ob1/pml_ob1_progress.c +++ b/ompi/mca/pml/ob1/pml_ob1_progress.c @@ -10,6 +10,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -54,8 +56,8 @@ static inline int mca_pml_ob1_process_pending_cuda_async_copies(void) static int mca_pml_ob1_progress_needed = 0; int mca_pml_ob1_enable_progress(int32_t count) { - int32_t old = OPAL_ATOMIC_ADD32(&mca_pml_ob1_progress_needed, count); - if( 0 != old ) + int32_t progress_count = OPAL_ATOMIC_ADD_FETCH32(&mca_pml_ob1_progress_needed, count); + if( 1 < progress_count ) return 0; /* progress was already on */ opal_progress_register(mca_pml_ob1_progress); @@ -117,8 +119,8 @@ int mca_pml_ob1_progress(void) } if( 0 != completed_requests ) { - j = OPAL_ATOMIC_ADD32(&mca_pml_ob1_progress_needed, -completed_requests); - if( j == completed_requests ) { + j = OPAL_ATOMIC_ADD_FETCH32(&mca_pml_ob1_progress_needed, -completed_requests); + if( 0 == j ) { opal_progress_unregister(mca_pml_ob1_progress); } } diff --git a/ompi/mca/pml/ob1/pml_ob1_recvfrag.c b/ompi/mca/pml/ob1/pml_ob1_recvfrag.c index 5f3f8fdc484..eb261029ffd 100644 --- a/ompi/mca/pml/ob1/pml_ob1_recvfrag.c +++ b/ompi/mca/pml/ob1/pml_ob1_recvfrag.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, @@ -66,10 +66,11 @@ OBJ_CLASS_INSTANCE( mca_pml_ob1_recv_frag_t, */ /** - * Append a unexpected descriptor to a queue. This function will allocate and + * Append an unexpected descriptor to a queue. This function will allocate and * initialize the fragment (if necessary) and then will add it to the specified * queue. The allocated fragment is not returned to the caller. 
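The `pml_ob1_progress.c` hunk above moves to `OPAL_ATOMIC_ADD_FETCH32`, whose name makes explicit that it returns the counter value *after* the addition. The progress callback is registered only when the counter leaves zero and unregistered when it returns to zero. The sketch below models that edge-triggered pattern with C11 `<stdatomic.h>`; `progress_registered`, `enable_progress` and `requests_completed` are hypothetical stand-ins, not Open MPI APIs, and the sketch is single-threaded (it ignores the race between the counter update and the registration itself). It tests `now != count` (the counter was zero before the call); the patch tests `1 < progress_count`, which gives the same answer when `count` is 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag standing in for opal_progress_register()/unregister(). */
static bool progress_registered = false;

/* Number of requests that currently need the progress callback. */
static atomic_int progress_needed = 0;

/* Add-and-fetch built from C11 fetch-and-add: returns the NEW value. */
static int add_fetch(atomic_int *counter, int delta)
{
    return atomic_fetch_add(counter, delta) + delta;
}

/* `count` more requests need progress: register only on the 0 -> >0 edge. */
static void enable_progress(int count)
{
    int now = add_fetch(&progress_needed, count);
    if (now != count) {
        return; /* counter was already non-zero: callback already registered */
    }
    progress_registered = true;
}

/* `completed` requests finished: unregister only on the >0 -> 0 edge. */
static void requests_completed(int completed)
{
    int now = add_fetch(&progress_needed, -completed);
    assert(now >= 0); /* more completions than enables would be a bug */
    if (0 == now) {
        progress_registered = false;
    }
}
```

Using the value returned by the atomic operation itself, rather than re-reading the counter afterwards, is what lets exactly one caller observe each zero crossing.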
*/ + static void append_frag_to_list(opal_list_t *queue, mca_btl_base_module_t *btl, mca_pml_ob1_match_hdr_t *hdr, mca_btl_base_segment_t* segments, @@ -82,21 +83,224 @@ append_frag_to_list(opal_list_t *queue, mca_btl_base_module_t *btl, opal_list_append(queue, (opal_list_item_t*)frag); } +/** + * Append an unexpected descriptor to an ordered queue. + * + * Fragments use their opal_list_item_t to maintain themselves on an ordered list + * according to their hdr_seq. Special care has been taken to cope with + * overflowing the uint16_t we use for the hdr_seq. The current algorithm + * works as long as there are no two elements with the same hdr_seq in the + * list at the same time (i.e., no more than 2^16-1 pending out-of-sequence + * messages). On the vertical layer, messages with contiguous sequence + * numbers are grouped together to minimize the search space. + */ +void +append_frag_to_ordered_list(mca_pml_ob1_recv_frag_t** queue, + mca_pml_ob1_recv_frag_t *frag, + uint16_t seq) +{ + mca_pml_ob1_recv_frag_t *prior, *next; + mca_pml_ob1_match_hdr_t *hdr; + + frag->super.super.opal_list_next = (opal_list_item_t*)frag; + frag->super.super.opal_list_prev = (opal_list_item_t*)frag; + frag->range = NULL; + hdr = &frag->hdr.hdr_match; + + if( NULL == *queue ) { /* no pending fragments yet */ + *queue = frag; + return; + } + + prior = *queue; + assert(hdr->hdr_seq != prior->hdr.hdr_match.hdr_seq); + + /* Because hdr_seq is only 16 bits long, it can roll over rather quickly. We need to + * account for this rollover or the matching will fail.
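`hdr_seq` is a `uint16_t`, so sequence numbers wrap from 65535 back to 0, and a plain `<` comparison misorders fragments around the wrap. `append_frag_to_ordered_list()` copes with this by walking the list and comparing distances. A common alternative, shown here only as an illustration and not as the patch's method, is serial-number arithmetic in the style of RFC 1982: take the difference modulo 2^16 and treat it as "ahead" when it falls in the lower half of the range. The helper names are hypothetical, and this comparison requires the two numbers to be less than 2^15 apart, a tighter bound than the 2^16-1 pending messages mentioned in the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Forward distance from `from` to `to`, modulo 2^16. */
static uint16_t seq16_distance(uint16_t from, uint16_t to)
{
    return (uint16_t)(to - from);
}

/* True if `a` comes strictly before `b` on the 16-bit circle. Only meaningful
 * when the two values are less than 2^15 apart. */
static bool seq16_before(uint16_t a, uint16_t b)
{
    uint16_t d = seq16_distance(a, b);
    return d != 0 && d < 0x8000u;
}
```

Computing the difference in unsigned arithmetic and then range-checking it avoids relying on implementation-defined conversion of out-of-range values to `int16_t`.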
+ * Extract the items from the list to order them safely */ + if( hdr->hdr_seq < prior->hdr.hdr_match.hdr_seq ) { + uint16_t d1, d2 = prior->hdr.hdr_match.hdr_seq - hdr->hdr_seq; + do { + d1 = d2; + prior = (mca_pml_ob1_recv_frag_t*)(prior->super.super.opal_list_prev); + d2 = prior->hdr.hdr_match.hdr_seq - hdr->hdr_seq; + } while( (hdr->hdr_seq < prior->hdr.hdr_match.hdr_seq) && + (d1 > d2) && (prior != *queue) ); + } else { + uint16_t prior_seq = prior->hdr.hdr_match.hdr_seq, + next_seq = ((mca_pml_ob1_recv_frag_t*)(prior->super.super.opal_list_next))->hdr.hdr_match.hdr_seq; + /* prevent rollover */ + while( (hdr->hdr_seq > prior_seq) && (hdr->hdr_seq > next_seq) && (prior_seq < next_seq) ) { + prior_seq = next_seq; + prior = (mca_pml_ob1_recv_frag_t*)(prior->super.super.opal_list_next); + next_seq = ((mca_pml_ob1_recv_frag_t*)(prior->super.super.opal_list_next))->hdr.hdr_match.hdr_seq; + } + } + + /* prior is the fragment with the closest hdr_seq lower than the current hdr_seq */ + mca_pml_ob1_recv_frag_t* parent = prior; + + /* Is this fragment the next in range? */ + if( NULL == parent->range ) { + if( (parent->hdr.hdr_match.hdr_seq + 1) == hdr->hdr_seq ) { + parent->range = (mca_pml_ob1_recv_frag_t*)frag; + goto merge_ranges; + } + /* all other cases fall back to adding the frag after the parent */ + } else { + /* can we add the frag to the range of the previous fragment?
*/ + mca_pml_ob1_recv_frag_t* largest = (mca_pml_ob1_recv_frag_t*)parent->range->super.super.opal_list_prev; + if( (largest->hdr.hdr_match.hdr_seq + 1) == hdr->hdr_seq ) { + /* the frag belongs to this range */ + frag->super.super.opal_list_prev = (opal_list_item_t*)largest; + frag->super.super.opal_list_next = largest->super.super.opal_list_next; + frag->super.super.opal_list_prev->opal_list_next = (opal_list_item_t*)frag; + frag->super.super.opal_list_next->opal_list_prev = (opal_list_item_t*)frag; + goto merge_ranges; + } + /* all other cases fall back to adding the frag after the parent */ + } + + frag->super.super.opal_list_prev = (opal_list_item_t*)prior; + frag->super.super.opal_list_next = (opal_list_item_t*)prior->super.super.opal_list_next; + frag->super.super.opal_list_prev->opal_list_next = (opal_list_item_t*)frag; + frag->super.super.opal_list_next->opal_list_prev = (opal_list_item_t*)frag; + parent = frag; /* the frag is not part of a range yet */ + + /* if the newly added element is closer to the next expected sequence, mark it so */ + if( parent->hdr.hdr_match.hdr_seq >= seq ) + if( abs(parent->hdr.hdr_match.hdr_seq - seq) < abs((*queue)->hdr.hdr_match.hdr_seq - seq)) + *queue = parent; + + merge_ranges: + /* is the next hdr_seq the immediate successor?
*/ + next = (mca_pml_ob1_recv_frag_t*)parent->super.super.opal_list_next; + uint16_t upper = parent->hdr.hdr_match.hdr_seq; + if( NULL != parent->range ) { + upper = ((mca_pml_ob1_recv_frag_t*)parent->range->super.super.opal_list_prev)->hdr.hdr_match.hdr_seq; + } + if( (upper + 1) == next->hdr.hdr_match.hdr_seq ) { + /* remove next from the horizontal chain */ + next->super.super.opal_list_next->opal_list_prev = (opal_list_item_t*)parent; + parent->super.super.opal_list_next = next->super.super.opal_list_next; + /* merge next with its own range */ + if( NULL != next->range ) { + next->super.super.opal_list_next = (opal_list_item_t*)next->range; + next->super.super.opal_list_prev = next->range->super.super.opal_list_prev; + next->super.super.opal_list_next->opal_list_prev = (opal_list_item_t*)next; + next->super.super.opal_list_prev->opal_list_next = (opal_list_item_t*)next; + next->range = NULL; + } else { + next->super.super.opal_list_prev = (opal_list_item_t*)next; + next->super.super.opal_list_next = (opal_list_item_t*)next; + } + if( NULL == parent->range ) { + parent->range = next; + } else { + /* we have access to parent->range so make frag its predecessor */ + frag = (mca_pml_ob1_recv_frag_t*)parent->range->super.super.opal_list_prev; + /* merge the 2 rings such that frag is right before next */ + frag->super.super.opal_list_next = (opal_list_item_t*)next; + parent->range->super.super.opal_list_prev = next->super.super.opal_list_prev; + next->super.super.opal_list_prev->opal_list_next = (opal_list_item_t*)parent->range; + next->super.super.opal_list_prev = (opal_list_item_t*)frag; + } + if( next == *queue ) + *queue = parent; + } +} + +/* + * remove the head of ordered list and restructure the list. + */ +static mca_pml_ob1_recv_frag_t* +remove_head_from_ordered_list(mca_pml_ob1_recv_frag_t** queue) +{ + mca_pml_ob1_recv_frag_t* frag = *queue; + /* queue is empty, nothing to see.
*/ + if( NULL == *queue ) + return NULL; + if( NULL == frag->range ) { + /* head has no range, */ + if( frag->super.super.opal_list_next == (opal_list_item_t*)frag ) { + /* head points to itself means it is the only + * one in this queue. We set the new head to NULL */ + *queue = NULL; + } else { + /* make the next one a new head. */ + *queue = (mca_pml_ob1_recv_frag_t*)frag->super.super.opal_list_next; + frag->super.super.opal_list_next->opal_list_prev = frag->super.super.opal_list_prev; + frag->super.super.opal_list_prev->opal_list_next = frag->super.super.opal_list_next; + } + } else { + /* head has range */ + mca_pml_ob1_recv_frag_t* range = frag->range; + frag->range = NULL; + *queue = (mca_pml_ob1_recv_frag_t*)range; + if( range->super.super.opal_list_next == (opal_list_item_t*)range ) { + /* the range has no next element */ + assert( range->super.super.opal_list_prev == (opal_list_item_t*)range ); + range->range = NULL; + } else { + range->range = (mca_pml_ob1_recv_frag_t*)range->super.super.opal_list_next; + /* remove the range from the vertical chain */ + range->super.super.opal_list_next->opal_list_prev = range->super.super.opal_list_prev; + range->super.super.opal_list_prev->opal_list_next = range->super.super.opal_list_next; + } + /* replace frag with range in the horizontal range if not the only element */ + if( frag->super.super.opal_list_next == (opal_list_item_t*)frag ) { + range->super.super.opal_list_next = (opal_list_item_t*)range; + range->super.super.opal_list_prev = (opal_list_item_t*)range; + } else { + range->super.super.opal_list_next = frag->super.super.opal_list_next; + range->super.super.opal_list_prev = frag->super.super.opal_list_prev; + range->super.super.opal_list_next->opal_list_prev = (opal_list_item_t*)range; + range->super.super.opal_list_prev->opal_list_next = (opal_list_item_t*)range; + } + } + frag->super.super.opal_list_next = NULL; + frag->super.super.opal_list_prev = NULL; + return frag; +} + /** * Match incoming recv_frags 
against posted receives. * Supports out of order delivery. * - * @param frag_header (IN) Header of received recv_frag. - * @param frag_desc (IN) Received recv_frag descriptor. - * @param match_made (OUT) Flag indicating wether a match was made. - * @param additional_matches (OUT) List of additional matches + * @param hdr (IN) Header of received recv_frag. + * @param segments (IN) Received recv_frag descriptor. + * @param num_segments (IN) Number of segments in the received descriptor. + * @param type (IN) Type of the message header. * @return OMPI_SUCCESS or error status on failure. */ static int mca_pml_ob1_recv_frag_match( mca_btl_base_module_t *btl, mca_pml_ob1_match_hdr_t *hdr, mca_btl_base_segment_t* segments, size_t num_segments, - int type); + int type ); + +/** + * Match incoming frags against posted receives. If frag is not NULL then we assume + * it is already local and that it can be released upon completion. + * Supports out of order delivery. + * + * @param comm_ptr (IN) Communicator where the message has been received + * @param proc (IN) Proc for which we have received the message. + * @param hdr (IN) Header of received recv_frag. + * @param segments (IN) Received recv_frag descriptor. + * @param num_segments (IN) Number of segments in the received descriptor. + * @param type (IN) Type of the message header. + * @return OMPI_SUCCESS or error status on failure.
+ */ +static int +mca_pml_ob1_recv_frag_match_proc( mca_btl_base_module_t *btl, + ompi_communicator_t* comm_ptr, + mca_pml_ob1_comm_proc_t *proc, + mca_pml_ob1_match_hdr_t *hdr, + mca_btl_base_segment_t* segments, + size_t num_segments, + int type, + mca_pml_ob1_recv_frag_t* frag ); static mca_pml_ob1_recv_request_t* match_one(mca_btl_base_module_t *btl, @@ -105,6 +309,17 @@ match_one(mca_btl_base_module_t *btl, mca_pml_ob1_comm_proc_t *proc, mca_pml_ob1_recv_frag_t* frag); +mca_pml_ob1_recv_frag_t* +check_cantmatch_for_match(mca_pml_ob1_comm_proc_t *proc) +{ + mca_pml_ob1_recv_frag_t *frag = proc->frags_cant_match; + + if( (NULL != frag) && (frag->hdr.hdr_match.hdr_seq == proc->expected_sequence) ) { + return remove_head_from_ordered_list(&proc->frags_cant_match); + } + return NULL; +} + void mca_pml_ob1_recv_frag_callback_match(mca_btl_base_module_t* btl, mca_btl_base_tag_t tag, mca_btl_base_descriptor_t* des, @@ -163,32 +378,36 @@ void mca_pml_ob1_recv_frag_callback_match(mca_btl_base_module_t* btl, */ OB1_MATCHING_LOCK(&comm->matching_lock); - /* get sequence number of next message that can be processed */ - if(OPAL_UNLIKELY((((uint16_t) hdr->hdr_seq) != ((uint16_t) proc->expected_sequence)) || - (opal_list_get_size(&proc->frags_cant_match) > 0 ))) { - goto slow_path; - } - - /* This is the sequence number we were expecting, so we can try - * matching it to already posted receives. - */ + if (!OMPI_COMM_CHECK_ASSERT_ALLOW_OVERTAKE(comm_ptr)) { + /* get sequence number of next message that can be processed. + * If this frag is out of sequence, queue it up in the list + * now as we still have the lock. 
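The patch keeps out-of-sequence fragments in `frags_cant_match` ordered by `hdr_seq`, so `check_cantmatch_for_match()` only has to inspect the head instead of scanning the whole list as the removed version did. The sketch below reproduces that idea with a plain sorted singly linked list rather than the patch's two-level ring with contiguous `range` sub-lists. `frag_t`, `peer_t`, `queue_out_of_order`, `pop_if_next` and `deliver` are hypothetical names; locking is omitted, and matching against posted receives is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified out-of-order fragment (hypothetical, not mca_pml_ob1_recv_frag_t). */
typedef struct frag {
    uint16_t seq;
    struct frag *next;
} frag_t;

typedef struct {
    uint16_t expected;   /* next sequence number that may be matched */
    frag_t *cant_match;  /* out-of-order frags, sorted by distance from expected */
} peer_t;

/* Forward distance from `from` to `to`, modulo 2^16. */
static uint16_t dist(uint16_t from, uint16_t to) { return (uint16_t)(to - from); }

/* Insert `f`, keeping the list sorted by circular distance from p->expected. */
static void queue_out_of_order(peer_t *p, frag_t *f)
{
    frag_t **link = &p->cant_match;
    while (*link != NULL && dist(p->expected, (*link)->seq) < dist(p->expected, f->seq))
        link = &(*link)->next;
    assert(*link == NULL || (*link)->seq != f->seq); /* no duplicate seqs */
    f->next = *link;
    *link = f;
}

/* Same role as check_cantmatch_for_match(): pop the head only if it is next. */
static frag_t *pop_if_next(peer_t *p)
{
    frag_t *f = p->cant_match;
    if (f == NULL || f->seq != p->expected)
        return NULL;
    p->cant_match = f->next;
    f->next = NULL;
    return f;
}

/* Handle an arriving frag; returns how many frags were matched in order. */
static int deliver(peer_t *p, frag_t *f)
{
    if (f->seq != p->expected) {
        queue_out_of_order(p, f);   /* ahead of sequence: save it for later */
        return 0;
    }
    int matched = 0;
    do {
        p->expected++;              /* uint16_t: wraps from 65535 to 0 */
        matched++;                  /* a real PML would match f here */
    } while ((f = pop_if_next(p)) != NULL);
    return matched;
}
```

Sorting by distance from `expected` rather than by raw `seq` keeps the list correct across the 65535 to 0 wrap, as long as every queued fragment is ahead of `expected`.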
+ */ + if(OPAL_UNLIKELY(((uint16_t) hdr->hdr_seq) != ((uint16_t) proc->expected_sequence))) { + mca_pml_ob1_recv_frag_t* frag; + MCA_PML_OB1_RECV_FRAG_ALLOC(frag); + MCA_PML_OB1_RECV_FRAG_INIT(frag, hdr, segments, num_segments, btl); + append_frag_to_ordered_list(&proc->frags_cant_match, frag, proc->expected_sequence); + OB1_MATCHING_UNLOCK(&comm->matching_lock); + return; + } - /* We're now expecting the next sequence number. */ - proc->expected_sequence++; + /* We're now expecting the next sequence number. */ + proc->expected_sequence++; + } /* We generate the SEARCH_POSTED_QUEUE only when the message is * received in the correct sequence. Otherwise, we delay the event * generation until we reach the correct sequence number. */ PERUSE_TRACE_MSG_EVENT(PERUSE_COMM_SEARCH_POSTED_Q_BEGIN, comm_ptr, - hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); + hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); match = match_one(btl, hdr, segments, num_segments, comm_ptr, proc, NULL); /* The match is over. We generate the SEARCH_POSTED_Q_END here, - * before going into the mca_pml_ob1_check_cantmatch_for_match so - * we can make a difference for the searching time for all - * messages. + * before going into check_cantmatch_for_match so we can make + * a difference for the searching time for all messages. */ PERUSE_TRACE_MSG_EVENT(PERUSE_COMM_SEARCH_POSTED_Q_END, comm_ptr, hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); @@ -198,7 +417,12 @@ void mca_pml_ob1_recv_frag_callback_match(mca_btl_base_module_t* btl, if(OPAL_LIKELY(match)) { bytes_received = segments->seg_len - OMPI_PML_OB1_MATCH_HDR_LEN; - match->req_recv.req_bytes_packed = bytes_received; + /* We don't need to know the total amount of bytes we just received, + * but we need to know if there is any data in this message. The + * simplest way is to get the extra length from the first segment, + * and then add the number of remaining segments. 
*/ + match->req_recv.req_bytes_packed = bytes_received + (num_segments-1); MCA_PML_OB1_RECV_REQUEST_MATCHED(match, hdr); if(match->req_bytes_expected > 0) { @@ -244,12 +468,31 @@ void mca_pml_ob1_recv_frag_callback_match(mca_btl_base_module_t* btl, /* don't need a rmb as that is for checking */ recv_request_pml_complete(match); } - return; - slow_path: - OB1_MATCHING_UNLOCK(&comm->matching_lock); - mca_pml_ob1_recv_frag_match(btl, hdr, segments, - num_segments, MCA_PML_OB1_HDR_TYPE_MATCH); + /* We matched the frag. Now see if we already have the next sequence in + * our out-of-sequence (OOS) list. If yes, try to match it. + * + * NOTE: + * To minimize the number of lock operations, mca_pml_ob1_recv_frag_match_proc() + * MUST be called with the communicator lock held and will RELEASE the lock. This is + * not ideal, but it is better for performance. + */ + if(NULL != proc->frags_cant_match) { + mca_pml_ob1_recv_frag_t* frag; + + OB1_MATCHING_LOCK(&comm->matching_lock); + if((frag = check_cantmatch_for_match(proc))) { + /* mca_pml_ob1_recv_frag_match_proc() will release the lock. */ + mca_pml_ob1_recv_frag_match_proc(frag->btl, comm_ptr, proc, + &frag->hdr.hdr_match, + frag->segments, frag->num_segments, + frag->hdr.hdr_match.hdr_common.hdr_type, frag); + } else { + OB1_MATCHING_UNLOCK(&comm->matching_lock); + } + } + + return; } @@ -340,7 +583,7 @@ void mca_pml_ob1_recv_frag_callback_ack(mca_btl_base_module_t* btl, * protocol has req_state == 0 and as such should not be * decremented.
*/ - OPAL_THREAD_ADD32(&sendreq->req_state, -1); + OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, -1); } #if OPAL_CUDA_SUPPORT /* CUDA_ASYNC_SEND */ @@ -506,6 +749,27 @@ static mca_pml_ob1_recv_request_t *match_incomming( return NULL; } +static mca_pml_ob1_recv_request_t *match_incomming_no_any_source ( + mca_pml_ob1_match_hdr_t *hdr, mca_pml_ob1_comm_t *comm, + mca_pml_ob1_comm_proc_t *proc) +{ + mca_pml_ob1_recv_request_t *recv_req; + int tag = hdr->hdr_tag; + + OPAL_LIST_FOREACH(recv_req, &proc->specific_receives, mca_pml_ob1_recv_request_t) { + int req_tag = recv_req->req_recv.req_base.req_tag; + + if (req_tag == tag || (req_tag == OMPI_ANY_TAG && tag >= 0)) { + opal_list_remove_item (&proc->specific_receives, (opal_list_item_t *) recv_req); + PERUSE_TRACE_COMM_EVENT(PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q, + &(recv_req->req_recv.req_base), PERUSE_RECV); + return recv_req; + } + } + + return NULL; +} + static mca_pml_ob1_recv_request_t* match_one(mca_btl_base_module_t *btl, mca_pml_ob1_match_hdr_t *hdr, mca_btl_base_segment_t* segments, @@ -517,7 +781,11 @@ match_one(mca_btl_base_module_t *btl, mca_pml_ob1_comm_t *comm = (mca_pml_ob1_comm_t *)comm_ptr->c_pml_comm; do { - match = match_incomming(hdr, comm, proc); + if (!OMPI_COMM_CHECK_ASSERT_NO_ANY_SOURCE (comm_ptr)) { + match = match_incomming(hdr, comm, proc); + } else { + match = match_incomming_no_any_source (hdr, comm, proc); + } /* if match found, process data */ if(OPAL_LIKELY(NULL != match)) { @@ -563,31 +831,6 @@ match_one(mca_btl_base_module_t *btl, } while(true); } -static mca_pml_ob1_recv_frag_t* check_cantmatch_for_match(mca_pml_ob1_comm_proc_t *proc) -{ - mca_pml_ob1_recv_frag_t *frag; - - /* search the list for a fragment from the send with sequence - * number next_msg_seq_expected - */ - for(frag = (mca_pml_ob1_recv_frag_t*)opal_list_get_first(&proc->frags_cant_match); - frag != (mca_pml_ob1_recv_frag_t*)opal_list_get_end(&proc->frags_cant_match); - frag = 
(mca_pml_ob1_recv_frag_t*)opal_list_get_next(frag)) - { - mca_pml_ob1_match_hdr_t* hdr = &frag->hdr.hdr_match; - /* - * If the message has the next expected seq from that proc... - */ - if(hdr->hdr_seq != proc->expected_sequence) - continue; - - opal_list_remove_item(&proc->frags_cant_match, (opal_list_item_t*)frag); - return frag; - } - - return NULL; -} - /** * RCS/CTS receive side matching * @@ -625,12 +868,11 @@ static int mca_pml_ob1_recv_frag_match( mca_btl_base_module_t *btl, int type) { /* local variables */ - uint16_t next_msg_seq_expected, frag_msg_seq; + uint16_t frag_msg_seq; + uint16_t next_msg_seq_expected; ompi_communicator_t *comm_ptr; - mca_pml_ob1_recv_request_t *match = NULL; mca_pml_ob1_comm_t *comm; mca_pml_ob1_comm_proc_t *proc; - mca_pml_ob1_recv_frag_t* frag = NULL; /* communicator pointer */ comm_ptr = ompi_comm_lookup(hdr->hdr_ctx); @@ -649,14 +891,13 @@ static int mca_pml_ob1_recv_frag_match( mca_btl_base_module_t *btl, comm = (mca_pml_ob1_comm_t *)comm_ptr->c_pml_comm; /* source sequence number */ - frag_msg_seq = hdr->hdr_seq; proc = mca_pml_ob1_peer_lookup (comm_ptr, hdr->hdr_src); - /** - * We generate the MSG_ARRIVED event as soon as the PML is aware of a matching - * fragment arrival. Independing if it is received on the correct order or not. - * This will allow the tools to figure out if the messages are not received in the - * correct order (if multiple network interfaces). + /* We generate the MSG_ARRIVED event as soon as the PML is aware + * of a matching fragment arrival, regardless of whether it is received + * in the correct order or not. This allows tools to + * figure out whether messages are received out of order + * (e.g. when multiple network interfaces are used).
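`match_incomming_no_any_source()`, added earlier in this diff, is used when the communicator asserts that no `MPI_ANY_SOURCE` receives will be posted, so only the per-peer `specific_receives` queue has to be searched. Its tag test accepts an exact match, or a wildcard receive against a non-negative tag; negative tags, presumably reserved for internal traffic, never match the wildcard. The sketch below isolates that predicate over a hypothetical array of posted receives. `ANY_TAG` is an assumption mirroring the `-1` value Open MPI's `mpi.h` uses for `MPI_ANY_TAG`, and `posted_recv_t`/`match_first` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ANY_TAG (-1)  /* assumption: mirrors MPI_ANY_TAG in Open MPI */

/* Hypothetical posted receive: only the tag matters for this sketch. */
typedef struct {
    int tag;    /* tag the application posted, possibly ANY_TAG */
    bool used;  /* set once matched (stands in for list removal) */
} posted_recv_t;

/* Same predicate as match_incomming_no_any_source(): exact tag match, or a
 * wildcard receive, which only matches non-negative tags. */
static bool tag_matches(int posted_tag, int incoming_tag)
{
    return posted_tag == incoming_tag || (posted_tag == ANY_TAG && incoming_tag >= 0);
}

/* Return the first unused posted receive that matches, in posting order. */
static posted_recv_t *match_first(posted_recv_t *posted, size_t n, int incoming_tag)
{
    for (size_t i = 0; i < n; i++) {
        if (!posted[i].used && tag_matches(posted[i].tag, incoming_tag)) {
            posted[i].used = true;
            return &posted[i];
        }
    }
    return NULL;
}
```

Scanning in posting order preserves MPI's rule that the earliest matching receive wins; the wildcard at index 1 in the test below is consumed before the later exact-tag receive.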
*/ PERUSE_TRACE_MSG_EVENT(PERUSE_COMM_MSG_ARRIVED, comm_ptr, hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); @@ -670,38 +911,69 @@ static int mca_pml_ob1_recv_frag_match( mca_btl_base_module_t *btl, */ OB1_MATCHING_LOCK(&comm->matching_lock); - /* get sequence number of next message that can be processed */ + frag_msg_seq = hdr->hdr_seq; next_msg_seq_expected = (uint16_t)proc->expected_sequence; - if(OPAL_UNLIKELY(frag_msg_seq != next_msg_seq_expected)) - goto wrong_seq; - /* - * This is the sequence number we were expecting, - * so we can try matching it to already posted - * receives. + /* If the sequence number is wrong, queue it up for later. */ + if(OPAL_UNLIKELY(frag_msg_seq != next_msg_seq_expected)) { + mca_pml_ob1_recv_frag_t* frag; + MCA_PML_OB1_RECV_FRAG_ALLOC(frag); + MCA_PML_OB1_RECV_FRAG_INIT(frag, hdr, segments, num_segments, btl); + append_frag_to_ordered_list(&proc->frags_cant_match, frag, next_msg_seq_expected); + OB1_MATCHING_UNLOCK(&comm->matching_lock); + return OMPI_SUCCESS; + } + + /* mca_pml_ob1_recv_frag_match_proc() will release the lock. */ + return mca_pml_ob1_recv_frag_match_proc(btl, comm_ptr, proc, hdr, + segments, num_segments, + type, NULL); +} + + +/* mca_pml_ob1_recv_frag_match_proc() will match the given frag and + * then try to match the next frag in sequence by looking into arrived + * out of order frags in frags_cant_match list until it can't find one. + * + * ATTENTION: THIS FUNCTION MUST BE CALLED WITH COMMUNICATOR LOCK HELD. + * THE LOCK WILL BE RELEASED UPON RETURN. USE WITH CARE. 
*/ +static int +mca_pml_ob1_recv_frag_match_proc( mca_btl_base_module_t *btl, + ompi_communicator_t* comm_ptr, + mca_pml_ob1_comm_proc_t *proc, + mca_pml_ob1_match_hdr_t *hdr, + mca_btl_base_segment_t* segments, + size_t num_segments, + int type, + mca_pml_ob1_recv_frag_t* frag ) +{ + /* local variables */ + mca_pml_ob1_comm_t* comm = (mca_pml_ob1_comm_t *)comm_ptr->c_pml_comm; + mca_pml_ob1_recv_request_t *match = NULL; + + /* If we are here, this is the sequence number we were expecting, + * so we can try matching it to already posted receives. */ -out_of_order_match: + match_this_frag: /* We're now expecting the next sequence number. */ proc->expected_sequence++; - /** - * We generate the SEARCH_POSTED_QUEUE only when the message is received - * in the correct sequence. Otherwise, we delay the event generation until - * we reach the correct sequence number. + /* We generate the SEARCH_POSTED_QUEUE only when the message is + * received in the correct sequence. Otherwise, we delay the event + * generation until we reach the correct sequence number. */ PERUSE_TRACE_MSG_EVENT(PERUSE_COMM_SEARCH_POSTED_Q_BEGIN, comm_ptr, - hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); + hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); match = match_one(btl, hdr, segments, num_segments, comm_ptr, proc, frag); - /** - * The match is over. We generate the SEARCH_POSTED_Q_END here, before going - * into the mca_pml_ob1_check_cantmatch_for_match so we can make a difference - * for the searching time for all messages. + /* The match is over. We generate the SEARCH_POSTED_Q_END here, + * before going into check_cantmatch_for_match, so we can make a + * difference for the searching time for all messages.
*/ PERUSE_TRACE_MSG_EVENT(PERUSE_COMM_SEARCH_POSTED_Q_END, comm_ptr, - hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); + hdr->hdr_src, hdr->hdr_tag, PERUSE_RECV); /* release matching lock before processing fragment */ OB1_MATCHING_UNLOCK(&comm->matching_lock); @@ -725,10 +997,10 @@ static int mca_pml_ob1_recv_frag_match( mca_btl_base_module_t *btl, /* * Now that new message has arrived, check to see if - * any fragments on the c_c_frags_cant_match list + * any fragments on the frags_cant_match list * may now be used to form new matchs */ - if(OPAL_UNLIKELY(opal_list_get_size(&proc->frags_cant_match) > 0)) { + if(OPAL_UNLIKELY(NULL != proc->frags_cant_match)) { OB1_MATCHING_LOCK(&comm->matching_lock); if((frag = check_cantmatch_for_match(proc))) { hdr = &frag->hdr.hdr_match; @@ -736,20 +1008,11 @@ static int mca_pml_ob1_recv_frag_match( mca_btl_base_module_t *btl, num_segments = frag->num_segments; btl = frag->btl; type = hdr->hdr_common.hdr_type; - goto out_of_order_match; + goto match_this_frag; } OB1_MATCHING_UNLOCK(&comm->matching_lock); } - return OMPI_SUCCESS; -wrong_seq: - /* - * This message comes after the next expected, so it - * is ahead of sequence. Save it for later. - */ - append_frag_to_list(&proc->frags_cant_match, btl, hdr, segments, - num_segments, NULL); - OB1_MATCHING_UNLOCK(&comm->matching_lock); return OMPI_SUCCESS; } diff --git a/ompi/mca/pml/ob1/pml_ob1_recvfrag.h b/ompi/mca/pml/ob1/pml_ob1_recvfrag.h index 80bcef1501f..def120ccc62 100644 --- a/ompi/mca/pml/ob1/pml_ob1_recvfrag.h +++ b/ompi/mca/pml/ob1/pml_ob1_recvfrag.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2013 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -42,6 +42,7 @@ struct mca_pml_ob1_recv_frag_t { opal_free_list_item_t super; mca_pml_ob1_hdr_t hdr; size_t num_segments; + struct mca_pml_ob1_recv_frag_t* range; mca_btl_base_module_t* btl; mca_btl_base_segment_t segments[MCA_BTL_DES_MAX_SEGMENTS]; mca_pml_ob1_buffer_t buffers[MCA_BTL_DES_MAX_SEGMENTS]; @@ -167,7 +168,18 @@ extern void mca_pml_ob1_recv_frag_callback_fin( mca_btl_base_module_t *btl, mca_btl_base_descriptor_t* descriptor, void* cbdata ); +/** + * Extract the next fragment from the cant_match ordered list. This fragment + * will be the next in sequence. + */ +extern mca_pml_ob1_recv_frag_t* +check_cantmatch_for_match(mca_pml_ob1_comm_proc_t *proc); + +void append_frag_to_ordered_list(mca_pml_ob1_recv_frag_t** queue, + mca_pml_ob1_recv_frag_t* frag, + uint16_t seq); +extern void mca_pml_ob1_dump_cant_match(mca_pml_ob1_recv_frag_t* queue); END_C_DECLS #endif diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c index ddd60f263ce..9ccb27e1af4 100644 --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ -13,7 +13,7 @@ * Copyright (c) 2008 UT-Battelle, LLC. All rights reserved. * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. * Copyright (c) 2012-2015 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2011-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights * reserved. 
* Copyright (c) 2012 FUJITSU LIMITED. All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science @@ -143,6 +143,7 @@ static int mca_pml_ob1_recv_request_cancel(struct ompi_request_t* ompi_request, static void mca_pml_ob1_recv_request_construct(mca_pml_ob1_recv_request_t* request) { /* the request type is set by the superclass */ + request->req_recv.req_base.req_ompi.req_start = mca_pml_ob1_start; request->req_recv.req_base.req_ompi.req_free = mca_pml_ob1_recv_request_free; request->req_recv.req_base.req_ompi.req_cancel = mca_pml_ob1_recv_request_cancel; request->req_rdma_cnt = 0; @@ -189,15 +190,15 @@ static void mca_pml_ob1_put_completion (mca_pml_ob1_rdma_frag_t *frag, int64_t r mca_pml_ob1_recv_request_t* recvreq = (mca_pml_ob1_recv_request_t *) frag->rdma_req; mca_bml_base_btl_t *bml_btl = frag->rdma_bml; - OPAL_THREAD_SUB_SIZE_T(&recvreq->req_pipeline_depth, 1); + OPAL_THREAD_ADD_FETCH32(&recvreq->req_pipeline_depth, -1); + assert ((uint64_t) rdma_size == frag->rdma_length); MCA_PML_OB1_RDMA_FRAG_RETURN(frag); if (OPAL_LIKELY(0 < rdma_size)) { - assert ((uint64_t) rdma_size == frag->rdma_length); /* check completion status */ - OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, (size_t) rdma_size); + OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, rdma_size); if (recv_request_pml_complete_check(recvreq) == false && recvreq->req_rdma_offset < recvreq->req_send_offset) { /* schedule additional rdma operations */ @@ -372,7 +373,7 @@ static void mca_pml_ob1_rget_completion (mca_btl_base_module_t* btl, struct mca_ } } else { /* is receive request complete */ - OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length); + OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length); /* TODO: re-add order */ mca_pml_ob1_send_fin (recvreq->req_recv.req_base.req_proc, bml_btl, frag->rdma_hdr.hdr_rget.hdr_frag, @@ -469,7 +470,7 @@ int mca_pml_ob1_recv_request_get_frag (mca_pml_ob1_rdma_frag_t 
*frag) rc = mca_bml_base_get (bml_btl, frag->local_address, frag->remote_address, local_handle, (mca_btl_base_registration_handle_t *) frag->remote_handle, frag->rdma_length, 0, MCA_BTL_NO_ORDER, mca_pml_ob1_rget_completion, frag); - if( OPAL_UNLIKELY(OMPI_SUCCESS != rc) ) { + if( OPAL_UNLIKELY(OMPI_SUCCESS > rc) ) { return mca_pml_ob1_recv_request_get_frag_failed (frag, OMPI_ERR_OUT_OF_RESOURCE); } @@ -523,7 +524,7 @@ void mca_pml_ob1_recv_request_progress_frag( mca_pml_ob1_recv_request_t* recvreq recvreq->req_recv.req_base.req_datatype); ); - OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, bytes_received); + OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, bytes_received); /* check completion status */ if(recv_request_pml_complete_check(recvreq) == false && recvreq->req_rdma_offset < recvreq->req_send_offset) { @@ -600,7 +601,7 @@ void mca_pml_ob1_recv_request_frag_copy_finished( mca_btl_base_module_t* btl, * known that the data has been copied out of the descriptor. */ des->des_cbfunc(NULL, NULL, des, 0); - OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, bytes_received); + OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, bytes_received); /* check completion status */ if(recv_request_pml_complete_check(recvreq) == false && @@ -753,13 +754,14 @@ void mca_pml_ob1_recv_request_progress_rget( mca_pml_ob1_recv_request_t* recvreq frag->rdma_length = bytes_remaining; } + prev_sent = frag->rdma_length; + /* NTH: TODO -- handle error conditions gracefully */ rc = mca_pml_ob1_recv_request_get_frag(frag); if (OMPI_SUCCESS != rc) { break; } - prev_sent = frag->rdma_length; bytes_remaining -= prev_sent; offset += prev_sent; } @@ -813,7 +815,7 @@ void mca_pml_ob1_recv_request_progress_rndv( mca_pml_ob1_recv_request_t* recvreq recvreq->req_recv.req_base.req_count, recvreq->req_recv.req_base.req_datatype); ); - OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, bytes_received); + OPAL_THREAD_ADD_FETCH_SIZE_T(&recvreq->req_bytes_received, 
bytes_received); } /* check completion status */ if(recv_request_pml_complete_check(recvreq) == false && @@ -949,7 +951,7 @@ int mca_pml_ob1_recv_request_schedule_once( mca_pml_ob1_recv_request_t* recvreq, } while(bytes_remaining > 0 && - recvreq->req_pipeline_depth < mca_pml_ob1.recv_pipeline_depth) { + recvreq->req_pipeline_depth < mca_pml_ob1.recv_pipeline_depth) { mca_pml_ob1_rdma_frag_t *frag = NULL; mca_btl_base_module_t *btl; int rc, rdma_idx; @@ -981,14 +983,10 @@ int mca_pml_ob1_recv_request_schedule_once( mca_pml_ob1_recv_request_t* recvreq, } while(!size); btl = bml_btl->btl; - /* NTH: This conditional used to check if there was a registration in - * recvreq->req_rdma[rdma_idx].btl_reg. If once existed it was due to - * the btl not needed registration (equivalent to btl->btl_register_mem - * != NULL. This new check is equivalent. Note: I feel this protocol - * needs work to better improve resource usage when running with a - * leave pinned protocol. */ - if (btl->btl_register_mem && (btl->btl_rdma_pipeline_frag_size != 0) && - (size > btl->btl_rdma_pipeline_frag_size)) { + /* NTH: Note: I feel this protocol needs work to better improve resource + * usage when running with a leave pinned protocol. 
*/ + /* GB: We should always abide by the BTL RDMA pipeline fragment limit (if one is set) */ + if ((btl->btl_rdma_pipeline_frag_size != 0) && (size > btl->btl_rdma_pipeline_frag_size)) { size = btl->btl_rdma_pipeline_frag_size; } @@ -1026,7 +1024,7 @@ int mca_pml_ob1_recv_request_schedule_once( mca_pml_ob1_recv_request_t* recvreq, if (OPAL_LIKELY(OMPI_SUCCESS == rc)) { /* update request state */ recvreq->req_rdma_offset += size; - OPAL_THREAD_ADD_SIZE_T(&recvreq->req_pipeline_depth, 1); + OPAL_THREAD_ADD_FETCH32(&recvreq->req_pipeline_depth, 1); recvreq->req_rdma[rdma_idx].length -= size; bytes_remaining -= size; } else { diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.h b/ompi/mca/pml/ob1/pml_ob1_recvreq.h index 6d575693237..0ced47e2915 100644 --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.h +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.h @@ -41,12 +41,12 @@ BEGIN_C_DECLS struct mca_pml_ob1_recv_request_t { mca_pml_base_recv_request_t req_recv; opal_ptr_t remote_req_send; - int32_t req_lock; - size_t req_pipeline_depth; - size_t req_bytes_received; /**< amount of data transferred into the user buffer */ - size_t req_bytes_expected; /**< local size of the data as suggested by the user */ - size_t req_rdma_offset; - size_t req_send_offset; + int32_t req_lock; + int32_t req_pipeline_depth; + size_t req_bytes_received; /**< amount of data transferred into the user buffer */ + size_t req_bytes_expected; /**< local size of the data as suggested by the user */ + size_t req_rdma_offset; + size_t req_send_offset; uint32_t req_rdma_cnt; uint32_t req_rdma_idx; bool req_pending; @@ -64,12 +64,12 @@ OBJ_CLASS_DECLARATION(mca_pml_ob1_recv_request_t); static inline bool lock_recv_request(mca_pml_ob1_recv_request_t *recvreq) { - return OPAL_THREAD_ADD32(&recvreq->req_lock, 1) == 1; + return OPAL_THREAD_ADD_FETCH32(&recvreq->req_lock, 1) == 1; } static inline bool unlock_recv_request(mca_pml_ob1_recv_request_t *recvreq) { - return OPAL_THREAD_ADD32(&recvreq->req_lock, -1) == 0; + return 
OPAL_THREAD_ADD_FETCH32(&recvreq->req_lock, -1) == 0; } /** diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c index 96bfa16ddb5..a2aecae09ac 100644 --- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c +++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c @@ -132,6 +132,7 @@ static int mca_pml_ob1_send_request_cancel(struct ompi_request_t* request, int c static void mca_pml_ob1_send_request_construct(mca_pml_ob1_send_request_t* req) { req->req_send.req_base.req_type = MCA_PML_REQUEST_SEND; + req->req_send.req_base.req_ompi.req_start = mca_pml_ob1_start; req->req_send.req_base.req_ompi.req_free = mca_pml_ob1_send_request_free; req->req_send.req_base.req_ompi.req_cancel = mca_pml_ob1_send_request_cancel; req->req_rdma_cnt = 0; @@ -204,10 +205,10 @@ mca_pml_ob1_rndv_completion_request( mca_bml_base_btl_t* bml_btl, &(sendreq->req_send.req_base), PERUSE_SEND ); } - OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered); + OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered); /* advance the request */ - OPAL_THREAD_ADD32(&sendreq->req_state, -1); + OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, -1); send_request_pml_complete_check(sendreq); @@ -260,7 +261,7 @@ mca_pml_ob1_rget_completion (mca_pml_ob1_rdma_frag_t *frag, int64_t rdma_length) /* count bytes of user data actually delivered and check for request completion */ if (OPAL_LIKELY(0 < rdma_length)) { - OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, (size_t) rdma_length); + OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, (size_t) rdma_length); } send_request_pml_complete_check(sendreq); @@ -312,8 +313,8 @@ mca_pml_ob1_frag_completion( mca_btl_base_module_t* btl, des->des_segment_count, sizeof(mca_pml_ob1_frag_hdr_t)); - OPAL_THREAD_SUB_SIZE_T(&sendreq->req_pipeline_depth, 1); - OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered); + OPAL_THREAD_ADD_FETCH32(&sendreq->req_pipeline_depth, -1); + 
OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, req_bytes_delivered); if(send_request_pml_complete_check(sendreq) == false) { mca_pml_ob1_send_request_schedule(sendreq); @@ -912,13 +913,13 @@ mca_pml_ob1_send_request_schedule_once(mca_pml_ob1_send_request_t* sendreq) /* check pipeline_depth here before attempting to get any locks */ if(true == sendreq->req_throttle_sends && - sendreq->req_pipeline_depth >= mca_pml_ob1.send_pipeline_depth) + sendreq->req_pipeline_depth >= mca_pml_ob1.send_pipeline_depth) return OMPI_SUCCESS; range = get_send_range(sendreq); while(range && (false == sendreq->req_throttle_sends || - sendreq->req_pipeline_depth < mca_pml_ob1.send_pipeline_depth)) { + sendreq->req_pipeline_depth < mca_pml_ob1.send_pipeline_depth)) { mca_pml_ob1_frag_hdr_t* hdr; mca_btl_base_descriptor_t* des; int rc, btl_idx; @@ -1043,7 +1044,7 @@ mca_pml_ob1_send_request_schedule_once(mca_pml_ob1_send_request_t* sendreq) range->range_btls[btl_idx].length -= size; range->range_send_length -= size; range->range_send_offset += size; - OPAL_THREAD_ADD_SIZE_T(&sendreq->req_pipeline_depth, 1); + OPAL_THREAD_ADD_FETCH32(&sendreq->req_pipeline_depth, 1); if(range->range_send_length == 0) { range = get_next_send_range(sendreq, range); prev_bytes_remaining = 0; @@ -1059,7 +1060,7 @@ mca_pml_ob1_send_request_schedule_once(mca_pml_ob1_send_request_t* sendreq) range->range_btls[btl_idx].length -= size; range->range_send_length -= size; range->range_send_offset += size; - OPAL_THREAD_ADD_SIZE_T(&sendreq->req_pipeline_depth, 1); + OPAL_THREAD_ADD_FETCH32(&sendreq->req_pipeline_depth, 1); if(range->range_send_length == 0) { range = get_next_send_range(sendreq, range); prev_bytes_remaining = 0; @@ -1125,7 +1126,7 @@ static void mca_pml_ob1_put_completion (mca_btl_base_module_t* btl, struct mca_b 0, 0); /* check for request completion */ - OPAL_THREAD_ADD_SIZE_T(&sendreq->req_bytes_delivered, frag->rdma_length); + OPAL_THREAD_ADD_FETCH_SIZE_T(&sendreq->req_bytes_delivered, 
frag->rdma_length); send_request_pml_complete_check(sendreq); } else { @@ -1199,7 +1200,7 @@ void mca_pml_ob1_send_request_put( mca_pml_ob1_send_request_t* sendreq, mca_pml_ob1_rdma_frag_t* frag; if(hdr->hdr_common.hdr_flags & MCA_PML_OB1_HDR_TYPE_ACK) { - OPAL_THREAD_ADD32(&sendreq->req_state, -1); + OPAL_THREAD_ADD_FETCH32(&sendreq->req_state, -1); } sendreq->req_recv.pval = hdr->hdr_recv_req.pval; diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.h b/ompi/mca/pml/ob1/pml_ob1_sendreq.h index 80acc93f4ec..be36c3f2ac4 100644 --- a/ompi/mca/pml/ob1/pml_ob1_sendreq.h +++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.h @@ -45,11 +45,11 @@ struct mca_pml_ob1_send_request_t { mca_pml_base_send_request_t req_send; mca_bml_base_endpoint_t* req_endpoint; opal_ptr_t req_recv; - int32_t req_state; - int32_t req_lock; - bool req_throttle_sends; - size_t req_pipeline_depth; - size_t req_bytes_delivered; + int32_t req_state; + int32_t req_lock; + bool req_throttle_sends; + int32_t req_pipeline_depth; + size_t req_bytes_delivered; uint32_t req_rdma_cnt; mca_pml_ob1_send_pending_t req_pending; opal_mutex_t req_send_range_lock; @@ -76,12 +76,12 @@ OBJ_CLASS_DECLARATION(mca_pml_ob1_send_range_t); static inline bool lock_send_request(mca_pml_ob1_send_request_t *sendreq) { - return OPAL_THREAD_ADD32(&sendreq->req_lock, 1) == 1; + return OPAL_THREAD_ADD_FETCH32(&sendreq->req_lock, 1) == 1; } static inline bool unlock_send_request(mca_pml_ob1_send_request_t *sendreq) { - return OPAL_THREAD_ADD32(&sendreq->req_lock, -1) == 0; + return OPAL_THREAD_ADD_FETCH32(&sendreq->req_lock, -1) == 0; } static inline void @@ -485,7 +485,7 @@ mca_pml_ob1_send_request_start( mca_pml_ob1_send_request_t* sendreq ) return OMPI_ERR_UNREACH; } - seqn = OPAL_THREAD_ADD32(&ob1_proc->send_sequence, 1); + seqn = OPAL_THREAD_ADD_FETCH32(&ob1_proc->send_sequence, 1); return mca_pml_ob1_send_request_start_seq (sendreq, endpoint, seqn); } diff --git a/ompi/mca/pml/pml.h b/ompi/mca/pml/pml.h index 0b70da841b8..243b5993dda 100644 
--- a/ompi/mca/pml/pml.h +++ b/ompi/mca/pml/pml.h @@ -69,6 +69,7 @@ #include "ompi/mca/mca.h" #include "mpi.h" /* needed for MPI_ANY_TAG */ #include "ompi/mca/pml/pml_constants.h" +#include "ompi/request/request.h" BEGIN_C_DECLS @@ -350,14 +351,11 @@ typedef int (*mca_pml_base_module_send_fn_t)( /** * Initiate one or more persistent requests. * - * @param count Number of requests - * @param request Array of persistent requests - * @return OMPI_SUCCESS or failure status. + * @param count (IN) Number of requests + * @param requests (IN/OUT) Array of persistent requests + * @return OMPI_SUCCESS or failure status. */ -typedef int (*mca_pml_base_module_start_fn_t)( - size_t count, - struct ompi_request_t** requests -); +typedef ompi_request_start_fn_t mca_pml_base_module_start_fn_t; /** * Probe to poll for pending recv. diff --git a/ompi/mca/pml/ucx/Makefile.am b/ompi/mca/pml/ucx/Makefile.am index 0fdd85e2723..54e590438e1 100644 --- a/ompi/mca/pml/ucx/Makefile.am +++ b/ompi/mca/pml/ucx/Makefile.am @@ -1,5 +1,7 @@ # -# Copyright (C) Mellanox Technologies Ltd. 2001-2015. ALL RIGHTS RESERVED. +# Copyright (C) 2001-2017 Mellanox Technologies, Inc. +# All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -35,11 +37,12 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_pml_ucx_la_SOURCES = $(local_sources) -mca_pml_ucx_la_LIBADD = $(pml_ucx_LIBS) -mca_pml_ucx_la_LDFLAGS = -module -avoid-version +mca_pml_ucx_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(pml_ucx_LIBS) +mca_pml_ucx_la_LDFLAGS = -module -avoid-version $(pml_ucx_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) libmca_pml_ucx_la_SOURCES = $(local_sources) libmca_pml_ucx_la_LIBADD = $(pml_ucx_LIBS) -libmca_pml_ucx_la_LDFLAGS = -module -avoid-version +libmca_pml_ucx_la_LDFLAGS = -module -avoid-version $(pml_ucx_LDFLAGS) diff --git a/ompi/mca/pml/ucx/pml_ucx.c b/ompi/mca/pml/ucx/pml_ucx.c index cf4b49f8304..98352a951fb 100644 --- a/ompi/mca/pml/ucx/pml_ucx.c +++ b/ompi/mca/pml/ucx/pml_ucx.c @@ -68,13 +68,17 @@ mca_pml_ucx_module_t ompi_pml_ucx = { mca_pml_ucx_mrecv, mca_pml_ucx_dump, NULL, /* FT */ - 1ul << (PML_UCX_TAG_BITS - 1), 1ul << (PML_UCX_CONTEXT_BITS), + 1ul << (PML_UCX_TAG_BITS - 1), }, NULL, /* ucp_context */ NULL /* ucp_worker */ }; +#define PML_UCX_REQ_ALLOCA() \ + ((char *)alloca(ompi_pml_ucx.request_size) + ompi_pml_ucx.request_size); + + static int mca_pml_ucx_send_worker_address(void) { ucp_address_t *address; @@ -108,9 +112,10 @@ static int mca_pml_ucx_recv_worker_address(ompi_proc_t *proc, *address_p = NULL; OPAL_MODEX_RECV(ret, &mca_pml_ucx_component.pmlm_version, &proc->super.proc_name, - (void**)address_p, addrlen_p); + (void**)address_p, addrlen_p); if (ret < 0) { - PML_UCX_ERROR("Failed to receive EP address"); + PML_UCX_ERROR("Failed to receive UCX worker address: %s (%d)", + opal_strerror(ret), ret); } return ret; } @@ -135,12 +140,17 @@ int mca_pml_ucx_open(void) UCP_PARAM_FIELD_REQUEST_SIZE | UCP_PARAM_FIELD_REQUEST_INIT | UCP_PARAM_FIELD_REQUEST_CLEANUP | - UCP_PARAM_FIELD_TAG_SENDER_MASK; + UCP_PARAM_FIELD_TAG_SENDER_MASK | + UCP_PARAM_FIELD_MT_WORKERS_SHARED | + 
UCP_PARAM_FIELD_ESTIMATED_NUM_EPS; params.features = UCP_FEATURE_TAG; params.request_size = sizeof(ompi_request_t); params.request_init = mca_pml_ucx_request_init; params.request_cleanup = mca_pml_ucx_request_cleanup; params.tag_sender_mask = PML_UCX_SPECIFIC_SOURCE_MASK; + params.mt_workers_shared = 0; /* we do not need mt support for context + since it will be protected by worker */ + params.estimated_num_eps = ompi_proc_world_size(); status = ucp_init(¶ms, config, &ompi_pml_ucx.ucp_context); ucp_config_release(config); @@ -178,6 +188,7 @@ int mca_pml_ucx_init(void) { ucp_worker_params_t params; ucs_status_t status; + ucp_worker_attr_t attr; int rc; PML_UCX_VERBOSE(1, "mca_pml_ucx_init"); @@ -185,10 +196,34 @@ int mca_pml_ucx_init(void) /* TODO check MPI thread mode */ params.field_mask = UCP_WORKER_PARAM_FIELD_THREAD_MODE; params.thread_mode = UCS_THREAD_MODE_SINGLE; + if (ompi_mpi_thread_multiple) { + params.thread_mode = UCS_THREAD_MODE_MULTI; + } else { + params.thread_mode = UCS_THREAD_MODE_SINGLE; + } status = ucp_worker_create(ompi_pml_ucx.ucp_context, ¶ms, &ompi_pml_ucx.ucp_worker); if (UCS_OK != status) { + PML_UCX_ERROR("Failed to create UCP worker"); + return OMPI_ERROR; + } + + attr.field_mask = UCP_WORKER_ATTR_FIELD_THREAD_MODE; + status = ucp_worker_query(ompi_pml_ucx.ucp_worker, &attr); + if (UCS_OK != status) { + ucp_worker_destroy(ompi_pml_ucx.ucp_worker); + ompi_pml_ucx.ucp_worker = NULL; + PML_UCX_ERROR("Failed to query UCP worker thread level"); + return OMPI_ERROR; + } + + if (ompi_mpi_thread_multiple && attr.thread_mode != UCS_THREAD_MODE_MULTI) { + /* UCX does not support multithreading, disqualify current PML for now */ + /* TODO: we should let OMPI to fallback to THREAD_SINGLE mode */ + ucp_worker_destroy(ompi_pml_ucx.ucp_worker); + ompi_pml_ucx.ucp_worker = NULL; + PML_UCX_ERROR("UCP worker does not support MPI_THREAD_MULTIPLE"); return OMPI_ERROR; } @@ -234,7 +269,7 @@ int mca_pml_ucx_cleanup(void) return OMPI_SUCCESS; } -ucp_ep_h 
mca_pml_ucx_add_proc(ompi_communicator_t *comm, int dst) +static ucp_ep_h mca_pml_ucx_add_proc_common(ompi_proc_t *proc) { ucp_ep_params_t ep_params; ucp_address_t *address; @@ -243,23 +278,12 @@ ucp_ep_h mca_pml_ucx_add_proc(ompi_communicator_t *comm, int dst) ucp_ep_h ep; int ret; - ompi_proc_t *proc0 = ompi_comm_peer_lookup(comm, 0); - ompi_proc_t *proc_peer = ompi_comm_peer_lookup(comm, dst); - - /* Note, mca_pml_base_pml_check_selected, doesn't use 3rd argument */ - if (OMPI_SUCCESS != (ret = mca_pml_base_pml_check_selected("ucx", - &proc0, - dst))) { - return NULL; - } - - ret = mca_pml_ucx_recv_worker_address(proc_peer, &address, &addrlen); + ret = mca_pml_ucx_recv_worker_address(proc, &address, &addrlen); if (ret < 0) { - PML_UCX_ERROR("Failed to receive worker address from proc: %d", proc_peer->super.proc_name.vpid); return NULL; } - PML_UCX_VERBOSE(2, "connecting to proc. %d", proc_peer->super.proc_name.vpid); + PML_UCX_VERBOSE(2, "connecting to proc. %d", proc->super.proc_name.vpid); ep_params.field_mask = UCP_EP_PARAM_FIELD_REMOTE_ADDRESS; ep_params.address = address; @@ -267,66 +291,78 @@ ucp_ep_h mca_pml_ucx_add_proc(ompi_communicator_t *comm, int dst) status = ucp_ep_create(ompi_pml_ucx.ucp_worker, &ep_params, &ep); free(address); if (UCS_OK != status) { - PML_UCX_ERROR("Failed to connect to proc: %d, %s", proc_peer->super.proc_name.vpid, - ucs_status_string(status)); + PML_UCX_ERROR("ucp_ep_create(proc=%d) failed: %s", + proc->super.proc_name.vpid, + ucs_status_string(status)); return NULL; } - proc_peer->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML] = ep; - + proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML] = ep; return ep; } +static ucp_ep_h mca_pml_ucx_add_proc(ompi_communicator_t *comm, int dst) +{ + ompi_proc_t *proc0 = ompi_comm_peer_lookup(comm, 0); + ompi_proc_t *proc_peer = ompi_comm_peer_lookup(comm, dst); + int ret; + + /* Note, mca_pml_base_pml_check_selected, doesn't use 3rd argument */ + if (OMPI_SUCCESS != (ret = 
mca_pml_base_pml_check_selected("ucx", + &proc0, + dst))) { + return NULL; + } + + return mca_pml_ucx_add_proc_common(proc_peer); +} + int mca_pml_ucx_add_procs(struct ompi_proc_t **procs, size_t nprocs) { - ucp_ep_params_t ep_params; - ucp_address_t *address; - ucs_status_t status; ompi_proc_t *proc; - size_t addrlen; ucp_ep_h ep; size_t i; int ret; if (OMPI_SUCCESS != (ret = mca_pml_base_pml_check_selected("ucx", - procs, - nprocs))) { + procs, + nprocs))) { return ret; } for (i = 0; i < nprocs; ++i) { proc = procs[(i + OMPI_PROC_MY_NAME->vpid) % nprocs]; - - ret = mca_pml_ucx_recv_worker_address(proc, &address, &addrlen); - if (ret < 0) { - PML_UCX_ERROR("Failed to receive worker address from proc: %d", - proc->super.proc_name.vpid); - return ret; - } - - if (proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML]) { - PML_UCX_VERBOSE(3, "already connected to proc. %d", proc->super.proc_name.vpid); - continue; + ep = mca_pml_ucx_add_proc_common(proc); + if (ep == NULL) { + return OMPI_ERROR; } + } - PML_UCX_VERBOSE(2, "connecting to proc. 
%d", proc->super.proc_name.vpid); + return OMPI_SUCCESS; +} - ep_params.field_mask = UCP_EP_PARAM_FIELD_REMOTE_ADDRESS; - ep_params.address = address; +static inline ucp_ep_h mca_pml_ucx_get_ep(ompi_communicator_t *comm, int rank) +{ + ucp_ep_h ep; - status = ucp_ep_create(ompi_pml_ucx.ucp_worker, &ep_params, &ep); - free(address); + ep = ompi_comm_peer_lookup(comm, rank)->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML]; + if (OPAL_LIKELY(ep != NULL)) { + return ep; + } - if (UCS_OK != status) { - PML_UCX_ERROR("Failed to connect to proc: %d, %s", proc->super.proc_name.vpid, - ucs_status_string(status)); - return OMPI_ERROR; - } + ep = mca_pml_ucx_add_proc(comm, rank); + if (OPAL_LIKELY(ep != NULL)) { + return ep; + } - proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML] = ep; + if (rank >= ompi_comm_size(comm)) { + PML_UCX_ERROR("Rank number (%d) is larger than communicator size (%d)", + rank, ompi_comm_size(comm)); + } else { + PML_UCX_ERROR("Failed to resolve UCX endpoint for rank %d", rank); } - return OMPI_SUCCESS; + return NULL; } static void mca_pml_ucx_waitall(void **reqs, size_t *count_p) @@ -495,12 +531,11 @@ int mca_pml_ucx_recv(void *buf, size_t count, ompi_datatype_t *datatype, int src PML_UCX_TRACE_RECV("%s", buf, count, datatype, src, tag, comm, "recv"); PML_UCX_MAKE_RECV_TAG(ucp_tag, ucp_tag_mask, tag, src, comm); - req = (char *)alloca(ompi_pml_ucx.request_size) + ompi_pml_ucx.request_size; + req = PML_UCX_REQ_ALLOCA(); status = ucp_tag_recv_nbr(ompi_pml_ucx.ucp_worker, buf, count, mca_pml_ucx_get_datatype(datatype), ucp_tag, ucp_tag_mask, req); - ucp_worker_progress(ompi_pml_ucx.ucp_worker); for (;;) { status = ucp_request_test(req, &info); if (status != UCS_INPROGRESS) { @@ -549,7 +584,6 @@ int mca_pml_ucx_isend_init(const void *buf, size_t count, ompi_datatype_t *datat ep = mca_pml_ucx_get_ep(comm, dst); if (OPAL_UNLIKELY(NULL == ep)) { - PML_UCX_ERROR("Failed to get ep for rank %d", dst); return OMPI_ERROR; } @@ -571,7 +605,7 @@ int 
mca_pml_ucx_isend_init(const void *buf, size_t count, ompi_datatype_t *datat return OMPI_SUCCESS; } -static int +static ucs_status_ptr_t mca_pml_ucx_bsend(ucp_ep_h ep, const void *buf, size_t count, ompi_datatype_t *datatype, uint64_t pml_tag) { @@ -593,21 +627,21 @@ mca_pml_ucx_bsend(ucp_ep_h ep, const void *buf, size_t count, if (OPAL_UNLIKELY(NULL == packed_data)) { OBJ_DESTRUCT(&opal_conv); PML_UCX_ERROR("bsend: failed to allocate buffer"); - return OMPI_ERR_OUT_OF_RESOURCE; + return UCS_STATUS_PTR(OMPI_ERROR); } iov_count = 1; iov.iov_base = packed_data; iov.iov_len = packed_length; - PML_UCX_VERBOSE(8, "bsend of packed buffer %p len %d", packed_data, packed_length); + PML_UCX_VERBOSE(8, "bsend of packed buffer %p len %zu", packed_data, packed_length); offset = 0; opal_convertor_set_position(&opal_conv, &offset); if (0 > opal_convertor_pack(&opal_conv, &iov, &iov_count, &packed_length)) { mca_pml_base_bsend_request_free(packed_data); OBJ_DESTRUCT(&opal_conv); PML_UCX_ERROR("bsend: failed to pack user datatype"); - return OMPI_ERROR; + return UCS_STATUS_PTR(OMPI_ERROR); } OBJ_DESTRUCT(&opal_conv); @@ -618,17 +652,34 @@ mca_pml_ucx_bsend(ucp_ep_h ep, const void *buf, size_t count, if (NULL == req) { /* request was completed in place */ mca_pml_base_bsend_request_free(packed_data); - return OMPI_SUCCESS; + return NULL; } if (OPAL_UNLIKELY(UCS_PTR_IS_ERR(req))) { mca_pml_base_bsend_request_free(packed_data); PML_UCX_ERROR("ucx bsend failed: %s", ucs_status_string(UCS_PTR_STATUS(req))); - return OMPI_ERROR; + return UCS_STATUS_PTR(OMPI_ERROR); } req->req_complete_cb_data = packed_data; - return OMPI_SUCCESS; + return NULL; +} + +static inline ucs_status_ptr_t mca_pml_ucx_common_send(ucp_ep_h ep, const void *buf, + size_t count, + ompi_datatype_t *datatype, + ucp_datatype_t ucx_datatype, + ucp_tag_t tag, + mca_pml_base_send_mode_t mode, + ucp_send_callback_t cb) +{ + if (OPAL_UNLIKELY(MCA_PML_BASE_SEND_BUFFERED == mode)) { + return mca_pml_ucx_bsend(ep, buf, count, 
datatype, tag); + } else if (OPAL_UNLIKELY(MCA_PML_BASE_SEND_SYNCHRONOUS == mode)) { + return ucp_tag_send_sync_nb(ep, buf, count, ucx_datatype, tag, cb); + } else { + return ucp_tag_send_nb(ep, buf, count, ucx_datatype, tag, cb); + } } int mca_pml_ucx_isend(const void *buf, size_t count, ompi_datatype_t *datatype, @@ -644,25 +695,16 @@ int mca_pml_ucx_isend(const void *buf, size_t count, ompi_datatype_t *datatype, mode == MCA_PML_BASE_SEND_BUFFERED ? "b" : "", (void*)request) - /* TODO special care to sync/buffered send */ - ep = mca_pml_ucx_get_ep(comm, dst); if (OPAL_UNLIKELY(NULL == ep)) { - PML_UCX_ERROR("Failed to get ep for rank %d", dst); return OMPI_ERROR; } - /* Special care to sync/buffered send */ - if (OPAL_UNLIKELY(MCA_PML_BASE_SEND_BUFFERED == mode)) { - *request = &ompi_pml_ucx.completed_send_req; - return mca_pml_ucx_bsend(ep, buf, count, datatype, - PML_UCX_MAKE_SEND_TAG(tag, comm)); - } + req = (ompi_request_t*)mca_pml_ucx_common_send(ep, buf, count, datatype, + mca_pml_ucx_get_datatype(datatype), + PML_UCX_MAKE_SEND_TAG(tag, comm), mode, + mca_pml_ucx_send_completion); - req = (ompi_request_t*)ucp_tag_send_nb(ep, buf, count, - mca_pml_ucx_get_datatype(datatype), - PML_UCX_MAKE_SEND_TAG(tag, comm), - mca_pml_ucx_send_completion); if (req == NULL) { PML_UCX_VERBOSE(8, "returning completed request"); *request = &ompi_pml_ucx.completed_send_req; @@ -677,32 +719,19 @@ int mca_pml_ucx_isend(const void *buf, size_t count, ompi_datatype_t *datatype, } } -int mca_pml_ucx_send(const void *buf, size_t count, ompi_datatype_t *datatype, int dst, - int tag, mca_pml_base_send_mode_t mode, - struct ompi_communicator_t* comm) +static inline __opal_attribute_always_inline__ int +mca_pml_ucx_send_nb(ucp_ep_h ep, const void *buf, size_t count, + ompi_datatype_t *datatype, ucp_datatype_t ucx_datatype, + ucp_tag_t tag, mca_pml_base_send_mode_t mode, + ucp_send_callback_t cb) { ompi_request_t *req; - ucp_ep_h ep; - - PML_UCX_TRACE_SEND("%s", buf, count, datatype, dst, 
tag, mode, comm, - mode == MCA_PML_BASE_SEND_BUFFERED ? "bsend" : "send"); - ep = mca_pml_ucx_get_ep(comm, dst); - if (OPAL_UNLIKELY(NULL == ep)) { - PML_UCX_ERROR("Failed to get ep for rank %d", dst); - return OMPI_ERROR; - } + req = (ompi_request_t*)mca_pml_ucx_common_send(ep, buf, count, datatype, + mca_pml_ucx_get_datatype(datatype), + tag, mode, + mca_pml_ucx_send_completion); - /* Special care to sync/buffered send */ - if (OPAL_UNLIKELY(MCA_PML_BASE_SEND_BUFFERED == mode)) { - return mca_pml_ucx_bsend(ep, buf, count, datatype, - PML_UCX_MAKE_SEND_TAG(tag, comm)); - } - - req = (ompi_request_t*)ucp_tag_send_nb(ep, buf, count, - mca_pml_ucx_get_datatype(datatype), - PML_UCX_MAKE_SEND_TAG(tag, comm), - mca_pml_ucx_send_completion); if (OPAL_LIKELY(req == NULL)) { return OMPI_SUCCESS; } else if (!UCS_PTR_IS_ERR(req)) { @@ -716,6 +745,59 @@ int mca_pml_ucx_send(const void *buf, size_t count, ompi_datatype_t *datatype, i } } +#if HAVE_DECL_UCP_TAG_SEND_NBR +static inline __opal_attribute_always_inline__ int +mca_pml_ucx_send_nbr(ucp_ep_h ep, const void *buf, size_t count, + ucp_datatype_t ucx_datatype, ucp_tag_t tag) + +{ + void *req; + ucs_status_t status; + + req = PML_UCX_REQ_ALLOCA(); + status = ucp_tag_send_nbr(ep, buf, count, ucx_datatype, tag, req); + if (OPAL_LIKELY(status == UCS_OK)) { + return OMPI_SUCCESS; + } + + ucp_worker_progress(ompi_pml_ucx.ucp_worker); + while ((status = ucp_request_check_status(req)) == UCS_INPROGRESS) { + opal_progress(); + } + + return OPAL_LIKELY(UCS_OK == status) ? OMPI_SUCCESS : OMPI_ERROR; +} +#endif + +int mca_pml_ucx_send(const void *buf, size_t count, ompi_datatype_t *datatype, int dst, + int tag, mca_pml_base_send_mode_t mode, + struct ompi_communicator_t* comm) +{ + ucp_ep_h ep; + + PML_UCX_TRACE_SEND("%s", buf, count, datatype, dst, tag, mode, comm, + mode == MCA_PML_BASE_SEND_BUFFERED ? 
"bsend" : "send"); + + ep = mca_pml_ucx_get_ep(comm, dst); + if (OPAL_UNLIKELY(NULL == ep)) { + return OMPI_ERROR; + } + +#if HAVE_DECL_UCP_TAG_SEND_NBR + if (OPAL_LIKELY((MCA_PML_BASE_SEND_BUFFERED != mode) && + (MCA_PML_BASE_SEND_SYNCHRONOUS != mode))) { + return mca_pml_ucx_send_nbr(ep, buf, count, + mca_pml_ucx_get_datatype(datatype), + PML_UCX_MAKE_SEND_TAG(tag, comm)); + } +#endif + + return mca_pml_ucx_send_nb(ep, buf, count, datatype, + mca_pml_ucx_get_datatype(datatype), + PML_UCX_MAKE_SEND_TAG(tag, comm), mode, + mca_pml_ucx_send_completion); +} + int mca_pml_ucx_iprobe(int src, int tag, struct ompi_communicator_t* comm, int *matched, ompi_status_public_t* mpi_status) { @@ -861,7 +943,6 @@ int mca_pml_ucx_start(size_t count, ompi_request_t** requests) mca_pml_ucx_persistent_request_t *preq; ompi_request_t *tmp_req; size_t i; - int rc; for (i = 0; i < count; ++i) { preq = (mca_pml_ucx_persistent_request_t *)requests[i]; @@ -876,22 +957,14 @@ int mca_pml_ucx_start(size_t count, ompi_request_t** requests) mca_pml_ucx_request_reset(&preq->ompi); if (preq->flags & MCA_PML_UCX_REQUEST_FLAG_SEND) { - if (OPAL_UNLIKELY(MCA_PML_BASE_SEND_BUFFERED == preq->send.mode)) { - PML_UCX_VERBOSE(8, "start bsend request %p", (void*)preq); - rc = mca_pml_ucx_bsend(preq->send.ep, preq->buffer, preq->count, - preq->ompi_datatype, preq->tag); - if (OMPI_SUCCESS != rc) { - return rc; - } - /* pretend that we got immediate completion */ - tmp_req = NULL; - } else { - PML_UCX_VERBOSE(8, "start send request %p", (void*)preq); - tmp_req = (ompi_request_t*)ucp_tag_send_nb(preq->send.ep, preq->buffer, - preq->count, preq->datatype, - preq->tag, - mca_pml_ucx_psend_completion); - } + tmp_req = (ompi_request_t*)mca_pml_ucx_common_send(preq->send.ep, + preq->buffer, + preq->count, + preq->ompi_datatype, + preq->datatype, + preq->tag, + preq->send.mode, + mca_pml_ucx_psend_completion); } else { PML_UCX_VERBOSE(8, "start recv request %p", (void*)preq); tmp_req = 
(ompi_request_t*)ucp_tag_recv_nb(ompi_pml_ucx.ucp_worker, diff --git a/ompi/mca/pml/ucx/pml_ucx.h b/ompi/mca/pml/ucx/pml_ucx.h index 44320b2a48e..feec3683289 100644 --- a/ompi/mca/pml/ucx/pml_ucx.h +++ b/ompi/mca/pml/ucx/pml_ucx.h @@ -87,7 +87,6 @@ int mca_pml_ucx_close(void); int mca_pml_ucx_init(void); int mca_pml_ucx_cleanup(void); -ucp_ep_h mca_pml_ucx_add_proc(ompi_communicator_t *comm, int dst); int mca_pml_ucx_add_procs(struct ompi_proc_t **procs, size_t nprocs); int mca_pml_ucx_del_procs(struct ompi_proc_t **procs, size_t nprocs); diff --git a/ompi/mca/pml/ucx/pml_ucx_component.c b/ompi/mca/pml/ucx/pml_ucx_component.c index 4ca2a0b0702..f2266474f67 100644 --- a/ompi/mca/pml/ucx/pml_ucx_component.c +++ b/ompi/mca/pml/ucx/pml_ucx_component.c @@ -55,7 +55,7 @@ static int mca_pml_ucx_component_register(void) MCA_BASE_VAR_SCOPE_LOCAL, &ompi_pml_ucx.verbose); - ompi_pml_ucx.priority = 5; + ompi_pml_ucx.priority = 51; (void) mca_base_component_var_register(&mca_pml_ucx_component.pmlm_version, "priority", "Priority of the UCX component", MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, diff --git a/ompi/mca/pml/ucx/pml_ucx_datatype.c b/ompi/mca/pml/ucx/pml_ucx_datatype.c index 9970a64c1b2..98b7b190df7 100644 --- a/ompi/mca/pml/ucx/pml_ucx_datatype.c +++ b/ompi/mca/pml/ucx/pml_ucx_datatype.c @@ -40,6 +40,7 @@ static void* pml_ucx_generic_datatype_start_unpack(void *context, void *buffer, OMPI_DATATYPE_RETAIN(datatype); convertor->datatype = datatype; + convertor->offset = 0; opal_convertor_copy_and_prepare_for_recv(ompi_proc_local_proc->super.proc_convertor, &datatype->super, count, buffer, 0, &convertor->opal_conv); @@ -80,13 +81,31 @@ static ucs_status_t pml_ucx_generic_datatype_unpack(void *state, size_t offset, uint32_t iov_count; struct iovec iov; + opal_convertor_t conv; iov_count = 1; iov.iov_base = (void*)src; iov.iov_len = length; - opal_convertor_set_position(&convertor->opal_conv, &offset); - opal_convertor_unpack(&convertor->opal_conv, &iov, &iov_count, &length); + /* 
in case if unordered message arrived - create separate convertor to + * unpack data. */ + if (offset != convertor->offset) { + OBJ_CONSTRUCT(&conv, opal_convertor_t); + opal_convertor_copy_and_prepare_for_recv(ompi_proc_local_proc->super.proc_convertor, + &convertor->datatype->super, + convertor->opal_conv.count, + convertor->opal_conv.pBaseBuf, 0, + &conv); + opal_convertor_set_position(&conv, &offset); + opal_convertor_unpack(&conv, &iov, &iov_count, &length); + opal_convertor_cleanup(&conv); + OBJ_DESTRUCT(&conv); + /* permanently switch to un-ordered mode */ + convertor->offset = 0; + } else { + opal_convertor_unpack(&convertor->opal_conv, &iov, &iov_count, &length); + convertor->offset += length; + } return UCS_OK; } diff --git a/ompi/mca/pml/ucx/pml_ucx_datatype.h b/ompi/mca/pml/ucx/pml_ucx_datatype.h index 79dce36cc8e..26b1835a153 100644 --- a/ompi/mca/pml/ucx/pml_ucx_datatype.h +++ b/ompi/mca/pml/ucx/pml_ucx_datatype.h @@ -17,6 +17,7 @@ struct pml_ucx_convertor { opal_free_list_item_t super; ompi_datatype_t *datatype; opal_convertor_t opal_conv; + size_t offset; }; diff --git a/ompi/mca/pml/ucx/pml_ucx_request.c b/ompi/mca/pml/ucx/pml_ucx_request.c index 01dac786b8b..05533914a4c 100644 --- a/ompi/mca/pml/ucx/pml_ucx_request.c +++ b/ompi/mca/pml/ucx/pml_ucx_request.c @@ -136,6 +136,7 @@ static void mca_pml_ucx_request_init_common(ompi_request_t* ompi_req, OMPI_REQUEST_INIT(ompi_req, req_persistent); ompi_req->req_type = OMPI_REQUEST_PML; ompi_req->req_state = state; + ompi_req->req_start = mca_pml_ucx_start; ompi_req->req_free = req_free; ompi_req->req_cancel = req_cancel; /* This field is used to attach persistant request to a temporary req. 
diff --git a/ompi/mca/pml/ucx/pml_ucx_request.h b/ompi/mca/pml/ucx/pml_ucx_request.h index 5aa657eccbd..9166f042ae9 100644 --- a/ompi/mca/pml/ucx/pml_ucx_request.h +++ b/ompi/mca/pml/ucx/pml_ucx_request.h @@ -26,15 +26,15 @@ enum { /* * UCX tag structure: * - * 01234567 01234567 01234567 01234567 01234567 01234567 01234567 01234567 - * | | - * message tag (24) | source rank (24) | context id (16) - * | | + * 01234567 01234567 01234567 01234567 01234567 0123 4567 01234567 01234567 + * | | + * message tag (24) | source rank (20) | context id (20) + * | | */ #define PML_UCX_TAG_BITS 24 -#define PML_UCX_RANK_BITS 24 -#define PML_UCX_CONTEXT_BITS 16 -#define PML_UCX_ANY_SOURCE_MASK 0x800000000000fffful +#define PML_UCX_RANK_BITS 20 +#define PML_UCX_CONTEXT_BITS 20 +#define PML_UCX_ANY_SOURCE_MASK 0x80000000000ffffful #define PML_UCX_SPECIFIC_SOURCE_MASK 0x800000fffffffffful #define PML_UCX_TAG_MASK 0x7fffff0000000000ul @@ -89,7 +89,7 @@ enum { #define PML_UCX_MESSAGE_RELEASE(_message) \ { \ ompi_message_return(*(_message)); \ - *(_message) = NULL; \ + *(_message) = MPI_MESSAGE_NULL; \ } @@ -136,16 +136,6 @@ void mca_pml_ucx_request_init(void *request); void mca_pml_ucx_request_cleanup(void *request); -static inline ucp_ep_h mca_pml_ucx_get_ep(ompi_communicator_t *comm, int dst) -{ - ucp_ep_h ep = ompi_comm_peer_lookup(comm,dst)->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML]; - if (OPAL_UNLIKELY(NULL == ep)) { - ep = mca_pml_ucx_add_proc(comm, dst); - } - - return ep; -} - static inline void mca_pml_ucx_request_reset(ompi_request_t *req) { req->req_complete = REQUEST_PENDING; @@ -180,6 +170,7 @@ static inline void mca_pml_ucx_set_recv_status(ompi_status_public_t* mpi_status, } else if (ucp_status == UCS_ERR_MESSAGE_TRUNCATED) { mpi_status->MPI_ERROR = MPI_ERR_TRUNCATE; } else if (ucp_status == UCS_ERR_CANCELED) { + mpi_status->MPI_ERROR = MPI_SUCCESS; mpi_status->_cancelled = true; } else { mpi_status->MPI_ERROR = MPI_ERR_INTERN; diff --git a/ompi/mca/pml/v/Makefile.am 
b/ompi/mca/pml/v/Makefile.am index c7c51db30c3..3fd61be21df 100644 --- a/ompi/mca/pml/v/Makefile.am +++ b/ompi/mca/pml/v/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2004-2007 The Trustees of the University of Tennessee. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -31,6 +32,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_pml_v_la_SOURCES = $(local_sources) mca_pml_v_la_LDFLAGS = -module -avoid-version +mca_pml_v_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_pml_v_la_SOURCES = $(local_sources) diff --git a/ompi/mca/pml/v/pml_v_component.c b/ompi/mca/pml/v/pml_v_component.c index eb09036fb7c..fce84fc228c 100644 --- a/ompi/mca/pml/v/pml_v_component.c +++ b/ompi/mca/pml/v/pml_v_component.c @@ -2,11 +2,12 @@ /* * Copyright (c) 2004-2015 The Trustees of the University of Tennessee. * All rights reserved. - * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -100,7 +101,7 @@ static int mca_pml_v_component_register(void) static int mca_pml_v_component_open(void) { int rc; - pml_v_output_open(ompi_pml_v_output, ompi_pml_v_verbose); + ompi_pml_v_output_open(ompi_pml_v_output, ompi_pml_v_verbose); V_OUTPUT_VERBOSE(500, "loaded"); @@ -111,7 +112,7 @@ static int mca_pml_v_component_open(void) } if( NULL == mca_vprotocol_base_include_list ) { - pml_v_output_close(); + ompi_pml_v_output_close(); return mca_base_framework_close(&ompi_vprotocol_base_framework); } @@ -136,13 +137,18 @@ static int mca_pml_v_component_close(void) } /* Make sure to close out output even if vprotocol isn't in use */ - pml_v_output_close (); + ompi_pml_v_output_close (); /* Mark that we have changed something */ - snprintf(mca_pml_base_selected_component.pmlm_version.mca_component_name, - MCA_BASE_MAX_TYPE_NAME_LEN, "%s]v%s", + char *new_name; + asprintf(&new_name, "%s]v%s", mca_pml_v.host_pml_component.pmlm_version.mca_component_name, mca_vprotocol_component.pmlm_version.mca_component_name); + size_t len = sizeof(mca_pml_base_selected_component.pmlm_version.mca_component_name); + strncpy(mca_pml_base_selected_component.pmlm_version.mca_component_name, + new_name, len - 1); + mca_pml_base_selected_component.pmlm_version.mca_component_name[len - 1] = '\0'; + free(new_name); /* Replace finalize */ mca_pml_base_selected_component.pmlm_finalize = @@ -188,7 +194,7 @@ static int mca_pml_v_component_parasite_close(void) mca_pml_base_selected_component = mca_pml_v.host_pml_component; (void) mca_base_framework_close(&ompi_vprotocol_base_framework); - pml_v_output_close(); + ompi_pml_v_output_close(); mca_pml.pml_enable = mca_pml_v.host_pml.pml_enable; /* don't need to call the host component's close: pml_base will do it */ diff --git a/ompi/mca/pml/v/pml_v_output.c b/ompi/mca/pml/v/pml_v_output.c index 4d9102a822a..6fa44042ad8 100644 --- a/ompi/mca/pml/v/pml_v_output.c +++ b/ompi/mca/pml/v/pml_v_output.c @@ 
-1,6 +1,7 @@ /* * Copyright (c) 2004-2007 The Trustees of the University of Tennessee. * All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -19,7 +20,7 @@ #endif #include -int pml_v_output_open(char *output, int verbosity) { +int ompi_pml_v_output_open(char *output, int verbosity) { opal_output_stream_t lds; char hostname[OPAL_MAXHOSTNAMELEN] = "NA"; @@ -49,7 +50,7 @@ int pml_v_output_open(char *output, int verbosity) { return mca_pml_v.output; } -void pml_v_output_close(void) { +void ompi_pml_v_output_close(void) { opal_output_close(mca_pml_v.output); mca_pml_v.output = -1; } diff --git a/ompi/mca/pml/v/pml_v_output.h b/ompi/mca/pml/v/pml_v_output.h index 77bb5b14055..3ddf213e269 100644 --- a/ompi/mca/pml/v/pml_v_output.h +++ b/ompi/mca/pml/v/pml_v_output.h @@ -1,6 +1,7 @@ /* * Copyright (c) 2004-2007 The Trustees of the University of Tennessee. * All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -18,8 +19,8 @@ BEGIN_C_DECLS -int pml_v_output_open(char *output, int verbosity); -void pml_v_output_close(void); +int ompi_pml_v_output_open(char *output, int verbosity); +void ompi_pml_v_output_close(void); static inline void V_OUTPUT_ERR(const char *fmt, ... ) __opal_attribute_format__(__printf__, 1, 2); static inline void V_OUTPUT_ERR(const char *fmt, ... ) diff --git a/ompi/mca/pml/yalla/Makefile.am b/ompi/mca/pml/yalla/Makefile.am index 0ca79ef7dd7..78d2726e34d 100644 --- a/ompi/mca/pml/yalla/Makefile.am +++ b/ompi/mca/pml/yalla/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2001-2014 Mellanox Technologies Ltd. ALL RIGHTS RESERVED. # Copyright (c) 2015 Research Organization for Information Science # and Technology (RIST). All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -37,7 +38,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_pml_yalla_la_SOURCES = $(local_sources) -mca_pml_yalla_la_LIBADD = $(pml_yalla_LIBS) +mca_pml_yalla_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(pml_yalla_LIBS) mca_pml_yalla_la_LDFLAGS = -module -avoid-version $(pml_yalla_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/ompi/mca/pml/yalla/pml_yalla.c b/ompi/mca/pml/yalla/pml_yalla.c index 3f74ff3f44f..4494ca1022d 100644 --- a/ompi/mca/pml/yalla/pml_yalla.c +++ b/ompi/mca/pml/yalla/pml_yalla.c @@ -369,6 +369,7 @@ int mca_pml_yalla_recv(void *buf, size_t count, ompi_datatype_t *datatype, int s { mxm_recv_req_t rreq; mxm_error_t error; + int rc; PML_YALLA_INIT_MXM_RECV_REQ(&rreq, buf, count, datatype, src, tag, comm, recv); PML_YALLA_INIT_BLOCKING_MXM_RECV_REQ(&rreq); @@ -387,10 +388,10 @@ int mca_pml_yalla_recv(void *buf, size_t count, ompi_datatype_t *datatype, int s rreq.completion.sender_imm, rreq.completion.sender_tag, rreq.tag, rreq.tag_mask, rreq.completion.actual_len); - PML_YALLA_SET_RECV_STATUS(&rreq, rreq.completion.actual_len, status); + rc = PML_YALLA_SET_RECV_STATUS(&rreq, rreq.completion.actual_len, status); PML_YALLA_FREE_BLOCKING_MXM_REQ(&rreq.base); - return OMPI_SUCCESS; + return rc; } int mca_pml_yalla_isend_init(const void *buf, size_t count, ompi_datatype_t *datatype, @@ -678,8 +679,7 @@ int mca_pml_yalla_mrecv(void *buf, size_t count, ompi_datatype_t *datatype, rreq.completion.sender_imm, rreq.completion.sender_tag, rreq.tag, rreq.tag_mask, rreq.completion.actual_len); - PML_YALLA_SET_RECV_STATUS(&rreq, rreq.completion.actual_len, status); - return OMPI_SUCCESS; + return PML_YALLA_SET_RECV_STATUS(&rreq, rreq.completion.actual_len, status); } int mca_pml_yalla_start(size_t count, ompi_request_t** requests) diff --git a/ompi/mca/pml/yalla/pml_yalla_datatype.h b/ompi/mca/pml/yalla/pml_yalla_datatype.h index 
c77dfd41ba2..744ee2ece34 100644 --- a/ompi/mca/pml/yalla/pml_yalla_datatype.h +++ b/ompi/mca/pml/yalla/pml_yalla_datatype.h @@ -3,6 +3,7 @@ * Copyright (C) Mellanox Technologies Ltd. 2001-2011. ALL RIGHTS RESERVED. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -25,15 +26,13 @@ OBJ_CLASS_DECLARATION(mca_pml_yalla_convertor_t); #define PML_YALLA_INIT_MXM_REQ_DATA(_req_base, _buf, _count, _dtype, _stream_type, ...) \ { \ - size_t size; \ - ptrdiff_t lb; \ + ptrdiff_t span, gap; \ \ if (opal_datatype_is_contiguous_memory_layout(&(_dtype)->super, _count)) { \ - ompi_datatype_type_size(_dtype, &size); \ - ompi_datatype_type_lb(_dtype, &lb); \ + span = opal_datatype_span(&(_dtype)->super, (_count), &gap); \ (_req_base)->data_type = MXM_REQ_DATA_BUFFER; \ - (_req_base)->data.buffer.ptr = (char *)_buf + lb; \ - (_req_base)->data.buffer.length = size * (_count); \ + (_req_base)->data.buffer.ptr = (char *)_buf + gap; \ + (_req_base)->data.buffer.length = span; \ } else { \ mca_pml_yalla_set_noncontig_data_ ## _stream_type(_req_base, \ _buf, _count, \ diff --git a/ompi/mca/pml/yalla/pml_yalla_request.c b/ompi/mca/pml/yalla/pml_yalla_request.c index f75c2d9b446..a591371551a 100644 --- a/ompi/mca/pml/yalla/pml_yalla_request.c +++ b/ompi/mca/pml/yalla/pml_yalla_request.c @@ -149,6 +149,7 @@ static void init_base_req(mca_pml_yalla_base_request_t *req) { OMPI_REQUEST_INIT(&req->ompi, false); req->ompi.req_type = OMPI_REQUEST_PML; + req->ompi.req_start = mca_pml_yalla_start; req->ompi.req_cancel = NULL; req->ompi.req_complete_cb = NULL; req->ompi.req_complete_cb_data = NULL; diff --git a/ompi/mca/pml/yalla/pml_yalla_request.h b/ompi/mca/pml/yalla/pml_yalla_request.h index c469ee74426..26aa5f8a2de 100644 --- a/ompi/mca/pml/yalla/pml_yalla_request.h +++ b/ompi/mca/pml/yalla/pml_yalla_request.h @@ -175,31 +175,40 @@ static inline 
mca_pml_yalla_send_request_t* MCA_PML_YALLA_SREQ_INIT(void *_buf, } \ } -#define PML_YALLA_SET_RECV_STATUS(_rreq, _length, _mpi_status) \ - { \ - if ((_mpi_status) != MPI_STATUS_IGNORE) { \ - switch ((_rreq)->base.error) { \ - case MXM_OK: \ - (_mpi_status)->MPI_ERROR = OMPI_SUCCESS; \ - break; \ - case MXM_ERR_CANCELED: \ - (_mpi_status)->MPI_ERROR = OMPI_SUCCESS; \ - (_mpi_status)->_cancelled = true; \ - break; \ - case MXM_ERR_MESSAGE_TRUNCATED: \ - (_mpi_status)->MPI_ERROR = MPI_ERR_TRUNCATE; \ - break; \ - default: \ - (_mpi_status)->MPI_ERROR = MPI_ERR_INTERN; \ - break; \ - } \ - \ - (_mpi_status)->MPI_TAG = (_rreq)->completion.sender_tag; \ - (_mpi_status)->MPI_SOURCE = (_rreq)->completion.sender_imm; \ - (_mpi_status)->_ucount = (_length); \ - } \ +static inline int PML_YALLA_SET_RECV_STATUS(mxm_recv_req_t *_rreq, + size_t _length, + ompi_status_public_t *_mpi_status) +{ + int rc; + + switch (_rreq->base.error) { + case MXM_OK: + rc = OMPI_SUCCESS; + break; + case MXM_ERR_CANCELED: + rc = OMPI_SUCCESS; + break; + case MXM_ERR_MESSAGE_TRUNCATED: + rc = MPI_ERR_TRUNCATE; + break; + default: + rc = MPI_ERR_INTERN; + break; } + /* If status is not ignored, fill what is needed */ + if (_mpi_status != MPI_STATUS_IGNORE) { + _mpi_status->MPI_ERROR = rc; + if (MXM_ERR_CANCELED == _rreq->base.error) { + _mpi_status->_cancelled = true; + } + _mpi_status->MPI_TAG = _rreq->completion.sender_tag; + _mpi_status->MPI_SOURCE = _rreq->completion.sender_imm; + _mpi_status->_ucount = _length; + } + return rc; +} + #define PML_YALLA_SET_MESSAGE(_rreq, _comm, _mxm_msg, _message) \ { \ *(_message) = ompi_message_alloc(); \ @@ -212,7 +221,7 @@ static inline mca_pml_yalla_send_request_t* MCA_PML_YALLA_SREQ_INIT(void *_buf, #define PML_YALLA_MESSAGE_RELEASE(_message) \ { \ ompi_message_return(*(_message)); \ - *(_message) = NULL; \ + *(_message) = MPI_MESSAGE_NULL; \ } #endif /* PML_YALLA_REQUEST_H_ */ diff --git a/ompi/mca/rte/orte/Makefile.am b/ompi/mca/rte/orte/Makefile.am 
index 804d66adb52..451436373b3 100644 --- a/ompi/mca/rte/orte/Makefile.am +++ b/ompi/mca/rte/orte/Makefile.am @@ -2,7 +2,7 @@ # Copyright (c) 2012 Los Alamos National Security, LLC. # All rights reserved. # Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2016 Intel, Inc. All rights reserved. +# Copyright (c) 2016-2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -28,10 +28,12 @@ libmca_rte_orte_la_SOURCES =$(sources) $(headers) libmca_rte_orte_la_LDFLAGS = -module -avoid-version libmca_rte_orte_la_LIBADD = $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la -man_pages = mpirun.1 mpiexec.1 ompi-ps.1 ompi-clean.1 ompi-top.1 ompi-server.1 ompi-dvm.1 +man_pages = mpirun.1 mpiexec.1 ompi-ps.1 ompi-clean.1 ompi-top.1 ompi-server.1 -if WANT_FT -man_pages += ompi-checkpoint.1 ompi-restart.1 +if OPAL_WANT_PRUN +if WANT_INSTALL_HEADERS +man_pages += ompi-dvm.1 +endif endif if OPAL_INSTALL_BINARIES @@ -44,11 +46,8 @@ install-exec-hook: (cd $(DESTDIR)$(bindir); rm -f ompi-clean$(EXEEXT); $(LN_S) orte-clean$(EXEEXT) ompi-clean$(EXEEXT)) (cd $(DESTDIR)$(bindir); rm -f ompi-top$(EXEEXT); $(LN_S) orte-top$(EXEEXT) ompi-top$(EXEEXT)) (cd $(DESTDIR)$(bindir); rm -f ompi-server$(EXEEXT); $(LN_S) orte-server$(EXEEXT) ompi-server$(EXEEXT)) +if OPAL_WANT_PRUN (cd $(DESTDIR)$(bindir); rm -f ompi-dvm$(EXEEXT); $(LN_S) orte-dvm$(EXEEXT) ompi-dvm$(EXEEXT)) -if WANT_FT - (cd $(DESTDIR)$(bindir); rm -f ompi-checkpoint$(EXEEXT); $(LN_S) orte-checkpoint$(EXEEXT) ompi-checkpoint$(EXEEXT)) - (cd $(DESTDIR)$(bindir); rm -f ompi-restart$(EXEEXT); $(LN_S) orte-restart$(EXEEXT) ompi-restart$(EXEEXT)) - (cd $(DESTDIR)$(bindir); rm -f ompi-migrate$(EXEEXT); $(LN_S) orte-migrate$(EXEEXT) ompi-migrate$(EXEEXT)) endif uninstall-local: @@ -57,12 +56,9 @@ uninstall-local: $(DESTDIR)$(bindir)/ompi-ps$(EXEEXT) \ $(DESTDIR)$(bindir)/ompi-clean$(EXEEXT) \ $(DESTDIR)$(bindir)/ompi-top$(EXEEXT) \ - $(DESTDIR)$(bindir)/ompi-server$(EXEEXT) \ - 
$(DESTDIR)$(bindir)/ompi-dvm$(EXEEXT) -if WANT_FT - rm -f $(DESTDIR)$(bindir)/ompi-checkpoint$(EXEEXT) \ - $(DESTDIR)$(bindir)/ompi-restart$(EXEEXT) \ - $(DESTDIR)$(bindir)/ompi-migrate$(EXEEXT) + $(DESTDIR)$(bindir)/ompi-server$(EXEEXT) +if OPAL_WANT_PRUN + rm -f $(DESTDIR)$(bindir)/ompi-dvm$(EXEEXT) endif endif # OPAL_INSTALL_BINARIES @@ -88,24 +84,6 @@ $(top_builddir)/orte/tools/orte-clean/orte-clean.1: ompi-clean.1: $(top_builddir)/orte/tools/orte-clean/orte-clean.1 cp -f $(top_builddir)/orte/tools/orte-clean/orte-clean.1 ompi-clean.1 -$(top_builddir)/orte/tools/orte-checkpoint/orte-checkpoint.1: - (cd $(top_builddir)/orte/tools/orte-checkpoint && $(MAKE) $(AM_MAKEFLAGS) orte-checkpoint.1) - -ompi-checkpoint.1: $(top_builddir)/orte/tools/orte-checkpoint/orte-checkpoint.1 - cp -f $(top_builddir)/orte/tools/orte-checkpoint/orte-checkpoint.1 ompi-checkpoint.1 - -$(top_builddir)/orte/tools/orte-restart/orte-restart.1: - (cd $(top_builddir)/orte/tools/orte-restart && $(MAKE) $(AM_MAKEFLAGS) orte-restart.1) - -ompi-restart.1: $(top_builddir)/orte/tools/orte-restart/orte-restart.1 - cp -f $(top_builddir)/orte/tools/orte-restart/orte-restart.1 ompi-restart.1 - -$(top_builddir)/orte/tools/orte-migrate/orte-migrate.1: - (cd $(top_builddir)/orte/tools/orte-migrate && $(MAKE) $(AM_MAKEFLAGS) orte-migrate.1) - -ompi-migrate.1: $(top_builddir)/orte/tools/orte-migrate/orte-migrate.1 - cp -f $(top_builddir)/orte/tools/orte-migrate/orte-migrate.1 ompi-migrate.1 - $(top_builddir)/orte/tools/orte-top/orte-top.1: (cd $(top_builddir)/orte/tools/orte-top && $(MAKE) $(AM_MAKEFLAGS) orte-top.1) @@ -118,8 +96,10 @@ $(top_builddir)/orte/tools/orte-server/orte-server.1: ompi-server.1: $(top_builddir)/orte/tools/orte-server/orte-server.1 cp -f $(top_builddir)/orte/tools/orte-server/orte-server.1 ompi-server.1 +if OPAL_WANT_PRUN ompi-dvm.1: $(top_builddir)/orte/tools/orte-dvm/orte-dvm.1 cp -f $(top_builddir)/orte/tools/orte-dvm/orte-dvm.1 ompi-dvm.1 +endif clean-local: rm -f $(man_pages) 
diff --git a/ompi/mca/rte/orte/configure.m4 b/ompi/mca/rte/orte/configure.m4 index ab8a15df302..95adc6e2946 100644 --- a/ompi/mca/rte/orte/configure.m4 +++ b/ompi/mca/rte/orte/configure.m4 @@ -3,6 +3,7 @@ # Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved. # Copyright (c) 2013 Sandia National Laboratories. All rights reserved. # +# Copyright (c) 2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -10,8 +11,9 @@ # $HEADER$ # -# Highest priority, as it's the default -AC_DEFUN([MCA_ompi_rte_orte_PRIORITY], [100]) +# Lowest priority, as it's the default and we want +# it to be able to be overridden +AC_DEFUN([MCA_ompi_rte_orte_PRIORITY], [10]) # Force this component to compile in static-only mode AC_DEFUN([MCA_ompi_rte_orte_COMPILE_MODE], [ diff --git a/ompi/mca/rte/orte/rte_orte.h b/ompi/mca/rte/orte/rte_orte.h index b71a6e8323a..665cdd9e7bc 100644 --- a/ompi/mca/rte/orte/rte_orte.h +++ b/ompi/mca/rte/orte/rte_orte.h @@ -1,11 +1,12 @@ /* * Copyright (c) 2012-2013 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2013-2015 Intel, Inc. All rights reserved + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -37,7 +38,6 @@ struct opal_proc_t; #include "orte/util/name_fns.h" #include "orte/util/proc_info.h" -#include "ompi/info/info.h" struct ompi_proc_t; struct ompi_communicator_t; @@ -67,13 +67,11 @@ typedef orte_ns_cmp_bitmask_t ompi_rte_cmp_bitmask_t; #define OMPI_NAME ORTE_NAME #define OMPI_PROCESS_NAME_HTON ORTE_PROCESS_NAME_HTON #define OMPI_PROCESS_NAME_NTOH ORTE_PROCESS_NAME_NTOH -#define OMPI_RTE_MY_NODEID ORTE_PROC_MY_DAEMON->vpid -/* database keys */ -#define OMPI_RTE_NODE_ID ORTE_DB_DAEMON_VPID -#define OMPI_RTE_HOST_ID ORTE_DB_HOSTID #if OPAL_ENABLE_DEBUG -static inline orte_process_name_t * OMPI_CAST_RTE_NAME(opal_process_name_t * name); +static inline orte_process_name_t * OMPI_CAST_RTE_NAME(opal_process_name_t * name) { + return (orte_process_name_t *)name; +} #else #define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a)) #endif @@ -95,26 +93,10 @@ OMPI_DECLSPEC void __opal_attribute_noreturn__ #define ompi_rte_finalize() orte_finalize() OMPI_DECLSPEC void ompi_rte_wait_for_debugger(void); -typedef struct { - ompi_rte_component_t super; - opal_mutex_t lock; - opal_list_t modx_reqs; -} ompi_rte_orte_component_t; - -typedef struct { - opal_list_item_t super; - opal_mutex_t lock; - opal_condition_t cond; - bool active; - orte_process_name_t peer; -} ompi_orte_tracker_t; -OBJ_CLASS_DECLARATION(ompi_orte_tracker_t); +/* check dynamics support */ +OMPI_DECLSPEC bool ompi_rte_connect_accept_support(const char *port); -#if OPAL_ENABLE_DEBUG -static inline orte_process_name_t * OMPI_CAST_RTE_NAME(opal_process_name_t * name) { - return (orte_process_name_t *)name; -} -#endif +#define ompi_proc_applied_binding orte_proc_applied_binding END_C_DECLS diff --git a/ompi/mca/rte/orte/rte_orte_component.c b/ompi/mca/rte/orte/rte_orte_component.c index 1c6817b0bfe..50e49935bd7 100644 --- a/ompi/mca/rte/orte/rte_orte_component.c +++ b/ompi/mca/rte/orte/rte_orte_component.c @@ -1,7 +1,7 @@ /* -*- Mode: C; 
c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. * @@ -47,59 +47,34 @@ static int rte_orte_close(void); * and pointers to our public functions in it */ -ompi_rte_orte_component_t mca_rte_orte_component = { - { - /* First, the mca_component_t struct containing meta information - about the component itself */ - - .base_version = { - OMPI_RTE_BASE_VERSION_1_0_0, - - /* Component name and version */ - .mca_component_name = "orte", - MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, - OMPI_RELEASE_VERSION), - - /* Component open and close functions */ - .mca_open_component = rte_orte_open, - .mca_close_component = rte_orte_close, - }, - .base_data = { - /* The component is checkpoint ready */ - MCA_BASE_METADATA_PARAM_CHECKPOINT - }, - } +ompi_rte_component_t mca_rte_orte_component = { + /* First, the mca_component_t struct containing meta information + about the component itself */ + + .base_version = { + OMPI_RTE_BASE_VERSION_1_0_0, + + /* Component name and version */ + .mca_component_name = "orte", + MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, + OMPI_RELEASE_VERSION), + + /* Component open and close functions */ + .mca_open_component = rte_orte_open, + .mca_close_component = rte_orte_close, + }, + .base_data = { + /* The component is checkpoint ready */ + MCA_BASE_METADATA_PARAM_CHECKPOINT + }, }; static int rte_orte_open(void) { - OBJ_CONSTRUCT(&mca_rte_orte_component.lock, opal_mutex_t); - OBJ_CONSTRUCT(&mca_rte_orte_component.modx_reqs, opal_list_t); - return OMPI_SUCCESS; } static int rte_orte_close(void) { - opal_mutex_lock(&mca_rte_orte_component.lock); - OPAL_LIST_DESTRUCT(&mca_rte_orte_component.modx_reqs); - 
opal_mutex_unlock(&mca_rte_orte_component.lock); - OBJ_DESTRUCT(&mca_rte_orte_component.lock); - return OMPI_SUCCESS; } - -static void con(ompi_orte_tracker_t *p) -{ - p->active = true; - OBJ_CONSTRUCT(&p->lock, opal_mutex_t); - OBJ_CONSTRUCT(&p->cond, opal_condition_t); -} -static void des(ompi_orte_tracker_t *p) -{ - OBJ_DESTRUCT(&p->lock); - OBJ_DESTRUCT(&p->cond); -} -OBJ_CLASS_INSTANCE(ompi_orte_tracker_t, - opal_list_item_t, - con, des); diff --git a/ompi/mca/rte/orte/rte_orte_module.c b/ompi/mca/rte/orte/rte_orte_module.c index aa4f5ad5a49..33dc0a10290 100644 --- a/ompi/mca/rte/orte/rte_orte_module.c +++ b/ompi/mca/rte/orte/rte_orte_module.c @@ -1,7 +1,7 @@ /* * Copyright (c) 2012-2013 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2013-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. * Copyright (c) 2012-2014 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. @@ -39,6 +39,7 @@ #include "orte/mca/routed/routed.h" #include "orte/util/name_fns.h" #include "orte/util/session_dir.h" +#include "orte/util/show_help.h" #include "orte/runtime/orte_globals.h" #include "orte/runtime/orte_wait.h" #include "orte/runtime/orte_data_server.h" @@ -50,7 +51,7 @@ #include "ompi/runtime/params.h" #include "ompi/communicator/communicator.h" -extern ompi_rte_orte_component_t mca_rte_orte_component; +extern ompi_rte_component_t mca_rte_orte_component; void ompi_rte_abort(int error_code, char *fmt, ...) 
{ @@ -131,7 +132,7 @@ static void _register_fn(int status, void ompi_rte_wait_for_debugger(void) { int debugger; - opal_list_t *codes; + opal_list_t *codes, directives; opal_value_t *kv; char *evar; int time; @@ -179,9 +180,17 @@ void ompi_rte_wait_for_debugger(void) kv->data.integer = ORTE_ERR_DEBUGGER_RELEASE; opal_list_append(codes, &kv->super); - opal_pmix.register_evhandler(codes, NULL, _release_fn, _register_fn, codes); + OBJ_CONSTRUCT(&directives, opal_list_t); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_EVENT_HDLR_NAME); + kv->type = OPAL_STRING; + kv->data.string = strdup("MPI-DEBUGGER-ATTACH"); + opal_list_append(&directives, &kv->super); + + opal_pmix.register_evhandler(codes, &directives, _release_fn, _register_fn, codes); /* let the MPI progress engine run while we wait for registration to complete */ OMPI_WAIT_FOR_COMPLETION(debugger_register_active); + OPAL_LIST_DESTRUCT(&directives); /* let the MPI progress engine run while we wait for debugger release */ OMPI_WAIT_FOR_COMPLETION(debugger_event_active); @@ -190,3 +199,47 @@ void ompi_rte_wait_for_debugger(void) opal_pmix.deregister_evhandler(handler, NULL, NULL); } } + +bool ompi_rte_connect_accept_support(const char *port) +{ + char *ptr, *tmp; + orte_process_name_t name; + + /* were we launched by mpirun, or are we calling + * without a defined port? */ + if (NULL == orte_process_info.my_hnp_uri || + NULL == port || 0 == strlen(port)) { + return true; + } + + /* is the job family in the port different than my own? */ + tmp = strdup(port); // protect input + if (NULL == (ptr = strchr(tmp, ':'))) { + /* this port didn't come from us! 
*/ + orte_show_help("help-orterun.txt", "orterun:malformedport", true); + free(tmp); + return false; + } + *ptr = '\0'; + if (ORTE_SUCCESS != orte_util_convert_string_to_process_name(&name, tmp)) { + free(tmp); + orte_show_help("help-orterun.txt", "orterun:malformedport", true); + return false; + } + free(tmp); + if (ORTE_JOB_FAMILY(ORTE_PROC_MY_NAME->jobid) == ORTE_JOB_FAMILY(name.jobid)) { + /* same job family, so our infrastructure is adequate */ + return true; + } + + /* if the job family of the port is different than our own + * and we were launched by mpirun, then we require ompi-server + * support */ + if (NULL == orte_data_server_uri) { + /* print a pretty help message */ + orte_show_help("help-orterun.txt", "orterun:server-unavailable", true); + return false; + } + + return true; +} diff --git a/ompi/mca/rte/pmix/Makefile.am b/ompi/mca/rte/pmix/Makefile.am new file mode 100644 index 00000000000..7ed2a4f61f4 --- /dev/null +++ b/ompi/mca/rte/pmix/Makefile.am @@ -0,0 +1,29 @@ +# +# Copyright (c) 2012 Los Alamos National Security, LLC. +# All rights reserved. +# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2016-2017 Intel, Inc. All rights reserved. 
+# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +headers = rte_pmix.h + +sources = \ + rte_pmix_component.c \ + rte_pmix_module.c + +# Conditionally install the header files +if WANT_INSTALL_HEADERS +ompidir = $(ompiincludedir)/$(subdir) +nobase_ompi_HEADERS = $(headers) +endif + +# We only ever build this component statically +noinst_LTLIBRARIES = libmca_rte_pmix.la +libmca_rte_pmix_la_SOURCES =$(sources) $(headers) +libmca_rte_pmix_la_LDFLAGS = -module -avoid-version +libmca_rte_pmix_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la diff --git a/ompi/mca/rte/pmix/configure.m4 b/ompi/mca/rte/pmix/configure.m4 new file mode 100644 index 00000000000..be29c0b3cb1 --- /dev/null +++ b/ompi/mca/rte/pmix/configure.m4 @@ -0,0 +1,45 @@ +# -*- shell-script -*- +# +# Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved. +# Copyright (c) 2013 Sandia National Laboratories. All rights reserved. +# +# Copyright (c) 2017 Intel, Inc. All rights reserved. 
+# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# Higher priority to override the default +AC_DEFUN([MCA_ompi_rte_pmix_PRIORITY], [50]) + +# Force this component to compile in static-only mode +AC_DEFUN([MCA_ompi_rte_pmix_COMPILE_MODE], [ + AC_MSG_CHECKING([for MCA component $2:$3 compile mode]) + $4="static" + AC_MSG_RESULT([$$4]) +]) + +# If component was selected, $1 will be 1 and we should set the base header +AC_DEFUN([MCA_ompi_rte_pmix_POST_CONFIG],[ + AS_IF([test "$1" = "1"], [ompi_rte_base_include="pmix/rte_pmix.h"]) + AC_DEFINE_UNQUOTED([OMPI_RTE_PMIX], [$1], + [Defined to 1 if the OMPI runtime component is PMIX]) + AM_CONDITIONAL([OMPI_RTE_PMIX], [test $1 = 1]) +])dnl + +# MCA_rte_pmix_CONFIG([action-if-can-compile], +# [action-if-cant-compile]) +# ------------------------------------------------ +AC_DEFUN([MCA_ompi_rte_pmix_CONFIG],[ + AC_CONFIG_FILES([ompi/mca/rte/pmix/Makefile]) + + AC_ARG_WITH([ompi-pmix-rte], + AC_HELP_STRING([--with-ompi-pmix-rte], + [Use PMIx as the OMPI run-time environment (default: no)])) + AS_IF([test "$with_ompi_pmix_rte" == "yes"], + [$1 + AC_MSG_NOTICE([PMIx RTE selected by user])], + [$2]) +]) diff --git a/ompi/mca/rte/pmix/rte_pmix.h b/ompi/mca/rte/pmix/rte_pmix.h new file mode 100644 index 00000000000..c7d125ac4c1 --- /dev/null +++ b/ompi/mca/rte/pmix/rte_pmix.h @@ -0,0 +1,136 @@ +/* + * Copyright (c) 2012-2013 Los Alamos National Security, LLC. + * All rights reserved. + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2014-2016 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + * + * When this component is used, this file is included in the rest of + * the OPAL/OMPI code base via ompi/mca/rte/rte.h. As such, + * this header represents the public interface to this static component. + */ + +#ifndef MCA_OMPI_RTE_PMIX_H +#define MCA_OMPI_RTE_PMIX_H + +#include "ompi_config.h" +#include "ompi/constants.h" + +#include + +#ifdef HAVE_SYS_TYPES_H +#include <sys/types.h> +#endif + +struct opal_proc_t; + +#include "opal/threads/threads.h" +#include "opal/util/proc.h" +#include "opal/mca/hwloc/hwloc-internal.h" +#include "opal/mca/pmix/pmix.h" + +struct ompi_proc_t; +struct ompi_communicator_t; + +BEGIN_C_DECLS + +/* Process name objects and operations */ +typedef opal_process_name_t ompi_process_name_t; +typedef uint32_t ompi_jobid_t; +typedef uint32_t ompi_vpid_t; + +/* some local storage */ +OMPI_DECLSPEC extern opal_process_name_t pmix_name_wildcard; +OMPI_DECLSPEC extern opal_process_name_t pmix_proc_my_name; +OMPI_DECLSPEC extern hwloc_cpuset_t ompi_proc_applied_binding; + +#define OMPI_PROC_MY_NAME (&pmix_proc_my_name) +#define OMPI_NAME_WILDCARD (&pmix_name_wildcard) + +typedef uint8_t ompi_rte_cmp_bitmask_t; +#define OMPI_RTE_CMP_NONE 0x00 +#define OMPI_RTE_CMP_JOBID 0x02 +#define OMPI_RTE_CMP_VPID 0x04 +#define OMPI_RTE_CMP_ALL 0x0f +#define OMPI_RTE_CMP_WILD 0x10 + +#define OMPI_NAME_PRINT(a) OPAL_NAME_PRINT((*(a))) +OMPI_DECLSPEC int ompi_rte_compare_name_fields(ompi_rte_cmp_bitmask_t mask, + const opal_process_name_t* name1, + const opal_process_name_t* name2); +OMPI_DECLSPEC int ompi_rte_convert_string_to_process_name(opal_process_name_t *name, + const char* name_string); +OMPI_DECLSPEC int ompi_rte_convert_process_name_to_string(char** name_string, + const opal_process_name_t *name); + +#define OMPI_LOCAL_JOBID(jobid) jobid +#define OMPI_JOB_FAMILY(jobid) 0 +/* do a little with the "family" param to avoid compiler warnings */ +#define 
OMPI_CONSTRUCT_JOBID(family,local) \ +
((family & 0x0000) | local) + +/* This is the DSS tag to serialize a proc name */ +#define OMPI_NAME OPAL_NAME +#define OMPI_PROCESS_NAME_HTON OPAL_PROCESS_NAME_HTON +#define OMPI_PROCESS_NAME_NTOH OPAL_PROCESS_NAME_NTOH + +#if OPAL_ENABLE_DEBUG +static inline opal_process_name_t * OMPI_CAST_RTE_NAME(opal_process_name_t * name) { + return (opal_process_name_t *)name; +} +#else +#define OMPI_CAST_RTE_NAME(a) ((opal_process_name_t*)(a)) +#endif + +/* Process info struct and values */ +typedef uint16_t ompi_node_rank_t; +typedef uint16_t ompi_local_rank_t; +#define OMPI_NODE_RANK_INVALID UINT16_MAX +#define OMPI_LOCAL_RANK_INVALID UINT16_MAX + +typedef struct { + opal_process_name_t my_name; + char *my_hnp_uri; + char *nodename; + pid_t pid; + char *job_session_dir; + char *proc_session_dir; + uint16_t my_local_rank; + uint16_t my_node_rank; + int32_t num_local_peers; + uint32_t num_procs; + uint32_t app_num; +} pmix_process_info_t; +OMPI_DECLSPEC extern pmix_process_info_t pmix_process_info; +#define ompi_process_info pmix_process_info + +OMPI_DECLSPEC extern bool pmix_proc_is_bound; +#define ompi_rte_proc_is_bound pmix_proc_is_bound + +/* Error handling objects and operations */ +OMPI_DECLSPEC void __opal_attribute_noreturn__ + ompi_rte_abort(int error_code, char *fmt, ...); +OMPI_DECLSPEC void ompi_rte_abort_peers(opal_process_name_t *procs, + int32_t num_procs, + int error_code); +#define OMPI_ERROR_LOG OPAL_ERROR_LOG + +/* Init and finalize operations */ +OMPI_DECLSPEC int ompi_rte_init(int *argc, char ***argv); +OMPI_DECLSPEC int ompi_rte_finalize(void); +OMPI_DECLSPEC void ompi_rte_wait_for_debugger(void); + +/* check dynamics support */ +OMPI_DECLSPEC bool ompi_rte_connect_accept_support(const char *port); + +END_C_DECLS + +#endif /* MCA_OMPI_RTE_PMIX_H */ diff --git a/ompi/mca/rte/pmix/rte_pmix_component.c b/ompi/mca/rte/pmix/rte_pmix_component.c new file mode 100644 index 00000000000..a54c0b0ab0a --- /dev/null +++ b/ompi/mca/rte/pmix/rte_pmix_component.c @@ 
-0,0 +1,77 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * reserved. + * + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + * + * These symbols are in a file by themselves to provide nice linker + * semantics. Since linkers generally pull in symbols by object + * files, keeping these symbols as the only symbols in this file + * prevents utility programs such as "ompi_info" from having to import + * entire components just to query their version and parameters. + */ + +#include "ompi_config.h" +#include "ompi/constants.h" + +#include "opal/threads/threads.h" +#include "opal/class/opal_list.h" + +#include "ompi/mca/rte/rte.h" +#include "rte_pmix.h" + +/* + * Public string showing the component version number + */ +const char *ompi_rte_pmix_component_version_string = + "OMPI pmix rte MCA component version " OMPI_VERSION; + +/* + * Local function + */ +static int rte_pmix_open(void); +static int rte_pmix_close(void); + +/* + * Instantiate the public struct with all of our public information + * and pointers to our public functions in it + */ + +ompi_rte_component_t mca_rte_pmix_component = { + /* First, the mca_component_t struct containing meta information + about the component itself */ + + .base_version = { + OMPI_RTE_BASE_VERSION_1_0_0, + + /* Component name and version */ + .mca_component_name = "pmix", + MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, + OMPI_RELEASE_VERSION), + + /* Component open and close functions */ + .mca_open_component = rte_pmix_open, + .mca_close_component = rte_pmix_close, + }, + .base_data = { + /* The component is checkpoint ready */ + MCA_BASE_METADATA_PARAM_CHECKPOINT + }, +}; + +static int rte_pmix_open(void) +{ + return OMPI_SUCCESS; +} + +static int 
rte_pmix_close(void) +{ + return OMPI_SUCCESS; +} diff --git a/ompi/mca/rte/pmix/rte_pmix_module.c b/ompi/mca/rte/pmix/rte_pmix_module.c new file mode 100644 index 00000000000..4b264f8ff05 --- /dev/null +++ b/ompi/mca/rte/pmix/rte_pmix_module.c @@ -0,0 +1,759 @@ +/* + * Copyright (c) 2012-2013 Los Alamos National Security, LLC. + * All rights reserved. + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2012-2014 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * $COPYRIGHT$ + */ +#include "ompi_config.h" +#include "ompi/constants.h" + +#include +#include +#include +#ifdef HAVE_SYS_TYPES_H +#include +#endif /* HAVE_SYS_TYPES_H */ +#include +#ifdef HAVE_UNISTD_H +#include +#endif /* HAVE_UNISTD_H */ +#ifdef HAVE_DIRENT_H +#include +#endif /* HAVE_DIRENT_H */ +#ifdef HAVE_PWD_H +#include +#endif /* HAVE_PWD_H */ + +#include "opal/dss/dss.h" +#include "opal/util/argv.h" +#include "opal/util/error.h" +#include "opal/util/opal_getcwd.h" +#include "opal/util/os_path.h" +#include "opal/util/os_dirpath.h" +#include "opal/util/proc.h" +#include "opal/util/show_help.h" +#include "opal/mca/hwloc/base/base.h" +#include "opal/mca/pmix/base/base.h" +#include "opal/threads/threads.h" +#include "opal/class/opal_list.h" +#include "opal/dss/dss.h" + +#include "ompi/mca/rte/base/base.h" +#include "ompi/mca/rte/rte.h" +#include "ompi/debuggers/debuggers.h" +#include "ompi/proc/proc.h" +#include "ompi/runtime/params.h" +#include "ompi/communicator/communicator.h" + +/* instantiate a debugger-required value */ +volatile int MPIR_being_debugged = 0; + +extern ompi_rte_component_t mca_rte_pmix_component; + +/* storage to support OMPI */ +opal_process_name_t pmix_name_wildcard = {UINT32_MAX-1, UINT32_MAX-1}; +opal_process_name_t pmix_name_invalid = {UINT32_MAX, UINT32_MAX}; +opal_process_name_t pmix_proc_my_name = {0, 0}; +hwloc_cpuset_t 
ompi_proc_applied_binding = NULL; +pmix_process_info_t pmix_process_info = {0}; +bool pmix_proc_is_bound = false; + +static bool pmix_in_parallel_debugger = false; +static bool added_transport_keys = false; +static bool added_num_procs = false; +static bool added_app_ctx = false; +static char* pre_condition_transports_print(uint64_t *unique_key); +static int _setup_job_session_dir(char **sdir); + +#define ORTE_SCHEMA_DELIMITER_CHAR '.' +#define ORTE_SCHEMA_WILDCARD_CHAR '*' +#define ORTE_SCHEMA_WILDCARD_STRING "*" +#define ORTE_SCHEMA_INVALID_CHAR '$' +#define ORTE_SCHEMA_INVALID_STRING "$" + +int ompi_rte_compare_name_fields(ompi_rte_cmp_bitmask_t fields, + const opal_process_name_t* name1, + const opal_process_name_t* name2) +{ + /* handle the NULL pointer case */ + if (NULL == name1 && NULL == name2) { + return OPAL_EQUAL; + } else if (NULL == name1) { + return OPAL_VALUE2_GREATER; + } else if (NULL == name2) { + return OPAL_VALUE1_GREATER; + } + + /* in this comparison function, we check for exact equalities. 
+ * In the case of wildcards, we check to ensure that the fields + * actually match those values - thus, a "wildcard" in this + * function does not actually stand for a wildcard value, but + * rather a specific value - UNLESS the CMP_WILD bitmask value + * is set + */ + + /* check job id */ + if (OMPI_RTE_CMP_JOBID & fields) { + if (OMPI_RTE_CMP_WILD & fields && + (pmix_name_wildcard.jobid == name1->jobid || + pmix_name_wildcard.jobid == name2->jobid)) { + goto check_vpid; + } + if (name1->jobid < name2->jobid) { + return OPAL_VALUE2_GREATER; + } else if (name1->jobid > name2->jobid) { + return OPAL_VALUE1_GREATER; + } + } + + /* get here if jobid's are equal, or not being checked + * now check vpid + */ + check_vpid: + if (OMPI_RTE_CMP_VPID & fields) { + if (OMPI_RTE_CMP_WILD & fields && + (pmix_name_wildcard.vpid == name1->vpid || + pmix_name_wildcard.vpid == name2->vpid)) { + return OPAL_EQUAL; + } + if (name1->vpid < name2->vpid) { + return OPAL_VALUE2_GREATER; + } else if (name1->vpid > name2->vpid) { + return OPAL_VALUE1_GREATER; + } + } + + /* only way to get here is if all fields are being checked and are equal, + * or jobid not checked, but vpid equal, + * only vpid being checked, and equal + * return that fact + */ + return OPAL_EQUAL; +} + +int ompi_rte_convert_string_to_process_name(opal_process_name_t *name, + const char* name_string) +{ + char *temp, *token; + opal_jobid_t job; + opal_vpid_t vpid; + int return_code=OPAL_SUCCESS; + + /* set default */ + name->jobid = pmix_name_invalid.jobid; + name->vpid = pmix_name_invalid.vpid; + + /* check for NULL string - error */ + if (NULL == name_string) { + OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM); + return OPAL_ERR_BAD_PARAM; + } + + temp = strdup(name_string); /** copy input string as the strtok process is destructive */ + token = strchr(temp, ORTE_SCHEMA_DELIMITER_CHAR); /** get first field -> jobid */ + + /* check for error */ + if (NULL == token) { + OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM); + free(temp); + return 
OPAL_ERR_BAD_PARAM; + } + *token = '\0'; + token++; + + /* check for WILDCARD character - assign + * value accordingly, if found + */ + if (0 == strcmp(temp, ORTE_SCHEMA_WILDCARD_STRING)) { + job = pmix_name_wildcard.jobid; + } else if (0 == strcmp(temp, ORTE_SCHEMA_INVALID_STRING)) { + job = pmix_name_invalid.jobid; + } else { + job = strtoul(temp, NULL, 10); + } + + /* check for WILDCARD character - assign + * value accordingly, if found + */ + if (0 == strcmp(token, ORTE_SCHEMA_WILDCARD_STRING)) { + vpid = pmix_name_wildcard.vpid; + } else if (0 == strcmp(token, ORTE_SCHEMA_INVALID_STRING)) { + vpid = pmix_name_invalid.vpid; + } else { + vpid = strtoul(token, NULL, 10); + } + + name->jobid = job; + name->vpid = vpid; + + free(temp); + + return return_code; +} + +int ompi_rte_convert_process_name_to_string(char** name_string, + const opal_process_name_t *name) +{ + char *tmp, *tmp2; + + if (NULL == name) { /* got an error */ + OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM); + return OPAL_ERR_BAD_PARAM; + } + + /* check for wildcard and invalid values - where encountered, insert the + * corresponding string so we can correctly parse the name string when + * it is passed back to us later + */ + if (pmix_name_wildcard.jobid == name->jobid) { + asprintf(&tmp, "%s", ORTE_SCHEMA_WILDCARD_STRING); + } else if (pmix_name_invalid.jobid == name->jobid) { + asprintf(&tmp, "%s", ORTE_SCHEMA_INVALID_STRING); + } else { + asprintf(&tmp, "%lu", (unsigned long)name->jobid); + } + + if (pmix_name_wildcard.vpid == name->vpid) { + asprintf(&tmp2, "%s%c%s", tmp, ORTE_SCHEMA_DELIMITER_CHAR, ORTE_SCHEMA_WILDCARD_STRING); + } else if (pmix_name_invalid.vpid == name->vpid) { + asprintf(&tmp2, "%s%c%s", tmp, ORTE_SCHEMA_DELIMITER_CHAR, ORTE_SCHEMA_INVALID_STRING); + } else { + asprintf(&tmp2, "%s%c%lu", tmp, ORTE_SCHEMA_DELIMITER_CHAR, (unsigned long)name->vpid); + } + + asprintf(name_string, "%s", tmp2); + + free(tmp); + free(tmp2); + + return OPAL_SUCCESS; +} + +int ompi_rte_init(int *pargc, char 
***pargv) +{ + int ret; + char *error = NULL; + opal_process_name_t pname; + opal_proc_t *myname; + int u32, *u32ptr; + uint16_t u16, *u16ptr; + char **peers=NULL, *mycpuset; + char *envar, *ev1, *ev2; + opal_value_t *kv; + char *val; + size_t i; + uint64_t unique_key[2]; + char *string_key; + + u32ptr = &u32; + u16ptr = &u16; + memset(&pmix_process_info, 0, sizeof(pmix_process_info)); + + /* initialize the opal layer */ + if (OPAL_SUCCESS != (ret = opal_init(pargc, pargv))) { + error = "opal_init"; + goto error; + } + + /* open and setup pmix */ + if (OPAL_SUCCESS != (ret = mca_base_framework_open(&opal_pmix_base_framework, 0))) { + OPAL_ERROR_LOG(ret); + /* we cannot run */ + error = "pmix init"; + goto error; + } + if (OPAL_SUCCESS != (ret = opal_pmix_base_select())) { + /* we cannot run */ + error = "pmix init"; + goto error; + } + /* set the event base */ + opal_pmix_base_set_evbase(opal_sync_event_base); + + /* initialize the selected module */ + if (!opal_pmix.initialized() && (OPAL_SUCCESS != (ret = opal_pmix.init(NULL)))) { + /* we cannot run - this could be due to being direct launched + * without the required PMI support being built, so print + * out a help message indicating it */ + opal_show_help("help-ompi-rte-pmix.txt", "no-pmi", true); + return OPAL_ERR_SILENT; + } + /* opal_pmix.init will have filled in proc name fields in + * OPAL, so transfer them here */ + myname = opal_proc_local_get(); + pmix_proc_my_name = myname->proc_name; + /* get our hostname */ + pmix_process_info.nodename = opal_get_proc_hostname(myname); + + /* get our local rank from PMI */ + OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_LOCAL_RANK, + &pmix_proc_my_name, &u16ptr, OPAL_UINT16); + if (OPAL_SUCCESS != ret) { + error = "getting local rank"; + goto error; + } + pmix_process_info.my_local_rank = u16; + + /* get our node rank from PMI */ + OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_NODE_RANK, + &pmix_proc_my_name, &u16ptr, OPAL_UINT16); + if (OPAL_SUCCESS != ret) { + error = "getting node 
rank";
+ goto error;
+ }
+ pmix_process_info.my_node_rank = u16;
+
+ /* get job size */
+ OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_JOB_SIZE,
+ &pmix_name_wildcard, &u32ptr, OPAL_UINT32);
+ if (OPAL_SUCCESS != ret) {
+ error = "getting job size";
+ goto error;
+ }
+ pmix_process_info.num_procs = u32;
+
+ /* push into the environ for pickup in MPI layer for
+ * MPI-3 required info key
+ */
+ if (NULL == getenv(OPAL_MCA_PREFIX"orte_ess_num_procs")) {
+ asprintf(&ev1, OPAL_MCA_PREFIX"orte_ess_num_procs=%d", pmix_process_info.num_procs);
+ putenv(ev1);
+ added_num_procs = true;
+ }
+ if (NULL == getenv("OMPI_APP_CTX_NUM_PROCS")) {
+ asprintf(&ev2, "OMPI_APP_CTX_NUM_PROCS=%d", pmix_process_info.num_procs);
+ putenv(ev2);
+ added_app_ctx = true;
+ }
+
+ /* get our app number from PMI - ok if not found */
+ OPAL_MODEX_RECV_VALUE_OPTIONAL(ret, OPAL_PMIX_APPNUM,
+ &pmix_proc_my_name, &u32ptr, OPAL_UINT32);
+ if (OPAL_SUCCESS == ret) {
+ pmix_process_info.app_num = u32;
+ } else {
+ pmix_process_info.app_num = 0;
+ }
+
+ /* get the number of local peers - required for wireup of
+ * shared memory BTL */
+ OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_LOCAL_SIZE,
+ &pmix_name_wildcard, &u32ptr, OPAL_UINT32);
+ if (OPAL_SUCCESS == ret) {
+ pmix_process_info.num_local_peers = u32 - 1; // want number besides ourselves
+ } else {
+ pmix_process_info.num_local_peers = 0;
+ }
+
+ /* setup transport keys in case the MPI layer needs them -
+ * we can use the jobfam (upper 16 bits of the jobid) and the
+ * stepid (lower 16 bits) as unique keys because they are
+ * unique values assigned by the RM
+ */
+ if (NULL == getenv(OPAL_MCA_PREFIX"orte_precondition_transports")) {
+ unique_key[0] = (pmix_proc_my_name.jobid & 0xffff0000) >> 16;
+ unique_key[1] = pmix_proc_my_name.jobid & 0x0000ffff;
+ if (NULL == (string_key = pre_condition_transports_print(unique_key))) {
+ OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE);
+ return OPAL_ERR_OUT_OF_RESOURCE;
+ }
+ opal_output_verbose(2, ompi_rte_base_framework.framework_output,
+ "%s transport key %s",
+
OPAL_NAME_PRINT(pmix_proc_my_name), string_key);
+ asprintf(&envar, OPAL_MCA_PREFIX"orte_precondition_transports=%s", string_key);
+ putenv(envar);
+ added_transport_keys = true;
+ /* cannot free the envar as that messes up our environ */
+ free(string_key);
+ }
+
+ /* retrieve temp directories info */
+ OPAL_MODEX_RECV_VALUE_OPTIONAL(ret, OPAL_PMIX_NSDIR, &pmix_name_wildcard, &val, OPAL_STRING);
+ if (OPAL_SUCCESS == ret && NULL != val) {
+ pmix_process_info.job_session_dir = val;
+ val = NULL;
+ } else {
+ /* we need to create something */
+ ret = _setup_job_session_dir(&pmix_process_info.job_session_dir);
+ if (OPAL_SUCCESS != ret) {
+ error = "job session directory";
+ goto error;
+ }
+ }
+
+ /* get our local peers */
+ if (0 < pmix_process_info.num_local_peers) {
+ /* if my local rank is too high, then that's an error */
+ if (pmix_process_info.num_local_peers < pmix_process_info.my_local_rank) {
+ ret = OPAL_ERR_BAD_PARAM;
+ error = "num local peers";
+ goto error;
+ }
+ /* retrieve the local peers */
+ OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_LOCAL_PEERS,
+ &pmix_name_wildcard, &val, OPAL_STRING);
+ if (OPAL_SUCCESS == ret && NULL != val) {
+ peers = opal_argv_split(val, ',');
+ free(val);
+ } else {
+ peers = NULL;
+ }
+ } else {
+ peers = NULL;
+ }
+
+ /* set the locality */
+ if (NULL != peers) {
+ /* identify our location */
+ val = NULL;
+ OPAL_MODEX_RECV_VALUE_OPTIONAL(ret, OPAL_PMIX_LOCALITY_STRING,
+ &pmix_proc_my_name, &val, OPAL_STRING);
+ if (OPAL_SUCCESS == ret && NULL != val) {
+ mycpuset = val;
+ } else {
+ mycpuset = NULL;
+ }
+ pname.jobid = pmix_proc_my_name.jobid;
+ for (i=0; NULL != peers[i]; i++) {
+ pname.vpid = strtoul(peers[i], NULL, 10);
+ if (pname.vpid == pmix_proc_my_name.vpid) {
+ /* we are fully local to ourselves */
+ u16 = OPAL_PROC_ALL_LOCAL;
+ } else {
+ val = NULL;
+ OPAL_MODEX_RECV_VALUE_OPTIONAL(ret, OPAL_PMIX_LOCALITY_STRING,
+ &pname, &val, OPAL_STRING);
+ if (OPAL_SUCCESS == ret && NULL != val) {
+ u16 =
opal_hwloc_compute_relative_locality(mycpuset, val);
+ free(val);
+ } else {
+ /* all we can say is that it shares our node */
+ u16 = OPAL_PROC_ON_CLUSTER | OPAL_PROC_ON_CU | OPAL_PROC_ON_NODE;
+ }
+ }
+ kv = OBJ_NEW(opal_value_t);
+ kv->key = strdup(OPAL_PMIX_LOCALITY);
+ kv->type = OPAL_UINT16;
+ OPAL_OUTPUT_VERBOSE((1, ompi_rte_base_framework.framework_output,
+ "%s locality: proc %s locality %s",
+ OPAL_NAME_PRINT(pmix_proc_my_name),
+ OPAL_NAME_PRINT(pname), opal_hwloc_base_print_locality(u16)));
+ kv->data.uint16 = u16;
+ ret = opal_pmix.store_local(&pname, kv);
+ if (OPAL_SUCCESS != ret) {
+ error = "local store of locality";
+ opal_argv_free(peers);
+ if (NULL != mycpuset) {
+ free(mycpuset);
+ }
+ goto error;
+ }
+ OBJ_RELEASE(kv);
+ }
+ opal_argv_free(peers);
+ if (NULL != mycpuset) {
+ free(mycpuset);
+ }
+ }
+
+ /* poor attempt to detect we are bound */
+ if (NULL != getenv("SLURM_CPU_BIND_TYPE")) {
+ pmix_proc_is_bound = true;
+ }
+
+ /* push our hostname so others can find us, if they need to - the
+ * native PMIx component will ignore this request as the hostname
+ * is provided by the system */
+ OPAL_MODEX_SEND_VALUE(ret, OPAL_PMIX_GLOBAL, OPAL_PMIX_HOSTNAME, pmix_process_info.nodename, OPAL_STRING);
+ if (OPAL_SUCCESS != ret) {
+ error = "db store hostname";
+ goto error;
+ }
+
+ return OPAL_SUCCESS;
+
+ error:
+ opal_show_help_finalize();
+ if (OPAL_ERR_SILENT != ret ) {
+ opal_show_help("help-ompi-rte-pmix.txt",
+ "internal-failure",
+ true, error, opal_strerror(ret), ret);
+ }
+ return ret;
+
+}
+
+static bool check_file(const char *root, const char *path)
+{
+ struct stat st;
+ char *fullpath;
+
+ /*
+ * Keep:
+ * - non-zero files starting with "output-"
+ */
+ if (0 == strncmp(path, "output-", strlen("output-"))) {
+ fullpath = opal_os_path(false, root, path, NULL);
+ if (NULL == fullpath || 0 != stat(fullpath, &st)) {
+ /* cannot inspect the file, so keep it */
+ free(fullpath);
+ return false;
+ }
+ free(fullpath);
+ if (0 == st.st_size) {
+ return true;
+ }
+ return false;
+ }
+
+ return true;
+}
+
+int ompi_rte_finalize(void)
+{
+ /*
remove the envars that we pushed into environ + * so we leave that structure intact + */ + if (added_transport_keys) { + unsetenv(OPAL_MCA_PREFIX"orte_precondition_transports"); + } + if (added_num_procs) { + unsetenv(OPAL_MCA_PREFIX"orte_ess_num_procs"); + } + if (added_app_ctx) { + unsetenv("OMPI_APP_CTX_NUM_PROCS"); + } + + /* shutdown pmix */ + if (NULL != opal_pmix.finalize) { + opal_pmix.finalize(); + (void) mca_base_framework_close(&opal_pmix_base_framework); + } + + /* cleanup the session directory we created */ + if (NULL != pmix_process_info.job_session_dir) { + opal_os_dirpath_destroy(pmix_process_info.job_session_dir, + false, check_file); + free(pmix_process_info.job_session_dir); + } + return OMPI_SUCCESS; +} + +void ompi_rte_abort(int error_code, char *fmt, ...) +{ + va_list arglist; + char* buffer = NULL; + struct timespec tp = {0, 100000}; + + /* If there was a message, output it */ + va_start(arglist, fmt); + if( NULL != fmt ) { + vasprintf( &buffer, fmt, arglist ); + } + va_end(arglist); + + /* call abort */ + opal_pmix.abort(error_code, buffer, NULL); + if (NULL != buffer) { + free(buffer); + } + + /* provide a little delay for the PMIx thread to + * get the info out */ + nanosleep(&tp, NULL); + + /* Now Exit */ + _exit(error_code); +} + +void ompi_rte_abort_peers(opal_process_name_t *procs, + int32_t num_procs, + int error_code) +{ + return; +} + +static size_t handler = SIZE_MAX; +static bool debugger_register_active = true; +static bool debugger_event_active = true; + +static void _release_fn(int status, + const opal_process_name_t *source, + opal_list_t *info, opal_list_t *results, + opal_pmix_notification_complete_fn_t cbfunc, + void *cbdata) +{ + /* must let the notifier know we are done */ + if (NULL != cbfunc) { + cbfunc(OPAL_SUCCESS, NULL, NULL, NULL, cbdata); + } + debugger_event_active = false; +} + +static void _register_fn(int status, + size_t evhandler_ref, + void *cbdata) +{ + opal_list_t *codes = (opal_list_t*)cbdata; + + handler 
= evhandler_ref; + OPAL_LIST_RELEASE(codes); + debugger_register_active = false; +} + +/* + * Wait for a debugger if asked. We support two ways of waiting for + * attaching debuggers -- see big comment in + * pmix/tools/pmixrun/debuggers.c explaining the two scenarios. + */ +void ompi_rte_wait_for_debugger(void) +{ + int debugger; + opal_list_t *codes, directives; + opal_value_t *kv; + char *evar; + int time; + + /* check PMIx to see if we are under a debugger */ + debugger = pmix_in_parallel_debugger; + + if (1 == MPIR_being_debugged) { + debugger = 1; + } + + if (!debugger && NULL == getenv("PMIX_TEST_DEBUGGER_ATTACH")) { + /* if not, just return */ + return; + } + + /* if we are being debugged, then we need to find + * the correct plug-ins + */ + ompi_debugger_setup_dlls(); + + if (NULL != (evar = getenv("PMIX_TEST_DEBUGGER_SLEEP"))) { + time = strtol(evar, NULL, 10); + sleep(time); + return; + } + + /* register an event handler for the PMIX_ERR_DEBUGGER_RELEASE event */ + codes = OBJ_NEW(opal_list_t); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup("errorcode"); + kv->type = OPAL_INT; + kv->data.integer = OPAL_ERR_DEBUGGER_RELEASE; + opal_list_append(codes, &kv->super); + + OBJ_CONSTRUCT(&directives, opal_list_t); + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_EVENT_HDLR_NAME); + kv->type = OPAL_STRING; + kv->data.string = strdup("MPI-DEBUGGER-ATTACH"); + opal_list_append(&directives, &kv->super); + + opal_pmix.register_evhandler(codes, &directives, _release_fn, _register_fn, codes); + /* let the MPI progress engine run while we wait for registration to complete */ + OMPI_WAIT_FOR_COMPLETION(debugger_register_active); + OPAL_LIST_DESTRUCT(&directives); + + /* let the MPI progress engine run while we wait for debugger release */ + OMPI_WAIT_FOR_COMPLETION(debugger_event_active); + + /* deregister the event handler */ + opal_pmix.deregister_evhandler(handler, NULL, NULL); +} + +bool ompi_rte_connect_accept_support(const char *port) +{ + /* not sure 
how to support this yet */
+ return false;
+}
+
+static char* pre_condition_transports_print(uint64_t *unique_key)
+{
+ unsigned int *int_ptr;
+ size_t i, j, string_key_len, written_len;
+ char *string_key = NULL, *format = NULL;
+
+ /* string is two 64 bit numbers printed in hex with a dash between
+ * and zero padding.
+ */
+ string_key_len = (sizeof(uint64_t) * 2) * 2 + strlen("-") + 1;
+ string_key = (char*) malloc(string_key_len);
+ if (NULL == string_key) {
+ return NULL;
+ }
+
+ string_key[0] = '\0';
+ written_len = 0;
+
+ /* get a format string based on the length of an unsigned int. We
+ * want to have zero padding for sizeof(unsigned int) * 2
+ * characters -- when printing as a hex number, each byte is
+ * represented by 2 hex characters. Format will contain something
+ * that looks like %08x, where the number 8 might be a different
+ * number if the system has a different sized unsigned int (8 would
+ * be for sizeof(unsigned int) == 4).
+ */
+ if (0 > asprintf(&format, "%%0%dx", (int)(sizeof(unsigned int)) * 2)) {
+ free(string_key);
+ return NULL;
+ }
+
+ /* print the first number */
+ int_ptr = (unsigned int*) &unique_key[0];
+ for (i = 0 ; i < sizeof(uint64_t) / sizeof(unsigned int) ; ++i) {
+ if (0 == int_ptr[i]) {
+ /* inject some energy */
+ for (j=0; j < sizeof(unsigned int); j++) {
+ int_ptr[i] |= j << j;
+ }
+ }
+ snprintf(string_key + written_len,
+ string_key_len - written_len,
+ format, int_ptr[i]);
+ written_len = strlen(string_key);
+ }
+
+ /* print the middle dash */
+ snprintf(string_key + written_len, string_key_len - written_len, "-");
+ written_len = strlen(string_key);
+
+ /* print the second number */
+ int_ptr = (unsigned int*) &unique_key[1];
+ for (i = 0 ; i < sizeof(uint64_t) / sizeof(unsigned int) ; ++i) {
+ if (0 == int_ptr[i]) {
+ /* inject some energy */
+ for (j=0; j < sizeof(unsigned int); j++) {
+ int_ptr[i] |= j << j;
+ }
+ }
+ snprintf(string_key + written_len,
+ string_key_len - written_len,
+ format, int_ptr[i]);
+ written_len = strlen(string_key);
+ }
+ free(format);
+
+ return
string_key;
+}
+
+static int _setup_job_session_dir(char **sdir)
+{
+ char *tmpdir;
+ /* get the effective uid */
+ uid_t uid = geteuid();
+
+ if( NULL == (tmpdir = getenv("TMPDIR")) )
+ if( NULL == (tmpdir = getenv("TEMP")) )
+ if( NULL == (tmpdir = getenv("TMP")) )
+ tmpdir = "/tmp";
+
+ /* write the result through the caller-supplied pointer */
+ if (0 > asprintf(sdir,
+ "%s/ompi.%s.%lu/jf.0/%u", tmpdir,
+ pmix_process_info.nodename,
+ (unsigned long)uid,
+ pmix_proc_my_name.jobid)) {
+ *sdir = NULL;
+ return OPAL_ERR_OUT_OF_RESOURCE;
+ }
+
+ return OPAL_SUCCESS;
+}
diff --git a/ompi/mca/sharedfp/addproc/.opal_unignore b/ompi/mca/sharedfp/addproc/.opal_unignore
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/ompi/mca/sharedfp/addproc/Makefile.am b/ompi/mca/sharedfp/addproc/Makefile.am
deleted file mode 100644
index f8e9a5739b2..00000000000
--- a/ompi/mca/sharedfp/addproc/Makefile.am
+++ /dev/null
@@ -1,64 +0,0 @@
-#
-# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
-# University Research and Technology
-# Corporation. All rights reserved.
-# Copyright (c) 2004-2005 The University of Tennessee and The University
-# of Tennessee Research Foundation. All rights
-# reserved.
-# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
-# University of Stuttgart. All rights reserved.
-# Copyright (c) 2004-2005 The Regents of the University of California.
-# All rights reserved.
-# Copyright (c) 2013 University of Houston. All rights reserved.
-# Copyright (c) 2016 IBM Corporation. All rights reserved.
-# $COPYRIGHT$
-#
-# Additional copyrights may follow
-#
-# $HEADER$
-#
-
-# Make the output library in this directory, and name it either
-# mca__.la (for DSO builds) or libmca__.la
-# (for static builds).
- -if MCA_BUILD_ompi_sharedfp_addproc_DSO -component_noinst = -component_install = mca_sharedfp_addproc.la -else -component_noinst = libmca_sharedfp_addproc.la -component_install = -endif - -mcacomponentdir = $(ompilibdir) -mcacomponent_LTLIBRARIES = $(component_install) -mca_sharedfp_addproc_la_SOURCES = $(sources) -mca_sharedfp_addproc_la_LDFLAGS = -module -avoid-version - -noinst_LTLIBRARIES = $(component_noinst) -libmca_sharedfp_addproc_la_SOURCES = $(sources) -libmca_sharedfp_addproc_la_LDFLAGS = -module -avoid-version - -# Source files - -#IMPORTANT: Update here when adding new source code files to the library -sources = \ - sharedfp_addproc.h \ - sharedfp_addproc.c \ - sharedfp_addproc_component.c \ - sharedfp_addproc_seek.c \ - sharedfp_addproc_request_position.c \ - sharedfp_addproc_write.c \ - sharedfp_addproc_iwrite.c \ - sharedfp_addproc_read.c \ - sharedfp_addproc_iread.c \ - sharedfp_addproc_file_open.c - -#The additional process is spawned by executing this executable -bin_PROGRAMS = mca_sharedfp_addproc_control - -mca_sharedfp_addproc_control_SOURCES = \ - sharedfp_addproc_control.h \ - sharedfp_addproc_control.c - -mca_sharedfp_addproc_control_LDADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la diff --git a/ompi/mca/sharedfp/addproc/owner.txt b/ompi/mca/sharedfp/addproc/owner.txt deleted file mode 100644 index f886026a69e..00000000000 --- a/ompi/mca/sharedfp/addproc/owner.txt +++ /dev/null @@ -1,7 +0,0 @@ -# -# owner/status file -# owner: institution that is responsible for this package -# status: e.g. active, maintenance, unmaintained -# -owner: UH -status: maintenance diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc.c deleted file mode 100644 index 4e44715aee9..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc.c +++ /dev/null @@ -1,97 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. 
All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. Since linkers generally pull in symbols by object fules, - * keeping these symbols as the only symbols in this file prevents - * utility programs such as "ompi_info" from having to import entire - * modules just to query their version and parameters - */ - -#include "ompi_config.h" -#include "mpi.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/sharedfp/addproc/sharedfp_addproc.h" - -/* - * ******************************************************************* - * ************************ actions structure ************************ - * ******************************************************************* - */ - /* IMPORTANT: Update here when adding sharedfp component interface functions*/ -static mca_sharedfp_base_module_1_0_0_t addproc = { - mca_sharedfp_addproc_module_init, /* initalise after being selected */ - mca_sharedfp_addproc_module_finalize, /* close a module on a communicator */ - mca_sharedfp_addproc_seek, - mca_sharedfp_addproc_get_position, - mca_sharedfp_addproc_read, - mca_sharedfp_addproc_read_ordered, - mca_sharedfp_addproc_read_ordered_begin, - mca_sharedfp_addproc_read_ordered_end, - mca_sharedfp_addproc_iread, - mca_sharedfp_addproc_write, - mca_sharedfp_addproc_write_ordered, - mca_sharedfp_addproc_write_ordered_begin, - mca_sharedfp_addproc_write_ordered_end, - mca_sharedfp_addproc_iwrite, - mca_sharedfp_addproc_file_open, - 
mca_sharedfp_addproc_file_close -}; -/* - * ******************************************************************* - * ************************* structure ends ************************** - * ******************************************************************* - */ - -int mca_sharedfp_addproc_component_init_query(bool enable_progress_threads, - bool enable_mpi_threads) -{ - /* Nothing to do */ - - return OMPI_SUCCESS; -} - -struct mca_sharedfp_base_module_1_0_0_t * - mca_sharedfp_addproc_component_file_query - (mca_io_ompio_file_t *fh, int *priority) { - *priority = mca_sharedfp_addproc_priority; - - /*test, and update priority*/ - - return &addproc; -} - -int mca_sharedfp_addproc_component_file_unquery (mca_io_ompio_file_t *file) -{ - /* This function might be needed for some purposes later. for now it - * does not have anything to do since there are no steps which need - * to be undone if this module is not selected */ - - return OMPI_SUCCESS; -} - -int mca_sharedfp_addproc_module_init (mca_io_ompio_file_t *file) -{ - return OMPI_SUCCESS; -} - - -int mca_sharedfp_addproc_module_finalize (mca_io_ompio_file_t *file) -{ - return OMPI_SUCCESS; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc.h b/ompi/mca/sharedfp/addproc/sharedfp_addproc.h deleted file mode 100644 index e47afda7a82..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc.h +++ /dev/null @@ -1,164 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013-2016 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MCA_SHAREDFP_ADDPROC_H -#define MCA_SHAREDFP_ADDPROC_H - -#include "ompi_config.h" -#include "ompi/mca/mca.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/common/ompio/common_ompio.h" -#include - -BEGIN_C_DECLS - -int mca_sharedfp_addproc_component_init_query(bool enable_progress_threads, - bool enable_mpi_threads); -struct mca_sharedfp_base_module_1_0_0_t * - mca_sharedfp_addproc_component_file_query (mca_io_ompio_file_t *file, int *priority); -int mca_sharedfp_addproc_component_file_unquery (mca_io_ompio_file_t *file); - -int mca_sharedfp_addproc_module_init (mca_io_ompio_file_t *file); -int mca_sharedfp_addproc_module_finalize (mca_io_ompio_file_t *file); - -extern int mca_sharedfp_addproc_priority; -extern int mca_sharedfp_addproc_verbose; -#if 0 -extern char[MPI_MAX_HOSTNAME_LEN] mca_sharedfp_addproc_control_host; -#endif - -OMPI_MODULE_DECLSPEC extern mca_sharedfp_base_component_2_0_0_t mca_sharedfp_addproc_component; -/* - * ****************************************************************** - * ********* functions which are implemented in this module ********* - * ****************************************************************** - */ -/*IMPORANT: Update here when implementing functions from sharedfp API*/ - -int mca_sharedfp_addproc_seek (mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE offset, int whence); -int mca_sharedfp_addproc_get_position (mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE * offset); -int mca_sharedfp_addproc_file_open (struct ompi_communicator_t *comm, - const char* filename, - int amode, - struct ompi_info_t *info, - mca_io_ompio_file_t *fh); -int mca_sharedfp_addproc_file_close (mca_io_ompio_file_t *fh); -int mca_sharedfp_addproc_read (mca_io_ompio_file_t *fh, - void *buf, int count, MPI_Datatype datatype, MPI_Status *status); -int mca_sharedfp_addproc_read_ordered (mca_io_ompio_file_t *fh, - void *buf, int count, struct 
ompi_datatype_t *datatype, - ompi_status_public_t *status - ); -int mca_sharedfp_addproc_read_ordered_begin (mca_io_ompio_file_t *fh, - void *buf, - int count, - struct ompi_datatype_t *datatype); -int mca_sharedfp_addproc_read_ordered_end (mca_io_ompio_file_t *fh, - void *buf, - ompi_status_public_t *status); -int mca_sharedfp_addproc_iread (mca_io_ompio_file_t *fh, - void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_request_t **request); -int mca_sharedfp_addproc_write (mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_status_public_t *status); -int mca_sharedfp_addproc_write_ordered (mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_status_public_t *status); -int mca_sharedfp_addproc_write_ordered_begin (mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype); -int mca_sharedfp_addproc_write_ordered_end (mca_io_ompio_file_t *fh, - const void *buf, - ompi_status_public_t *status); -int mca_sharedfp_addproc_iwrite (mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_request_t **request); -/****************************************************/ -/*The following are structures and definitions * - * copied over directly from uhio codebase */ -/****************************************************/ - -/*This structure will hang off of the mca_sharedfp_base_data_t's - *selected_module_data attribute - */ -struct mca_sharedfp_addproc_data -{ - MPI_Comm intercom; -}; - -typedef struct mca_sharedfp_addproc_data addproc_data; - - -int mca_sharedfp_addproc_request_position (struct mca_sharedfp_base_data_t * sh, - int bytes_requested, - OMPI_MPI_OFFSET_TYPE * offset); - -#define DO_ACK 0 /* To be set by the Environment Variable*/ -#define REQUEST_TAG 99 -#define ACK_TAG 1 -#define OFFSET_TAG 98 -#define END_TAG 97 - -#define SEEK_END_TAG 91 -#define SEEK_SET_TAG 92 -#define 
SEEK_CUR_TAG 93 -#define GET_POSITION_TAG 94 - -#define NUM_OF_SPAWNS 1 - -struct list { - - int procNo; - long numBytesArrAddr; - struct list *Next; -}; - -struct Stat { - int tag; - int source; - long* recvBuff; -}; - - -double uhio_shared_gettime(void); - -typedef struct list node; -typedef struct Stat statusStruct; - -/* - * ****************************************************************** - * ************ functions implemented in this module end ************ - * ****************************************************************** - */ - -END_C_DECLS - -#endif /* MCA_SHAREDFP_ADDPROC_H */ diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_component.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_component.c deleted file mode 100644 index 66bf463e0b9..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_component.c +++ /dev/null @@ -1,104 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013 University of Houston. All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. 
Since linkers generally pull in symbols by object - * files, keeping these symbols as the only symbols in this file - * prevents utility programs such as "ompi_info" from having to import - * entire components just to query their version and parameters. - */ - -#include "ompi_config.h" -#include "sharedfp_addproc.h" -#include "mpi.h" - -/* - * Public string showing the sharedfp addproc component version number - */ -const char *mca_sharedfp_addproc_component_version_string = - "OMPI/MPI addproc SHAREDFP MCA component version " OMPI_VERSION; - -/* - * Global variables - */ -int mca_sharedfp_addproc_priority=1; -int mca_sharedfp_addproc_verbose=0; -#if 0 -char[MPI_MAX_HOSTNAME_LEN] mca_sharedfp_addproc_control_host; -#endif - -static int addproc_register(void); - -/* - * Instantiate the public struct with all of our public information - * and pointers to our public functions in it - */ -mca_sharedfp_base_component_2_0_0_t mca_sharedfp_addproc_component = { - - /* First, the mca_component_t struct containing meta information - about the component itself */ - - .sharedfpm_version = { - MCA_SHAREDFP_BASE_VERSION_2_0_0, - - /* Component name and version */ - .mca_component_name = "addproc", - MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, - OMPI_RELEASE_VERSION), - .mca_register_component_params = addproc_register, - }, - .sharedfpm_data = { - /* This component is checkpointable */ - MCA_BASE_METADATA_PARAM_CHECKPOINT - }, - .sharedfpm_init_query = mca_sharedfp_addproc_component_init_query, /* get thread level */ - .sharedfpm_file_query = mca_sharedfp_addproc_component_file_query, /* get priority and actions */ - .sharedfpm_file_unquery = mca_sharedfp_addproc_component_file_unquery, /* undo what was done by previous function */ -}; - - -static int addproc_register(void) -{ - mca_sharedfp_addproc_priority = 1; - (void) mca_base_component_var_register(&mca_sharedfp_addproc_component.sharedfpm_version, - "priority", "Priority of the addproc 
sharedfp component", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, &mca_sharedfp_addproc_priority); - mca_sharedfp_addproc_verbose = 0; - (void) mca_base_component_var_register(&mca_sharedfp_addproc_component.sharedfpm_version, - "verbose", "Verbosity of the addproc sharedfp component", - MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, &mca_sharedfp_addproc_verbose); - - -#if 0 - memset (mca_sharedfp_addproc_control_host, 0, MPI_MAX_HOSTNAME_LEN); - (void) mca_base_component_var_register(&mca_sharedfp_addproc_component.sharedfpm_version, - "control_host", "Name of the host where to spawn the control process(default:none)", - MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, &mca_sharedfp_addproc_control_host); - -#endif - return OMPI_SUCCESS; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_control.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_control.c deleted file mode 100644 index 5287547573c..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_control.c +++ /dev/null @@ -1,231 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "sharedfp_addproc_control.h" - -/* #define PRINT_TAG 1 */ -void nodeDelete(node **front, node **rear) -{ - node *delNode; - if ((*front) == NULL && (*rear)==NULL) { - printf("The queue is empty\n"); - } - else { - delNode = *front; - if (*front == *rear) { - *rear = NULL; - } - (*front) = (*front)->Next; - - free(delNode); - } - - return; -} - -void nodeInsert(node **front, node **rear, int procNo, long numBytesArrAddr) -{ - node *newNode; - newNode = (node*)malloc(sizeof(node)); - - newNode->Next = NULL; - newNode->procNo = procNo; - newNode->numBytesArrAddr = numBytesArrAddr; - - - if ((*front == NULL) && (*rear == NULL)) { - *front = newNode; - *rear = newNode; -#if 0 - printf("Front and rear both NULL\n"); -#endif - fflush(stdout); - } - else { - (*rear)->Next = newNode; - *rear=newNode; -#if 0 - printf("Front and rear both not NULL\n"); -#endif - fflush(stdout); - } - - return; -} - -int Check_Request_Offset(int tag_received) -{ -#if 0 - printf("Tag received %d\n",tag_received); -#endif - - if (tag_received == REQUEST_TAG) { -#if 0 - printf("Return from Check_Request_Offset\n"); -#endif - return 1; - } - - - return 0; -} - -int Check_Acknowledgement(int tag_received) -{ - if (tag_received == ACK_TAG) - return 1; - - return 0; -} - -int End_control_shared_request(int tag_received) -{ - if (tag_received == END_TAG) - return 1; - - - return 0; -} - - -int main(int argc, char **argv) -{ - long recvBuff; - long offsetValue; - long endoffile; - int size; - int tag_received; - int END_FLAG = 0; - - int recvcount = 1; - MPI_Status status; - MPI_Comm parentComm; - static MPI_Offset offset = 0; - - /*statusStruct arr;*/ - - node *rear, *front; - rear = front = NULL; - -#if 0 - printf("addproc_control: MPI_INIT\n"); fflush(stdout); -#endif - MPI_Init(&argc,&argv); - -#if 0 - printf("addproc_control: MPI_Comm_size\n"); fflush(stdout); -#endif - MPI_Comm_size(MPI_COMM_WORLD,&size); - 
- - endoffile = 0; - -#if 0 - printf("addproc_control: start listening\n"); fflush(stdout); -#endif - while(!END_FLAG) { - - /* Receive request from other processes */ - MPI_Comm_get_parent(&parentComm); - - MPI_Recv(&recvBuff,recvcount,OMPI_OFFSET_DATATYPE,MPI_ANY_SOURCE,MPI_ANY_TAG,parentComm,&status); - tag_received = status.MPI_TAG; - - switch (tag_received) - { - - case REQUEST_TAG: -#if 0 - printf("addproc_control: Offset requested by the process %d\n",status.MPI_SOURCE); fflush(stdout); -#endif - /* Insert the node into the linked list */ - nodeInsert(&front,&rear,status.MPI_SOURCE,recvBuff); - break; - case END_TAG: -#if 0 - printf("addproc_control: End Control tag received\n"); fflush(stdout); -#endif - END_FLAG = 1; - break; - case SEEK_SET_TAG: - offset = recvBuff; - MPI_Send(&offset,1,OMPI_OFFSET_DATATYPE,status.MPI_SOURCE,SEEK_SET_TAG,parentComm); -#if 0 - printf("addproc_control: Seek set tag received\n"); fflush(stdout); -#endif - break; - case SEEK_CUR_TAG: -#if 0 - printf("addproc_control: Seek CUR Tag received\n"); fflush(stdout); -#endif - /*set the pointer to the offset*/ - offset += recvBuff; - MPI_Send(&offset,1,OMPI_OFFSET_DATATYPE,status.MPI_SOURCE,SEEK_CUR_TAG,parentComm); - break; - case SEEK_END_TAG: -#if 0 - printf("addproc_control: Seek END TAG received\n"); fflush(stdout); -#endif - offset = endoffile; - offset += recvBuff; - MPI_Send(&offset,1,OMPI_OFFSET_DATATYPE,status.MPI_SOURCE,SEEK_END_TAG,parentComm); - break; - case GET_POSITION_TAG: -#if 0 - printf("\naddproc_control: Get Position tag received\n"); fflush(stdout); -#endif - /*Send the offset as requested*/ - MPI_Send(&offset,1,OMPI_OFFSET_DATATYPE,status.MPI_SOURCE,GET_POSITION_TAG,parentComm); - break; - default: - printf("addproc_control: Unknown tag received\n"); fflush(stdout); - break; - } - - while (front != NULL) { - - offsetValue = offset; - - offset += front->numBytesArrAddr; - - /* Store the end of file */ - if (endoffile < offset) - endoffile = offset; - - - /* 
MPI_Send to the correct process */ - - MPI_Send(&offsetValue,1,OMPI_OFFSET_DATATYPE, front->procNo, OFFSET_TAG, - parentComm); - nodeDelete(&front,&rear); - - } - - } /* End of while(1) loop */ - -#if 0 - printf("addproc_control: finalizing mpi...\n"); fflush(stdout); -#endif - MPI_Finalize(); - -#if 0 - printf("addproc_control: Exiting...\n"); -#endif - return 0; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_control.h b/ompi/mca/sharedfp/addproc/sharedfp_addproc_control.h deleted file mode 100644 index 40072e57f40..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_control.h +++ /dev/null @@ -1,37 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MCA_SHAREDFP_addproc_control_H -#define MCA_SHAREDFP_addproc_control_H - -#include -#include "mpi.h" -#include "sharedfp_addproc.h" - -BEGIN_C_DECLS - -void nodeDelete(node **front, node **rear); -void nodeInsert(node **front, node **rear, int procNo, long numBytesArrAddr); -int Check_Request_Offset(int tag_received); -int Check_Acknowledgement(int tag_received); -int End_control_shared_request(int tag_received); - -END_C_DECLS - -#endif /* MCA_SHAREDFP_addproc_control_H */ diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c deleted file mode 100644 index 5bea7fec6b3..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_file_open.c +++ /dev/null @@ -1,175 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013-2016 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/pml/pml.h" - -#include -#include -#include "ompi/mca/sharedfp/base/base.h" - - -int mca_sharedfp_addproc_file_open (struct ompi_communicator_t *comm, - const char* filename, - int amode, - struct ompi_info_t *info, - mca_io_ompio_file_t *fh) -{ - int ret = OMPI_SUCCESS, err; - int rank; - struct mca_sharedfp_base_data_t* sh; - mca_io_ompio_file_t * shfileHandle, *ompio_fh; - MPI_Comm newInterComm; - struct mca_sharedfp_addproc_data * addproc_data = NULL; - mca_io_ompio_data_t *data; - - - /*-------------------------------------------------*/ - /*Open the same file again without shared file pointer*/ - /*-------------------------------------------------*/ - shfileHandle = (mca_io_ompio_file_t *)malloc(sizeof(mca_io_ompio_file_t)); - ret = mca_common_ompio_file_open(comm,filename,amode,info,shfileHandle,false); - if ( OMPI_SUCCESS != ret) { - opal_output(0, "mca_sharedfp_addproc_file_open: Error during file open\n"); - return ret; - } - shfileHandle->f_fh = fh->f_fh; - data = (mca_io_ompio_data_t *) fh->f_fh->f_io_selected_data; - ompio_fh = &data->ompio_fh; - - err = mca_common_ompio_set_view (shfileHandle, - ompio_fh->f_disp, - ompio_fh->f_etype, - ompio_fh->f_orig_filetype, - ompio_fh->f_datarep, - MPI_INFO_NULL); - - /*Memory is allocated here for the sh structure*/ - if ( mca_sharedfp_addproc_verbose ) { - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_file_open: malloc f_sharedfp_ptr struct\n"); - } - sh = (struct mca_sharedfp_base_data_t*)malloc(sizeof(struct mca_sharedfp_base_data_t)); - if ( NULL == sh ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_file_open: Error, unable to malloc f_sharedfp_ptr struct\n"); - return 
OMPI_ERR_OUT_OF_RESOURCE; - } - - /*Populate the sh file structure based on the implementation*/ - sh->sharedfh = shfileHandle; /* Shared file pointer*/ - sh->global_offset = 0; /* Global Offset*/ - sh->comm = comm; /* Communicator*/ - sh->selected_module_data = NULL; - - rank = ompi_comm_rank ( sh->comm ); - - if ( mca_sharedfp_addproc_verbose ) { - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_file_open: START spawn by rank=%d\n",rank); - } - - /*Spawn a new process which will maintain the offsets for this file open*/ - ret = MPI_Comm_spawn("mca_sharedfp_addproc_control", MPI_ARGV_NULL, 1, MPI_INFO_NULL, - 0, sh->comm, &newInterComm, &err); - if ( OMPI_SUCCESS != ret ) { - opal_output(0, "mca_sharedfp_addproc_file_open: error spawning control process ret=%d\n", - ret); - } - - /*If spawning successful*/ - if (newInterComm) { - addproc_data = (struct mca_sharedfp_addproc_data*)malloc(sizeof(struct mca_sharedfp_addproc_data)); - if ( NULL == addproc_data ){ - opal_output (0,"mca_sharedfp_addproc_file_open: Error, unable to malloc addproc_data struct\n"); - return OMPI_ERR_OUT_OF_RESOURCE; - } - - /*Store the new Intercommunicator*/ - addproc_data->intercom = newInterComm; - - /*save the addproc data*/ - sh->selected_module_data = addproc_data; - /*remember the shared file handle*/ - fh->f_sharedfp_data = sh; - } - else{ - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_file_open: DONE spawn by rank=%d, errcode[success=%d, err=%d]=%d\n", - rank, MPI_SUCCESS, MPI_ERR_SPAWN, ret); - ret = OMPI_ERROR; - } - - return ret; -} - -int mca_sharedfp_addproc_file_close (mca_io_ompio_file_t *fh) -{ - struct mca_sharedfp_base_data_t *sh=NULL; - int err = OMPI_SUCCESS; - long sendBuff = 0; - int count = 1; - int rank; - struct mca_sharedfp_addproc_data * addproc_data = NULL; - - if ( NULL == fh->f_sharedfp_data){ - /* Can happen with lazy initialization of the sharedfp structures */ - if ( 
mca_sharedfp_addproc_verbose ) { - opal_output(0, "sharedfp_addproc_file_close - shared file pointer structure not initialized\n"); - } - return OMPI_SUCCESS; - } - sh = fh->f_sharedfp_data; - - rank = ompi_comm_rank ( sh->comm ); - - /* Make sure that all processes are ready to release the - ** shared file pointer resources - */ - sh->comm->c_coll->coll_barrier(sh->comm, sh->comm->c_coll->coll_barrier_module ); - - addproc_data = (struct mca_sharedfp_addproc_data*)(sh->selected_module_data); - - if (addproc_data) { - /*tell additional proc to stop listening*/ - if(0 == rank){ - MCA_PML_CALL(send( &sendBuff, count, OMPI_OFFSET_DATATYPE, 0, END_TAG, - MCA_PML_BASE_SEND_STANDARD, addproc_data->intercom)); - } - - /* Free intercommunicator */ - if(addproc_data->intercom){ - ompi_comm_free(&(addproc_data->intercom)); - } - free(addproc_data); - } - - /* Close the main file opened by this component*/ - err = mca_common_ompio_file_close(sh->sharedfh); - - /*free shared file pointer data struct*/ - free(sh); - return err; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_iread.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_iread.c deleted file mode 100644 index bb765305933..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_iread.c +++ /dev/null @@ -1,206 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013-2016 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/sharedfp/base/base.h" - - - -int mca_sharedfp_addproc_iread(mca_io_ompio_file_t *fh, - void *buf, - int count, - ompi_datatype_t *datatype, - MPI_Request * request) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0; - long bytesRequested = 0; - size_t numofBytes; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_iread - shared file pointer structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /* Calculate the number of bytes to write */ - opal_datatype_type_size ( &datatype->super ,&numofBytes); - bytesRequested = count * numofBytes; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_iread: Bytes Requested is %ld\n",bytesRequested); - } - /* Retrieve the shared file data struct */ - sh = fh->f_sharedfp_data; - - /*Request to the additional process for the offset*/ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offset); - offset /= sh->sharedfh->f_etype_size; - - if( OMPI_SUCCESS == ret ){ - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_iread: Offset received is %lld\n",offset); - } - /* Read from the file */ - ret = mca_common_ompio_file_iread_at ( sh->sharedfh, offset, buf, count, datatype, request); - } - - return ret; -} -int mca_sharedfp_addproc_read_ordered_begin(mca_io_ompio_file_t *fh, - void *buf, - int count, - struct ompi_datatype_t *datatype) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0, offsetReceived = 0; - long sendBuff = 0; - long *buff=NULL; - long offsetBuff, bytesRequested = 0; - size_t numofBytes; - int rank, size, i; - 
struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_read_ordered_begin: shared file pointer " - "structure not initialized correctly\n"); - return OMPI_ERROR; - } - - if ( true == fh->f_split_coll_in_use ) { - opal_output(0, "Only one split collective I/O operation allowed per " - "file handle at any given point in time!\n"); - return MPI_ERR_REQUEST; - } - - /*Retrieve the new communicator*/ - sh = fh->f_sharedfp_data; - - /* Calculate the number of bytes to read*/ - opal_datatype_type_size ( &datatype->super, &numofBytes); - sendBuff = count * numofBytes; - - /* Get the ranks in the communicator */ - rank = ompi_comm_rank ( sh->comm); - size = ompi_comm_size ( sh->comm); - - if ( 0 == rank ) { - buff = (long*)malloc(sizeof(OMPI_MPI_OFFSET_TYPE) * size); - if ( NULL == buff ) - return OMPI_ERR_OUT_OF_RESOURCE; - } - - ret = sh->comm->c_coll->coll_gather( &sendBuff, 1, OMPI_OFFSET_DATATYPE, - buff, 1, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_gather_module); - if ( OMPI_SUCCESS != ret ) { - goto exit; - } - - /* All the counts are present now in the recvBuff. - The size of recvBuff is sizeof_newComm - */ - if ( 0 == rank ) { - for (i = 0; i < size ; i ++) { - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered_begin: Buff is %ld\n",buff[i]); - } - bytesRequested += buff[i]; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered_begin: Bytes requested are %ld\n", - bytesRequested); - } - } - - /* Request the offset to read bytesRequested bytes - ** only the root process needs to do the request, - ** since the root process will then tell the other - ** processes at what offset they should read their - ** share of the data. 
- */ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offsetReceived); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered_begin: Offset received is %lld\n", - offsetReceived); - } - buff[0] += offsetReceived; - - - for (i = 1 ; i < size; i++) { - buff[i] += buff[i-1]; - } - } - - /* Scatter the results to the other processes*/ - ret = sh->comm->c_coll->coll_scatter ( buff, 1, OMPI_OFFSET_DATATYPE, &offsetBuff, - 1, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_scatter_module ); - if ( OMPI_SUCCESS != ret ) { - goto exit; - } - - /*Each process now has its own individual offset in recvBUFF*/ - offset = offsetBuff - sendBuff; - offset /= sh->sharedfh->f_etype_size; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered_begin: Offset returned is %lld\n",offset); - } - - /* read from the file */ - ret = mca_common_ompio_file_iread_at_all(sh->sharedfh,offset,buf,count,datatype,&fh->f_split_coll_req); - fh->f_split_coll_in_use = true; - -exit: - if ( NULL != buff ) { - free ( buff ); - } - - return ret; - -} - - -int mca_sharedfp_addproc_read_ordered_end(mca_io_ompio_file_t *fh, - void *buf, - ompi_status_public_t *status) -{ - int ret = OMPI_SUCCESS; - ret = ompi_request_wait ( &fh->f_split_coll_req, status ); - - /* remove the flag again */ - fh->f_split_coll_in_use = false; - return ret; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_iwrite.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_iwrite.c deleted file mode 100644 index 011c494d28d..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_iwrite.c +++ /dev/null @@ -1,200 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. 
- * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013-2016 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/sharedfp/base/base.h" - -int mca_sharedfp_addproc_iwrite(mca_io_ompio_file_t *fh, - const void *buf, - int count, - ompi_datatype_t *datatype, - MPI_Request * request) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0; - long bytesRequested = 0; - size_t numofBytes; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_iwrite: shared file pointer structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /* Calculate the number of bytes to write */ - opal_datatype_type_size ( &datatype->super, &numofBytes); - bytesRequested = count * numofBytes; - - /* Retrieve the shared file data struct */ - sh = fh->f_sharedfp_data; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_iwrite: Bytes Requested is %ld\n",bytesRequested); - } - /* Request the offset to write bytesRequested bytes */ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offset); - offset /= sh->sharedfh->f_etype_size; - - if ( OMPI_SUCCESS == ret ) { - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_iwrite: Offset received is %lld\n",offset); - } - /* Write to the file */ - ret = 
mca_common_ompio_file_iwrite_at(sh->sharedfh,offset,buf,count,datatype,request); - } - - return ret; -} - -int mca_sharedfp_addproc_write_ordered_begin(mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0, offsetReceived = 0; - long sendBuff = 0; - long *buff=NULL; - long offsetBuff; - long bytesRequested = 0; - int recvcnt = 1, sendcnt = 1; - size_t numofBytes; - int rank, size, i; - struct mca_sharedfp_base_data_t *sh = NULL; - - if ( NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_write_ordered_begin:" - " shared file pointer structure not initialized correctly\n"); - return OMPI_ERROR; - } - - if ( true == fh->f_split_coll_in_use ) { - opal_output(0, "Only one split collective I/O operation allowed per file handle " - "at any given point in time!\n"); - return MPI_ERR_REQUEST; - } - - /*Retrieve the shared file pointer structure*/ - sh = fh->f_sharedfp_data; - - /* Calculate the number of bytes to write*/ - opal_datatype_type_size ( &datatype->super, &numofBytes); - sendBuff = count * numofBytes; - - /* Get the ranks in the communicator */ - rank = ompi_comm_rank ( sh->comm ); - size = ompi_comm_size ( sh->comm ); - - if ( 0 == rank ) { - buff = (long*)malloc(sizeof(OMPI_MPI_OFFSET_TYPE) * size); - if ( NULL == buff ) - return OMPI_ERR_OUT_OF_RESOURCE; - } - - ret = sh->comm->c_coll->coll_gather ( &sendBuff, sendcnt, OMPI_OFFSET_DATATYPE, buff, - recvcnt, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_gather_module); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - - /* All the counts are present now in the recvBuff. 
- The size of recvBuff is sizeof_newComm - */ - if ( 0 == rank ) { - for (i = 0; i < size ; i ++) { - bytesRequested += buff[i]; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write_ordered_begin: Bytes requested are %ld\n", - bytesRequested); - } - } - - /* Request the offset to write bytesRequested bytes - ** only the root process needs to do the request, - ** since the root process will then tell the other - ** processes at what offset they should write their - ** share of the data. - */ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offsetReceived); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write_ordered_begin: Offset received is %lld\n", - offsetReceived); - } - buff[0] += offsetReceived; - - for (i = 1 ; i < size; i++) { - buff[i] += buff[i-1]; - } - } - - /* Scatter the results to the other processes*/ - ret = sh->comm->c_coll->coll_scatter ( buff, sendcnt, OMPI_OFFSET_DATATYPE, &offsetBuff, - recvcnt, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_scatter_module ); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - - /*Each process now has its own individual offset in recvBUFF*/ - offset = offsetBuff - sendBuff; - offset /= sh->sharedfh->f_etype_size; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write_ordered: Offset returned is %lld\n",offset); - } - - /* write to the file */ - ret = mca_common_ompio_file_iwrite_at_all(sh->sharedfh,offset,buf,count,datatype,&fh->f_split_coll_req); - fh->f_split_coll_in_use = true; - -exit: - if ( NULL != buff ) { - free ( buff ); - } - return ret; -} - - -int mca_sharedfp_addproc_write_ordered_end(mca_io_ompio_file_t *fh, - const void *buf, - ompi_status_public_t *status) -{ - int ret = OMPI_SUCCESS; - ret = 
ompi_request_wait ( &fh->f_split_coll_req, status ); - - /* remove the flag again */ - fh->f_split_coll_in_use = false; - return ret; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_read.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_read.c deleted file mode 100644 index 02bb7a7817d..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_read.c +++ /dev/null @@ -1,183 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013-2016 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/sharedfp/base/base.h" - -int mca_sharedfp_addproc_read ( mca_io_ompio_file_t *fh, - void *buf, int count, MPI_Datatype datatype, MPI_Status *status) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0; - long bytesRequested = 0; - size_t numofBytes; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_read: shared file pointer " - "structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /* Calculate the number of bytes to write */ - opal_datatype_type_size ( &datatype->super ,&numofBytes); - bytesRequested = count * numofBytes; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_read: Bytes 
Requested is %ld\n", bytesRequested); - } - /* Retrieve the shared file data struct */ - sh = fh->f_sharedfp_data; - - /*Request to the additional process for the offset*/ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offset); - offset /= sh->sharedfh->f_etype_size; - - if( OMPI_SUCCESS == ret ){ - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "mca_sharedfp_addproc_read: Offset received is %lld\n",offset); - } - /* Read from the file */ - ret = mca_common_ompio_file_read_at(sh->sharedfh,offset,buf,count,datatype,status); - } - - return ret; -} - -int mca_sharedfp_addproc_read_ordered (mca_io_ompio_file_t *fh, - void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_status_public_t *status) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0, offsetReceived = 0; - long sendBuff = 0; - long *buff=NULL; - long offsetBuff, bytesRequested = 0; - size_t numofBytes; - int rank, size, i; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_read_ordered: shared file pointer " - "structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /*Retrieve the new communicator*/ - sh = fh->f_sharedfp_data; - - /* Calculate the number of bytes to read*/ - opal_datatype_type_size ( &datatype->super, &numofBytes); - sendBuff = count * numofBytes; - - /* Get the ranks in the communicator */ - rank = ompi_comm_rank ( sh->comm); - size = ompi_comm_size ( sh->comm); - - if ( 0 == rank ) { - buff = (long*)malloc(sizeof(OMPI_MPI_OFFSET_TYPE) * size); - if ( NULL == buff ) - return OMPI_ERR_OUT_OF_RESOURCE; - } - - ret = sh->comm->c_coll->coll_gather( &sendBuff, 1, OMPI_OFFSET_DATATYPE, - buff, 1, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_gather_module); - if ( OMPI_SUCCESS != ret ) { - goto exit; - } - - /* All the counts are present now in the recvBuff. 
- The size of recvBuff is sizeof_newComm - */ - if ( 0 == rank ) { - for (i = 0; i < size ; i ++) { - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered: Buff is %ld\n",buff[i]); - } - bytesRequested += buff[i]; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered: Bytes requested are %ld\n", - bytesRequested); - } - } - - /* Request the offset to read bytesRequested bytes - ** only the root process needs to do the request, - ** since the root process will then tell the other - ** processes at what offset they should read their - ** share of the data. - */ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offsetReceived); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered: Offset received is %lld\n", - offsetReceived); - } - buff[0] += offsetReceived; - - - for (i = 1 ; i < size; i++) { - buff[i] += buff[i-1]; - } - } - - /* Scatter the results to the other processes*/ - ret = sh->comm->c_coll->coll_scatter ( buff, 1, OMPI_OFFSET_DATATYPE, &offsetBuff, - 1, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_scatter_module ); - if ( OMPI_SUCCESS != ret ) { - goto exit; - } - - /*Each process now has its own individual offset in recvBUFF*/ - offset = offsetBuff - sendBuff; - offset /= sh->sharedfh->f_etype_size; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_read_ordered: Offset returned is %lld\n",offset); - } - - /* read from the file */ - ret = mca_common_ompio_file_read_at_all(sh->sharedfh,offset,buf,count,datatype,status); - -exit: - if ( NULL != buff ) { - free ( buff ); - } - - return ret; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_request_position.c 
b/ompi/mca/sharedfp/addproc/sharedfp_addproc_request_position.c deleted file mode 100644 index c9b84eac39e..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_request_position.c +++ /dev/null @@ -1,75 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/pml/pml.h" -#include "ompi/mca/sharedfp/sharedfp.h" - -int mca_sharedfp_addproc_request_position(struct mca_sharedfp_base_data_t * sh, - int bytes_requested, - OMPI_MPI_OFFSET_TYPE *offset) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE position = 0; - long sendBuff = bytes_requested ; - int count = 1; - - - struct mca_sharedfp_addproc_data * addproc_data = sh->selected_module_data; - - *offset = 0; - - ret = MCA_PML_CALL(send( &sendBuff, count, OMPI_OFFSET_DATATYPE, 0, REQUEST_TAG, - MCA_PML_BASE_SEND_STANDARD, addproc_data->intercom)); - if ( OMPI_SUCCESS != ret ) { - return ret; - } - ret = MCA_PML_CALL(recv( &position, count, OMPI_OFFSET_DATATYPE, 0, OFFSET_TAG, - addproc_data->intercom, MPI_STATUS_IGNORE)); - - *offset = position; - return ret; -} - -int mca_sharedfp_addproc_get_position(mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE * offset) -{ - int ret = OMPI_SUCCESS; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - 
opal_output(0, "sharedfp_addproc_get_position - shared file pointer structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /* Retrieve the shared file data struct*/ - sh = fh->f_sharedfp_data; - - /* Requesting the offset to write 0 bytes, - ** returns the current offset w/o updating it - */ - ret = mca_sharedfp_addproc_request_position(sh, 0, offset); - - return ret; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_seek.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_seek.c deleted file mode 100644 index daab8f81dc0..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_seek.c +++ /dev/null @@ -1,69 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013 University of Houston. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/pml/pml.h" -#include "ompi/mca/sharedfp/sharedfp.h" - -int -mca_sharedfp_addproc_seek (mca_io_ompio_file_t *fh, - OMPI_MPI_OFFSET_TYPE offset, int whence) -{ - int rank; - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE position = 0; - struct mca_sharedfp_base_data_t *sh = NULL; - struct mca_sharedfp_addproc_data * addproc_data = sh->selected_module_data; - long buff = 0; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_write_ordered - shared file pointer structure not initialized correctly\n"); - return OMPI_ERROR; - } - - sh = fh->f_sharedfp_data; - rank = ompi_comm_rank ( sh->comm ); - buff = offset; - - - /* This is a collective call, - * only one process needs to communicate with the */ - if(0 == rank){ - ret = MCA_PML_CALL(send ( &buff, 1, OMPI_OFFSET_DATATYPE, 0, whence, - MCA_PML_BASE_SEND_STANDARD, - addproc_data->intercom)); - if ( OMPI_SUCCESS != ret ) { - return OMPI_ERROR; - } - ret = MCA_PML_CALL(recv(&position, 1, OMPI_OFFSET_DATATYPE, 0, whence, - addproc_data->intercom, MPI_STATUS_IGNORE)); - if ( OMPI_SUCCESS != ret ) { - return OMPI_ERROR; - } - - } - ret = sh->comm->c_coll->coll_barrier(sh->comm, sh->comm->c_coll->coll_barrier_module); - - return ret; -} diff --git a/ompi/mca/sharedfp/addproc/sharedfp_addproc_write.c b/ompi/mca/sharedfp/addproc/sharedfp_addproc_write.c deleted file mode 100644 index 2c555124a37..00000000000 --- a/ompi/mca/sharedfp/addproc/sharedfp_addproc_write.c +++ /dev/null @@ -1,183 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2017 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. 
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013-2016 University of Houston. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - - -#include "ompi_config.h" -#include "sharedfp_addproc.h" - -#include "mpi.h" -#include "ompi/constants.h" -#include "ompi/mca/sharedfp/sharedfp.h" -#include "ompi/mca/sharedfp/base/base.h" - -int mca_sharedfp_addproc_write (mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_status_public_t *status) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0; - long bytesRequested = 0; - size_t numofBytes; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_write: shared file pointer structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /* Calculate the number of bytes to write*/ - opal_datatype_type_size ( &datatype->super, &numofBytes); - bytesRequested = count * numofBytes; - - /*Retrieve the shared file data structure */ - sh = fh->f_sharedfp_data; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write: sharedfp_addproc_write: Bytes Requested is %ld\n", - bytesRequested); - } - - /*Request the offset to write bytesRequested bytes*/ - ret = mca_sharedfp_addproc_request_position( sh, bytesRequested, &offset); - offset /= sh->sharedfh->f_etype_size; - if ( OMPI_SUCCESS == ret ) { - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write: Offset received is %lld\n",offset); - } - /* Write to the file */ - ret = mca_common_ompio_file_write_at(sh->sharedfh,offset,buf,count,datatype,status); - } - - return ret; -} - -int 
mca_sharedfp_addproc_write_ordered (mca_io_ompio_file_t *fh, - const void *buf, - int count, - struct ompi_datatype_t *datatype, - ompi_status_public_t *status) -{ - int ret = OMPI_SUCCESS; - OMPI_MPI_OFFSET_TYPE offset = 0, offsetReceived = 0; - long sendBuff = 0; - long *buff=NULL; - long offsetBuff; - long bytesRequested = 0; - int recvcnt = 1, sendcnt = 1; - size_t numofBytes; - int rank, size, i; - struct mca_sharedfp_base_data_t *sh = NULL; - - if(NULL == fh->f_sharedfp_data){ - opal_output(0, "sharedfp_addproc_write_ordered: shared file pointer " - "structure not initialized correctly\n"); - return OMPI_ERROR; - } - - /*Retrieve the shared file pointer structure*/ - sh = fh->f_sharedfp_data; - - /* Calculate the number of bytes to write*/ - opal_datatype_type_size ( &datatype->super, &numofBytes); - sendBuff = count * numofBytes; - - /* Get the ranks in the communicator */ - rank = ompi_comm_rank ( sh->comm ); - size = ompi_comm_size ( sh->comm ); - - if ( 0 == rank ) { - buff = (long*)malloc(sizeof(OMPI_MPI_OFFSET_TYPE) * size); - if ( NULL == buff ) - return OMPI_ERR_OUT_OF_RESOURCE; - } - - ret = sh->comm->c_coll->coll_gather ( &sendBuff, sendcnt, OMPI_OFFSET_DATATYPE, buff, - recvcnt, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_gather_module); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - - /* All the counts are present now in the recvBuff. - The size of recvBuff is sizeof_newComm - */ - if ( 0 == rank ) { - for (i = 0; i < size ; i ++) { - bytesRequested += buff[i]; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write_ordered: Bytes requested are %ld\n", - bytesRequested); - } - } - - /* Request the offset to write bytesRequested bytes - ** only the root process needs to do the request, - ** since the root process will then tell the other - ** processes at what offset they should write their - ** share of the data. 
- */ - ret = mca_sharedfp_addproc_request_position(sh,bytesRequested,&offsetReceived); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write_ordered: Offset received is %lld\n", - offsetReceived); - } - buff[0] += offsetReceived; - - for (i = 1 ; i < size; i++) { - buff[i] += buff[i-1]; - } - } - - /* Scatter the results to the other processes*/ - ret = sh->comm->c_coll->coll_scatter ( buff, sendcnt, OMPI_OFFSET_DATATYPE, &offsetBuff, - recvcnt, OMPI_OFFSET_DATATYPE, 0, sh->comm, - sh->comm->c_coll->coll_scatter_module ); - if( OMPI_SUCCESS != ret ){ - goto exit; - } - - /*Each process now has its own individual offset in recvBUFF*/ - offset = offsetBuff - sendBuff; - offset /= sh->sharedfh->f_etype_size; - - if ( mca_sharedfp_addproc_verbose ){ - opal_output(ompi_sharedfp_base_framework.framework_output, - "sharedfp_addproc_write_ordered: Offset returned is %lld\n", - offset); - } - - /* write to the file */ - ret = mca_common_ompio_file_write_at_all(sh->sharedfh,offset,buf,count,datatype,status); - -exit: - if ( NULL != buff ) { - free ( buff ); - } - return ret; -} diff --git a/ompi/mca/sharedfp/configure.m4 b/ompi/mca/sharedfp/configure.m4 index 9859df4226d..a8ed878e600 100644 --- a/ompi/mca/sharedfp/configure.m4 +++ b/ompi/mca/sharedfp/configure.m4 @@ -1,7 +1,7 @@ # -*- shell-script -*- # # Copyright (c) 2011 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2016 Research Organization for Information Science +# Copyright (c) 2016-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. 
# # $COPYRIGHT$ @@ -17,8 +17,7 @@ AC_DEFUN([MCA_ompi_sharedfp_CONFIG], [ OPAL_VAR_SCOPE_PUSH([want_io_ompio]) - AS_IF([test "$enable_mpi_io" != "no" && - test "$enable_io_ompio" != "no"], + AS_IF([test "$enable_io_ompio" != "no"], [want_io_ompio=1], [want_io_ompio=0]) diff --git a/ompi/mca/sharedfp/individual/Makefile.am b/ompi/mca/sharedfp/individual/Makefile.am index 36c090604c0..d0a4ed34ba4 100644 --- a/ompi/mca/sharedfp/individual/Makefile.am +++ b/ompi/mca/sharedfp/individual/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -33,6 +34,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sharedfp_individual_la_SOURCES = $(sources) mca_sharedfp_individual_la_LDFLAGS = -module -avoid-version +mca_sharedfp_individual_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_sharedfp_individual_la_SOURCES = $(sources) diff --git a/ompi/mca/sharedfp/individual/sharedfp_individual.c b/ompi/mca/sharedfp/individual/sharedfp_individual.c index 262e3aeefa3..9eea5c1263a 100644 --- a/ompi/mca/sharedfp/individual/sharedfp_individual.c +++ b/ompi/mca/sharedfp/individual/sharedfp_individual.c @@ -10,6 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2013-2015 University of Houston. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -72,7 +73,7 @@ struct mca_sharedfp_base_module_1_0_0_t * mca_sharedfp_individual_component_file int amode; bool wronly_flag=false; bool relaxed_order_flag=false; - MPI_Info info; + opal_info_t *info; int flag; int valuelen; char value[MPI_MAX_INFO_VAL+1]; @@ -101,9 +102,9 @@ struct mca_sharedfp_base_module_1_0_0_t * mca_sharedfp_individual_component_file /*---------------------------------------------------------*/ /* 2. Did the user specify MPI_INFO relaxed ordering flag? */ info = fh->f_info; - if ( info != MPI_INFO_NULL ){ + if ( info != &(MPI_INFO_NULL->super) ){ valuelen = MPI_MAX_INFO_VAL; - ompi_info_get ( info,"OMPIO_SHAREDFP_RELAXED_ORDERING", valuelen, value, &flag); + opal_info_get ( info,"OMPIO_SHAREDFP_RELAXED_ORDERING", valuelen, value, &flag); if ( flag ) { if ( mca_sharedfp_individual_verbose ) { opal_output(ompi_sharedfp_base_framework.framework_output, diff --git a/ompi/mca/sharedfp/individual/sharedfp_individual.h b/ompi/mca/sharedfp/individual/sharedfp_individual.h index b5f0d5e5be6..8674711cbb7 100644 --- a/ompi/mca/sharedfp/individual/sharedfp_individual.h +++ b/ompi/mca/sharedfp/individual/sharedfp_individual.h @@ -12,6 +12,7 @@ * Copyright (c) 2013-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -57,7 +58,7 @@ int mca_sharedfp_individual_seek (mca_io_ompio_file_t *fh, int mca_sharedfp_individual_file_open (struct ompi_communicator_t *comm, const char* filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh); int mca_sharedfp_individual_file_close (mca_io_ompio_file_t *fh); int mca_sharedfp_individual_read (mca_io_ompio_file_t *fh, diff --git a/ompi/mca/sharedfp/individual/sharedfp_individual_file_open.c b/ompi/mca/sharedfp/individual/sharedfp_individual_file_open.c index f259e8750d8..90e106700a2 100644 --- a/ompi/mca/sharedfp/individual/sharedfp_individual_file_open.c +++ b/ompi/mca/sharedfp/individual/sharedfp_individual_file_open.c @@ -12,6 +12,7 @@ * Copyright (c) 2013-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -33,7 +34,7 @@ int mca_sharedfp_individual_file_open (struct ompi_communicator_t *comm, const char* filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh) { int err = 0; @@ -113,7 +114,7 @@ int mca_sharedfp_individual_file_open (struct ompi_communicator_t *comm, } err = mca_common_ompio_file_open(MPI_COMM_SELF, datafilename, MPI_MODE_RDWR | MPI_MODE_CREATE | MPI_MODE_DELETE_ON_CLOSE, - MPI_INFO_NULL, datafilehandle, false); + &(MPI_INFO_NULL->super), datafilehandle, false); if ( OMPI_SUCCESS != err) { opal_output(0, "mca_sharedfp_individual_file_open: Error during datafile file open\n"); free (shfileHandle ); @@ -156,7 +157,7 @@ int mca_sharedfp_individual_file_open (struct ompi_communicator_t *comm, } err = mca_common_ompio_file_open ( MPI_COMM_SELF,metadatafilename, MPI_MODE_RDWR | MPI_MODE_CREATE | MPI_MODE_DELETE_ON_CLOSE, - MPI_INFO_NULL, metadatafilehandle, false); + &(MPI_INFO_NULL->super), metadatafilehandle, false); if ( OMPI_SUCCESS != err) { opal_output(0, "mca_sharedfp_individual_file_open: Error during metadatafile file open\n"); free (shfileHandle ); diff --git a/ompi/mca/sharedfp/lockedfile/Makefile.am b/ompi/mca/sharedfp/lockedfile/Makefile.am index c0ea5abdd51..b0151c56126 100644 --- a/ompi/mca/sharedfp/lockedfile/Makefile.am +++ b/ompi/mca/sharedfp/lockedfile/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -33,6 +34,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sharedfp_lockedfile_la_SOURCES = $(sources) mca_sharedfp_lockedfile_la_LDFLAGS = -module -avoid-version +mca_sharedfp_lockedfile_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_sharedfp_lockedfile_la_SOURCES = $(sources) diff --git a/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile.h b/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile.h index 0e1faa35842..2eede80bb78 100644 --- a/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile.h +++ b/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile.h @@ -12,6 +12,7 @@ * Copyright (c) 2013-2016 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -57,7 +58,7 @@ int mca_sharedfp_lockedfile_get_position (mca_io_ompio_file_t *fh, int mca_sharedfp_lockedfile_file_open (struct ompi_communicator_t *comm, const char* filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh); int mca_sharedfp_lockedfile_file_close (mca_io_ompio_file_t *fh); int mca_sharedfp_lockedfile_read (mca_io_ompio_file_t *fh, diff --git a/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile_file_open.c b/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile_file_open.c index 0ed762b452e..89bdf56aa45 100644 --- a/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile_file_open.c +++ b/ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile_file_open.c @@ -12,6 +12,7 @@ * Copyright (c) 2013-2017 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -38,7 +39,7 @@ int mca_sharedfp_lockedfile_file_open (struct ompi_communicator_t *comm, const char* filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh) { int err = MPI_SUCCESS; @@ -67,7 +68,7 @@ int mca_sharedfp_lockedfile_file_open (struct ompi_communicator_t *comm, ompio_fh->f_etype, ompio_fh->f_orig_filetype, ompio_fh->f_datarep, - MPI_INFO_NULL); + &(MPI_INFO_NULL->super)); /*Memory is allocated here for the sh structure*/ diff --git a/ompi/mca/sharedfp/sharedfp.h b/ompi/mca/sharedfp/sharedfp.h index 1c370c00f3d..2d5d969315b 100644 --- a/ompi/mca/sharedfp/sharedfp.h +++ b/ompi/mca/sharedfp/sharedfp.h @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -30,6 +31,7 @@ #include "ompi/mca/mca.h" #include "opal/mca/base/base.h" #include "ompi/request/request.h" +#include "ompi/info/info.h" BEGIN_C_DECLS @@ -176,7 +178,7 @@ typedef int (*mca_sharedfp_base_module_read_ordered_end_fn_t)( ompi_status_public_t *status); typedef int (*mca_sharedfp_base_module_file_open_fn_t)( struct ompi_communicator_t *comm, const char *filename, int amode, - struct ompi_info_t *info, struct mca_io_ompio_file_t *fh); + struct opal_info_t *info, struct mca_io_ompio_file_t *fh); typedef int (*mca_sharedfp_base_module_file_close_fn_t)(struct mca_io_ompio_file_t *fh); diff --git a/ompi/mca/sharedfp/sm/Makefile.am b/ompi/mca/sharedfp/sm/Makefile.am index 2783a9ad679..3553cb80c51 100644 --- a/ompi/mca/sharedfp/sm/Makefile.am +++ b/ompi/mca/sharedfp/sm/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2008 University of Houston. All rights reserved. +# Copyright (c) 2017 IBM Corporation. 
All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -33,6 +34,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sharedfp_sm_la_SOURCES = $(sources) mca_sharedfp_sm_la_LDFLAGS = -module -avoid-version +mca_sharedfp_sm_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_sharedfp_sm_la_SOURCES = $(sources) diff --git a/ompi/mca/sharedfp/sm/sharedfp_sm.h b/ompi/mca/sharedfp/sm/sharedfp_sm.h index b10bbcf141d..ec8d0f4ed6f 100644 --- a/ompi/mca/sharedfp/sm/sharedfp_sm.h +++ b/ompi/mca/sharedfp/sm/sharedfp_sm.h @@ -13,6 +13,7 @@ * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -57,7 +58,7 @@ int mca_sharedfp_sm_get_position (mca_io_ompio_file_t *fh, int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, const char* filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh); int mca_sharedfp_sm_file_close (mca_io_ompio_file_t *fh); int mca_sharedfp_sm_read (mca_io_ompio_file_t *fh, diff --git a/ompi/mca/sharedfp/sm/sharedfp_sm_file_open.c b/ompi/mca/sharedfp/sm/sharedfp_sm_file_open.c index 2d205bd23be..4ed76e12a82 100644 --- a/ompi/mca/sharedfp/sm/sharedfp_sm_file_open.c +++ b/ompi/mca/sharedfp/sm/sharedfp_sm_file_open.c @@ -14,6 +14,7 @@ * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -48,7 +49,7 @@ int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, const char* filename, int amode, - struct ompi_info_t *info, + struct opal_info_t *info, mca_io_ompio_file_t *fh) { int err = OMPI_SUCCESS; @@ -57,11 +58,13 @@ int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, mca_io_ompio_file_t * shfileHandle, *ompio_fh; char * filename_basename; char * sm_filename; + int sm_filename_length; struct mca_sharedfp_sm_offset * sm_offset_ptr; struct mca_sharedfp_sm_offset sm_offset; mca_io_ompio_data_t *data; int sm_fd; int rank; + uint32_t comm_cid; /*----------------------------------------------------*/ /*Open the same file again without shared file pointer*/ @@ -86,7 +89,7 @@ int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, ompio_fh->f_etype, ompio_fh->f_orig_filetype, ompio_fh->f_datarep, - MPI_INFO_NULL); + &(MPI_INFO_NULL->super)); /*Memory is allocated here for the sh structure*/ if ( mca_sharedfp_sm_verbose ) { @@ -129,28 +132,21 @@ int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, /* the shared memory segment is identified opening a file ** and then mapping it to memory ** For sharedfp we also want to put the file backed shared memory into the tmp directory - ** TODO: properly name the file so that different jobs can run on the same system w/o - ** overwriting each other, e.g. 
orte_process_info.proc_session_dir */ - /*sprintf(sm_filename,"%s%s",filename,".sm");*/ - filename_basename = basename((void *)filename); - sm_filename = (char*) malloc( sizeof(char) * (strlen(filename_basename)+64) ); + filename_basename = basename((char*)filename); + /* format is "%s/%s_cid-%d.sm", see below */ + sm_filename_length = strlen(ompi_process_info.job_session_dir) + 1 + strlen(filename_basename) + 5 + (3*sizeof(uint32_t)+1) + 4; + sm_filename = (char*) malloc( sizeof(char) * sm_filename_length); if (NULL == sm_filename) { + opal_output(0, "mca_sharedfp_sm_file_open: Error, unable to malloc sm_filename\n"); free(sm_data); free(sh); free(shfileHandle); return OMPI_ERR_OUT_OF_RESOURCE; } - opal_jobid_t masterjobid; - if ( 0 == comm->c_my_rank ) { - ompi_proc_t *masterproc = ompi_group_peer_lookup(comm->c_local_group, 0 ); - masterjobid = OMPI_CAST_RTE_NAME(&masterproc->super.proc_name)->jobid; - } - comm->c_coll->coll_bcast ( &masterjobid, 1, MPI_UNSIGNED, 0, comm, - comm->c_coll->coll_bcast_module ); - - sprintf(sm_filename,"/tmp/OMPIO_%s_%d_%s",filename_basename, masterjobid, ".sm"); + comm_cid = ompi_comm_get_cid(comm); + sprintf(sm_filename, "%s/%s_cid-%d.sm", ompi_process_info.job_session_dir, filename_basename, comm_cid); /* open shared memory file, initialize to 0, map into memory */ sm_fd = open(sm_filename, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); @@ -195,7 +191,7 @@ int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, #if defined(HAVE_SEM_OPEN) -#if defined (__APPLE__) +#if defined (__APPLE__) sm_data->sem_name = (char*) malloc( sizeof(char) * 32); snprintf(sm_data->sem_name,31,"OMPIO_%s",filename_basename); #else @@ -235,6 +231,12 @@ int mca_sharedfp_sm_file_open (struct ompi_communicator_t *comm, comm->c_coll->coll_barrier (comm, comm->c_coll->coll_barrier_module ); +#if defined(HAVE_SEM_OPEN) + if ( 0 == rank ) { + sem_unlink ( sm_data->sem_name); + } +#endif + return err; } @@ -267,7 +269,7 @@ int 
mca_sharedfp_sm_file_close (mca_io_ompio_file_t *fh) if (file_data->sm_offset_ptr) { /* destroy semaphore */ #if defined(HAVE_SEM_OPEN) - sem_unlink (file_data->sem_name); + sem_close ( file_data->mutex); free (file_data->sem_name); #elif defined(HAVE_SEM_INIT) sem_destroy(&file_data->sm_offset_ptr->mutex); diff --git a/ompi/mca/topo/base/base.h b/ompi/mca/topo/base/base.h index 5e05a8009d4..9ab1a4b927a 100644 --- a/ompi/mca/topo/base/base.h +++ b/ompi/mca/topo/base/base.h @@ -15,6 +15,7 @@ * Copyright (c) 2012-2013 Inria. All rights reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -171,7 +172,7 @@ mca_topo_base_dist_graph_create(mca_topo_base_module_t* module, ompi_communicator_t *old_comm, int n, const int nodes[], const int degrees[], const int targets[], const int weights[], - ompi_info_t *info, int reorder, + opal_info_t *info, int reorder, ompi_communicator_t **new_comm); OMPI_DECLSPEC int @@ -180,7 +181,7 @@ mca_topo_base_dist_graph_create_adjacent(mca_topo_base_module_t* module, int indegree, const int sources[], const int sourceweights[], int outdegree, const int destinations[], const int destweights[], - ompi_info_t *info, int reorder, + opal_info_t *info, int reorder, ompi_communicator_t **comm_dist_graph); OMPI_DECLSPEC int @@ -194,6 +195,9 @@ OMPI_DECLSPEC int mca_topo_base_dist_graph_neighbors_count(ompi_communicator_t *comm, int *inneighbors, int *outneighbors, int *weighted); + +int mca_topo_base_neighbor_count (ompi_communicator_t *comm, int *indegree, int *outdegree); + END_C_DECLS #endif /* MCA_BASE_TOPO_H */ diff --git a/ompi/mca/topo/base/topo_base_dist_graph_create.c b/ompi/mca/topo/base/topo_base_dist_graph_create.c index 9048d7acb90..fdc202f879a 100644 --- a/ompi/mca/topo/base/topo_base_dist_graph_create.c +++ 
b/ompi/mca/topo/base/topo_base_dist_graph_create.c @@ -10,7 +10,7 @@ * Copyright (c) 2011-2013 Université Bordeaux 1 * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 IBM Corporation. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. */ #include "ompi_config.h" @@ -284,7 +284,7 @@ int mca_topo_base_dist_graph_create(mca_topo_base_module_t* module, int n, const int nodes[], const int degrees[], const int targets[], const int weights[], - ompi_info_t *info, int reorder, + opal_info_t *info, int reorder, ompi_communicator_t **newcomm) { int err; @@ -295,6 +295,14 @@ int mca_topo_base_dist_graph_create(mca_topo_base_module_t* module, OBJ_RELEASE(module); return err; } + // But if there is an info object, the above call didn't make use + // of it, so we'll do a dup-with-info to get the final comm and + // free the above intermediate newcomm: + if (info && info != &(MPI_INFO_NULL->super)) { + ompi_communicator_t *intermediate_comm = *newcomm; + ompi_comm_dup_with_info (intermediate_comm, info, newcomm); + ompi_comm_free(&intermediate_comm); + } assert(NULL == (*newcomm)->c_topo); (*newcomm)->c_topo = module; diff --git a/ompi/mca/topo/base/topo_base_dist_graph_create_adjacent.c b/ompi/mca/topo/base/topo_base_dist_graph_create_adjacent.c index 6d3d9406339..5b12042708b 100644 --- a/ompi/mca/topo/base/topo_base_dist_graph_create_adjacent.c +++ b/ompi/mca/topo/base/topo_base_dist_graph_create_adjacent.c @@ -10,6 +10,7 @@ * Copyright (c) 2011-2013 Université Bordeaux 1 * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corp. All rights reserved. 
*/ #include "ompi_config.h" @@ -26,7 +27,7 @@ int mca_topo_base_dist_graph_create_adjacent(mca_topo_base_module_t* module, int outdegree, const int destinations[], const int destweights[], - ompi_info_t *info, int reorder, + opal_info_t *info, int reorder, ompi_communicator_t **newcomm) { mca_topo_base_comm_dist_graph_2_2_0_t *topo = NULL; @@ -37,6 +38,15 @@ int mca_topo_base_dist_graph_create_adjacent(mca_topo_base_module_t* module, newcomm)) ) { return err; } + // But if there is an info object, the above call didn't make use + // of it, so we'll do a dup-with-info to get the final comm and + // free the above intermediate newcomm: + if (info && info != &(MPI_INFO_NULL->super)) { + ompi_communicator_t *intermediate_comm = *newcomm; + ompi_comm_dup_with_info (intermediate_comm, info, newcomm); + ompi_comm_free(&intermediate_comm); + } + err = OMPI_ERR_OUT_OF_RESOURCE; /* suppose by default something bad will happens */ assert( NULL == (*newcomm)->c_topo ); diff --git a/ompi/mca/topo/base/topo_base_frame.c b/ompi/mca/topo/base/topo_base_frame.c index 062786f9308..4ed9049fc26 100644 --- a/ompi/mca/topo/base/topo_base_frame.c +++ b/ompi/mca/topo/base/topo_base_frame.c @@ -71,6 +71,33 @@ static int mca_topo_base_open(mca_base_open_flag_t flags) return mca_base_framework_components_open(&ompi_topo_base_framework, flags); } +int mca_topo_base_neighbor_count (ompi_communicator_t *comm, int *indegree, int *outdegree) { + if (!OMPI_COMM_IS_TOPO(comm)) { + return OMPI_ERR_BAD_PARAM; + } + + if (OMPI_COMM_IS_CART(comm)) { + /* cartesian */ + /* outdegree is always 2*ndims because we need to iterate over + empty buffers for MPI_PROC_NULL */ + *outdegree = *indegree = 2 * comm->c_topo->mtc.cart->ndims; + } else if (OMPI_COMM_IS_GRAPH(comm)) { + /* graph */ + int rank, nneighbors; + + rank = ompi_comm_rank (comm); + mca_topo_base_graph_neighbors_count (comm, rank, &nneighbors); + + *outdegree = *indegree = nneighbors; + } else if (OMPI_COMM_IS_DIST_GRAPH(comm)) { + /* graph */ 
+ *indegree = comm->c_topo->mtc.dist_graph->indegree; + *outdegree = comm->c_topo->mtc.dist_graph->outdegree; + } + + return OMPI_SUCCESS; +} + MCA_BASE_FRAMEWORK_DECLARE(ompi, topo, "OMPI Topo", NULL, mca_topo_base_open, mca_topo_base_close, mca_topo_base_static_components, 0); diff --git a/ompi/mca/topo/basic/Makefile.am b/ompi/mca/topo/basic/Makefile.am index 4e6da4f4fe7..9ed7b26dadd 100644 --- a/ompi/mca/topo/basic/Makefile.am +++ b/ompi/mca/topo/basic/Makefile.am @@ -5,6 +5,7 @@ # Copyright (c) 2011-2013 INRIA. All rights reserved. # Copyright (c) 2011-2013 Université Bordeaux 1 # Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -36,6 +37,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component) mca_topo_basic_la_SOURCES = $(component_sources) mca_topo_basic_la_LDFLAGS = -module -avoid-version +mca_topo_basic_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(lib) libmca_topo_basic_la_SOURCES = $(lib_sources) diff --git a/ompi/mca/topo/example/Makefile.am b/ompi/mca/topo/example/Makefile.am index 190bdf0dc8a..22acd1a360f 100644 --- a/ompi/mca/topo/example/Makefile.am +++ b/ompi/mca/topo/example/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2012-2013 Inria. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -45,6 +46,7 @@ mcacomponentdir = $(pkglibdir) mcacomponent_LTLIBRARIES = $(component) mca_topo_example_la_SOURCES = $(component_sources) mca_topo_example_la_LDFLAGS = -module -avoid-version +mca_topo_example_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(lib) libmca_topo_example_la_SOURCES = $(lib_sources) diff --git a/ompi/mca/topo/topo.h b/ompi/mca/topo/topo.h index d4460793b30..7735250f290 100644 --- a/ompi/mca/topo/topo.h +++ b/ompi/mca/topo/topo.h @@ -16,6 +16,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -252,7 +253,7 @@ typedef int (*mca_topo_base_module_dist_graph_create_fn_t) struct ompi_communicator_t *old_comm, int n, const int nodes[], const int degrees[], const int targets[], const int weights[], - struct ompi_info_t *info, int reorder, + struct opal_info_t *info, int reorder, struct ompi_communicator_t **new_comm); /* Back end for MPI_DIST_GRAPH_CREATE_ADJACENT */ @@ -264,7 +265,7 @@ typedef int (*mca_topo_base_module_dist_graph_create_adjacent_fn_t) int outdegree, const int destinations[], const int destweights[], - struct ompi_info_t *info, int reorder, + struct opal_info_t *info, int reorder, ompi_communicator_t **comm_dist_graph); /* Back end for MPI_DIST_GRAPH_NEIGHBORS */ diff --git a/ompi/mca/topo/treematch/Makefile.am b/ompi/mca/topo/treematch/Makefile.am index 6019a786e8d..27d07bc64fe 100644 --- a/ompi/mca/topo/treematch/Makefile.am +++ b/ompi/mca/topo/treematch/Makefile.am @@ -4,6 +4,7 @@ # reserved. # Copyright (c) 2011-2015 INRIA. All rights reserved. # Copyright (c) 2011-2015 Université Bordeaux 1 +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -13,20 +14,25 @@ if topo_treematch_local extra_treematch_files = treematch/tm_bucket.h \ - treematch/tm_hwloc.h treematch/tm_mapping.h \ + treematch/tm_mapping.h \ treematch/tm_timings.h treematch/tm_tree.h \ treematch/tm_kpartitioning.h treematch/uthash.h\ treematch/IntConstantInitializedVector.h \ - treematch/tm_mt.h \ + treematch/tm_mt.h treematch/fibo.h \ treematch/tm_thread_pool.h treematch/tm_verbose.h \ - treematch/tm_malloc.h \ + treematch/tm_malloc.h treematch/k-partitioning.h\ + treematch/tm_solution.h treematch/tm_topology.h\ + treematch/PriorityQueue.h \ treematch/IntConstantInitializedVector.c \ - treematch/tm_mt.c \ + treematch/tm_mt.c treematch/fibo.c \ treematch/tm_thread_pool.c treematch/tm_verbose.c \ - treematch/tm_malloc.c \ + treematch/tm_malloc.c treematch/treematch.h \ treematch/tm_mapping.c treematch/tm_timings.c \ treematch/tm_bucket.c treematch/tm_tree.c \ - treematch/tm_hwloc.c treematch/tm_kpartitioning.c + treematch/tm_topology.c treematch/tm_kpartitioning.c \ + treematch/tm_solution.c treematch/k-partitioning.c \ + treematch/PriorityQueue.c +EXTRA_DIST = treematch/COPYING treematch/LICENSE endif sources = \ @@ -55,6 +61,7 @@ mcacomponentdir = $(pkglibdir) mcacomponent_LTLIBRARIES = $(component) mca_topo_treematch_la_SOURCES = $(component_sources) mca_topo_treematch_la_LDFLAGS = -module -avoid-version +mca_topo_treematch_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(lib) libmca_topo_treematch_la_SOURCES = $(lib_sources) diff --git a/ompi/mca/topo/treematch/topo_treematch.h b/ompi/mca/topo/treematch/topo_treematch.h index 7c11cdf5421..bcc4d748bfd 100644 --- a/ompi/mca/topo/treematch/topo_treematch.h +++ b/ompi/mca/topo/treematch/topo_treematch.h @@ -6,6 +6,7 @@ * Copyright (c) 2011-2015 Bordeaux Polytechnic Institute * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
+ * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -69,7 +70,7 @@ int mca_topo_treematch_dist_graph_create(mca_topo_base_module_t* module, int n, const int nodes[], const int degrees[], const int targets[], const int weights[], - struct ompi_info_t *info, int reorder, + struct opal_info_t *info, int reorder, ompi_communicator_t **newcomm); /* * ****************************************************************** diff --git a/ompi/mca/topo/treematch/topo_treematch_component.c b/ompi/mca/topo/treematch/topo_treematch_component.c index 6062bf1ed31..fca7e5b71b0 100644 --- a/ompi/mca/topo/treematch/topo_treematch_component.c +++ b/ompi/mca/topo/treematch/topo_treematch_component.c @@ -62,6 +62,11 @@ mca_topo_treematch_component_2_2_0_t mca_topo_treematch_component = static int init_query(bool enable_progress_threads, bool enable_mpi_threads) { + /* The first time this function is called is too early in the process and + * the HWLOC topology information is not available. Thus we should not check + * for the topology here, but instead delay the check until we really need + * the topology information. + */ return OMPI_SUCCESS; } @@ -95,3 +100,4 @@ static int mca_topo_treematch_component_register(void) MCA_BASE_VAR_SCOPE_READONLY, &mca_topo_treematch_component.reorder_mode); return OMPI_SUCCESS; } + diff --git a/ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c b/ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c index 7129c8f369c..47e9ec45719 100644 --- a/ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c +++ b/ompi/mca/topo/treematch/topo_treematch_dist_graph_create.c @@ -3,14 +3,15 @@ * Copyright (c) 2011-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2011-2015 INRIA. All rights reserved. - * Copyright (c) 2012-2015 Bordeaux Poytechnic Institute - * Copyright (c) 2015-2016 Intel, Inc. 
All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2011-2016 INRIA. All rights reserved. + * Copyright (c) 2012-2017 Bordeaux Polytechnic Institute + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2017 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -21,9 +22,10 @@ #include "ompi_config.h" #include "opal/constants.h" -#include "opal/mca/hwloc/hwloc-internal.h" +#include "opal/mca/hwloc/base/base.h" #include "ompi/mca/topo/treematch/topo_treematch.h" +#include "ompi/mca/topo/treematch/treematch/treematch.h" #include "ompi/mca/topo/treematch/treematch/tm_mapping.h" #include "ompi/mca/topo/base/base.h" @@ -34,27 +36,18 @@ #include "opal/mca/pmix/pmix.h" -#define ERR_EXIT(ERR) \ - do { \ - free (nodes_roots); \ - free (local_procs); \ - free (tracker); \ - free (colors); \ - free(local_pattern); \ - return (ERR); } \ - while(0); - -#define FALLBACK() \ - do { free(nodes_roots); \ - free(local_procs); \ - if( NULL != set) hwloc_bitmap_free(set); \ - goto fallback; } \ - while(0); - -#define MY_STRING_SIZE 64 -/*#define __DEBUG__ 1 */ - - +/* #define __DEBUG__ 1 */ + +/** + * This function performs an allreduce across all processes to detect oversubscription. + * On each node, local_procs is a different array that contains only the + * local processes. Only the first of them, the local root, computes the node + * oversubscription and brings this value to the operation, while every other + * process on the node contributes 0.
+ * Doing an AllReduce might be overkill for this situation, but it should remain + * more scalable than a star reduction (between the roots of each node (nodes_roots), + * followed by a bcast to all processes). + */ static int check_oversubscribing(int rank, int num_nodes, int num_objs_in_node, @@ -63,63 +56,50 @@ static int check_oversubscribing(int rank, int *local_procs, ompi_communicator_t *comm_old) { - int oversubscribed = 0; - int local_oversub = 0; - int err; + int oversubscribed = 0, local_oversub = 0, err; + /* Only a single process per node, the local root, computes the oversubscription condition */ if (rank == local_procs[0]) if(num_objs_in_node < num_procs_in_node) - local_oversub = 1; - - if (rank == 0) { - MPI_Request *reqs = (MPI_Request *)calloc(num_nodes-1, sizeof(MPI_Request)); - int *oversub = (int *)calloc(num_nodes, sizeof(int)); - int i; - - oversub[0] = local_oversub; - for(i = 1; i < num_nodes; i++) - if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(&oversub[i], 1, MPI_INT, - nodes_roots[i], 111, comm_old, &reqs[i-1])))) { - /* NTH: more needs to be done to correctly clean up here */ - free (reqs); - free (oversub); - return err; - } + local_oversub = 1; - if (OMPI_SUCCESS != ( err = ompi_request_wait_all(num_nodes-1, - reqs, MPI_STATUSES_IGNORE))) { - /* NTH: more needs to be done to correctly clean up here */ - free (reqs); - free (oversub); - return err; - } - - for(i = 0; i < num_nodes; i++) - oversubscribed += oversub[i]; - - free(oversub); - free(reqs); - } else { - if (rank == local_procs[0]) - if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(&local_oversub, 1, MPI_INT, 0, - 111, MCA_PML_BASE_SEND_STANDARD, comm_old)))) - return err; - } - if (OMPI_SUCCESS != (err = comm_old->c_coll->coll_bcast(&oversubscribed, 1, - MPI_INT, 0, comm_old, - comm_old->c_coll->coll_bcast_module))) + if (OMPI_SUCCESS != (err = comm_old->c_coll->coll_allreduce(&local_oversub, &oversubscribed, 1, MPI_INT, + MPI_SUM, comm_old,
comm_old->c_coll->coll_allreduce_module))) return err; return oversubscribed; } +#ifdef __DEBUG__ +static void dump_int_array( int level, int output_id, char* prolog, char* line_prolog, int* array, size_t length ) +{ + size_t i; + if( -1 == output_id ) return; + + opal_output_verbose(level, output_id, "%s : ", prolog); + for(i = 0; i < length ; i++) + opal_output_verbose(level, output_id, "%s [%lu:%i] ", line_prolog, i, array[i]); + opal_output_verbose(level, output_id, "\n"); +} +static void dump_double_array( int level, int output_id, char* prolog, char* line_prolog, double* array, size_t length ) +{ + size_t i; + + if( -1 == output_id ) return; + opal_output_verbose(level, output_id, "%s : ", prolog); + for(i = 0; i < length ; i++) + opal_output_verbose(level, output_id, "%s [%lu:%lf] ", line_prolog, i, array[i]); + opal_output_verbose(level, output_id, "\n"); +} +#endif + int mca_topo_treematch_dist_graph_create(mca_topo_base_module_t* topo_module, ompi_communicator_t *comm_old, int n, const int nodes[], const int degrees[], const int targets[], const int weights[], - struct ompi_info_t *info, int reorder, + struct opal_info_t *info, int reorder, ompi_communicator_t **newcomm) { int err; @@ -143,87 +123,78 @@ int mca_topo_treematch_dist_graph_create(mca_topo_base_module_t* topo_module, } return err; } /* reorder == yes */ + mca_topo_base_comm_dist_graph_2_2_0_t *topo = NULL; ompi_proc_t *proc = NULL; MPI_Request *reqs = NULL; - hwloc_cpuset_t set; - hwloc_obj_t object,root_obj; + hwloc_cpuset_t set = NULL; + hwloc_obj_t object, root_obj; hwloc_obj_t *tracker = NULL; double *local_pattern = NULL; int *vpids, *colors = NULL; - int *local_procs = NULL; - int *nodes_roots = NULL; + int *lindex_to_grank = NULL; + int *nodes_roots = NULL, *k = NULL; int *localrank_to_objnum = NULL; int depth, effective_depth, obj_rank = -1; - int num_objs_in_node = 0; - int num_pus_in_node = 0; - int numlevels = 0; - int num_nodes = 0; - int num_procs_in_node = 0; - int rank, size; - 
int hwloc_err; - int oversubscribing_objs = 0; - int i, j, idx; + int num_objs_in_node = 0, num_pus_in_node = 0; + int numlevels = 0, num_nodes = 0, num_procs_in_node = 0; + int rank, size, newrank = -1, hwloc_err, i, j, idx; + int oversubscribing_objs = 0, oversubscribed_pus = 0; uint32_t val, *pval; + /* We need to know if the processes are bound. We assume all + * processes are in the same state: all bound or none. */ + if (OPAL_SUCCESS != opal_hwloc_base_get_topology()) { + goto fallback; + } + root_obj = hwloc_get_root_obj(opal_hwloc_topology); + if (NULL == root_obj) goto fallback; + topo = topo_module->mtc.dist_graph; rank = ompi_comm_rank(comm_old); size = ompi_comm_size(comm_old); -#ifdef __DEBUG__ - fprintf(stdout,"Process rank is : %i\n",rank); -#endif - /* Determine the number of local procs */ - /* and the number of ext procs */ - for(i = 0 ; i < size ; i++){ - proc = ompi_group_peer_lookup(comm_old->c_local_group, i); - if (( i == rank ) || - (OPAL_PROC_ON_LOCAL_NODE(proc->super.proc_flags))) - num_procs_in_node++; - } - + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Process rank is : %i\n",rank)); + /** + * In order to decrease the number of loops let's use a trick: + * build the lindex_to_grank in the vpids array, and only allocate + * it upon completion of the most costly loop. 
+ */ vpids = (int *)malloc(size * sizeof(int)); colors = (int *)malloc(size * sizeof(int)); - local_procs = (int *)malloc(num_procs_in_node * sizeof(int)); - for(i = idx = 0 ; i < size ; i++){ + for(i = 0 ; i < size ; i++) { proc = ompi_group_peer_lookup(comm_old->c_local_group, i); if (( i == rank ) || - (OPAL_PROC_ON_LOCAL_NODE(proc->super.proc_flags))) { - local_procs[idx++] = i; - } + (OPAL_PROC_ON_LOCAL_NODE(proc->super.proc_flags))) + vpids[num_procs_in_node++] = i; pval = &val; OPAL_MODEX_RECV_VALUE(err, OPAL_PMIX_NODEID, &(proc->super.proc_name), &pval, OPAL_UINT32); if( OPAL_SUCCESS != err ) { opal_output(0, "Unable to extract peer %s nodeid from the modex.\n", OMPI_NAME_PRINT(&(proc->super.proc_name))); - vpids[i] = colors[i] = -1; + colors[i] = -1; continue; } - vpids[i] = colors[i] = (int)val; + colors[i] = (int)val; } + lindex_to_grank = (int *)malloc(num_procs_in_node * sizeof(int)); + memcpy(lindex_to_grank, vpids, num_procs_in_node * sizeof(int)); + memcpy(vpids, colors, size * sizeof(int)); #ifdef __DEBUG__ - fprintf(stdout,"Process rank (2) is : %i \n",rank); - if ( 0 == rank ){ - fprintf(stdout,"local_procs : "); - for(i = 0; i < num_procs_in_node ; i++) - fprintf(stdout," [%i:%i] ",i,local_procs[i]); - fprintf(stdout,"\n"); - - fprintf(stdout,"Vpids : "); - for(i = 0; i < size ; i++) - fprintf(stdout," [%i:%i] ",i,vpids[i]); - fprintf(stdout,"\n"); + if ( 0 == rank ) { + dump_int_array(10, ompi_topo_base_framework.framework_output, + "lindex_to_grank : ", "", lindex_to_grank, num_procs_in_node); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Vpids : ", "", colors, size); } #endif /* clean-up dupes in the array */ - for(i = 0; i < size; i++) { - if( -1 == vpids[i] ) - continue; - - num_nodes++; /* update the number of nodes */ - + for(i = 0; i < size ; i++) { + if ( -1 == vpids[i] ) continue; + num_nodes++; /* compute number of nodes */ for(j = i+1; j < size; j++) if( vpids[i] == vpids[j] ) vpids[j] = -1; @@ -233,695 +204,745 
@@ int mca_topo_treematch_dist_graph_create(mca_topo_base_module_t* topo_module, * and create a duplicate of the original communicator */ free(vpids); free(colors); - free(local_procs); - err = OMPI_SUCCESS; /* return with success */ - goto fallback; + goto fallback; /* return with success */ } /* compute local roots ranks in comm_old */ /* Only the global root needs to do this */ if(0 == rank) { nodes_roots = (int *)calloc(num_nodes, sizeof(int)); - for(i = idx = 0; i < size ; i++) + for(i = idx = 0; i < size; i++) if( vpids[i] != -1 ) nodes_roots[idx++] = i; + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "num nodes is %i\n", num_nodes)); #ifdef __DEBUG__ - fprintf(stdout,"num nodes is %i\n",num_nodes); - fprintf(stdout,"Root nodes are :\n"); - for(i = 0; i < num_nodes ; i++) - fprintf(stdout," [root %i : %i] ",i,nodes_roots[i]); - fprintf(stdout,"\n"); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Root nodes are :\n", "root ", nodes_roots, num_nodes); #endif } free(vpids); - /* Then, we need to know if the processes are bound */ - /* We make the hypothesis that all processes are in */ - /* the same state : all bound or none bound */ - if (OPAL_SUCCESS != opal_hwloc_base_get_topology()) { - goto fallback; - } - root_obj = hwloc_get_root_obj(opal_hwloc_topology); - if (NULL == root_obj) goto fallback; - /* if cpubind returns an error, it will be full anyway */ set = hwloc_bitmap_alloc_full(); - hwloc_get_cpubind(opal_hwloc_topology,set,0); + hwloc_get_cpubind(opal_hwloc_topology, set, 0); num_pus_in_node = hwloc_get_nbobjs_by_type(opal_hwloc_topology, HWLOC_OBJ_PU); - if(hwloc_bitmap_isincluded(root_obj->cpuset,set)){ - /* processes are not bound on the machine */ -#ifdef __DEBUG__ + /** + * In all situations (including heterogeneous environments) all processes must execute + * all the calls that involve collective communications, so we have to lay the logic + * accordingly. 
+ */ + + if(hwloc_bitmap_isincluded(root_obj->cpuset,set)) { /* processes are not bound on the machine */ if (0 == rank) - fprintf(stdout,">>>>>>>>>>>>> Process Not bound <<<<<<<<<<<<<<<\n"); -#endif /* __DEBUG__ */ + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + ">>>>>>>>>>>>> Process Not bound <<<<<<<<<<<<<<<\n")); /* we try to bind to cores or above objects if enough are present */ /* Not sure that cores are present in ALL nodes */ - depth = hwloc_get_type_or_above_depth(opal_hwloc_topology,HWLOC_OBJ_CORE); - num_objs_in_node = hwloc_get_nbobjs_by_depth(opal_hwloc_topology,depth); - - /* Check for oversubscribing */ - oversubscribing_objs = check_oversubscribing(rank,num_nodes, - num_objs_in_node,num_procs_in_node, - nodes_roots,local_procs,comm_old); - if(oversubscribing_objs) { -#ifdef __DEBUG__ - fprintf(stdout,"Oversubscribing OBJ/CORES resources => Trying to use PUs \n"); -#endif - int oversubscribed_pus = check_oversubscribing(rank,num_nodes, - num_pus_in_node,num_procs_in_node, - nodes_roots,local_procs,comm_old); - if (oversubscribed_pus){ -#ifdef __DEBUG__ - fprintf(stdout,"Oversubscribing PUs resources => Rank Reordering Impossible \n"); -#endif - FALLBACK(); - } else { + depth = hwloc_get_type_or_above_depth(opal_hwloc_topology, HWLOC_OBJ_CORE); + num_objs_in_node = hwloc_get_nbobjs_by_depth(opal_hwloc_topology, depth); + } else { /* the processes are already bound */ + object = hwloc_get_obj_covering_cpuset(opal_hwloc_topology, set); + obj_rank = object->logical_index; + effective_depth = object->depth; + num_objs_in_node = hwloc_get_nbobjs_by_depth(opal_hwloc_topology, effective_depth); + } + if( (0 == num_objs_in_node) || (0 == num_pus_in_node) ) { /* deal with bozo cases: COVERITY 1418505 */ + free(colors); + goto fallback; /* return with success */ + } + /* Check for oversubscribing */ + oversubscribing_objs = check_oversubscribing(rank, num_nodes, + num_objs_in_node, num_procs_in_node, + nodes_roots, lindex_to_grank, 
comm_old); + + if(oversubscribing_objs) { + if(hwloc_bitmap_isincluded(root_obj->cpuset, set)) { /* processes are not bound on the machine */ + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Oversubscribing OBJ/CORES resources => Trying to use PUs \n")); + + oversubscribed_pus = check_oversubscribing(rank, num_nodes, + num_pus_in_node, num_procs_in_node, + nodes_roots, lindex_to_grank, comm_old); + /* Update the data used to compute the correct binding */ + if (!oversubscribed_pus) { obj_rank = ompi_process_info.my_local_rank%num_pus_in_node; effective_depth = hwloc_topology_get_depth(opal_hwloc_topology) - 1; num_objs_in_node = num_pus_in_node; -#ifdef __DEBUG__ - fprintf(stdout,"Process not bound : binding on PU#%i \n",obj_rank); -#endif + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Process %i not bound : binding on PU#%i \n", rank, obj_rank)); } } else { + /* Bound processes will participate with the same data as before */ + oversubscribed_pus = check_oversubscribing(rank, num_nodes, + num_objs_in_node, num_procs_in_node, + nodes_roots, lindex_to_grank, comm_old); + } + } + + if( !oversubscribing_objs && !oversubscribed_pus ) { + if( hwloc_bitmap_isincluded(root_obj->cpuset, set) ) { /* processes are not bound on the machine */ obj_rank = ompi_process_info.my_local_rank%num_objs_in_node; effective_depth = depth; - object = hwloc_get_obj_by_depth(opal_hwloc_topology,effective_depth,obj_rank); - if( NULL == object) FALLBACK(); + object = hwloc_get_obj_by_depth(opal_hwloc_topology, effective_depth, obj_rank); + if( NULL == object) { + free(colors); + hwloc_bitmap_free(set); + goto fallback; /* return with success */ + } - hwloc_bitmap_copy(set,object->cpuset); + hwloc_bitmap_copy(set, object->cpuset); hwloc_bitmap_singlify(set); /* we don't want the process to move */ - hwloc_err = hwloc_set_cpubind(opal_hwloc_topology,set,0); - if( -1 == hwloc_err) FALLBACK(); -#ifdef __DEBUG__ - fprintf(stdout,"Process not bound 
: binding on OBJ#%i \n",obj_rank); -#endif - } - } else { /* the processes are already bound */ - object = hwloc_get_obj_covering_cpuset(opal_hwloc_topology,set); - obj_rank = object->logical_index; - effective_depth = object->depth; - num_objs_in_node = hwloc_get_nbobjs_by_depth(opal_hwloc_topology, effective_depth); - - /* Check for oversubscribing */ - oversubscribing_objs = check_oversubscribing(rank,num_nodes, - num_objs_in_node,num_procs_in_node, - nodes_roots,local_procs,comm_old); - if(oversubscribing_objs) { -#ifdef __DEBUG__ - fprintf(stdout,"Oversubscribing OBJ/CORES resources => Rank Reordering Impossible\n"); -#endif - FALLBACK(); + hwloc_err = hwloc_set_cpubind(opal_hwloc_topology, set, 0); + if( -1 == hwloc_err) { + /* This is a local issue. Either we agree with the rest of the processes to stop the + * reordering or we have to complete the entire process. Let's complete. + */ + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Process %i failed to bind on OBJ#%i \n", rank, obj_rank)); + } else + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Process %i not bound : binding on OBJ#%i \n",rank, obj_rank)); + } else { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Process %i bound on OBJ #%i \n" + "=====> Num obj in node : %i | num pus in node : %i\n", + rank, obj_rank, + num_objs_in_node, num_pus_in_node)); } -#ifdef __DEBUG__ - fprintf(stdout,"Process %i bound on OBJ #%i \n",rank,obj_rank); - fprintf(stdout,"=====> Num obj in node : %i | num pus in node : %i\n",num_objs_in_node,num_pus_in_node); -#endif + } else { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Oversubscribing PUs resources => Rank Reordering Impossible \n")); + free(colors); + hwloc_bitmap_free(set); + goto fallback; /* return with success */ } - reqs = (MPI_Request *)calloc(num_procs_in_node-1,sizeof(MPI_Request)); - if( rank == local_procs[0] ) { - /* we need to find the right elements 
of the hierarchy */ - /* and remove the unneeded elements */ - /* Only local masters need to do this */ + reqs = (MPI_Request *)calloc(num_procs_in_node-1, sizeof(MPI_Request)); + if( rank == lindex_to_grank[0] ) { /* local leader cleans the hierarchy */ int array_size = effective_depth + 1; - int *myhierarchy = (int *)calloc(array_size,sizeof(int)); + int *myhierarchy = (int *)calloc(array_size, sizeof(int)); - for (i = 0; i < array_size ; i++) { - myhierarchy[i] = hwloc_get_nbobjs_by_depth(opal_hwloc_topology,i); -#ifdef __DEBUG__ - fprintf(stdout,"hierarchy[%i] = %i\n",i,myhierarchy[i]); -#endif - } numlevels = 1; - for (i = 1; i < array_size; i++) + myhierarchy[0] = hwloc_get_nbobjs_by_depth(opal_hwloc_topology, 0); + for (i = 1; i < array_size ; i++) { + myhierarchy[i] = hwloc_get_nbobjs_by_depth(opal_hwloc_topology, i); + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "hierarchy[%i] = %i\n", i, myhierarchy[i])); if ((myhierarchy[i] != 0) && (myhierarchy[i] != myhierarchy[i-1])) numlevels++; + } - tracker = (hwloc_obj_t *)calloc(numlevels,sizeof(hwloc_obj_t)); - idx = 0; - i = 1; - while (i < array_size){ + tracker = (hwloc_obj_t *)calloc(numlevels, sizeof(hwloc_obj_t)); + for(idx = 0, i = 1; i < array_size; i++) { if(myhierarchy[i] != myhierarchy[i-1]) - tracker[idx++] = hwloc_get_obj_by_depth(opal_hwloc_topology,i-1,0); - i++; + tracker[idx++] = hwloc_get_obj_by_depth(opal_hwloc_topology, i-1, 0); } - tracker[idx] = hwloc_get_obj_by_depth(opal_hwloc_topology,effective_depth,0); + tracker[idx] = hwloc_get_obj_by_depth(opal_hwloc_topology, effective_depth, 0); free(myhierarchy); -#ifdef __DEBUG__ - fprintf(stdout,">>>>>>>>>>>>>>>>>>>>> Effective depth is : %i (total depth %i)| num_levels %i\n", - effective_depth,hwloc_topology_get_depth(opal_hwloc_topology),numlevels); - for(i = 0 ; i < numlevels ; i++) - fprintf(stdout,"tracker[%i] : arity %i | depth %i\n",i,tracker[i]->arity,tracker[i]->depth); -#endif + OPAL_OUTPUT_VERBOSE((10,
ompi_topo_base_framework.framework_output, + ">>>>>>>>>>>>>>>>>>>>> Effective depth is : %i (total depth %i)| num_levels %i\n", + effective_depth, hwloc_topology_get_depth(opal_hwloc_topology), numlevels)); + for(i = 0 ; i < numlevels ; i++) { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "tracker[%i] : arity %i | depth %i\n", + i, tracker[i]->arity, tracker[i]->depth)); + } /* get the obj number */ - localrank_to_objnum = (int *)calloc(num_procs_in_node,sizeof(int)); + localrank_to_objnum = (int *)calloc(num_procs_in_node, sizeof(int)); localrank_to_objnum[0] = obj_rank; for(i = 1; i < num_procs_in_node; i++) { - if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(&localrank_to_objnum[i],1,MPI_INT, - local_procs[i],111, comm_old,&reqs[i-1])))) - return err; + if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(&localrank_to_objnum[i], 1, MPI_INT, + lindex_to_grank[i], -111, comm_old, &reqs[i-1])))) { + free(reqs); reqs = NULL; + goto release_and_return; + } } if (OMPI_SUCCESS != ( err = ompi_request_wait_all(num_procs_in_node-1, - reqs,MPI_STATUSES_IGNORE))) - return err; + reqs, MPI_STATUSES_IGNORE))) { + free(reqs); reqs = NULL; + goto release_and_return; + } } else { /* sending my core number to my local master on the node */ - if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(&obj_rank, 1, MPI_INT, local_procs[0], - 111, MCA_PML_BASE_SEND_STANDARD, comm_old)))) - return err; + if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(&obj_rank, 1, MPI_INT, lindex_to_grank[0], + -111, MCA_PML_BASE_SEND_STANDARD, comm_old)))) { + free(reqs); reqs = NULL; + goto release_and_return; + } } - free(reqs); + free(reqs); reqs = NULL; /* Centralized Reordering */ if (0 == mca_topo_treematch_component.reorder_mode) { int *k = NULL; int *obj_mapping = NULL; - int newrank = -1; int num_objs_total = 0; /* Gather comm pattern * If weights have been provided take them in account. Otherwise rely * solely on HWLOC information. 
*/ - if(0 == rank) { - - fprintf(stderr,"========== Centralized Reordering ========= \n"); + if( 0 == rank ) { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "========== Centralized Reordering ========= \n")); local_pattern = (double *)calloc(size*size,sizeof(double)); - if( true == topo->weighted ) { - for(i = 0; i < topo->indegree ; i++) - local_pattern[topo->in[i]] += topo->inw[i]; - for(i = 0; i < topo->outdegree ; i++) - local_pattern[topo->out[i]] += topo->outw[i]; - if (OMPI_SUCCESS != (err = comm_old->c_coll->coll_gather(MPI_IN_PLACE, size, MPI_DOUBLE, - local_pattern, size, MPI_DOUBLE, - 0, comm_old, - comm_old->c_coll->coll_gather_module))) - return err; - } } else { local_pattern = (double *)calloc(size,sizeof(double)); - if( true == topo->weighted ) { - for(i = 0; i < topo->indegree ; i++) - local_pattern[topo->in[i]] += topo->inw[i]; - for(i = 0; i < topo->outdegree ; i++) - local_pattern[topo->out[i]] += topo->outw[i]; - if (OMPI_SUCCESS != (err = comm_old->c_coll->coll_gather(local_pattern, size, MPI_DOUBLE, - NULL,0,0, - 0, comm_old, - comm_old->c_coll->coll_gather_module))) - return err; - } + } + if( true == topo->weighted ) { + for(i = 0; i < topo->indegree ; i++) + local_pattern[topo->in[i]] += topo->inw[i]; + for(i = 0; i < topo->outdegree ; i++) + local_pattern[topo->out[i]] += topo->outw[i]; + } + err = comm_old->c_coll->coll_gather( (0 == rank ? 
MPI_IN_PLACE : local_pattern), size, MPI_DOUBLE, + local_pattern, size, MPI_DOUBLE, /* ignored on non-root */ + 0, comm_old, comm_old->c_coll->coll_gather_module); + if (OMPI_SUCCESS != err) { + goto release_and_return; } - if( rank == local_procs[0]) { + if( rank == lindex_to_grank[0] ) { tm_topology_t *tm_topology = NULL; - tm_topology_t *tm_opt_topology = NULL; int *obj_to_rank_in_comm = NULL; int *hierarchies = NULL; - int hierarchy[MAX_LEVELS+1]; int min; /* create a table that derives the rank in comm_old from the object number */ obj_to_rank_in_comm = (int *)malloc(num_objs_in_node*sizeof(int)); - for(i = 0 ; i < num_objs_in_node ; i++) - obj_to_rank_in_comm[i] = -1; for(i = 0 ; i < num_objs_in_node ; i++) { - object = hwloc_get_obj_by_depth(opal_hwloc_topology,effective_depth,i); + obj_to_rank_in_comm[i] = -1; + object = hwloc_get_obj_by_depth(opal_hwloc_topology, effective_depth, i); for( j = 0; j < num_procs_in_node ; j++ ) - if(localrank_to_objnum[j] == (int)(object->logical_index)) + if(localrank_to_objnum[j] == (int)(object->logical_index)) { + obj_to_rank_in_comm[i] = lindex_to_grank[j]; break; - if(j == num_procs_in_node) - obj_to_rank_in_comm[i] = -1; - else { - int k; - for(k = 0; k < size ; k++) - if (k == local_procs[j]) - break; - obj_to_rank_in_comm[i] = k; - } + } } /* the global master gathers info from local_masters */ if ( 0 == rank ) { if ( num_nodes > 1 ) { - int *objs_per_node = NULL ; - int *displs = NULL; + int *objs_per_node = NULL, displ; - objs_per_node = (int *)calloc(num_nodes,sizeof(int)); - reqs = (MPI_Request *)calloc(num_nodes-1,sizeof(MPI_Request)); + objs_per_node = (int *)calloc(num_nodes, sizeof(int)); + reqs = (MPI_Request *)calloc(num_nodes-1, sizeof(MPI_Request)); objs_per_node[0] = num_objs_in_node; for(i = 1; i < num_nodes ; i++) if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(objs_per_node + i, 1, MPI_INT, - nodes_roots[i],111,comm_old,&reqs[i-1])))) - ERR_EXIT(err); + nodes_roots[i], -112, comm_old, &reqs[i-1])))) { 
+ free(obj_to_rank_in_comm); + free(objs_per_node); + goto release_and_return; + } if (OMPI_SUCCESS != ( err = ompi_request_wait_all(num_nodes - 1, - reqs,MPI_STATUSES_IGNORE))) - ERR_EXIT(err); + reqs, MPI_STATUSES_IGNORE))) { + free(objs_per_node); + goto release_and_return; + } for(i = 0; i < num_nodes; i++) num_objs_total += objs_per_node[i]; obj_mapping = (int *)calloc(num_objs_total,sizeof(int)); - displs = (int *)calloc(num_objs_total,sizeof(int)); - displs[0] = 0; - for(i = 1; i < num_nodes ; i++) - displs[i] = displs[i-1] + objs_per_node[i-1]; - memset(reqs,0,(num_nodes-1)*sizeof(MPI_Request)); - memcpy(obj_mapping,obj_to_rank_in_comm,objs_per_node[0]*sizeof(int)); - for(i = 1; i < num_nodes ; i++) - if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(obj_mapping + displs[i], objs_per_node[i], MPI_INT, - nodes_roots[i],111,comm_old,&reqs[i-1])))) - ERR_EXIT(err); + memcpy(obj_mapping, obj_to_rank_in_comm, objs_per_node[0]*sizeof(int)); + displ = objs_per_node[0]; + for(i = 1; i < num_nodes ; i++) { + if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(obj_mapping + displ, objs_per_node[i], MPI_INT, + nodes_roots[i], -113, comm_old, &reqs[i-1])))) { + free(obj_to_rank_in_comm); + free(objs_per_node); + free(obj_mapping); + goto release_and_return; + } + displ += objs_per_node[i]; + } if (OMPI_SUCCESS != ( err = ompi_request_wait_all(num_nodes - 1, - reqs,MPI_STATUSES_IGNORE))) - ERR_EXIT(err); - free(displs); + reqs, MPI_STATUSES_IGNORE))) { + free(obj_to_rank_in_comm); + free(objs_per_node); + free(obj_mapping); + goto release_and_return; + } free(objs_per_node); } else { /* if num_nodes == 1, then it's easy to get the obj mapping */ num_objs_total = num_objs_in_node; - obj_mapping = (int *)calloc(num_objs_total,sizeof(int)); - memcpy(obj_mapping,obj_to_rank_in_comm,num_objs_total*sizeof(int)); + obj_mapping = (int *)calloc(num_objs_total, sizeof(int)); + memcpy(obj_mapping, obj_to_rank_in_comm, num_objs_total*sizeof(int)); } - #ifdef __DEBUG__ - fprintf(stdout,"Obj 
mapping : "); - for(i = 0 ; i < num_objs_total ; i++) - fprintf(stdout," [%i:%i] ",i,obj_mapping[i]); - fprintf(stdout,"\n"); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Obj mapping : ", "", obj_mapping, num_objs_total ); #endif } else { if ( num_nodes > 1 ) { if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(&num_objs_in_node, 1, MPI_INT, - 0, 111, MCA_PML_BASE_SEND_STANDARD, comm_old)))) - ERR_EXIT(err); + 0, -112, MCA_PML_BASE_SEND_STANDARD, comm_old)))) { + free(obj_to_rank_in_comm); + goto release_and_return; + } if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(obj_to_rank_in_comm, num_objs_in_node, MPI_INT, - 0, 111, MCA_PML_BASE_SEND_STANDARD, comm_old)))) - ERR_EXIT(err); + 0, -113, MCA_PML_BASE_SEND_STANDARD, comm_old)))) { + free(obj_to_rank_in_comm); + goto release_and_return; + } } } - free(obj_to_rank_in_comm); - for(i = 0 ; i < (MAX_LEVELS+1) ; i++) - hierarchy[i] = -1; - hierarchy[0] = numlevels; - - assert(numlevels < MAX_LEVELS); - - for(i = 0 ; i < hierarchy[0] ; i++) - hierarchy[i+1] = tracker[i]->arity; - + assert(numlevels < TM_MAX_LEVELS); if( 0 == rank ) { - hierarchies = (int *)malloc(num_nodes*(MAX_LEVELS+1)*sizeof(int)); - for(i = 0 ; i < num_nodes*(MAX_LEVELS+1) ; i++) - hierarchies[i] = -1; + hierarchies = (int *)malloc(num_nodes*(TM_MAX_LEVELS+1)*sizeof(int)); + } else { + hierarchies = (int *)malloc((TM_MAX_LEVELS+1)*sizeof(int)); } + hierarchies[0] = numlevels; + + for(i = 0 ; i < hierarchies[0]; i++) + hierarchies[i+1] = tracker[i]->arity; + for(; i < (TM_MAX_LEVELS+1); i++) /* fill up everything else with -1 */ + hierarchies[i] = -1; + /* gather hierarchies iff more than 1 node! 
*/ if ( num_nodes > 1 ) { - if(rank != 0) { - if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(hierarchy,(MAX_LEVELS+1), MPI_INT, 0, - 111, MCA_PML_BASE_SEND_STANDARD, comm_old)))) - ERR_EXIT(err); + if( rank != 0 ) { + if (OMPI_SUCCESS != (err = MCA_PML_CALL(send(hierarchies,(TM_MAX_LEVELS+1), MPI_INT, 0, + -114, MCA_PML_BASE_SEND_STANDARD, comm_old)))) { + free(hierarchies); + goto release_and_return; + } } else { - memset(reqs,0,(num_nodes-1)*sizeof(MPI_Request)); for(i = 1; i < num_nodes ; i++) - if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(hierarchies+i*(MAX_LEVELS+1),(MAX_LEVELS+1),MPI_INT, - nodes_roots[i],111,comm_old,&reqs[i-1])))){ + if (OMPI_SUCCESS != ( err = MCA_PML_CALL(irecv(hierarchies+i*(TM_MAX_LEVELS+1), (TM_MAX_LEVELS+1), MPI_INT, + nodes_roots[i], -114, comm_old, &reqs[i-1])))) { + free(obj_mapping); free(hierarchies); - ERR_EXIT(err); + goto release_and_return; } if (OMPI_SUCCESS != ( err = ompi_request_wait_all(num_nodes - 1, - reqs,MPI_STATUSES_IGNORE))) { + reqs, MPI_STATUSES_IGNORE))) { + free(obj_mapping); free(hierarchies); - ERR_EXIT(err); + goto release_and_return; } - free(reqs); + free(reqs); reqs = NULL; } } if ( 0 == rank ) { - tree_t *comm_tree = NULL; + tm_tree_t *comm_tree = NULL; + tm_solution_t *sol = NULL; + tm_affinity_mat_t *aff_mat = NULL; double **comm_pattern = NULL; - int *matching = NULL; - memcpy(hierarchies,hierarchy,(MAX_LEVELS+1)*sizeof(int)); #ifdef __DEBUG__ - fprintf(stdout,"hierarchies : "); - for(i = 0 ; i < num_nodes*(MAX_LEVELS+1) ; i++) - fprintf(stdout," [%i] ",hierarchies[i]); - fprintf(stdout,"\n"); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "hierarchies : ", "", hierarchies, num_nodes*(TM_MAX_LEVELS+1)); #endif tm_topology = (tm_topology_t *)malloc(sizeof(tm_topology_t)); tm_topology->nb_levels = hierarchies[0]; /* extract min depth */ for(i = 1 ; i < num_nodes ; i++) - if (hierarchies[i*(MAX_LEVELS+1)] < tm_topology->nb_levels) - tm_topology->nb_levels = 
hierarchies[i*(MAX_LEVELS+1)]; + if (hierarchies[i*(TM_MAX_LEVELS+1)] < tm_topology->nb_levels) + tm_topology->nb_levels = hierarchies[i*(TM_MAX_LEVELS+1)]; + /* Crush levels in hierarchies too long (ie > tm_topology->nb_levels)*/ for(i = 0; i < num_nodes ; i++) { - int *base_ptr = hierarchies + i*(MAX_LEVELS+1) ; + int *base_ptr = hierarchies + i*(TM_MAX_LEVELS+1); int suppl = *base_ptr - tm_topology->nb_levels; for(j = 1 ; j <= suppl ; j++) *(base_ptr + tm_topology->nb_levels) *= *(base_ptr + tm_topology->nb_levels + j); } - if( num_nodes > 1){ + if( num_nodes > 1) { /* We aggregate all topos => +1 level!*/ tm_topology->nb_levels += 1; - tm_topology->arity = (int *)calloc(tm_topology->nb_levels,sizeof(int)); + tm_topology->arity = (int *)calloc(tm_topology->nb_levels, sizeof(int)); tm_topology->arity[0] = num_nodes; - for(i = 0; i < (tm_topology->nb_levels - 1); i++) { - min = *(hierarchies + 1 + i); + for(i = 1; i < tm_topology->nb_levels; i++) { /* compute the minimum for each level */ + min = hierarchies[i]; for(j = 1; j < num_nodes ; j++) - if( hierarchies[j*(MAX_LEVELS+1) + 1 + i] < min) - min = hierarchies[j*(MAX_LEVELS+1) + 1 + i]; - tm_topology->arity[i+1] = min; + if( hierarchies[j*(TM_MAX_LEVELS+1) + i] < min) + min = hierarchies[j*(TM_MAX_LEVELS+1) + i]; + tm_topology->arity[i] = min; } - }else{ - tm_topology->arity = (int *)calloc(tm_topology->nb_levels,sizeof(int)); + } else { + tm_topology->arity = (int *)calloc(tm_topology->nb_levels, sizeof(int)); for(i = 0; i < tm_topology->nb_levels; i++) - tm_topology->arity[i] = hierarchies[i+1]; /* fixme !!!*/ + tm_topology->arity[i] = hierarchies[i+1]; } free(hierarchies); -#ifdef __DEBUG__ - for(i = 0; i < tm_topology->nb_levels; i++) - fprintf(stdout,"topo_arity[%i] = %i\n",i,tm_topology->arity[i]); -#endif + + for(i = 0; i < tm_topology->nb_levels; i++) { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "topo_arity[%i] = %i\n", i, tm_topology->arity[i])); + } + /* compute the number 
of processing elements */ - tm_topology->nb_nodes = (int *)calloc(tm_topology->nb_levels,sizeof(int)); + tm_topology->nb_nodes = (size_t *)calloc(tm_topology->nb_levels, sizeof(size_t)); tm_topology->nb_nodes[0] = 1; for(i = 1 ; i < tm_topology->nb_levels; i++) - tm_topology->nb_nodes[i] = tm_topology->nb_nodes[i-1]*tm_topology->arity[i-1]; + tm_topology->nb_nodes[i] = tm_topology->nb_nodes[i-1] * tm_topology->arity[i-1]; /* Build process id tab */ - tm_topology->node_id = (int **)calloc(tm_topology->nb_levels,sizeof(int*)); - for(i = 0; i < tm_topology->nb_levels ; i++) { - tm_topology->node_id[i] = (int *)calloc(tm_topology->nb_nodes[i],sizeof(int)); - for (j = 0; j < tm_topology->nb_nodes[i] ; j++) - tm_topology->node_id[i][j] = obj_mapping[j]; + tm_topology->node_id = (int **)calloc(tm_topology->nb_levels, sizeof(int*)); + tm_topology->node_rank = (int **)malloc(sizeof(int *) * tm_topology->nb_levels); + for(i = 0; i < tm_topology->nb_levels; i++) { + tm_topology->node_id[i] = (int *)calloc(tm_topology->nb_nodes[i], sizeof(int)); + tm_topology->node_rank[i] = (int * )calloc(tm_topology->nb_nodes[i], sizeof(int)); + /*note : we make the hypothesis that logical indexes in hwloc range from + 0 to N, are contiguous and crescent. 
*/ + + for( j = 0 ; j < (int)tm_topology->nb_nodes[i] ; j++ ) { + tm_topology->node_id[i][j] = j; + tm_topology->node_rank[i][j] = j; + + /* Should use object->logical_index */ + /* obj = hwloc_get_obj_by_depth(topo,i,j%num_objs_in_node); + id = obj->logical_index + (num_objs_in_node)*(j/num_obj_in_node)*/ + /* + int id = core_numbering[j%nb_core_per_nodes] + (nb_core_per_nodes)*(j/nb_core_per_nodes); + topology->node_id[i][j] = id; + topology->node_rank[i][id] = j; + */ + } } + /* unused for now*/ + tm_topology->cost = (double*)calloc(tm_topology->nb_levels,sizeof(double)); + + tm_topology->nb_proc_units = num_objs_total; + + tm_topology->nb_constraints = 0; + for(i = 0; i < tm_topology->nb_proc_units ; i++) + if (obj_mapping[i] != -1) + tm_topology->nb_constraints++; + tm_topology->constraints = (int *)calloc(tm_topology->nb_constraints,sizeof(int)); + for(idx = 0, i = 0; i < tm_topology->nb_proc_units ; i++) + if (obj_mapping[i] != -1) + tm_topology->constraints[idx++] = obj_mapping[i]; + + tm_topology->oversub_fact = 1; + #ifdef __DEBUG__ + assert(num_objs_total == (int)tm_topology->nb_nodes[tm_topology->nb_levels-1]); + for(i = 0; i < tm_topology->nb_levels ; i++) { - fprintf(stdout,"tm topo node_id for level [%i] : ",i); - for(j = 0 ; j < tm_topology->nb_nodes[i] ; j++) - fprintf(stdout," [%i:%i] ",j,obj_mapping[j]); - fprintf(stdout,"\n"); + opal_output_verbose(10, ompi_topo_base_framework.framework_output, + "tm topo node_id for level [%i] : ",i); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "", "", obj_mapping, tm_topology->nb_nodes[i]); } - display_topology(tm_topology); + tm_display_topology(tm_topology); #endif comm_pattern = (double **)malloc(size*sizeof(double *)); for(i = 0 ; i < size ; i++) - comm_pattern[i] = local_pattern + i*size; + comm_pattern[i] = local_pattern + i * size; /* matrix needs to be symmetric */ - for( i = 0 ; i < size ; i++) - for(j = i ; j < size ; j++) { - comm_pattern[i][j] += comm_pattern[j][i]; - 
comm_pattern[j][i] = comm_pattern[i][j]; + for( i = 0; i < size ; i++ ) + for( j = i; j < size ; j++ ) { + comm_pattern[i][j] = (comm_pattern[i][j] + comm_pattern[j][i]) / 2; + comm_pattern[j][i] = comm_pattern[i][j]; } - for( i = 0 ; i < size ; i++) - for(j = 0 ; j < size ; j++) - comm_pattern[i][j] /= 2; #ifdef __DEBUG__ - fprintf(stdout,"==== COMM PATTERN ====\n"); - for( i = 0 ; i < size ; i++){ - for(j = 0 ; j < size ; j++) - fprintf(stdout," %f ",comm_pattern[i][j]); - fprintf(stdout,"\n"); + opal_output_verbose(10, ompi_topo_base_framework.framework_output, + "==== COMM PATTERN ====\n"); + for( i = 0 ; i < size ; i++) { + dump_double_array(10, ompi_topo_base_framework.framework_output, + "", "", comm_pattern[i], size); } #endif - k = (int *)calloc(num_objs_total,sizeof(int)); - matching = (int *)calloc(size,sizeof(int)); - - tm_opt_topology = optimize_topology(tm_topology); - comm_tree = build_tree_from_topology(tm_opt_topology,comm_pattern,size,NULL,NULL); - map_topology_simple(tm_opt_topology,comm_tree,matching,size,k); -#ifdef __DEBUG__ + tm_optimize_topology(&tm_topology); + aff_mat = tm_build_affinity_mat(comm_pattern,size); + comm_tree = tm_build_tree_from_topology(tm_topology,aff_mat, NULL, NULL); + sol = tm_compute_mapping(tm_topology, comm_tree); - fprintf(stdout,"====> nb levels : %i\n",tm_topology->nb_levels); - fprintf(stdout,"Rank permutation sigma/k : "); - for(i = 0 ; i < num_objs_total ; i++) - fprintf(stdout," [%i:%i] ",i,k[i]); - fprintf(stdout,"\n"); + k = (int *)calloc(sol->k_length, sizeof(int)); + for(idx = 0 ; idx < (int)sol->k_length ; idx++) + k[idx] = sol->k[idx][0]; - fprintf(stdout,"Matching : "); - for(i = 0 ; i < size ; i++) - fprintf(stdout," [%i:%i] ",i,matching[i]); - fprintf(stdout,"\n"); +#ifdef __DEBUG__ + opal_output_verbose(10, ompi_topo_base_framework.framework_output, + "====> nb levels : %i\n",tm_topology->nb_levels); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Rank permutation sigma/k : ", "", 
k, num_objs_total); + assert(size == (int)sol->sigma_length); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Matching : ", "",sol->sigma, sol->sigma_length); #endif - free(comm_pattern); - free(comm_tree); - free(matching); free(obj_mapping); - for(i = 0 ; i < tm_topology->nb_levels ; i++) - free(tm_topology->node_id[i]); - free(tm_topology->node_id); - free(tm_topology->nb_nodes); - free(tm_topology->arity); - free(tm_topology); - FREE_topology(tm_opt_topology); + free(comm_pattern); + free(aff_mat->sum_row); + free(aff_mat); + tm_free_solution(sol); + tm_free_tree(comm_tree); + tm_free_topology(tm_topology); } } /* Todo : Bcast + group creation */ /* scatter the ranks */ if (OMPI_SUCCESS != (err = comm_old->c_coll->coll_scatter(k, 1, MPI_INT, - &newrank, 1, MPI_INT, - 0, comm_old,comm_old->c_coll->coll_scatter_module))) - ERR_EXIT(err); + &newrank, 1, MPI_INT, + 0, comm_old, + comm_old->c_coll->coll_scatter_module))) { + if (NULL != k) free(k); + goto release_and_return; + } if ( 0 == rank ) free(k); /* this needs to be optimized but will do for now */ - if (OMPI_SUCCESS != (err = ompi_comm_split(comm_old, 0, newrank,newcomm, false))) - ERR_EXIT(err); + if (OMPI_SUCCESS != (err = ompi_comm_split(comm_old, 0, newrank, newcomm, false))) { + goto release_and_return; + } /* end of TODO */ /* Attach the dist_graph to the newly created communicator */ (*newcomm)->c_flags |= OMPI_COMM_DIST_GRAPH; (*newcomm)->c_topo = topo_module; (*newcomm)->c_topo->reorder = reorder; + } else { /* partially distributed reordering */ + int *grank_to_lrank = NULL, *lrank_to_grank = NULL, *marked = NULL; + int node_position = 0, offset = 0, pos = 0; ompi_communicator_t *localcomm = NULL; - int *matching = (int *)calloc(num_procs_in_node,sizeof(int)); - int *lrank_to_grank = (int *)calloc(num_procs_in_node,sizeof(int)); - int *grank_to_lrank = (int *)calloc(size,sizeof(int)); - hwloc_obj_t object; - opal_hwloc_locality_t locality; - char set_as_string[64]; - 
opal_value_t kv; - - if (OMPI_SUCCESS != (err = ompi_comm_split(comm_old,colors[rank],ompi_process_info.my_local_rank,&localcomm, false))) - return err; - for(i = 0 ; i < num_procs_in_node ; i++) - lrank_to_grank[i] = -1; - lrank_to_grank[ompi_process_info.my_local_rank] = rank; - - for(i = 0 ; i < size ; i++) - grank_to_lrank[i] = -1; + if (OMPI_SUCCESS != (err = ompi_comm_split(comm_old, colors[rank], rank, + &localcomm, false))) { + goto release_and_return; + } - if (OMPI_SUCCESS != (err = localcomm->c_coll->coll_allgather(&rank,1,MPI_INT, - lrank_to_grank,1,MPI_INT, - localcomm, - localcomm->c_coll->coll_allgather_module))) - return err; + lrank_to_grank = (int *)calloc(num_procs_in_node, sizeof(int)); + if (OMPI_SUCCESS != (err = localcomm->c_coll->coll_allgather(&rank, 1, MPI_INT, + lrank_to_grank, 1, MPI_INT, + localcomm, localcomm->c_coll->coll_allgather_module))) { + free(lrank_to_grank); + ompi_comm_free(&localcomm); + goto release_and_return; + } + grank_to_lrank = (int *)malloc(size * sizeof(int)); + for(i = 0 ; i < size ; grank_to_lrank[i++] = -1); for(i = 0 ; i < num_procs_in_node ; i++) grank_to_lrank[lrank_to_grank[i]] = i; - if (rank == local_procs[0]){ + /* Discover the local patterns */ + if (rank == lindex_to_grank[0]) { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "========== Partially Distributed Reordering ========= \n")); + local_pattern = (double *)calloc(num_procs_in_node * num_procs_in_node, sizeof(double)); + } else { + local_pattern = (double *)calloc(num_procs_in_node, sizeof(double)); + } + /* Extract the local communication pattern */ + if( true == topo->weighted ) { + for(i = 0; i < topo->indegree; i++) + if (grank_to_lrank[topo->in[i]] != -1) + local_pattern[grank_to_lrank[topo->in[i]]] += topo->inw[i]; + for(i = 0; i < topo->outdegree; i++) + if (grank_to_lrank[topo->out[i]] != -1) + local_pattern[grank_to_lrank[topo->out[i]]] += topo->outw[i]; + } + if (OMPI_SUCCESS != (err = 
localcomm->c_coll->coll_gather((rank == lindex_to_grank[0] ? MPI_IN_PLACE : local_pattern), + num_procs_in_node, MPI_DOUBLE, + local_pattern, num_procs_in_node, MPI_DOUBLE, + 0, localcomm, localcomm->c_coll->coll_gather_module))) { + free(lrank_to_grank); + ompi_comm_free(&localcomm); + free(grank_to_lrank); + goto release_and_return; + } + + /* The root has now the entire information, so let's crunch it */ + if (rank == lindex_to_grank[0]) { tm_topology_t *tm_topology = NULL; - tm_topology_t *tm_opt_topology = NULL; - tree_t *comm_tree = NULL; + tm_tree_t *comm_tree = NULL; + tm_solution_t *sol = NULL; + tm_affinity_mat_t *aff_mat = NULL; double **comm_pattern = NULL; -#ifdef __DEBUG__ - fprintf(stderr,"========== Partially Distributed Reordering ========= \n"); -#endif - - local_pattern = (double *)calloc(num_procs_in_node*num_procs_in_node,sizeof(double)); - for(i = 0 ; i < num_procs_in_node*num_procs_in_node ; i++) - local_pattern[i] = 0.0; - - if( true == topo->weighted ) { - for(i = 0; i < topo->indegree ; i++) - if (grank_to_lrank[topo->in[i]] != -1) - local_pattern[grank_to_lrank[topo->in[i]]] += topo->inw[i]; - for(i = 0; i < topo->outdegree ; i++) - if (grank_to_lrank[topo->out[i]] != -1) - local_pattern[grank_to_lrank[topo->out[i]]] += topo->outw[i]; - if (OMPI_SUCCESS != (err = localcomm->c_coll->coll_gather(MPI_IN_PLACE, num_procs_in_node, MPI_DOUBLE, - local_pattern, num_procs_in_node, MPI_DOUBLE, - 0,localcomm, - localcomm->c_coll->coll_gather_module))) - ERR_EXIT(err); - } - comm_pattern = (double **)malloc(num_procs_in_node*sizeof(double *)); - for(i = 0 ; i < num_procs_in_node ; i++){ - comm_pattern[i] = (double *)calloc(num_procs_in_node,sizeof(double)); - memcpy((void *)comm_pattern[i],(void *)(local_pattern + i*num_procs_in_node),num_procs_in_node*sizeof(double)); + for( i = 0; i < num_procs_in_node; i++ ) { + comm_pattern[i] = local_pattern + i * num_procs_in_node; } - /* Matrix needs to be symmetric */ - for( i = 0 ; i < num_procs_in_node ; 
i++) - for(j = i ; j < num_procs_in_node ; j++){ - comm_pattern[i][j] += comm_pattern[j][i]; - comm_pattern[j][i] = comm_pattern[i][j]; + /* Matrix needs to be symmetric. Beware: as comm_patterns + * refers to local_pattern we indirectly alter the content + * of local_pattern */ + for( i = 0; i < num_procs_in_node ; i++ ) + for( j = i; j < num_procs_in_node ; j++ ) { + comm_pattern[i][j] = (comm_pattern[i][j] + comm_pattern[j][i]) / 2; + comm_pattern[j][i] = comm_pattern[i][j]; } - for( i = 0 ; i < num_procs_in_node ; i++) - for(j = 0 ; j < num_procs_in_node ; j++) - comm_pattern[i][j] /= 2; #ifdef __DEBUG__ - fprintf(stdout,"========== COMM PATTERN ============= \n"); + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "========== COMM PATTERN ============= \n")); for(i = 0 ; i < num_procs_in_node ; i++){ - fprintf(stdout," %i : ",i); - for(j = 0; j < num_procs_in_node ; j++) - fprintf(stdout," %f ",comm_pattern[i][j]); - fprintf(stdout,"\n"); + opal_output_verbose(10, ompi_topo_base_framework.framework_output," %i : ",i); + dump_double_array(10, ompi_topo_base_framework.framework_output, + "", "", comm_pattern[i], num_procs_in_node); } - fprintf(stdout,"======================= \n"); + opal_output_verbose(10, ompi_topo_base_framework.framework_output, + "======================= \n"); #endif tm_topology = (tm_topology_t *)malloc(sizeof(tm_topology_t)); tm_topology->nb_levels = numlevels; - tm_topology->arity = (int *)calloc(tm_topology->nb_levels,sizeof(int)); - tm_topology->nb_nodes = (int *)calloc(tm_topology->nb_levels,sizeof(int)); + tm_topology->arity = (int *)calloc(tm_topology->nb_levels, sizeof(int)); + tm_topology->nb_nodes = (size_t *)calloc(tm_topology->nb_levels, sizeof(size_t)); tm_topology->node_id = (int **)malloc(tm_topology->nb_levels*sizeof(int *)); + tm_topology->node_rank = (int **)malloc(tm_topology->nb_levels*sizeof(int *)); + for(i = 0 ; i < tm_topology->nb_levels ; i++){ - int nb_objs = 
hwloc_get_nbobjs_by_depth(opal_hwloc_topology,tracker[i]->depth); + int nb_objs = hwloc_get_nbobjs_by_depth(opal_hwloc_topology, tracker[i]->depth); tm_topology->nb_nodes[i] = nb_objs; - tm_topology->node_id[i] = (int*)malloc(sizeof(int)*nb_objs); tm_topology->arity[i] = tracker[i]->arity; - for(j = 0 ; j < nb_objs ; j++) - tm_topology->node_id[i][j] = -1; - for(j = 0 ; j < nb_objs ; j++) - if ( j < num_procs_in_node ) - tm_topology->node_id[i][j] = localrank_to_objnum[j]; + tm_topology->node_id[i] = (int *)calloc(tm_topology->nb_nodes[i], sizeof(int)); + tm_topology->node_rank[i] = (int * )calloc(tm_topology->nb_nodes[i], sizeof(int)); + for(j = 0; j < (int)tm_topology->nb_nodes[i] ; j++){ + tm_topology->node_id[i][j] = j; + tm_topology->node_rank[i][j] = j; + } } + /* unused for now*/ + tm_topology->cost = (double*)calloc(tm_topology->nb_levels,sizeof(double)); + + tm_topology->nb_proc_units = num_objs_in_node; + //tm_topology->nb_proc_units = num_procs_in_node; + tm_topology->nb_constraints = 0; + for(i = 0; i < num_procs_in_node ; i++) + if (localrank_to_objnum[i] != -1) + tm_topology->nb_constraints++; + + tm_topology->constraints = (int *)calloc(tm_topology->nb_constraints,sizeof(int)); + for(idx = 0,i = 0; i < num_procs_in_node ; i++) + if (localrank_to_objnum[i] != -1) + tm_topology->constraints[idx++] = localrank_to_objnum[i]; + + tm_topology->oversub_fact = 1; + #ifdef __DEBUG__ - fprintf(stdout,"Levels in topo : %i | num procs in node : %i\n",tm_topology->nb_levels,num_procs_in_node); - for(i = 0; i < tm_topology->nb_levels ; i++){ - fprintf(stdout,"Nb objs for level %i : %i | arity %i\n ",i,tm_topology->nb_nodes[i],tm_topology->arity[i]); - for(j = 0; j < tm_topology->nb_nodes[i] ; j++) - fprintf(stdout,"Obj id : %i |",tm_topology->node_id[i][j]); - fprintf(stdout,"\n"); + assert(num_objs_in_node == (int)tm_topology->nb_nodes[tm_topology->nb_levels-1]); + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Levels in topo : %i | num 
procs in node : %i\n", + tm_topology->nb_levels,num_procs_in_node)); + for(i = 0; i < tm_topology->nb_levels ; i++) { + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "Nb objs for level %i : %lu | arity %i\n ", + i, tm_topology->nb_nodes[i],tm_topology->arity[i])); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "", "Obj id ", tm_topology->node_id[i], tm_topology->nb_nodes[i]); } - display_topology(tm_topology); + tm_display_topology(tm_topology); #endif + tm_optimize_topology(&tm_topology); + aff_mat = tm_build_affinity_mat(comm_pattern,num_procs_in_node); + comm_tree = tm_build_tree_from_topology(tm_topology,aff_mat, NULL, NULL); + sol = tm_compute_mapping(tm_topology, comm_tree); - tm_opt_topology = optimize_topology(tm_topology); - comm_tree = build_tree_from_topology(tm_opt_topology,comm_pattern,num_procs_in_node,NULL,NULL); - map_topology_simple(tm_opt_topology,comm_tree,matching,num_procs_in_node,NULL); + assert((int)sol->k_length == num_objs_in_node); -#ifdef __DEBUG__ + k = (int *)calloc(sol->k_length, sizeof(int)); + for(idx = 0 ; idx < (int)sol->k_length ; idx++) + k[idx] = sol->k[idx][0]; - fprintf(stdout,"Matching :"); - for(i = 0 ; i < num_procs_in_node ; i++) - fprintf(stdout," %i ",matching[i]); - fprintf(stdout,"\n"); +#ifdef __DEBUG__ + OPAL_OUTPUT_VERBOSE((10, ompi_topo_base_framework.framework_output, + "====> nb levels : %i\n",tm_topology->nb_levels)); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Rank permutation sigma/k : ", "", k, num_procs_in_node); + assert(num_procs_in_node == (int)sol->sigma_length); + dump_int_array(10, ompi_topo_base_framework.framework_output, + "Matching : ", "", sol->sigma, sol->sigma_length); #endif - for(i = 0 ; i < num_procs_in_node ; i++) - free(comm_pattern[i]); + + free(aff_mat->sum_row); + free(aff_mat); free(comm_pattern); - for(i = 0; i < tm_topology->nb_levels ; i++) - free(tm_topology->node_id[i]); - free(tm_topology->node_id); - 
free(tm_topology->nb_nodes); - free(tm_topology->arity); - free(tm_topology); - FREE_topology(tm_opt_topology); - } else { - local_pattern = (double *)calloc(num_procs_in_node,sizeof(double)); - for(i = 0 ; i < num_procs_in_node ; i++) - local_pattern[i] = 0.0; - - if( true == topo->weighted ) { - for(i = 0; i < topo->indegree ; i++) - if (grank_to_lrank[topo->in[i]] != -1) - local_pattern[grank_to_lrank[topo->in[i]]] += topo->inw[i]; - for(i = 0; i < topo->outdegree ; i++) - if (grank_to_lrank[topo->out[i]] != -1) - local_pattern[grank_to_lrank[topo->out[i]]] += topo->outw[i]; - if (OMPI_SUCCESS != (err = localcomm->c_coll->coll_gather(local_pattern, num_procs_in_node, MPI_DOUBLE, - NULL,0,0, - 0,localcomm, - localcomm->c_coll->coll_gather_module))) - ERR_EXIT(err); - } + tm_free_solution(sol); + tm_free_tree(comm_tree); + tm_free_topology(tm_topology); } - if (OMPI_SUCCESS != (err = localcomm->c_coll->coll_bcast(matching, num_procs_in_node, - MPI_INT,0,localcomm, - localcomm->c_coll->coll_bcast_module))) - ERR_EXIT(err); - - object = hwloc_get_obj_by_depth(opal_hwloc_topology, - effective_depth,matching[ompi_process_info.my_local_rank]); - if( NULL == object) goto fallback; - hwloc_bitmap_copy(set,object->cpuset); - hwloc_bitmap_singlify(set); - hwloc_err = hwloc_set_cpubind(opal_hwloc_topology,set,0); - if( -1 == hwloc_err) goto fallback; - - /* Report new binding to ORTE/OPAL */ - /* hwloc_bitmap_list_asprintf(&orte_process_info.cpuset,set); */ - err = hwloc_bitmap_snprintf (set_as_string,64,set); + /* Todo : Bcast + group creation */ + /* scatter the ranks */ + if (OMPI_SUCCESS != (err = localcomm->c_coll->coll_scatter(k, 1, MPI_INT, + &newrank, 1, MPI_INT, + 0, localcomm, + localcomm->c_coll->coll_scatter_module))) { + if (NULL != k) free(k); + ompi_comm_free(&localcomm); + free(lrank_to_grank); + free(grank_to_lrank); + goto release_and_return; + } -#ifdef __DEBUG__ - fprintf(stdout,"Bitmap str size : %i\n",err); -#endif + /* compute the offset of newrank 
before the split */ + /* use the colors array, not the vpids */ + marked = (int *)malloc((num_nodes-1)*sizeof(int)); + for(idx = 0 ; idx < num_nodes - 1 ; idx++) + marked[idx] = -1; + + while( (node_position != rank) && (colors[node_position] != colors[rank])) { + /* Have we already counted the current color ? */ + for(idx = 0; idx < pos; idx++) + if( marked[idx] == colors[node_position] ) + goto next_iter; /* yes, let's skip the rest */ + /* How many elements of this color are here ? none before the current position */ + for(; idx < size; idx++) + if(colors[idx] == colors[node_position]) + offset++; + marked[pos++] = colors[node_position]; + next_iter: + node_position++; + } + newrank += offset; + free(marked); - OBJ_CONSTRUCT(&kv, opal_value_t); - kv.key = strdup(OPAL_PMIX_CPUSET); - kv.type = OPAL_STRING; - kv.data.string = strdup(set_as_string); + if (rank == lindex_to_grank[0]) + free(k); - (void)opal_pmix.store_local((opal_process_name_t*)ORTE_PROC_MY_NAME, &kv); - OBJ_DESTRUCT(&kv); + /* this needs to be optimized but will do for now */ + if (OMPI_SUCCESS != (err = ompi_comm_split(comm_old, 0, newrank, newcomm, false))) { + ompi_comm_free(&localcomm); + free(lrank_to_grank); + free(grank_to_lrank); + goto release_and_return; + } + /* end of TODO */ - locality = opal_hwloc_base_get_relative_locality(opal_hwloc_topology, - orte_process_info.cpuset,set_as_string); - OBJ_CONSTRUCT(&kv, opal_value_t); - kv.key = strdup(OPAL_PMIX_LOCALITY); - kv.type = OPAL_UINT16; - kv.data.uint16 = locality; - (void)opal_pmix.store_local((opal_process_name_t*)ORTE_PROC_MY_NAME, &kv); - OBJ_DESTRUCT(&kv); + /* Attach the dist_graph to the newly created communicator */ + (*newcomm)->c_flags |= OMPI_COMM_DIST_GRAPH; + (*newcomm)->c_topo = topo_module; + (*newcomm)->c_topo->reorder = reorder; - if( OMPI_SUCCESS != (err = ompi_comm_create(comm_old, - comm_old->c_local_group, - newcomm))) { - ERR_EXIT(err); - } else { - /* Attach the dist_graph to the newly created communicator */ - 
(*newcomm)->c_flags |= OMPI_COMM_DIST_GRAPH; - (*newcomm)->c_topo = topo_module; - (*newcomm)->c_topo->reorder = reorder; - } - free(matching); free(grank_to_lrank); free(lrank_to_grank); } /* distributed reordering end */ - if(rank == local_procs[0]) - free(tracker); - free(nodes_roots); - free(local_procs); - free(local_pattern); - free(localrank_to_objnum); + release_and_return: + if (NULL != reqs ) free(reqs); + if (NULL != tracker) free(tracker); + if (NULL != local_pattern) free(local_pattern); free(colors); - hwloc_bitmap_free(set); - return err; + if (NULL != lindex_to_grank) free(lindex_to_grank); + if (NULL != nodes_roots) free(nodes_roots); /* only on root */ + if (NULL != localrank_to_objnum) free(localrank_to_objnum); + if( NULL != set) hwloc_bitmap_free(set); + /* As the reordering is optional, if we encountered an error during the reordering, + * we can safely return with just a duplicate of the original communicator associated + * with the topology. */ + if( OMPI_SUCCESS != err ) goto fallback; + return OMPI_SUCCESS; } diff --git a/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.c b/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.c index 00ee56a1610..25a6708b2c9 100644 --- a/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.c +++ b/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.c @@ -2,13 +2,12 @@ #include #include "IntConstantInitializedVector.h" - int intCIV_isInitialized(int_CIVector * v, int i) { if(v->top == 0) return 0; if(v->from[i] >= 0) - if(v->from[i] < v->top && v->to[v->from[i]] == i) + if(v->from[i] < v->top && v->to[v->from[i]] == i) return 1; return 0; } @@ -45,7 +44,7 @@ int intCIV_set(int_CIVector * v, int i, int val) v->top++; } v->vec[i] = val; - return 0; + return 0; } int intCIV_get(int_CIVector * v, int i) diff --git a/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.h b/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.h index 
1b237b1b0ee..25e5a1d759f 100644 --- a/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.h +++ b/ompi/mca/topo/treematch/treematch/IntConstantInitializedVector.h @@ -12,5 +12,4 @@ void intCIV_exit(int_CIVector * v); int intCIV_set(int_CIVector * v, int i, int val); int intCIV_get(int_CIVector * v, int i); - #endif /*INTEGER_CONSTANT_INITIALIZED_VECTOR*/ diff --git a/ompi/mca/topo/treematch/treematch/PriorityQueue.c b/ompi/mca/topo/treematch/treematch/PriorityQueue.c new file mode 100644 index 00000000000..471583f4af9 --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/PriorityQueue.c @@ -0,0 +1,170 @@ +#include +#include "PriorityQueue.h" + +/* + This comparison function is used to sort elements in key descending order. +*/ +static int compFunc(const FiboNode * const node1, const FiboNode * const node2) +{ + return + ( ( ((QueueElement*)(node1))->key > ((QueueElement*)(node2))->key ) ? -1 : 1); +} + +int PQ_init(PriorityQueue * const q, int size) +{ + int i; + q->size = size; + q->elements = malloc(sizeof(QueueElement *) * size); + for(i=0; i < size; i++) + q->elements[i]=NULL; + return fiboTreeInit((FiboTree *)q, compFunc); +} + +void PQ_exit(PriorityQueue * const q) +{ + + int i; + for(i = 0; i < q->size; i++) + { + if(q->elements[i] != NULL) + free(q->elements[i]); + } + if(q->elements != NULL) + free(q->elements); + fiboTreeExit((FiboTree *)q); +} +void PQ_free(PriorityQueue * const q) +{ + int i; + for(i = 0; i < q->size; i++) + { + if(q->elements[i] != NULL) + free(q->elements[i]); + } + fiboTreeFree((FiboTree *)q); +} + +int PQ_isEmpty(PriorityQueue * const q) +{ + FiboTree * tree = (FiboTree *)q; +/* if the tree root is linked to itself then the tree is empty */ + if(&(tree->rootdat) == (tree->rootdat.linkdat.nextptr)) + return 1; + return 0; +} + +void PQ_insertElement(PriorityQueue * const q, QueueElement * const e) +{ + if(e->value >= 0 && e->value < q->size) + { + fiboTreeAdd((FiboTree *)q, (FiboNode *)(e)); + q->elements[e->value] = e; + 
e->isInQueue = 1; + } +} +void PQ_deleteElement(PriorityQueue * const q, QueueElement * const e) +{ + fiboTreeDel((FiboTree *)q, (FiboNode *)(e)); + q->elements[e->value] = NULL; + e->isInQueue = 0; +} + +void PQ_insert(PriorityQueue * const q, int val, double key) +{ + if( val >= 0 && val < q->size) + { + QueueElement * e = malloc(sizeof(QueueElement)); + e->value = val; + e->key = key; + PQ_insertElement(q, e); + } +} + +void PQ_delete(PriorityQueue * const q, int val) +{ + QueueElement * e = q->elements[val]; + PQ_deleteElement(q, e); + free(e); +} + +QueueElement * PQ_findMaxElement(PriorityQueue * const q) +{ + QueueElement * e = (QueueElement *)(fiboTreeMin((FiboTree *)q)); + return e; +} +QueueElement * PQ_deleteMaxElement(PriorityQueue * const q) +{ + QueueElement * e = (QueueElement *)(fiboTreeMin((FiboTree *)q)); + if(e != NULL) + { + PQ_deleteElement(q, e); + } + return e; +} + +double PQ_findMaxKey(PriorityQueue * const q) +{ + QueueElement * e = PQ_findMaxElement(q); + if(e!=NULL) + return e->key; + return 0; +} + +int PQ_deleteMax(PriorityQueue * const q) +{ + QueueElement * e = PQ_deleteMaxElement(q); + int res = -1; + if(e != NULL) + res = e->value; + free(e); + return res; +} + +void PQ_increaseElementKey(PriorityQueue * const q, QueueElement * const e, double i) +{ + if(e->isInQueue) + { + PQ_deleteElement(q, e); + e->key += i; + PQ_insertElement(q, e); + } +} +void PQ_decreaseElementKey(PriorityQueue * const q, QueueElement * const e, double i) +{ + if(e->isInQueue) + { + PQ_deleteElement(q, e); + e->key -= i; + PQ_insertElement(q, e); + } +} +void PQ_adjustElementKey(PriorityQueue * const q, QueueElement * const e, double i) +{ + if(e->isInQueue) + { + PQ_deleteElement(q, e); + e->key = i; + PQ_insertElement(q, e); + } +} + +void PQ_increaseKey(PriorityQueue * const q, int val, double i) +{ + QueueElement * e = q->elements[val]; + if(e != NULL) + PQ_increaseElementKey(q, e, i); +} + +void PQ_decreaseKey(PriorityQueue * const q, int val, double 
i) +{ + QueueElement * e = q->elements[val]; + if(e != NULL) + PQ_decreaseElementKey(q, e, i); +} + +void PQ_adjustKey(PriorityQueue * const q, int val, double i) +{ + QueueElement * e = q->elements[val]; + if(e != NULL) + PQ_adjustElementKey(q, e, i); +} diff --git a/ompi/mca/topo/treematch/treematch/PriorityQueue.h b/ompi/mca/topo/treematch/treematch/PriorityQueue.h new file mode 100644 index 00000000000..c9ef1d2291a --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/PriorityQueue.h @@ -0,0 +1,108 @@ +#ifndef PRIORITY_QUEUE +#define PRIORITY_QUEUE + +#include "fibo.h" + +/* + This is the struct for our elements in a PriorityQueue. + The node is the first member, so a simple cast converts between a QueueElement pointer and a FiboNode pointer. +*/ +typedef struct QueueElement_ +{ + FiboNode node; /*the node used to insert the element in a FiboTree*/ + double key; /*the key of the element, elements are sorted in a descending order according to their key*/ + int value; + int isInQueue; +} QueueElement; + +typedef struct PriorityQueue_ +{ + FiboTree tree; + QueueElement ** elements; /*a vector of elements indexed by their value so we can easily retrieve an element from its value */ + int size; /*the size allocated to the elements vector*/ +} PriorityQueue; + + +/* + PQ_init initializes a PriorityQueue with the size given in argument and sets compFunc as comparison function. Note that you have to allocate memory for the PriorityQueue pointer before calling this function. + Returns : + 0 if success + !0 if failed + + PQ_free empties the PriorityQueue and frees the elements it holds, but keeps the elements vector allocated. + PQ_exit frees the elements and the elements vector, then destroys the PriorityQueue. The PriorityQueue is no longer usable without using PQ_init again. +Note that the PriorityQueue pointer is not deallocated.
+*/ +int PQ_init(PriorityQueue * const, int size); +void PQ_free(PriorityQueue * const); +void PQ_exit(PriorityQueue * const); + +/* + PQ_isEmpty returns 1 if the PriorityQueue is empty, 0 otherwise. +*/ +int PQ_isEmpty(PriorityQueue * const); + +/* + PQ_insertElement inserts the given QueueElement in the given PriorityQueue +*/ +void PQ_insertElement(PriorityQueue * const, QueueElement * const); +/* + PQ_deleteElement deletes the element given in argument from the PriorityQueue. +*/ +void PQ_deleteElement(PriorityQueue * const, QueueElement * const); + +/* + PQ_insert inserts an element in the PriorityQueue with the value and key given in argument. +*/ +void PQ_insert(PriorityQueue * const, int val, double key); +/* + PQ_delete removes the first element found with the value given in argument and frees it. +*/ +void PQ_delete(PriorityQueue * const, int val); + + +/* + PQ_findMaxElement returns the QueueElement with the greatest key in the given PriorityQueue +*/ +QueueElement * PQ_findMaxElement(PriorityQueue * const); +/* + PQ_deleteMaxElement returns the QueueElement with the greatest key in the given PriorityQueue and removes it from the queue. +*/ +QueueElement * PQ_deleteMaxElement(PriorityQueue * const); + +/* + PQ_findMaxKey returns the key of the element with the greatest key in the given PriorityQueue +*/ +double PQ_findMaxKey(PriorityQueue * const); +/* + PQ_deleteMax returns the value of the element with the greatest key in the given PriorityQueue and removes it from the queue. +*/ +int PQ_deleteMax(PriorityQueue * const); + +/* + PQ_increaseElementKey adds the value of i to the key of the given QueueElement +*/ +void PQ_increaseElementKey(PriorityQueue * const, QueueElement * const, double i); +/* + PQ_decreaseElementKey subtracts the value of i from the key of the given QueueElement +*/ +void PQ_decreaseElementKey(PriorityQueue * const, QueueElement * const, double i); +/* + PQ_adjustElementKey sets to i the key of the given QueueElement.
+*/ +void PQ_adjustElementKey(PriorityQueue * const, QueueElement * const, double i); + +/* + PQ_increaseKey adds i to the key of the first element found with a value equal to val in the PriorityQueue. +*/ +void PQ_increaseKey(PriorityQueue * const, int val, double i); +/* + PQ_decreaseKey subtracts i from the key of the first element found with a value equal to val in the PriorityQueue. +*/ +void PQ_decreaseKey(PriorityQueue * const, int val, double i); +/* + PQ_adjustKey sets to i the key of the first element found with a value equal to val in the PriorityQueue. +*/ +void PQ_adjustKey(PriorityQueue * const, int val, double i); + +#endif /*PRIORITY_QUEUE*/ diff --git a/ompi/mca/topo/treematch/treematch/fibo.c b/ompi/mca/topo/treematch/treematch/fibo.c new file mode 100644 index 00000000000..97070e7273a --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/fibo.c @@ -0,0 +1,372 @@ +/* Copyright 2010 IPB, INRIA & CNRS +** +** This file originally comes from the Scotch software package for +** static mapping, graph partitioning and sparse matrix ordering. +** +** This software is governed by the CeCILL-B license under French law +** and abiding by the rules of distribution of free software. You can +** use, modify and/or redistribute the software under the terms of the +** CeCILL-B license as circulated by CEA, CNRS and INRIA at the following +** URL: "http://www.cecill.info". +** +** As a counterpart to the access to the source code and rights to copy, +** modify and redistribute granted by the license, users are provided +** only with a limited warranty and the software's author, the holder of +** the economic rights, and the successive licensors have only limited +** liability.
+** +** In this respect, the user's attention is drawn to the risks associated +** with loading, using, modifying and/or developing or reproducing the +** software by the user in light of its specific status of free software, +** that may mean that it is complicated to manipulate, and that also +** therefore means that it is reserved for developers and experienced +** professionals having in-depth computer knowledge. Users are therefore +** encouraged to load and test the software's suitability as regards +** their requirements in conditions enabling the security of their +** systems and/or data to be ensured and, more generally, to use and +** operate it in the same conditions as regards security. +** +** The fact that you are presently reading this means that you have had +** knowledge of the CeCILL-B license and that you accept its terms. +*/ +/************************************************************/ +/** **/ +/** NAME : fibo.c **/ +/** **/ +/** AUTHOR : Francois PELLEGRINI **/ +/** **/ +/** FUNCTION : This module handles Fibonacci trees. **/ +/** **/ +/** DATES : # Version 1.0 : from : 01 may 2010 **/ +/** to 12 may 2010 **/ +/** **/ +/************************************************************/ + +/* +** The defines and includes. +*/ + +#define FIBO + +#include +#include +#include +#include "fibo.h" + +/* Helper macros which can be redefined at compile time. */ + +#ifndef INT +#define INT int /* "long long" can be used on 64-bit systems */ +#endif /* INT */ + +#ifndef errorPrint +#define errorPrint(s) fprintf (stderr, s) +#endif /* errorPrint */ + +#ifndef memAlloc +#define memAlloc malloc +#define memSet memset +#define memFree free +#endif /* memAlloc */ + +/*********************************************/ +/* */ +/* These routines deal with Fibonacci trees. */ +/* */ +/*********************************************/ + +/* This routine initializes a Fibonacci +** tree structure. +** It returns: +** - 0 : in case of success. +** - !0 : on error. 
+*/ + +int +fiboTreeInit ( +FiboTree * const treeptr, +int (* cmpfptr) (const FiboNode * const, const FiboNode * const)) +{ + if ((treeptr->degrtab = (FiboNode **) memAlloc ((sizeof (INT) << 3) * sizeof (FiboNode *))) == NULL) /* As many cells as there are bits in an INT */ + return (1); + + memSet (treeptr->degrtab, 0, (sizeof (INT) << 3) * sizeof (FiboNode *)); /* Make degree array ready for consolidation: all cells set to NULL */ + + treeptr->rootdat.linkdat.prevptr = /* Link root node to itself */ + treeptr->rootdat.linkdat.nextptr = &treeptr->rootdat; + treeptr->cmpfptr = cmpfptr; + + return (0); +} + +/* This routine flushes the contents of +** the given Fibonacci tree. +** It returns: +** - VOID : in all cases. +*/ + +void +fiboTreeExit ( +FiboTree * const treeptr) +{ + if (treeptr->degrtab != NULL) + memFree (treeptr->degrtab); +} + +/* This routine flushes the contents of +** the given Fibonacci tree. It does not +** free any of its contents, but instead +** makes the tree structure look empty again. +** It returns: +** - VOID : in all cases. +*/ + +void +fiboTreeFree ( +FiboTree * const treeptr) +{ + treeptr->rootdat.linkdat.prevptr = /* Link root node to itself */ + treeptr->rootdat.linkdat.nextptr = &treeptr->rootdat; +} + +/* This routine performs the consolidation +** of roots per degree. It returns the best +** element found because this element is not +** recorded in the data structure itself. +** It returns: +** - !NULL : pointer to best element found. +** - NULL : Fibonacci tree is empty.
+*/ + +FiboNode * +fiboTreeConsolidate ( +FiboTree * const treeptr) +{ + FiboNode ** restrict degrtab; + int degrmax; + int degrval; + FiboNode * rootptr; + FiboNode * nextptr; + FiboNode * bestptr; + + degrtab = treeptr->degrtab; + + for (rootptr = treeptr->rootdat.linkdat.nextptr, nextptr = rootptr->linkdat.nextptr, degrmax = 0; /* For all roots in root list */ + rootptr != &treeptr->rootdat; ) { + degrval = rootptr->deflval >> 1; /* Get degree, getting rid of flag part */ +#ifdef FIBO_DEBUG + if (degrval >= (sizeof (INT) << 3)) + errorPrint ("fiboTreeConsolidate: invalid node degree"); +#endif /* FIBO_DEBUG */ + if (degrtab[degrval] == NULL) { /* If no tree with same degree already found */ + if (degrval > degrmax) /* Record highest degree found */ + degrmax = degrval; + + degrtab[degrval] = rootptr; /* Record tree as first tree with this degree */ + rootptr = nextptr; /* Process next root in list during next iteration */ + nextptr = rootptr->linkdat.nextptr; + } + else { + FiboNode * oldrptr; /* Root which will no longer be a root */ + FiboNode * chldptr; + + oldrptr = degrtab[degrval]; /* Assume old root is worse */ + if (treeptr->cmpfptr (oldrptr, rootptr) <= 0) { /* If old root is still better */ + oldrptr = rootptr; /* This root will be linked to it */ + rootptr = degrtab[degrval]; /* We will go on processing this root */ + } + + degrtab[degrval] = NULL; /* Remaining root changes degree so leaves this cell */ + fiboTreeUnlink (oldrptr); /* Old root is no longer a root */ + oldrptr->deflval &= ~1; /* Whatever old root flag was, it is reset to 0 */ + oldrptr->pareptr = rootptr; /* Remaining root is now father of old root */ + + chldptr = rootptr->chldptr; /* Get first child of remaining root */ + if (chldptr != NULL) { /* If remaining root had already some children, link old root with them */ + rootptr->deflval += 2; /* Increase degree by 1, that is, by 2 with left shift in deflval */ + fiboTreeLinkAfter (chldptr, oldrptr); + } + else { /* Old root becomes
first child of remaining root */ + rootptr->deflval = 2; /* Real degree set to 1, and flag set to 0 */ + rootptr->chldptr = oldrptr; + oldrptr->linkdat.prevptr = /* Chain old root to oneself as only child */ + oldrptr->linkdat.nextptr = oldrptr; + } + } /* Process again remaining root as its degree has changed */ + } + + bestptr = NULL; + for (degrval = 0; degrval <= degrmax; degrval ++) { + if (degrtab[degrval] != NULL) { /* If some tree is found */ + bestptr = degrtab[degrval]; /* Record it as potential best */ + degrtab[degrval] = NULL; /* Clean-up used part of array */ + degrval ++; /* Go on at next cell in next loop */ + break; + } + } + for ( ; degrval <= degrmax; degrval ++) { /* For remaining roots once a potential best root has been found */ + if (degrtab[degrval] != NULL) { + if (treeptr->cmpfptr (degrtab[degrval], bestptr) < 0) /* If new root is better */ + bestptr = degrtab[degrval]; /* Record new root as best root */ + degrtab[degrval] = NULL; /* Clean-up used part of array */ + } + } + + return (bestptr); +} + +/* This routine returns the node of minimum +** key in the given tree. The node is searched +** for each time this routine is called, so this +** information should be recorded if needed. +** This is the non-macro version, for testing +** and setting up breakpoints. +** It returns: +** - !NULL : pointer to best element found. +** - NULL : Fibonacci tree is empty. +*/ + +#ifndef fiboTreeMin + +FiboNode * +fiboTreeMin ( +FiboTree * const treeptr) +{ + FiboNode * bestptr; + + bestptr = fiboTreeMinMacro (treeptr); + +#ifdef FIBO_DEBUG + fiboTreeCheck (treeptr); +#endif /* FIBO_DEBUG */ + + return (bestptr); +} + +#endif /* fiboTreeMin */ + +/* This routine adds the given node to the +** given tree. This is the non-macro version, +** for testing and setting up breakpoints. +** It returns: +** - void : in all cases. 
+*/ + +#ifndef fiboTreeAdd + +void +fiboTreeAdd ( +FiboTree * const treeptr, +FiboNode * const nodeptr) +{ + fiboTreeAddMacro (treeptr, nodeptr); + +#ifdef FIBO_DEBUG + fiboTreeCheck (treeptr); +#endif /* FIBO_DEBUG */ +} + +#endif /* fiboTreeAdd */ + +/* This routine deletes the given node from +** the given tree, whatever this node is (root +** or non root). This is the non-macro version, +** for testing and setting up breakpoints. +** It returns: +** - void : in all cases. +*/ + +#ifndef fiboTreeDel + +void +fiboTreeDel ( +FiboTree * const treeptr, +FiboNode * const nodeptr) +{ + fiboTreeDelMacro (treeptr, nodeptr); + +#ifdef FIBO_DEBUG + nodeptr->pareptr = + nodeptr->chldptr = + nodeptr->linkdat.prevptr = + nodeptr->linkdat.nextptr = NULL; + + fiboTreeCheck (treeptr); +#endif /* FIBO_DEBUG */ +} + +#endif /* fiboTreeDel */ + +/* This routine checks the consistency of the +** given Fibonacci tree. +** It returns: +** - 0 : if the tree is consistent. +** - !0 : on error.
+*/ + +#ifdef FIBO_DEBUG + +static +int +fiboTreeCheck2 ( +const FiboNode * const nodeptr) +{ + FiboNode * chldptr; + int degrval; + + degrval = 0; + chldptr = nodeptr->chldptr; + if (chldptr != NULL) { + do { + if (chldptr->linkdat.nextptr->linkdat.prevptr != chldptr) { + errorPrint ("fiboTreeCheck: bad child linked list"); + return (1); + } + + if (chldptr->pareptr != nodeptr) { + errorPrint ("fiboTreeCheck: bad child parent"); + return (1); + } + + if (fiboTreeCheck2 (chldptr) != 0) + return (1); + + degrval ++; + chldptr = chldptr->linkdat.nextptr; + } while (chldptr != nodeptr->chldptr); + } + + if (degrval != (nodeptr->deflval >> 1)) { /* Real node degree is obtained by discarding lowest bit */ + errorPrint ("fiboTreeCheck2: invalid child information"); + return (1); + } + + return (0); +} + +int +fiboTreeCheck ( +const FiboTree * const treeptr) +{ + FiboNode * nodeptr; + + for (nodeptr = treeptr->rootdat.linkdat.nextptr; + nodeptr != &treeptr->rootdat; nodeptr = nodeptr->linkdat.nextptr) { + if (nodeptr->linkdat.nextptr->linkdat.prevptr != nodeptr) { + errorPrint ("fiboTreeCheck: bad root linked list"); + return (1); + } + + if (nodeptr->pareptr != NULL) { + errorPrint ("fiboTreeCheck: bad root parent"); + return (1); + } + + if (fiboTreeCheck2 (nodeptr) != 0) + return (1); + } + + return (0); +} + +#endif /* FIBO_DEBUG */ diff --git a/ompi/mca/topo/treematch/treematch/fibo.h b/ompi/mca/topo/treematch/treematch/fibo.h new file mode 100644 index 00000000000..32e0a7c0824 --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/fibo.h @@ -0,0 +1,205 @@ +/* Copyright 2010 IPB, INRIA & CNRS +** +** This file originally comes from the Scotch software package for +** static mapping, graph partitioning and sparse matrix ordering. +** +** This software is governed by the CeCILL-B license under French law +** and abiding by the rules of distribution of free software. 
You can +** use, modify and/or redistribute the software under the terms of the +** CeCILL-B license as circulated by CEA, CNRS and INRIA at the following +** URL: "http://www.cecill.info". +** +** As a counterpart to the access to the source code and rights to copy, +** modify and redistribute granted by the license, users are provided +** only with a limited warranty and the software's author, the holder of +** the economic rights, and the successive licensors have only limited +** liability. +** +** In this respect, the user's attention is drawn to the risks associated +** with loading, using, modifying and/or developing or reproducing the +** software by the user in light of its specific status of free software, +** that may mean that it is complicated to manipulate, and that also +** therefore means that it is reserved for developers and experienced +** professionals having in-depth computer knowledge. Users are therefore +** encouraged to load and test the software's suitability as regards +** their requirements in conditions enabling the security of their +** systems and/or data to be ensured and, more generally, to use and +** operate it in the same conditions as regards security. +** +** The fact that you are presently reading this means that you have had +** knowledge of the CeCILL-B license and that you accept its terms. +*/ +/************************************************************/ +/** **/ +/** NAME : fibo.h **/ +/** **/ +/** AUTHOR : Francois PELLEGRINI **/ +/** **/ +/** FUNCTION : This module contains the definitions of **/ +/** the generic Fibonacci trees. **/ +/** **/ +/** DATES : # Version 1.0 : from : 01 may 2010 **/ +/** to 12 may 2010 **/ +/** **/ +/** NOTES : # Since this module has originally been **/ +/** designed as a gain keeping data **/ +/** structure for local optimization **/ +/** algorithms, the computation of the **/ +/** best node is only done when actually **/ +/** searching for it. 
+** This is most useful when many **/ +** insertions and deletions can take **/ +** place in the mean time. This is why **/ +** this data structure does not keep **/ +** track of the best node, unlike most **/ +** implementations do. **/ +** **/ +/************************************************************/ + +/* +** The type and structure definitions. +*/ + +/* The doubly linked list structure. */ + +typedef struct FiboLink_ { + struct FiboNode_ * prevptr; /*+ Pointer to previous sibling element +*/ + struct FiboNode_ * nextptr; /*+ Pointer to next sibling element +*/ +} FiboLink; + +/* The tree node data structure. The deflval + variable merges degree and flag variables. + The degree of a node is smaller than + "bitsizeof (INT)", so it can hold on an + "int". The flag value is stored in the + lowest bit of the value. */ + + +typedef struct FiboNode_ { + struct FiboNode_ * pareptr; /*+ Pointer to parent element, if any +*/ + struct FiboNode_ * chldptr; /*+ Pointer to first child element, if any +*/ + FiboLink linkdat; /*+ Pointers to sibling elements +*/ + int deflval; /*+ Lowest bit: flag value; other bits: degree value +*/ +} FiboNode; + +/* The tree data structure. The fake dummy node aims + at handling root node insertion without any test. + This is important as many insertions have to be + performed. */ + +typedef struct FiboTree_ { + FiboNode rootdat; /*+ Dummy node for fast root insertion +*/ + FiboNode ** restrict degrtab; /*+ Consolidation array of size "bitsizeof (INT)" +*/ + int (* cmpfptr) (const FiboNode * const, const FiboNode * const); /*+ Comparison routine +*/ +} FiboTree; + +/* +** The macro definitions. +*/ + +/* This is the core of the module. All of + the algorithms have been de-recursived + and written as macros.
*/ + +#define fiboTreeLinkAfter(o,n) do { \ + FiboNode * nextptr; \ + nextptr = (o)->linkdat.nextptr; \ + (n)->linkdat.nextptr = nextptr; \ + (n)->linkdat.prevptr = (o); \ + nextptr->linkdat.prevptr = (n); \ + (o)->linkdat.nextptr = (n); \ + } while (0) + +#define fiboTreeUnlink(n) do { \ + (n)->linkdat.prevptr->linkdat.nextptr = (n)->linkdat.nextptr; \ + (n)->linkdat.nextptr->linkdat.prevptr = (n)->linkdat.prevptr; \ + } while (0) + +#define fiboTreeAddMacro(t,n) do { \ + (n)->pareptr = NULL; \ + (n)->chldptr = NULL; \ + (n)->deflval = 0; \ + fiboTreeLinkAfter (&((t)->rootdat), (n)); \ + } while (0) + +#define fiboTreeMinMacro(t) (fiboTreeConsolidate (t)) + +#define fiboTreeCutChildren(t,n) do { \ + FiboNode * chldptr; \ + chldptr = (n)->chldptr; \ + if (chldptr != NULL) { \ + FiboNode * cendptr; \ + cendptr = chldptr; \ + do { \ + FiboNode * nextptr; \ + nextptr = chldptr->linkdat.nextptr; \ + chldptr->pareptr = NULL; \ + fiboTreeLinkAfter (&((t)->rootdat), chldptr); \ + chldptr = nextptr; \ + } while (chldptr != cendptr); \ + } \ + } while (0) + +#define fiboTreeDelMacro(t,n) do { \ + FiboNode * pareptr; \ + FiboNode * rghtptr; \ + pareptr = (n)->pareptr; \ + fiboTreeUnlink (n); \ + fiboTreeCutChildren ((t), (n)); \ + if (pareptr == NULL) \ + break; \ + rghtptr = (n)->linkdat.nextptr; \ + while (1) { \ + FiboNode * gdpaptr; \ + int deflval; \ + deflval = pareptr->deflval - 2; \ + pareptr->deflval = deflval | 1; \ + gdpaptr = pareptr->pareptr; \ + pareptr->chldptr = (deflval <= 1) ? NULL : rghtptr; \ + if (((deflval & 1) == 0) || (gdpaptr == NULL)) \ + break; \ + rghtptr = pareptr->linkdat.nextptr; \ + fiboTreeUnlink (pareptr); \ + pareptr->pareptr = NULL; \ + fiboTreeLinkAfter (&((t)->rootdat), pareptr); \ + pareptr = gdpaptr; \ + } \ + } while (0) + +/* +** The function prototypes. +*/ + +/* This set of definitions allows the user + to specify whether he prefers to use + the fibonacci routines as macros or as + regular functions, for instance for + debugging. 
*/ + +#define fiboTreeAdd fiboTreeAddMacro +/* #define fiboTreeDel fiboTreeDelMacro */ +/* #define fiboTreeMin fiboTreeMinMacro */ + +#ifndef FIBO +#define static +#endif + +int fiboTreeInit (FiboTree * const, int (*) (const FiboNode * const, const FiboNode * const)); +void fiboTreeExit (FiboTree * const); +void fiboTreeFree (FiboTree * const); +FiboNode * fiboTreeConsolidate (FiboTree * const); +#ifndef fiboTreeAdd +void fiboTreeAdd (FiboTree * const, FiboNode * const); +#endif /* fiboTreeAdd */ +#ifndef fiboTreeDel +void fiboTreeDel (FiboTree * const, FiboNode * const); +#endif /* fiboTreeDel */ +#ifndef fiboTreeMin +FiboNode * fiboTreeMin (FiboTree * const); +#endif /* fiboTreeMin */ +#ifdef FIBO_DEBUG +int fiboTreeCheck (const FiboTree * const); +static int fiboTreeCheck2 (const FiboNode * const); +#endif /* FIBO_DEBUG */ + +#undef static diff --git a/ompi/mca/topo/treematch/treematch/k-partitioning.c b/ompi/mca/topo/treematch/treematch/k-partitioning.c new file mode 100644 index 00000000000..f035ffa24a1 --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/k-partitioning.c @@ -0,0 +1,339 @@ +#include +#include +#include "k-partitioning.h" +#include "tm_mt.h" +#include "tm_verbose.h" + +void memory_allocation(PriorityQueue ** Q, PriorityQueue ** Qinst, double *** D, int n, int k); +void initialization(int * const part, double ** const matrice, PriorityQueue * const Qpart, PriorityQueue * const Q, PriorityQueue * const Qinst, double ** const D, int n, int k, int * const deficit, int * const surplus); +void algo(int * const part, double ** const matrice, PriorityQueue * const Qpart, PriorityQueue * const Q, PriorityQueue * const Qinst, double ** const D, int n, int * const deficit, int * const surplus); +double nextGain(PriorityQueue * const Qpart, PriorityQueue * const Q, int * const deficit, int * const surplus); +void balancing(int n, int deficit, int surplus, double ** const D, int * const part); +void destruction(PriorityQueue * Qpart, PriorityQueue * Q, 
PriorityQueue * Qinst, double ** D, int n, int k); + +void allocate_vertex2(int u, int *res, double **comm, int n, int *size, int max_size); +double eval_cost2(int *,int,double **); +int *kpartition_greedy2(int k, double **comm, int n, int nb_try_max, int *constraints, int nb_constraints); +int* build_p_vector(double **comm, int n, int k, int greedy_trials, int * constraints, int nb_constraints); + +int* kPartitioning(double ** comm, int n, int k, int * constraints, int nb_constraints, int greedy_trials) +{ + /* ##### declarations & allocations ##### */ + + PriorityQueue Qpart, *Q = NULL, *Qinst = NULL; + double **D = NULL; + int deficit, surplus, *part = NULL; + int real_n = n-nb_constraints; + + part = build_p_vector(comm, n, k, greedy_trials, constraints, nb_constraints); + + memory_allocation(&Q, &Qinst, &D, real_n, k); + + /* ##### Initialization ##### */ + + initialization(part, comm, &Qpart, Q, Qinst, D, real_n, k, &deficit, &surplus); + + /* ##### Main loop ##### */ + while((nextGain(&Qpart, Q, &deficit, &surplus))>0) + { + algo(part, comm, &Qpart, Q, Qinst, D, real_n, &deficit, &surplus); + } + + /* ##### Balancing the partition ##### */ + balancing(real_n, deficit, surplus, D, part); /*if partition isn't balanced we have to make one last move*/ + + /* ##### Memory deallocation ##### */ + destruction(&Qpart, Q, Qinst, D, real_n, k); + + return part; +} + +void memory_allocation(PriorityQueue ** Q, PriorityQueue ** Qinst, double *** D, int n, int k) +{ + int i; + *Q = calloc(k, sizeof(PriorityQueue)); /*one Q for each partition*/ + *Qinst = calloc(n, sizeof(PriorityQueue)); /*one Qinst for each vertex*/ + *D = malloc(sizeof(double *) * n); /*D's size is n * k*/ + for(i=0; i < n; ++i) + (*D)[i] = calloc(k, sizeof(double)); +} + +void initialization(int * const part, double ** const matrice, PriorityQueue * const Qpart, PriorityQueue * const Q, PriorityQueue * const Qinst, double ** const D, int n, int k, int * const deficit, int * const surplus) +{ + int 
i,j; + + /* ##### PriorityQueue initializations ##### */ + /* We initialize Qpart with a size of k because it contains the subsets' indexes. */ + PQ_init(Qpart, k); + + /* We initialize each Q[i] with a size of n because each vertex is in one of these queues at any time. */ + /* However, we could set a size of (n/k)+1 as this is the maximum size of a subset when the partition is not balanced. */ + for(i=0; i= CRITICAL) + fprintf(stderr,"Error Max element in priority queue negative!\n"); + exit(-1); + } + *surplus = j; /*this subset becomes surplus*/ + + for(v=0; v < n; ++v) /*we scan through all edges (u,v) */ + { + j = part[u]; /*we set j to the starting subset */ + D[v][j]= D[v][j] - matrice[u][v]; /*we compute the new D[v, i] (here j has the value of the starting subset of u, that's why we say i) */ + PQ_adjustKey(&Qinst[v], j, D[v][j]); /*we update this gain in Qinst[v]*/ + j = *surplus; /*we put back the arrival subset in j*/ + D[v][j] = D[v][j] + matrice[u][v]; /*matrice[u][v]; we compute the new D[v, j]*/ + PQ_adjustKey(&Qinst[v], j, D[v][j]);/*we update this gain in Qinst[v]*/ + d = PQ_findMaxKey(&Qinst[v]) - D[v][part[v]]; /*we compute v's new highest possible gain*/ + PQ_adjustKey(&Q[part[v]], v, d); /*we update it in Q[p[v]]*/ + d = PQ_findMaxKey(&Q[part[v]]); /*we get the highest possible gain in v's subset*/ + PQ_adjustKey(Qpart, part[v], d); /*we update it in Qpart*/ + } + part[u] = *surplus; /*we move u from i to j (here surplus has the value of j the arrival subset)*/ + + d = PQ_findMaxKey(&Qinst[u]) - D[u][part[u]]; /*we compute the new u's highest possible gain*/ + if(!PQ_isEmpty(&Qinst[u])) /*if at least one more move of u is possible*/ + PQ_insert(&Q[part[u]], u, d); /*we insert u in the Q queue of its new subset*/ + PQ_adjustKey(Qpart, part[u], d); /*we update the new highest possible gain in u's subset*/ +} + +double nextGain(PriorityQueue * const Qpart, PriorityQueue * const Q, int * const deficit, int * const surplus) +{ + double res;
if(*deficit == *surplus) /*if the current partition is balanced*/ + res = PQ_findMaxKey(Qpart); /*we get the highest possible gain*/ + else /*the current partition is not balanced*/ + res = PQ_findMaxKey(&Q[*surplus]); /*we get the highest possible gain from surplus*/ + return res; +} + +void balancing(int n, int deficit, int surplus, double ** const D, int * const part) +{ + if(surplus != deficit) /*if the current partition is not balanced*/ + { + int i; + PriorityQueue moves; /*we use a queue to store the possible moves from surplus to deficit*/ + PQ_init(&moves, n); + for(i=0; i= max_size) + continue; + /* find a vertex not already partitionned*/ + do{ + /* call the mersenne twister PRNG of tm_mt.c*/ + j = genrand_int32() % n; + } while ( res[j] != -1 ); + /* allocate and update size of partition*/ + res[j] = i; + /* printf("random: %d -> %d\n",j,i); */ + size[i]++; + } + + /* allocate each unallocated vertices in the partition that maximize the communication*/ + for( i = 0 ; i < n ; ++i ) + if( res[i] == -1) + allocate_vertex2(i, res, comm, n-nb_constraints, size, max_size); + + cost = eval_cost2(res,n-nb_constraints,comm); + /*print_1D_tab(res,n); + printf("cost=%.2f\n",cost);*/ + if((cost best_cost)){ + best_cost = cost; + best_part = res[i]; + } + } + } + + /* printf("size[%d]: %d\n",best_part, size[best_part]);*/ + /* printf("putting(%.2f): %d -> %d\n",best_cost, u, best_part); */ + + res[u] = best_part; + size[best_part]++; +} + +double eval_cost2(int *partition, int n, double **comm) +{ + double cost = 0; + int i,j; + + for( i = 0 ; i < n ; ++i ) + for( j = i+1 ; j < n ; ++j ) + if(partition[i] != partition[j]) + cost += comm[i][j]; + + return cost; +} + +int* build_p_vector(double **comm, int n, int k, int greedy_trials, int * constraints, int nb_constraints) +{ + int * part = NULL; + if(greedy_trials>0) /*if greedy_trials > 0 then we use kpartition_greedy with greedy_trials trials*/ + { + part = kpartition_greedy2(k, comm, n, greedy_trials, constraints, 
nb_constraints); + } + else + { + int * size = calloc(k, sizeof(int)); + int i,j; + int nodes_per_part = n/k; + int nb_real_nodes = n-nb_constraints; + part = malloc(sizeof(int) * n); + for(i=0; i 0 : use of kpartition_greedy with greedy_trials number of trials + */ + +int* kPartitioning(double ** comm, int n, int k, int * const constraints, int nb_constraints, int greedy_trials); + +#endif /*K_PARTITIONING*/ diff --git a/ompi/mca/topo/treematch/treematch/tgt_map.c b/ompi/mca/topo/treematch/treematch/tgt_map.c deleted file mode 100644 index ea0a35542ad..00000000000 --- a/ompi/mca/topo/treematch/treematch/tgt_map.c +++ /dev/null @@ -1,56 +0,0 @@ -#include -#include -#include -//#include "tm_hwloc.h" -#include "tm_tree.h" -#include "tm_mapping.h" -#include "tm_timings.h" - - - -int main(int argc, char**argv){; - tree_t *comm_tree=NULL; - double **comm,**arch; - tm_topology_t *topology; - int nb_processes,nb_cores; - int *sol,*k; - if(argc<3){ - fprintf(stderr,"Usage: %s \n",argv[0]); - return -1; - } - - topology=tgt_to_tm(argv[1],&arch); - optimize_topology(&topology); - nb_processes=build_comm(argv[2],&comm); - sol=(int*)MALLOC(sizeof(int)*nb_processes); - - nb_cores=nb_processing_units(topology); - k=(int*)MALLOC(sizeof(int)*nb_cores); - // TreeMatchMapping(nb_processes,nb_cores,comm,sol); - - if(nb_processes>nb_cores){ - fprintf(stderr,"Error: to many processes (%d) for this topology (%d nodes)\n",nb_processes,nb_cores); - exit(-1); - } - TIC; - comm_tree=build_tree_from_topology(topology,comm,nb_processes,NULL,NULL); - map_topology_simple(topology,comm_tree,sol,k); - double duration=TOC; - printf("mapping duration: %f\n",duration); - printf("TreeMatch: "); - print_sol_inv(nb_processes,sol,comm,arch); - //print_1D_tab(k,nb_cores); -// display_other_heuristics(topology,nb_processes,comm,arch); - - //display_tab(arch,nb_cores); - - FREE_topology(topology); - //FREE_tree(comm_tree); - FREE(sol); - FREE(comm); - FREE(arch); - - - - return 0; -} diff --git 
a/ompi/mca/topo/treematch/treematch/tgt_to_mat.c b/ompi/mca/topo/treematch/treematch/tgt_to_mat.c deleted file mode 100644 index 1e65a21a941..00000000000 --- a/ompi/mca/topo/treematch/treematch/tgt_to_mat.c +++ /dev/null @@ -1,31 +0,0 @@ -#include -#include -#include -#include "tm_hwloc.h" -#include "tm_tree.h" -#include "tm_mapping.h" -#include "tm_timings.h" - - - -int main(int argc, char**argv){; - tm_topology_t *topology; - int nb_cores; - double **arch; - if(argc<2){ - fprintf(stderr,"Usage: %s \n",argv[0]); - return -1; - } - - topology=tgt_to_tm(argv[1],&arch); - nb_cores=nb_nodes(topology); - - display_tab(arch,nb_cores); - - FREE_topology(topology); - FREE(arch); - - - - return 0; -} diff --git a/ompi/mca/topo/treematch/treematch/tm_bucket.c b/ompi/mca/topo/treematch/treematch/tm_bucket.c index 28e7664574e..88719cf925e 100644 --- a/ompi/mca/topo/treematch/treematch/tm_bucket.c +++ b/ompi/mca/topo/treematch/treematch/tm_bucket.c @@ -31,7 +31,7 @@ static int ilog2(int val) static int verbose_level = ERROR; -bucket_list_t global_bl = {0}; +static bucket_list_t global_bl; int tab_cmp(const void*,const void*); int old_bucket_id(int,int,bucket_list_t); @@ -47,12 +47,12 @@ void fill_buckets(bucket_list_t); int is_power_of_2(int); void partial_sort(bucket_list_t *,double **,int); void next_bucket_elem(bucket_list_t,int *,int *); -int add_edge_3(tree_t *,tree_t *,int,int,int *); -void FREE_bucket(bucket_t *); -void FREE_tab_bucket(bucket_t **,int); -void FREE_bucket_list(bucket_list_t); -void partial_update_val (int nb_args, void **args); - +int add_edge_3(tm_tree_t *,tm_tree_t *,int,int,int *); +void free_bucket(bucket_t *); +void free_tab_bucket(bucket_t **,int); +void free_bucket_list(bucket_list_t); +void partial_update_val (int nb_args, void **args, int thread_id); +double bucket_grouping(tm_affinity_mat_t *,tm_tree_t *, tm_tree_t *, int ,int); int tab_cmp(const void* x1,const void* x2) { int *e1 = NULL,*e2 = NULL,i1,i2,j1,j2; @@ -146,7 +146,7 @@ void 
check_bucket(bucket_t *b,double **tab,double inf, double sup) j = b->bucket[k].j; if((tab[i][j] < inf) || (tab[i][j] > sup)){ if(verbose_level >= CRITICAL) - printf("[%d] (%d,%d):%f not in [%f,%f]\n",k,i,j,tab[i][j],inf,sup); + fprintf(stderr,"[%d] (%d,%d):%f not in [%f,%f]\n",k,i,j,tab[i][j],inf,sup); exit(-1); } } @@ -197,16 +197,21 @@ void add_to_bucket(int id,int i,int j,bucket_list_t bucket_list) n = bucket_list->nb_buckets; size = N*N/n; /* display_bucket(bucket);*/ - bucket->bucket = (coord*)realloc(bucket->bucket,sizeof(coord)*(size + bucket->bucket_len)); - bucket->bucket_len += size; - if(verbose_level >= DEBUG){ - printf("MALLOC/realloc: %d\n",id); - printf("(%d,%d)\n",i,j); - display_bucket(bucket); - printf("\n"); + printf("Extending bucket %d (%p) from size %d to size %d!\n", + id, (void*)bucket->bucket, bucket->nb_elem, bucket->nb_elem+size); } + bucket->bucket = (coord*)REALLOC(bucket->bucket,sizeof(coord)*(size + bucket->bucket_len)); + bucket->bucket_len += size; + + /* if(verbose_level >= DEBUG){ */ + /* printf("MALLOC/realloc: %d\n",id); */ + /* printf("(%d,%d)\n",i,j); */ + /* display_bucket(bucket); */ + /* printf("\n"); */ + /* } */ + } bucket->bucket[bucket->nb_elem].i=i; @@ -289,7 +294,13 @@ void partial_sort(bucket_list_t *bl,double **tab,int N) bucket_list_t bucket_list; int nb_buckets, nb_bits; - /* after these operations, nb_bucket is a power of 2 interger close to log2(N)*/ + if( N <= 0){ + if(verbose_level >= ERROR ) + fprintf(stderr,"Error: tryng to group a matrix of size %d<=0!\n",N); + return; + } + + /* after these operations, nb_buckets is a power of 2 interger close to log2(N)*/ nb_buckets = (int)floor(CmiLog2(N)); @@ -404,7 +415,7 @@ void next_bucket_elem(bucket_list_t bucket_list,int *i,int *j) } -int add_edge_3(tree_t *tab_node, tree_t *parent,int i,int j,int *nb_groups) +int add_edge_3(tm_tree_t *tab_node, tm_tree_t *parent,int i,int j,int *nb_groups) { /* printf("%d <-> %d ?\n",tab_node[i].id,tab_node[j].id); */ 
if((!tab_node[i].parent) && (!tab_node[j].parent)){ @@ -453,7 +464,7 @@ int add_edge_3(tree_t *tab_node, tree_t *parent,int i,int j,int *nb_groups) return 0; } -int try_add_edge(tree_t *tab_node, tree_t *parent,int arity,int i,int j,int *nb_groups) +int try_add_edge(tm_tree_t *tab_node, tm_tree_t *parent,int arity,int i,int j,int *nb_groups) { assert( i != j ); @@ -481,40 +492,40 @@ int try_add_edge(tree_t *tab_node, tree_t *parent,int arity,int i,int j,int *nb_ } } -void FREE_bucket(bucket_t *bucket) +void free_bucket(bucket_t *bucket) { FREE(bucket->bucket); FREE(bucket); } -void FREE_tab_bucket(bucket_t **bucket_tab,int N) +void free_tab_bucket(bucket_t **bucket_tab,int N) { int i; for( i = 0 ; i < N ; i++ ) - FREE_bucket(bucket_tab[i]); + free_bucket(bucket_tab[i]); FREE(bucket_tab); } -void FREE_bucket_list(bucket_list_t bucket_list) +void free_bucket_list(bucket_list_t bucket_list) { - /* Do not FREE the tab field it is used elsewhere */ - FREE_tab_bucket(bucket_list->bucket_tab,bucket_list->nb_buckets); + /* Do not free the tab field it is used elsewhere */ + free_tab_bucket(bucket_list->bucket_tab,bucket_list->nb_buckets); FREE(bucket_list->pivot); FREE(bucket_list->pivot_tree); FREE(bucket_list); } -void partial_update_val (int nb_args, void **args){ +void partial_update_val (int nb_args, void **args, int thread_id){ int inf = *(int*)args[0]; int sup = *(int*)args[1]; - affinity_mat_t *aff_mat = (affinity_mat_t*)args[2]; - tree_t *new_tab_node = (tree_t*)args[3]; + tm_affinity_mat_t *aff_mat = (tm_affinity_mat_t*)args[2]; + tm_tree_t *new_tab_node = (tm_tree_t*)args[3]; double *res=(double*)args[4]; int l; - if(nb_args != 6){ + if(nb_args != 5){ if(verbose_level >= ERROR) - fprintf(stderr,"Wrong number of args in %s: %d\n",__func__, nb_args); + fprintf(stderr,"(Thread: %d) Wrong number of args in %s: %d\n",thread_id, __func__, nb_args); exit(-1); } @@ -524,7 +535,7 @@ void partial_update_val (int nb_args, void **args){ } } -void 
bucket_grouping(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_node, +double bucket_grouping(tm_affinity_mat_t *aff_mat,tm_tree_t *tab_node, tm_tree_t *new_tab_node, int arity,int M) { bucket_list_t bucket_list; @@ -536,10 +547,12 @@ void bucket_grouping(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_n int N = aff_mat->order; double **mat = aff_mat->mat; - verbose_level = get_verbose_level(); + verbose_level = tm_get_verbose_level(); if(verbose_level >= INFO ) printf("starting sort of N=%d elements\n",N); + + TIC; partial_sort(&bucket_list,mat,N); duration = TOC; @@ -662,8 +675,8 @@ void bucket_grouping(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_n printf("Bucket: %d, indice:%d\n",bucket_list->cur_bucket,bucket_list->bucket_indice); printf("val=%f\n",val); } - FREE_bucket_list(bucket_list); + free_bucket_list(bucket_list); - /* exit(-1); */ - /* display_grouping(new_tab_node,M,arity,val); */ + return val; } + diff --git a/ompi/mca/topo/treematch/treematch/tm_bucket.h b/ompi/mca/topo/treematch/treematch/tm_bucket.h index 17e70603983..433d4816466 100644 --- a/ompi/mca/topo/treematch/treematch/tm_bucket.h +++ b/ompi/mca/topo/treematch/treematch/tm_bucket.h @@ -28,7 +28,8 @@ typedef struct{ typedef _bucket_list_t *bucket_list_t; -void bucket_grouping(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_node, - int arity,int M); -int try_add_edge(tree_t *tab_node, tree_t *parent,int arity,int i,int j,int *nb_groups); +double bucket_grouping(tm_affinity_mat_t *aff_mat,tm_tree_t *tab_node, tm_tree_t *new_tab_node, + int arity,int M); +int try_add_edge(tm_tree_t *tab_node, tm_tree_t *parent,int arity,int i,int j,int *nb_groups); #endif + diff --git a/ompi/mca/topo/treematch/treematch/tm_hwloc.c b/ompi/mca/topo/treematch/treematch/tm_hwloc.c deleted file mode 100644 index 4a85588cb99..00000000000 --- a/ompi/mca/topo/treematch/treematch/tm_hwloc.c +++ /dev/null @@ -1,278 +0,0 @@ -#include "opal/mca/hwloc/hwloc-internal.h" -#include 
"tm_tree.h" -#include "tm_mapping.h" -#include -#include "tm_verbose.h" - - -double ** tm_topology_to_arch(tm_topology_t *topology,double *cost); -tm_topology_t * tgt_to_tm(char *filename,double **pcost); -int topo_nb_proc(hwloc_topology_t topology,int N); -double ** topology_to_arch(hwloc_topology_t topology); -int symetric(hwloc_topology_t topology); -tm_topology_t* hwloc_to_tm(char *filename,double **pcost); -tm_topology_t* get_local_topo_with_hwloc(void); - - - - -/* transform a tgt scotch file into a topology file*/ -tm_topology_t * tgt_to_tm(char *filename, double **pcost) -{ - tm_topology_t *topology = NULL; - FILE *pf = NULL; - char line[1024]; - char *s = NULL; - double *cost = NULL; - int i; - - - - pf = fopen(filename,"r"); - if(!pf){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"Cannot open %s\n",filename); - exit(-1); - } - - if(get_verbose_level() >= INFO) - printf("Reading TGT file: %s\n",filename); - - - fgets(line,1024,pf); - - s = strstr(line,"tleaf"); - if(!s){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"Syntax error! 
%s is not a tleaf file\n",filename); - exit(-1); - } - - s += 5; - while(isspace(*s)) - s++; - - topology = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); - topology->nb_levels = atoi(strtok(s," "))+1; - topology->arity = (int*)MALLOC(sizeof(int)*topology->nb_levels); - cost = (double*)CALLOC(topology->nb_levels,sizeof(double)); - - for( i = 0 ; i < topology->nb_levels-1 ; i++ ){ - topology->arity[i] = atoi(strtok(NULL," ")); - cost[i] = atoi(strtok(NULL," ")); - } - - topology->arity[topology->nb_levels-1] = 0; - /* cost[topology->nb_levels-1]=0; */ - - /*aggregate costs*/ - for( i = topology->nb_levels-2 ; i >= 0 ; i-- ) - cost[i] += cost[i+1]; - - build_synthetic_proc_id(topology); - - *pcost = cost; - fclose(pf); - /* - topology->arity[0]=nb_proc; - topology->nb_levels=decompose((int)ceil((1.0*nb_obj)/nb_proc),1,topology->arity); - printf("levels=%d\n",topology->nb_levels); - */ - if(get_verbose_level() >= INFO) - printf("Topology built from %s!\n",filename); - - return topology; -} - -int topo_nb_proc(hwloc_topology_t topology,int N) -{ - hwloc_obj_t *objs = NULL; - int nb_proc; - - objs = (hwloc_obj_t*)MALLOC(sizeof(hwloc_obj_t)*N); - objs[0] = hwloc_get_next_obj_by_type(topology,HWLOC_OBJ_PU,NULL); - nb_proc = 1 + hwloc_get_closest_objs(topology,objs[0],objs+1,N-1); - FREE(objs); - return nb_proc; -} - - -double ** topology_to_arch(hwloc_topology_t topology) -{ - int nb_proc,i,j; - hwloc_obj_t obj_proc1,obj_proc2,obj_res; - double **arch = NULL; - - nb_proc = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU); - arch = (double**)MALLOC(sizeof(double*)*nb_proc); - for( i = 0 ; i < nb_proc ; i++ ){ - obj_proc1 = hwloc_get_obj_by_type(topology,HWLOC_OBJ_PU,i); - arch[obj_proc1->os_index] = (double*)MALLOC(sizeof(double)*nb_proc); - for( j = 0 ; j < nb_proc ; j++ ){ - obj_proc2 = hwloc_get_obj_by_type(topology,HWLOC_OBJ_PU,j); - obj_res = hwloc_get_common_ancestor_obj(topology,obj_proc1,obj_proc2); - /* printf("arch[%d][%d] <- 
%ld\n",obj_proc1->os_index,obj_proc2->os_index,*((long int*)(obj_res->userdatab))); */ - arch[obj_proc1->os_index][obj_proc2->os_index]=speed(obj_res->depth+1); - } - } - return arch; -} - -int symetric(hwloc_topology_t topology) -{ - int depth,i,topodepth = hwloc_topology_get_depth(topology); - unsigned int arity; - hwloc_obj_t obj; - for ( depth = 0; depth < topodepth-1 ; depth++ ) { - int N = hwloc_get_nbobjs_by_depth(topology, depth); - obj = hwloc_get_next_obj_by_depth (topology,depth,NULL); - arity = obj->arity; - - /* printf("Depth=%d, N=%d, Arity:%d\n",depth,N,arity); */ - for (i = 1; i < N; i++ ){ - obj = hwloc_get_next_obj_by_depth (topology,depth,obj); - if( obj->arity != arity){ - /* printf("[%d]: obj->arity=%d, arity=%d\n",i,obj->arity,arity); */ - return 0; - } - } - } - return 1; -} - -tm_topology_t* hwloc_to_tm(char *filename,double **pcost) -{ - hwloc_topology_t topology; - tm_topology_t *res = NULL; - hwloc_obj_t *objs = NULL; - unsigned topodepth,depth; - int nb_nodes,i; - double *cost; - int err; - - /* Build the topology */ - hwloc_topology_init(&topology); - err = hwloc_topology_set_xml(topology,filename); - if(err == -1){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"Error: %s is a bad xml topology file!\n",filename); - exit(-1); - } - - hwloc_topology_ignore_all_keep_structure(topology); - hwloc_topology_load(topology); - - - /* Test if symetric */ - if(!symetric(topology)){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"%s not symetric!\n",filename); - exit(-1); - } - - /* work on depth */ - topodepth = hwloc_topology_get_depth(topology); - - res = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); - res->nb_levels = topodepth; - res->node_id = (int**)MALLOC(sizeof(int*)*res->nb_levels); - res->nb_nodes = (int*)MALLOC(sizeof(int)*res->nb_levels); - res->arity = (int*)MALLOC(sizeof(int)*res->nb_levels); - - if(get_verbose_level() >= INFO) - printf("topodepth = %d\n",topodepth); - - /* Build TreeMatch topology */ - for( depth = 
0 ; depth < topodepth ; depth++ ){ - nb_nodes = hwloc_get_nbobjs_by_depth(topology, depth); - res->nb_nodes[depth] = nb_nodes; - res->node_id[depth] = (int*)MALLOC(sizeof(int)*nb_nodes); - - objs = (hwloc_obj_t*)MALLOC(sizeof(hwloc_obj_t)*nb_nodes); - objs[0] = hwloc_get_next_obj_by_depth(topology,depth,NULL); - hwloc_get_closest_objs(topology,objs[0],objs+1,nb_nodes-1); - res->arity[depth] = objs[0]->arity; - - if(get_verbose_level() >= INFO) - printf("%d(%d):",res->arity[depth],nb_nodes); - - /* Build process id tab */ - for (i = 0; i < nb_nodes; i++){ - res->node_id[depth][i] = objs[i]->os_index; - /* if(depth==topodepth-1) */ - } - FREE(objs); - } - - cost = (double*)CALLOC(res->nb_levels,sizeof(double)); - for(i=0; inb_levels; i++){ - cost[i] = speed(i); - } - - *pcost = cost; - - - /* Destroy topology object. */ - hwloc_topology_destroy(topology); - if(get_verbose_level() >= INFO) - printf("\n"); - return res; -} - -tm_topology_t* get_local_topo_with_hwloc(void) -{ - hwloc_topology_t topology; - tm_topology_t *res = NULL; - hwloc_obj_t *objs = NULL; - unsigned topodepth,depth; - int nb_nodes,i; - - /* Build the topology */ - hwloc_topology_init(&topology); - hwloc_topology_ignore_all_keep_structure(topology); - hwloc_topology_load(topology); - - /* Test if symetric */ - if(!symetric(topology)){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"Local toplogy not symetric!\n"); - exit(-1); - } - - /* work on depth */ - topodepth = hwloc_topology_get_depth(topology); - - res = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); - res->nb_levels = topodepth; - res->node_id = (int**)MALLOC(sizeof(int*)*res->nb_levels); - res->nb_nodes = (int*)MALLOC(sizeof(int)*res->nb_levels); - res->arity = (int*)MALLOC(sizeof(int)*res->nb_levels); - - /* Build TreeMatch topology */ - for( depth = 0 ; depth < topodepth ; depth++ ){ - nb_nodes = hwloc_get_nbobjs_by_depth(topology, depth); - res->nb_nodes[depth] = nb_nodes; - res->node_id[depth] = 
(int*)MALLOC(sizeof(int)*nb_nodes); - - objs = (hwloc_obj_t*)MALLOC(sizeof(hwloc_obj_t)*nb_nodes); - objs[0] = hwloc_get_next_obj_by_depth(topology,depth,NULL); - hwloc_get_closest_objs(topology,objs[0],objs+1,nb_nodes-1); - res->arity[depth] = objs[0]->arity; - - /* printf("%d:",res->arity[depth]); */ - - /* Build process id tab */ - for (i = 0; i < nb_nodes; i++){ - res->node_id[depth][i] = objs[i]->os_index; - /* if(depth==topodepth-1) */ - } - FREE(objs); - } - - /* Destroy HWLOC topology object. */ - hwloc_topology_destroy(topology); - - /* printf("\n"); */ - return res; -} - diff --git a/ompi/mca/topo/treematch/treematch/tm_hwloc.h b/ompi/mca/topo/treematch/treematch/tm_hwloc.h deleted file mode 100644 index 7ba09d3e518..00000000000 --- a/ompi/mca/topo/treematch/treematch/tm_hwloc.h +++ /dev/null @@ -1,7 +0,0 @@ -#include "opal/mca/hwloc/hwloc-internal.h" -#include "tm_tree.h" - -void hwloc_topology_tag(hwloc_topology_t topology); -tm_topology_t* hwloc_to_tm(char *filename,double **pcost); -tm_topology_t * tgt_to_tm(char *filename,double **pcost); -tm_topology_t* get_local_topo_with_hwloc(void); diff --git a/ompi/mca/topo/treematch/treematch/tm_kpartitioning.c b/ompi/mca/topo/treematch/treematch/tm_kpartitioning.c index 3aaed6a9fcc..4f56b49d694 100644 --- a/ompi/mca/topo/treematch/treematch/tm_kpartitioning.c +++ b/ompi/mca/topo/treematch/treematch/tm_kpartitioning.c @@ -1,13 +1,12 @@ #include "tm_mapping.h" #include "tm_mt.h" #include "tm_kpartitioning.h" +#include "k-partitioning.h" #include #include +#include "config.h" #define USE_KL_KPART 0 -#if USE_KL_KPART -#include "k-partitioning.h" -#endif /* USE_KL_KPART */ #define KL_KPART_GREEDY_TRIALS 0 static int verbose_level = ERROR; @@ -20,19 +19,18 @@ static int verbose_level = ERROR; int fill_tab(int **,int *,int,int,int,int); -void complete_com_mat(double ***,int,int); void complete_obj_weight(double **,int,int); void allocate_vertex(int,int *,com_mat_t *,int,int *,int); double eval_cost(int *, com_mat_t 
*); int *kpartition_greedy(int, com_mat_t *,int,int *,int); -constraint_t *split_constraints (int *,int,int,tm_topology_t *,int); +constraint_t *split_constraints (int *,int,int,tm_topology_t *,int, int); com_mat_t **split_com_mat(com_mat_t *,int,int,int *); int **split_vertices(int *,int,int,int *); -void FREE_tab_com_mat(com_mat_t **,int); -void FREE_tab_local_vertices(int **,int); -void FREE_const_tab(constraint_t *,int); -void kpartition_build_level_topology(tree_t *,com_mat_t *,int,int,tm_topology_t *, +void free_tab_com_mat(com_mat_t **,int); +void free_tab_local_vertices(int **,int); +void free_const_tab(constraint_t *,int); +void kpartition_build_level_topology(tm_tree_t *,com_mat_t *,int,int,tm_topology_t *, int *,int *,int,double *,double *); @@ -50,10 +48,14 @@ void allocate_vertex(int u, int *res, com_mat_t *com_mat, int n, int *size, int best_part = res[i]; break; } + }else{ for( i = 0 ; i < n ; i++){ if (( res[i] != -1 ) && ( size[res[i]] < max_size )){ cost = (((i)n)) ?com_mat->comm[u][i]:0; + /* if((n<=16) && (u==8)){ */ + /* printf("u=%d, i=%d: %f\n",u, i, cost); */ + /* } */ if (( cost > best_cost)){ best_cost = cost; best_part = res[i]; @@ -61,8 +63,10 @@ void allocate_vertex(int u, int *res, com_mat_t *com_mat, int n, int *size, int } } } - /* printf("size[%d]: %d\n",best_part, size[best_part]);*/ - /* printf("putting(%.2f): %d -> %d\n",best_cost, u, best_part); */ + /* if(n<=16){ */ + /* printf("size[%d]: %d\n",best_part, size[best_part]); */ + /* printf("putting(%.2f): %d -> %d\n",best_cost, u, best_part); */ + /* } */ res[u] = best_part; size[best_part]++; @@ -83,25 +87,45 @@ double eval_cost(int *partition, com_mat_t *com_mat) int *kpartition_greedy(int k, com_mat_t *com_mat, int n, int *constraints, int nb_constraints) { - int *res = NULL, *best_res=NULL, *size = NULL; + int *partition = NULL, *best_partition=NULL, *size = NULL; int i,j,nb_trials; int max_size, max_val; double cost, best_cost = -1; int start, end; int dumb_id, nb_dumb; + 
int vl = tm_get_verbose_level(); + if(nb_constraints > n){ + if(vl >= ERROR){ + fprintf(stderr,"Error more constraints (%d) than the problem size (%d)!\n",nb_constraints, n); + } + return NULL; + } + + max_size = n/k; + + if(vl >= DEBUG){ + printf("max_size = %d (n=%d,k=%d)\ncom_mat->n-1=%d\n",max_size,n,k,com_mat->n-1); + printf("nb_constraints = %d\n",nb_constraints); + + if(n<=16){ + printf("Constraints: ");print_1D_tab(constraints,nb_constraints); + } + } + /* if(com_mat->n){ */ + /* printf ("val [n-1][0]= %f\n",com_mat->comm[com_mat->n-1][0]); */ + /* } */ for( nb_trials = 0 ; nb_trials < MAX_TRIALS ; nb_trials++ ){ - res = (int *)MALLOC(sizeof(int)*n); + partition = (int *)MALLOC(sizeof(int)*n); for ( i = 0 ; i < n ; i ++ ) - res[i] = -1; + partition[i] = -1; size = (int *)CALLOC(k,sizeof(int)); - max_size = n/k; - /*printf("Constraints: ");print_1D_tab(constraints,nb_constraints);*/ + /* put "dumb" vertices in the correct partition if there are any*/ if (nb_constraints){ @@ -120,12 +144,13 @@ int *kpartition_greedy(int k, com_mat_t *com_mat, int n, int *constraints, int number of leaves of the subtree (n/k) and the number of constraints */ nb_dumb = n/k - (end-start); - /*printf("max_val: %d, nb_dumb=%d, start=%d, end=%d, size=%d\n",max_val, nb_dumb, start, end, n/k);*/ - + /* if(n<=16){ */ + /* printf("max_val: %d, nb_dumb=%d, start=%d, end=%d, size=%d\n",max_val, nb_dumb, start, end, n/k); */ + /* } */ /* dumb vertices are the one with highest indices: put them in the ith partitions*/ for( j = 0; j < nb_dumb; j ++ ){ - res[dumb_id] = i; + partition[dumb_id] = i; dumb_id--; } /* increase the size of the ith partition accordingly*/ @@ -133,7 +158,10 @@ int *kpartition_greedy(int k, com_mat_t *com_mat, int n, int *constraints, int start=end; } } - /*printf("After dumb vertices mapping: ");print_1D_tab(res,n);*/ + /* if(n<=16){ */ + /* printf("After dumb vertices mapping: ");print_1D_tab(partition,n); */ + /* } */ + /* choose k initial "true" vertices at 
random and put them in a different partition */ for ( i = 0 ; i < k ; i ++ ){ @@ -144,35 +172,39 @@ int *kpartition_greedy(int k, com_mat_t *com_mat, int n, int *constraints, int do{ /* call the mersenne twister PRNG of tm_mt.c*/ j = genrand_int32() % n; - } while ( res[j] != -1 ); + } while ( partition[j] != -1 ); /* allocate and update size of partition*/ - res[j] = i; - /* printf("random: %d -> %d\n",j,i); */ + partition[j] = i; + /* if(n<=16){ */ + /* printf("random: %d -> %d\n",j,i); */ + /* } */ size[i]++; } /* allocate each unaloacted vertices in the partition that maximize the communication*/ for( i = 0 ; i < n ; i ++) - if( res[i] == -1) - allocate_vertex(i, res, com_mat, n, size, max_size); - - cost = eval_cost(res,com_mat); - /*print_1D_tab(res,n); - printf("cost=%.2f\n",cost);*/ + if( partition[i] == -1) + allocate_vertex(i, partition, com_mat, n, size, max_size); + + cost = eval_cost(partition,com_mat); + /* if(n<=16){ */ + /* print_1D_tab(partition,n); */ + /* printf("cost=%.2f\n",cost); */ + /* } */ if((cost=DEBUG){ + printf("Step %d\n",i); + printf("\tConstraint: "); print_1D_tab(constraints, nb_constraints); + printf("\tSub constraint: "); print_1D_tab(const_tab[i].constraints, end-start); + } + + if(end-start > N/k){ + if(vl >= ERROR){ + fprintf(stderr, "Error in spliting constraint at step %d. 
N=%d k= %d, length = %d\n", i, N, k, end-start); + } + FREE(const_tab); + return NULL; + } const_tab[i].id = i; start = end; } @@ -223,6 +279,7 @@ constraint_t *split_constraints (int *constraints, int nb_constraints, int k, tm } +/* split the com_mat of order n in k partiton according to parmutition table*/ com_mat_t **split_com_mat(com_mat_t *com_mat, int n, int k, int *partition) { com_mat_t **res = NULL, *sub_com_mat; @@ -236,6 +293,8 @@ com_mat_t **split_com_mat(com_mat_t *com_mat, int n, int k, int *partition) if(verbose_level >= DEBUG){ printf("Partition: "); print_1D_tab(partition,n); display_tab(com_mat->comm,com_mat->n); + printf("m=%d,n=%d,k=%d\n",m,n,k); + printf("perm=%p\n", (void*)perm); } perm = (int*)MALLOC(sizeof(int)*m); @@ -243,10 +302,22 @@ com_mat_t **split_com_mat(com_mat_t *com_mat, int n, int k, int *partition) /* build perm such that submat[i][j] correspond to com_mat[perm[i]][perm[j]] according to the partition*/ s = 0; - for( j = 0; j < com_mat->n; j ++) /* check only non zero element of of com_mat*/ + /* The partition is of size n. n can be larger than the communication matrix order + as only the input problem are in the communication matrix while n is of the size + of all the element (including the added one where it is possible to map computation) : + we can have more compute units than processes*/ + for( j = 0; j < com_mat->n; j ++) if ( partition[j] == cur_part ) perm[s++] = j; + if(s>m){ + if(verbose_level >= CRITICAL){ + fprintf(stderr,"Partition: "); print_1D_tab(partition,n); + display_tab(com_mat->comm,com_mat->n); + fprintf(stderr,"too many elements of the partition for the permuation (s=%d>%d=m). 
n=%d, k=%d, cur_part= %d\n",s,m,n,k, cur_part); + } + exit(-1); + } /* s is now the size of the non zero sub matrix for this partition*/ /* built a sub-matrix for partition cur_part*/ sub_mat = (double **) MALLOC(sizeof(double *) * s); @@ -263,7 +334,7 @@ com_mat_t **split_com_mat(com_mat_t *com_mat, int n, int k, int *partition) } } - sub_com_mat = (com_mat_t *)malloc(sizeof(com_mat_t)); + sub_com_mat = (com_mat_t *)MALLOC(sizeof(com_mat_t)); sub_com_mat -> n = s; sub_com_mat -> comm = sub_mat; @@ -274,7 +345,7 @@ com_mat_t **split_com_mat(com_mat_t *com_mat, int n, int k, int *partition) res[cur_part] = sub_com_mat; } - FREE(perm); + FREE(perm); return res; } @@ -310,7 +381,7 @@ int **split_vertices( int *vertices, int n, int k, int *partition) return res; } -void FREE_tab_com_mat(com_mat_t **mat,int k) +void free_tab_com_mat(com_mat_t **mat,int k) { int i,j; if( !mat ) @@ -320,11 +391,13 @@ void FREE_tab_com_mat(com_mat_t **mat,int k) for ( j = 0 ; j < mat[i]->n ; j ++) FREE( mat[i]->comm[j] ); FREE( mat[i]->comm ); + FREE(mat[i]); + } FREE(mat); } -void FREE_tab_local_vertices(int **mat, int k) +void free_tab_local_vertices(int **mat, int k) { int i; /* m=n/k; */ if( !mat ) @@ -337,7 +410,7 @@ void FREE_tab_local_vertices(int **mat, int k) } -void FREE_const_tab(constraint_t *const_tab, int k) +void free_const_tab(constraint_t *const_tab, int k) { int i; @@ -352,19 +425,32 @@ void FREE_const_tab(constraint_t *const_tab, int k) FREE(const_tab); } -void kpartition_build_level_topology(tree_t *cur_node, com_mat_t *com_mat, int N, int depth, +#if 0 +static void check_com_mat(com_mat_t *com_mat){ + int i,j; + + for( i = 0 ; i < com_mat->n ; i++ ) + for( j = 0 ; j < com_mat->n ; j++ ) + if(com_mat->comm[i][j]<0){ + printf("com_mat->comm[%d][%d]= %f\n",i,j,com_mat->comm[i][j]); + exit(-1); + } +} +#endif + +void kpartition_build_level_topology(tm_tree_t *cur_node, com_mat_t *com_mat, int N, int depth, tm_topology_t *topology, int *local_vertices, int *constraints, int 
nb_constraints, double *obj_weight, double *comm_speed) { com_mat_t **tab_com_mat = NULL; /* table of comunication matrix. We will have k of such comunication matrix, one for each subtree */ int k = topology->arity[depth]; - tree_t **tab_child = NULL; + tm_tree_t **tab_child = NULL; int *partition = NULL; int **tab_local_vertices = NULL; constraint_t *const_tab = NULL; int i; - verbose_level = get_verbose_level(); + verbose_level = tm_get_verbose_level(); /* if we are at the bottom of the tree set cur_node and return*/ @@ -376,8 +462,14 @@ void kpartition_build_level_topology(tree_t *cur_node, com_mat_t *com_mat, int N } + if(verbose_level >= DEBUG){ + printf("Partitionning Matrix of size %d (problem size= %d) in %d partitions\n", com_mat->n, N, k); + } + + /* check_com_mat(com_mat); */ + /* partition the com_matrix in k partitions*/ - partition = kpartition(topology->arity[depth], com_mat, N, constraints, nb_constraints); + partition = kpartition(k, com_mat, N, constraints, nb_constraints); /* split the communication matrix in k parts according to the partition just found above */ tab_com_mat = split_com_mat( com_mat, N, k, partition); @@ -386,12 +478,12 @@ void kpartition_build_level_topology(tree_t *cur_node, com_mat_t *com_mat, int N tab_local_vertices = split_vertices( local_vertices, N, k, partition); /* construct a tab of constraints of size k: one for each partitions*/ - const_tab = split_constraints (constraints, nb_constraints, k, topology, depth); + const_tab = split_constraints (constraints, nb_constraints, k, topology, depth, N); /* create the table of k nodes of the resulting sub-tree */ - tab_child = (tree_t **) CALLOC (k,sizeof(tree_t*)); + tab_child = (tm_tree_t **) CALLOC (k,sizeof(tm_tree_t*)); for( i = 0 ; i < k ; i++){ - tab_child[i] = (tree_t *) MALLOC(sizeof(tree_t)); + tab_child[i] = (tm_tree_t *) MALLOC(sizeof(tm_tree_t)); } /* for each child, proceeed recursively*/ @@ -407,29 +499,30 @@ void kpartition_build_level_topology(tree_t 
*cur_node, com_mat_t *com_mat, int N /* link the node with its child */ set_node( cur_node, tab_child, k, NULL, cur_node->id, 0, NULL, depth); - /* FREE local data*/ + /* free local data*/ FREE(partition); - FREE_tab_com_mat(tab_com_mat,k); - FREE_tab_local_vertices(tab_local_vertices,k); - FREE_const_tab(const_tab,k); + free_tab_com_mat(tab_com_mat,k); + free_tab_local_vertices(tab_local_vertices,k); + free_const_tab(const_tab,k); } -tree_t *kpartition_build_tree_from_topology(tm_topology_t *topology,double **comm,int N, int *constraints, int nb_constraints, double *obj_weight, double *com_speed) +tm_tree_t *kpartition_build_tree_from_topology(tm_topology_t *topology,double **comm,int N, int *constraints, int nb_constraints, double *obj_weight, double *com_speed) { int depth,i, K; - tree_t *root = NULL; + tm_tree_t *root = NULL; int *local_vertices = NULL; int nb_cores; com_mat_t com_mat; - verbose_level = get_verbose_level(); + verbose_level = tm_get_verbose_level(); - if(verbose_level>=INFO) - printf("Number of constraints: %d\n", nb_constraints); - printf("Number of constraints: %d, N=%d\n", nb_constraints, N); - nb_cores=nb_processing_units(topology); + nb_cores=nb_processing_units(topology)*topology->oversub_fact; + + + if(verbose_level>=INFO) + printf("Number of constraints: %d, N=%d, nb_cores = %d, K=%d\n", nb_constraints, N, nb_cores, nb_cores-N); if((constraints == NULL) && (nb_constraints != 0)){ if(verbose_level>=ERROR) @@ -449,7 +542,6 @@ tree_t *kpartition_build_tree_from_topology(tm_topology_t *topology,double **com if((K=nb_cores - N)>0){ /* add K element to the object weight*/ complete_obj_weight(&obj_weight,N,K); - /* display_tab(tab,N+K);*/ } else if( K < 0){ if(verbose_level>=ERROR) fprintf(stderr,"Not enough cores!\n"); @@ -463,7 +555,7 @@ tree_t *kpartition_build_tree_from_topology(tm_topology_t *topology,double **com local_vertices is the array of vertices that can be used the min(N,nb_contraints) 1st element are number from 0 to N the last 
ones have value -1 - the value of this array will be used to number the leaves of the tree_t tree + the value of this array will be used to number the leaves of the tm_tree_t tree that start at "root" min(N,nb_contraints) is used to takle the case where thre is less processes than constraints @@ -479,18 +571,20 @@ tree_t *kpartition_build_tree_from_topology(tm_topology_t *topology,double **com /* we assume all objects have the same arity*/ /* assign the root of the tree*/ - root = (tree_t*) MALLOC (sizeof(tree_t)); - root->id = 0; + root = (tm_tree_t*) MALLOC (sizeof(tm_tree_t)); + root -> id = 0; + /*build the tree downward from the root*/ kpartition_build_level_topology(root, &com_mat, N+K, depth, topology, local_vertices, - constraints, nb_constraints, obj_weight, com_speed); + constraints, nb_constraints, obj_weight, com_speed); /*print_1D_tab(local_vertices,K+N);*/ if(verbose_level>=INFO) printf("Build (bottom-up) tree done!\n"); + FREE(local_vertices); diff --git a/ompi/mca/topo/treematch/treematch/tm_kpartitioning.h b/ompi/mca/topo/treematch/treematch/tm_kpartitioning.h index 58cf6af6ffc..aa9eee619d4 100644 --- a/ompi/mca/topo/treematch/treematch/tm_kpartitioning.h +++ b/ompi/mca/topo/treematch/treematch/tm_kpartitioning.h @@ -6,4 +6,6 @@ typedef struct _com_mat_t{ int *kpartition(int, com_mat_t*, int, int *, int); -tree_t * kpartition_build_tree_from_topology(tm_topology_t *topology,double **com_mat,int N, int *constraints, int nb_constraints, double *obj_weight, double *com_speed); +tm_tree_t * kpartition_build_tree_from_topology(tm_topology_t *topology,double **com_mat,int N, int *constraints, int nb_constraints, double *obj_weight, double *com_speed); + +#define HAVE_LIBSCOTCH 0 // missing configure setup? 
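A note on the `kpartition_build_tree_from_topology` change above. The number of leaf slots is now `nb_processing_units(topology)*topology->oversub_fact`. `K = nb_cores - N` dummy vertices pad the problem so the N processes exactly fill the tree, and a negative `K` is reported as "Not enough cores!". The sketch below is a minimal, hedged illustration of that bookkeeping and of the `local_vertices` numbering described in the source comment: the first entries get consecutive ids and the rest are `-1`. It is not TreeMatch code. The helper names `count_dummy_vertices` and `init_local_vertices` are hypothetical. Requiring `oversub_fact >= 1` is an assumption. How `complete_obj_weight` fills the padded weights is not shown in this diff, so it is not modeled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of dummy vertices K needed so that n_procs
 * processes plus K dummies fill every slot (processing units times the
 * oversubscription factor). Returns 0 on success, -1 if there are fewer
 * slots than processes. */
static int count_dummy_vertices(int nb_processing_units, int oversub_fact,
                                int n_procs, int *k_out)
{
    assert(oversub_fact >= 1); /* assumption: at least one slot per unit */
    int nb_cores = nb_processing_units * oversub_fact;
    int k = nb_cores - n_procs;
    if (k < 0) {
        fprintf(stderr, "Not enough cores!\n");
        return -1;
    }
    *k_out = k;
    return 0;
}

/* Hypothetical helper, following the source comment: the first
 * min(n_procs, size) entries are numbered 0, 1, 2, ... and the remaining
 * entries are set to -1 (leaves with no real process). */
static void init_local_vertices(int *local_vertices, int size, int n_procs)
{
    int used = (n_procs < size) ? n_procs : size;
    for (int i = 0; i < size; i++)
        local_vertices[i] = (i < used) ? i : -1;
}
```

For example, 4 processing units with an oversubscription factor of 2 give 8 slots. Mapping 6 processes therefore needs K = 2 dummies, and the 8-entry vertex array becomes `0 1 2 3 4 5 -1 -1`.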
diff --git a/ompi/mca/topo/treematch/treematch/tm_malloc.c b/ompi/mca/topo/treematch/treematch/tm_malloc.c index 7facdae6d98..66fae50621f 100644 --- a/ompi/mca/topo/treematch/treematch/tm_malloc.c +++ b/ompi/mca/topo/treematch/treematch/tm_malloc.c @@ -1,34 +1,60 @@ +#include +#include +#include +#include #include "uthash.h" #include #include "tm_verbose.h" #include "tm_malloc.h" +#include "tm_tree.h" +#include "tm_mt.h" + + +#define MIN(a,b) ((a)<(b)?(a):(b)) #define EXTRA_BYTE 100 -typedef signed char byte; +typedef uint8_t byte; /* static int verbose_level = ERROR;*/ typedef struct _hash_t { - void *key; /* we'll use this field as the key */ - size_t size; - UT_hash_handle hh; /* makes this structure hashable */ + void *key; /* we'll use this field as the key */ + size_t size; + char *file; + int line; + UT_hash_handle hh; /* makes this structure hashable */ }hash_t; static hash_t *size_hash = NULL; static char extra_data[EXTRA_BYTE]; -static void save_size(void *ptr, size_t size); +static void save_ptr(void *ptr, size_t size, char *file, int line); static size_t retreive_size(void *someaddr); static void init_extra_data(void); -void save_size(void *ptr, size_t size) { + + +static char *my_strdup(char* string){ + int size = 1+strlen(string); + char *res = (char*)malloc(size*sizeof(char)); + + if(res) + memcpy(res, string, size*sizeof(char)); + + return res; + +} + +void save_ptr(void *ptr, size_t size, char *file, int line) { hash_t *elem; elem = (hash_t*) malloc(sizeof(hash_t)); - elem -> key = ptr; + elem -> key = ptr; elem -> size = size; - if(get_verbose_level() >= DEBUG) + elem -> line = line; + elem -> file = my_strdup(file); + if(tm_get_verbose_level() >= DEBUG) printf("Storing (%p,%ld)\n",ptr,size); HASH_ADD_PTR( size_hash, key, elem ); } @@ -39,30 +65,34 @@ size_t retreive_size(void *someaddr){ hash_t *elem = NULL; HASH_FIND_PTR(size_hash, &someaddr, elem); if(!elem){ - fprintf(stderr,"cannot find ptr %p to free!\n",someaddr); + 
if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"Cannot find ptr %p to free!\n",someaddr); + abort(); return 0; } res = elem->size; - if(get_verbose_level()>=DEBUG) + if(tm_get_verbose_level()>=DEBUG) printf("Retreiving (%p,%ld)\n",someaddr, res); + free(elem->file); HASH_DEL( size_hash, elem); return res; } -void my_mem_check(void){ +void tm_mem_check(void){ +#ifdef __DEBUG_TM_MALLOC__ hash_t *s; int nb_errors = 0; for(s=size_hash; s != NULL; s=s->hh.next) { - if(get_verbose_level() >= ERROR) { - printf("pointer %p of size %ld has not been freed!\n", s->key, s->size); - } - nb_errors ++; + if(tm_get_verbose_level()>=ERROR) + printf("pointer %p of size %ld (%s: %d) has not been freed!\n", s->key, s->size, s->file, s->line); + nb_errors ++; } - if(get_verbose_level() >= INFO) + if(tm_get_verbose_level() >= INFO) printf ("Number of errors in managing memory: %d\n",nb_errors); +#endif } void init_extra_data(void){ @@ -72,38 +102,39 @@ void init_extra_data(void){ if(done) return; - srandom(0); + init_genrand(0); for( i = 0 ; i < EXTRA_BYTE; i++) - extra_data[i] = (char) random() % 256; + extra_data[i] = (char) genrand_int32() % 256; done = 1; } -void *my_malloc(size_t size, char *file, int line){ +void *tm_malloc(size_t size, char *file, int line){ byte *ptr; init_extra_data(); size+=2*EXTRA_BYTE; ptr = malloc(size); - if(get_verbose_level()>=DEBUG) - printf("my_malloc of size %ld: %p (%s: %d)\n",size-2*EXTRA_BYTE,(void*)ptr,file,line); + if(tm_get_verbose_level()>=DEBUG) + printf("tm_malloc of size %ld: %p (%s: %d)\n",size-2*EXTRA_BYTE,(void*)ptr,file,line); - save_size(ptr,size); + save_ptr(ptr, size, file, line); memcpy(ptr, extra_data, EXTRA_BYTE); memcpy(ptr + size - EXTRA_BYTE, extra_data, EXTRA_BYTE); - if(get_verbose_level()>=DEBUG) - printf("my_malloc returning: %p\n",(void*)(ptr+EXTRA_BYTE)); + if(tm_get_verbose_level()>=DEBUG) + printf("tm_malloc returning: %p\n",(void*)(ptr+EXTRA_BYTE)); return (void *)(ptr + EXTRA_BYTE); } -void *my_calloc(size_t 
count, size_t size, char *file, int line){ + +void *tm_calloc(size_t count, size_t size, char *file, int line){ byte *ptr; size_t full_size; @@ -113,22 +144,72 @@ void *my_calloc(size_t count, size_t size, char *file, int line){ ptr = malloc(full_size); bzero(ptr,full_size); - save_size(ptr, full_size); + save_ptr(ptr, full_size, file, line); + + if(tm_get_verbose_level()>=DEBUG) + printf("tm_calloc of size %ld: %p (%s: %d)\n",full_size-2*EXTRA_BYTE,(void*)ptr, file, line); + + + memcpy(ptr, extra_data, EXTRA_BYTE); + memcpy(ptr + full_size - EXTRA_BYTE, extra_data, EXTRA_BYTE); + + if(tm_get_verbose_level()>=DEBUG) + printf("tm_calloc returning: %p\n", (void*)(ptr+EXTRA_BYTE)); + + return (void *)(ptr+EXTRA_BYTE); +} - if(get_verbose_level()>=DEBUG) - printf("my_calloc of size %ld: %p (%s: %d)\n",full_size-2*EXTRA_BYTE,(void*)ptr, file, line); + +void *tm_realloc(void *old_ptr, size_t size, char *file, int line){ + byte *ptr; + size_t full_size; + + init_extra_data(); + + full_size = size + 2 * EXTRA_BYTE; + + ptr = malloc(full_size); + save_ptr(ptr, full_size, file, line); + + if(tm_get_verbose_level()>=DEBUG) + printf("tm_realloc of size %ld: %p (%s: %d)\n",full_size-2*EXTRA_BYTE, (void*)ptr, file, line); memcpy(ptr, extra_data, EXTRA_BYTE); memcpy(ptr + full_size - EXTRA_BYTE, extra_data, EXTRA_BYTE); - if(get_verbose_level()>=DEBUG) - printf("my_calloc returning: %p\n",(void*)(ptr+EXTRA_BYTE)); + if(old_ptr){ + byte *original_ptr = ((byte *)old_ptr) - EXTRA_BYTE; + size_t old_ptr_size = retreive_size(original_ptr); + + memcpy(ptr + EXTRA_BYTE, old_ptr, MIN(old_ptr_size - 2 * EXTRA_BYTE, size)); + + if((bcmp(original_ptr ,extra_data, EXTRA_BYTE)) && ((tm_get_verbose_level()>=ERROR))){ + fprintf(stderr,"Realloc: cannot find special string ***before*** %p!\n", (void*)original_ptr); + fprintf(stderr,"memory is probably corrupted here!\n"); + } + + if((bcmp(original_ptr + old_ptr_size -EXTRA_BYTE ,extra_data, EXTRA_BYTE)) && ((tm_get_verbose_level()>=ERROR))){ + 
fprintf(stderr,"Realloc: cannot find special string ***after*** %p!\n", (void*)original_ptr); + fprintf(stderr,"memory is probably corrupted here!\n"); + } + + if(tm_get_verbose_level()>=DEBUG) + printf("tm_free freeing: %p\n", (void*)original_ptr); + + + free(original_ptr); + } + + + if(tm_get_verbose_level()>=DEBUG) + printf("tm_realloc returning: %p (----- %p)\n",(void*)(ptr+EXTRA_BYTE),(void*)(((byte *)ptr) - EXTRA_BYTE)); + return (void *)(ptr+EXTRA_BYTE); } -void my_free(void *ptr){ +void tm_free(void *ptr){ byte *original_ptr = ((byte *)ptr) - EXTRA_BYTE; size_t size; @@ -137,18 +218,18 @@ void my_free(void *ptr){ size = retreive_size(original_ptr); - if((bcmp(original_ptr ,extra_data, EXTRA_BYTE)) && ((get_verbose_level()>=ERROR))){ - fprintf(stderr,"cannot find special string ***before*** %p!\n",ptr); + if((bcmp(original_ptr ,extra_data, EXTRA_BYTE)) && ((tm_get_verbose_level()>=ERROR))){ + fprintf(stderr,"Free: cannot find special string ***before*** %p!\n", (void*)original_ptr); fprintf(stderr,"memory is probably corrupted here!\n"); } - if((bcmp(original_ptr + size -EXTRA_BYTE ,extra_data, EXTRA_BYTE)) && ((get_verbose_level()>=ERROR))){ - fprintf(stderr,"cannot find special string ***after*** %p!\n",ptr); + if((bcmp(original_ptr + size -EXTRA_BYTE ,extra_data, EXTRA_BYTE)) && ((tm_get_verbose_level()>=ERROR))){ + fprintf(stderr,"Free: cannot find special string ***after*** %p!\n", (void*)original_ptr); fprintf(stderr,"memory is probably corrupted here!\n"); } - if(get_verbose_level()>=DEBUG) - printf("my_free freeing: %p\n",(void*)original_ptr); + if(tm_get_verbose_level()>=DEBUG) + printf("tm_free freeing: %p\n", (void*)original_ptr); free(original_ptr); diff --git a/ompi/mca/topo/treematch/treematch/tm_malloc.h b/ompi/mca/topo/treematch/treematch/tm_malloc.h index c4038d90be7..f74cd3db6af 100644 --- a/ompi/mca/topo/treematch/treematch/tm_malloc.h +++ b/ompi/mca/topo/treematch/treematch/tm_malloc.h @@ -1,5 +1,29 @@ +#ifndef _TM_MALLOC_H_ +#define 
_TM_MALLOC_H_ + #include -void *my_malloc(size_t size, char *, int); -void *my_calloc(size_t count, size_t size, char *, int); -void my_free(void *ptr); -void my_mem_check(void); +void *tm_malloc(size_t size, char *, int); +void *tm_calloc(size_t count, size_t size, char *, int); +void *tm_realloc(void *ptr, size_t size, char *, int); +void tm_free(void *ptr); +void tm_mem_check(void); + +/* for debugging malloc */ +/* #define __DEBUG_TM_MALLOC__ */ +#undef __DEBUG_TM_MALLOC__ +#ifdef __DEBUG_TM_MALLOC__ +#define MALLOC(x) tm_malloc(x,__FILE__,__LINE__) +#define CALLOC(x,y) tm_calloc(x,y,__FILE__,__LINE__) +#define REALLOC(x,y) tm_realloc(x,y,__FILE__,__LINE__) +#define FREE tm_free +#define MEM_CHECK tm_mem_check +#else +#define MALLOC malloc +#define CALLOC calloc +#define FREE free +#define REALLOC realloc +#define MEM_CHECK tm_mem_check +#endif + + +#endif diff --git a/ompi/mca/topo/treematch/treematch/tm_mapping.c b/ompi/mca/topo/treematch/treematch/tm_mapping.c index 1debcb606cc..3472b4a9982 100644 --- a/ompi/mca/topo/treematch/treematch/tm_mapping.c +++ b/ompi/mca/topo/treematch/treematch/tm_mapping.c @@ -10,6 +10,7 @@ #include "tm_mt.h" #include "tm_mapping.h" #include "tm_timings.h" +#include "tm_thread_pool.h" #include "tm_tree.h" #ifdef _WIN32 @@ -25,11 +26,6 @@ #define LINE_SIZE (1000000) -typedef struct { - int val; - long key; -} hash_t; - typedef struct { double val; @@ -37,126 +33,29 @@ typedef struct { int key2; } hash2_t; -int distance(tm_topology_t *topology,int i, int j); -int nb_lines(char *); -void init_comm(char *,int,double **);void map_Packed(tm_topology_t *,int,int *); -void map_RR(int ,int *,int *); -int hash_asc(const void*,const void*); -int *generate_random_sol(tm_topology_t *,int,int,int); -double eval_sol(int *,int,double **,double **); -double eval_sol_inv(int *,int,double **,double **); -void exchange(int *,int,int); -double gain_exchange(int *,int,int,double,int,double **,double **); -void select_max(int *,int *,double **,int,int 
*); -void compute_gain(int *,int,double **,double **,double **); -void map_MPIPP(tm_topology_t *,int,int,int *,double **,double **); -void depth_first(tree_t *,int *,int *); -int nb_leaves(tree_t *); -void map_topology(tm_topology_t *,tree_t *,int,int,int *,int,int *); -int int_cmp(const void*,const void*); -int decompose(int,int,int *); -tree_t *build_synthetic_topology_old(int *,int,int,int); -void update_comm_speed(double **,int,int); -void topology_numbering(tm_topology_t *,int **,int *); -void topology_arity(tm_topology_t *,int **,int *); -void optimize_arity(int **,int *,int); -int get_indice(int *,int,int); -int fill_tab(int **,int *,int,int,int,int); -void update_canonical(int *,int,int,int); -int constraint_dsc(const void*,const void*); -void display_contsraint_tab(constraint_t *,int); -void update_perm(int *,int,constraint_t *,int,int); -void recursive_canonicalization(int,tm_topology_t *,int *,int *,int *,int,int); -void FREE_topology(tm_topology_t *); - - -int distance(tm_topology_t *topology,int i, int j) -{ - int level = topology->nb_levels; - int arity; - int f_i = i,f_j = j; - - do{ - level--; - arity = topology->arity[level]; - if( arity == 0 ) - arity = 1; - f_i = f_i/arity; - f_j = f_j/arity; - } while(f_i!=f_j); - - /* printf("(%d,%d):%d\n",i,j,level);*/ - /* exit(-1); */ - return level; -} -int nb_processing_units(tm_topology_t *topology) +/* compute the number of leaves of any subtree starting froma node of depth depth*/ +int compute_nb_leaves_from_level(int depth,tm_topology_t *topology) { - return topology->nb_nodes[topology->nb_levels-1]; -} + int res = 1; + while(depth < topology->nb_levels-1) + res *= topology->arity[depth++]; -void FREE_topology(tm_topology_t *topology) -{ - int i; - for( i = 0 ; i < topology->nb_levels ; i++ ) - FREE(topology->node_id[i]); - FREE(topology->node_id); - FREE(topology->nb_nodes); - FREE(topology->arity); - FREE(topology); + return res; } -double print_sol(int N,int *Value,double **comm, double *cost, 
tm_topology_t *topology) -{ - double a,c,sol; - int i,j; - - sol = 0; - for ( i = 0 ; i < N ; i++ ) - for ( j = i+1 ; j < N ; j++){ - c = comm[i][j]; - a = cost[distance(topology,Value[i],Value[j])]; - /* printf("T_%d_%d %f/%f=%f\n",i,j,c,a,c/a); */ - sol += c/a; - } - - for (i = 0; i < N; i++) { - printf("%d", Value[i]); - if(inb_proc_units; } + void print_1D_tab(int *tab,int N) { int i; @@ -170,34 +69,33 @@ void print_1D_tab(int *tab,int N) int nb_lines(char *filename) { - FILE *pf = NULL; - char line[LINE_SIZE]; - int N = 0; - - if(!(pf = fopen(filename,"r"))){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"Cannot open %s\n",filename); - exit(-1); - } + FILE *pf = NULL; + char line[LINE_SIZE]; + int N = 0; + + if(!(pf = fopen(filename,"r"))){ + if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"Cannot open %s\n",filename); + exit(-1); + } - while(fgets(line,LINE_SIZE,pf)) - N++; + while(fgets(line,LINE_SIZE,pf)) + N++; - if(get_verbose_level() >= DEBUG) - printf("Number of lines of file %s = %d\n",filename,N); + if(tm_get_verbose_level() >= DEBUG) + printf("Number of lines of file %s = %d\n",filename,N); - fclose(pf); - return N; + fclose(pf); + return N; } -void init_comm(char *filename,int N,double **comm) +void init_mat(char *filename,int N, double **mat, double *sum_row) { FILE *pf = NULL; char *ptr= NULL; char line[LINE_SIZE]; int i,j; - unsigned int vl = get_verbose_level(); - + unsigned int vl = tm_get_verbose_level(); if(!(pf=fopen(filename,"r"))){ @@ -208,381 +106,122 @@ void init_comm(char *filename,int N,double **comm) j = -1; i = 0; + + while(fgets(line,LINE_SIZE,pf)){ char *l = line; j = 0; - comm[i][N] = 0; - /* printf("%s|",line); */ + sum_row[i] = 0; while((ptr=strtok(l," \t"))){ l = NULL; if((ptr[0]!='\n')&&(!isspace(ptr[0]))&&(*ptr)){ - comm[i][j] = atof(ptr); - comm[i][N] += comm [i][j]; - /* printf ("comm[%d][%d]=%f|%s|\n",i,j,comm[i][j],ptr); */ - j++; + mat[i][j] = atof(ptr); + sum_row[i] += mat [i][j]; + if(mat[i][j]<0){ + 
if(vl >= WARNING) + fprintf(stderr,"Warning: negative value in com matrix! mat[%d][%d]=%f\n",i,j,mat[i][j]); + } + j++; } } if( j != N){ if(vl >= CRITICAL) - fprintf(stderr,"Error at %d %d (%d!=%d)for %s\n",i,j,j,N,filename); + fprintf(stderr,"Error at %d %d (%d!=%d). Too many columns for %s\n",i,j,j,N,filename); exit(-1); } i++; } - if( i != N ){ - if(vl >= CRITICAL) - fprintf(stderr,"Error at %d %d for %s\n",i,j,filename); - exit(-1); - } - /* - printf("%s:\n",filename); - for(i=0;i= CRITICAL) - fprintf(stderr,"Cannot open %s\n",filename); - exit(-1); - } - - /* compute the size od the array to store the constraints*/ - n = 0; - fgets(line, LINE_SIZE, pf); - l = line; - while((ptr=strtok(l," \t"))){ - l = NULL; - if((ptr[0] != '\n') && ( !isspace(ptr[0])) && (*ptr) && (ptr)) - n++; - } - tab = (int*)MALLOC((n+1)*sizeof(int)); - rewind(pf); - fgets(line, LINE_SIZE, pf); - l = line; - i = 0; - while((ptr=strtok(l," \t"))){ - l = NULL; - if((ptr[0] != '\n') && ( !isspace(ptr[0])) && (*ptr) && (ptr)){ - if(i <= n) - tab[i] = atoi(ptr); - else{ - if(vl >= CRITICAL) - fprintf(stderr, "More than %d entries in %s\n", n, filename); - exit(-1); - } - i++; - } - } - - if( i != n ){ + if( i != N ){ if(vl >= CRITICAL) - fprintf(stderr, "Read %d entries while expecting %d ones\n", i, n); + fprintf(stderr,"Error at %d %d. 
Too many rows for %s\n",i,j,filename); exit(-1); } - *ptab = tab; - fclose(pf); - return n; + fclose (pf); } -int build_comm(char *filename,double ***pcomm) -{ - double **comm = NULL; - int i,N; - - if(get_verbose_level() >= INFO) - printf("Reading communication matrix file: %s\n",filename); - - N = nb_lines(filename); - comm = (double**)MALLOC(N*sizeof(double*)); - for( i = 0 ; i < N ; i++) - /* the last column stores the sum of the line*/ - comm[i] = (double*)MALLOC((N+1)*sizeof(double)); - init_comm(filename,N,comm); - *pcomm = comm; +tm_affinity_mat_t * new_affinity_mat(double **mat, double *sum_row, int order){ + tm_affinity_mat_t * aff_mat; - if(get_verbose_level() >= INFO) - printf("Communication matrix built from %s!\n",filename); + aff_mat = (tm_affinity_mat_t *) MALLOC(sizeof(tm_affinity_mat_t)); + aff_mat -> mat = mat; + aff_mat -> sum_row = sum_row; + aff_mat -> order = order; - return N; + return aff_mat; } -void map_Packed(tm_topology_t *topology,int N,int *Value) -{ - int i,j = 0,depth; - depth = topology->nb_levels-1; +tm_affinity_mat_t * tm_build_affinity_mat(double **mat, int order){ + double *sum_row = NULL; + int i,j; + sum_row = (double*)MALLOC(order*sizeof(double)); - for( i = 0 ; i < nb_processing_units(topology) ; i++){ - /* printf ("%d -> %d\n",objs[i]->os_index,i); */ - if(topology->node_id[depth][i] != -1){ - Value[j++]=topology->node_id[depth][i]; - if(j == N) - break; - } + for( i = 0 ; i < order ; i++){ + sum_row[i] = 0; + for(j = 0 ; j < order ; j++) + sum_row[i] += mat [i][j]; } -} - -void map_RR(int N,int *Value, int *constraints) -{ - int i; - for( i = 0 ; i < N ; i++ ){ - /*printf ("%d -> %d\n",i,i);*/ - if(constraints) - Value[i]=constraints[i]; - else - Value[i]=i; - } + return new_affinity_mat(mat, sum_row, order); } -int hash_asc(const void* x1,const void* x2) -{ - hash_t *e1 = NULL,*e2 = NULL; - e1 = ((hash_t*)x1); - e2 = ((hash_t*)x2); - return (e1->key < e2->key) ? 
-1 : 1; -} -int *generate_random_sol(tm_topology_t *topology,int N,int level,int seed) -{ - hash_t *hash_tab = NULL; - int *sol = NULL; - int *nodes_id= NULL; +void tm_free_affinity_mat(tm_affinity_mat_t *aff_mat){ int i; + int n = aff_mat->order; - nodes_id = topology->node_id[level]; - - hash_tab = (hash_t*)MALLOC(sizeof(hash_t)*N); - sol = (int*)MALLOC(sizeof(int)*N); + for(i = 0 ; i < n ; i++) + FREE(aff_mat->mat[i]); - init_genrand(seed); - - for( i = 0 ; i < N ; i++ ){ - hash_tab[i].val = nodes_id[i]; - hash_tab[i].key = genrand_int32(); - } - - qsort(hash_tab,N,sizeof(hash_t),hash_asc); - for( i = 0 ; i < N ; i++ ) - sol[i] = hash_tab[i].val; - - FREE(hash_tab); - return sol; + FREE(aff_mat->mat); + FREE(aff_mat->sum_row); + FREE(aff_mat); } -double eval_sol(int *sol,int N,double **comm, double **arch) -{ - double a,c,res; - int i,j; - - res = 0; - for ( i = 0 ; i < N ; i++ ) - for ( j = i+1 ; j < N ; j++ ){ - c = comm[i][j]; - a = arch[sol[i]][sol[j]]; - res += c/a; - } - - return res; -} - -double eval_sol_inv(int *sol,int N,double **comm, double **arch) +tm_affinity_mat_t *tm_load_aff_mat(char *filename) { - double a,c,res; - int i,j; + double **mat = NULL; + double *sum_row = NULL; + int i, order; - res = 0; - for ( i = 0 ; i < N ; i++ ) - for ( j = i+1 ; j < N ; j++ ){ - c = comm[i][j]; - a = arch[sol[i]][sol[j]]; - res += c*a; - } + if(tm_get_verbose_level() >= INFO) + printf("Reading matrix file: %s\n",filename); - return res; -} + order = nb_lines(filename); -void exchange(int *sol,int i,int j) -{ - int tmp; - tmp = sol[i]; - sol[i] = sol[j]; - sol[j] = tmp; -} - -double gain_exchange(int *sol,int l,int m,double eval1,int N,double **comm, double **arch) -{ - double eval2; - if( l == m ) - return 0; - exchange(sol,l,m); - eval2 = eval_sol(sol,N,comm,arch); - exchange(sol,l,m); + sum_row = (double*)MALLOC(order*sizeof(double)); + mat = (double**)MALLOC(order*sizeof(double*)); + for( i = 0 ; i < order ; i++) + /* the last column stores the sum of the 
line*/ + mat[i] = (double*)MALLOC((order)*sizeof(double)); + init_mat(filename,order, mat, sum_row); - return eval1-eval2; -} -void select_max(int *l,int *m,double **gain,int N,int *state) -{ - double max; - int i,j; + if(tm_get_verbose_level() >= INFO) + printf("Affinity matrix built from %s!\n",filename); - max = -DBL_MAX; - - for( i = 0 ; i < N ; i++ ) - if(!state[i]) - for( j = 0 ; j < N ; j++ ) - if( (i != j) && (!state[j]) ){ - if(gain[i][j] > max){ - *l = i; - *m = j; - max=gain[i][j]; - } - } -} + return new_affinity_mat(mat, sum_row, order); -void compute_gain(int *sol,int N,double **gain,double **comm, double **arch) -{ - double eval1; - int i,j; - eval1 = eval_sol(sol,N,comm,arch); - for( i = 0 ; i < N ; i++ ) - for( j = 0 ; j <= i ; j++) - gain[i][j] = gain[j][i] = gain_exchange(sol,i,j,eval1,N,comm,arch); } -/* Randomized Algorithm of -Hu Chen, Wenguang Chen, Jian Huang ,Bob Robert,and H.Kuhn. Mpipp: an automatic profile-guided -parallel process placement toolset for smp clusters and multiclusters. In -Gregory K. Egan and Yoichi Muraoka, editors, ICS, pages 353-360. ACM, 2006. 
- */ - -void map_MPIPP(tm_topology_t *topology,int nb_seed,int N,int *Value,double **comm, double **arch) -{ - int *sol = NULL; - int *state = NULL; - double **gain = NULL; - int **history = NULL; - double *temp = NULL; - int i,j,t,l=0,m=0,seed=0; - double max,sum,best_eval,eval; - - gain = (double**)MALLOC(sizeof(double*)*N); - history = (int**)MALLOC(sizeof(int*)*N); - for( i = 0 ; i < N ; i++){ - gain[i] = (double*)MALLOC(sizeof(double)*N); - history[i] = (int*)MALLOC(sizeof(int)*3); - } - - state = (int*)MALLOC(sizeof(int)*N); - temp = (double*)MALLOC(sizeof(double)*N); - - sol = generate_random_sol(topology,N,topology->nb_levels-1,seed++); - for( i = 0 ; i < N ; i++) - Value[i] = sol[i]; - best_eval = DBL_MAX; - while(seed <= nb_seed){ - do{ - for( i = 0 ; i < N ; i++ ){ - state[i] = 0; - /* printf("%d ",sol[i]); */ - } - /* printf("\n"); */ - compute_gain(sol,N,gain,comm,arch); - /* - display_tab(gain,N); - exit(-1); - */ - for( i = 0 ; i < N/2 ; i++ ){ - select_max(&l,&m,gain,N,state); - /* printf("%d: %d <=> %d : %f\n",i,l,m,gain[l][m]); */ - state[l] = 1; - state[m] = 1; - exchange(sol,l,m); - history[i][1] = l; - history[i][2] = m; - temp[i] = gain[l][m]; - compute_gain(sol,N,gain,comm,arch); - } - t = -1; - max = 0; - sum = 0; - for(i = 0 ; i < N/2 ; i++ ){ - sum += temp[i]; - if( sum > max ){ - max = sum; - t = i; - } - } - /*for(j=0;j<=t;j++) - printf("exchanging: %d with %d for gain: %f\n",history[j][1],history[j][2],temp[j]); */ - for( j = t+1 ; j < N/2 ; j++ ){ - exchange(sol,history[j][1],history[j][2]); - /* printf("Undoing: %d with %d for gain: %f\n",history[j][1],history[j][2],temp[j]); */ - } - /* printf("max=%f\n",max); */ - - /*for(i=0;i 0 ); - - FREE(sol); - sol=generate_random_sol(topology,N,topology->nb_levels-1,seed++); - } - FREE(sol); - FREE(temp); - FREE(state); - for( i = 0 ; i < N ; i++){ - FREE(gain[i]); - FREE(history[i]); - } - FREE(gain); - FREE(history); -} - -/* void map_tree(tree_t* t1,tree_t *t2) */ +/* void 
map_tree(tm_tree_t* t1,tm_tree_t *t2) */ /* { */ /* double x1,x2; if((!t1->left)&&(!t1->right)){ printf ("%d -> %d\n",t1->id,t2->id); - Value[t2->id]=t1->id; + sigma[t2->id]=t1->id; return; } x1=t2->right->val/t1->right->val+t2->left->val/t1->left->val; @@ -596,7 +235,7 @@ void map_MPIPP(tm_topology_t *topology,int nb_seed,int N,int *Value,double **com }*/ /* } */ -void depth_first(tree_t *comm_tree, int *proc_list,int *i) +void depth_first(tm_tree_t *comm_tree, int *proc_list,int *i) { int j; if(!comm_tree->child){ @@ -608,7 +247,7 @@ void depth_first(tree_t *comm_tree, int *proc_list,int *i) depth_first(comm_tree->child[j],proc_list,i); } -int nb_leaves(tree_t *comm_tree) +int nb_leaves(tm_tree_t *comm_tree) { int j,n=0; @@ -621,249 +260,143 @@ int nb_leaves(tree_t *comm_tree) return n; } +/* find the first '-1 in the array of size n and put the value there*/ +static void set_val(int *tab, int val, int n){ + int i = 0; + + while (i < n ){ + if(tab[i] ==- 1){ + tab[i] = val; + return; + } + i++; + } + + if(tm_get_verbose_level() >= CRITICAL){ + fprintf(stderr,"Error while assigning value %d to k\n",val); + } + + exit(-1); +} /*Map topology to cores: sigma_i is such that process i is mapped on core sigma_i k_i is such that core i exectutes process k_i size of sigma is the number of process "nb_processes" - size of k is the number of cores/nodes "topology->nb_nodes[level]" + size of k is the number of cores/nodes "nb_compute_units" We must have numbe of process<=number of cores k_i =-1 if no process is mapped on core i */ -void map_topology(tm_topology_t *topology,tree_t *comm_tree,int nb_compute_units, - int level,int *sigma, int nb_processes, int *k) +void map_topology(tm_topology_t *topology,tm_tree_t *comm_tree, int level, + int *sigma, int nb_processes, int **k, int nb_compute_units) { - int *nodes_id = NULL; - int *proc_list = NULL; - int i,N,M,block_size; - unsigned int vl = get_verbose_level(); - - M = nb_leaves(comm_tree); - nodes_id = 
topology->node_id[level]; - N = topology->nb_nodes[level]; - - if(vl >= INFO){ - printf("nb_leaves=%d\n",M); - printf("level=%d, nodes_id=%p, N=%d\n",level,(void *)nodes_id,N); - printf("N=%d,nb_compute_units=%d\n",N,nb_compute_units); - } - - /* The number of node at level "level" in the tree should be equal to the number of processors*/ - assert(N==nb_compute_units); - - proc_list = (int*)MALLOC(sizeof(int)*M); - i = 0; - depth_first(comm_tree,proc_list,&i); + int *nodes_id = NULL; + int *proc_list = NULL; + int i,j,N,M,block_size; - if(vl >= DEBUG) - for(i=0;i= INFO) - printf("M=%d, N=%d, BS=%d\n",M,N,block_size); - for( i = 0 ; i < nb_processing_units(topology) ; i++ ) - k[i] = -1; - - for( i = 0 ; i < M ; i++ ) - if(proc_list[i] != -1){ - if(vl >= DEBUG) - printf ("%d->%d\n",proc_list[i],nodes_id[i/block_size]); - - if( proc_list[i] < nb_processes ){ - sigma[proc_list[i]] = nodes_id[i/block_size]; - k[nodes_id[i/block_size]] = proc_list[i]; - } - } - }else{ - if(vl >= INFO) - printf("M=%d, N=%d, BS=%d\n",M,N,block_size); - for( i = 0 ; i < M ; i++ ) - if(proc_list[i] != -1){ - if(vl >= DEBUG) - printf ("%d->%d\n",proc_list[i],nodes_id[i/block_size]); - if( proc_list[i] < nb_processes ) - sigma[proc_list[i]] = nodes_id[i/block_size]; - } - } - - if((vl >= DEBUG) && (k)){ - printf("k: "); - for( i = 0 ; i < nb_processing_units(topology) ; i++ ) - printf("%d ",k[i]); - printf("\n"); - } - - - FREE(proc_list); -} - -void map_topology_simple(tm_topology_t *topology,tree_t *comm_tree, int *sigma, int nb_processes, int *k) -{ - map_topology(topology,comm_tree,topology->nb_nodes[topology->nb_levels-1], - topology->nb_levels-1,sigma,nb_processes,k); -} + unsigned int vl = tm_get_verbose_level(); + M = nb_leaves(comm_tree); + nodes_id = topology->node_id[level]; + N = topology->nb_nodes[level]; -int int_cmp(const void* x1,const void* x2) -{ - int *e1 = NULL,*e2= NULL; + if(vl >= INFO){ + printf("nb_leaves=%d\n",M); + printf("level=%d, nodes_id=%p, N=%d\n",level,(void 
*)nodes_id,N); + printf("N=%d,nb_compute_units=%d\n",N,nb_compute_units); + } - e1 = ((int *)x1); - e2 = ((int *)x2); + /* The number of node at level "level" in the tree should be equal to the number of processors*/ + assert(N==nb_compute_units*topology->oversub_fact); - return ((*e1) > (*e2)) ? -1 : 1; -} + proc_list = (int*)MALLOC(sizeof(int)*M); + i = 0; + depth_first(comm_tree,proc_list,&i); + block_size = M/N; -int decompose(int n,int optimize,int *tab) -{ - int primes[6] = {2,3,5,7,11,0}; - int i = 0,j = 1,flag = 2; - unsigned int vl = get_verbose_level(); - - while( primes[i] && (n!=1) ){ - /* printf("[%d] before=%d\n",primes[i],n); */ - if( flag && optimize && (n%primes[i]!= 0) ){ - n += primes[i] - n%primes[i]; - flag--; - i = 0; - continue; - } - /* printf("after=%d\n",n); */ - if( n%primes[i] == 0 ){ - tab[j++] = primes[i]; - n /= primes[i]; - }else{ - i++; - flag = 1; + if(k){/*if we need the k vector*/ + if(vl >= INFO) + printf("M=%d, N=%d, BS=%d\n",M,N,block_size); + for( i = 0 ; i < nb_processing_units(topology) ; i++ ) + for(j = 0 ; j < topology->oversub_fact ; j++){ + k[i][j] = -1; } - } - if( n != 1 ) - tab[j++] = n; - qsort(tab+1,j-1,sizeof(int),int_cmp); + for( i = 0 ; i < M ; i++ ) + if(proc_list[i] != -1){ + if(vl >= DEBUG) + printf ("%d->%d\n",proc_list[i],nodes_id[i/block_size]); - if(vl >= DEBUG){ - for( i = 0 ; i < j ; i++ ) - printf("%d:",tab[i]); - printf("\n"); + if( proc_list[i] < nb_processes ){ + sigma[proc_list[i]] = nodes_id[i/block_size]; + set_val(k[nodes_id[i/block_size]], proc_list[i], topology->oversub_fact); + } + } + }else{ + if(vl >= INFO) + printf("M=%d, N=%d, BS=%d\n",M,N,block_size); + for( i = 0 ; i < M ; i++ ) + if(proc_list[i] != -1){ + if(vl >= DEBUG) + printf ("%d->%d\n",proc_list[i],nodes_id[i/block_size]); + if( proc_list[i] < nb_processes ) + sigma[proc_list[i]] = nodes_id[i/block_size]; + } } - tab[j] = 0; - - return (j+1); -} - - -tree_t *build_synthetic_topology_old(int *synt_tab,int id,int depth,int 
nb_levels) -{ - tree_t *res = NULL,**child = NULL; - int arity = synt_tab[0]; - int val,i; - - res = (tree_t*)MALLOC(sizeof(tree_t)); - val = 0; - if(depth >= nb_levels) - child = NULL; - else{ - child = (tree_t**)MALLOC(sizeof(tree_t*)*arity); - for( i = 0 ; i < arity ; i++ ){ - child[i] = build_synthetic_topology_old(synt_tab+1,i,depth+1,nb_levels); - child[i]->parent = res; - val += child[i]->val; + if((vl >= DEBUG) && (k)){ + printf("k: "); + for( i = 0 ; i < nb_processing_units(topology) ; i++ ){ + printf("Procesing unit %d: ",i); + for (j = 0 ; joversub_fact; j++){ + if( k[i][j] == -1) + break; + printf("%d ",k[i][j]); + } + printf("\n"); } } - set_node(res,child,arity,NULL,id,val+speed(depth),child[0],depth); - return res; + + FREE(proc_list); } -void display_topology(tm_topology_t *topology) +tm_solution_t * tm_compute_mapping(tm_topology_t *topology,tm_tree_t *comm_tree) { - int i,j; + size_t i; + tm_solution_t *solution; + int *sigma, **k; + size_t sigma_length = comm_tree->nb_processes; + size_t k_length = nb_processing_units(topology); - for( i = 0 ; i < topology->nb_levels ; i++ ){ - printf("%d: ",i); - for( j = 0 ; j < topology->nb_nodes[i] ; j++) - printf("%d ",topology->node_id[i][j]); - printf("\n"); + solution = (tm_solution_t *)MALLOC(sizeof(tm_solution_t)); + sigma = (int*) MALLOC(sizeof(int) * sigma_length); + k = (int**) MALLOC(sizeof(int*) * k_length); + for (i=0 ; i < k_length ; i++){ + k[i] = (int*) MALLOC(sizeof(int) * topology->oversub_fact); } -} - -/* - Build a synthetic balanced topology - arity : array of arity of the first nb_level (of size nb_levels-1) - core_numbering: numbering of the core by the system. 
Array of size nb_core_per_node + map_topology(topology, comm_tree, topology->nb_levels-1, sigma, sigma_length ,k, k_length); - nb_core_per_nodes: number of cores of a given node + solution->sigma = sigma; + solution->sigma_length = sigma_length; + solution->k = k; + solution->k_length = k_length; + solution->oversub_fact = topology->oversub_fact; - The numbering of the cores is done in round robin fashion after a width traversal of the topology - */ - -tm_topology_t *build_synthetic_topology(int *arity, int nb_levels, int *core_numbering, int nb_core_per_nodes) -{ - tm_topology_t *topology = NULL; - int i,j,n = 1; - - topology = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); - topology->arity = (int*)MALLOC(sizeof(int)*nb_levels); - memcpy(topology->arity,arity,sizeof(int)*nb_levels); - topology->nb_levels = nb_levels; - - topology->node_id = (int**)MALLOC(sizeof(int*)*topology->nb_levels); - topology->nb_nodes = (int*)MALLOC(sizeof(int)*topology->nb_levels); - - for( i = 0 ; i < topology->nb_levels ; i++ ){ - topology->nb_nodes[i] = n; - topology->node_id[i] = (int*)MALLOC(sizeof(int)*n); - if( i < topology->nb_levels-1) - for( j = 0 ; j < n ; j++ ) - topology->node_id[i][j] = j; - else - for( j = 0 ; j < n ; j++ ) - topology->node_id[i][j] = core_numbering[j%nb_core_per_nodes] + (nb_core_per_nodes)*(j/nb_core_per_nodes); - - n *= topology->arity[i]; - } - return topology; + return solution; } -void build_synthetic_proc_id(tm_topology_t *topology) -{ - int i; - size_t j,n = 1; - - topology->node_id = (int**)MALLOC(sizeof(int*)*topology->nb_levels); - topology->nb_nodes = (int*)MALLOC(sizeof(int)*topology->nb_levels); - - for( i = 0 ; i < topology->nb_levels ; i++ ){ - /* printf("n= %lld, arity := %d\n",n, topology->arity[i]); */ - topology->nb_nodes[i] = n; - topology->node_id[i] = (int*)MALLOC(sizeof(long int)*n); - if ( !topology->node_id[i] ){ - if(get_verbose_level() >= CRITICAL) - fprintf(stderr,"Cannot allocate level %d (of size %ld) of the topology\n", i, 
(unsigned long int)n); - exit(-1); - } - for( j = 0 ; j < n ; j++ ) - topology->node_id[i][j] = j; - n *= topology->arity[i]; - } -} void update_comm_speed(double **comm_speed,int old_size,int new_size) { double *old_tab = NULL,*new_tab= NULL; int i; - unsigned int vl = get_verbose_level(); + unsigned int vl = tm_get_verbose_level(); if(vl >= DEBUG) printf("comm speed [%p]: ",(void *)*comm_speed); @@ -886,260 +419,9 @@ void update_comm_speed(double **comm_speed,int old_size,int new_size) } -/* d: size of comm_speed */ -void TreeMatchMapping(int nb_obj, int nb_proc, double **comm_mat, double *obj_weight, double * comm_speed, int d, int *sol) -{ - tree_t *comm_tree = NULL; - tm_topology_t *topology= NULL; - double duration; - int i; - unsigned int vl = get_verbose_level(); - - TIC; - - for( i = 0 ; i < nb_obj ; i++ ){ - sol[i] = i; - /* printf("%f ",obj_weight[i]); */ - } - /* - printf("\n"); - return; - */ - - topology = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); - topology->arity = (int*)MALLOC(sizeof(int)*MAX_LEVELS); - topology->arity[0] = nb_proc; - topology->nb_levels = decompose((int)ceil((1.0*nb_obj)/nb_proc),1,topology->arity); - if(vl >= INFO) - printf("Topology nb levels=%d\n",topology->nb_levels); - build_synthetic_proc_id(topology); - - if(topology->nb_levels > d) - update_comm_speed(&comm_speed,d,topology->nb_levels); - - /* - exit(-1); - topology_to_arch(topology); - - display_tab(arch,hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PROC)); - display_tab(arch,96); - exit(-1); - int nb_core=topo_nb_proc(topology,1000); - - display_tab(comm_mat,N); - */ - - TIC; - comm_tree = build_tree_from_topology(topology,comm_mat,nb_obj,obj_weight,comm_speed); - if(vl >= INFO) - printf("Tree building time=%f\n",TOC); - TIC; - map_topology(topology,comm_tree,nb_proc,1,sol,nb_obj,NULL); - if(vl >= INFO) - printf("Topology mapping time=%f\n",TOC); - - if(topology->nb_levels > d) - FREE(comm_speed); - - FREE_topology(topology); - FREE_tree(comm_tree); - - duration=TOC; 
- if(vl >= INFO) - printf("-------------- Mapping done in %.4fs!\n",duration); -} - -void display_other_heuristics(tm_topology_t *topology,int N,double **comm,int TGT_flag, int *constraints, double *cost) -{ - int *sol = NULL; - - sol = (int*)MALLOC(sizeof(int)*N); - - map_Packed(topology,N,sol); - printf("Packed: "); - if (TGT_flag == 1) - print_sol_inv(N,sol,comm,cost, topology); - else - print_sol(N,sol,comm,cost, topology); - - map_RR(N,sol,constraints); - printf("RR: "); - if (TGT_flag == 1) - print_sol_inv(N,sol,comm, cost, topology); - else - print_sol(N,sol,comm, cost, topology); - -/* double duration; */ -/* CLOCK_T time1,time0; */ -/* CLOCK(time0); */ -/* map_MPIPP(topology,1,N,sol,comm,arch); */ -/* CLOCK(time1); */ -/* duration=CLOCK_DIFF(time1,time0); */ -/* printf("MPIPP-1-D:%f\n",duration); */ -/* printf("MPIPP-1: "); */ -/* if (TGT_flag == 1) */ -/* print_sol_inv(N,sol,comm,arch); */ -/* else */ -/* print_sol(N,sol,comm,arch); */ - -/* CLOCK(time0); */ -/* map_MPIPP(topology,5,N,sol,comm,arch); */ -/* CLOCK(time1); */ -/* duration=CLOCK_DIFF(time1,time0); */ -/* printf("MPIPP-5-D:%f\n",duration); */ -/* printf("MPIPP-5: "); */ -/* if (TGT_flag == 1) */ -/* print_sol_inv(N,sol,comm,arch); */ -/* else */ -/* print_sol(N,sol,comm,arch); */ - - FREE(sol); -} - -void topology_numbering(tm_topology_t *topology,int **numbering,int *nb_nodes) -{ - int nb_levels; - unsigned int vl = get_verbose_level(); - - nb_levels = topology->nb_levels; - *nb_nodes = topology->nb_nodes[nb_levels-1]; - if(vl >= INFO) - printf("nb_nodes=%d\n",*nb_nodes); - *numbering = (int*)MALLOC(sizeof(int)*(*nb_nodes)); - memcpy(*numbering,topology->node_id[nb_levels-1],sizeof(int)*(*nb_nodes)); -} - -void topology_arity(tm_topology_t *topology,int **arity,int *nb_levels) -{ - *nb_levels = topology->nb_levels; - *arity = (int*)MALLOC(sizeof(int)*(*nb_levels)); - memcpy(*arity,topology->arity,sizeof(int)*(*nb_levels)); -} - -void optimize_arity(int **arity, int *nb_levels,int n) -{ - int 
a,i; - int *new_arity = NULL; - - if( n < 0 ) - return; - /* printf("n=%d\tnb_levels=%d\n",n,*nb_levels); */ - /* for(i=0;i<*nb_levels;i++) */ - /* printf("%d:",(*arity)[i]); */ - /* printf("\n"); */ - /* if(n==(*nb_levels)-3) */ - /* exit(-1); */ - a = (*arity)[n]; - if( (a%3 == 0) && (a > 3) ){ - /* - check if the a rity of level n devides 3 - If this is the case: - Add a level - */ - (*nb_levels)++; - /* Build a new arity array */ - new_arity = (int*)MALLOC(sizeof(int)*(*nb_levels)); - /* Copy the begining if the old array */ - for( i = 0 ; i < n ; i++) - new_arity[i] = (*arity)[i]; - /* set the nth level to arity 3 */ - new_arity[n] = 3; - /* printf("a=%d\n",a); */ - /* Set the (n+1) level to arity a/3 */ - new_arity[n+1] = a/3; - /* Copy the end of the array */ - for( i = n+2 ; i < *nb_levels ; i++) - new_arity[i] = (*arity)[i-1]; - FREE(*arity); - /* if a/3 =3 then go to the next level */ - if(new_arity[n+1] == 3) - optimize_arity(&new_arity,nb_levels,n); - else /* continue to this level (remember we just add a new level */ - optimize_arity(&new_arity,nb_levels,n+1); - *arity=new_arity; - }else if( (a%2==0) && (a>2) ){/* same as above but for arity == 2 instead of 3 */ - (*nb_levels)++; - new_arity = (int*)MALLOC(sizeof(int)*(*nb_levels)); - for( i = 0 ; i < n ; i++ ) - new_arity[i] = (*arity)[i]; - new_arity[n] = 2; - /* printf("a=%d\n",a); */ - new_arity[n+1] = a/2; - for( i = n+2 ; i < *nb_levels ; i++ ) - new_arity[i] = (*arity)[i-1]; - FREE(*arity); - if(new_arity[n+1] == 2) - optimize_arity(&new_arity,nb_levels,n); - else - optimize_arity(&new_arity,nb_levels,n+1); - *arity = new_arity; - }else /* if nothing works go to next level. 
*/ - optimize_arity(arity,nb_levels,n-1); -} - - - -tm_topology_t *optimize_topology(tm_topology_t *topology){ - int *arity = NULL,nb_levels; - int *numbering = NULL,nb_nodes; - tm_topology_t *new_topo; - - topology_arity(topology,&arity,&nb_levels); - /* printf("nb_levels=%d\n",nb_levels); */ - /* for(i=0;inb_levels-1) - res *= topology->arity[depth++]; - - return res; -} - - -/* return the indice of the greatest element of tab slower than val - tab needs to be sorted in increasing order*/ -int get_indice(int *tab, int n, int val) -{ - int i = 0, j = n-1, k; - if( tab[n-1] < val ) - return n-1; - while( i != j){ - k = (i+j)/2; - if( (tab[k]length > e2->length) ? -1 : 1; -} - - -/* display function*/ -void display_contsraint_tab(constraint_t *const_tab, int n) -{ - int i; - for( i = 0; i < n; i++ ) { - printf("tab %d:",i); - print_1D_tab(const_tab[i].constraints, const_tab[i].length); - } -} - - -/* - We shift perm in new_perm and then copy back - perm is decomposed in m part of size 'size' - - in part k of new_perm we copy part constratint[k].id -*/ - -void update_perm(int *perm, int n, constraint_t *const_tab, int m, int size) -{ - int k; - int *new_perm = NULL; - - if( n <= 1 ) - return; - - new_perm = (int*)MALLOC(sizeof(int)*n); - - for ( k = 0 ; k < m ; k++ ) - memcpy(new_perm+k*size,perm+const_tab[k].id*size,size*sizeof(int)); - - memcpy(perm,new_perm,n*sizeof(int)); - /*printf("perm:");print_1D_tab(perm,n);*/ - - FREE(new_perm); -} - - - -/* we are at a given subtree of depth depth of the topology - the mapping constraints are in the table constraints of size n - The value of constraints are between 0 and the number of leaves-1 of the current subtree - - Canonical is the output of the function and is a just a renumbering of constraints in the canonical way - perm is a way to go from canonical[i] to the corresponding constraints[k]: perm[canonical[i]]=constraints[k] -*/ - -void recursive_canonicalization(int depth, tm_topology_t *topology, int *constraints, 
int *canonical, int *perm, int n, int m) -{ - constraint_t *const_tab = NULL; - int nb_leaves,nb_subtrees; - int k, prec, start, end; - - /* if there is no constraints stop and return*/ - if( !constraints ){ - assert( n == 0 ); - return; - } - - /* if we are at teh bottom of the tree set canonical to the 0 value: it will be shifted by update_canonical - and return*/ - if ( depth == topology->nb_levels ){ - assert( n==1 ); - canonical[0] = 0; - return; - } - - /* compute in how many subtrees we need to devide the curret one*/ - nb_subtrees = topology->arity[depth]; - /* construct a tab of constraints of this size*/ - const_tab = (constraint_t *) MALLOC( nb_subtrees * sizeof(constraint_t) ); - - /*printf("tab (%d):",nb_subtrees,n);print_1D_tab(constraints,n);*/ - /* nb_leaves is the number of leaves of the current subtree - this will help to detremine where to split constraints and how to shift values - */ - nb_leaves = compute_nb_leaves_from_level( depth + 1, topology ); - - /* split the constraints into nb_subtrees sub-constraints - each sub-contraints k contains constraints of value in [k*nb_leaves,(k+1)*nb_leaves[ - */ - start = 0; - for(k = 0; k < nb_subtrees; k++){ - /*returns the indice in contsraints that contains the smallest value not copied - end is used to compute the number of copied elements (end-size) and is used as the next staring indices*/ - end=fill_tab(&(const_tab[k].constraints), constraints, n,start, (k+1) * nb_leaves, k * nb_leaves); - const_tab[k].length = end-start; - const_tab[k].id = k; - start = end; - } - - /* sort constraint tab such that subtrees with the largest number of - constraints are put on the left and managed first, this how we canonize subtrees*/ - qsort(const_tab, nb_subtrees, sizeof(constraint_t), constraint_dsc); - /*display_contsraint_tab(const_tab,nb_subtrees);*/ - - /* update perm such taht we can backtrack the changes between constraints and caononical - To go from canonical[i] to the corresponding constraints[k] perm 
is such that perm[canonical[i]]=constraints[k]*/ - update_perm(perm, m, const_tab, nb_subtrees, nb_leaves); - - /* recursively call each subtree*/ - prec = 0; - for(k = 0; k < nb_subtrees; k++){ - /* the tricky part is here : we send only a subtab of canonical that will be updated recursively - This will greatly simplify the merging*/ - recursive_canonicalization(depth+1, topology, const_tab[k].constraints, canonical+prec, perm+k*nb_leaves, - const_tab[k].length, nb_leaves); - prec += const_tab[k].length; - } - - /* merging consist only in shifting the right part of canonical*/ - start = const_tab[0].length; - for( k = 1; k < nb_subtrees ; k++){ - update_canonical(canonical, start, start+const_tab[k].length, k * nb_leaves); - start += const_tab[k].length; - } - - /* FREE local subconstraints*/ - for( k = 0; k < nb_subtrees; k++ ) - if(const_tab[k].length) - FREE(const_tab[k].constraints); - - FREE(const_tab); -} - -/* - shuffle the constraints such that for each node there are more constraints on the left subtree than on the right subtree - - This is required to avoid handling permutations. On a 2:2:2:2 tree, if the - contraints are (0,1,3), it is equivalent to (0,1,2) The canonical form is the - second one. This help to handle the case (0,6,7,9,11,13,14,15) which are - symetric constaints and for which the canonical form is (0,1,2,4,6,8,9,12)) - - - - We store in *perm the way to go from the canonical form to the original constraints. 
- perm is a way to go from canonical[i] to the corresponding constraints[k]: perm[canonical[i]]=constraints[k] - */ -void canonize_constraints(tm_topology_t *topology, int *constraints, int **canonical, int n, int **perm, int *m) -{ - int *p = NULL, *c = NULL; - int i; - unsigned int vl = get_verbose_level(); - - *m = compute_nb_leaves_from_level(0,topology); - - p = (int*) MALLOC(sizeof(int)*(*m)); - for( i = 0 ; i < *m ; i++ ) - p[i] = i; - - c = (int*) MALLOC(sizeof(int)*n); - - if(vl>=DEBUG){ - printf("constraints:"); - print_1D_tab(constraints, n); - } - - recursive_canonicalization(0, topology, constraints, c, p, n, *m); - - if(vl>=DEBUG){ - printf("canonical:"); - print_1D_tab(c, n); - printf("perm:"); - print_1D_tab(p, *m); - } - - *perm = p; - *canonical = c; -} diff --git a/ompi/mca/topo/treematch/treematch/tm_mapping.h b/ompi/mca/topo/treematch/treematch/tm_mapping.h index 0068184b567..97b3a728a71 100644 --- a/ompi/mca/topo/treematch/treematch/tm_mapping.h +++ b/ompi/mca/topo/treematch/treematch/tm_mapping.h @@ -1,43 +1,33 @@ +#ifndef __TM_MAPPING_H__ +#define __TM_MAPPING_H__ #include "tm_tree.h" -#include "tm_hwloc.h" +#include "tm_topology.h" #include "tm_timings.h" #include "tm_verbose.h" -int build_comm(char *filename,double ***pcomm); -void TreeMatchMapping(int nb_obj, int nb_proc,double **comm_mat, double * obj_weigth, double *com_speed, int d, int *sol); - -/*Map topology to cores: - sigma_i is such that process i is mapped on core sigma_i - k_i is such that core i exectutes process k_i - - size of sigma is the number of process (nb_objs) - size of k is the number of cores/nodes (nb_proc) - - We must have numbe of process<=number of cores - - k_i =-1 if no process is mapped on core i -*/ -void map_topology_simple(tm_topology_t *topology,tree_t *comm_tree, int *sigma, int nb_processes, int *k); - -int nb_processing_units(tm_topology_t *topology); -void free_topology(tm_topology_t *topology); -void display_other_heuristics(tm_topology_t 
*topology,int N,double **comm,int TGT_flag, int *constraints, double *cost); -void print_1D_tab(int *tab,int N); +tm_affinity_mat_t * new_affinity_mat(double **mat, double *sum_row, int order); void build_synthetic_proc_id(tm_topology_t *topology); -void display_topology(tm_topology_t *topology); -tm_topology_t *build_synthetic_topology(int *arity, int nb_levels, int *core_numbering, int nb_core_per_node); -tm_topology_t *optimize_topology(tm_topology_t *topology); -double print_sol_inv(int N,int *Value,double **comm, double *cost, tm_topology_t *topology); -double print_sol(int N,int *Value,double **comm, double *cost, tm_topology_t *topology); -int build_binding_constraints(char *filename, int **ptab); -void canonize_constraints(tm_topology_t *topology, int *constraints, int **canonical, int n, int **perm, int *m); +tm_topology_t *build_synthetic_topology(int *arity, int nb_levels, int *core_numbering, int nb_core_per_nodes); int compute_nb_leaves_from_level(int depth,tm_topology_t *topology); -void FREE_topology(tm_topology_t *); - +void depth_first(tm_tree_t *comm_tree, int *proc_list,int *i); +int fill_tab(int **new_tab,int *tab, int n, int start, int max_val, int shift); +void init_mat(char *filename,int N, double **mat, double *sum_row); +void map_topology(tm_topology_t *topology,tm_tree_t *comm_tree, int level, + int *sigma, int nb_processes, int **k, int nb_compute_units); +int nb_leaves(tm_tree_t *comm_tree); +int nb_lines(char *filename); +int nb_processing_units(tm_topology_t *topology); +void print_1D_tab(int *tab,int N); +tm_solution_t * tm_compute_mapping(tm_topology_t *topology,tm_tree_t *comm_tree); +void tm_free_affinity_mat(tm_affinity_mat_t *aff_mat); +tm_affinity_mat_t *tm_load_aff_mat(char *filename); +void update_comm_speed(double **comm_speed,int old_size,int new_size); /* use to split a constaint into subconstraint according the tree*/ -typedef struct _constraint{ +typedef struct{ int *constraints; /* the subconstraints*/ int length; 
/*length of *constraints*/ int id; /* id of the corresponding subtree*/ }constraint_t; + +#endif diff --git a/ompi/mca/topo/treematch/treematch/tm_mt.h b/ompi/mca/topo/treematch/treematch/tm_mt.h index 260067d514d..58f50d8f509 100644 --- a/ompi/mca/topo/treematch/treematch/tm_mt.h +++ b/ompi/mca/topo/treematch/treematch/tm_mt.h @@ -2,8 +2,7 @@ void init_genrand(unsigned long s); void init_by_array(unsigned long init_key[], int key_length); /* generates a random number on the interval [0,0x7fffffff] */ -unsigned long genrand_int32(void); - +unsigned long genrand_int32(void); long genrand_int31(void); double genrand_real1(void); double genrand_real2(void); diff --git a/ompi/mca/topo/treematch/treematch/tm_solution.c b/ompi/mca/topo/treematch/treematch/tm_solution.c new file mode 100644 index 00000000000..a0fde41e299 --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/tm_solution.c @@ -0,0 +1,504 @@ +#include +#include +#include "tm_solution.h" +#include "tm_mt.h" +#include "tm_mapping.h" + +typedef struct { + int val; + long key; +} hash_t; + + +void tm_free_solution(tm_solution_t *sol){ + int i,n; + + n = sol->k_length; + + if(sol->k) + for(i=0 ; ik[i]); + + FREE(sol->k); + FREE(sol->sigma); + FREE(sol); +} + +/* + Compute the distance in the tree + between node i and j : the farther away node i and j, the + larger the returned value. + + The algorithm looks at the largest level, starting from the top, + for which node i and j are still in the same subtree. 
This is done + by iteratively dividing their numbering by the arity of the levels +*/ +int distance(tm_topology_t *topology,int i, int j) +{ + int level = 0; + int arity; + int f_i, f_j ; + int vl = tm_get_verbose_level(); + int depth = topology->nb_levels-1; + + f_i = topology->node_rank[depth][i]; + f_j = topology->node_rank[depth][j]; + + if(vl >= DEBUG) + printf("i=%d, j=%d Level = %d f=(%d,%d)\n",i ,j, level, f_i, f_j); + + + do{ + level++; + arity = topology->arity[level]; + if( arity == 0 ) + arity = 1; + f_i = f_i/arity; + f_j = f_j/arity; + } while((f_i!=f_j) && (level < depth)); + + if(vl >= DEBUG) + printf("distance(%d,%d):%d\n",topology->node_rank[depth][i], topology->node_rank[depth][j], level); + /* exit(-1); */ + return level; +} + +double display_sol_sum_com(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, int *sigma) +{ + double a,c,sol; + int i,j; + double *cost = topology->cost; + double **mat = aff_mat->mat; + int N = aff_mat->order; + int depth = topology->nb_levels - 1; + + + sol = 0; + for ( i = 0 ; i < N ; i++ ) + for ( j = i+1 ; j < N ; j++){ + c = mat[i][j]; + /* + Compute cost in funvtion of the inverse of the distance + This is due to the fact that the cost matrix is numbered + from top to bottom : cost[0] is the cost of the longest distance. + */ + a = cost[depth-distance(topology,sigma[i],sigma[j])]; + if(tm_get_verbose_level() >= DEBUG) + printf("T_%d_%d %f*%f=%f\n",i,j,c,a,c*a); + sol += c*a; + } + + for (i = 0; i < N; i++) { + printf("%d", sigma[i]); + if(icost; + double **mat = aff_mat->mat; + int N = aff_mat->order; + int vl = tm_get_verbose_level(); + int depth = topology->nb_levels - 1; + + sol = 0; + for ( i = 0 ; i < N ; i++ ) + for ( j = i+1 ; j < N ; j++){ + c = mat[i][j]; + /* + Compute cost in funvtion of the inverse of the distance + This is due to the fact that the cost matrix is numbered + from top to bottom : cost[0] is the cost of the longest distance. 
*/ + a = cost[depth-distance(topology,sigma[i],sigma[j])]; + if(vl >= DEBUG) + printf("T_%d_%d %f*%f=%f\n",i,j,c,a,c*a); + if(c*a > sol) + sol = c*a; + } + + for (i = 0; i < N; i++) { + printf("%d", sigma[i]); + if(imat; + int N = aff_mat->order; + + sol = 0; + for ( i = 0 ; i < N ; i++ ) + for ( j = i+1 ; j < N ; j++){ + c = mat[i][j]; + nb_hops = 2*distance(topology,sigma[i],sigma[j]); + if(tm_get_verbose_level() >= DEBUG) + printf("T_%d_%d %f*%d=%f\n",i,j,c,nb_hops,c*nb_hops); + sol += c*nb_hops; + } + + for (i = 0; i < N; i++) { + printf("%d", sigma[i]); + if(i= ERROR){ + fprintf(stderr,"Error printing solution: metric %d not implemented\n",metric); + return -1; + } + } + return -1; +} + +double tm_display_solution(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, tm_solution_t *sol, + tm_metric_t metric){ + + int i,j; + int **k = sol->k; + + + if(tm_get_verbose_level() >= DEBUG){ + printf("k: \n"); + for( i = 0 ; i < nb_processing_units(topology) ; i++ ){ + if(k[i][0] != -1){ + printf("\tProcessing unit %d: ",i); + for (j = 0 ; j<topology->oversub_fact; j++){ + if( k[i][j] == -1) + break; + printf("%d ",k[i][j]); + } + printf("\n"); + } + } + } + + + return display_sol(topology, aff_mat, sol->sigma, metric); +} + +void tm_display_other_heuristics(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, tm_metric_t metric) +{ + int *sigma = NULL; + int N = aff_mat->order; + + sigma = (int*)MALLOC(sizeof(int)*N); + + map_Packed(topology, N, sigma); + printf("Packed: "); + display_sol(topology, aff_mat, sigma, metric); + + map_RR(topology, N, sigma); + printf("RR: "); + display_sol(topology, aff_mat, sigma, metric); + +/* double duration; */ +/* CLOCK_T time1,time0; */ +/* CLOCK(time0); */ +/* map_MPIPP(topology,1,N,sigma,comm,arch); */ +/* CLOCK(time1); */ +/* duration=CLOCK_DIFF(time1,time0); */ +/* printf("MPIPP-1-D:%f\n",duration); */ +/* printf("MPIPP-1: "); */ +/* if (TGT_flag == 1) */ +/* print_sigma_inv(N,sigma,comm,arch); */ +/* else */ +/*
print_sigma(N,sigma,comm,arch); */ + +/* CLOCK(time0); */ +/* map_MPIPP(topology,5,N,sigma,comm,arch); */ +/* CLOCK(time1); */ +/* duration=CLOCK_DIFF(time1,time0); */ +/* printf("MPIPP-5-D:%f\n",duration); */ +/* printf("MPIPP-5: "); */ +/* if (TGT_flag == 1) */ +/* print_sigma_inv(N,sigma,comm,arch); */ +/* else */ +/* print_sigma(N,sigma,comm,arch); */ + + FREE(sigma); +} + + +int in_tab(int *tab, int n, int val){ + int i; + for( i = 0; i < n ; i++) + if(tab[i] == val) + return 1; + + return 0; +} + +void map_Packed(tm_topology_t *topology, int N, int *sigma) +{ + size_t i; + int j = 0,depth; + int vl = tm_get_verbose_level(); + + depth = topology->nb_levels-1; + + for( i = 0 ; i < topology->nb_nodes[depth] ; i++){ + /* printf ("%d -> %d\n",objs[i]->os_index,i); */ + if((!topology->constraints) || (in_tab(topology->constraints, topology->nb_constraints, topology->node_id[depth][i]))){ + if(vl >= DEBUG) + printf ("%lu: %d -> %d\n", i, j, topology->node_id[depth][i]); + sigma[j++]=topology->node_id[depth][i]; + if(j == N) + break; + } + } +} + +void map_RR(tm_topology_t *topology, int N,int *sigma) +{ + int i; + int vl = tm_get_verbose_level(); + + for( i = 0 ; i < N ; i++ ){ + if(topology->constraints) + sigma[i]=topology->constraints[i%topology->nb_constraints]; + else + sigma[i]=i%topology->nb_proc_units; + if(vl >= DEBUG) + printf ("%d -> %d (%d)\n",i,sigma[i],topology->nb_proc_units); + } +} + +int hash_asc(const void* x1,const void* x2) +{ + hash_t *e1 = NULL,*e2 = NULL; + + e1 = ((hash_t*)x1); + e2 = ((hash_t*)x2); + + return (e1->key < e2->key) ? 
-1 : 1; +} + + +int *generate_random_sol(tm_topology_t *topology,int N,int level,int seed) +{ + hash_t *hash_tab = NULL; + int *sol = NULL; + int *nodes_id= NULL; + int i; + + nodes_id = topology->node_id[level]; + + hash_tab = (hash_t*)MALLOC(sizeof(hash_t)*N); + sol = (int*)MALLOC(sizeof(int)*N); + + init_genrand(seed); + + for( i = 0 ; i < N ; i++ ){ + hash_tab[i].val = nodes_id[i]; + hash_tab[i].key = genrand_int32(); + } + + qsort(hash_tab,N,sizeof(hash_t),hash_asc); + for( i = 0 ; i < N ; i++ ) + sol[i] = hash_tab[i].val; + + FREE(hash_tab); + return sol; +} + + +double eval_sol(int *sol,int N,double **comm, double **arch) +{ + double a,c,res; + int i,j; + + res = 0; + for ( i = 0 ; i < N ; i++ ) + for ( j = i+1 ; j < N ; j++ ){ + c = comm[i][j]; + a = arch[sol[i]][sol[j]]; + res += c/a; + } + + return res; +} + +void exchange(int *sol,int i,int j) +{ + int tmp; + tmp = sol[i]; + sol[i] = sol[j]; + sol[j] = tmp; +} + +double gain_exchange(int *sol,int l,int m,double eval1,int N,double **comm, double **arch) +{ + double eval2; + if( l == m ) + return 0; + exchange(sol,l,m); + eval2 = eval_sol(sol,N,comm,arch); + exchange(sol,l,m); + + return eval1-eval2; +} + +void select_max(int *l,int *m,double **gain,int N,int *state) +{ + double max; + int i,j; + + max = -DBL_MAX; + + for( i = 0 ; i < N ; i++ ) + if(!state[i]) + for( j = 0 ; j < N ; j++ ) + if( (i != j) && (!state[j]) ){ + if(gain[i][j] > max){ + *l = i; + *m = j; + max=gain[i][j]; + } + } +} + + +void compute_gain(int *sol,int N,double **gain,double **comm, double **arch) +{ + double eval1; + int i,j; + + eval1 = eval_sol(sol,N,comm,arch); + for( i = 0 ; i < N ; i++ ) + for( j = 0 ; j <= i ; j++) + gain[i][j] = gain[j][i] = gain_exchange(sol,i,j,eval1,N,comm,arch); +} + + +/* Randomized Algorithm of +Hu Chen, Wenguang Chen, Jian Huang ,Bob Robert,and H.Kuhn. Mpipp: an automatic profile-guided +parallel process placement toolset for smp clusters and multiclusters. In +Gregory K. 
Egan and Yoichi Muraoka, editors, ICS, pages 353-360. ACM, 2006. + */ + +void map_MPIPP(tm_topology_t *topology,int nb_seed,int N,int *sigma,double **comm, double **arch) +{ + int *sol = NULL; + int *state = NULL; + double **gain = NULL; + int **history = NULL; + double *temp = NULL; + int i,j,t,l=0,m=0,seed=0; + double max,sum,best_eval,eval; + + gain = (double**)MALLOC(sizeof(double*)*N); + history = (int**)MALLOC(sizeof(int*)*N); + for( i = 0 ; i < N ; i++){ + gain[i] = (double*)MALLOC(sizeof(double)*N); + history[i] = (int*)MALLOC(sizeof(int)*3); + } + + state = (int*)MALLOC(sizeof(int)*N); + temp = (double*)MALLOC(sizeof(double)*N); + + sol = generate_random_sol(topology,N,topology->nb_levels-1,seed++); + for( i = 0 ; i < N ; i++) + sigma[i] = sol[i]; + + best_eval = DBL_MAX; + while(seed <= nb_seed){ + do{ + for( i = 0 ; i < N ; i++ ){ + state[i] = 0; + /* printf("%d ",sol[i]); */ + } + /* printf("\n"); */ + compute_gain(sol,N,gain,comm,arch); + /* + display_tab(gain,N); + exit(-1); + */ + for( i = 0 ; i < N/2 ; i++ ){ + select_max(&l,&m,gain,N,state); + /* printf("%d: %d <=> %d : %f\n",i,l,m,gain[l][m]); */ + state[l] = 1; + state[m] = 1; + exchange(sol,l,m); + history[i][1] = l; + history[i][2] = m; + temp[i] = gain[l][m]; + compute_gain(sol,N,gain,comm,arch); + } + + t = -1; + max = 0; + sum = 0; + for(i = 0 ; i < N/2 ; i++ ){ + sum += temp[i]; + if( sum > max ){ + max = sum; + t = i; + } + } + /*for(j=0;j<=t;j++) + printf("exchanging: %d with %d for gain: %f\n",history[j][1],history[j][2],temp[j]); */ + for( j = t+1 ; j < N/2 ; j++ ){ + exchange(sol,history[j][1],history[j][2]); + /* printf("Undoing: %d with %d for gain: %f\n",history[j][1],history[j][2],temp[j]); */ + } + /* printf("max=%f\n",max); */ + + /*for(i=0;i 0 ); + FREE(sol); + sol=generate_random_sol(topology,N,topology->nb_levels-1,seed++); + } + + + FREE(sol); + FREE(temp); + FREE(state); + for( i = 0 ; i < N ; i++){ + FREE(gain[i]); + FREE(history[i]); + } + FREE(gain); + FREE(history); +} 
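The new `distance()` and `display_sol_sum_com()` functions in `tm_solution.c` score a placement by how far apart communicating processes land in the topology tree: leaf ranks are repeatedly divided by level arities until they coincide, and the resulting level selects a per-level cost, with `cost[0]` being the most expensive (longest-distance) entry. The sketch below illustrates that idea in isolation; it is not TreeMatch code. It assumes a balanced tree described by a hypothetical `arity` array listed from the root down (`arity[l]` children per node at level `l`), leaves ranked left to right, a hypothetical `cost` array indexed by the level of the lowest common ancestor, and a row-major affinity matrix. The helper names `tree_distance` and `placement_cost` are made up for illustration.

```c
/*
 * Illustrative sketch, not TreeMatch code. Assumes a balanced tree in which
 * arity[l] (hypothetical array) is the number of children of every node at
 * level l, with the root at level 0 and the leaves at level `depth`.
 * Leaves are ranked 0 .. (product of arities) - 1 from left to right.
 */

/* Number of levels climbed from two leaves until they share an ancestor. */
static int tree_distance(const int *arity, int depth, int leaf_i, int leaf_j)
{
    int level = depth;   /* level of the current pair of ancestors */
    int climbed = 0;

    while (leaf_i != leaf_j && level > 0) {
        /* Parent index = child index / arity of the parent's level. */
        int a = arity[level - 1] > 0 ? arity[level - 1] : 1;
        leaf_i /= a;
        leaf_j /= a;
        level--;
        climbed++;
    }
    return climbed;      /* 0 when both ranks are the same leaf */
}

/*
 * Sum-of-communication cost of a placement, in the spirit of
 * display_sol_sum_com(). cost[l] (hypothetical) is the per-unit cost when the
 * lowest common ancestor is at level l, so cost[0] (crossing the root) is the
 * most expensive. mat is a row-major n x n affinity matrix and sigma[p] is
 * the leaf rank assigned to process p.
 */
static double placement_cost(const int *arity, int depth, const double *cost,
                             int n, const double *mat, const int *sigma)
{
    double total = 0.0;

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            int d = tree_distance(arity, depth, sigma[i], sigma[j]);
            if (d > 0)   /* processes on the same leaf are treated as free */
                total += mat[i * n + j] * cost[depth - d];
        }
    return total;
}
```

For example, with `arity = {2, 2, 4}` (16 leaves, e.g. 2 nodes x 2 sockets x 4 cores), leaves 0 and 3 are 1 level apart while leaves 0 and 15 only meet at the root, 3 levels apart; with `cost = {100, 10, 1}`, a pair placed on leaves 0 and 8 therefore pays `cost[0] = 100` per unit of traffic. Dividing ranks bottom-up keeps the walk at O(depth) without building the tree; the patch's own `distance()` may differ in details such as which arity entry it uses at each step and how identical ranks are handled.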
diff --git a/ompi/mca/topo/treematch/treematch/tm_solution.h b/ompi/mca/topo/treematch/treematch/tm_solution.h new file mode 100644 index 00000000000..5ed62b7022b --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/tm_solution.h @@ -0,0 +1,26 @@ +#ifndef TM_SOLUION_H +#define TM_SOLUION_H + +#include "treematch.h" + +void tm_free_solution(tm_solution_t *sol); +int distance(tm_topology_t *topology,int i, int j); +double display_sol_sum_com(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, int *sigma); + double display_sol(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, int *sigma, tm_metric_t metric); +double tm_display_solution(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, tm_solution_t *sol, + tm_metric_t metric); +void tm_display_other_heuristics(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, tm_metric_t metric); +int in_tab(int *tab, int n, int val); +void map_Packed(tm_topology_t *topology, int N, int *sigma); +void map_RR(tm_topology_t *topology, int N, int *sigma); +int hash_asc(const void* x1,const void* x2); +int *generate_random_sol(tm_topology_t *topology,int N,int level,int seed); +double eval_sol(int *sol,int N,double **comm, double **arch); +void exchange(int *sol,int i,int j); +double gain_exchange(int *sol,int l,int m,double eval1,int N,double **comm, double **arch); +void select_max(int *l,int *m,double **gain,int N,int *state); +void compute_gain(int *sol,int N,double **gain,double **comm, double **arch); +void map_MPIPP(tm_topology_t *topology,int nb_seed,int N,int *sigma,double **comm, double **arch); + + +#endif diff --git a/ompi/mca/topo/treematch/treematch/tm_thread_pool.c b/ompi/mca/topo/treematch/treematch/tm_thread_pool.c index ce649ce0970..ef9ccbf68df 100644 --- a/ompi/mca/topo/treematch/treematch/tm_thread_pool.c +++ b/ompi/mca/topo/treematch/treematch/tm_thread_pool.c @@ -1,13 +1,18 @@ #include #include "tm_thread_pool.h" #include "tm_verbose.h" -#include "opal/mca/hwloc/hwloc-internal.h" +#include #include 
"tm_verbose.h" #include "tm_tree.h" #include +#include +typedef enum _mapping_policy {COMPACT, SCATTER} mapping_policy_t; + +static mapping_policy_t mapping_policy = COMPACT; static int verbose_level = ERROR; static thread_pool_t *pool = NULL; +static unsigned int max_nb_threads = INT_MAX; static thread_pool_t *get_thread_pool(void); static void execute_work(work_t *work); @@ -16,39 +21,21 @@ static void *thread_loop(void *arg); static void add_work(pthread_mutex_t *list_lock, pthread_cond_t *cond_var, work_t *working_list, work_t *work); static thread_pool_t *create_threads(void); -static void f1 (int nb_args, void **args); -static void f2 (int nb_args, void **args); +static void f1 (int nb_args, void **args, int thread_id); +static void f2 (int nb_args, void **args, int thread_id); static void destroy_work(work_t *work); +#define MIN(a, b) ((a)<(b)?(a):(b)) +#define MAX(a, b) ((a)>(b)?(a):(b)) -void f1 (int nb_args, void **args){ - int a, b; - a = *(int*)args[0]; - b = *(int*)args[1]; - printf("nb_args=%d, a=%d, b=%d\n",nb_args,a,b); -} -void f2 (int nb_args, void **args){ - int n, *tab; - int *res; - int i,j; - n = *(int*)args[0]; - tab = (int*)args[1]; - res=(int*)args[2]; - - for(j=0;j<1000000;j++){ - *res=0; - for (i=0;itask(work->nb_args, work->args); + work->task(work->nb_args, work->args, work->thread_id); } int bind_myself_to_core(hwloc_topology_t topology, int id){ @@ -57,10 +44,29 @@ int bind_myself_to_core(hwloc_topology_t topology, int id){ char *str; int binding_res; int depth = hwloc_topology_get_depth(topology); + int nb_cores = hwloc_get_nbobjs_by_depth(topology, depth-1); + int my_core; + int nb_threads = get_nb_threads(); /* printf("depth=%d\n",depth); */ + switch (mapping_policy){ + case SCATTER: + my_core = id*(nb_cores/nb_threads); + break; + default: + if(verbose_level>=WARNING){ + printf("Wrong scheduling policy. 
Using COMPACT\n"); + } + case COMPACT: + my_core = id%nb_cores; + } + + if(verbose_level>=INFO){ + printf("Mapping thread %d on core %d\n",id,my_core); + } + /* Get my core. */ - obj = hwloc_get_obj_by_depth(topology, depth-1, id); + obj = hwloc_get_obj_by_depth(topology, depth-1, my_core); if (obj) { /* Get a copy of its cpuset that we may modify. */ cpuset = hwloc_bitmap_dup(obj->cpuset); @@ -71,7 +77,7 @@ int bind_myself_to_core(hwloc_topology_t topology, int id){ /*hwloc_bitmap_asprintf(&str, cpuset); - printf("Binding thread %d to cpuset %s\n", id,str); + printf("Binding thread %d to cpuset %s\n", my_core,str); FREE(str); */ @@ -81,8 +87,8 @@ int bind_myself_to_core(hwloc_topology_t topology, int id){ int error = errno; hwloc_bitmap_asprintf(&str, obj->cpuset); if(verbose_level>=WARNING) - fprintf(stderr,"%d Couldn't bind to cpuset %s: %s\n", id, str, strerror(error)); - FREE(str); + printf("Thread %d couldn't bind to cpuset %s: %s.\n This thread is not bound to any core...\n", my_core, str, strerror(error)); + free(str); /* str is allocated by hlwoc, free it normally*/ return 0; } /* FREE our cpuset copy */ @@ -90,7 +96,7 @@ int bind_myself_to_core(hwloc_topology_t topology, int id){ return 1; }else{ if(verbose_level>=WARNING) - fprintf(stderr,"No valid object for core id %d!\n",id); + printf("No valid object for core id %d!\n",my_core); return 0; } } @@ -161,6 +167,7 @@ void wait_work_completion(work_t *work){ int submit_work(work_t *work, int thread_id){ if( (thread_id>=0) && (thread_id< pool->nb_threads)){ + work->thread_id = thread_id; add_work(&pool->list_lock[thread_id], &pool->cond_var[thread_id], &pool->working_list[thread_id], work); return 1; } @@ -171,11 +178,11 @@ thread_pool_t *create_threads(){ hwloc_topology_t topology; int i; local_thread_t *local; - int nb_cores; + int nb_threads; + unsigned int nb_cores; int depth; - verbose_level = get_verbose_level(); - + verbose_level = tm_get_verbose_level(); /*Get number of cores: set 1 thread per 
core*/ /* Allocate and initialize topology object. */ @@ -187,7 +194,7 @@ thread_pool_t *create_threads(){ depth = hwloc_topology_get_depth(topology); if (depth == -1 ) { if(verbose_level>=CRITICAL) - fprintf(stderr,"Error: topology with unknown depth\n"); + fprintf(stderr,"Error: HWLOC unable to find the depth of the topology of this node!\n"); exit(-1); } @@ -195,19 +202,23 @@ thread_pool_t *create_threads(){ /* at depth 'depth' it is necessary a PU/core where we can execute things*/ nb_cores = hwloc_get_nbobjs_by_depth(topology, depth-1); + nb_threads = MIN(nb_cores, max_nb_threads); + + if(verbose_level>=INFO) + printf("nb_threads = %d\n",nb_threads); pool = (thread_pool_t*) MALLOC(sizeof(thread_pool_t)); pool -> topology = topology; - pool -> nb_threads = nb_cores; - pool -> thread_list = (pthread_t*)MALLOC(sizeof(pthread_t)*nb_cores); - pool -> working_list = (work_t*)CALLOC(nb_cores,sizeof(work_t)); - pool -> cond_var = (pthread_cond_t*)MALLOC(sizeof(pthread_cond_t)*nb_cores); - pool -> list_lock = (pthread_mutex_t*)MALLOC(sizeof(pthread_mutex_t)*nb_cores); + pool -> nb_threads = nb_threads; + pool -> thread_list = (pthread_t*)MALLOC(sizeof(pthread_t)*nb_threads); + pool -> working_list = (work_t*)CALLOC(nb_threads,sizeof(work_t)); + pool -> cond_var = (pthread_cond_t*)MALLOC(sizeof(pthread_cond_t)*nb_threads); + pool -> list_lock = (pthread_mutex_t*)MALLOC(sizeof(pthread_mutex_t)*nb_threads); - local=(local_thread_t*)MALLOC(sizeof(local_thread_t)*nb_cores); + local=(local_thread_t*)MALLOC(sizeof(local_thread_t)*nb_threads); pool->local = local; - for (i=0;iworking_list[i]; @@ -245,11 +256,12 @@ void terminate_thread_pool(){ for (id=0;idnb_threads;id++){ pthread_join(pool->thread_list[id],(void **) &ret); + FREE(ret); pthread_cond_destroy(pool->cond_var +id); pthread_mutex_destroy(pool->list_lock +id); if (pool->working_list[id].next != NULL) if(verbose_level >= WARNING) - fprintf(stderr,"Working list of thread %d not empty!\n",id); + printf("Working list of 
thread %d not empty!\n",id); } hwloc_topology_destroy(pool->topology); @@ -272,7 +284,7 @@ int get_nb_threads(){ } -work_t *create_work(int nb_args, void **args, void (*task) (int, void **)){ +work_t *create_work(int nb_args, void **args, void (*task) (int, void **, int)){ work_t *work; work = MALLOC(sizeof(work_t)); work -> nb_args = nb_args; @@ -293,6 +305,34 @@ void destroy_work(work_t *work){ FREE(work); } +/* CODE example 2 functions and test driver*/ + +void f1 (int nb_args, void **args, int thread_id){ + int a, b; + a = *(int*)args[0]; + b = *(int*)args[1]; + printf("id: %d, nb_args=%d, a=%d, b=%d\n",thread_id, nb_args,a,b); +} + + +void f2 (int nb_args, void **args, int thread_id){ + int n, *tab; + int *res; + int i,j; + n = *(int*)args[0]; + tab = (int*)args[1]; + res=(int*)args[2]; + + for(j=0;j<1000000;j++){ + *res=0; + for (i=0;i -#include "opal/mca/hwloc/hwloc-internal.h" +#include typedef struct _work_t{ int nb_args; - void (*task)(int nb_args, void **args); + void (*task)(int nb_args, void **args, int thread_id); void **args; struct _work_t *next; pthread_cond_t work_done; pthread_mutex_t mutex; int done; + int thread_id; }work_t; typedef struct { @@ -38,8 +39,10 @@ int get_nb_threads(void); int submit_work(work_t *work, int thread_id); void wait_work_completion(work_t *work); void terminate_thread_pool(void); -work_t *create_work(int nb_args, void **args, void (int, void **)); +work_t *create_work(int nb_args, void **args, void (int, void **, int)); int test_main(void); + + #endif /* THREAD_POOL_H */ diff --git a/ompi/mca/topo/treematch/treematch/tm_timings.c b/ompi/mca/topo/treematch/treematch/tm_timings.c index 8f00865eba9..b20747370e5 100644 --- a/ompi/mca/topo/treematch/treematch/tm_timings.c +++ b/ompi/mca/topo/treematch/treematch/tm_timings.c @@ -12,6 +12,7 @@ void get_time(void) CLOCK(time_tab[clock_num]); } + double time_diff(void) { CLOCK_T t2,t1; @@ -22,7 +23,7 @@ double time_diff(void) } if(clock_num < 0){ - return -1.0; + return -2.0; } 
CLOCK(t2); diff --git a/ompi/mca/topo/treematch/treematch/tm_timings.h b/ompi/mca/topo/treematch/treematch/tm_timings.h index 250ee5c1459..377a1cd46ef 100644 --- a/ompi/mca/topo/treematch/treematch/tm_timings.h +++ b/ompi/mca/topo/treematch/treematch/tm_timings.h @@ -1,4 +1,3 @@ - #ifndef TIMINGS_H #define TIMINGS_H #include diff --git a/ompi/mca/topo/treematch/treematch/tm_topology.c b/ompi/mca/topo/treematch/treematch/tm_topology.c new file mode 100644 index 00000000000..4445b45634c --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/tm_topology.c @@ -0,0 +1,850 @@ +#include +#include +#include "tm_tree.h" +#include "tm_mapping.h" +#include +#include "tm_verbose.h" +#include "tm_solution.h" + + +tm_topology_t* get_local_topo_with_hwloc(void); +tm_topology_t* hwloc_to_tm(char *filename); +int int_cmp_inc(const void* x1,const void* x2); +void optimize_arity(int **arity, double **cost, int *nb_levels,int n); +int symetric(hwloc_topology_t topology); +tm_topology_t * tgt_to_tm(char *filename); +void tm_display_arity(tm_topology_t *topology); +void tm_display_topology(tm_topology_t *topology); +void tm_free_topology(tm_topology_t *topology); +tm_topology_t *tm_load_topology(char *arch_filename, tm_file_type_t arch_file_type); +void tm_optimize_topology(tm_topology_t **topology); +int tm_topology_add_binding_constraints(char *constraints_filename, tm_topology_t *topology); +int topo_nb_proc(hwloc_topology_t topology,int N); +void topology_arity_cpy(tm_topology_t *topology,int **arity,int *nb_levels); +void topology_constraints_cpy(tm_topology_t *topology,int **constraints,int *nb_constraints); +void topology_cost_cpy(tm_topology_t *topology,double **cost); +void topology_numbering_cpy(tm_topology_t *topology,int **numbering,int *nb_nodes); +double ** topology_to_arch(hwloc_topology_t topology); +void build_synthetic_proc_id(tm_topology_t *topology); +tm_topology_t *tm_build_synthetic_topology(int *arity, double *cost, int nb_levels, int *core_numbering, int 
nb_core_per_nodes); + + +#define LINE_SIZE (1000000) + + +/* transform a tgt scotch file into a topology file*/ +tm_topology_t * tgt_to_tm(char *filename) +{ + tm_topology_t *topology = NULL; + FILE *pf = NULL; + char line[1024]; + char *s = NULL; + double *cost = NULL; + int i; + + + + pf = fopen(filename,"r"); + if(!pf){ + if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"Cannot open %s\n",filename); + exit(-1); + } + + if(tm_get_verbose_level() >= INFO) + printf("Reading TGT file: %s\n",filename); + + + fgets(line,1024,pf); + fclose(pf); + + s = strstr(line,"tleaf"); + if(!s){ + if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"Syntax error! %s is not a tleaf file\n",filename); + exit(-1); + } + + s += 5; + while(isspace(*s)) + s++; + + topology = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); + topology->nb_constraints = 0; + topology->oversub_fact = 1; + topology->constraints = NULL; + topology->nb_levels = atoi(strtok(s," "))+1; + topology->arity = (int*)MALLOC(sizeof(int)*topology->nb_levels); + + cost = (double*)CALLOC(topology->nb_levels,sizeof(double)); + + for( i = 0 ; i < topology->nb_levels-1 ; i++ ){ + topology->arity[i] = atoi(strtok(NULL," ")); + cost[i] = atoi(strtok(NULL," ")); + } + + topology->arity[topology->nb_levels-1] = 0; + /* cost[topology->nb_levels-1]=0; */ + + /*aggregate costs*/ + for( i = topology->nb_levels-2 ; i >= 0 ; i-- ) + cost[i] += cost[i+1]; + + build_synthetic_proc_id(topology); + + if(tm_get_verbose_level() >= INFO) + printf("Topology built from %s!\n",filename); + + topology->cost=cost; + + + return topology; +} + +int topo_nb_proc(hwloc_topology_t topology,int N) +{ + hwloc_obj_t *objs = NULL; + int nb_proc; + + objs = (hwloc_obj_t*)MALLOC(sizeof(hwloc_obj_t)*N); + objs[0] = hwloc_get_next_obj_by_type(topology,HWLOC_OBJ_PU,NULL); + nb_proc = 1 + hwloc_get_closest_objs(topology,objs[0],objs+1,N-1); + FREE(objs); + return nb_proc; +} + + + +static double link_cost(int depth) +{ + /* + Bertha values + double 
tab[5]={21,9,4.5,2.5,0.001}; + double tab[5]={1,1,1,1,1}; + double tab[6]={100000,10000,1000,500,100,10}; + */ + double tab[11] = {1024,512,256,128,64,32,16,8,4,2,1}; + + return tab[depth]; + /* + return 10*log(depth+2); + return (depth+1); + return (long int)pow(100,depth); + */ +} + + +double ** topology_to_arch(hwloc_topology_t topology) +{ + int nb_proc,i,j; + hwloc_obj_t obj_proc1,obj_proc2,obj_res; + double **arch = NULL; + + nb_proc = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU); + arch = (double**)MALLOC(sizeof(double*)*nb_proc); + for( i = 0 ; i < nb_proc ; i++ ){ + obj_proc1 = hwloc_get_obj_by_type(topology,HWLOC_OBJ_PU,i); + arch[obj_proc1->os_index] = (double*)MALLOC(sizeof(double)*nb_proc); + for( j = 0 ; j < nb_proc ; j++ ){ + obj_proc2 = hwloc_get_obj_by_type(topology,HWLOC_OBJ_PU,j); + obj_res = hwloc_get_common_ancestor_obj(topology,obj_proc1,obj_proc2); + /* printf("arch[%d][%d] <- %ld\n",obj_proc1->os_index,obj_proc2->os_index,*((long int*)(obj_res->userdatab))); */ + arch[obj_proc1->os_index][obj_proc2->os_index]=link_cost(obj_res->depth+1); + } + } + return arch; +} + +int symetric(hwloc_topology_t topology) +{ + int depth,i,topodepth = hwloc_topology_get_depth(topology); + unsigned int arity; + hwloc_obj_t obj; + for ( depth = 0; depth < topodepth-1 ; depth++ ) { + int N = hwloc_get_nbobjs_by_depth(topology, depth); + obj = hwloc_get_next_obj_by_depth (topology,depth,NULL); + arity = obj->arity; + + /* printf("Depth=%d, N=%d, Arity:%d\n",depth,N,arity); */ + for (i = 1; i < N; i++ ){ + obj = hwloc_get_next_obj_by_depth (topology,depth,obj); + if( obj->arity != arity){ + /* printf("[%d]: obj->arity=%d, arity=%d\n",i,obj->arity,arity); */ + return 0; + } + } + } + return 1; +} + +tm_topology_t* hwloc_to_tm(char *filename) +{ + hwloc_topology_t topology; + tm_topology_t *res = NULL; + hwloc_obj_t *objs = NULL; + unsigned topodepth,depth; + unsigned int nb_nodes; + double *cost; + int err, l; + unsigned int i; + int vl = 
tm_get_verbose_level(); + + /* Build the topology */ + hwloc_topology_init(&topology); + err = hwloc_topology_set_xml(topology,filename); + if(err == -1){ + if(vl >= CRITICAL) + fprintf(stderr,"Error: %s is a bad xml topology file!\n",filename); + exit(-1); + } + +#if HWLOC_API_VERSION >= 0x00020000 + hwloc_topology_set_all_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_STRUCTURE); +#else /* HWLOC_API_VERSION >= 0x00020000 */ + hwloc_topology_ignore_all_keep_structure(topology); +#endif /* HWLOC_API_VERSION >= 0x00020000 */ + hwloc_topology_load(topology); + + + /* Test if symmetric */ + if(!symetric(topology)){ + if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"%s not symmetric!\n",filename); + exit(-1); + } + + /* work on depth */ + topodepth = hwloc_topology_get_depth(topology); + + res = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); + res->oversub_fact = 1; + res->nb_constraints = 0; + res->constraints = NULL; + res->nb_levels = topodepth; + res->node_id = (int**)MALLOC(sizeof(int*)*res->nb_levels); + res->node_rank = (int**)MALLOC(sizeof(int*)*res->nb_levels); + res->nb_nodes = (size_t*)MALLOC(sizeof(size_t)*res->nb_levels); + res->arity = (int*)MALLOC(sizeof(int)*res->nb_levels); + + if(vl >= INFO) + printf("topodepth = %d\n",topodepth); + + /* Build TreeMatch topology */ + for( depth = 0 ; depth < topodepth ; depth++ ){ + nb_nodes = hwloc_get_nbobjs_by_depth(topology, depth); + res->nb_nodes[depth] = nb_nodes; + res->node_id[depth] = (int*)MALLOC(sizeof(int)*nb_nodes); + res->node_rank[depth] = (int*)MALLOC(sizeof(int)*nb_nodes); + + objs = (hwloc_obj_t*)MALLOC(sizeof(hwloc_obj_t)*nb_nodes); + objs[0] = hwloc_get_next_obj_by_depth(topology,depth,NULL); + hwloc_get_closest_objs(topology,objs[0],objs+1,nb_nodes-1); + res->arity[depth] = objs[0]->arity; + + if (depth == topodepth -1){ + res->nb_constraints = nb_nodes; + res->nb_proc_units = nb_nodes; + } + + if(vl >= DEBUG) + printf("\n--%d(%d) **%d**:--\n",res->arity[depth],nb_nodes,res->arity[0]); +
/* Build process id tab */ + for (i = 0; i < nb_nodes; i++){ + if(objs[i]->os_index >= nb_nodes){ + if(vl >= CRITICAL){ + fprintf(stderr, "Index of object %u of level %u is %u, which is not smaller than the number of nodes: %u\n", + i, depth, objs[i]->os_index, nb_nodes); + } + exit(-1); + } + + res->node_id[depth][i] = objs[i]->os_index; + res->node_rank[depth][objs[i]->os_index] = i; + /* if(depth==topodepth-1) */ + } + FREE(objs); + + + } + + cost = (double*)CALLOC(res->nb_levels,sizeof(double)); + for(l=0; l<res->nb_levels; l++){ + cost[l] = link_cost(l); + } + res->cost = cost; + + + /* Destroy topology object. */ + hwloc_topology_destroy(topology); + if(tm_get_verbose_level() >= INFO) + printf("\n"); + + + + return res; +} + +tm_topology_t* get_local_topo_with_hwloc(void) +{ + hwloc_topology_t topology; + tm_topology_t *res = NULL; + hwloc_obj_t *objs = NULL; + unsigned topodepth,depth; + int nb_nodes,i; + + /* Build the topology */ + hwloc_topology_init(&topology); +#if HWLOC_API_VERSION >= 0x00020000 + hwloc_topology_set_all_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_STRUCTURE); +#else /* HWLOC_API_VERSION >= 0x00020000 */ + hwloc_topology_ignore_all_keep_structure(topology); +#endif /* HWLOC_API_VERSION >= 0x00020000 */ + hwloc_topology_load(topology); + + /* Test if symmetric */ + if(!symetric(topology)){ + if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"Local topology not symmetric!\n"); + exit(-1); + } + + /* work on depth */ + topodepth = hwloc_topology_get_depth(topology); + + res = (tm_topology_t*)MALLOC(sizeof(tm_topology_t)); + res->nb_constraints = 0; + res->constraints = NULL; + res->nb_levels = topodepth; + res->node_id = (int**)MALLOC(sizeof(int*)*res->nb_levels); + res->node_rank = (int**)MALLOC(sizeof(int*)*res->nb_levels); + res->nb_nodes = (size_t*)MALLOC(sizeof(size_t)*res->nb_levels); + res->arity = (int*)MALLOC(sizeof(int)*res->nb_levels); + + /* Build TreeMatch topology */ + for( depth = 0 ; depth < topodepth ; depth++ ){ + nb_nodes =
hwloc_get_nbobjs_by_depth(topology, depth); + res->nb_nodes[depth] = nb_nodes; + res->node_id[depth] = (int*)MALLOC(sizeof(int)*nb_nodes); + res->node_rank[depth] = (int*)MALLOC(sizeof(int)*nb_nodes); + + objs = (hwloc_obj_t*)MALLOC(sizeof(hwloc_obj_t)*nb_nodes); + objs[0] = hwloc_get_next_obj_by_depth(topology,depth,NULL); + hwloc_get_closest_objs(topology,objs[0],objs+1,nb_nodes-1); + res->arity[depth] = objs[0]->arity; + + if (depth == topodepth -1){ + res->nb_constraints = nb_nodes; + res->nb_proc_units = nb_nodes; + } + /* printf("%d:",res->arity[depth]); */ + + /* Build process id tab */ + for (i = 0; i < nb_nodes; i++){ + res->node_id[depth][i] = objs[i]->os_index; + res->node_rank[depth][objs[i]->os_index] = i; + /* if(depth==topodepth-1) */ + } + FREE(objs); + } + + + + /* Destroy HWLOC topology object. */ + hwloc_topology_destroy(topology); + + /* printf("\n"); */ + return res; +} + + +void tm_free_topology(tm_topology_t *topology) +{ + int i; + for( i = 0 ; i < topology->nb_levels ; i++ ){ + FREE(topology->node_id[i]); + FREE(topology->node_rank[i]); + } + + FREE(topology->constraints); + FREE(topology->node_id); + FREE(topology->node_rank); + FREE(topology->nb_nodes); + FREE(topology->arity); + FREE(topology->cost); + FREE(topology); +} + +tm_topology_t *tm_load_topology(char *arch_filename, tm_file_type_t arch_file_type){ + switch(arch_file_type){ + case TM_FILE_TYPE_TGT: + return tgt_to_tm(arch_filename); + case TM_FILE_TYPE_XML: + return hwloc_to_tm(arch_filename); + default: + if(tm_get_verbose_level() >= ERROR){ + fprintf(stderr,"Error loading topology. 
Filetype %d unknown\n", arch_file_type); + } + exit(-1); + } +} + + +void tm_display_topology(tm_topology_t *topology) +{ + int i; + unsigned int j; + unsigned long id; + for( i = 0 ; i < topology->nb_levels ; i++ ){ + printf("%d: ",i); + for( j = 0 ; j < topology->nb_nodes[i] ; j++) + printf("%d ",topology->node_id[i][j]); + printf("\n"); + } + + printf("Last level: "); + for(id = 0; id < topology->nb_nodes[topology->nb_levels-1]/topology->oversub_fact; id++) + printf("%d ",topology->node_rank[topology->nb_levels-1][id]); + printf("\n"); + + + if(topology->constraints){ + printf("Constraints: "); + for(i = 0; i < topology->nb_constraints; i++) + printf("%d ",topology->constraints[i]); + printf("\n"); + } + + printf("\tnb_levels=%d\n\tnb_constraints=%d\n\toversub_fact=%d\n\tnb proc units=%d\n\n", + topology->nb_levels, topology->nb_constraints, topology->oversub_fact, topology->nb_proc_units); + +} + + +void tm_display_arity(tm_topology_t *topology){ + int depth; + for(depth=0; depth < topology->nb_levels; depth++) + printf("%d(%lf): ",topology->arity[depth], topology->cost[depth]); + + printf("\n"); +} + +int int_cmp_inc(const void* x1,const void* x2) +{ + return (*((const int *)x1) > *((const int *)x2)) - (*((const int *)x1) < *((const int *)x2)); +} + + +static int topo_check_constraints(tm_topology_t *topology){ + int n = topology->nb_constraints; + int i; + int depth = topology->nb_levels-1; + for (i=0;inode_id[depth], topology->nb_nodes[depth], topology->constraints[i])){ + if(tm_get_verbose_level() >= CRITICAL){ + fprintf(stderr,"Error! Incompatible constraint with the topology: rank %d in the constraints is not a valid id of any node of the topology.\n",topology->constraints[i]); + } + return 0; + } + } + return 1; +} + + + + +/* cpy flag tells if we need to copy the array.
+ Set to 1 when called from the application level and 0 when called from inside the library*/ +static int tm_topology_set_binding_constraints_cpy(int *constraints, int nb_constraints, tm_topology_t *topology, int cpy_flag){ + + topology -> nb_constraints = nb_constraints; + if(cpy_flag){ + topology -> constraints = (int*)MALLOC(nb_constraints*sizeof(int)); + memcpy(topology -> constraints, constraints, nb_constraints*sizeof(int)); + }else{ + topology -> constraints = constraints; + } + + return topo_check_constraints(topology); +} + +int tm_topology_set_binding_constraints(int *constraints, int nb_constraints, tm_topology_t *topology){ + return tm_topology_set_binding_constraints_cpy(constraints, nb_constraints, topology, 1); +} + +int tm_topology_add_binding_constraints(char *constraints_filename, tm_topology_t *topology) +{ + int *tab = NULL; + FILE *pf = NULL; + char line[LINE_SIZE],*l = NULL; + char *ptr = NULL; + int i,n; + unsigned int vl = tm_get_verbose_level(); + + + if (!(pf = fopen(constraints_filename,"r"))) { + if(vl >= CRITICAL) + fprintf(stderr,"Cannot open %s\n",constraints_filename); + exit(-1); + } + + /* compute the size of the array to store the constraints*/ + n = 0; + fgets(line, LINE_SIZE, pf); + l = line; + while((ptr=strtok(l," \t"))){ + l = NULL; + if((ptr[0] != '\n') && ( !isspace(ptr[0])) && (*ptr) && (ptr)) + n++; + } + + tab = (int*)MALLOC(n*sizeof(int)); + + rewind(pf); + fgets(line, LINE_SIZE, pf); + fclose(pf); + l = line; + i = 0; + while((ptr=strtok(l," \t"))){ + l = NULL; + if((ptr[0] != '\n') && ( !isspace(ptr[0])) && (*ptr) && (ptr)){ + if(i < n) + tab[i] = atoi(ptr); + else{ + if(vl >= CRITICAL) + fprintf(stderr, "More than %d entries in %s\n", n, constraints_filename); + exit(-1); + } + i++; + } + } + + if( i != n ){ + if(vl >= CRITICAL) + fprintf(stderr, "Read %d entries while expecting %d ones\n", i, n); + exit(-1); + } + + qsort(tab,n,sizeof(int),int_cmp_inc); + + return tm_topology_set_binding_constraints_cpy(tab, n, 
topology, 0); +} + + +void topology_numbering_cpy(tm_topology_t *topology,int **numbering,int *nb_nodes) +{ + int nb_levels; + unsigned int vl = tm_get_verbose_level(); + + nb_levels = topology->nb_levels; + *nb_nodes = topology->nb_nodes[nb_levels-1]; + if(vl >= INFO) + printf("nb_nodes=%d\n",*nb_nodes); + *numbering = (int*)MALLOC(sizeof(int)*(*nb_nodes)); + memcpy(*numbering,topology->node_id[nb_levels-1],sizeof(int)*(*nb_nodes)); +} + +void topology_arity_cpy(tm_topology_t *topology,int **arity,int *nb_levels) +{ + *nb_levels = topology->nb_levels; + *arity = (int*)MALLOC(sizeof(int)*(*nb_levels)); + memcpy(*arity,topology->arity,sizeof(int)*(*nb_levels)); +} + +void topology_constraints_cpy(tm_topology_t *topology,int **constraints,int *nb_constraints) +{ + *nb_constraints = topology->nb_constraints; + if(topology->constraints){ + *constraints = (int*)MALLOC(sizeof(int)*(*nb_constraints)); + memcpy(*constraints,topology->constraints,sizeof(int)*(*nb_constraints)); + }else{ + *constraints = NULL; + } +} + +void topology_cost_cpy(tm_topology_t *topology,double **cost) +{ + *cost = (double*)MALLOC(sizeof(double)*(topology->nb_levels)); + memcpy(*cost,topology->cost,sizeof(double)*(topology->nb_levels)); +} + +void optimize_arity(int **arity, double **cost, int *nb_levels,int n) +{ + int a,i; + int *new_arity = NULL; + double *new_cost = NULL; + + if( n < 0 ) + return; + /* printf("n=%d\tnb_levels=%d\n",n,*nb_levels); */ + /* for(i=0;i<*nb_levels;i++) */ + /* printf("%d:",(*arity)[i]); */ + /* printf("\n"); */ + /* if(n==(*nb_levels)-3) */ + /* exit(-1); */ + a = (*arity)[n]; + if( (a%3 == 0) && (a > 3) ){ + /* + check if the arity of level n is divisible by 3 + If this is the case: + Add a level + */ + (*nb_levels)++; + /* Build new arity and cost arrays */ + new_arity = (int*)MALLOC(sizeof(int)*(*nb_levels)); + new_cost = (double*)MALLOC(sizeof(double)*(*nb_levels)); + /* Copy the beginning of the old arrays */ + for( i = 0 ; i < n ; i++){ + new_arity[i] = (*arity)[i];
+ new_cost[i] = (*cost)[i]; + } + /* set the nth level to arity 3 */ + new_arity[n] = 3; + /* copy the cost to this level*/ + new_cost[n] = (*cost)[n]; + /* printf("a=%d\n",a); */ + /* Set the (n+1) level to arity a/3 */ + new_arity[n+1] = a/3; + /* Duplicate the cost as it is the same level originally */ + new_cost[n+1] = (*cost)[n]; + /* Copy the end of the arrays */ + for( i = n+2 ; i < *nb_levels ; i++){ + new_arity[i] = (*arity)[i-1]; + new_cost[i] = (*cost)[i-1]; + } + FREE(*arity); + FREE(*cost); + /* if a/3 == 3 then go to the next level */ + if(new_arity[n+1] == 3) + optimize_arity(&new_arity,&new_cost,nb_levels,n); + else /* continue to this level (remember we just added a new level) */ + optimize_arity(&new_arity,&new_cost,nb_levels,n+1); + *arity=new_arity; + *cost=new_cost; + }else if( (a%2==0) && (a>2) ){/* same as above but for arity == 2 instead of 3 */ + (*nb_levels)++; + new_arity = (int*)MALLOC(sizeof(int)*(*nb_levels)); + new_cost = (double*)MALLOC(sizeof(double)*(*nb_levels)); + for( i = 0 ; i < n ; i++ ){ + new_arity[i] = (*arity)[i]; + new_cost[i] = (*cost)[i]; + } + new_arity[n] = 2; + new_cost[n] = (*cost)[n]; + /* printf("a=%d\n",a); */ + new_arity[n+1] = a/2; + new_cost[n+1] = (*cost)[n]; + for( i = n+2 ; i < *nb_levels ; i++ ){ + new_arity[i] = (*arity)[i-1]; + new_cost[i] = (*cost)[i-1]; + } + FREE(*arity); + FREE(*cost); + if(new_arity[n+1] == 2) + optimize_arity(&new_arity, &new_cost, nb_levels, n); + else + optimize_arity(&new_arity, &new_cost, nb_levels, n+1); + *arity = new_arity; + *cost= new_cost; + }else /* if nothing works go to next level.
*/ + optimize_arity(arity, cost, nb_levels,n-1); +} + + + + +void tm_optimize_topology(tm_topology_t **topology){ + int *arity = NULL,nb_levels; + int *numbering = NULL,nb_nodes; + tm_topology_t *new_topo; + double *cost; + unsigned int vl = tm_get_verbose_level(); + int *constraints = NULL, nb_constraints; + int i; + + if(vl >= DEBUG) + tm_display_arity(*topology); + + topology_arity_cpy(*topology,&arity,&nb_levels); + topology_numbering_cpy(*topology,&numbering,&nb_nodes); + topology_constraints_cpy(*topology,&constraints,&nb_constraints); + topology_cost_cpy(*topology,&cost); + + + optimize_arity(&arity,&cost,&nb_levels,nb_levels-2); + new_topo = tm_build_synthetic_topology(arity, NULL, nb_levels,numbering,nb_nodes); + new_topo->cost = cost; + new_topo->constraints = constraints; + new_topo->nb_constraints = nb_constraints; + new_topo->nb_proc_units = (*topology)->nb_proc_units; + new_topo->oversub_fact = (*topology)->oversub_fact; + + + + if(vl >= DEBUG){ + if(constraints){ + printf("Constraints: "); + for(i=0;inb_constraints = 0; + topology->oversub_fact = 1; + topology->constraints = NULL; + topology->nb_levels = nb_levels; + topology->arity = (int*)MALLOC(sizeof(int)*topology->nb_levels); + topology->node_id = (int**)MALLOC(sizeof(int*)*topology->nb_levels); + topology->node_rank = (int**)MALLOC(sizeof(int*)*topology->nb_levels); + topology->nb_nodes = (size_t *)MALLOC(sizeof(size_t)*topology->nb_levels); + if(cost) + topology->cost = (double*)CALLOC(topology->nb_levels,sizeof(double)); + else + topology->cost = NULL; + + memcpy(topology->arity, arity, sizeof(int)*nb_levels); + if(cost) + memcpy(topology->cost, cost, sizeof(double)*nb_levels); + + n = 1; + for( i = 0 ; i < topology->nb_levels ; i++ ){ + topology->nb_nodes[i] = n; + topology->node_id[i] = (int*)MALLOC(sizeof(int)*n); + topology->node_rank[i] = (int*)MALLOC(sizeof(int)*n); + if( i < topology->nb_levels-1){ + for( j = 0 ; j < n ; j++ ){ + topology->node_id[i][j] = j; + 
topology->node_rank[i][j]=j; + } + }else{ + for( j = 0 ; j < n ; j++ ){ + int id = core_numbering[j%nb_core_per_nodes] + (nb_core_per_nodes)*(j/nb_core_per_nodes); + topology->node_id[i][j] = id; + topology->node_rank[i][id] = j; + } + } + + + if (i == topology->nb_levels-1){ + topology->nb_constraints = n; + topology->nb_proc_units = n; + } + + n *= topology->arity[i]; + } + if(cost){ + /*aggregate costs*/ + for( i = topology->nb_levels-2 ; i >= 0 ; i-- ) + topology->cost[i] += topology->cost[i+1]; + } + + return topology; +} + + +void build_synthetic_proc_id(tm_topology_t *topology) +{ + int i; + size_t j,n = 1; + + topology->node_id = (int**)MALLOC(sizeof(int*)*topology->nb_levels); + topology->node_rank = (int**)MALLOC(sizeof(int*)*topology->nb_levels); + topology->nb_nodes = (size_t*) MALLOC(sizeof(size_t)*topology->nb_levels); + + for( i = 0 ; i < topology->nb_levels ; i++ ){ + /* printf("n= %lld, arity := %d\n",n, topology->arity[i]); */ + topology->nb_nodes[i] = n; + topology->node_id[i] = (int*)MALLOC(sizeof(long int)*n); + topology->node_rank[i] = (int*)MALLOC(sizeof(long int)*n); + if ( !topology->node_id[i] ){ + if(tm_get_verbose_level() >= CRITICAL) + fprintf(stderr,"Cannot allocate level %d (of size %ld) of the topology\n", i, (unsigned long int)n); + exit(-1); + } + + if (i == topology->nb_levels-1){ + topology->nb_constraints = n; + topology->nb_proc_units = n; + } + + + + for( j = 0 ; j < n ; j++ ){ + topology->node_id[i][j] = j; + topology->node_rank[i][j] = j; + } + n *= topology->arity[i]; + } + +} + + + +void tm_enable_oversubscribing(tm_topology_t *topology, unsigned int oversub_fact){ +{ + int i,j,n; + + if(oversub_fact <=1) + return; + + topology -> nb_levels ++; + topology -> arity = (int*) REALLOC(topology->arity, sizeof(int)*topology->nb_levels); + topology -> cost = (double*) REALLOC(topology->cost, sizeof(double)*topology->nb_levels); + topology -> node_id = (int**) REALLOC(topology->node_id, sizeof(int*)*topology->nb_levels); + 
topology -> node_rank = (int**) REALLOC(topology->node_rank, sizeof(int*)*topology->nb_levels); + topology -> nb_nodes = (size_t *)REALLOC(topology->nb_nodes, sizeof(size_t)*topology->nb_levels); + topology -> oversub_fact = oversub_fact; + + i = topology->nb_levels - 1; + n = topology->nb_nodes[i-1] * oversub_fact; + topology->arity[i-1] = oversub_fact; + topology->cost[i-1] = 0; + topology->node_id[i] = (int*)MALLOC(sizeof(int)*n); + topology->node_rank[i] = (int*)MALLOC(sizeof(int)*n); + topology->nb_nodes[i] = n; + + for( j = 0 ; j < n ; j++ ){ + int id = topology->node_id[i-1][j/oversub_fact]; + topology->node_id[i][j] = id; + topology->node_rank[i][id] = j; + } + } + +} diff --git a/ompi/mca/topo/treematch/treematch/tm_topology.h b/ompi/mca/topo/treematch/treematch/tm_topology.h new file mode 100644 index 00000000000..1cd0c5b4174 --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/tm_topology.h @@ -0,0 +1,22 @@ +#include +#include "tm_tree.h" + +tm_topology_t* get_local_topo_with_hwloc(void); +tm_topology_t* hwloc_to_tm(char *filename); +int int_cmp_inc(const void* x1,const void* x2); +void optimize_arity(int **arity, double **cost, int *nb_levels,int n); +int symetric(hwloc_topology_t topology); +tm_topology_t * tgt_to_tm(char *filename); +void tm_display_arity(tm_topology_t *topology); +void tm_display_topology(tm_topology_t *topology); +void tm_free_topology(tm_topology_t *topology); +tm_topology_t *tm_load_topology(char *arch_filename, tm_file_type_t arch_file_type); +void tm_optimize_topology(tm_topology_t **topology); +int tm_topology_add_binding_constraints(char *constraints_filename, tm_topology_t *topology); +int topo_nb_proc(hwloc_topology_t topology,int N); +void topology_arity(tm_topology_t *topology,int **arity,int *nb_levels); +void topology_constraints(tm_topology_t *topology,int **constraints,int *nb_constraints); +void topology_cost(tm_topology_t *topology,double **cost); +void topology_numbering(tm_topology_t *topology,int **numbering,int 
*nb_nodes); +double ** topology_to_arch(hwloc_topology_t topology); + diff --git a/ompi/mca/topo/treematch/treematch/tm_tree.c b/ompi/mca/topo/treematch/treematch/tm_tree.c index f9ec2a9a117..ffac4e7615b 100644 --- a/ompi/mca/topo/treematch/treematch/tm_tree.c +++ b/ompi/mca/topo/treematch/treematch/tm_tree.c @@ -3,17 +3,20 @@ #include #include #include +#include + +#include "treematch.h" #include "tm_tree.h" +#include "tm_mapping.h" #include "tm_timings.h" #include "tm_bucket.h" #include "tm_kpartitioning.h" -#include "tm_mapping.h" #include "tm_verbose.h" #include "tm_thread_pool.h" -#define MIN(a,b) ((a)<(b)?(a):(b)) -#define MAX(a,b) ((a)>(b)?(a):(b)) +#define MIN(a, b) ((a)<(b)?(a):(b)) +#define MAX(a, b) ((a)>(b)?(a):(b)) #ifndef __CHARMC__ #define __CHARMC__ 0 @@ -22,151 +25,153 @@ #if __CHARMC__ #include "converse.h" #else -static int ilog2(int val) -{ - int i = 0; - for( ; val != 0; val >>= 1, i++ ); - return i; -} -#define CmiLog2(VAL) ilog2((int)(VAL)) +#define CmiLog2(VAL) log2((double)(VAL)) #endif static int verbose_level = ERROR; +static int exhaustive_search_flag = 0; + +void free_list_child(tm_tree_t *);void free_tab_child(tm_tree_t *); +double choose (long, long);void display_node(tm_tree_t *); +void clone_tree(tm_tree_t *, tm_tree_t *); +double *aggregate_obj_weight(tm_tree_t *, double *, int); +tm_affinity_mat_t *aggregate_com_mat(tm_tree_t *, tm_affinity_mat_t *, int); +double eval_grouping(tm_affinity_mat_t *, tm_tree_t **, int); +group_list_t *new_group_list(tm_tree_t **, double, group_list_t *); +void add_to_list(group_list_t *, tm_tree_t **, int, double); +void list_all_possible_groups(tm_affinity_mat_t *, tm_tree_t *, int, int, int, tm_tree_t **, group_list_t *); +int independent_groups(group_list_t **, int, group_list_t *, int); +void display_selection (group_list_t**, int, int, double); +void display_grouping (tm_tree_t *, int, int, double); +int recurs_select_independent_groups(group_list_t **, int, int, int, int, + int, double, double 
*, group_list_t **, group_list_t **); +int test_independent_groups(group_list_t **, int, int, int, int, int, double, double *, + group_list_t **, group_list_t **); +void delete_group_list(group_list_t *); +int group_list_id(const void*, const void*); +int group_list_asc(const void*, const void*); +int group_list_dsc(const void*, const void*); +int weighted_degree_asc(const void*, const void*); +int weighted_degree_dsc(const void*, const void*); +int select_independent_groups(group_list_t **, int, int, int, double *, group_list_t **, int, double); +int select_independent_groups_by_largest_index(group_list_t **, int, int, int, double *, + group_list_t **, int, double); +void list_to_tab(group_list_t *, group_list_t **, int); +void display_tab_group(group_list_t **, int, int); +int independent_tab(tm_tree_t **, tm_tree_t **, int); +void compute_weighted_degree(group_list_t **, int, int); +void group(tm_affinity_mat_t *, tm_tree_t *, tm_tree_t *, int, int, int, double *, tm_tree_t **); +void fast_group(tm_affinity_mat_t *, tm_tree_t *, tm_tree_t *, int, int, int, double *, tm_tree_t **, int *, int); +int adjacency_asc(const void*, const void*); +int adjacency_dsc(const void*, const void*); + void super_fast_grouping(tm_affinity_mat_t *, tm_tree_t *, tm_tree_t *, int, int); +tm_affinity_mat_t *build_cost_matrix(tm_affinity_mat_t *, double *, double); +void group_nodes(tm_affinity_mat_t *, tm_tree_t *, tm_tree_t *, int , int, double*, double); +double fast_grouping(tm_affinity_mat_t *, tm_tree_t *, tm_tree_t *, int, int, double); +void complete_aff_mat(tm_affinity_mat_t **, int, int); +void complete_obj_weight(double **, int, int); +void create_dumb_tree(tm_tree_t *, int, tm_topology_t *); +void complete_tab_node(tm_tree_t **, int, int, int, tm_topology_t *); +void set_deb_tab_child(tm_tree_t *, tm_tree_t *, int); +tm_tree_t *build_level_topology(tm_tree_t *, tm_affinity_mat_t *, int, int, tm_topology_t *, double *, double *); +int check_constraints(tm_topology_t *, int 
**); +tm_tree_t *bottom_up_build_tree_from_topology(tm_topology_t *, tm_affinity_mat_t *, double *, double *); +void free_non_constraint_tree(tm_tree_t *); +void free_constraint_tree(tm_tree_t *); +void free_tab_double(double**, int); +void free_tab_int(int**, int ); +void partial_aggregate_aff_mat (int, void **, int); +void free_affinity_mat(tm_affinity_mat_t *aff_mat); +int int_cmp_inc(const void* x1, const void* x2); + + + + + +void tm_set_exhaustive_search_flag(int new_val){ + exhaustive_search_flag = new_val; +} +int tm_get_exhaustive_search_flag(){ + return exhaustive_search_flag; +} -void FREE_list_child(tree_t *); -void FREE_tab_child(tree_t *); -unsigned long int choose (long,long); -void display_node(tree_t *); -void clone_tree(tree_t *,tree_t *); -double *aggregate_obj_weight(tree_t *,double *,int); -affinity_mat_t *aggregate_com_mat(tree_t *,affinity_mat_t *,int); -double eval_grouping(affinity_mat_t *,tree_t **,int); -group_list_t *new_group_list(tree_t **,double,group_list_t *); -void add_to_list(group_list_t *,tree_t **,int,double); -void list_all_possible_groups(affinity_mat_t *,tree_t *,int,int,int,tree_t **,group_list_t *); -int independent_groups(group_list_t **,int,group_list_t *,int); -void display_selection (group_list_t**,int,int,double); -void display_grouping (tree_t *,int,int,double); -int recurs_select_independent_groups(group_list_t **,int,int,int,int, - int,double,double *,group_list_t **,group_list_t **); -int test_independent_groups(group_list_t **,int,int,int,int,int,double,double *, - group_list_t **,group_list_t **); -void delete_group_list(group_list_t *); -int group_list_id(const void*,const void*); -int group_list_asc(const void*,const void*); -int group_list_dsc(const void*,const void*); -int weighted_degree_asc(const void*,const void*); -int weighted_degree_dsc(const void*,const void*); -int select_independent_groups(group_list_t **,int,int,int,double *,group_list_t **,int,double); -int 
select_independent_groups_by_largest_index(group_list_t **,int,int,int,double *, - group_list_t **,int,double); -void list_to_tab(group_list_t *,group_list_t **,int); -void display_tab_group(group_list_t **,int,int); -int independent_tab(tree_t **,tree_t **,int); -void compute_weighted_degree(group_list_t **,int,int); -void group(affinity_mat_t *,tree_t *,tree_t *,int,int,int,double *,tree_t **); -void fast_group(affinity_mat_t *,tree_t *,tree_t *,int,int,int,double *,tree_t **, int *, int); -int adjacency_asc(const void*,const void*); -int adjacency_dsc(const void*,const void*); - void super_fast_grouping(affinity_mat_t *,tree_t *,tree_t *,int, int); -affinity_mat_t *build_cost_matrix(affinity_mat_t *,double *,double); -void group_nodes(affinity_mat_t *,tree_t *,tree_t *,int ,int,double*,double); -void fast_grouping(affinity_mat_t *,tree_t *,tree_t *,int,int,long int); -void complete_aff_mat(affinity_mat_t **,int,int); -void complete_obj_weight(double **,int,int); -void create_dumb_tree(tree_t *,int,tm_topology_t *); -void complete_tab_node(tree_t **,int,int,int,tm_topology_t *); -void set_deb_tab_child(tree_t *,tree_t *,int); -tree_t *build_level_topology(tree_t *,affinity_mat_t *,int,int,tm_topology_t *,double *,double *); -int check_constraints(tm_topology_t *,int **); -tree_t *bottom_up_build_tree_from_topology(tm_topology_t *,double **, int ,double *,double *); -void FREE_non_constraint_tree(tree_t *); -void FREE_constraint_tree(tree_t *); -void FREE_tab_double(double**,int); -void FREE_tab_int(int**,int ); -void partial_aggregate_com_mat (int, void **); -affinity_mat_t *new_affinity_mat(double **, double *, int); -void partial_aggregate_aff_mat (int, void **); -affinity_mat_t *aggregate_aff_mat(tree_t *, affinity_mat_t *, int); -affinity_mat_t * build_affinity_mat(double **, int); - -affinity_mat_t *new_affinity_mat(double **mat, double *sum_row, int order){ - affinity_mat_t *res = (affinity_mat_t *) MALLOC (sizeof(affinity_mat_t)); - - res -> mat = mat; - 
res -> sum_row = sum_row; - res -> order = order; - return res; +void free_affinity_mat(tm_affinity_mat_t *aff_mat){ + free_tab_double(aff_mat->mat, aff_mat->order); + FREE(aff_mat->sum_row); + FREE(aff_mat); } -void FREE_list_child(tree_t *tree) + + +void free_list_child(tm_tree_t *tree) { - int i; + int i; - if(NULL == tree) return; + if(tree){ for(i=0;i<tree->arity;i++) - FREE_list_child(tree->child[i]); + free_list_child(tree->child[i]); FREE(tree->child); if(tree->dumb) - FREE(tree); + FREE(tree); + } } - -void FREE_tab_child(tree_t *tree) +void free_tab_child(tm_tree_t *tree) { if(tree){ - FREE_tab_child(tree->tab_child); + free_tab_child(tree->tab_child); FREE(tree->tab_child); } } -void FREE_non_constraint_tree(tree_t *tree) +void free_non_constraint_tree(tm_tree_t *tree) { - int free_tree = tree->dumb; - FREE_tab_child(tree); - FREE_list_child(tree); - if(free_tree) + int d = tree->dumb; + + free_tab_child(tree); + free_list_child(tree); + if(!d) FREE(tree); } -void FREE_constraint_tree(tree_t *tree) +void free_constraint_tree(tm_tree_t *tree) { int i; if(tree){ for(i=0;i<tree->arity;i++) - FREE_constraint_tree(tree->child[i]); + free_constraint_tree(tree->child[i]); FREE(tree->child); FREE(tree); } } -void FREE_tree(tree_t *tree) +void tm_free_tree(tm_tree_t *tree) { if(tree->constraint) - FREE_constraint_tree(tree); + free_constraint_tree(tree); else - FREE_non_constraint_tree(tree); + free_non_constraint_tree(tree); } -unsigned long int choose (long n,long k) +double choose (long n, long k) { /* compute C_n_k */ double res = 1; int i; - for( i = 0 ; i < k ; i++ ) - res *= (double)(n-i)/(double)(k-i); - - return (unsigned long int)res; + for( i = 0 ; i < k ; i++ ){ + res *= ((double)(n-i)/(double)(k-i)); + } + return res; } -void set_node(tree_t *node,tree_t ** child, int arity,tree_t *parent, - int id,double val,tree_t *tab_child,int depth) +void set_node(tm_tree_t *node, tm_tree_t ** child, int arity, tm_tree_t *parent, + int id, double val, tm_tree_t *tab_child,
int depth) { static int uniq = 0; node->child = child; @@ -180,14 +185,14 @@ void set_node(tree_t *node,tree_t ** child, int arity,tree_t *parent, node->dumb = 0; } -void display_node(tree_t *node) +void display_node(tm_tree_t *node) { if (verbose_level >= DEBUG) printf("child : %p\narity : %d\nparent : %p\nid : %d\nval : %f\nuniq : %d\n\n", - (void *)(node->child),node->arity,(void *)(node->parent),node->id,node->val,node->uniq); + (void *)(node->child), node->arity, (void *)(node->parent), node->id, node->val, node->uniq); } -void clone_tree(tree_t *new,tree_t *old) +void clone_tree(tm_tree_t *new, tm_tree_t *old) { int i; new->child = old->child; @@ -204,9 +209,9 @@ void clone_tree(tree_t *new,tree_t *old) } -double *aggregate_obj_weight(tree_t *new_tab_node, double *tab, int M) +double *aggregate_obj_weight(tm_tree_t *new_tab_node, double *tab, int M) { - int i,i1,id1; + int i, i1, id1; double *res = NULL; if(!tab) @@ -226,26 +231,26 @@ double *aggregate_obj_weight(tree_t *new_tab_node, double *tab, int M) -void partial_aggregate_aff_mat (int nb_args, void **args){ +void partial_aggregate_aff_mat (int nb_args, void **args, int thread_id){ int inf = *(int*)args[0]; int sup = *(int*)args[1]; double **old_mat = (double**)args[2]; - tree_t *tab_node = (tree_t*)args[3]; + tm_tree_t *tab_node = (tm_tree_t*)args[3]; int M = *(int*)args[4]; double **mat = (double**)args[5]; double *sum_row = (double*)args[6]; - int i,j,i1,j1; + int i, j, i1, j1; int id1, id2; - if(nb_args != 6){ + if(nb_args != 7){ if(verbose_level >= ERROR) - fprintf(stderr,"Wrong number of args in %s: %d\n",__func__, nb_args); + fprintf(stderr, "Thread %d: Wrong number of args in %s: %d\n", thread_id, __func__, nb_args); exit(-1); } if(verbose_level >= INFO) - printf("Aggregate in parallel (%d-%d)\n",inf,sup-1); + printf("Aggregate in parallel (%d-%d)\n", inf, sup-1); for( i = inf ; i < sup ; i++ ) for( j = 0 ; j < M ; j++ ){ @@ -255,7 +260,7 @@ void partial_aggregate_aff_mat (int nb_args, void 
**args){ for( j1 = 0 ; j1 < tab_node[j].arity ; j1++ ){ id2 = tab_node[j].child[j1]->id; mat[i][j] += old_mat[id1][id2]; - /* printf("mat[%d][%d]+=old_mat[%d][%d]=%f\n",i,j,id1,id2,old_mat[id1][id2]);*/ + /* printf("mat[%d][%d]+=old_mat[%d][%d]=%f\n", i, j, id1, id2, old_mat[id1][id2]);*/ } sum_row[i] += mat[i][j]; } @@ -264,17 +269,17 @@ void partial_aggregate_aff_mat (int nb_args, void **args){ } -affinity_mat_t *aggregate_aff_mat(tree_t *tab_node, affinity_mat_t *aff_mat, int M) +static tm_affinity_mat_t *aggregate_aff_mat(tm_tree_t *tab_node, tm_affinity_mat_t *aff_mat, int M) { - int i,j,i1,j1,id1,id2; + int i, j, i1, j1, id1, id2; double **new_mat = NULL, **old_mat = aff_mat->mat; double *sum_row = NULL; new_mat = (double**)MALLOC(M*sizeof(double*)); for( i = 0 ; i < M ; i++ ) - new_mat[i] = (double*)CALLOC((M),sizeof(double)); + new_mat[i] = (double*)CALLOC((M), sizeof(double)); - sum_row = (double*)CALLOC(M,sizeof(double)); + sum_row = (double*)CALLOC(M, sizeof(double)); if(M>512){ /* perform this part in parallel*/ int id; @@ -283,7 +288,7 @@ affinity_mat_t *aggregate_aff_mat(tree_t *tab_node, affinity_mat_t *aff_mat, int int *inf; int *sup; - nb_threads = MIN(M/512,get_nb_threads()); + nb_threads = MIN(M/512, get_nb_threads()); works = (work_t**)MALLOC(sizeof(work_t*)*nb_threads); inf = (int*)MALLOC(sizeof(int)*nb_threads); sup = (int*)MALLOC(sizeof(int)*nb_threads); @@ -300,9 +305,9 @@ affinity_mat_t *aggregate_aff_mat(tree_t *tab_node, affinity_mat_t *aff_mat, int args[5]=(void*)new_mat; args[6]=(void*)sum_row; - works[id]= create_work(7,args,partial_aggregate_aff_mat); + works[id]= create_work(7, args, partial_aggregate_aff_mat); if(verbose_level >= DEBUG) - printf("Executing %p\n",(void *)works[id]); + printf("Executing %p\n", (void *)works[id]); submit_work( works[id], id); } @@ -326,60 +331,66 @@ affinity_mat_t *aggregate_aff_mat(tree_t *tab_node, affinity_mat_t *aff_mat, int for( j1 = 0 ; j1 < tab_node[j].arity ; j1++ ){ id2 = 
tab_node[j].child[j1]->id; new_mat[i][j] += old_mat[id1][id2]; - /* printf("mat[%d][%d]+=old_mat[%d][%d]=%f\n",i,j,id1,id2,old_mat[id1][id2]);*/ + /* printf("mat[%d][%d]+=old_mat[%d][%d]=%f\n", i, j, id1, id2, old_mat[id1][id2]);*/ } sum_row[i] += new_mat[i][j]; } } } } - return new_affinity_mat(new_mat,sum_row,M); + return new_affinity_mat(new_mat, sum_row, M); } -void FREE_tab_double(double**tab,int N) +void free_tab_double(double**tab, int mat_order) { int i; - for( i = 0 ; i < N ; i++ ) + for( i = 0 ; i < mat_order ; i++ ) FREE(tab[i]); FREE(tab); } -void FREE_tab_int(int**tab,int N) +void free_tab_int(int**tab, int mat_order) { int i; - for( i = 0 ; i < N ; i++ ) + for( i = 0 ; i < mat_order ; i++ ) FREE(tab[i]); FREE(tab); } -void display_tab(double **tab,int N) +void display_tab(double **tab, int mat_order) { - int i,j; - double line,total = 0; - + int i, j; + double line, total = 0; + int vl = tm_get_verbose_level(); - for( i = 0 ; i < N ; i++ ){ + for( i = 0 ; i < mat_order ; i++ ){ line = 0; - for( j = 0 ; j < N ; j++ ){ - printf("%g ",tab[i][j]); + for( j = 0 ; j < mat_order ; j++ ){ + if(vl >= WARNING) + printf("%g ", tab[i][j]); + else + fprintf(stderr, "%g ", tab[i][j]); line += tab[i][j]; } total += line; - /* printf(": %g",line);*/ - printf("\n"); + /* printf(": %g", line);*/ + if(vl >= WARNING) + printf("\n"); + else + fprintf(stderr, "\n"); } - /* printf("Total: %.2f\n",total);*/ + /* printf("Total: %.2f\n", total);*/ } -double eval_grouping(affinity_mat_t *aff_mat,tree_t **cur_group,int arity) +double eval_grouping(tm_affinity_mat_t *aff_mat, tm_tree_t **cur_group, int arity) { double res = 0; - int i,j,id,id1,id2; + int i, j, id, id1, id2; double **mat = aff_mat->mat; double * sum_row = aff_mat -> sum_row; - /*display_tab(tab,N);*/ + /*display_tab(tab, mat_order);*/ for( i = 0 ; i < arity ; i++ ){ id = cur_group[i]->id; @@ -390,16 +401,16 @@ double eval_grouping(affinity_mat_t *aff_mat,tree_t **cur_group,int arity) id1 = cur_group[i]->id; for( j 
= 0 ; j < arity ; j++ ){ id2 = cur_group[j]->id; - /*printf("res-=tab[%d][%d]=%f\n",id1,id2,tab[id1][id2]);*/ + /*printf("res-=tab[%d][%d]=%f\n", id1, id2, tab[id1][id2]);*/ res -= mat[id1][id2]; } } - /*printf(" = %f\n",res);*/ + /*printf(" = %f\n", res);*/ return res; } -group_list_t *new_group_list(tree_t **tab,double val,group_list_t *next) +group_list_t *new_group_list(tm_tree_t **tab, double val, group_list_t *next) { group_list_t *res = NULL; @@ -412,74 +423,74 @@ group_list_t *new_group_list(tree_t **tab,double val,group_list_t *next) } -void add_to_list(group_list_t *list,tree_t **cur_group, int arity, double val) +void add_to_list(group_list_t *list, tm_tree_t **cur_group, int arity, double val) { group_list_t *elem = NULL; - tree_t **tab = NULL; + tm_tree_t **tab = NULL; int i; - tab=(tree_t **)MALLOC(sizeof(tree_t *)*arity); + tab=(tm_tree_t **)MALLOC(sizeof(tm_tree_t *)*arity); for( i = 0 ; i < arity ; i++ ){ tab[i] = cur_group[i]; - if(verbose_level>=INFO) - printf("cur_group[%d]=%d ",i,cur_group[i]->id); + if(verbose_level>=DEBUG) + printf("cur_group[%d]=%d ", i, cur_group[i]->id); } - if(verbose_level>=INFO) - printf(": %f\n",val); + if(verbose_level>=DEBUG) + printf(": %f\n", val); /*printf("\n");*/ - elem = new_group_list(tab,val,list->next); + elem = new_group_list(tab, val, list->next); list->next = elem; list->val++; } -void list_all_possible_groups(affinity_mat_t *aff_mat,tree_t *tab_node,int id,int arity, int depth, - tree_t **cur_group, group_list_t *list) +void list_all_possible_groups(tm_affinity_mat_t *aff_mat, tm_tree_t *tab_node, int id, int arity, int depth, + tm_tree_t **cur_group, group_list_t *list) { double val; int i; - int N = aff_mat->order; + int mat_order = aff_mat->order; if(depth == arity){ - val = eval_grouping(aff_mat,cur_group,arity); - add_to_list(list,cur_group,arity,val); + val = eval_grouping(aff_mat, cur_group, arity); + add_to_list(list, cur_group, arity, val); return; - }else if( (N+depth) >= (arity+id) ){ + }else 
if( (mat_order+depth) >= (arity+id) ){ /*}else if(1){*/ - for( i = id ; i < N ; i++ ){ + for( i = id ; i < mat_order ; i++ ){ if(tab_node[i].parent) continue; cur_group[depth] = &tab_node[i]; - if(verbose_level>=INFO) - printf("%d<-%d\n",depth,i); - list_all_possible_groups(aff_mat,tab_node,i+1,arity,depth+1,cur_group,list); + if(verbose_level>=DEBUG) + printf("%d<-%d\n", depth, i); + list_all_possible_groups(aff_mat, tab_node, i+1, arity, depth+1, cur_group, list); } } } -void update_val(affinity_mat_t *aff_mat,tree_t *parent) +void update_val(tm_affinity_mat_t *aff_mat, tm_tree_t *parent) { /* int i; */ - parent->val = eval_grouping(aff_mat,parent->child,parent->arity); + parent->val = eval_grouping(aff_mat, parent->child, parent->arity); /*printf("connecting: ");*/ /*for( i = 0 ; i < parent->arity ; i++ ){ */ - /*printf("%d ",parent->child[i]->id);*/ + /*printf("%d ", parent->child[i]->id);*/ /* if(parent->child[i]->parent!=parent){ parent->child[i]->parent=parent; }else{ - fprintf(stderr,"redundant operation!\n"); + fprintf(stderr, "redundant operation!\n"); exit(-1); }*/ /* } */ - /*printf(": %f\n",parent->val);*/ + /*printf(": %f\n", parent->val);*/ } -int independent_groups(group_list_t **selection,int d,group_list_t *elem,int arity) +int independent_groups(group_list_t **selection, int d, group_list_t *elem, int arity) { - int i,j,k; + int i, j, k; if(d == 0) return 1; @@ -492,25 +503,30 @@ int independent_groups(group_list_t **selection,int d,group_list_t *elem,int ari return 1; } -void display_selection (group_list_t** selection,int M,int arity,double val) + + +void display_selection (group_list_t** selection, int M, int arity, double val) { - int i,j; + int i, j; + double local_val = 0; - if(verbose_leveltab[j]->id); - printf("-- "); + printf("%d ", selection[i]->tab[j]->id); + printf("(%d)-- ", selection[i]->id); + local_val+=selection[i]->val; } - printf(":%f\n",val); + printf(":%f -- %f\n", val, local_val); } -void display_grouping (tree_t *father,int 
M,int arity,double val) + +void display_grouping (tm_tree_t *father, int M, int arity, double val) { - int i,j; + int i, j; if(verbose_level < INFO) return; @@ -518,14 +534,14 @@ void display_grouping (tree_t *father,int M,int arity,double val) printf("Grouping : "); for( i = 0 ; i < M ; i++ ){ for( j = 0 ; j < arity ; j++ ) - printf("%d ",father[i].child[j]->id); + printf("%d ", father[i].child[j]->id); printf("-- "); } - printf(":%f\n",val); + printf(":%f\n", val); } -int recurs_select_independent_groups(group_list_t **tab,int i,int n,int arity,int d,int M,double val,double *best_val,group_list_t **selection,group_list_t **best_selection) +int recurs_select_independent_groups(group_list_t **tab, int i, int n, int arity, int d, int M, double val, double *best_val, group_list_t **selection, group_list_t **best_selection) { group_list_t *elem = NULL; /* @@ -534,8 +550,8 @@ int recurs_select_independent_groups(group_list_t **tab,int i,int n,int arity,in */ if( d == M ){ - if(verbose_level>=INFO) - display_selection(selection,M,arity,val); + if(verbose_level >= DEBUG) + display_selection(selection, M, arity, val); if( val < *best_val ){ *best_val = val; for( i = 0 ; i < M ; i++ ) @@ -547,12 +563,12 @@ int recurs_select_independent_groups(group_list_t **tab,int i,int n,int arity,in while( i < n ){ elem = tab[i]; - if(independent_groups(selection,d,elem,arity)){ - if(verbose_level>=INFO) - printf("%d: %d\n",d,i); + if(independent_groups(selection, d, elem, arity)){ + if(verbose_level >= DEBUG) + printf("%d: %d\n", d, i); selection[d] = elem; val += elem->val; - return recurs_select_independent_groups(tab,i+1,n,arity,d+1,M,val,best_val,selection,best_selection); + return recurs_select_independent_groups(tab, i+1, n, arity, d+1, M, val, best_val, selection, best_selection); } i++; } @@ -560,22 +576,23 @@ int recurs_select_independent_groups(group_list_t **tab,int i,int n,int arity,in } -int test_independent_groups(group_list_t **tab,int i,int n,int arity,int d,int 
M,double val,double *best_val,group_list_t **selection,group_list_t **best_selection) + +int test_independent_groups(group_list_t **tab, int i, int n, int arity, int d, int M, double val, double *best_val, group_list_t **selection, group_list_t **best_selection) { group_list_t *elem = NULL; if( d == M ){ - /*display_selection(selection,M,arity,val);*/ + /*display_selection(selection, M, arity, val);*/ return 1; } while( i < n ){ elem = tab[i]; - if(independent_groups(selection,d,elem,arity)){ - /*printf("%d: %d\n",d,i);*/ + if(independent_groups(selection, d, elem, arity)){ + /*printf("%d: %d\n", d, i);*/ selection[d] = elem; val += elem->val; - return recurs_select_independent_groups(tab,i+1,n,arity,d+1,M,val,best_val,selection,best_selection); + return recurs_select_independent_groups(tab, i+1, n, arity, d+1, M, val, best_val, selection, best_selection); } i++; } @@ -584,6 +601,7 @@ int test_independent_groups(group_list_t **tab,int i,int n,int arity,int d,int M void delete_group_list(group_list_t *list) { + if(list){ delete_group_list(list->next); FREE(list->tab); @@ -591,9 +609,9 @@ void delete_group_list(group_list_t *list) } } -int group_list_id(const void* x1,const void* x2) +int group_list_id(const void* x1, const void* x2) { - group_list_t *e1 = NULL,*e2= NULL; + group_list_t *e1 = NULL, *e2= NULL; e1 = *((group_list_t**)x1); e2 = *((group_list_t**)x2); @@ -601,9 +619,9 @@ int group_list_id(const void* x1,const void* x2) return (e1->tab[0]->id < e2->tab[0]->id) ? - 1 : 1; } -int group_list_asc(const void* x1,const void* x2) +int group_list_asc(const void* x1, const void* x2) { - group_list_t *e1 = NULL,*e2 = NULL; + group_list_t *e1 = NULL, *e2 = NULL; e1 = *((group_list_t**)x1); e2 = *((group_list_t**)x2); @@ -611,9 +629,9 @@ int group_list_asc(const void* x1,const void* x2) return (e1->val < e2->val) ? 
- 1 : 1; } -int group_list_dsc(const void* x1,const void* x2) +int group_list_dsc(const void* x1, const void* x2) { - group_list_t *e1 = NULL,*e2 = NULL; + group_list_t *e1 = NULL, *e2 = NULL; e1 = *((group_list_t**)x1); e2 = *((group_list_t**)x2); @@ -621,9 +639,9 @@ int group_list_dsc(const void* x1,const void* x2) return (e1->val > e2->val) ? -1 : 1; } -int weighted_degree_asc(const void* x1,const void* x2) +int weighted_degree_asc(const void* x1, const void* x2) { - group_list_t *e1= NULL,*e2 = NULL; + group_list_t *e1= NULL, *e2 = NULL; e1 = *((group_list_t**)x1); e2 = *((group_list_t**)x2); @@ -631,9 +649,9 @@ int weighted_degree_asc(const void* x1,const void* x2) return (e1->wg > e2->wg) ? 1 : -1; } -int weighted_degree_dsc(const void* x1,const void* x2) +int weighted_degree_dsc(const void* x1, const void* x2) { - group_list_t *e1 = NULL,*e2 = NULL; + group_list_t *e1 = NULL, *e2 = NULL; e1 = *((group_list_t**)x1); e2 = *((group_list_t**)x2); @@ -641,20 +659,20 @@ int weighted_degree_dsc(const void* x1,const void* x2) return (e1->wg > e2->wg) ? 
- 1 : 1; } -int select_independent_groups(group_list_t **tab_group,int n,int arity,int M,double *best_val, - group_list_t **best_selection,int bound,double max_duration) +int select_independent_groups(group_list_t **tab_group, int n, int arity, int M, double *best_val, + group_list_t **best_selection, int bound, double max_duration) { - int i,j; + int i, j; group_list_t **selection = NULL; - double val,duration; - CLOCK_T time1,time0; + double val, duration; + CLOCK_T time1, time0; - if(verbose_level>=INFO){ + if(verbose_level>=DEBUG){ for(i=0;itab[j]->id); + printf("%d ", tab_group[i]->tab[j]->id); } - printf(" : %f\n",tab_group[i]->val); + printf(" : %f\n", tab_group[i]->val); } } @@ -662,14 +680,14 @@ int select_independent_groups(group_list_t **tab_group,int n,int arity,int M,do selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*M); CLOCK(time0); - for( i = 0 ; i < MIN(bound,n) ; i++ ){ - /* if(!(i%100)) {printf("%d/%d ",i, MIN(bound,n)); fflush(stdout);} */ + for( i = 0 ; i < MIN(bound, n) ; i++ ){ + /* if(!(i%100)) {printf("%d/%d ", i, MIN(bound, n)); fflush(stdout);} */ selection[0] = tab_group[i]; val = tab_group[i]->val; - recurs_select_independent_groups(tab_group,i+1,n,arity,1,M,val,best_val,selection,best_selection); + recurs_select_independent_groups(tab_group, i+1, n, arity, 1, M, val, best_val, selection, best_selection); if((!(i%5)) && (max_duration>0)){ CLOCK(time1); - duration = CLOCK_DIFF(time1,time0); + duration = CLOCK_DIFF(time1, time0); if(duration>max_duration){ FREE(selection); return 1; @@ -680,365 +698,1029 @@ int select_independent_groups(group_list_t **tab_group,int n,int arity,int M,do if(verbose_level>=INFO) - display_selection(best_selection,M,arity,*best_val); + display_selection(best_selection, M, arity, *best_val); return 0; } -int select_independent_groups_by_largest_index(group_list_t **tab_group,int n,int arity,int M,double *best_val,group_list_t **best_selection,int bound,double max_duration) -{ - int i,dec,nb_groups=0; 
- group_list_t **selection = NULL; - double val,duration; - CLOCK_T time1,time0; - selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*M); - CLOCK(time0); +static int8_t** init_independent_group_mat(int n, group_list_t **tab_group, int arity){ + int i, j, ii, jj; + int8_t **indep_mat = (int8_t **)MALLOC(sizeof(int8_t*) *n); - dec = MAX(n/10000,2); - for( i = n-1 ; i >= 0 ; i -= dec*dec){ - selection[0] = tab_group[i]; - val = tab_group[i]->val; - nb_groups += test_independent_groups(tab_group,i+1,n,arity,1,M,val,best_val,selection,best_selection); - if(verbose_level>=DEBUG) - printf("%d:%d\n",i,nb_groups); + for( i=0 ; i= bound){ - FREE(selection); - return 0; - } - if((!(i%5)) && (max_duration>0)){ - CLOCK(time1); - duration=CLOCK_DIFF(time1,time0); - if(duration>max_duration){ - FREE(selection); - return 1; + /* always i>j in indep_mat[i][j] */ + for(j=0 ; jtab[ii]->id == elem2->tab[jj]->id){ + indep_mat[i][j] = 0; + goto done; + } + } } + indep_mat[i][j] = 1; + done: ; } } - FREE(selection); - return 0; + + return indep_mat; } -void list_to_tab(group_list_t *list,group_list_t **tab,int n) +static int independent_groups_mat(group_list_t **selection, int selection_size, group_list_t *elem, int8_t **indep_mat) { int i; - for( i = 0 ; i < n ; i++ ){ - if(!list){ - if(verbose_level>=CRITICAL) - fprintf(stderr,"Error not enough elements. 
Only %d on %d\n",i,n); - exit(-1); - } - tab[n-i-1] = list; - list = list->next; - } - if(list){ - if(verbose_level>=DEBUG) - fprintf(stderr,"Error too many elements\n"); - exit(-1); - } -} + int id_elem = elem->id; + int id_select; -void display_tab_group(group_list_t **tab, int n,int arity) -{ - int i,j; - if(verbose_leveltab[j]->id); - printf(": %.2f %.2f\n",tab[i]->val,tab[i]->wg); - } -} -int independent_tab(tree_t **tab1,tree_t **tab2,int n) -{ - int i = 0,j = 0; + if(selection_size == 0) + return 1; - while( (iid == tab2[j]->id) + for(i=0; i id; + /* I know that id_elem > id_select, always */ + if(indep_mat[id_elem][id_select] == 0 ) return 0; - else if(tab1[i]->id > tab2[j]->id) - j++; - else - i++; } return 1; } -void compute_weighted_degree(group_list_t **tab, int n,int arity) -{ - int i,j; - for( i = 0 ; i < n ; i++) - tab[i]->sum_neighbour = 0; - for( i = 0 ; i < n ; i++ ){ - /*printf("%d/%d=%f%%\n",i,n,(100.0*i)/n);*/ - for( j = i+1 ; j < n ; j++ ) - /*if(!independent_groups(&tab[i],1,tab[j],arity)){*/ - if(!independent_tab(tab[i]->tab,tab[j]->tab,arity)){ - tab[i]->sum_neighbour += tab[j]->val; - tab[j]->sum_neighbour += tab[i]->val; - } + static long int x=0; + static long int y=0; - tab[i]->wg = tab[i]->sum_neighbour/tab[i]->val; - if(tab[i]->sum_neighbour == 0) - tab[i]->wg = 0; - /*printf("%d:%f/%f=%f\n",i,tab[i]->sum_neighbour,tab[i]->val,tab[i]->wg);*/ - } -} -/* - Very slow: explore all possibilities - aff_mat : the affiity matrix at the considered level (used to evaluate a grouping) - tab_node: array of the node to group - parent: node to which attached the computed group - id: current considered node of tab_node - arity: number of children of parent (i.e.) 
size of the group to compute - best_val: current value of th grouping - cur_group: current grouping - */ -void group(affinity_mat_t *aff_mat,tree_t *tab_node,tree_t *parent,int id,int arity, int n,double *best_val,tree_t **cur_group) -{ +static int thread_derecurs_exhaustive_search(group_list_t **tab_group, int i, int nb_groups, int arity, int depth, int solution_size, + double val, double *best_val, group_list_t **selection, group_list_t **best_selection, + int8_t **indep_mat, pthread_mutex_t *lock, int thread_id, int *tab_i, int start_depth){ - int N = aff_mat->order; - double val; - int i; - /*if we have found enough noide in the group*/ - if( n == arity){ - /* evaluate this group*/ - val = eval_grouping(aff_mat,cur_group,arity); - /* If we improve compared to previous grouping: uodate the children of parent accordingly */ + group_list_t *elem = NULL; + int nb_groups_to_find =0; + int nb_available_groups = 0; + + stack: + nb_groups_to_find = solution_size - depth; + nb_available_groups = nb_groups - i; + if( depth == solution_size ){ + if(verbose_level >= DEBUG) + display_selection(selection, solution_size, arity, val); if( val < *best_val ){ + pthread_mutex_lock(lock); + if(verbose_level >= INFO) + printf("\n---------%d: best_val= %f\n", thread_id, val); *best_val = val; - for( i = 0 ; i < arity ; i++ ) - parent->child[i] = cur_group[i]; - parent->arity = arity; + for( i = 0 ; i < solution_size ; i++ ) + best_selection[i] = selection[i]; + pthread_mutex_unlock(lock); } - return; + if(depth>2) + goto unstack; + else + return 0; } - /* - If we need more node in the group - Continue to explore avilable nodes - */ - for( i = id+1 ; i < N ; i++ ){ - /* If this node is allready in a group: skip it*/ - if(tab_node[i].parent) - continue; - /*Otherwise, add it to the group at place n*/ - cur_group[n] = &tab_node[i]; - /* - printf("%d<-%d\n",n,i); - recursively add the next element to this group - */ - group(aff_mat,tab_node,parent,i,arity,n+1,best_val,cur_group); + 
if(nb_groups_to_find > nb_available_groups){ /*if there not enough groups available*/ + if(depth>start_depth) + goto unstack; + else + return 0; } -} -/* - aff_mat : the affiity matrix at the considered level (used to evaluate a grouping) - tab_node: array of the node to group - parent: node to which attached the computed group - id: current considered node of tab_node - arity: number of children of parent (i.e.) size of the group to compute - best_val: current value of th grouping - cur_group: current grouping - N: size of tab and tab_node. i.e. number of nodes at the considered level - */ -void fast_group(affinity_mat_t *aff_mat,tree_t *tab_node,tree_t *parent,int id,int arity, int n, - double *best_val,tree_t **cur_group, int *nb_groups,int max_groups) -{ - double val; - int i; - int N = aff_mat->order; - /*printf("Max groups=%d\n",max_groups);*/ - /*if we have found enough node in the group*/ - if( n == arity ){ - (*nb_groups)++; - /*evaluate this group*/ - val = eval_grouping(aff_mat,cur_group,arity); - /* If we improve compared to previous grouping: uodate the children of parent accordingly*/ - if( val < *best_val ){ - *best_val = val; - for( i = 0 ; i < arity ; i++ ) - parent->child[i] = cur_group[i]; + while( i < nb_groups ){ + elem = tab_group[i]; + y++; + if(val+elem->val < *best_val){ + if(val+elem->bound[nb_groups_to_find]>*best_val){ + x++; + /* printf("\ni=%d, val=%.0f, elem->val = %.0f, elem->bound[%d] = %.0f, best_val = %.0f\n", */ + /* i,val,elem->val,nb_groups_to_find,elem->bound[nb_groups_to_find],*best_val); */ + /* exit(-1); */ - parent->arity = arity; + /* printf("x=%ld y=%ld\n",x,y); */ + if(depth>start_depth) + goto unstack; + else + return 0; + } + + if(independent_groups_mat(selection, depth, elem, indep_mat)){ + if(verbose_level >= DEBUG) + printf("%d: %d\n", depth, i); + selection[depth] = elem; + val += selection[depth]->val; + tab_i[depth]=i; + depth ++; + i++; + goto stack; + unstack: + depth --; + val -= selection[depth]->val; + 
i=tab_i[depth]; + } + } + i++; + nb_available_groups = nb_groups - i; + nb_groups_to_find = solution_size - depth; + if(nb_groups_to_find > nb_available_groups){ /*if there not enough groups available*/ + if(depth>start_depth) + goto unstack; + else + return 0; } - return; } - /* - If we need more node in the group - Continue to explore avilable nodes - */ - for( i = id+1 ; i < N ; i++ ){ - /* If this node is allready in a group: skip it*/ - if(tab_node[i].parent) - continue; - /*Otherwise, add it to the group at place n */ - cur_group[n] = &tab_node[i]; - /* - printf("%d<-%d %d/%d\n",n,i,*nb_groups,max_groups); - exit(-1); - recursively add the next element to this group - */ - fast_group(aff_mat,tab_node,parent,i,arity,n+1,best_val,cur_group,nb_groups,max_groups); - if(*nb_groups > max_groups) - return; - } -} + if(depth>start_depth) + goto unstack; + return 0; +} -void fast_grouping(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_node, int arity, int M,long int k) -{ - tree_t **cur_group = NULL; - int l,i,nb_groups; - double best_val,val=0; +#if 0 +static group_list_t * group_dup(group_list_t *group, int nb_groups){ + group_list_t *elem = NULL; + /* tm_tree_t **tab = NULL; */ + double *bound; + size_t bound_size = nb_groups-group->id+2; - cur_group = (tree_t**)MALLOC(sizeof(tree_t*)*arity); - for( l = 0 ; l < M ; l++ ){ - best_val = DBL_MAX; - nb_groups = 0; - /*printf("k%d/%d, k=%ld\n",l,M,k);*/ - /* select the best greedy grouping among the 10 first one*/ - /*fast_group(tab,tab_node,&new_tab_node[l],-1,arity,0,&best_val,cur_group,N,&nb_groups,MAX(2,(int)(50-log2(k))-M/10));*/ - fast_group(aff_mat,tab_node,&new_tab_node[l],-1,arity,0,&best_val,cur_group,&nb_groups,MAX(1,(int)(50-CmiLog2(k))-M/10)); - val += best_val; - for( i = 0 ; i < new_tab_node[l].arity ; i++ ) - new_tab_node[l].child[i]->parent=&new_tab_node[l]; - update_val(aff_mat,&new_tab_node[l]); - } + /* tab = (tm_tree_t **)MALLOC(sizeof(tm_tree_t *)*arity); */ + /* memcpy(tab, group->tab, 
sizeof(tm_tree_t *)*arity); */ - FREE(cur_group); + bound = (double*) MALLOC(bound_size*sizeof(double)); + memcpy(bound, group->bound, bound_size*sizeof(double)); - if(verbose_level>=INFO) - printf("val=%f\n",val); - /*exit(-1);*/ + elem = (group_list_t*) MALLOC(sizeof(group_list_t)); - if(verbose_level>=INFO) - display_grouping(new_tab_node,M,arity,val); + elem-> tab = group->tab; + elem-> val = group->val; + elem-> sum_neighbour = group->sum_neighbour; + elem-> wg = group ->wg; + elem-> id = group->id; + elem-> bound = bound; + elem-> next = NULL; + return elem; } +#endif +#if 0 +static group_list_t ** tab_group_dup(group_list_t **tab_group, int nb_groups){ + group_list_t **res; + int i; -int adjacency_asc(const void* x1,const void* x2) -{ - adjacency_t *e1 = NULL,*e2 = NULL; + res = (group_list_t**)MALLOC(sizeof(group_list_t*)*nb_groups); - e1 = ((adjacency_t*)x1); - e2 = ((adjacency_t*)x2); + for(i=0 ; inext = res[i]; + } - return (e1->val < e2->val) ? - 1 : 1; + return res; } +#endif -int adjacency_dsc(const void* x1,const void* x2) -{ - adjacency_t *e1 = NULL,*e2 = NULL; - - e1 = ((adjacency_t*)x1); - e2 = ((adjacency_t*)x2); - +#if 0 +static int8_t **indep_mat_dup(int8_t** mat, int n){ + int i; + int8_t ** res = (int8_t**)MALLOC(sizeof(int8_t*)*n); + int row_len; + /* use indep_mat[i][j] with ival > e2->val) ? 
-1 : 1; + return res; } +#endif -void super_fast_grouping(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_node, int arity, int M) -{ - double val = 0,duration; - adjacency_t *graph; - int i,j,e,l,nb_groups; - int N = aff_mat->order; - double **mat = aff_mat->mat; +static void partial_exhaustive_search(int nb_args, void **args, int thread_id){ + int i, j; + group_list_t **selection = NULL; + double val; + int n = *(int*) args[1]; + int arity = *(int*) args[2]; + /* group_list_t **tab_group = tab_group_dup((group_list_t **) args[0], n, arity); */ + group_list_t **tab_group = (group_list_t **) args[0]; + int solution_size = *(int*) args[3]; + double *best_val= (double *) args[4]; + group_list_t **best_selection = (group_list_t **) args[5]; + /* int8_t **indep_mat = indep_mat_dup((int8_t **) args[6],n); */ + int8_t **indep_mat = (int8_t **) args[6]; + work_unit_t *work = (work_unit_t *) args[7]; + pthread_mutex_t *lock = (pthread_mutex_t *) args[8]; + int *tab_i; + int id, id1, id2; + int total_work = work->nb_work; + int cur_work = 0; - assert( 2 == arity); + TIC; - TIC; - graph = (adjacency_t*)MALLOC(sizeof(adjacency_t)*((N*N-N)/2)); - e = 0; - for( i = 0 ; i < N ; i++ ) - for( j = i+1 ; j < N ; j++){ - graph[e].i = i; - graph[e].j = j; - graph[e].val = mat[i][j]; - e++; - } + if(nb_args!=9){ + if(verbose_level>=ERROR){ + fprintf(stderr, "Id: %d: bad number of argument for function %s: %d instead of 9\n", thread_id, __func__, nb_args); + return; + } + } - duration = TOC; - if(verbose_level>=DEBUG) - printf("linearization=%fs\n",duration); + pthread_mutex_lock(lock); + TIC; + pthread_mutex_unlock(lock); + tab_i = (int*) MALLOC(sizeof(int)*solution_size); + selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*solution_size); - assert( e == (N*N-N)/2); - TIC; - qsort(graph,e,sizeof(adjacency_t),adjacency_dsc); - duration = TOC; - if(verbose_level>=DEBUG) - printf("sorting=%fs\n",duration); - TIC; - TIC; - l = 0; - nb_groups = 0; - for( i = 0 ; (i < e) && 
(l < M) ; i++ ) - if(try_add_edge(tab_node,&new_tab_node[l],arity,graph[i].i,graph[i].j,&nb_groups)) - l++; - - for( l = 0 ; l < M ; l++ ){ - update_val(aff_mat,&new_tab_node[l]); - val += new_tab_node[l].val; - } + while(work->tab_group){ + pthread_mutex_lock(lock); + if(!work->done){ + work->done = 1; + pthread_mutex_unlock(lock); + }else{ + pthread_mutex_unlock(lock); + work=work->next; + cur_work++; + continue; + } - duration = TOC; - if(verbose_level>=DEBUG) - printf("Grouping=%fs\n",duration); + /* for(i=0;i<work->nb_groups;i++){ */ + /* printf("%d ",work->tab_group[i]); */ + /* } */ + if(verbose_level>=INFO){ + fprintf(stdout, "\r%d: %.2f%% of search space explored...", thread_id,(100.0*cur_work)/total_work); + fflush(stdout); + } + for(i=0;i<work->nb_groups;i++){ + id1 = work->tab_group[i]; + for(j=i+1;j<work->nb_groups;j++){ + id2 = work->tab_group[j]; + if(!indep_mat[id2][id1]){ + goto next_work; + } + } + } - if(verbose_level>=DEBUG) - printf("val=%f\n",val); + val = 0; + for(i=0;i<work->nb_groups;i++){ + id = work->tab_group[i]; + selection[i] = tab_group[id]; + val += tab_group[id]->val; + } + thread_derecurs_exhaustive_search(tab_group, id+1, n, arity, work->nb_groups, solution_size, val, best_val, selection, best_selection, indep_mat, lock, thread_id, tab_i, work->nb_groups); + next_work: + work=work->next; + cur_work++; + } - display_grouping(new_tab_node,M,arity,val); - FREE(graph); -} -affinity_mat_t *build_cost_matrix(affinity_mat_t *aff_mat, double* obj_weight, double comm_speed) -{ - double **mat = NULL, *sum_row; - double **old_mat; - double avg; - int i,j,N; - if(!obj_weight) - return aff_mat; + /* for( i=0 ; itab); *\/ */ + /* FREE(tab_group[i]->bound); */ + /* FREE(tab_group[i]); */ + /* } */ + /* FREE(tab_group); */ + FREE(selection); + FREE(tab_i); + /* for( i=0 ; iorder; - old_mat = aff_mat -> mat; + /* FREE(indep_mat);*/ - mat = (double**)MALLOC(N*sizeof(double*)); - for( i = 0 ; i < N ; i++ ) - mat[i] = (double*)MALLOC(N*sizeof(double)); +
pthread_mutex_lock(lock); + double duration = TOC; + pthread_mutex_unlock(lock); + if(verbose_level>=INFO){ + printf("Thread %d done in %.3f!\n" , thread_id, duration); + } +} - sum_row = (double*)CALLOC(N,sizeof(double)); + +#if 0 +static int dbl_cmp_dec(const void* x1,const void* x2) +{ + return *((double *)x1) > *((double *)x2) ? -1 : 1; +} +#endif +static int dbl_cmp_inc(const void* x1,const void* x2) +{ + return *((double *)x1) < *((double *)x2) ? -1 : 1; +} + + + +static double *build_bound_array(double *tab, int n){ + int i; + double *bound; + + if (n==0) + return NULL; + + bound = (double *)MALLOC(sizeof(double)*(n+2)); + qsort(tab, n, sizeof(double), dbl_cmp_inc); + + + + if(verbose_level>=DEBUG){ + printf("T(%d): ",n); + for(i = 0; itab_group = tab_group; + cur->nb_groups = size; + cur->done = 0; + cur->next = res; + return res; +} + +static work_unit_t *generate_work_units(work_unit_t *cur, int i, int id, int *tab_group,int size, int id_max){ + + tab_group[i] = id; + if(i==size-1){ + return create_work_unit(cur,tab_group,size); + } + + if(id == id_max-1){ + return cur; + } + + id++; + for(;id < id_max;id++){ + cur = generate_work_units(cur,i+1,id,tab_group, size, id_max); + } + + return cur; +} + + +static work_unit_t *create_tab_work(int n){ + int work_size = 4; + int i; + work_unit_t *cur,*res = (work_unit_t *) CALLOC(1,sizeof(work_unit_t)); + int *tab_group = MALLOC(work_size*sizeof(int)); + cur = res; + cur = generate_work_units(cur,0,0,tab_group,3,n); + cur = generate_work_units(cur,0,1,tab_group,2,n); + cur = generate_work_units(cur,0,2,tab_group,2,n); + + for(i=3;itab_group; cur = cur-> next) + res->nb_work++; + + printf("nb_work= %d\n",res->nb_work); + + FREE(tab_group); + + return res; +} + + +static int thread_exhaustive_search(group_list_t **tab_group, int nb_groups, int arity, int solution_size, double *best_val, + group_list_t **best_selection){ + + pthread_mutex_t lock; + int nb_threads; + work_t **works; + int i, j; + int id; + /* matrix 
of indepedency between groups (i.e; 2 groups are independent if they + are composed of different ids) */ + int8_t **indep_mat; + double *val_array; + double duration; + work_unit_t *work_list; + TIC; + + pthread_mutex_init(&lock, NULL); + nb_threads = get_nb_threads(); + nb_threads = 4; + works = (work_t**)MALLOC(sizeof(work_t*)*nb_threads); + + work_list = create_tab_work(nb_groups); + + if(verbose_level>=DEBUG){ + for(i=0;itab[j]->id); + } + printf(" : %.0f\nb_groups", tab_group[i]->val); + } + } + + fflush(stderr); + + val_array = (double *)MALLOC(nb_groups*sizeof(double)); + + for( i=nb_groups-1 ; i>=0 ; i--){ + val_array[nb_groups-i-1] = tab_group[i]->val; + /* this is allocated here and therefore released here*/ + tab_group[i]->bound = build_bound_array(val_array,nb_groups-i); + + if(verbose_level>=DEBUG){ + printf("-->(%d--%d) %.0f: ", i, nb_groups-i-1, tab_group[i]->val); + for(j=1 ; jbound[j]); + } + printf("\n"); + } + } + + FREE(val_array); + + indep_mat = init_independent_group_mat(nb_groups, tab_group, arity); + + for(id=0;id= DEBUG) + printf("Executing %p\n", (void *)works[id]); + + submit_work( works[id], id); + } + + for(id=0;idargs); + } + + exit(-1); + + if(verbose_level>=INFO) + fprintf(stdout, "\nx=%ld, y=%ld\n",x,y); + + + for( i=0 ; ibound); + } + + FREE(indep_mat); + /* FREE(search_space); */ + FREE(works); + + if(verbose_level>=INFO) + display_selection(best_selection, solution_size, arity, *best_val); + + duration = TOC; + printf("Thread exhaustive search = %g\n",duration); + exit(-1); + return 0; +} + +#if 0 +static int old_recurs_exhaustive_search(group_list_t **tab, int i, int n, int arity, int d, int solution_size, double val, double *best_val, group_list_t **selection, group_list_t **best_selection, int8_t **indep_mat) +{ + group_list_t *elem = NULL; + + + + if( d == solution_size ){ + if(verbose_level >= DEBUG) + display_selection(selection, solution_size, arity, val); + if( val < *best_val ){ + *best_val = val; + for( i = 0 ; i < 
solution_size ; i++ ) + best_selection[i] = selection[i]; + return 1; + } + return 0; + } + + if(solution_size-d>n-i){ /*if there not enough groups available*/ + return 0; + } + + while( i < n ){ + elem = tab[i]; + if(val+elem->val<*best_val){ + if(independent_groups_mat(selection, d, elem, indep_mat)){ + if(verbose_level >= DEBUG) + printf("%d: %d\n", d, i); + selection[d] = elem; + val += elem->val; + old_recurs_exhaustive_search(tab, i+1, n, arity, d+1, solution_size, val, best_val, selection, best_selection, indep_mat); + val -= elem->val; + } + } + i++; + } + + return 0; +} +#endif + +#if 0 +static int recurs_exhaustive_search(group_list_t **tab, int i, int n, int arity, int d, int solution_size, double val, double *best_val, group_list_t **selection, group_list_t **best_selection, int8_t **indep_mat, int* tab_i) +{ + group_list_t *elem = NULL; + + check: + if( d == solution_size ){ + if(verbose_level >= DEBUG) + display_selection(selection, solution_size, arity, val); + if( val < *best_val ){ + *best_val = val; + for( i = 0 ; i < solution_size ; i++ ) + best_selection[i] = selection[i]; + goto uncheck; + } + goto uncheck; + } + + if(solution_size-d>n-i){ /*if there not enough groups available*/ + if(d>1) + goto uncheck; + else + return 0; + } + + while( i < n ){ + elem = tab[i]; + if(val+elem->val<*best_val){ + if(independent_groups_mat(selection, d, elem, indep_mat)){ + if(verbose_level >= DEBUG) + printf("%d: %d\n", d, i); + selection[d] = elem; + val += selection[d]->val; + tab_i[d]=i; + d++; + i++; + goto check; + uncheck: + d--; + val -= selection[d]->val; + i=tab_i[d]; + } + } + i++; + } + + if(d>1) + goto uncheck; + + return 0; +} +#endif + +#if 0 +static int exhaustive_search(group_list_t **tab_group, int n, int arity, int solution_size, double *best_val, + group_list_t **best_selection) +{ + int i, j; + group_list_t **selection = NULL; + double val; +/* matrix of indepedency between groups (i.e; 2 groups are independent if they + are composed of 
different ids): lazy data structure filled only once we have + already computed if two groups are independent. otherwise it is initialized at + -1*/ + int8_t **indep_mat; + int *tab_i = (int*) MALLOC(sizeof(int)*solution_size); + double duration; + TIC; + + if(verbose_level>=DEBUG){ + for(i=0;itab[j]->id); + } + printf(" : %f\n", tab_group[i]->val); + } + } + + + + indep_mat = init_independent_group_mat(n, tab_group, arity); + + selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*solution_size); + for( i = 0 ; i < n ; i++ ){ + if(verbose_level>=INFO){ + fprintf(stdout, "\r%.2f%% of search space explored...", (100.0*i)/n); + fflush(stdout); + } + selection[0] = tab_group[i]; + val = tab_group[i]->val; + /* recurs_exhaustive_search(tab_group, i+1, n, arity, 1, solution_size, val, best_val, selection, best_selection, indep_mat, tab_i); */ + old_recurs_exhaustive_search(tab_group, i+1, n, arity, 1, solution_size, val, best_val, selection, best_selection, indep_mat); + } + + if(verbose_level>=INFO) + fprintf(stdout, "\n"); + + FREE(selection); + + for( i=0 ; i=INFO) + display_selection(best_selection, solution_size, arity, *best_val); + duration = TOC; + printf("Seq exhaustive search = %g\n",duration); + exit(-1); + + return 0; +} +#endif + + +int select_independent_groups_by_largest_index(group_list_t **tab_group, int n, int arity, int solution_size, double *best_val, group_list_t **best_selection, int bound, double max_duration) +{ + int i, dec, nb_groups=0; + group_list_t **selection = NULL; + double val, duration; + CLOCK_T time1, time0; + + selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*solution_size); + CLOCK(time0); + + dec = MAX(n/10000, 2); + for( i = n-1 ; i >= 0 ; i -= dec*dec){ + selection[0] = tab_group[i]; + val = tab_group[i]->val; + nb_groups += test_independent_groups(tab_group, i+1, n, arity, 1, solution_size, val, best_val, selection, best_selection); + if(verbose_level>=DEBUG) + printf("%d:%d\n", i, nb_groups); + + if(nb_groups >= 
bound){ + FREE(selection); + return 0; + } + if((!(i%5)) && (max_duration>0)){ + CLOCK(time1); + duration=CLOCK_DIFF(time1, time0); + if(duration>max_duration){ + FREE(selection); + return 1; + } + } + } + + FREE(selection); + + if(verbose_level>=INFO) + display_selection(best_selection, solution_size, arity, *best_val); + + return 0; +} + +void list_to_tab(group_list_t *list, group_list_t **tab, int n) +{ + int i; + for( i = 0 ; i < n ; i++ ){ + if(!list){ + if(verbose_level>=CRITICAL) + fprintf(stderr, "Error not enough elements. Only %d on %d\n", i, n); + exit(-1); + } + tab[n-i-1] = list; + tab[n-i-1]->id = n-i-1; + list = list->next; + } + if(list){ + if(verbose_level>=CRITICAL) + fprintf(stderr, "Error too many elements\n"); + exit(-1); + } +} + +void display_tab_group(group_list_t **tab, int n, int arity) +{ + int i, j; + if(verbose_leveltab[j]->id); + printf(": %.2f %.2f\n", tab[i]->val, tab[i]->wg); + } +} + +int independent_tab(tm_tree_t **tab1, tm_tree_t **tab2, int arity) +{ + int ii, jj; + for( ii = 0 ; ii < arity ; ii++ ){ + for( jj = 0 ; jj < arity ; jj++ ){ + if(tab1[ii]->id == tab2[jj]->id){ + return 0; + } + } + } + return 1; +} + +void compute_weighted_degree(group_list_t **tab, int n, int arity) +{ + int i, j; + for( i = 0 ; i < n ; i++) + tab[i]->sum_neighbour = 0; + for( i = 0 ; i < n ; i++ ){ + /*printf("%d/%d=%f%%\n", i, n, (100.0*i)/n);*/ + for( j = i+1 ; j < n ; j++ ) + /*if(!independent_groups(&tab[i], 1, tab[j], arity)){*/ + if(!independent_tab(tab[i]->tab, tab[j]->tab, arity)){ + tab[i]->sum_neighbour += tab[j]->val; + tab[j]->sum_neighbour += tab[i]->val; + } + + tab[i]->wg = tab[i]->sum_neighbour/tab[i]->val; + if(tab[i]->sum_neighbour == 0) + tab[i]->wg = 0; + /*printf("%d:%f/%f=%f\n", i, tab[i]->sum_neighbour, tab[i]->val, tab[i]->wg);*/ + } +} + +/* + aff_mat : the affiity matrix at the considered level (used to evaluate a grouping) + tab_node: array of the node to group + parent: node to which attached the computed group + id: 
current considered node of tab_node + arity: number of children of parent (i.e.) size of the group to compute + best_val: current value of the grouping + cur_group: current grouping + mat_order: size of tab and tab_node. i.e. number of nodes at the considered level + */ +void fast_group(tm_affinity_mat_t *aff_mat, tm_tree_t *tab_node, tm_tree_t *parent, int id, int arity, int n, + double *best_val, tm_tree_t **cur_group, int *nb_groups, int max_groups) +{ + double val; + int i; + int mat_order = aff_mat->order; + + /* printf("Max groups=%d, nb_groups= %d, n= %d, arity = %d\n", max_groups, *nb_groups, n, arity); */ + + /*if we have found enough nodes in the group*/ + if( n == arity ){ + (*nb_groups)++; + /*evaluate this group*/ + val = eval_grouping(aff_mat, cur_group, arity); + if(verbose_level>=DEBUG) + printf("Grouping %d: %f\n", *nb_groups, val); + /* If we improve compared to previous grouping: update the children of parent accordingly*/ + if( val < *best_val ){ + *best_val = val; + for( i = 0 ; i < arity ; i++ ) + parent->child[i] = cur_group[i]; + + parent->arity = arity; + } + return; + } + + /* + If we need more nodes in the group + Continue to explore available nodes + */ + for( i = id+1 ; i < mat_order ; i++ ){ + /* If this node is already in a group: skip it*/ + if(tab_node[i].parent) + continue; + /*Otherwise, add it to the group at place n */ + cur_group[n] = &tab_node[i]; + /* + printf("%d<-%d %d/%d\n", n, i, *nb_groups, max_groups); + exit(-1); + recursively add the next element to this group + */ + fast_group(aff_mat, tab_node, parent, i, arity, n+1, best_val, cur_group, nb_groups, max_groups); + if(*nb_groups > max_groups) + return; + } +} + + + + + +double fast_grouping(tm_affinity_mat_t *aff_mat, tm_tree_t *tab_node, tm_tree_t *new_tab_node, int arity, int solution_size, double nb_groups) +{ + tm_tree_t **cur_group = NULL; + int l, i, nb_done; + double best_val, val=0; + + cur_group = (tm_tree_t**)MALLOC(sizeof(tm_tree_t*)*arity); + for( l = 0 ; l < 
solution_size ; l++ ){ + best_val = DBL_MAX; + nb_done = 0; + /*printf("nb_groups%d/%d, nb_groups=%ld\n", l, M, nb_groups);*/ + /* select the best greedy grouping among the 10 first one*/ + /*fast_group(tab, tab_node, &new_tab_node[l], -1, arity, 0, &best_val, cur_group, mat_order, &nb_done, MAX(2, (int)(50-log2(nb_groups))-M/10));*/ + fast_group(aff_mat, tab_node, &new_tab_node[l], -1, arity, 0, &best_val, cur_group, &nb_done, MAX(10, (int)(50-CmiLog2(nb_groups))-solution_size/10)); + val += best_val; + for( i = 0 ; i < new_tab_node[l].arity ; i++ ) + new_tab_node[l].child[i]->parent=&new_tab_node[l]; + update_val(aff_mat, &new_tab_node[l]); + if(new_tab_node[l].val != best_val){ + if(verbose_level>=CRITICAL) + printf("Error: best_val = %f, new_tab_node[%d].val = %f\n", best_val, l, new_tab_node[l].val); + exit(-1); + } + } + + FREE(cur_group); + + return val; +} + +static double k_partition_grouping(tm_affinity_mat_t *aff_mat, tm_tree_t *tab_node, tm_tree_t *new_tab_node, int arity, int solution_size) { + int *partition = NULL; + int n = aff_mat->order; + com_mat_t com_mat; + int i,j,k; + double val = 0; + + com_mat.comm = aff_mat->mat; + com_mat.n = n; + + if(verbose_level>=DEBUG) + printf("K-Partitionning: n=%d, solution_size=%d, arity=%d\n",n, solution_size,arity); + + partition = kpartition(solution_size, &com_mat, n, NULL, 0); + + /* new_tab_node[i]->child[j] = &tab_node[k] where 0<=i< solution size, 0<=jparent = &new_tab_node[i]; + } + + for( i = 0 ; i < solution_size ; i++ ){ + new_tab_node[i].arity = arity; + update_val(aff_mat, &new_tab_node[i]); + val += new_tab_node[i].val; + } + + FREE(j_tab); + FREE(partition); + + return val; + +} + +int adjacency_asc(const void* x1, const void* x2) +{ + adjacency_t *e1 = NULL, *e2 = NULL; + + e1 = ((adjacency_t*)x1); + e2 = ((adjacency_t*)x2); + + return (e1->val < e2->val) ? 
- 1 : 1; +} + +int adjacency_dsc(const void* x1, const void* x2) +{ + adjacency_t *e1 = NULL, *e2 = NULL; + + e1 = ((adjacency_t*)x1); + e2 = ((adjacency_t*)x2); + + + return (e1->val > e2->val) ? -1 : 1; +} + +void super_fast_grouping(tm_affinity_mat_t *aff_mat, tm_tree_t *tab_node, tm_tree_t *new_tab_node, int arity, int solution_size) +{ + double val = 0, duration; + adjacency_t *graph; + int i, j, e, l, nb_groups; + int mat_order = aff_mat->order; + double **mat = aff_mat->mat; + + assert( 2 == arity); + + TIC; + graph = (adjacency_t*)MALLOC(sizeof(adjacency_t)*((mat_order*mat_order-mat_order)/2)); + e = 0; + for( i = 0 ; i < mat_order ; i++ ) + for( j = i+1 ; j < mat_order ; j++){ + graph[e].i = i; + graph[e].j = j; + graph[e].val = mat[i][j]; + e++; + } + + duration = TOC; + if(verbose_level>=DEBUG) + printf("linearization=%fs\n", duration); + + + assert( e == (mat_order*mat_order-mat_order)/2); + TIC; + qsort(graph, e, sizeof(adjacency_t), adjacency_dsc); + duration = TOC; + if(verbose_level>=DEBUG) + printf("sorting=%fs\n", duration); + + TIC; + +TIC; + l = 0; + nb_groups = 0; + for( i = 0 ; (i < e) && (l < solution_size) ; i++ ) + if(try_add_edge(tab_node, &new_tab_node[l], arity, graph[i].i, graph[i].j, &nb_groups)) + l++; + + for( l = 0 ; l < solution_size ; l++ ){ + update_val(aff_mat, &new_tab_node[l]); + val += new_tab_node[l].val; + } + + duration = TOC; + if(verbose_level>=DEBUG) + printf("Grouping=%fs\n", duration); + + + if(verbose_level>=DEBUG) + printf("val=%f\n", val); + + + display_grouping(new_tab_node, solution_size, arity, val); + + FREE(graph); +} + + +tm_affinity_mat_t *build_cost_matrix(tm_affinity_mat_t *aff_mat, double* obj_weight, double comm_speed) +{ + double **mat = NULL, *sum_row; + double **old_mat; + double avg; + int i, j, mat_order; + + if(!obj_weight) + return aff_mat; + + mat_order = aff_mat->order; + old_mat = aff_mat -> mat; + + mat = (double**)MALLOC(mat_order*sizeof(double*)); + for( i = 0 ; i < mat_order ; i++ ) + 
mat[i] = (double*)MALLOC(mat_order*sizeof(double)); + + sum_row = (double*)CALLOC(mat_order, sizeof(double)); avg = 0; - for( i = 0 ; i < N ; i++ ) + for( i = 0 ; i < mat_order ; i++ ) avg += obj_weight[i]; - avg /= N; + avg /= mat_order; if(verbose_level>=DEBUG) - printf("avg=%f\n",avg); + printf("avg=%f\n", avg); - for( i = 0 ; i < N ; i++ ) - for( j = 0 ; j < N ; j++){ + for( i = 0 ; i < mat_order ; i++ ) + for( j = 0 ; j < mat_order ; j++){ if( i == j ) mat[i][j] = 0; else{ @@ -1046,7 +1728,7 @@ affinity_mat_t *build_cost_matrix(affinity_mat_t *aff_mat, double* obj_weight, d sum_row[i] += mat[i][j]; } } - return new_affinity_mat(mat,sum_row,N); + return new_affinity_mat(mat, sum_row, mat_order); } @@ -1056,200 +1738,229 @@ affinity_mat_t *build_cost_matrix(affinity_mat_t *aff_mat, double* obj_weight, d tab_node: array of the node to group new_tab_node: array of nodes at the next level (the parents of the node in tab_node once the grouping will be done). arity: number of children of parent (i.e.) size of the group to compute - M: size of new_tab_node (i.e) the number of parents + solution_size: size of new_tab_node (i.e) the number of parents */ -void group_nodes(affinity_mat_t *aff_mat,tree_t *tab_node, tree_t *new_tab_node, int arity, int M, double* obj_weigth, double comm_speed) -{ +void group_nodes(tm_affinity_mat_t *aff_mat, tm_tree_t *tab_node, tm_tree_t *new_tab_node, + int arity, int solution_size, double* obj_weigth, double comm_speed){ + + /* + mat_order: size of tab and tab_node. i.e. 
number of nodes at the considered level + Hence we have: M*arity=mat_order + */ + int mat_order = aff_mat -> order; + tm_tree_t **cur_group = NULL; + int j, l; + unsigned long int list_size; + unsigned long int i; + group_list_t list, **best_selection = NULL, **tab_group = NULL; + double best_val, last_best; + int timeout; + tm_affinity_mat_t *cost_mat = NULL; /*cost matrix taking into account the communiocation cost but also the weight of the object*/ + double duration; + double val; + double nbg; + TIC; + + + + /* might return aff_mat (if obj_weight==NULL): do not free this tab in this case*/ + cost_mat = build_cost_matrix(aff_mat, obj_weigth, comm_speed); + + nbg = choose(mat_order, arity); + + if(verbose_level>=INFO) + printf("Number of possible groups:%.0lf\n", nbg); + + /* Todo: check if the depth is a criteria for speeding up the computation*/ + /* if(nb_groups>30000||depth>5){*/ + if( nbg > 30000 ){ - /* - N: size of tab and tab_node. i.e. number of nodes at the considered level - Hence we have: M*arity=N - */ - int N = aff_mat -> order; - tree_t **cur_group = NULL; - int j,l; - unsigned int n; - unsigned long int k; - group_list_t list,**best_selection = NULL,**tab_group = NULL; - double best_val,last_best; - int timeout; - affinity_mat_t *cost_mat = NULL; /*cost matrix taking into account the communiocation cost but also the weight of the object*/ double duration; TIC; + if( arity <= 2 ){ + /*super_fast_grouping(tab, tab_node, new_tab_node, arity, mat_order, solution_size, k);*/ + if(verbose_level >= INFO ) + printf("Bucket Grouping...\n"); + val = bucket_grouping(cost_mat, tab_node, new_tab_node, arity, solution_size); + }else if( arity <= 5){ + if(verbose_level >= INFO) + printf("Fast Grouping...\n"); + val = fast_grouping(cost_mat, tab_node, new_tab_node, arity, solution_size, nbg); + } else{ + if(verbose_level >= INFO) + printf("K-partition Grouping...\n"); + val = k_partition_grouping(cost_mat, tab_node, new_tab_node, arity, solution_size); + } + + 
duration = TOC; + if(verbose_level >= INFO) + printf("Fast grouping duration=%f\n", duration); - /* might return aff_mat (if obj_weight==NULL): do not FREE this tab in this case*/ - cost_mat = build_cost_matrix(aff_mat,obj_weigth,comm_speed); + if(verbose_level >= INFO) + display_grouping(new_tab_node, solution_size, arity, val); + + }else{ + unsigned long int nb_groups = (unsigned long int) nbg; + if(verbose_level >= INFO) + printf("Grouping nodes...\n"); + list.next = NULL; + list.val = 0; /*number of elements in the list*/ + cur_group = (tm_tree_t**)MALLOC(sizeof(tm_tree_t*)*arity); + best_selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*solution_size); + + list_all_possible_groups(cost_mat, tab_node, 0, arity, 0, cur_group, &list); + list_size = (int)list.val; + assert( list_size == nb_groups); + tab_group = (group_list_t**)MALLOC(sizeof(group_list_t*)*nb_groups); + list_to_tab(list.next, tab_group, nb_groups); + if(verbose_level>=INFO) + printf("List to tab done\n"); + + best_val = DBL_MAX; - k = choose(N,arity); + /* perform the pack mapping fist*/ + /* timeout = select_independent_groups(tab_group, n, arity, M, &best_val, best_selection, 1, 0.1); */ + timeout = select_independent_groups(tab_group, nb_groups, arity, solution_size, &best_val, best_selection, 1, 100); if(verbose_level>=INFO) - printf("Number of groups:%ld\n",k); - - /* Todo: check if the depth is a criteria for speeding up the computation*/ - /* if(k>30000||depth>5){*/ - if( k > 30000 ) { - - double duration; - - TIC; - if( arity <= 2 ) { - /*super_fast_grouping(tab,tab_node,new_tab_node,arity,N,M,k);*/ - if(verbose_level >= INFO ) - printf("Bucket Grouping...\n"); - bucket_grouping(cost_mat,tab_node,new_tab_node,arity,M); - } else { - if(verbose_level >= INFO) - printf("Fast Grouping...\n"); - fast_grouping(cost_mat,tab_node,new_tab_node,arity,M,k); - } - - duration = TOC; - if(verbose_level>=INFO) - printf("Fast grouping duration=%f\n",duration); - - if(verbose_level>=DEBUG) - 
display_grouping(new_tab_node,M,arity,-1); - - } else { - if(verbose_level>=INFO) - printf("Grouping nodes...\n"); - list.next = NULL; - list.val = 0; /*number of elements in the list*/ - cur_group = (tree_t**)MALLOC(sizeof(tree_t*)*arity); - best_selection = (group_list_t **)MALLOC(sizeof(group_list_t*)*M); - - list_all_possible_groups(cost_mat,tab_node,0,arity,0,cur_group,&list); - n = (int)list.val; - assert( n == k ); - tab_group = (group_list_t**)MALLOC(sizeof(group_list_t*)*n); - list_to_tab(list.next,tab_group,n); - if(verbose_level>=INFO) - printf("List to tab done\n"); - - best_val = DBL_MAX; - - /* perform the pack mapping fist*/ - /* timeout = select_independent_groups(tab_group,n,arity,M,&best_val,best_selection,1,0.1); */ - timeout = select_independent_groups(tab_group,n,arity,M,&best_val,best_selection,1,100); - if((verbose_level>=INFO) && timeout) - printf("Packed mapping timeout!\n"); - /* give this mapping an exra credit (in general MPI application are made such that - neighbour process communicates more than distant ones) */ - best_val /= 1.001; - /* best_val *= 1.001; */ - if(verbose_level>=INFO) - printf("Packing computed\n"); - - /* perform a mapping trying to use group that cost less first*/ - qsort(tab_group,n,sizeof(group_list_t*),group_list_asc); - last_best = best_val; - timeout = select_independent_groups(tab_group,n,arity,M,&best_val,best_selection,10,0.1); - /* timeout = select_independent_groups(tab_group,n,arity,M,&best_val,best_selection,n,0); */ - if(verbose_level>=INFO){ - if(timeout) { - printf("Cost less first timeout!\n"); - } else if(last_best>best_val) { - printf("Cost less first Impoved solution\n"); - } - printf("----\n"); - } - /* perform a mapping trying to minimize the use of groups that cost a lot */ - qsort(tab_group,n,sizeof(group_list_t*),group_list_dsc); - last_best=best_val; - timeout=select_independent_groups_by_largest_index(tab_group,n,arity,M,&best_val,best_selection,10,0.1); - if(verbose_level>=DEBUG) { - 
if(timeout) - printf("Cost most last timeout!\n"); - else if(last_best>best_val) - printf("Cost most last impoved solution\n"); - } - if( n < 10000 ){ - /* perform a mapping in the weighted degree order */ - - - if(verbose_level>=INFO) - printf("----WG----\n"); - - compute_weighted_degree(tab_group,n,arity); - - if(verbose_level>=INFO) - printf("Weigted degree computed\n"); - - qsort(tab_group,n,sizeof(group_list_t*),weighted_degree_dsc); - /* display_tab_group(tab_group,n,arity);*/ - last_best = best_val; - timeout = select_independent_groups(tab_group,n,arity,M,&best_val,best_selection,10,0.1); - /* timeout = select_independent_groups(tab_group,n,arity,M,&best_val,best_selection,n,0); */ - - if(verbose_level>=DEBUG){ - if(timeout) - printf("WG timeout!\n"); - else if(last_best>best_val) - printf("WG impoved solution\n"); - } - } - - qsort(best_selection,M,sizeof(group_list_t*),group_list_id); - - for( l = 0 ; l < M ; l++ ){ - for( j = 0 ; j < arity ; j++ ){ - new_tab_node[l].child[j] = best_selection[l]->tab[j]; - new_tab_node[l].child[j]->parent = &new_tab_node[l]; - } - new_tab_node[l].arity = arity; - - /* printf("arity=%d\n",new_tab_node[l].arity); */ - update_val(cost_mat,&new_tab_node[l]); - } - - delete_group_list((&list)->next); - FREE(best_selection); - FREE(tab_group); - FREE(cur_group); + if(timeout) + printf("Packed mapping timeout!\n"); + /* give this mapping an exra credit (in general MPI application are made such that + neighbour process communicates more than distant ones) */ + best_val /= 1.001; + /* best_val *= 1.001; */ + if(verbose_level>=INFO) + printf("Packing computed\n"); + + + + /* perform a mapping trying to use group that cost less first*/ + qsort(tab_group, nb_groups, sizeof(group_list_t*), group_list_asc); + last_best = best_val; + timeout = select_independent_groups(tab_group, nb_groups, arity, solution_size, &best_val, best_selection, 10, 0.1); + /* timeout = select_independent_groups(tab_group, n, arity, solution_size, &best_val, 
best_selection, n, 0); */ + if(verbose_level>=INFO){ + if(timeout){ + printf("Cost less first timeout!\n"); + } + if(last_best>best_val){ + printf("Cost less first Impoved solution\n"); + } } + /* perform a mapping trying to minimize the use of groups that cost a lot */ + qsort(tab_group, nb_groups, sizeof(group_list_t*), group_list_dsc); + last_best=best_val; + timeout=select_independent_groups_by_largest_index(tab_group, nb_groups, arity, solution_size, &best_val, best_selection, 10, 0.1); + if(verbose_level>=INFO){ + if(timeout) + printf("Cost most last timeout!\n"); + if(last_best>best_val) + printf("Cost most last impoved solution\n"); + } + if( nb_groups < 1000000 ){ + /* perform a mapping in the weighted degree order */ + + + if(verbose_level>=INFO) + printf("----WG----\n"); + + + compute_weighted_degree(tab_group, nb_groups, arity); + + if(verbose_level>=INFO) + printf("Weigted degree computed\n"); + + qsort(tab_group, nb_groups, sizeof(group_list_t*), weighted_degree_dsc); + + for( i=0 ; iid = i; - if(cost_mat != aff_mat){ - FREE_tab_double(cost_mat->mat,N); - FREE(cost_mat->sum_row); - FREE(cost_mat); + /* display_tab_group(tab_group, n, arity);*/ + last_best = best_val; + timeout = select_independent_groups(tab_group, nb_groups, arity, solution_size, &best_val, best_selection, 10, 0.1); + /* timeout = select_independent_groups(tab_group, n, arity, solution_size, &best_val, best_selection, n, 0); */ + + if(verbose_level>=INFO){ + if(timeout) + printf("WG timeout!\n"); + if(last_best>best_val) + printf("WG impoved solution\n"); + } } - duration = TOC; + if(tm_get_exhaustive_search_flag()){ + if(verbose_level>=INFO) + printf("Running exhaustive search on %ld groups, please wait...\n",nb_groups); + + last_best = best_val; + thread_exhaustive_search(tab_group, nb_groups, arity, solution_size, &best_val, best_selection); + /* exhaustive_search(tab_group, nb_groups, arity, solution_size, &best_val, best_selection); */ + if(verbose_level>=INFO){ + 
if(last_best>best_val){ + printf("Exhaustive search improved solution by: %.3f\n",(last_best-best_val)/last_best); + } else { + printf("Exhaustive search did not improved solution\n"); + } + } + } + + /* Reorder solution and apply it to new_tab_node: returned array */ + qsort(best_selection, solution_size, sizeof(group_list_t*), group_list_id); + + for( l = 0 ; l < solution_size ; l++ ){ + for( j = 0 ; j < arity ; j++ ){ + new_tab_node[l].child[j] = best_selection[l]->tab[j]; + new_tab_node[l].child[j]->parent = &new_tab_node[l]; + } + new_tab_node[l].arity = arity; - if(verbose_level>=INFO) - display_grouping(new_tab_node,M,arity,-1); + /* printf("arity=%d\n", new_tab_node[l].arity); */ + update_val(cost_mat, &new_tab_node[l]); + } + delete_group_list((&list)->next); + FREE(best_selection); + FREE(tab_group); + FREE(cur_group); + } - if(verbose_level>=INFO) - printf("Grouping done in %.4fs!\n",duration); + if(cost_mat != aff_mat){ + free_affinity_mat(cost_mat); + } + + duration = TOC; + + + if(verbose_level>=INFO) + printf("Grouping done in %.4fs!\n", duration); } -void complete_aff_mat(affinity_mat_t **aff_mat ,int N, int K) +void complete_aff_mat(tm_affinity_mat_t **aff_mat , int mat_order, int K) { - double **old_mat = NULL,**new_mat = NULL; double *sum_row; - int M,i; + double **old_mat = NULL, **new_mat = NULL; double *sum_row; + int M, i; old_mat = (*aff_mat) -> mat; - M = N+K; + M = mat_order+K; new_mat = (double**)MALLOC(M*sizeof(double*)); for( i = 0 ; i < M ; i++ ) - new_mat[i] = (double*)CALLOC((M),sizeof(double)); + new_mat[i] = (double*)CALLOC((M), sizeof(double)); - sum_row = (double*) CALLOC(M,sizeof(double)); + sum_row = (double*) CALLOC(M, sizeof(double)); - for( i = 0 ; i < N ; i++ ){ - memcpy(new_mat[i],old_mat[i],N*sizeof(double)); + for( i = 0 ; i < mat_order ; i++ ){ + memcpy(new_mat[i], old_mat[i], mat_order*sizeof(double)); sum_row[i] = (*aff_mat)->sum_row[i]; } - *aff_mat = new_affinity_mat(new_mat,sum_row,M); + *aff_mat = 
new_affinity_mat(new_mat, sum_row, M); } -void complete_obj_weight(double **tab,int N, int K) +void complete_obj_weight(double **tab, int mat_order, int K) { - double *old_tab = NULL,*new_tab = NULL,avg; - int M,i; + double *old_tab = NULL, *new_tab = NULL, avg; + int M, i; old_tab = *tab; @@ -1257,63 +1968,62 @@ void complete_obj_weight(double **tab,int N, int K) return; avg = 0; - for( i = 0 ; i < N ; i++ ) + for( i = 0 ; i < mat_order ; i++ ) avg += old_tab[i]; - avg /= N; + avg /= mat_order; - M = N+K; + M = mat_order+K; new_tab = (double*)MALLOC(M*sizeof(double)); *tab = new_tab; for( i = 0 ; i < M ; i++ ) - if(i < N) + if(i < mat_order) new_tab[i] = old_tab[i]; else new_tab[i] = avg; } -void create_dumb_tree(tree_t *node,int depth,tm_topology_t *topology) +void create_dumb_tree(tm_tree_t *node, int depth, tm_topology_t *topology) { - tree_t **list_child = NULL; - int arity,i; + tm_tree_t **list_child = NULL; + int arity, i; if( depth == topology->nb_levels-1) { - set_node(node,NULL,0,NULL,-1,0,NULL,depth); + set_node(node, NULL, 0, NULL, -1, 0, NULL, depth); return; } arity = topology->arity[depth]; assert(arity>0); - list_child = (tree_t**)CALLOC(arity,sizeof(tree_t*)); + list_child = (tm_tree_t**)CALLOC(arity, sizeof(tm_tree_t*)); for( i = 0 ; i < arity ; i++ ){ - list_child[i] = (tree_t*)MALLOC(sizeof(tree_t)); - create_dumb_tree(list_child[i],depth+1,topology); + list_child[i] = (tm_tree_t*)MALLOC(sizeof(tm_tree_t)); + create_dumb_tree(list_child[i], depth+1, topology); list_child[i]->parent = node; list_child[i]->dumb = 1; } - set_node(node,list_child,arity,NULL,-1,0,list_child[0], depth); + set_node(node, list_child, arity, NULL, -1, 0, list_child[0], depth); } - -void complete_tab_node(tree_t **tab,int N, int K,int depth,tm_topology_t *topology) +void complete_tab_node(tm_tree_t **tab, int mat_order, int K, int depth, tm_topology_t *topology) { - tree_t *old_tab = NULL,*new_tab = NULL; - int M,i; + tm_tree_t *old_tab = NULL, *new_tab = NULL; + int M, 
i; if( K == 0 ) return; old_tab = *tab; - M = N+K; - new_tab = (tree_t*)MALLOC(M*sizeof(tree_t)); + M = mat_order+K; + new_tab = (tm_tree_t*)MALLOC(M*sizeof(tm_tree_t)); *tab = new_tab; for( i = 0 ; i < M ; i++ ) - if(i < N) - clone_tree(&new_tab[i],&old_tab[i]); + if(i < mat_order) + clone_tree(&new_tab[i], &old_tab[i]); else{ - create_dumb_tree(&new_tab[i],depth,topology); + create_dumb_tree(&new_tab[i], depth, topology); new_tab[i].id = i; } @@ -1321,11 +2031,11 @@ void complete_tab_node(tree_t **tab,int N, int K,int depth,tm_topology_t *topolo FREE(old_tab); } -void set_deb_tab_child(tree_t *tree, tree_t *child,int depth) +void set_deb_tab_child(tm_tree_t *tree, tm_tree_t *child, int depth) { - /* printf("depth=%d\t%p\t%p\n",depth,child,tree);*/ + /* printf("depth=%d\t%p\t%p\n", depth, child, tree);*/ if( depth > 0 ) - set_deb_tab_child(tree->tab_child,child,depth-1); + set_deb_tab_child(tree->tab_child, child, depth-1); else tree->tab_child=child; } @@ -1342,63 +2052,63 @@ depth: current depth of the algorithm toplogy: description of the hardware topology. constraints: set of constraints: core ids where to bind the processes */ -tree_t *build_level_topology(tree_t *tab_node, affinity_mat_t *aff_mat,int arity,int depth,tm_topology_t *topology, +tm_tree_t *build_level_topology(tm_tree_t *tab_node, tm_affinity_mat_t *aff_mat, int arity, int depth, tm_topology_t *topology, double *obj_weight, double *comm_speed) { - /* N: number of nodes. Order of com_mat, size of obj_weight */ - int N=aff_mat->order ; - int i,K=0,M; /*M = N/Arity: number the groups*/ - tree_t *new_tab_node = NULL; /*array of node for this level (of size M): there will be linked to the nodes of tab_nodes*/ - affinity_mat_t * new_aff_mat= NULL; /*New communication matrix (after grouyping nodes together)*/ - tree_t *res = NULL; /*resulting tree*/ + /* mat_order: number of nodes. 
Order of com_mat, size of obj_weight */ + int mat_order=aff_mat->order ; + int i, K=0, M; /*M = mat_order/Arity: number the groups*/ + tm_tree_t *new_tab_node = NULL; /*array of node for this level (of size M): there will be linked to the nodes of tab_nodes*/ + tm_affinity_mat_t * new_aff_mat= NULL; /*New communication matrix (after grouyping nodes together)*/ + tm_tree_t *res = NULL; /*resulting tree*/ int completed = 0; double speed; /* communication speed at this level*/ double *new_obj_weight = NULL; double duration; if( 0 == depth ){ - if((1 == N) && (0 == depth)) + if((1 == mat_order) && (0 == depth)) return &tab_node[0]; else { if(verbose_level >= CRITICAL) - fprintf(stderr,"Error: matrix size: %d and depth:%d (should be 1 and -1 respectively)\n",N,depth); + fprintf(stderr, "Error: matrix size: %d and depth:%d (should be 1 and -1 respectively)\n", mat_order, depth); exit(-1); } } /* If the number of nodes does not divide the arity: we add K nodes */ - if( N%arity != 0 ){ + if( mat_order%arity != 0 ){ TIC; - K = arity*((N/arity)+1)-N; - /*printf("****N=%d arity=%d K=%d\n",N,arity,K); */ - /*display_tab(tab,N);*/ + K = arity*((mat_order/arity)+1)-mat_order; + /*printf("****mat_order=%d arity=%d K=%d\n", mat_order, arity, K); */ + /*display_tab(tab, mat_order);*/ /* add K rows and columns to comm_matrix*/ - complete_aff_mat(&aff_mat,N,K); + complete_aff_mat(&aff_mat, mat_order, K); /* add K element to the object weight*/ - complete_obj_weight(&obj_weight,N,K); - /*display_tab(tab,N+K);*/ + complete_obj_weight(&obj_weight, mat_order, K); + /*display_tab(tab, mat_order+K);*/ /* add a dumb tree to the K new "virtual nodes"*/ - complete_tab_node(&tab_node,N,K,depth,topology); + complete_tab_node(&tab_node, mat_order, K, depth, topology); completed = 1; /*flag this addition*/ - N += K; /*increase the number of nodes accordingly*/ + mat_order += K; /*increase the number of nodes accordingly*/ duration = TOC; if(verbose_level >= INFO) - fprintf(stderr,"Completing 
matrix duration= %fs\n ", duration); - } /*display_tab(tab,N);*/ + printf("Completing matrix duration= %fs\n ", duration); + } /*display_tab(tab, mat_order);*/ - M = N/arity; + M = mat_order/arity; if(verbose_level >= INFO) - printf("Depth=%d\tnb_nodes=%d\tnb_groups=%d\tsize of groups(arity)=%d\n",depth,N,M,arity); + printf("Depth=%d\tnb_nodes=%d\tnb_groups=%d\tsize of groups(arity)=%d\n", depth, mat_order, M, arity); TIC; /*create the new nodes*/ - new_tab_node = (tree_t*)MALLOC(sizeof(tree_t)*M); + new_tab_node = (tm_tree_t*)MALLOC(sizeof(tm_tree_t)*M); /*intitialize each node*/ for( i = 0 ; i < M ; i++ ){ - tree_t **list_child = NULL; - list_child = (tree_t**)CALLOC(arity,sizeof(tree_t*)); - set_node(&new_tab_node[i],list_child,arity,NULL,i,0,tab_node,depth); + tm_tree_t **list_child = NULL; + list_child = (tm_tree_t**)CALLOC(arity, sizeof(tm_tree_t*)); + set_node(&new_tab_node[i], list_child, arity, NULL, i, 0, tab_node, depth); } duration = TOC; if(verbose_level >= INFO) @@ -1413,7 +2123,7 @@ tree_t *build_level_topology(tree_t *tab_node, affinity_mat_t *aff_mat,int arity TIC; /*based on that grouping aggregate the communication matrix*/ - new_aff_mat = aggregate_aff_mat(new_tab_node,aff_mat,M); + new_aff_mat = aggregate_aff_mat(new_tab_node, aff_mat, M); duration = TOC; if(verbose_level >= INFO) printf("Aggregate_com_mat= %fs\n", duration); @@ -1421,18 +2131,18 @@ tree_t *build_level_topology(tree_t *tab_node, affinity_mat_t *aff_mat,int arity /*based on that grouping aggregate the object weight matrix*/ - new_obj_weight = aggregate_obj_weight(new_tab_node,obj_weight,M); + new_obj_weight = aggregate_obj_weight(new_tab_node, obj_weight, M); duration = TOC; if(verbose_level >= INFO) printf("Aggregate obj_weight= %fs\n ", duration); /* set ID of virtual nodes to -1*/ - for( i = N-K ; i < N ; i++ ) + for( i = mat_order-K ; i < mat_order ; i++ ) tab_node[i].id = -1; /* - for(i=0;imat,aff_mat->order); - FREE(aff_mat->sum_row); - FREE(aff_mat); + 
free_affinity_mat(aff_mat); FREE(obj_weight); } - FREE_tab_double(new_aff_mat->mat,new_aff_mat->order); - FREE(new_aff_mat->sum_row); - FREE(new_aff_mat); + free_affinity_mat(new_aff_mat); + FREE(new_obj_weight); return res; } -double speed(int depth) -{ - /* - Bertha values - double tab[5]={21,9,4.5,2.5,0.001}; - double tab[5]={1,1,1,1,1}; - double tab[6]={100000,10000,1000,500,100,10}; - */ - double tab[11] = {1024,512,256,128,64,32,16,8,4,2,1}; - return 1.0/tab[depth]; - /* - return 10*log(depth+2); - return (depth+1); - return (long int)pow(100,depth); - */ -} +tm_tree_t *bottom_up_build_tree_from_topology(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, + double *obj_weight, double *comm_speed){ + int depth, i; + tm_tree_t *res = NULL, *tab_node = NULL; + int mat_order = aff_mat->order; -/* check the leaf numbering of the topology - this number must be between 0 and n-1 (the number of leaves) - teh number must all be different - However if a given leaf number is -1, it means that this - leaf cannot bee used for the mapping + tab_node = (tm_tree_t*)MALLOC(sizeof(tm_tree_t)*mat_order); + depth = topology->nb_levels; + for( i = 0 ; i < mat_order ; i++ ) + set_node(&tab_node[i], NULL, 0, NULL, i, 0, NULL, depth); - The function returns the number of constraints (leaves that can be used) - and their numbers (in increasing order) in the array pointed by contraints -*/ + if(verbose_level >= INFO) + printf("nb_levels=%d\n", depth); + /* assume all objects have the same arity*/ + res = build_level_topology(tab_node, aff_mat , topology->arity[depth-2], depth-1, topology, obj_weight, comm_speed); + if(verbose_level >= INFO) + printf("Build (top down) tree done!\n"); -int check_constraints(tm_topology_t *topology, int **constraints) -{ - int j,i,n = nb_processing_units(topology); - int *tab_constraints = NULL, nb_constraints = 0; - int *tab_node = NULL; - int *count = NULL; - - /* tab_node: array of core numbers. 
- tab_node[i]=-1 if this core is forbiden - numbering is such that - 0<=tab_node[i]node_id[topology->nb_levels-1]; + /* tell the system it is not a constraint tree, this is usefull for freeing pointers*/ + res->constraint = 0; - /* "count" counts the number of cores of a given number. - count[i]: number of cores of number i. - 0<=count[i]<=1 - */ - count = (int *)CALLOC(n,sizeof(int)); - for( i = 0 ; i < n ; i++ ) - if (tab_node[i] != -1){ - if( (tab_node[i] >= 0) && (tab_node[i] < n)){ - /* In the remaining, we assume that the core numbering is logical from 0 to n - so if tab_node[i]!=-1 this mean sthat we have to use core number i*/ - count[i]++; - nb_constraints++; - }else{ - if(verbose_level >= ERROR) - fprintf(stderr, "*** Error: Core numbering not between 0 and %d: tab_node[%d]=%d\n", n , i, tab_node[i]); - *constraints = NULL; - FREE(count); - return 0; - } - } - - if(nb_constraints == 0){ - FREE(count); - *constraints = NULL; - return 0; - } + return res; +} - tab_constraints = (int*) MALLOC(sizeof(int)*nb_constraints); - - /* we can now use the "counting sort" to sort the constraint tab in increasing order in linear time*/ - j = 0; - for( i = 0 ; i < n ; i++ ) - if(count[i]) - tab_constraints[j++] = i; - - /* if the constraint_tab is not full, this means that some count[i]>1*/ - if( j != nb_constraints ){ - if(verbose_level >= ERROR) - fprintf(stderr,"*** Error: Duplicate numbering: j=%d, nb_constraints= %d\n",j, nb_constraints); - FREE(tab_constraints); - FREE(count); - *constraints = NULL; - return 0; - } - /* FREE local variables, assign result, return result*/ - FREE(count); - *constraints = tab_constraints; - return nb_constraints; -} -affinity_mat_t * build_affinity_mat(double **mat, int order){ - int i,j; - double *sum_row = (double*) CALLOC (order, sizeof(double)); - for (i=0 ; inb_levels; - for( i = 0 ; i < N ; i++ ) - set_node(&tab_node[i],NULL,0,NULL,i,0,NULL,depth); - aff_mat = build_affinity_mat(com_mat,N); + int sorted = 1; + int last = -1; + 
int i, shift; + int nb_constraints = topology->nb_constraints*topology->oversub_fact; + if(nb_constraints && topology->constraints){ + *constraints = (int*)MALLOC(sizeof(int)*(nb_constraints)); + /* renumber constarints logically as it is the way the k-partitionner use it*/ + for(i = 0 ; i < nb_constraints ; i++){ + /* in case of oversubscrining node ids at topology->nb_levels-1 are as follows (for the logocal numbering case): + 0, 0, .., 0, 1, 1, ..., 1, 2, 2, 2, ..., 2, ... where the number of identical consecutive number is topology->oversub_fact. + However, topology->node_rank refers only to the last rank of the id. Hence, + topology->node_rank[topology->nb_levels-1][i] == i*topology->oversub_fact + In order to have all the ranks of a given id we need to shift them as follows: + */ + shift = 1 + i%topology->oversub_fact - topology->oversub_fact; + (*constraints)[i] = topology->node_rank[topology->nb_levels-1][topology->constraints[i/topology->oversub_fact]] +shift; + if((*constraints)[i] < last) + sorted = 0; + last = (*constraints)[i]; + } - if(verbose_level >= INFO) - printf("nb_levels=%d\n",depth); - /* assume all objects have the same arity*/ - res = build_level_topology(tab_node, aff_mat , topology->arity[depth-2], depth-1, topology, obj_weight, comm_speed); - if(verbose_level >= INFO) - printf("Build (top down) tree done!\n"); + if(!sorted){ + qsort(*constraints, nb_constraints , sizeof(int), int_cmp_inc); + } - /* tell the system it is not a constraint tree, this is usefull for freeing pointers*/ - res->constraint = 0; - FREE(aff_mat -> sum_row); - FREE(aff_mat); + }else{ + *constraints = NULL; + } - return res; + return nb_constraints; } -tree_t * build_tree_from_topology(tm_topology_t *topology, double **com_mat, int N, double *obj_weight, double *com_speed) + +tm_tree_t * tm_build_tree_from_topology(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, double *obj_weight, double *com_speed) { int *constraints = NULL, nb_constraints; - tree_t * result; 
+ tm_tree_t * result; + int npu, nb_processes, oversub_fact, nb_slots; - verbose_level = get_verbose_level(); + verbose_level = tm_get_verbose_level(); + oversub_fact = topology->oversub_fact; + /* Here constraints expended to take into account the oversuscribing factor */ nb_constraints = check_constraints (topology, &constraints); + nb_processes = aff_mat->order; + npu = nb_processing_units(topology); + nb_slots = npu * oversub_fact; - printf("nb_constraints = %d, N= %d; nb_processing units = %d\n",nb_constraints, N, nb_processing_units(topology)); + if(verbose_level >= INFO){ + printf("Com matrix size : %d\n", nb_processes); + printf("nb_constraints : %d\n", nb_constraints); + if(constraints) + print_1D_tab(constraints, nb_constraints); + printf("nb_processing units : %d\n", npu); + printf("Oversubscrbing factor: %d\n", oversub_fact); + printf("Nb of slots : %d\n", nb_slots); + } - if(N>nb_constraints){ + if(nb_processes > nb_constraints){ if(verbose_level >= CRITICAL){ - printf("Error : More processes (%d) than number of constraints (%d)!\n",N ,nb_constraints); + fprintf(stderr, "Error : Not enough slots/constraints (%d) for the communication matrix order (%d)!\n", + nb_constraints, nb_processes); } exit(-1); } - if(verbose_level >= INFO){ - printf("Com matrix size: %d\n",N); - printf("nb_constraints: %d\n",nb_constraints); - } - - if(nb_constraints == nb_processing_units(topology)) + if(nb_constraints == nb_slots) { + if(verbose_level >= INFO){ + printf("No need to use %d constraints for %d slots!\n", nb_constraints, nb_slots); + } + nb_constraints = 0; FREE(constraints); } @@ -1634,7 +2294,9 @@ tree_t * build_tree_from_topology(tm_topology_t *topology, double **com_mat, int if(verbose_level >= INFO){ printf("Partitionning with constraints\n"); } - result = kpartition_build_tree_from_topology(topology, com_mat, N, constraints, nb_constraints, obj_weight, com_speed); + result = kpartition_build_tree_from_topology(topology, aff_mat->mat, nb_processes, 
constraints, nb_constraints, + obj_weight, com_speed); + result->nb_processes = aff_mat->order; FREE(constraints); return result; } @@ -1642,6 +2304,9 @@ tree_t * build_tree_from_topology(tm_topology_t *topology, double **com_mat, int if(verbose_level >= INFO){ printf("Partitionning without constraints\n"); } - return bottom_up_build_tree_from_topology(topology, com_mat, N, obj_weight, com_speed); + + result = bottom_up_build_tree_from_topology(topology, aff_mat, obj_weight, com_speed); + result->nb_processes = aff_mat->order; + return result; } } diff --git a/ompi/mca/topo/treematch/treematch/tm_tree.h b/ompi/mca/topo/treematch/treematch/tm_tree.h index 342a61bd4f7..6168f501618 100644 --- a/ompi/mca/topo/treematch/treematch/tm_tree.h +++ b/ompi/mca/topo/treematch/treematch/tm_tree.h @@ -1,69 +1,22 @@ -#ifndef __TREE_H__ -#define __TREE_H__ +#ifndef __TM_TREE_H__ +#define __TM_TREE_H__ #include +#include "treematch.h" - -typedef struct _node_info_t{ - int submit_date; - int job_id; - int finish_date; -} job_info_t; - -typedef struct _tree_t{ - int constraint; /* tells if the tree has been constructed with constraints on the nodes or not. usefull for freeing it. needs to be set on the root only*/ - struct _tree_t **child; - struct _tree_t *parent; - struct _tree_t *tab_child; /*the pointer to be freed*/ - double val; - int arity; - int depth; - int id; - int uniq; - int dumb; /* 1 if the node belongs to a dumb tree: hence has to be freed separately*/ - job_info_t *job_info; -}tree_t; - -/* Maximum number of levels in the tree*/ -#define MAX_LEVELS 100 - -typedef struct { - int *arity; /* arity of the nodes of each level*/ - int nb_levels; /*number of levels of the tree. 
Levels are numbered from top to bottom starting at 0*/ - int *nb_nodes; /*nb of nodes of each level*/ - int *nb_free_nodes; /*nb of available nodes of each level*/ - int **node_id; /*ID of the nodes of the tree for each level*/ - int **free_nodes; /*ID of the nodes of the tree for each level*/ -}tm_topology_t; - - -typedef struct { - double ** mat; - double * sum_row; - int order; -} affinity_mat_t; - - - -tree_t * build_tree(double **tab,int N); -tree_t * build_tree_from_topology(tm_topology_t *topology,double **tab,int N, double *obj_weight, double *comm_speed); -void map_tree(tree_t *,tree_t*); +void update_val(tm_affinity_mat_t *aff_mat,tm_tree_t *parent); void display_tab(double **tab,int N); -double speed(int depth); -void set_node(tree_t *node,tree_t ** child, int arity,tree_t *parent,int id,double val,tree_t *deb_tab_child, int depth); -void free_constraint_tree(tree_t *tree); -void free_tree(tree_t *tree); -void free_tab_double(double**tab,int N); -void free_tab_int(int**tab,int N); -void update_val(affinity_mat_t *aff_mat,tree_t *parent); -void FREE_tree(tree_t *tree); -void FREE_tab_double(double**,int); +void set_node(tm_tree_t *node,tm_tree_t ** child, int arity,tm_tree_t *parent, + int id,double val,tm_tree_t *tab_child,int depth); + typedef struct _group_list_t{ struct _group_list_t *next; - tree_t **tab; + tm_tree_t **tab; double val; double sum_neighbour; double wg; + int id; + double *bound; }group_list_t; @@ -74,21 +27,13 @@ typedef struct{ }adjacency_t; +typedef struct _work_unit_t{ + int nb_groups; + int *tab_group; + int done; + int nb_work; + struct _work_unit_t *next; +}work_unit_t; -/* for debugging malloc */ -/* #define __DEBUG_MY_MALLOC__ */ -#undef __DEBUG_MY_MALLOC__ -#ifdef __DEBUG_MY_MALLOC__ -#include "tm_malloc.h" -#define MALLOC(x) my_malloc(x,__FILE__,__LINE__) -#define CALLOC(x,y) my_calloc(x,y,__FILE__,__LINE__) -#define FREE my_free -#define MEM_CHECK my_mem_check -#else -#define MALLOC malloc -#define CALLOC calloc -#define 
FREE free -#define MEM_CHECK my_mem_check #endif -#endif diff --git a/ompi/mca/topo/treematch/treematch/tm_verbose.c b/ompi/mca/topo/treematch/treematch/tm_verbose.c index 9ff83191215..e360d7122b9 100644 --- a/ompi/mca/topo/treematch/treematch/tm_verbose.c +++ b/ompi/mca/topo/treematch/treematch/tm_verbose.c @@ -1,11 +1,34 @@ #include "tm_verbose.h" +#include static unsigned int verbose_level = ERROR; +static FILE *output = NULL; -void set_verbose_level(unsigned int level){ +void tm_set_verbose_level(unsigned int level){ verbose_level = level; } - -unsigned int get_verbose_level(){ +unsigned int tm_get_verbose_level(){ return verbose_level; } + +int tm_open_verbose_file(char *filename){ + output = fopen(filename,"w"); + if(output == NULL) + return 0; + else + return 1; +} + +int tm_close_verbose_file(void){ + if(output != NULL) + return fclose(output); + + return 0; +} + +FILE *tm_get_verbose_output(){ + if(!output) + return stdout; + else + return output; +} diff --git a/ompi/mca/topo/treematch/treematch/tm_verbose.h b/ompi/mca/topo/treematch/treematch/tm_verbose.h index eafb0942f4e..e16cbbc6c00 100644 --- a/ompi/mca/topo/treematch/treematch/tm_verbose.h +++ b/ompi/mca/topo/treematch/treematch/tm_verbose.h @@ -1,11 +1,22 @@ +#include + #define NONE 0 +/* output in stderr*/ #define CRITICAL 1 #define ERROR 2 +/* output in stdout*/ #define WARNING 3 -#define INFO 4 -#define DEBUG 5 +#define TIMING 4 +#define INFO 5 +#define DEBUG 6 + -void set_verbose_level(unsigned int level); -unsigned int get_verbose_level(void); +/* return 0 on errror and 1 on success */ +int tm_open_verbose_file(char *filename); +int tm_close_verbose_file(void); +void tm_set_verbose_level(unsigned int level); +unsigned int tm_get_verbose_level(void); +FILE * tm_get_verbose_output(void); +#define tm_verbose_printf(level, ...) 
level <= tm_get_verbose_level()?fprintf(tm_get_verbose_output(),__VA_ARGS__):0 diff --git a/ompi/mca/topo/treematch/treematch/treematch.h b/ompi/mca/topo/treematch/treematch/treematch.h new file mode 100644 index 00000000000..8891c819d0d --- /dev/null +++ b/ompi/mca/topo/treematch/treematch/treematch.h @@ -0,0 +1,188 @@ +#ifndef __TREEMATCH_H__ +#define __TREEMATCH_H__ + +/* size_t definition */ +#include +#include "tm_verbose.h" + +/********* TreeMatch Public Enum **********/ + +/*type of topology files that can be read*/ +typedef enum{ + TM_FILE_TYPE_UNDEF, + TM_FILE_TYPE_XML, + TM_FILE_TYPE_TGT +} tm_file_type_t; + +/* different metrics to evaluate the solution */ +typedef enum{ + TM_METRIC_SUM_COM = 1, + TM_METRIC_MAX_COM = 2, + TM_METRIC_HOP_BYTE = 3 +} tm_metric_t; + + +/********* TreeMatch Public Structures **********/ + +typedef struct _job_info_t{ + int submit_date; + int job_id; + int finish_date; +} tm_job_info_t; + +typedef struct _tree_t{ + int constraint; /* tells if the tree has been constructed with constraints on the nodes or not. + Usefull for freeing it. needs to be set on the root only*/ + struct _tree_t **child; + struct _tree_t *parent; + struct _tree_t *tab_child; /*the pointer to be freed*/ + double val; + int arity; + int depth; + int id; + int uniq; + int dumb; /* 1 if the node belongs to a dumb tree: hence has to be freed separately*/ + tm_job_info_t *job_info; + int nb_processes; /* number of grouped processes (i.e. the order of the affinity matrix). Set at the root only*/ +}tm_tree_t; /* FT : changer le nom : tm_grouap_hierachy_t ?*/ + +/* Maximum number of levels in the tree*/ +#define TM_MAX_LEVELS 100 + +typedef struct { + int *arity; /* arity of the nodes of each level*/ + int nb_levels; /*number of levels of the tree. 
Levels are numbered from top to bottom starting at 0*/ + size_t *nb_nodes; /*nb of nodes of each level*/ + int **node_id; /*ID of the nodes of the tree for each level*/ + int **node_rank ; /*rank of the nodes of the tree for each level given its ID: this is the inverse tab of node_id*/ + size_t *nb_free_nodes; /*nb of available nodes of each level*/ + int **free_nodes; /*tab of node that are free: useful to simulate batch scheduler*/ + double *cost; /*cost of the communication depending on the distance: + cost[i] is the cost for communicating at distance nb_levels-i*/ + int *constraints; /* array of constraints: id of the nodes where it is possible to map processes */ + int nb_constraints; /* Size of the above array */ + int oversub_fact; /* maximum number of processes to be mapped on a given node */ + int nb_proc_units; /* the real number of units used for computation */ +}tm_topology_t; + + +typedef struct { + double ** mat; + double * sum_row; + int order; +} tm_affinity_mat_t; + +/* + sigma_i is such that process i is mapped on core sigma_i + k_i is such that core i exectutes process k_i_j (0<=j<<=oversubscribing factor - 1) + + size of sigma is the number of processes (nb_objs) + size of k is the number of cores/nodes (nb_compute_units) + size of k[i] is the number of process we can execute per nodes (1 if no oversubscribing) + + We must have numbe of process<=number of cores + + k[i] == NULL if no process is mapped on core i +*/ + +typedef struct { + int *sigma; + size_t sigma_length; + int **k; + size_t k_length; + int oversub_fact; +}tm_solution_t; + + +/************ TreeMatch Public API ************/ + +/* load XML or TGT topology */ +tm_topology_t *tm_load_topology(char *arch_filename, tm_file_type_t arch_file_type); +/* + Alternatively, build a synthetic balanced topology. + + nb_levels : number of levels of the topology +1 (the last level must be of cost 0 and arity 0). 
+ arity : array of arity of the first nb_level (of size nb_levels) + cost : array of costs between the levels (of size nb_levels) + core_numbering: numbering of the core by the system. Array of size nb_core_per_node + + nb_core_per_nodes: number of cores of a given node. Size of the array core_numbering + + both arity and cost are copied inside tm_build_synthetic_topology + + The numbering of the cores is done in round robin fashion after a width traversal of the topology. + for example: + {0,1,2,3} becomes 0,1,2,3,4,5,6,7... + and + {0,2,1,3} becomes 0,2,1,3,4,6,5,7,... + + Example of call to build the 128.tgt file: tleaf 4 16 500 2 100 2 50 2 10 + + double cost[5] = {500,100,50,10,0}; + int arity[5] = {16,2,2,2,0}; + int cn[5]={0,1}; + + topology = tm_build_synthetic_topology(arity,cost,5,cn,2); + + */ +tm_topology_t *tm_build_synthetic_topology(int *arity, double *cost, int nb_levels, int *core_numbering, int nb_core_per_nodes); +/* load affinity matrix */ +tm_affinity_mat_t *tm_load_aff_mat(char *com_filename); +/* + Alternativelly, build the affinity matrix from a array of array of matrix of size order by order + For performance reason mat is not copied. +*/ +tm_affinity_mat_t * tm_build_affinity_mat(double **mat, int order); +/* Add constraints to toplogy + Return 1 on success and 0 if the constari,ts id are not compatible withe nodes id */ +int tm_topology_add_binding_constraints(char *bind_filename, tm_topology_t *topology); +/* Alternatively, set the constraints from an array. 
+ Return 1 on success and 0 if the constari,ts id are not compatible withe nodes id + + The array constraints is copied inside tm_topology_set_binding_constraints + +*/ +int tm_topology_set_binding_constraints(int *constraints, int nb_constraints, tm_topology_t *topology); +/* display arity of the topology */ +void tm_display_arity(tm_topology_t *topology); +/* display the full topology */ +void tm_display_topology(tm_topology_t *topology); +/* Optimize the topology by decomposing arities */ +void tm_optimize_topology(tm_topology_t **topology); +/* Manage oversubscribing */ +void tm_enable_oversubscribing(tm_topology_t *topology, unsigned int oversub_fact); +/* core of the treematch: compute the solution tree */ +tm_tree_t *tm_build_tree_from_topology(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, double *obj_weight, double *com_speed); +/* compute the mapping according to teh tree an dthe core numbering*/ +tm_solution_t *tm_compute_mapping(tm_topology_t *topology, tm_tree_t *comm_tree); +/* display the solution*/ +double tm_display_solution(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, tm_solution_t *sol, tm_metric_t metric); +/* display RR, packed, MPIPP*/ +void tm_display_other_heuristics(tm_topology_t *topology, tm_affinity_mat_t *aff_mat, tm_metric_t metric); +/* free TM strutures*/ +void tm_free_topology(tm_topology_t *topology); +void tm_free_tree(tm_tree_t *comm_tree); +void tm_free_solution(tm_solution_t *sol); +void tm_free_affinity_mat(tm_affinity_mat_t *aff_mat); +/* manage verbosity of TM*/ +void tm_set_verbose_level(unsigned int level); +unsigned int tm_get_verbose_level(void); +/* finalize treematch :check memory if necessary, and free internal variables (thread pool)*/ +void tm_finalize(void); + +/* +Ask for exhaustive search: may be very long + new_val == 0 : no exhuative search + new_val != 0 : exhuative search +*/ +void tm_set_exhaustive_search_flag(int new_val); +int tm_get_exhaustive_search_flag(void); + + +/* Setting the maximum 
number of threads you want to use in parallel parts of TreeMatch */ +void tm_set_max_nb_threads(unsigned int val); + + +#include "tm_malloc.h" + +#endif diff --git a/ompi/mca/topo/treematch/treematch/uthash.h b/ompi/mca/topo/treematch/treematch/uthash.h index 7b98cad5cc9..3a3dd9a69a2 100644 --- a/ompi/mca/topo/treematch/treematch/uthash.h +++ b/ompi/mca/topo/treematch/treematch/uthash.h @@ -22,7 +22,7 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #ifndef UTHASH_H -#define UTHASH_H +#define UTHASH_H #include /* memcmp,strlen */ #include /* ptrdiff_t */ @@ -49,7 +49,7 @@ do { char **_da_dst = (char**)(&(dst)); \ *_da_dst = (char*)(src); \ } while(0) -#else +#else #define DECLTYPE_ASSIGN(dst,src) \ do { \ (dst) = DECLTYPE(dst)(src); \ @@ -121,9 +121,9 @@ do { HASH_BLOOM_BITTEST((tbl)->bloom_bv, (hashv & (uint32_t)((1ULL << (tbl)->bloom_nbits) - 1))) #else -#define HASH_BLOOM_MAKE(tbl) -#define HASH_BLOOM_FREE(tbl) -#define HASH_BLOOM_ADD(tbl,hashv) +#define HASH_BLOOM_MAKE(tbl) +#define HASH_BLOOM_FREE(tbl) +#define HASH_BLOOM_ADD(tbl,hashv) #define HASH_BLOOM_TEST(tbl,hashv) (1) #endif @@ -148,7 +148,7 @@ do { #define HASH_ADD(hh,head,fieldname,keylen_in,add) \ HASH_ADD_KEYPTR(hh,head,&((add)->fieldname),keylen_in,add) - + #define HASH_ADD_KEYPTR(hh,head,keyptr,keylen_in,add) \ do { \ unsigned _ha_bkt; \ @@ -300,10 +300,10 @@ do { } \ } while (0) #else -#define HASH_FSCK(hh,head) +#define HASH_FSCK(hh,head) #endif -/* When compiled with -DHASH_EMIT_KEYS, length-prefixed keys are emitted to +/* When compiled with -DHASH_EMIT_KEYS, length-prefixed keys are emitted to * the descriptor to which this macro is defined for tuning the hash function. * The app can #include to get the prototype for write(2). 
*/ #ifdef HASH_EMIT_KEYS @@ -313,12 +313,12 @@ do { write(HASH_EMIT_KEYS, &_klen, sizeof(_klen)); \ write(HASH_EMIT_KEYS, keyptr, fieldlen); \ } while (0) -#else -#define HASH_EMIT_KEY(hh,head,keyptr,fieldlen) +#else +#define HASH_EMIT_KEY(hh,head,keyptr,fieldlen) #endif /* default to Jenkin's hash unless overridden e.g. DHASH_FUNCTION=HASH_SAX */ -#ifdef HASH_FUNCTION +#ifdef HASH_FUNCTION #define HASH_FCN HASH_FUNCTION #else #define HASH_FCN HASH_JEN @@ -335,7 +335,7 @@ do { } while (0) -/* SAX/FNV/OAT/JEN hash functions are macro variants of those listed at +/* SAX/FNV/OAT/JEN hash functions are macro variants of those listed at * http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx */ #define HASH_SAX(key,keylen,num_bkts,hashv,bkt) \ do { \ @@ -356,7 +356,7 @@ do { hashv = (hashv * 16777619) ^ _hf_key[_fn_i]; \ bkt = hashv & (num_bkts-1); \ } while(0); - + #define HASH_OAT(key,keylen,num_bkts,hashv,bkt) \ do { \ unsigned _ho_i; \ @@ -485,14 +485,14 @@ do { #ifdef HASH_USING_NO_STRICT_ALIASING /* The MurmurHash exploits some CPU's (x86,x86_64) tolerance for unaligned reads. * For other types of CPU's (e.g. Sparc) an unaligned read causes a bus error. - * MurmurHash uses the faster approach only on CPU's where we know it's safe. + * MurmurHash uses the faster approach only on CPU's where we know it's safe. 
* * Note the preprocessor built-in defines can be emitted using: * * gcc -m64 -dM -E - < /dev/null (on gcc) * cc -## a.c (where a.c is a simple test file) (Sun Studio) */ -#if (defined(__i386__) || defined(__x86_64__)) +#if (defined(__i386__) || defined(__x86_64__)) #define MUR_GETBLOCK(p,i) p[i] #else /* non intel */ #define MUR_PLUS0_ALIGNED(p) (((unsigned long)p & 0x3) == 0) @@ -562,7 +562,7 @@ do { \ #endif /* HASH_USING_NO_STRICT_ALIASING */ /* key comparison function; return 0 if keys equal */ -#define HASH_KEYCMP(a,b,len) memcmp(a,b,len) +#define HASH_KEYCMP(a,b,len) memcmp(a,b,len) /* iterate over items in a known bucket to find desired item */ #define HASH_FIND_IN_BKT(tbl,hh,head,keyptr,keylen_in,out) \ @@ -603,36 +603,36 @@ do { } \ if (hh_del->hh_next) { \ hh_del->hh_next->hh_prev = hh_del->hh_prev; \ - } + } /* Bucket expansion has the effect of doubling the number of buckets * and redistributing the items into the new buckets. Ideally the * items will distribute more or less evenly into the new buckets * (the extent to which this is true is a measure of the quality of - * the hash function as it applies to the key domain). - * + * the hash function as it applies to the key domain). + * * With the items distributed into more buckets, the chain length * (item count) in each bucket is reduced. Thus by expanding buckets - * the hash keeps a bound on the chain length. This bounded chain + * the hash keeps a bound on the chain length. This bounded chain * length is the essence of how a hash provides constant time lookup. - * + * * The calculation of tbl->ideal_chain_maxlen below deserves some * explanation. First, keep in mind that we're calculating the ideal * maximum chain length based on the *new* (doubled) bucket count. * In fractions this is just n/b (n=number of items,b=new num buckets). - * Since the ideal chain length is an integer, we want to calculate + * Since the ideal chain length is an integer, we want to calculate * ceil(n/b). 
We don't depend on floating point arithmetic in this * hash, so to calculate ceil(n/b) with integers we could write - * + * * ceil(n/b) = (n/b) + ((n%b)?1:0) - * + * * and in fact a previous version of this hash did just that. * But now we have improved things a bit by recognizing that b is * always a power of two. We keep its base 2 log handy (call it lb), * so now we can write this with a bit shift and logical AND: - * + * * ceil(n/b) = (n>>lb) + ( (n & (b-1)) ? 1:0) - * + * */ #define HASH_EXPAND_BUCKETS(tbl) \ do { \ @@ -684,7 +684,7 @@ do { /* This is an adaptation of Simon Tatham's O(n log(n)) mergesort */ -/* Note that HASH_SORT assumes the hash handle name to be hh. +/* Note that HASH_SORT assumes the hash handle name to be hh. * HASH_SRT was added to allow the hash handle name to be passed in. */ #define HASH_SORT(head,cmpfcn) HASH_SRT(hh,head,cmpfcn) #define HASH_SRT(hh,head,cmpfcn) \ @@ -766,10 +766,10 @@ do { } \ } while (0) -/* This function selects items from one hash into another hash. - * The end result is that the selected items have dual presence - * in both hashes. There is no copy of the items made; rather - * they are added into the new hash through a secondary hash +/* This function selects items from one hash into another hash. + * The end result is that the selected items have dual presence + * in both hashes. There is no copy of the items made; rather + * they are added into the new hash through a secondary hash * hash handle that must be present in the structure. 
*/ #define HASH_SELECT(hh_dst, dst, hh_src, src, cond) \ do { \ @@ -823,7 +823,7 @@ do { #ifdef NO_DECLTYPE #define HASH_ITER(hh,head,el,tmp) \ for((el)=(head), (*(char**)(&(tmp)))=(char*)((head)?(head)->hh.next:NULL); \ - el; (el)=(tmp),(*(char**)(&(tmp)))=(char*)((tmp)?(tmp)->hh.next:NULL)) + el; (el)=(tmp),(*(char**)(&(tmp)))=(char*)((tmp)?(tmp)->hh.next:NULL)) #else #define HASH_ITER(hh,head,el,tmp) \ for((el)=(head),(tmp)=DECLTYPE(el)((head)?(head)->hh.next:NULL); \ @@ -831,7 +831,7 @@ for((el)=(head),(tmp)=DECLTYPE(el)((head)?(head)->hh.next:NULL); #endif /* obtain a count of items in the hash */ -#define HASH_COUNT(head) HASH_CNT(hh,head) +#define HASH_COUNT(head) HASH_CNT(hh,head) #define HASH_CNT(hh,head) ((head)?((head)->hh.tbl->num_items):0) typedef struct UT_hash_bucket { @@ -840,7 +840,7 @@ typedef struct UT_hash_bucket { /* expand_mult is normally set to 0. In this situation, the max chain length * threshold is enforced at its default value, HASH_BKT_CAPACITY_THRESH. (If - * the bucket's chain exceeds this length, bucket expansion is triggered). + * the bucket's chain exceeds this length, bucket expansion is triggered). * However, setting expand_mult to a non-zero value delays bucket expansion * (that would be triggered by additions to this particular bucket) * until its chain length reaches a *multiple* of HASH_BKT_CAPACITY_THRESH. @@ -848,7 +848,7 @@ typedef struct UT_hash_bucket { * multiplier is to reduce bucket expansions, since they are expensive, in * situations where we know that a particular bucket tends to be overused. * It is better to let its chain length grow to a longer yet-still-bounded - * value, than to do an O(n) bucket expansion too often. + * value, than to do an O(n) bucket expansion too often. 
*/ unsigned expand_mult; @@ -874,7 +874,7 @@ typedef struct UT_hash_table { * hash distribution; reaching them in a chain traversal takes >ideal steps */ unsigned nonideal_items; - /* ineffective expands occur when a bucket doubling was performed, but + /* ineffective expands occur when a bucket doubling was performed, but * afterward, more than half the items in the hash had nonideal chain * positions. If this happens on two consecutive expansions we inhibit any * further expansion, as it's not helping; this happens when the hash diff --git a/ompi/mca/vprotocol/example/Makefile.am b/ompi/mca/vprotocol/example/Makefile.am index fff5e295ef3..64ec3e4cca0 100644 --- a/ompi/mca/vprotocol/example/Makefile.am +++ b/ompi/mca/vprotocol/example/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2004-2007 The Trustees of the University of Tennessee. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -36,7 +37,7 @@ local_sources = \ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_vprotocol_example_la_SOURCES = $(local_sources) -mca_vprotocol_example_la_LIBADD = +mca_vprotocol_example_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la mca_vprotocol_example_la_CFLAGS = mca_vprotocol_example_la_LDFLAGS = -module -avoid-version diff --git a/ompi/mca/vprotocol/pessimist/Makefile.am b/ompi/mca/vprotocol/pessimist/Makefile.am index 9a1305b1f06..f037b9f6d00 100644 --- a/ompi/mca/vprotocol/pessimist/Makefile.am +++ b/ompi/mca/vprotocol/pessimist/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2004-2007 The Trustees of the University of Tennessee. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -49,6 +50,7 @@ mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_vprotocol_pessimist_la_SOURCES = $(local_sources) mca_vprotocol_pessimist_la_LDFLAGS = -module -avoid-version +mca_vprotocol_pessimist_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la noinst_LTLIBRARIES = $(component_noinst) libmca_vprotocol_pessimist_la_SOURCES = $(local_sources) diff --git a/ompi/message/message.h b/ompi/message/message.h index 60778ebed1a..0f0f1eacfac 100644 --- a/ompi/message/message.h +++ b/ompi/message/message.h @@ -1,7 +1,7 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2011-2012 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2012-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ @@ -38,7 +38,7 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_message_t); * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. */ -#define PREDEFINED_MESSAGE_PAD (sizeof(void*) * 32) +#define PREDEFINED_MESSAGE_PAD 256 struct ompi_predefined_message_t { struct ompi_message_t message; diff --git a/ompi/mpi/c/Makefile.am b/ompi/mpi/c/Makefile.am index cbca901d614..d340e194fdc 100644 --- a/ompi/mpi/c/Makefile.am +++ b/ompi/mpi/c/Makefile.am @@ -15,7 +15,7 @@ # Copyright (c) 2012-2013 Inria. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # @@ -154,6 +154,67 @@ libmpi_c_mpi_la_SOURCES = \ exscan.c \ fetch_and_op.c \ iexscan.c \ + file_c2f.c \ + file_call_errhandler.c \ + file_close.c \ + file_create_errhandler.c \ + file_delete.c \ + file_f2c.c \ + file_get_amode.c \ + file_get_atomicity.c \ + file_get_byte_offset.c \ + file_get_errhandler.c \ + file_get_group.c \ + file_get_info.c \ + file_get_position.c \ + file_get_position_shared.c \ + file_get_size.c \ + file_get_type_extent.c \ + file_get_view.c \ + file_iread_at.c \ + file_iread_at_all.c \ + file_iread.c \ + file_iread_all.c \ + file_iread_shared.c \ + file_iwrite_at.c \ + file_iwrite_at_all.c \ + file_iwrite.c \ + file_iwrite_all.c \ + file_iwrite_shared.c \ + file_open.c \ + file_preallocate.c \ + file_read_all_begin.c \ + file_read_all.c \ + file_read_all_end.c \ + file_read_at_all_begin.c \ + file_read_at_all.c \ + file_read_at_all_end.c \ + file_read_at.c \ + file_read.c \ + file_read_ordered_begin.c \ + file_read_ordered.c \ + file_read_ordered_end.c \ + file_read_shared.c \ + file_seek.c \ + file_seek_shared.c \ + file_set_atomicity.c \ + file_set_errhandler.c \ + file_set_info.c \ + file_set_size.c \ + file_set_view.c \ + file_sync.c \ + file_write_all_begin.c \ + file_write_all.c \ + file_write_all_end.c \ + file_write_at_all_begin.c \ + file_write_at_all.c \ + file_write_at_all_end.c \ + file_write_at.c \ + file_write.c \ + file_write_ordered_begin.c \ + file_write_ordered.c \ + file_write_ordered_end.c \ + file_write_shared.c \ finalize.c \ finalized.c \ free_mem.c \ @@ -251,6 +312,7 @@ libmpi_c_mpi_la_SOURCES = \ recv_init.c \ recv.c \ reduce.c \ + register_datarep.c \ ireduce.c \ reduce_local.c \ reduce_scatter.c \ @@ -384,72 +446,6 @@ libmpi_c_mpi_la_SOURCES = \ win_unlock_all.c \ win_wait.c -if OMPI_PROVIDE_MPI_FILE_INTERFACE -libmpi_c_mpi_la_SOURCES += \ - file_c2f.c \ - file_call_errhandler.c \ - file_close.c \ - file_create_errhandler.c \ - file_delete.c \ - file_f2c.c \ - file_get_amode.c \ - 
file_get_atomicity.c \ - file_get_byte_offset.c \ - file_get_errhandler.c \ - file_get_group.c \ - file_get_info.c \ - file_get_position.c \ - file_get_position_shared.c \ - file_get_size.c \ - file_get_type_extent.c \ - file_get_view.c \ - file_iread_at.c \ - file_iread_at_all.c \ - file_iread.c \ - file_iread_all.c \ - file_iread_shared.c \ - file_iwrite_at.c \ - file_iwrite_at_all.c \ - file_iwrite.c \ - file_iwrite_all.c \ - file_iwrite_shared.c \ - file_open.c \ - file_preallocate.c \ - file_read_all_begin.c \ - file_read_all.c \ - file_read_all_end.c \ - file_read_at_all_begin.c \ - file_read_at_all.c \ - file_read_at_all_end.c \ - file_read_at.c \ - file_read.c \ - file_read_ordered_begin.c \ - file_read_ordered.c \ - file_read_ordered_end.c \ - file_read_shared.c \ - file_seek.c \ - file_seek_shared.c \ - file_set_atomicity.c \ - file_set_errhandler.c \ - file_set_info.c \ - file_set_size.c \ - file_set_view.c \ - file_sync.c \ - file_write_all_begin.c \ - file_write_all.c \ - file_write_all_end.c \ - file_write_at_all_begin.c \ - file_write_at_all.c \ - file_write_at_all_end.c \ - file_write_at.c \ - file_write.c \ - file_write_ordered_begin.c \ - file_write_ordered.c \ - file_write_ordered_end.c \ - file_write_shared.c \ - register_datarep.c -endif - # Conditionally install the header files if WANT_INSTALL_HEADERS diff --git a/ompi/mpi/c/accumulate.c b/ompi/mpi/c/accumulate.c index 8d7370b4eef..74041ac8130 100644 --- a/ompi/mpi/c/accumulate.c +++ b/ompi/mpi/c/accumulate.c @@ -91,7 +91,7 @@ int MPI_Accumulate(const void *origin_addr, int origin_count, MPI_Datatype origi /* ACCUMULATE, unlike REDUCE, can use with derived datatypes with predefinied operations, with some - restrictions outlined in MPI-2:6.3.4. The derived + restrictions outlined in MPI-3:11.3.4. 
The derived datatype must be composed entierly from one predefined datatype (so you can do all the construction you want, but at the bottom, you can only use one datatype, say, diff --git a/ompi/mpi/c/add_error_class.c b/ompi/mpi/c/add_error_class.c index 74d1cdf9dd3..a0a2dd21ea6 100644 --- a/ompi/mpi/c/add_error_class.c +++ b/ompi/mpi/c/add_error_class.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -66,12 +66,12 @@ int MPI_Add_error_class(int *errorclass) ** in attribute/attribute.c and attribute/attribute_predefined.c ** why we have to call the fortran attr_set function */ - rc = ompi_attr_set_fortran_mpi1 (COMM_ATTR, - MPI_COMM_WORLD, - &MPI_COMM_WORLD->c_keyhash, - MPI_LASTUSEDCODE, - ompi_mpi_errcode_lastused, - true); + rc = ompi_attr_set_fint (COMM_ATTR, + MPI_COMM_WORLD, + &MPI_COMM_WORLD->c_keyhash, + MPI_LASTUSEDCODE, + ompi_mpi_errcode_lastused, + true); if ( MPI_SUCCESS != rc ) { return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, rc, FUNC_NAME); } diff --git a/ompi/mpi/c/add_error_code.c b/ompi/mpi/c/add_error_code.c index 9ec49541949..e5fd5669aee 100644 --- a/ompi/mpi/c/add_error_code.c +++ b/ompi/mpi/c/add_error_code.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006 University of Houston. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -73,12 +73,12 @@ int MPI_Add_error_code(int errorclass, int *errorcode) ** in attribute/attribute.c and attribute/attribute_predefined.c ** why we have to call the fortran attr_set function */ - rc = ompi_attr_set_fortran_mpi1 (COMM_ATTR, - MPI_COMM_WORLD, - &MPI_COMM_WORLD->c_keyhash, - MPI_LASTUSEDCODE, - ompi_mpi_errcode_lastused, - true); + rc = ompi_attr_set_fint (COMM_ATTR, + MPI_COMM_WORLD, + &MPI_COMM_WORLD->c_keyhash, + MPI_LASTUSEDCODE, + ompi_mpi_errcode_lastused, + true); if ( MPI_SUCCESS != rc ) { return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, rc, FUNC_NAME); } diff --git a/ompi/mpi/c/bindings.h b/ompi/mpi/c/bindings.h index 12e29cbfbd2..46239d5a868 100644 --- a/ompi/mpi/c/bindings.h +++ b/ompi/mpi/c/bindings.h @@ -10,6 +10,8 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -71,6 +73,17 @@ BEGIN_C_DECLS else if( !opal_datatype_is_valid(&((DDT)->super)) ) (RC) = MPI_ERR_TYPE; \ } while(0) +#define OMPI_CHECK_DATATYPE_FOR_VIEW( RC, DDT, COUNT ) \ + do { \ + /* (RC) = MPI_SUCCESS; */ \ + if( NULL == (DDT) || MPI_DATATYPE_NULL == (DDT) ) (RC) = MPI_ERR_TYPE; \ + else if( (COUNT) < 0 ) (RC) = MPI_ERR_COUNT; \ + else if( !opal_datatype_is_committed(&((DDT)->super)) ) (RC) = MPI_ERR_TYPE; \ + /* XXX Fix flags else if( ompi_datatype_is_overlapped((DDT)) ) (RC) = MPI_ERR_TYPE; */ \ + else if( !opal_datatype_is_valid(&((DDT)->super)) ) (RC) = MPI_ERR_TYPE; \ + else if( !ompi_datatype_is_monotonic((DDT)) ) (RC) = MPI_ERR_TYPE; \ + } while (0) + /* This macro has to be used to check the correctness of the user buffer depending on the datatype. * This macro expects that the DDT parameter is a valid pointer to an ompi datatype object. 
diff --git a/ompi/mpi/c/comm_dup_with_info.c b/ompi/mpi/c/comm_dup_with_info.c index 81d73286133..4f8269c31d7 100644 --- a/ompi/mpi/c/comm_dup_with_info.c +++ b/ompi/mpi/c/comm_dup_with_info.c @@ -16,6 +16,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -67,7 +68,7 @@ int MPI_Comm_dup_with_info(MPI_Comm comm, MPI_Info info, MPI_Comm *newcomm) OPAL_CR_ENTER_LIBRARY(); - rc = ompi_comm_dup_with_info (comm, info, newcomm); + rc = ompi_comm_dup_with_info (comm, &info->super, newcomm); OMPI_ERRHANDLER_RETURN(rc, comm, rc, FUNC_NAME); } diff --git a/ompi/mpi/c/comm_get_errhandler.c b/ompi/mpi/c/comm_get_errhandler.c index 85e52c1243e..baf37496a3e 100644 --- a/ompi/mpi/c/comm_get_errhandler.c +++ b/ompi/mpi/c/comm_get_errhandler.c @@ -11,9 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2009 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -43,8 +43,6 @@ static const char FUNC_NAME[] = "MPI_Comm_get_errhandler"; int MPI_Comm_get_errhandler(MPI_Comm comm, MPI_Errhandler *errhandler) { - MPI_Errhandler tmp; - /* Error checking */ MEMCHECKER( memchecker_comm(comm); @@ -52,31 +50,27 @@ int MPI_Comm_get_errhandler(MPI_Comm comm, MPI_Errhandler *errhandler) OPAL_CR_NOOP_PROGRESS(); - /* Error checking */ + /* Error checking */ - if (MPI_PARAM_CHECK) { - OMPI_ERR_INIT_FINALIZE(FUNC_NAME); - if (ompi_comm_invalid(comm)) { - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_COMM, - FUNC_NAME); - } else if (NULL == errhandler) { - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_ARG, - FUNC_NAME); + if (MPI_PARAM_CHECK) { + OMPI_ERR_INIT_FINALIZE(FUNC_NAME); + if (ompi_comm_invalid(comm)) { + return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_COMM, + FUNC_NAME); + } else if (NULL == errhandler) { + return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_ARG, + FUNC_NAME); + } } - } - - /* On 64 bits environments we have to make sure the reading of the - error_handler became atomic. */ - do { - tmp = comm->error_handler; - } while (!OPAL_ATOMIC_CMPSET_PTR(&(comm->error_handler), tmp, tmp)); - /* Retain the errhandler, corresponding to object refcount decrease - in errhandler_free.c. */ - *errhandler = tmp; - OBJ_RETAIN(tmp); + OPAL_THREAD_LOCK(&(comm->c_lock)); + /* Retain the errhandler, corresponding to object refcount decrease + in errhandler_free.c. */ + OBJ_RETAIN(comm->error_handler); + *errhandler = comm->error_handler; + OPAL_THREAD_UNLOCK(&(comm->c_lock)); - /* All done */ + /* All done */ - return MPI_SUCCESS; + return MPI_SUCCESS; } diff --git a/ompi/mpi/c/comm_get_info.c b/ompi/mpi/c/comm_get_info.c index 10f864c6de2..403f54fdc3b 100644 --- a/ompi/mpi/c/comm_get_info.c +++ b/ompi/mpi/c/comm_get_info.c @@ -3,6 +3,7 @@ * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). 
All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -46,13 +47,22 @@ int MPI_Comm_get_info(MPI_Comm comm, MPI_Info *info_used) } } - /* At the moment, we do not support any communicator hints. So - just return a new, empty info obect handle. */ + if (NULL == comm->super.s_info) { +/* + * Setup any defaults if MPI_Win_set_info was never called + */ + opal_infosubscribe_change_info(&comm->super, &MPI_INFO_NULL->super); + } + + (*info_used) = OBJ_NEW(ompi_info_t); if (NULL == (*info_used)) { - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_NO_MEM, + return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_NO_MEM, FUNC_NAME); } + opal_info_t *opal_info_used = &(*info_used)->super; + + opal_info_dup_mpistandard(comm->super.s_info, &opal_info_used); return MPI_SUCCESS; } diff --git a/ompi/mpi/c/comm_get_name.c b/ompi/mpi/c/comm_get_name.c index 6d7bd3ae64f..9225456445c 100644 --- a/ompi/mpi/c/comm_get_name.c +++ b/ompi/mpi/c/comm_get_name.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -60,9 +60,7 @@ int MPI_Comm_get_name(MPI_Comm comm, char *name, int *length) return OMPI_ERRHANDLER_INVOKE ( comm, MPI_ERR_ARG, FUNC_NAME); } -#ifdef USE_MUTEX_FOR_COMMS OPAL_THREAD_LOCK(&(comm->c_lock)); -#endif /* Note that MPI-2.1 requires: - terminating the string with a \0 - name[*resultlen] == '\0' @@ -80,9 +78,7 @@ int MPI_Comm_get_name(MPI_Comm comm, char *name, int *length) name[0] = '\0'; *length = 0; } -#ifdef USE_MUTEX_FOR_COMMS OPAL_THREAD_UNLOCK(&(comm->c_lock)); -#endif return MPI_SUCCESS; } diff --git a/ompi/mpi/c/comm_join.c b/ompi/mpi/c/comm_join.c index bc7a5635b4d..b6fc1e7f93b 100644 --- a/ompi/mpi/c/comm_join.c +++ b/ompi/mpi/c/comm_join.c @@ -13,7 +13,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2015-2018 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -110,9 +110,6 @@ int MPI_Comm_join(int fd, MPI_Comm *intercomm) send_first = true; } - /* ensure the port name is NULL terminated */ - memset(port_name, 0, MPI_MAX_PORT_NAME); - /* Assumption: socket_send should not block, even if the socket is not configured to be non-blocking, because the message length are so short. */ @@ -120,16 +117,21 @@ int MPI_Comm_join(int fd, MPI_Comm *intercomm) /* we will only use the send_first proc's port name, * so pass it to the recv_first participant */ if (send_first) { - /* open a port */ + // The port_name that we get back will be \0-terminated. The + // strlen+\0 will be <= MPI_MAX_PORT_NAME characters. if (OMPI_SUCCESS != (rc = ompi_dpm_open_port(port_name))) { goto error; } + // Send the strlen+1 so that we both send the \0 and the + // receiver receives the \0. 
llen = (uint32_t)(strlen(port_name)+1); len = htonl(llen); ompi_socket_send( fd, (char *) &len, sizeof(uint32_t)); ompi_socket_send (fd, port_name, llen); } else { ompi_socket_recv (fd, (char *) &rlen, sizeof(uint32_t)); + // The lrlen that we receive will be the strlen+1 (to account + // for \0), and will be <= MPI_MAX_PORT_NAME. lrlen = ntohl(rlen); ompi_socket_recv (fd, port_name, lrlen); } diff --git a/ompi/mpi/c/comm_set_errhandler.c b/ompi/mpi/c/comm_set_errhandler.c index 8091495ea58..767a92b1a45 100644 --- a/ompi/mpi/c/comm_set_errhandler.c +++ b/ompi/mpi/c/comm_set_errhandler.c @@ -10,9 +10,9 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -69,11 +69,12 @@ int MPI_Comm_set_errhandler(MPI_Comm comm, MPI_Errhandler errhandler) /* Prepare the new error handler */ OBJ_RETAIN(errhandler); - /* Ditch the old errhandler, and decrement its refcount. On 64 - bits environments we have to make sure the reading of the - error_handler became atomic. */ - tmp = OPAL_ATOMIC_SWAP_PTR(&comm->error_handler, errhandler); + OPAL_THREAD_LOCK(&(comm->c_lock)); + /* Ditch the old errhandler, and decrement its refcount. */ + tmp = comm->error_handler; + comm->error_handler = errhandler; OBJ_RELEASE(tmp); + OPAL_THREAD_UNLOCK(&(comm->c_lock)); /* All done */ return MPI_SUCCESS; diff --git a/ompi/mpi/c/comm_set_info.c b/ompi/mpi/c/comm_set_info.c index bae5c9f6977..cca48a67f21 100644 --- a/ompi/mpi/c/comm_set_info.c +++ b/ompi/mpi/c/comm_set_info.c @@ -3,6 +3,7 @@ * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. 
* Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -16,7 +17,7 @@ #include "ompi/runtime/params.h" #include "ompi/communicator/communicator.h" #include "ompi/errhandler/errhandler.h" -#include "ompi/info/info.h" +#include "opal/util/info_subscriber.h" #include #include @@ -47,7 +48,9 @@ int MPI_Comm_set_info(MPI_Comm comm, MPI_Info info) } } - /* At the moment, we do not support any communicator hints. - So... do nothing */ + OPAL_CR_ENTER_LIBRARY(); + + opal_infosubscribe_change_info(&(comm->super), &(info->super)); + return MPI_SUCCESS; } diff --git a/ompi/mpi/c/comm_split_type.c b/ompi/mpi/c/comm_split_type.c index 7bce9ad890c..535c3897652 100644 --- a/ompi/mpi/c/comm_split_type.c +++ b/ompi/mpi/c/comm_split_type.c @@ -13,6 +13,7 @@ * Copyright (c) 2012 Sandia National Laboratories. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -92,7 +93,7 @@ int MPI_Comm_split_type(MPI_Comm comm, int split_type, int key, *newcomm = MPI_COMM_NULL; rc = MPI_SUCCESS; } else { - rc = ompi_comm_split_type( (ompi_communicator_t*)comm, split_type, key, info, + rc = ompi_comm_split_type( (ompi_communicator_t*)comm, split_type, key, &(info->super), (ompi_communicator_t**)newcomm); } OMPI_ERRHANDLER_RETURN ( rc, comm, rc, FUNC_NAME); diff --git a/ompi/mpi/c/dist_graph_create.c b/ompi/mpi/c/dist_graph_create.c index efb3eb1857f..2c07676a64a 100644 --- a/ompi/mpi/c/dist_graph_create.c +++ b/ompi/mpi/c/dist_graph_create.c @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2012-2013 The University of Tennessee and The University + * Copyright (c) 2012-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2012-2013 Inria. All rights reserved. @@ -8,6 +8,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -61,7 +62,7 @@ int MPI_Dist_graph_create(MPI_Comm comm_old, int n, const int sources[], /* Ensure the arrays are full of valid-valued integers */ comm_size = ompi_comm_size(comm_old); for( i = index = 0; i < n; ++i ) { - if (sources[i] < 0 || sources[i] >= comm_size) { + if (((sources[i] < 0) && (sources[i] != MPI_PROC_NULL)) || sources[i] >= comm_size) { return OMPI_ERRHANDLER_INVOKE(comm_old, MPI_ERR_ARG, FUNC_NAME); } else if (degrees[i] < 0) { @@ -69,7 +70,7 @@ int MPI_Dist_graph_create(MPI_Comm comm_old, int n, const int sources[], FUNC_NAME); } for( j = 0; j < degrees[i]; ++j ) { - if (destinations[index] < 0 || destinations[index] >= comm_size) { + if (((destinations[index] < 0) && (destinations[index] != MPI_PROC_NULL)) || destinations[index] >= comm_size) { return OMPI_ERRHANDLER_INVOKE(comm_old, MPI_ERR_ARG, FUNC_NAME); } else if (MPI_UNWEIGHTED != weights && weights[index] < 0) { @@ -88,7 +89,7 @@ int MPI_Dist_graph_create(MPI_Comm comm_old, int n, const int sources[], } err = topo->topo.dist_graph.dist_graph_create(topo, comm_old, n, sources, degrees, - destinations, weights, info, + destinations, weights, &(info->super), reorder, newcomm); OMPI_ERRHANDLER_RETURN(err, comm_old, err, FUNC_NAME); } diff --git a/ompi/mpi/c/dist_graph_create_adjacent.c b/ompi/mpi/c/dist_graph_create_adjacent.c index bf2f2cfa979..3c0f5b95c63 100644 --- a/ompi/mpi/c/dist_graph_create_adjacent.c +++ b/ompi/mpi/c/dist_graph_create_adjacent.c @@ -3,7 +3,7 @@ * Copyright (c) 2008 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2011-2013 The University of Tennessee and The University + * Copyright (c) 2011-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. @@ -12,6 +12,7 @@ * Copyright (c) 2012-2013 Inria. 
All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -73,7 +74,7 @@ int MPI_Dist_graph_create_adjacent(MPI_Comm comm_old, } comm_size = ompi_comm_size(comm_old); for (i = 0; i < indegree; ++i) { - if (sources[i] < 0 || sources[i] >= comm_size) { + if (((sources[i] < 0) && (sources[i] != MPI_PROC_NULL)) || sources[i] >= comm_size) { return OMPI_ERRHANDLER_INVOKE(comm_old, MPI_ERR_ARG, "MPI_Dist_graph_create_adjacent invalid sources"); } else if (MPI_UNWEIGHTED != sourceweights && sourceweights[i] < 0) { @@ -82,7 +83,7 @@ int MPI_Dist_graph_create_adjacent(MPI_Comm comm_old, } } for (i = 0; i < outdegree; ++i) { - if (destinations[i] < 0 || destinations[i] >= comm_size) { + if (((destinations[i] < 0) && (destinations[i] != MPI_PROC_NULL)) || destinations[i] >= comm_size) { return OMPI_ERRHANDLER_INVOKE(comm_old, MPI_ERR_ARG, "MPI_Dist_graph_create_adjacent invalid destinations"); } else if (MPI_UNWEIGHTED != destweights && destweights[i] < 0) { @@ -100,7 +101,7 @@ int MPI_Dist_graph_create_adjacent(MPI_Comm comm_old, err = topo->topo.dist_graph.dist_graph_create_adjacent(topo, comm_old, indegree, sources, sourceweights, outdegree, - destinations, destweights, info, + destinations, destweights, &(info->super), reorder, comm_dist_graph); OMPI_ERRHANDLER_RETURN(err, comm_old, err, FUNC_NAME); } diff --git a/ompi/mpi/c/file_delete.c b/ompi/mpi/c/file_delete.c index cad11c4c35a..652b6843284 100644 --- a/ompi/mpi/c/file_delete.c +++ b/ompi/mpi/c/file_delete.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -78,6 +79,6 @@ int MPI_File_delete(const char *filename, MPI_Info info) /* Since there is no MPI_File handle associated with this function, the MCA has to do a selection and perform the action */ - rc = mca_io_base_delete(filename, info); + rc = mca_io_base_delete(filename, &(info->super)); OMPI_ERRHANDLER_RETURN(rc, MPI_FILE_NULL, rc, FUNC_NAME); } diff --git a/ompi/mpi/c/file_get_errhandler.c b/ompi/mpi/c/file_get_errhandler.c index 6ad02a8003e..e0abadd4a56 100644 --- a/ompi/mpi/c/file_get_errhandler.c +++ b/ompi/mpi/c/file_get_errhandler.c @@ -11,9 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -42,8 +42,6 @@ static const char FUNC_NAME[] = "MPI_File_get_errhandler"; int MPI_File_get_errhandler( MPI_File file, MPI_Errhandler *errhandler) { - MPI_Errhandler tmp; - OPAL_CR_NOOP_PROGRESS(); /* Error checking */ @@ -64,16 +62,12 @@ int MPI_File_get_errhandler( MPI_File file, MPI_Errhandler *errhandler) } } - /* On 64 bits environments we have to make sure the reading of the - error_handler became atomic. */ - do { - tmp = file->error_handler; - } while (!OPAL_ATOMIC_CMPSET_PTR(&(file->error_handler), tmp, tmp)); - + OPAL_THREAD_LOCK(&file->f_lock); /* Retain the errhandler, corresponding to object refcount decrease in errhandler_free.c. 
*/ - *errhandler = tmp; - OBJ_RETAIN(tmp); + *errhandler = file->error_handler; + OBJ_RETAIN(file->error_handler); + OPAL_THREAD_UNLOCK(&file->f_lock); /* All done */ diff --git a/ompi/mpi/c/file_get_info.c b/ompi/mpi/c/file_get_info.c index 51b67a41896..976cbdbed1b 100644 --- a/ompi/mpi/c/file_get_info.c +++ b/ompi/mpi/c/file_get_info.c @@ -12,6 +12,7 @@ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -24,6 +25,7 @@ #include "ompi/mpi/c/bindings.h" #include "ompi/runtime/params.h" #include "ompi/errhandler/errhandler.h" +#include "ompi/communicator/communicator.h" #include "ompi/file/file.h" #if OMPI_BUILD_MPI_PROFILING @@ -38,36 +40,34 @@ static const char FUNC_NAME[] = "MPI_File_get_info"; int MPI_File_get_info(MPI_File fh, MPI_Info *info_used) { - int rc; + OPAL_CR_NOOP_PROGRESS(); if (MPI_PARAM_CHECK) { - rc = MPI_SUCCESS; OMPI_ERR_INIT_FINALIZE(FUNC_NAME); + if (NULL == info_used) { + return OMPI_ERRHANDLER_INVOKE(fh, MPI_ERR_INFO, FUNC_NAME); + } if (ompi_file_invalid(fh)) { - rc = MPI_ERR_FILE; - fh = MPI_FILE_NULL; - } else if (NULL == info_used) { - rc = MPI_ERR_ARG; + return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_COMM, + FUNC_NAME); } - OMPI_ERRHANDLER_CHECK(rc, fh, rc, FUNC_NAME); } - OPAL_CR_ENTER_LIBRARY(); - - /* Call the back-end io component function */ + if (NULL == fh->super.s_info) { +/* + * Setup any defaults if MPI_Win_set_info was never called + */ + opal_infosubscribe_change_info(&fh->super, &MPI_INFO_NULL->super); + } - switch (fh->f_io_version) { - case MCA_IO_BASE_V_2_0_0: - rc = fh->f_io_selected_module.v2_0_0. 
- io_module_file_get_info(fh, info_used); - break; - default: - rc = MPI_ERR_INTERN; - break; + (*info_used) = OBJ_NEW(ompi_info_t); + if (NULL == (*info_used)) { + return OMPI_ERRHANDLER_INVOKE(fh, MPI_ERR_NO_MEM, FUNC_NAME); } + opal_info_t *opal_info_used = &(*info_used)->super; - /* All done */ + opal_info_dup_mpistandard(fh->super.s_info, &opal_info_used); - OMPI_ERRHANDLER_RETURN(rc, fh, rc, FUNC_NAME); + return OMPI_SUCCESS; } diff --git a/ompi/mpi/c/file_iread_all.c b/ompi/mpi/c/file_iread_all.c index 46e2c90ff36..9ea72d0b957 100644 --- a/ompi/mpi/c/file_iread_all.c +++ b/ompi/mpi/c/file_iread_all.c @@ -13,6 +13,7 @@ * Copyright (c) 2015 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -71,8 +72,13 @@ int MPI_File_iread_all(MPI_File fh, void *buf, int count, /* Call the back-end io component function */ switch (fh->f_io_version) { case MCA_IO_BASE_V_2_0_0: - rc = fh->f_io_selected_module.v2_0_0. - io_module_file_iread_all(fh, buf, count, datatype, request); + if( OPAL_UNLIKELY(NULL == fh->f_io_selected_module.v2_0_0.io_module_file_iread_all) ) { + rc = MPI_ERR_UNSUPPORTED_OPERATION; + } + else { + rc = fh->f_io_selected_module.v2_0_0. + io_module_file_iread_all(fh, buf, count, datatype, request); + } break; default: diff --git a/ompi/mpi/c/file_iread_at_all.c b/ompi/mpi/c/file_iread_at_all.c index a8da5702dab..93f646f69d2 100644 --- a/ompi/mpi/c/file_iread_at_all.c +++ b/ompi/mpi/c/file_iread_at_all.c @@ -13,6 +13,7 @@ * Copyright (c) 2015 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -71,9 +72,14 @@ int MPI_File_iread_at_all(MPI_File fh, MPI_Offset offset, void *buf, /* Call the back-end io component function */ switch (fh->f_io_version) { case MCA_IO_BASE_V_2_0_0: - rc = fh->f_io_selected_module.v2_0_0. - io_module_file_iread_at_all(fh, offset, buf, count, datatype, - request); + if( OPAL_UNLIKELY(NULL == fh->f_io_selected_module.v2_0_0.io_module_file_iread_at_all) ) { + rc = MPI_ERR_UNSUPPORTED_OPERATION; + } + else { + rc = fh->f_io_selected_module.v2_0_0. + io_module_file_iread_at_all(fh, offset, buf, count, datatype, + request); + } break; default: diff --git a/ompi/mpi/c/file_iwrite_all.c b/ompi/mpi/c/file_iwrite_all.c index fc9f013ff86..d48d5af457b 100644 --- a/ompi/mpi/c/file_iwrite_all.c +++ b/ompi/mpi/c/file_iwrite_all.c @@ -16,6 +16,7 @@ * Copyright (c) 2015 University of Houston. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -75,8 +76,13 @@ int MPI_File_iwrite_all(MPI_File fh, const void *buf, int count, MPI_Datatype /* Call the back-end io component function */ switch (fh->f_io_version) { case MCA_IO_BASE_V_2_0_0: - rc = fh->f_io_selected_module.v2_0_0. - io_module_file_iwrite_all(fh, buf, count, datatype, request); + if( OPAL_UNLIKELY(NULL == fh->f_io_selected_module.v2_0_0.io_module_file_iwrite_all) ) { + rc = MPI_ERR_UNSUPPORTED_OPERATION; + } + else { + rc = fh->f_io_selected_module.v2_0_0. + io_module_file_iwrite_all(fh, buf, count, datatype, request); + } break; default: diff --git a/ompi/mpi/c/file_iwrite_at_all.c b/ompi/mpi/c/file_iwrite_at_all.c index f2d01983538..017ba96dde5 100644 --- a/ompi/mpi/c/file_iwrite_at_all.c +++ b/ompi/mpi/c/file_iwrite_at_all.c @@ -16,6 +16,7 @@ * Copyright (c) 2015 University of Houston. All rights reserved. 
* Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -76,9 +77,14 @@ int MPI_File_iwrite_at_all(MPI_File fh, MPI_Offset offset, const void *buf, /* Call the back-end io component function */ switch (fh->f_io_version) { case MCA_IO_BASE_V_2_0_0: - rc = fh->f_io_selected_module.v2_0_0. - io_module_file_iwrite_at_all(fh, offset, buf, count, datatype, - request); + if( OPAL_UNLIKELY(NULL == fh->f_io_selected_module.v2_0_0.io_module_file_iwrite_at_all) ) { + rc = MPI_ERR_UNSUPPORTED_OPERATION; + } + else { + rc = fh->f_io_selected_module.v2_0_0. + io_module_file_iwrite_at_all(fh, offset, buf, count, datatype, + request); + } break; default: diff --git a/ompi/mpi/c/file_open.c b/ompi/mpi/c/file_open.c index 74d63e16a95..13f003dad23 100644 --- a/ompi/mpi/c/file_open.c +++ b/ompi/mpi/c/file_open.c @@ -16,6 +16,7 @@ * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 University of Houston. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -105,7 +106,7 @@ int MPI_File_open(MPI_Comm comm, const char *filename, int amode, /* Create an empty MPI_File handle */ *fh = MPI_FILE_NULL; - rc = ompi_file_open(comm, filename, amode, info, fh); + rc = ompi_file_open(comm, filename, amode, &(info->super), fh); /* Creating the file handle also selects a component to use, creates a module, and calls file_open() on the module. So diff --git a/ompi/mpi/c/file_set_errhandler.c b/ompi/mpi/c/file_set_errhandler.c index ebd4c9b132d..20b9c824a83 100644 --- a/ompi/mpi/c/file_set_errhandler.c +++ b/ompi/mpi/c/file_set_errhandler.c @@ -11,9 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
* Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -68,11 +68,12 @@ int MPI_File_set_errhandler( MPI_File file, MPI_Errhandler errhandler) /* Prepare the new error handler */ OBJ_RETAIN(errhandler); - /* Ditch the old errhandler, and decrement its refcount. On 64 - bits environments we have to make sure the reading of the - error_handler became atomic. */ - tmp = OPAL_ATOMIC_SWAP_PTR (&file->error_handler, errhandler); + OPAL_THREAD_LOCK(&file->f_lock); + /* Ditch the old errhandler, and decrement its refcount. */ + tmp = file->error_handler; + file->error_handler = errhandler; OBJ_RELEASE(tmp); + OPAL_THREAD_UNLOCK(&file->f_lock); /* All done */ return MPI_SUCCESS; diff --git a/ompi/mpi/c/file_set_info.c b/ompi/mpi/c/file_set_info.c index a6a01d5dad6..ff56aa70c75 100644 --- a/ompi/mpi/c/file_set_info.c +++ b/ompi/mpi/c/file_set_info.c @@ -12,6 +12,7 @@ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -25,6 +26,8 @@ #include "ompi/runtime/params.h" #include "ompi/errhandler/errhandler.h" #include "ompi/info/info.h" +#include "ompi/communicator/communicator.h" +#include "opal/util/info_subscriber.h" #include "ompi/file/file.h" #if OMPI_BUILD_MPI_PROFILING @@ -39,34 +42,27 @@ static const char FUNC_NAME[] = "MPI_File_set_info"; int MPI_File_set_info(MPI_File fh, MPI_Info info) { - int rc; + int ret; + + OPAL_CR_NOOP_PROGRESS(); if (MPI_PARAM_CHECK) { - rc = MPI_SUCCESS; OMPI_ERR_INIT_FINALIZE(FUNC_NAME); + if (ompi_file_invalid(fh)) { - fh = MPI_FILE_NULL; - rc = MPI_ERR_FILE; + return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_FILE, FUNC_NAME); + } + + if (NULL == info || MPI_INFO_NULL == info || + ompi_info_is_freed(info)) { + return OMPI_ERRHANDLER_INVOKE(fh, MPI_ERR_INFO, + FUNC_NAME); } - OMPI_ERRHANDLER_CHECK(rc, fh, rc, FUNC_NAME); } OPAL_CR_ENTER_LIBRARY(); - /* Call the back-end io component function */ - - switch (fh->f_io_version) { - case MCA_IO_BASE_V_2_0_0: - rc = fh->f_io_selected_module.v2_0_0. - io_module_file_set_info(fh, info); - break; - - default: - rc = MPI_ERR_INTERN; - break; - } - - /* All done */ + ret = opal_infosubscribe_change_info(&fh->super, &info->super); - OMPI_ERRHANDLER_RETURN(rc, fh, rc, FUNC_NAME); + OMPI_ERRHANDLER_RETURN(ret, fh, ret, FUNC_NAME); } diff --git a/ompi/mpi/c/file_set_view.c b/ompi/mpi/c/file_set_view.c index 5200418c686..a49a80f29aa 100644 --- a/ompi/mpi/c/file_set_view.c +++ b/ompi/mpi/c/file_set_view.c @@ -13,8 +13,9 @@ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -58,9 +59,9 @@ int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype, rc = MPI_ERR_FILE; fh = MPI_FILE_NULL; } else { - OMPI_CHECK_DATATYPE_FOR_RECV(rc, etype, 0); + OMPI_CHECK_DATATYPE_FOR_VIEW(rc, etype, 0); if (MPI_SUCCESS == rc) { - OMPI_CHECK_DATATYPE_FOR_RECV(rc, filetype, 0); + OMPI_CHECK_DATATYPE_FOR_VIEW(rc, filetype, 0); } } OMPI_ERRHANDLER_CHECK(rc, fh, rc, FUNC_NAME); @@ -73,7 +74,7 @@ int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype, switch (fh->f_io_version) { case MCA_IO_BASE_V_2_0_0: rc = fh->f_io_selected_module.v2_0_0. - io_module_file_set_view(fh, disp, etype, filetype, datarep, info); + io_module_file_set_view(fh, disp, etype, filetype, datarep, &(info->super)); break; default: diff --git a/ompi/mpi/c/get_accumulate.c b/ompi/mpi/c/get_accumulate.c index 3985a117998..d510bb253a1 100644 --- a/ompi/mpi/c/get_accumulate.c +++ b/ompi/mpi/c/get_accumulate.c @@ -98,7 +98,7 @@ int MPI_Get_accumulate(const void *origin_addr, int origin_count, MPI_Datatype o /* GET_ACCUMULATE, unlike REDUCE, can use with derived datatypes with predefinied operations, with some - restrictions outlined in MPI-2:6.3.4. The derived + restrictions outlined in MPI-3:11.3.4. The derived datatype must be composed entierly from one predefined datatype (so you can do all the construction you want, but at the bottom, you can only use one datatype, say, diff --git a/ompi/mpi/c/ineighbor_allgather.c b/ompi/mpi/c/ineighbor_allgather.c index 9454d7d1e34..527c9d449a9 100644 --- a/ompi/mpi/c/ineighbor_allgather.c +++ b/ompi/mpi/c/ineighbor_allgather.c @@ -14,7 +14,7 @@ * Copyright (c) 2012 Oak Rigde National Laboratory. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). 
All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -62,11 +62,7 @@ int MPI_Ineighbor_allgather(const void *sendbuf, int sendcount, MPI_Datatype sen memchecker_datatype(recvtype); memchecker_comm(comm); /* check whether the actual send buffer is defined. */ - if (MPI_IN_PLACE == sendbuf) { - memchecker_call(&opal_memchecker_base_isdefined, - (char *)(recvbuf)+rank*ext, - recvcount, recvtype); - } else { + if (MPI_IN_PLACE != sendbuf) { memchecker_datatype(sendtype); memchecker_call(&opal_memchecker_base_isdefined, sendbuf, sendcount, sendtype); } diff --git a/ompi/mpi/c/ineighbor_allgatherv.c b/ompi/mpi/c/ineighbor_allgatherv.c index 464c645b094..60d7be7b10e 100644 --- a/ompi/mpi/c/ineighbor_allgatherv.c +++ b/ompi/mpi/c/ineighbor_allgatherv.c @@ -14,7 +14,7 @@ * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -70,11 +70,7 @@ int MPI_Ineighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype se } /* check whether the actual send buffer is defined. */ - if (MPI_IN_PLACE == sendbuf) { - memchecker_call(&opal_memchecker_base_isdefined, - (char *)(recvbuf)+displs[rank]*ext, - recvcounts[rank], recvtype); - } else { + if (MPI_IN_PLACE != sendbuf) { memchecker_datatype(sendtype); memchecker_call(&opal_memchecker_base_isdefined, sendbuf, sendcount, sendtype); } diff --git a/ompi/mpi/c/ineighbor_alltoallv.c b/ompi/mpi/c/ineighbor_alltoallv.c index 728e9bfebce..3f30bd42a0a 100644 --- a/ompi/mpi/c/ineighbor_alltoallv.c +++ b/ompi/mpi/c/ineighbor_alltoallv.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. 
* All rights reserved. * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. @@ -29,7 +29,6 @@ #include "ompi/mpi/c/bindings.h" #include "ompi/runtime/params.h" #include "ompi/communicator/communicator.h" -#include "ompi/communicator/comm_helpers.h" #include "ompi/errhandler/errhandler.h" #include "ompi/datatype/ompi_datatype.h" #include "ompi/memchecker.h" @@ -52,7 +51,7 @@ int MPI_Ineighbor_alltoallv(const void *sendbuf, const int sendcounts[], const i MPI_Request *request) { int i, err; - int indegree, outdegree, weighted; + int indegree, outdegree; MEMCHECKER( ptrdiff_t recv_ext; @@ -68,7 +67,7 @@ int MPI_Ineighbor_alltoallv(const void *sendbuf, const int sendcounts[], const i memchecker_datatype(recvtype); ompi_datatype_type_extent(sendtype, &send_ext); - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); if (MPI_SUCCESS == err) { if (MPI_IN_PLACE != sendbuf) { for ( i = 0; i < outdegree; i++ ) { @@ -105,7 +104,7 @@ int MPI_Ineighbor_alltoallv(const void *sendbuf, const int sendcounts[], const i return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME); } - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); OMPI_ERRHANDLER_CHECK(err, comm, err, FUNC_NAME); for (i = 0; i < outdegree; ++i) { OMPI_CHECK_DATATYPE_FOR_SEND(err, sendtype, sendcounts[i]); diff --git a/ompi/mpi/c/ineighbor_alltoallw.c b/ompi/mpi/c/ineighbor_alltoallw.c index a13115d1627..4601d5bc598 100644 --- a/ompi/mpi/c/ineighbor_alltoallw.c +++ b/ompi/mpi/c/ineighbor_alltoallw.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of 
the University of California. * All rights reserved. * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. @@ -29,7 +29,6 @@ #include "ompi/mpi/c/bindings.h" #include "ompi/runtime/params.h" #include "ompi/communicator/communicator.h" -#include "ompi/communicator/comm_helpers.h" #include "ompi/errhandler/errhandler.h" #include "ompi/datatype/ompi_datatype.h" #include "ompi/memchecker.h" @@ -52,7 +51,7 @@ int MPI_Ineighbor_alltoallw(const void *sendbuf, const int sendcounts[], const M MPI_Request *request) { int i, err; - int indegree, outdegree, weighted; + int indegree, outdegree; MEMCHECKER( ptrdiff_t recv_ext; @@ -60,7 +59,7 @@ int MPI_Ineighbor_alltoallw(const void *sendbuf, const int sendcounts[], const M memchecker_comm(comm); - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); if (MPI_SUCCESS == err) { if (MPI_IN_PLACE != sendbuf) { for ( i = 0; i < outdegree; i++ ) { @@ -105,7 +104,7 @@ int MPI_Ineighbor_alltoallw(const void *sendbuf, const int sendcounts[], const M return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME); } - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); OMPI_ERRHANDLER_CHECK(err, comm, err, FUNC_NAME); for (i = 0; i < outdegree; ++i) { OMPI_CHECK_DATATYPE_FOR_SEND(err, sendtypes[i], sendcounts[i]); diff --git a/ompi/mpi/c/info_delete.c b/ompi/mpi/c/info_delete.c index dc246ea3288..07305427f1e 100644 --- a/ompi/mpi/c/info_delete.c +++ b/ompi/mpi/c/info_delete.c @@ -14,6 +14,7 @@ * reserved. 
* Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -78,5 +79,14 @@ int MPI_Info_delete(MPI_Info info, const char *key) { OPAL_CR_ENTER_LIBRARY(); err = ompi_info_delete (info, key); + + // Note that ompi_info_delete() (i.e., opal_info_delete()) will + // return OPAL_ERR_NOT_FOUND if there was no corresponding key to + // delete. Per MPI-3.1, we need to convert that to + // MPI_ERR_INFO_NOKEY. + if (OPAL_ERR_NOT_FOUND == err) { + err = MPI_ERR_INFO_NOKEY; + } + OMPI_ERRHANDLER_RETURN(err, MPI_COMM_WORLD, err, FUNC_NAME); } diff --git a/ompi/mpi/c/info_get.c b/ompi/mpi/c/info_get.c index e7185975e0a..cbc2d127f00 100644 --- a/ompi/mpi/c/info_get.c +++ b/ompi/mpi/c/info_get.c @@ -14,6 +14,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -100,6 +101,6 @@ int MPI_Info_get(MPI_Info info, const char *key, int valuelen, OPAL_CR_ENTER_LIBRARY(); - err = ompi_info_get (info, key, valuelen, value, flag); + err = ompi_info_get(info, key, valuelen, value, flag); OMPI_ERRHANDLER_RETURN(err, MPI_COMM_WORLD, err, FUNC_NAME); } diff --git a/ompi/mpi/c/intercomm_create.c b/ompi/mpi/c/intercomm_create.c index b899daa4c19..e8405242dc6 100644 --- a/ompi/mpi/c/intercomm_create.c +++ b/ompi/mpi/c/intercomm_create.c @@ -190,10 +190,6 @@ int MPI_Intercomm_create(MPI_Comm local_comm, int local_leader, new_group_pointer /* remote group */ ); - if ( NULL == newcomp ) { - rc = MPI_ERR_INTERN; - goto err_exit; - } if ( MPI_SUCCESS != rc ) { goto err_exit; } diff --git a/ompi/mpi/c/intercomm_merge.c b/ompi/mpi/c/intercomm_merge.c index ed6ff727cee..12107764ce3 100644 --- a/ompi/mpi/c/intercomm_merge.c +++ b/ompi/mpi/c/intercomm_merge.c @@ -109,10 +109,6 @@ int MPI_Intercomm_merge(MPI_Comm intercomm, int high, new_group_pointer, /* local group */ NULL /* remote group */ ); - if ( NULL == newcomp ) { - rc = MPI_ERR_INTERN; - goto exit; - } if ( MPI_SUCCESS != rc ) { goto exit; } diff --git a/ompi/mpi/c/isend.c b/ompi/mpi/c/isend.c index 0723b1b0266..5e56deed67e 100644 --- a/ompi/mpi/c/isend.c +++ b/ompi/mpi/c/isend.c @@ -13,7 +13,7 @@ * Copyright (c) 2006-2007 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -82,9 +82,13 @@ int MPI_Isend(const void *buf, int count, MPI_Datatype type, int dest, OPAL_CR_ENTER_LIBRARY(); - MEMCHECKER ( - memchecker_call(&opal_memchecker_base_mem_noaccess, buf, count, type); - ); + /* + * today's MPI standard mandates the send buffer remains accessible during the send operation + * hence memchecker cannot mark buf as non accessible, but it might mark buf as read-only in + * order to trap end user errors. Unfortunatly valgrind does not support marking buffers as read-only, + * so there is pretty much nothing we can do here. + */ + rc = MCA_PML_CALL(isend(buf, count, type, dest, tag, MCA_PML_BASE_SEND_STANDARD, comm, request)); OMPI_ERRHANDLER_RETURN(rc, comm, rc, FUNC_NAME); diff --git a/ompi/mpi/c/neighbor_allgather.c b/ompi/mpi/c/neighbor_allgather.c index 8612b2ac17f..88df4e78c5c 100644 --- a/ompi/mpi/c/neighbor_allgather.c +++ b/ompi/mpi/c/neighbor_allgather.c @@ -14,7 +14,7 @@ * Copyright (c) 2010 University of Houston. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -62,11 +62,7 @@ int MPI_Neighbor_allgather(const void *sendbuf, int sendcount, MPI_Datatype send memchecker_datatype(recvtype); memchecker_comm(comm); /* check whether the actual send buffer is defined. 
*/ - if (MPI_IN_PLACE == sendbuf) { - memchecker_call(&opal_memchecker_base_isdefined, - (char *)(recvbuf)+rank*ext, - recvcount, recvtype); - } else { + if (MPI_IN_PLACE != sendbuf) { memchecker_datatype(sendtype); memchecker_call(&opal_memchecker_base_isdefined, sendbuf, sendcount, sendtype); } diff --git a/ompi/mpi/c/neighbor_allgatherv.c b/ompi/mpi/c/neighbor_allgatherv.c index 9a2f87a2467..2c775a9e5bb 100644 --- a/ompi/mpi/c/neighbor_allgatherv.c +++ b/ompi/mpi/c/neighbor_allgatherv.c @@ -12,9 +12,9 @@ * All rights reserved. * Copyright (c) 2010 University of Houston. All rights reserved. * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2016 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ @@ -32,6 +32,7 @@ #include "ompi/communicator/communicator.h" #include "ompi/errhandler/errhandler.h" #include "ompi/datatype/ompi_datatype.h" +#include "ompi/mca/topo/base/base.h" #include "ompi/memchecker.h" #include "ompi/mca/topo/topo.h" #include "ompi/mca/topo/base/base.h" @@ -50,31 +51,27 @@ int MPI_Neighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sen void *recvbuf, const int recvcounts[], const int displs[], MPI_Datatype recvtype, MPI_Comm comm) { - int i, size, err; + int in_size, out_size, err; MEMCHECKER( int rank; ptrdiff_t ext; rank = ompi_comm_rank(comm); - size = ompi_comm_size(comm); + mca_topo_base_neighbor_count (comm, &in_size, &out_size); ompi_datatype_type_extent(recvtype, &ext); memchecker_datatype(recvtype); memchecker_comm (comm); /* check whether the receive buffer is addressable. 
*/ - for (i = 0; i < size; i++) { + for (int i = 0; i < in_size; ++i) { memchecker_call(&opal_memchecker_base_isaddressable, (char *)(recvbuf)+displs[i]*ext, recvcounts[i], recvtype); } /* check whether the actual send buffer is defined. */ - if (MPI_IN_PLACE == sendbuf) { - memchecker_call(&opal_memchecker_base_isdefined, - (char *)(recvbuf)+displs[rank]*ext, - recvcounts[rank], recvtype); - } else { + if (MPI_IN_PLACE != sendbuf) { memchecker_datatype(sendtype); memchecker_call(&opal_memchecker_base_isdefined, sendbuf, sendcount, sendtype); } @@ -107,8 +104,8 @@ int MPI_Neighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sen get the size of the remote group here for both intra- and intercommunicators */ - size = ompi_comm_remote_size(comm); - for (i = 0; i < size; ++i) { + mca_topo_base_neighbor_count (comm, &in_size, &out_size); + for (int i = 0; i < in_size; ++i) { if (recvcounts[i] < 0) { return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_COUNT, FUNC_NAME); } @@ -141,27 +138,6 @@ int MPI_Neighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sen } } - /* Do we need to do anything? Everyone had to give the same - signature, which means that everyone must have given a - sum(recvounts) > 0 if there's anything to do. */ - - if ( OMPI_COMM_IS_INTRA( comm) ) { - for (i = 0; i < ompi_comm_size(comm); ++i) { - if (0 != recvcounts[i]) { - break; - } - } - if (i >= ompi_comm_size(comm)) { - return MPI_SUCCESS; - } - } - /* There is no rule that can be applied for inter-communicators, since - recvcount(s)=0 only indicates that the processes in the other group - do not send anything, sendcount=0 only indicates that I do not send - anything. 
However, other processes in my group might very well send - something */ - - OPAL_CR_ENTER_LIBRARY(); /* Invoke the coll component to perform the back-end operation */ @@ -170,4 +146,3 @@ int MPI_Neighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sen recvtype, comm, comm->c_coll->coll_neighbor_allgatherv_module); OMPI_ERRHANDLER_RETURN(err, comm, err, FUNC_NAME); } - diff --git a/ompi/mpi/c/neighbor_alltoallv.c b/ompi/mpi/c/neighbor_alltoallv.c index acadf1ab799..5004e6b42d6 100644 --- a/ompi/mpi/c/neighbor_alltoallv.c +++ b/ompi/mpi/c/neighbor_alltoallv.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -32,7 +32,6 @@ #include "ompi/errhandler/errhandler.h" #include "ompi/datatype/ompi_datatype.h" #include "ompi/memchecker.h" -#include "ompi/communicator/comm_helpers.h" #include "ompi/mca/topo/topo.h" #include "ompi/mca/topo/base/base.h" @@ -52,7 +51,7 @@ int MPI_Neighbor_alltoallv(const void *sendbuf, const int sendcounts[], const in MPI_Datatype recvtype, MPI_Comm comm) { int i, err; - int indegree, outdegree, weighted; + int indegree, outdegree; MEMCHECKER( ptrdiff_t recv_ext; @@ -68,7 +67,7 @@ int MPI_Neighbor_alltoallv(const void *sendbuf, const int sendcounts[], const in memchecker_datatype(recvtype); ompi_datatype_type_extent(sendtype, &send_ext); - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); if (MPI_SUCCESS == err) { if (MPI_IN_PLACE != sendbuf) { for ( i = 0; i < outdegree; i++ ) { @@ -105,7 +104,7 @@ int MPI_Neighbor_alltoallv(const void *sendbuf, const int sendcounts[], const in return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME); } - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); OMPI_ERRHANDLER_CHECK(err, comm, err, FUNC_NAME); for (i = 0; i < outdegree; ++i) { OMPI_CHECK_DATATYPE_FOR_SEND(err, sendtype, sendcounts[i]); diff --git a/ompi/mpi/c/neighbor_alltoallw.c b/ompi/mpi/c/neighbor_alltoallw.c index 347d0d81432..5d339bfa6d6 100644 --- a/ompi/mpi/c/neighbor_alltoallw.c +++ b/ompi/mpi/c/neighbor_alltoallw.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -32,7 +32,6 @@ #include "ompi/errhandler/errhandler.h" #include "ompi/datatype/ompi_datatype.h" #include "ompi/memchecker.h" -#include "ompi/communicator/comm_helpers.h" #include "ompi/mca/topo/topo.h" #include "ompi/mca/topo/base/base.h" @@ -52,7 +51,7 @@ int MPI_Neighbor_alltoallw(const void *sendbuf, const int sendcounts[], const MP const MPI_Datatype recvtypes[], MPI_Comm comm) { int i, err; - int indegree, outdegree, weighted; + int indegree, outdegree; MEMCHECKER( ptrdiff_t recv_ext; @@ -60,7 +59,7 @@ int MPI_Neighbor_alltoallw(const void *sendbuf, const int sendcounts[], const MP memchecker_comm(comm); - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); if (MPI_SUCCESS == err) { if (MPI_IN_PLACE != sendbuf) { for ( i = 0; i < outdegree; i++ ) { @@ -101,7 +100,7 @@ int MPI_Neighbor_alltoallw(const void *sendbuf, const int sendcounts[], const MP return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME); } - err = ompi_comm_neighbors_count(comm, &indegree, &outdegree, &weighted); + err = mca_topo_base_neighbor_count (comm, &indegree, &outdegree); OMPI_ERRHANDLER_CHECK(err, comm, err, FUNC_NAME); for (i = 0; i < outdegree; ++i) { OMPI_CHECK_DATATYPE_FOR_SEND(err, sendtypes[i], sendcounts[i]); diff --git a/ompi/mpi/c/profile/Makefile.am b/ompi/mpi/c/profile/Makefile.am index ed8b77c8270..0217ab8b218 100644 --- a/ompi/mpi/c/profile/Makefile.am +++ b/ompi/mpi/c/profile/Makefile.am @@ -16,7 +16,7 @@ # Copyright (c) 2012-2013 Inria. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # @@ -134,6 +134,67 @@ nodist_libmpi_c_pmpi_la_SOURCES = \ pexscan.c \ pfetch_and_op.c \ piexscan.c \ + pfile_c2f.c \ + pfile_call_errhandler.c \ + pfile_close.c \ + pfile_create_errhandler.c \ + pfile_delete.c \ + pfile_f2c.c \ + pfile_get_amode.c \ + pfile_get_atomicity.c \ + pfile_get_byte_offset.c \ + pfile_get_errhandler.c \ + pfile_get_group.c \ + pfile_get_info.c \ + pfile_get_position.c \ + pfile_get_position_shared.c \ + pfile_get_size.c \ + pfile_get_type_extent.c \ + pfile_get_view.c \ + pfile_iread_at.c \ + pfile_iread.c \ + pfile_iread_at_all.c \ + pfile_iread_all.c \ + pfile_iread_shared.c \ + pfile_iwrite_at.c \ + pfile_iwrite.c \ + pfile_iwrite_at_all.c \ + pfile_iwrite_all.c \ + pfile_iwrite_shared.c \ + pfile_open.c \ + pfile_preallocate.c \ + pfile_read_all_begin.c \ + pfile_read_all.c \ + pfile_read_all_end.c \ + pfile_read_at_all_begin.c \ + pfile_read_at_all.c \ + pfile_read_at_all_end.c \ + pfile_read_at.c \ + pfile_read.c \ + pfile_read_ordered_begin.c \ + pfile_read_ordered.c \ + pfile_read_ordered_end.c \ + pfile_read_shared.c \ + pfile_seek.c \ + pfile_seek_shared.c \ + pfile_set_atomicity.c \ + pfile_set_errhandler.c \ + pfile_set_info.c \ + pfile_set_size.c \ + pfile_set_view.c \ + pfile_sync.c \ + pfile_write_all_begin.c \ + pfile_write_all.c \ + pfile_write_all_end.c \ + pfile_write_at_all_begin.c \ + pfile_write_at_all.c \ + pfile_write_at_all_end.c \ + pfile_write_at.c \ + pfile_write.c \ + pfile_write_ordered_begin.c \ + pfile_write_ordered.c \ + pfile_write_ordered_end.c \ + pfile_write_shared.c \ pfinalize.c \ pfinalized.c \ pfree_mem.c \ @@ -231,6 +292,7 @@ nodist_libmpi_c_pmpi_la_SOURCES = \ precv_init.c \ precv.c \ preduce.c \ + pregister_datarep.c \ pireduce.c \ preduce_local.c \ preduce_scatter.c \ @@ -364,72 +426,6 @@ nodist_libmpi_c_pmpi_la_SOURCES = \ pwin_unlock_all.c \ pwin_wait.c -if OMPI_PROVIDE_MPI_FILE_INTERFACE -nodist_libmpi_c_pmpi_la_SOURCES += \ - pfile_c2f.c \ - pfile_call_errhandler.c \ - 
pfile_close.c \ - pfile_create_errhandler.c \ - pfile_delete.c \ - pfile_f2c.c \ - pfile_get_amode.c \ - pfile_get_atomicity.c \ - pfile_get_byte_offset.c \ - pfile_get_errhandler.c \ - pfile_get_group.c \ - pfile_get_info.c \ - pfile_get_position.c \ - pfile_get_position_shared.c \ - pfile_get_size.c \ - pfile_get_type_extent.c \ - pfile_get_view.c \ - pfile_iread_at.c \ - pfile_iread.c \ - pfile_iread_at_all.c \ - pfile_iread_all.c \ - pfile_iread_shared.c \ - pfile_iwrite_at.c \ - pfile_iwrite.c \ - pfile_iwrite_at_all.c \ - pfile_iwrite_all.c \ - pfile_iwrite_shared.c \ - pfile_open.c \ - pfile_preallocate.c \ - pfile_read_all_begin.c \ - pfile_read_all.c \ - pfile_read_all_end.c \ - pfile_read_at_all_begin.c \ - pfile_read_at_all.c \ - pfile_read_at_all_end.c \ - pfile_read_at.c \ - pfile_read.c \ - pfile_read_ordered_begin.c \ - pfile_read_ordered.c \ - pfile_read_ordered_end.c \ - pfile_read_shared.c \ - pfile_seek.c \ - pfile_seek_shared.c \ - pfile_set_atomicity.c \ - pfile_set_errhandler.c \ - pfile_set_info.c \ - pfile_set_size.c \ - pfile_set_view.c \ - pfile_sync.c \ - pfile_write_all_begin.c \ - pfile_write_all.c \ - pfile_write_all_end.c \ - pfile_write_at_all_begin.c \ - pfile_write_at_all.c \ - pfile_write_at_all_end.c \ - pfile_write_at.c \ - pfile_write.c \ - pfile_write_ordered_begin.c \ - pfile_write_ordered.c \ - pfile_write_ordered_end.c \ - pfile_write_shared.c \ - pregister_datarep.c -endif - # # Sym link in the sources from the real MPI directory # diff --git a/ompi/mpi/c/raccumulate.c b/ompi/mpi/c/raccumulate.c index 05559fafe02..2a755bb5020 100644 --- a/ompi/mpi/c/raccumulate.c +++ b/ompi/mpi/c/raccumulate.c @@ -92,7 +92,7 @@ int MPI_Raccumulate(const void *origin_addr, int origin_count, MPI_Datatype orig /* RACCUMULATE, unlike REDUCE, can use with derived datatypes with predefinied operations, with some - restrictions outlined in MPI-2:6.3.4. The derived + restrictions outlined in MPI-3:11.3.4. 
The derived datatype must be composed entierly from one predefined datatype (so you can do all the construction you want, but at the bottom, you can only use one datatype, say, diff --git a/ompi/mpi/c/rget_accumulate.c b/ompi/mpi/c/rget_accumulate.c index 7066abb63e3..11764f8ed9e 100644 --- a/ompi/mpi/c/rget_accumulate.c +++ b/ompi/mpi/c/rget_accumulate.c @@ -99,7 +99,7 @@ int MPI_Rget_accumulate(const void *origin_addr, int origin_count, MPI_Datatype /* RGET_ACCUMULATE, unlike REDUCE, can use with derived datatypes with predefinied operations, with some - restrictions outlined in MPI-2:6.3.4. The derived + restrictions outlined in MPI-3:11.3.4. The derived datatype must be composed entierly from one predefined datatype (so you can do all the construction you want, but at the bottom, you can only use one datatype, say, diff --git a/ompi/mpi/c/sendrecv_replace.c b/ompi/mpi/c/sendrecv_replace.c index 0063125119d..bb9f4126f13 100644 --- a/ompi/mpi/c/sendrecv_replace.c +++ b/ompi/mpi/c/sendrecv_replace.c @@ -2,7 +2,7 @@ * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2010 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ -12,6 +12,7 @@ * Copyright (c) 2010-2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -48,10 +49,10 @@ int MPI_Sendrecv_replace(void * buf, int count, MPI_Datatype datatype, int rc = MPI_SUCCESS; MEMCHECKER( - memchecker_datatype(datatype); - memchecker_call(&opal_memchecker_base_isdefined, buf, count, datatype); - memchecker_comm(comm); - ); + memchecker_datatype(datatype); + memchecker_call(&opal_memchecker_base_isdefined, buf, count, datatype); + memchecker_comm(comm); + ); if ( MPI_PARAM_CHECK ) { rc = MPI_SUCCESS; @@ -76,68 +77,68 @@ int MPI_Sendrecv_replace(void * buf, int count, MPI_Datatype datatype, /* simple case */ if ( source == MPI_PROC_NULL || dest == MPI_PROC_NULL || count == 0 ) { - rc = PMPI_Sendrecv(buf,count,datatype,dest,sendtag,buf,count,datatype,source,recvtag,comm,status); + rc = PMPI_Sendrecv(buf, count, datatype, dest, sendtag, buf, count, datatype, source, recvtag, comm, status); OPAL_CR_EXIT_LIBRARY(); return rc; - } else { - - opal_convertor_t convertor; - struct iovec iov; - unsigned char recv_data[2048]; - size_t packed_size, max_data; - uint32_t iov_count; - ompi_status_public_t recv_status; - ompi_proc_t* proc = ompi_comm_peer_lookup(comm,source); - if(proc == NULL) { - rc = MPI_ERR_RANK; - OMPI_ERRHANDLER_RETURN(rc, comm, rc, FUNC_NAME); - } - - /* initialize convertor to unpack recv buffer */ - OBJ_CONSTRUCT(&convertor, opal_convertor_t); - opal_convertor_copy_and_prepare_for_recv( proc->super.proc_convertor, &(datatype->super), - count, buf, 0, &convertor ); - - /* setup a buffer for recv */ - opal_convertor_get_packed_size( &convertor, &packed_size ); - if( packed_size > sizeof(recv_data) ) { - rc = PMPI_Alloc_mem(packed_size, MPI_INFO_NULL, &iov.iov_base); - if(OMPI_SUCCESS != rc) { - OMPI_ERRHANDLER_RETURN(OMPI_ERR_OUT_OF_RESOURCE, comm, MPI_ERR_BUFFER, FUNC_NAME); - } - } else { - iov.iov_base = (caddr_t)recv_data; - } - - /* recv into temporary buffer */ - rc = PMPI_Sendrecv( buf, count, datatype, dest, sendtag, iov.iov_base, packed_size, - MPI_BYTE, 
source, recvtag, comm, &recv_status ); - if (rc != MPI_SUCCESS) { - if(packed_size > sizeof(recv_data)) - PMPI_Free_mem(iov.iov_base); - OBJ_DESTRUCT(&convertor); - OMPI_ERRHANDLER_RETURN(rc, comm, rc, FUNC_NAME); - } - - /* unpack into users buffer */ - iov.iov_len = recv_status._ucount; - iov_count = 1; - max_data = recv_status._ucount; - opal_convertor_unpack(&convertor, &iov, &iov_count, &max_data ); + } - /* return status to user */ - if(status != MPI_STATUS_IGNORE) { - *status = recv_status; - } + /** + * An optimal solution would receive the data into a temporary buffer + * and, once the send completes, unpack it back into the original buffer. However, if the + * sender is unknown, this approach can only be implemented by receiving with the recv datatype + * (potentially non-contiguous), so the allocated memory would be larger than the size of the + * datatype. A simpler, but potentially less efficient, approach is to work on the data we + * control, i.e. the sent data, and pack it into a contiguous buffer before posting the receive. + * Once the send completes, we free it.
+ */ + opal_convertor_t convertor; + unsigned char packed_data[2048]; + struct iovec iov = { .iov_base = packed_data, .iov_len = sizeof(packed_data) }; + size_t packed_size, max_data; + uint32_t iov_count; + ompi_status_public_t recv_status; + ompi_proc_t* proc = ompi_comm_peer_lookup(comm, dest); + if(proc == NULL) { + rc = MPI_ERR_RANK; + OMPI_ERRHANDLER_RETURN(rc, comm, rc, FUNC_NAME); + } - /* release resources */ - if(packed_size > sizeof(recv_data)) { - PMPI_Free_mem(iov.iov_base); + /* initialize convertor to pack the user's send buffer */ + OBJ_CONSTRUCT(&convertor, opal_convertor_t); + opal_convertor_copy_and_prepare_for_send( proc->super.proc_convertor, &(datatype->super), + count, buf, 0, &convertor ); + + /* set up a contiguous buffer for the packed send data */ + opal_convertor_get_packed_size( &convertor, &packed_size ); + if( packed_size > sizeof(packed_data) ) { + rc = PMPI_Alloc_mem(packed_size, MPI_INFO_NULL, &iov.iov_base); + if(OMPI_SUCCESS != rc) { + rc = OMPI_ERR_OUT_OF_RESOURCE; + goto cleanup_and_return; } - OBJ_DESTRUCT(&convertor); + iov.iov_len = packed_size; + } + max_data = packed_size; + iov_count = 1; + rc = opal_convertor_pack(&convertor, &iov, &iov_count, &max_data); + + /* send the packed data and receive directly into the user's buffer */ + rc = PMPI_Sendrecv( iov.iov_base, packed_size, MPI_PACKED, dest, sendtag, buf, count, + datatype, source, recvtag, comm, &recv_status ); + + cleanup_and_return: + /* return status to user */ + if(status != MPI_STATUS_IGNORE) { + *status = recv_status; + } - OPAL_CR_EXIT_LIBRARY(); - return MPI_SUCCESS; + /* release resources */ + if(packed_size > sizeof(packed_data)) { + PMPI_Free_mem(iov.iov_base); } + OBJ_DESTRUCT(&convertor); + + OPAL_CR_EXIT_LIBRARY(); + OMPI_ERRHANDLER_RETURN(rc, comm, rc, FUNC_NAME); } diff --git a/ompi/mpi/c/start.c b/ompi/mpi/c/start.c index aa2c8af7b6b..3f1b3658e31 100644 --- a/ompi/mpi/c/start.c +++ b/ompi/mpi/c/start.c @@ -68,7 +68,7 @@ int MPI_Start(MPI_Request 
*request) case OMPI_REQUEST_PML: OPAL_CR_ENTER_LIBRARY(); - ret =
MCA_PML_CALL(start(1, request)); + ret = (*request)->req_start(1, request); OPAL_CR_EXIT_LIBRARY(); return ret; diff --git a/ompi/mpi/c/startall.c b/ompi/mpi/c/startall.c index 34a3fed2364..14452f68de4 100644 --- a/ompi/mpi/c/startall.c +++ b/ompi/mpi/c/startall.c @@ -44,11 +44,11 @@ static const char FUNC_NAME[] = "MPI_Startall"; int MPI_Startall(int count, MPI_Request requests[]) { - int i; + int i, j; int ret = OMPI_SUCCESS; + ompi_request_start_fn_t start_fn = NULL; MEMCHECKER( - int j; for (j = 0; j < count; j++){ memchecker_request(&requests[j]); } @@ -76,7 +76,7 @@ int MPI_Startall(int count, MPI_Request requests[]) OPAL_CR_ENTER_LIBRARY(); - for (i = 0; i < count; ++i) { + for (i = 0, j = -1; i < count; ++i) { /* Per MPI it is invalid to start an active request */ if (OMPI_REQUEST_INACTIVE != requests[i]->req_state) { return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_REQUEST, FUNC_NAME); @@ -91,9 +91,21 @@ int MPI_Startall(int count, MPI_Request requests[]) */ requests[i]->req_state = OMPI_REQUEST_ACTIVE; } + + /* Call the req_start callback once for each run of consecutive + * requests that share the same req_start function. */ + if (requests[i]->req_start != start_fn) { + if (NULL != start_fn && i != 0) { + start_fn(i - j, requests + j); + } + start_fn = requests[i]->req_start; + j = i; + } } - ret = MCA_PML_CALL(start(count, requests)); + if (NULL != start_fn) { + start_fn(i - j, requests + j); + } OPAL_CR_EXIT_LIBRARY(); return ret; diff --git a/ompi/mpi/c/type_c2f.c b/ompi/mpi/c/type_c2f.c index 1af1ffea97b..fa90ef11208 100644 --- a/ompi/mpi/c/type_c2f.c +++ b/ompi/mpi/c/type_c2f.c @@ -2,7 +2,7 @@ * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. 
All rights * reserved.
* Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ -55,5 +55,10 @@ MPI_Fint MPI_Type_c2f(MPI_Datatype datatype) } } + /* If necessary add the datatype to the f2c translation table */ + if( -1 == datatype->d_f_to_c_index ) { + datatype->d_f_to_c_index = opal_pointer_array_add(&ompi_datatype_f_to_c_table, datatype); + /* We don't check for error as returning a negative value is considered as an error */ + } return OMPI_INT_2_FINT(datatype->d_f_to_c_index); } diff --git a/ompi/mpi/c/type_create_f90_complex.c b/ompi/mpi/c/type_create_f90_complex.c index 133e783711f..e8ec6d6f9ab 100644 --- a/ompi/mpi/c/type_create_f90_complex.c +++ b/ompi/mpi/c/type_create_f90_complex.c @@ -11,11 +11,12 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -45,6 +46,7 @@ static const char FUNC_NAME[] = "MPI_Type_create_f90_complex"; int MPI_Type_create_f90_complex(int p, int r, MPI_Datatype *newtype) { uint64_t key; + int p_key, r_key; OPAL_CR_NOOP_PROGRESS(); @@ -64,8 +66,10 @@ int MPI_Type_create_f90_complex(int p, int r, MPI_Datatype *newtype) /* if the user does not care about p or r set them to 0 so the * test associate with them will always succeed. */ - if( MPI_UNDEFINED == p ) p = 0; - if( MPI_UNDEFINED == r ) r = 0; + p_key = p; + r_key = r; + if( MPI_UNDEFINED == p ) p_key = 0; + if( MPI_UNDEFINED == r ) r_key = 0; /** * With respect to the MPI standard, MPI-2.0 Sect. 
10.2.5, MPI_TYPE_CREATE_F90_xxxx, @@ -86,7 +90,7 @@ int MPI_Type_create_f90_complex(int p, int r, MPI_Datatype *newtype) const int* a_i[2]; int rc; - key = (((uint64_t)p) << 32) | ((uint64_t)r); + key = (((uint64_t)p_key) << 32) | ((uint64_t)r_key); if( OPAL_SUCCESS == opal_hash_table_get_value_uint64( &ompi_mpi_f90_complex_hashtable, key, (void**)newtype ) ) { return MPI_SUCCESS; @@ -103,11 +107,15 @@ int MPI_Type_create_f90_complex(int p, int r, MPI_Datatype *newtype) */ datatype->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED; /* Mark the datatype as a special F90 convenience type */ - snprintf(datatype->name, MPI_MAX_OBJECT_NAME, "COMBINER %s", - (*newtype)->name); - - a_i[0] = &r; - a_i[1] = &p; + char *new_name; + asprintf(&new_name, "COMBINER %s", (*newtype)->name); + size_t max_len = MPI_MAX_OBJECT_NAME; + strncpy(datatype->name, new_name, max_len - 1); + datatype->name[max_len - 1] = '\0'; + free(new_name); + + a_i[0] = &p; + a_i[1] = &r; ompi_datatype_set_args( datatype, 2, a_i, 0, NULL, 0, NULL, MPI_COMBINER_F90_COMPLEX ); rc = opal_hash_table_set_value_uint64( &ompi_mpi_f90_complex_hashtable, key, datatype ); diff --git a/ompi/mpi/c/type_create_f90_integer.c b/ompi/mpi/c/type_create_f90_integer.c index 95df36e9eaa..108893dfaab 100644 --- a/ompi/mpi/c/type_create_f90_integer.c +++ b/ompi/mpi/c/type_create_f90_integer.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2008-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. 
* Copyright (c) 2015 Research Organization for Information Science @@ -103,8 +103,12 @@ int MPI_Type_create_f90_integer(int r, MPI_Datatype *newtype) */ datatype->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED; /* Mark the datatype as a special F90 convenience type */ - snprintf(datatype->name, MPI_MAX_OBJECT_NAME, "COMBINER %s", - (*newtype)->name); + char *new_name; + asprintf(&new_name, "COMBINER %s", (*newtype)->name); + size_t max_len = MPI_MAX_OBJECT_NAME; + strncpy(datatype->name, new_name, max_len - 1); + datatype->name[max_len - 1] = '\0'; + free(new_name); a_i[0] = &r; ompi_datatype_set_args( datatype, 1, a_i, 0, NULL, 0, NULL, MPI_COMBINER_F90_INTEGER ); diff --git a/ompi/mpi/c/type_create_f90_real.c b/ompi/mpi/c/type_create_f90_real.c index a2144a619a2..de2ee83fac4 100644 --- a/ompi/mpi/c/type_create_f90_real.c +++ b/ompi/mpi/c/type_create_f90_real.c @@ -11,11 +11,12 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2008 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -45,6 +46,7 @@ static const char FUNC_NAME[] = "MPI_Type_create_f90_real"; int MPI_Type_create_f90_real(int p, int r, MPI_Datatype *newtype) { uint64_t key; + int p_key, r_key; OPAL_CR_NOOP_PROGRESS(); @@ -64,8 +66,10 @@ int MPI_Type_create_f90_real(int p, int r, MPI_Datatype *newtype) /* if the user does not care about p or r set them to 0 so the * test associate with them will always succeed. 
*/ - if( MPI_UNDEFINED == p ) p = 0; - if( MPI_UNDEFINED == r ) r = 0; + p_key = p; + r_key = r; + if( MPI_UNDEFINED == p ) p_key = 0; + if( MPI_UNDEFINED == r ) r_key = 0; /** * With respect to the MPI standard, MPI-2.0 Sect. 10.2.5, MPI_TYPE_CREATE_F90_xxxx, @@ -83,10 +87,10 @@ int MPI_Type_create_f90_real(int p, int r, MPI_Datatype *newtype) if( *newtype != &ompi_mpi_datatype_null.dt ) { ompi_datatype_t* datatype; - const int* a_i[2] = {&r, &p}; + const int* a_i[2] = {&p, &r}; int rc; - key = (((uint64_t)p) << 32) | ((uint64_t)r); + key = (((uint64_t)p_key) << 32) | ((uint64_t)r_key); if( OPAL_SUCCESS == opal_hash_table_get_value_uint64( &ompi_mpi_f90_real_hashtable, key, (void**)newtype ) ) { return MPI_SUCCESS; @@ -103,8 +107,12 @@ int MPI_Type_create_f90_real(int p, int r, MPI_Datatype *newtype) */ datatype->super.flags |= OMPI_DATATYPE_FLAG_PREDEFINED; /* Mark the datatype as a special F90 convenience type */ - snprintf(datatype->name, MPI_MAX_OBJECT_NAME, "COMBINER %s", - (*newtype)->name); + char *new_name; + asprintf(&new_name, "COMBINER %s", (*newtype)->name); + size_t max_len = MPI_MAX_OBJECT_NAME; + strncpy(datatype->name, new_name, max_len - 1); + datatype->name[max_len - 1] = '\0'; + free(new_name); ompi_datatype_set_args( datatype, 2, a_i, 0, NULL, 0, NULL, MPI_COMBINER_F90_REAL ); diff --git a/ompi/mpi/c/wait.c b/ompi/mpi/c/wait.c index 1763b27b880..c8e8a521b77 100644 --- a/ompi/mpi/c/wait.c +++ b/ompi/mpi/c/wait.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -73,14 +73,18 @@ int MPI_Wait(MPI_Request *request, MPI_Status *status) * Per MPI-1, the MPI_ERROR field is not defined for single-completion calls */ MEMCHECKER( - opal_memchecker_base_mem_undefined(&status->MPI_ERROR, sizeof(int)); + if (MPI_STATUS_IGNORE != status) { + opal_memchecker_base_mem_undefined(&status->MPI_ERROR, sizeof(int)); + } ); OPAL_CR_EXIT_LIBRARY(); return MPI_SUCCESS; } MEMCHECKER( - opal_memchecker_base_mem_undefined(&status->MPI_ERROR, sizeof(int)); + if (MPI_STATUS_IGNORE != status) { + opal_memchecker_base_mem_undefined(&status->MPI_ERROR, sizeof(int)); + } ); OPAL_CR_EXIT_LIBRARY(); return ompi_errhandler_request_invoke(1, request, FUNC_NAME); diff --git a/ompi/mpi/c/win_allocate.c b/ompi/mpi/c/win_allocate.c index f259c3c8ae6..f0d1dbd5e9a 100644 --- a/ompi/mpi/c/win_allocate.c +++ b/ompi/mpi/c/win_allocate.c @@ -12,6 +12,7 @@ * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -77,7 +78,7 @@ int MPI_Win_allocate(MPI_Aint size, int disp_unit, MPI_Info info, OPAL_CR_ENTER_LIBRARY(); /* create window and return */ - ret = ompi_win_allocate((size_t)size, disp_unit, info, + ret = ompi_win_allocate((size_t)size, disp_unit, &(info->super), comm, baseptr, win); if (OMPI_SUCCESS != ret) { *win = MPI_WIN_NULL; diff --git a/ompi/mpi/c/win_allocate_shared.c b/ompi/mpi/c/win_allocate_shared.c index 5179a5d0955..36d26df0c21 100644 --- a/ompi/mpi/c/win_allocate_shared.c +++ b/ompi/mpi/c/win_allocate_shared.c @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -78,7 +79,7 @@ int MPI_Win_allocate_shared(MPI_Aint size, int disp_unit, MPI_Info info, OPAL_CR_ENTER_LIBRARY(); /* create window and return */ - ret = ompi_win_allocate_shared((size_t)size, disp_unit, info, + ret = ompi_win_allocate_shared((size_t)size, disp_unit, &(info->super), comm, baseptr, win); if (OMPI_SUCCESS != ret) { *win = MPI_WIN_NULL; diff --git a/ompi/mpi/c/win_create.c b/ompi/mpi/c/win_create.c index c5e7f9d463e..7b322c690bd 100644 --- a/ompi/mpi/c/win_create.c +++ b/ompi/mpi/c/win_create.c @@ -12,6 +12,7 @@ * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -78,7 +79,7 @@ int MPI_Win_create(void *base, MPI_Aint size, int disp_unit, /* create window and return */ ret = ompi_win_create(base, (size_t)size, disp_unit, comm, - info, win); + &(info->super), win); if (OMPI_SUCCESS != ret) { *win = MPI_WIN_NULL; OPAL_CR_EXIT_LIBRARY(); diff --git a/ompi/mpi/c/win_create_dynamic.c b/ompi/mpi/c/win_create_dynamic.c index dfafed94c29..438b5900325 100644 --- a/ompi/mpi/c/win_create_dynamic.c +++ b/ompi/mpi/c/win_create_dynamic.c @@ -12,6 +12,7 @@ * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -73,7 +74,7 @@ int MPI_Win_create_dynamic(MPI_Info info, MPI_Comm comm, MPI_Win *win) OPAL_CR_ENTER_LIBRARY(); /* create_dynamic window and return */ - ret = ompi_win_create_dynamic(info, comm, win); + ret = ompi_win_create_dynamic(&(info->super), comm, win); if (OMPI_SUCCESS != ret) { *win = MPI_WIN_NULL; OPAL_CR_EXIT_LIBRARY(); diff --git a/ompi/mpi/c/win_get_errhandler.c b/ompi/mpi/c/win_get_errhandler.c index c6fd3080f6e..5704950ebf1 100644 --- a/ompi/mpi/c/win_get_errhandler.c +++ b/ompi/mpi/c/win_get_errhandler.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2009 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. @@ -42,8 +42,6 @@ static const char FUNC_NAME[] = "MPI_Win_get_errhandler"; int MPI_Win_get_errhandler(MPI_Win win, MPI_Errhandler *errhandler) { - MPI_Errhandler tmp; - OPAL_CR_NOOP_PROGRESS(); if (MPI_PARAM_CHECK) { @@ -57,16 +55,12 @@ int MPI_Win_get_errhandler(MPI_Win win, MPI_Errhandler *errhandler) } } - /* On 64 bits environments we have to make sure the reading of the - error_handler became atomic. */ - do { - tmp = win->error_handler; - } while (!OPAL_ATOMIC_CMPSET_PTR(&(win->error_handler), tmp, tmp)); - + OPAL_THREAD_LOCK(&win->w_lock); /* Retain the errhandler, corresponding to object refcount decrease in errhandler_free.c. 
*/ OBJ_RETAIN(win->error_handler); *errhandler = win->error_handler; + OPAL_THREAD_UNLOCK(&win->w_lock); /* All done */ return MPI_SUCCESS; diff --git a/ompi/mpi/c/win_get_info.c b/ompi/mpi/c/win_get_info.c index ed686eb18c8..512ab1c213b 100644 --- a/ompi/mpi/c/win_get_info.c +++ b/ompi/mpi/c/win_get_info.c @@ -5,6 +5,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -18,6 +19,8 @@ #include "ompi/runtime/params.h" #include "ompi/errhandler/errhandler.h" #include "ompi/win/win.h" +#include "opal/util/info.h" +#include "opal/util/info_subscriber.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS @@ -28,15 +31,12 @@ static const char FUNC_NAME[] = "MPI_Win_get_info"; -static void _win_info_set (ompi_info_t *info, const char *key, int set) -{ - ompi_info_set (info, key, set ? "true" : "false"); -} - int MPI_Win_get_info(MPI_Win win, MPI_Info *info_used) { int ret; + OPAL_CR_NOOP_PROGRESS(); + if (MPI_PARAM_CHECK) { OMPI_ERR_INIT_FINALIZE(FUNC_NAME); @@ -49,18 +49,20 @@ int MPI_Win_get_info(MPI_Win win, MPI_Info *info_used) } } - OPAL_CR_ENTER_LIBRARY(); - - ret = win->w_osc_module->osc_get_info(win, info_used); - - if (OMPI_SUCCESS == ret && *info_used) { - /* set standard info keys based on what the OSC module is using */ + if (NULL == win->super.s_info) { +/* + * Setup any defaults if MPI_Win_set_info was never called + */ + opal_infosubscribe_change_info(&win->super, &MPI_INFO_NULL->super); + } - _win_info_set (*info_used, "no_locks", win->w_flags & OMPI_WIN_NO_LOCKS); - _win_info_set (*info_used, "same_size", win->w_flags & OMPI_WIN_SAME_SIZE); - _win_info_set (*info_used, "same_disp_unit", win->w_flags & OMPI_WIN_SAME_DISP); - ompi_info_set_value_enum (*info_used, "accumulate_ops", win->w_acc_ops, ompi_win_accumulate_ops); + (*info_used) = 
OBJ_NEW(ompi_info_t); + if (NULL == (*info_used)) { + return OMPI_ERRHANDLER_INVOKE(win, MPI_ERR_NO_MEM, FUNC_NAME); } + opal_info_t *opal_info_used = &(*info_used)->super; + + ret = opal_info_dup_mpistandard(win->super.s_info, &opal_info_used); OMPI_ERRHANDLER_RETURN(ret, win, ret, FUNC_NAME); } diff --git a/ompi/mpi/c/win_lock.c b/ompi/mpi/c/win_lock.c index 82822c9ca4f..96cefc7445e 100644 --- a/ompi/mpi/c/win_lock.c +++ b/ompi/mpi/c/win_lock.c @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -61,9 +61,6 @@ int MPI_Win_lock(int lock_type, int rank, int assert, MPI_Win win) } } - /* NTH: do not bother keeping track of locking MPI_PROC_NULL. */ - if (MPI_PROC_NULL == rank) return MPI_SUCCESS; - OPAL_CR_ENTER_LIBRARY(); rc = win->w_osc_module->osc_lock(lock_type, rank, assert, win); diff --git a/ompi/mpi/c/win_set_errhandler.c b/ompi/mpi/c/win_set_errhandler.c index d5e87cff779..e1e6f3e059d 100644 --- a/ompi/mpi/c/win_set_errhandler.c +++ b/ompi/mpi/c/win_set_errhandler.c @@ -11,9 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2009 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -63,11 +63,12 @@ int MPI_Win_set_errhandler(MPI_Win win, MPI_Errhandler errhandler) /* Prepare the new error handler */ OBJ_RETAIN(errhandler); - /* Ditch the old errhandler, and decrement its refcount. On 64 - bits environments we have to make sure the reading of the - error_handler became atomic. */ - tmp = OPAL_ATOMIC_SWAP_PTR(&win->error_handler, errhandler); + OPAL_THREAD_LOCK(&win->w_lock); + /* Ditch the old errhandler, and decrement its refcount. */ + tmp = win->error_handler; + win->error_handler = errhandler; OBJ_RELEASE(tmp); + OPAL_THREAD_UNLOCK(&win->w_lock); /* All done */ return MPI_SUCCESS; diff --git a/ompi/mpi/c/win_set_info.c b/ompi/mpi/c/win_set_info.c index 677488366c0..31eca8f378b 100644 --- a/ompi/mpi/c/win_set_info.c +++ b/ompi/mpi/c/win_set_info.c @@ -2,6 +2,7 @@ * Copyright (c) 2013 Sandia National Laboratories. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -15,6 +16,8 @@ #include "ompi/runtime/params.h" #include "ompi/errhandler/errhandler.h" #include "ompi/win/win.h" +#include "ompi/communicator/communicator.h" +#include "opal/util/info_subscriber.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS @@ -45,6 +48,7 @@ int MPI_Win_set_info(MPI_Win win, MPI_Info info) OPAL_CR_ENTER_LIBRARY(); - ret = win->w_osc_module->osc_set_info(win, info); + ret = opal_infosubscribe_change_info(&(win->super), &(info->super)); + OMPI_ERRHANDLER_RETURN(ret, win, ret, FUNC_NAME); } diff --git a/ompi/mpi/c/win_shared_query.c b/ompi/mpi/c/win_shared_query.c index 769103cdefe..0b456320f96 100644 --- a/ompi/mpi/c/win_shared_query.c +++ b/ompi/mpi/c/win_shared_query.c @@ -1,6 +1,6 @@ /* * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -39,7 +39,7 @@ int MPI_Win_shared_query(MPI_Win win, int rank, MPI_Aint *size, int *disp_unit, if (ompi_win_invalid(win)) { return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_WIN, FUNC_NAME); - } else if (ompi_win_peer_invalid(win, rank)) { + } else if (MPI_PROC_NULL != rank && ompi_win_peer_invalid(win, rank)) { return OMPI_ERRHANDLER_INVOKE(win, MPI_ERR_RANK, FUNC_NAME); } } diff --git a/ompi/mpi/c/win_unlock.c b/ompi/mpi/c/win_unlock.c index b32e9a7858f..c97bafc49d5 100644 --- a/ompi/mpi/c/win_unlock.c +++ b/ompi/mpi/c/win_unlock.c @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -54,9 +54,6 @@ int MPI_Win_unlock(int rank, MPI_Win win) } } - /* NTH: do not bother keeping track of unlocking MPI_PROC_NULL. */ - if (MPI_PROC_NULL == rank) return MPI_SUCCESS; - OPAL_CR_ENTER_LIBRARY(); rc = win->w_osc_module->osc_unlock(rank, win); diff --git a/ompi/mpi/c/wtick.c b/ompi/mpi/c/wtick.c index 9f4795f192c..a246288e777 100644 --- a/ompi/mpi/c/wtick.c +++ b/ompi/mpi/c/wtick.c @@ -12,6 +12,9 @@ * Copyright (c) 2007-2014 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -24,6 +27,9 @@ #include #endif #include +#ifdef HAVE_TIME_H +#include +#endif #include MCA_timer_IMPLEMENTATION_HEADER #include "ompi/mpi/c/bindings.h" @@ -40,6 +46,11 @@ double MPI_Wtick(void) { OPAL_CR_NOOP_PROGRESS(); + /* + * See https://github.com/open-mpi/ompi/issues/3003 + * to get an idea what's going on here. + */ +#if 0 #if OPAL_TIMER_CYCLE_NATIVE { opal_timer_t freq = opal_timer_base_get_freq(); @@ -52,8 +63,21 @@ double MPI_Wtick(void) } #elif OPAL_TIMER_USEC_NATIVE return 0.000001; +#endif +#else +#if defined(__linux__) && OPAL_HAVE_CLOCK_GETTIME + struct timespec spec; + double wtick = 0.0; + if (0 == clock_getres(CLOCK_MONOTONIC, &spec)){ + wtick = spec.tv_sec + spec.tv_nsec * 1.0e-09; + } else { + /* guess */ + wtick = 1.0e-09; + } + return wtick; #else /* Otherwise, we already return usec precision. */ return 0.000001; #endif +#endif } diff --git a/ompi/mpi/c/wtime.c b/ompi/mpi/c/wtime.c index fa62c985c71..e4c64d15558 100644 --- a/ompi/mpi/c/wtime.c +++ b/ompi/mpi/c/wtime.c @@ -2,7 +2,7 @@ * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -12,6 +12,9 @@ * Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -24,6 +27,9 @@ #include #endif #include +#ifdef HAVE_TIME_H +#include +#endif /* HAVE_TIME_H */ #include MCA_timer_IMPLEMENTATION_HEADER #include "ompi/mpi/c/bindings.h" @@ -34,22 +40,57 @@ #pragma weak MPI_Wtime = PMPI_Wtime #endif #define MPI_Wtime PMPI_Wtime +/** + * Have a base time set on the first call to wtime, to improve the range + * and accuracy of the user visible timer. + * More info: https://github.com/mpi-forum/mpi-issues/issues/77#issuecomment-369663119 + */ +#if defined(__linux__) && OPAL_HAVE_CLOCK_GETTIME +struct timespec ompi_wtime_time_origin = {.tv_sec = 0}; +#else +struct timeval ompi_wtime_time_origin = {.tv_sec = 0}; +#endif +#else /* OMPI_BUILD_MPI_PROFILING */ +#if defined(__linux__) && OPAL_HAVE_CLOCK_GETTIME +extern struct timespec ompi_wtime_time_origin; +#else +extern struct timeval ompi_wtime_time_origin; +#endif #endif double MPI_Wtime(void) { double wtime; + /* + * See https://github.com/open-mpi/ompi/issues/3003 to find out + * what's happening here. 
+ */ +#if 0 #if OPAL_TIMER_CYCLE_NATIVE wtime = ((double) opal_timer_base_get_cycles()) / opal_timer_base_get_freq(); #elif OPAL_TIMER_USEC_NATIVE wtime = ((double) opal_timer_base_get_usec()) / 1000000.0; +#endif +#else +#if defined(__linux__) && OPAL_HAVE_CLOCK_GETTIME + struct timespec tp; + (void) clock_gettime(CLOCK_MONOTONIC, &tp); + if( OPAL_UNLIKELY(0 == ompi_wtime_time_origin.tv_sec) ) { + ompi_wtime_time_origin = tp; + } + wtime = (double)(tp.tv_nsec - ompi_wtime_time_origin.tv_nsec)/1.0e+9; + wtime += (tp.tv_sec - ompi_wtime_time_origin.tv_sec); #else /* Fall back to gettimeofday() if we have nothing else */ struct timeval tv; gettimeofday(&tv, NULL); - wtime = tv.tv_sec; - wtime += (double)tv.tv_usec / 1000000.0; + if( OPAL_UNLIKELY(0 == ompi_wtime_time_origin.tv_sec) ) { + ompi_wtime_time_origin = tv; + } + wtime = (double)(tv.tv_usec - ompi_wtime_time_origin.tv_usec) / 1.0e+6; + wtime += (tv.tv_sec - ompi_wtime_time_origin.tv_sec); +#endif #endif OPAL_CR_NOOP_PROGRESS(); diff --git a/ompi/mpi/cxx/Makefile.am b/ompi/mpi/cxx/Makefile.am index e05d878118b..9abb4e6c9a0 100644 --- a/ompi/mpi/cxx/Makefile.am +++ b/ompi/mpi/cxx/Makefile.am @@ -12,6 +12,8 @@ # All rights reserved. # Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -36,14 +38,10 @@ lib@OMPI_LIBMPI_NAME@_cxx_la_SOURCES = \ intercepts.cc \ comm.cc \ datatype.cc \ + file.cc \ win.cc \ cxx_glue.c -if OMPI_PROVIDE_MPI_FILE_INTERFACE -lib@OMPI_LIBMPI_NAME@_cxx_la_SOURCES += \ - file.cc -endif - lib@OMPI_LIBMPI_NAME@_cxx_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la lib@OMPI_LIBMPI_NAME@_cxx_la_LDFLAGS = -version-info $(libmpi_cxx_so_version) diff --git a/ompi/mpi/cxx/constants.h b/ompi/mpi/cxx/constants.h index eb4a991626b..255853e7d28 100644 --- a/ompi/mpi/cxx/constants.h +++ b/ompi/mpi/cxx/constants.h @@ -12,6 +12,8 @@ // All rights reserved. // Copyright (c) 2008-2009 Cisco Systems, Inc. All rights reserved. // Copyright (c) 2011 FUJITSU LIMITED. All rights reserved. +// Copyright (c) 2017 Research Organization for Information Science +// and Technology (RIST). All rights reserved. // $COPYRIGHT$ // // Additional copyrights may follow @@ -246,9 +248,7 @@ OMPI_DECLSPEC extern const Datatype DATATYPE_NULL; OMPI_DECLSPEC extern Request REQUEST_NULL; OMPI_DECLSPEC extern const Op OP_NULL; OMPI_DECLSPEC extern const Errhandler ERRHANDLER_NULL; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE OMPI_DECLSPEC extern const File FILE_NULL; -#endif // constants specifying empty or ignored input OMPI_DECLSPEC extern const char** ARGV_NULL; @@ -261,7 +261,6 @@ OMPI_DECLSPEC extern const Group GROUP_EMPTY; static const int GRAPH = MPI_GRAPH; static const int CART = MPI_CART; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE // MPI-2 IO static const int MODE_CREATE = MPI_MODE_CREATE; static const int MODE_RDONLY = MPI_MODE_RDONLY; @@ -282,7 +281,6 @@ static const int SEEK_END = ::SEEK_END; #endif static const int MAX_DATAREP_STRING = MPI_MAX_DATAREP_STRING; -#endif // one-sided constants static const int MODE_NOCHECK = MPI_MODE_NOCHECK; diff --git a/ompi/mpi/cxx/cxx_glue.c b/ompi/mpi/cxx/cxx_glue.c index 76aa41be6c9..c67a7001e8c 100644 --- a/ompi/mpi/cxx/cxx_glue.c +++ b/ompi/mpi/cxx/cxx_glue.c @@ -2,7 
+2,7 @@ /* * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2016 Research Organization for Information Science + * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -49,12 +49,10 @@ int ompi_cxx_errhandler_invoke_comm (MPI_Comm comm, int ret, const char *message return OMPI_ERRHANDLER_INVOKE (comm, ret, message); } -#if OMPI_PROVIDE_MPI_FILE_INTERFACE int ompi_cxx_errhandler_invoke_file (MPI_File file, int ret, const char *message) { return OMPI_ERRHANDLER_INVOKE (file, ret, message); } -#endif int ompi_cxx_attr_create_keyval_comm (MPI_Comm_copy_attr_function *copy_fn, MPI_Comm_delete_attr_function* delete_fn, int *keyval, void *extra_state, @@ -114,7 +112,6 @@ MPI_Errhandler ompi_cxx_errhandler_create_win (ompi_cxx_dummy_fn_t *fn) return errhandler; } -#if OMPI_PROVIDE_MPI_FILE_INTERFACE MPI_Errhandler ompi_cxx_errhandler_create_file (ompi_cxx_dummy_fn_t *fn) { ompi_errhandler_t *errhandler; @@ -125,7 +122,6 @@ MPI_Errhandler ompi_cxx_errhandler_create_file (ompi_cxx_dummy_fn_t *fn) (ompi_errhandler_cxx_dispatch_fn_t *) ompi_mpi_cxx_file_errhandler_invoke; return errhandler; } -#endif ompi_cxx_intercept_file_extra_state_t *ompi_cxx_new_intercept_state (void *read_fn_cxx, void *write_fn_cxx, void *extent_fn_cxx, @@ -151,8 +147,6 @@ void ompi_cxx_errhandler_set_callbacks (struct ompi_errhandler_t *errhandler, MP ompi_file_errhandler_fn *eh_file_fn, MPI_Win_errhandler_function *eh_win_fn) { errhandler->eh_comm_fn = eh_comm_fn; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE errhandler->eh_file_fn = eh_file_fn; -#endif errhandler->eh_win_fn = eh_win_fn; } diff --git a/ompi/mpi/cxx/cxx_glue.h b/ompi/mpi/cxx/cxx_glue.h index 8cb906f9f79..52686b444ed 100644 --- a/ompi/mpi/cxx/cxx_glue.h +++ b/ompi/mpi/cxx/cxx_glue.h @@ -1,8 +1,8 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2016 Los Alamos National Security, 
LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2016 Research Organization for Information Science + * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -15,7 +15,6 @@ #define OMPI_CXX_COMM_GLUE_H #include "ompi_config.h" -#include "ompi/errhandler/errhandler.h" #include #include "mpi.h" @@ -67,11 +66,9 @@ void ompi_mpi_cxx_comm_errhandler_invoke (MPI_Comm *mpi_comm, int *err, const char *message, void *comm_fn); void ompi_mpi_cxx_win_errhandler_invoke (MPI_Win *mpi_comm, int *err, const char *message, void *win_fn); -#if OMPI_PROVIDE_MPI_FILE_INTERFACE int ompi_cxx_errhandler_invoke_file (MPI_File file, int ret, const char *message); void ompi_mpi_cxx_file_errhandler_invoke (MPI_File *mpi_comm, int *err, const char *message, void *file_fn); -#endif MPI_Errhandler ompi_cxx_errhandler_create_comm (ompi_cxx_dummy_fn_t *fn); MPI_Errhandler ompi_cxx_errhandler_create_win (ompi_cxx_dummy_fn_t *fn); @@ -81,9 +78,6 @@ ompi_cxx_intercept_file_extra_state_t *ompi_cxx_new_intercept_state (void *read_fn_cxx, void *write_fn_cxx, void *extent_fn_cxx, void *extra_state_cxx); -void ompi_cxx_errhandler_set_cxx_dispatch_fn (struct ompi_errhandler_t *errhandler, - ompi_errhandler_cxx_dispatch_fn_t *dispatch_fn); - void ompi_cxx_errhandler_set_callbacks (struct ompi_errhandler_t *errhandler, MPI_Comm_errhandler_function *eh_comm_fn, ompi_file_errhandler_fn *eh_file_fn, MPI_Win_errhandler_function *eh_win_fn); diff --git a/ompi/mpi/cxx/intercepts.cc b/ompi/mpi/cxx/intercepts.cc index 3695ae7579b..331074c1505 100644 --- a/ompi/mpi/cxx/intercepts.cc +++ b/ompi/mpi/cxx/intercepts.cc @@ -14,6 +14,8 @@ // Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. // Copyright (c) 2016 Los Alamos National Security, LLC. All rights // reserved. 
+// Copyright (c) 2017 Research Organization for Information Science +// and Technology (RIST). All rights reserved. // $COPYRIGHT$ // // Additional copyrights may follow @@ -54,7 +56,6 @@ void ompi_mpi_cxx_comm_throw_excptn_fctn(MPI_Comm *, int *errcode, ...) va_end(ap); } -#if OMPI_PROVIDE_MPI_FILE_INTERFACE extern "C" void ompi_mpi_cxx_file_throw_excptn_fctn(MPI_File *, int *errcode, ...) { @@ -63,7 +64,6 @@ void ompi_mpi_cxx_file_throw_excptn_fctn(MPI_File *, int *errcode, ...) ompi_mpi_cxx_throw_exception(errcode); va_end(ap); } -#endif extern "C" void ompi_mpi_cxx_win_throw_excptn_fctn(MPI_Win *, int *errcode, ...) @@ -80,11 +80,7 @@ MPI::InitializeIntercepts() { ompi_cxx_errhandler_set_callbacks ((struct ompi_errhandler_t *) &ompi_mpi_errors_throw_exceptions, ompi_mpi_cxx_comm_throw_excptn_fctn, -#if OMPI_PROVIDE_MPI_FILE_INTERFACE ompi_mpi_cxx_file_throw_excptn_fctn, -#else - NULL, -#endif ompi_mpi_cxx_win_throw_excptn_fctn); } @@ -106,7 +102,6 @@ void ompi_mpi_cxx_comm_errhandler_invoke(MPI_Comm *c_comm, int *err, cxx_fn((MPI::Comm&) cxx_comm, err, message); } -#if OMPI_PROVIDE_MPI_FILE_INTERFACE // This function uses OMPI types, and is invoked with C linkage for // the express purpose of having a C++ entity call back the C++ // function (so that types can be converted, etc.). @@ -120,7 +115,6 @@ void ompi_mpi_cxx_file_errhandler_invoke(MPI_File *c_file, int *err, cxx_fn(cxx_file, err, message); } -#endif // This function uses OMPI types, and is invoked with C linkage for // the express purpose of having a C++ entity call back the C++ diff --git a/ompi/mpi/cxx/mpicxx.cc b/ompi/mpi/cxx/mpicxx.cc index bd5fb5d2158..7ee3ba3c368 100644 --- a/ompi/mpi/cxx/mpicxx.cc +++ b/ompi/mpi/cxx/mpicxx.cc @@ -13,6 +13,8 @@ // Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. // Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. // Copyright (c) 2011 FUJITSU LIMITED. All rights reserved. 
+// Copyright (c) 2017 Research Organization for Information Science +// and Technology (RIST). All rights reserved. // $COPYRIGHT$ // // Additional copyrights may follow @@ -152,9 +154,7 @@ const Datatype DATATYPE_NULL = MPI_DATATYPE_NULL; Request REQUEST_NULL = MPI_REQUEST_NULL; const Op OP_NULL = MPI_OP_NULL; const Errhandler ERRHANDLER_NULL; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE const File FILE_NULL = MPI_FILE_NULL; -#endif // constants specifying empty or ignored input const char** ARGV_NULL = (const char**) MPI_ARGV_NULL; diff --git a/ompi/mpi/cxx/mpicxx.h b/ompi/mpi/cxx/mpicxx.h index f182e15058f..551e823b6a7 100644 --- a/ompi/mpi/cxx/mpicxx.h +++ b/ompi/mpi/cxx/mpicxx.h @@ -15,6 +15,8 @@ // Copyright (c) 2011 FUJITSU LIMITED. All rights reserved. // Copyright (c) 2016 Los Alamos National Security, LLC. All rights // reserved. +// Copyright (c) 2017 Research Organization for Information Science +// and Technology (RIST). All rights reserved. // $COPYRIGHT$ // // Additional copyrights may follow @@ -42,7 +44,7 @@ #include -#if OMPI_PROVIDE_MPI_FILE_INTERFACE && !defined(OMPI_IGNORE_CXX_SEEK) & OMPI_WANT_MPI_CXX_SEEK +#if !defined(OMPI_IGNORE_CXX_SEEK) & OMPI_WANT_MPI_CXX_SEEK // We need to include the header files that define SEEK_* or use them // in ways that require them to be #defines so that if the user // includes them later, the double inclusion logic in the headers will @@ -175,9 +177,7 @@ namespace MPI { class Status; class Info; class Win; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE class File; -#endif typedef MPI_Aint Aint; typedef MPI_Fint Fint; @@ -207,9 +207,7 @@ namespace MPI { #include "ompi/mpi/cxx/group.h" #include "ompi/mpi/cxx/comm.h" #include "ompi/mpi/cxx/win.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE #include "ompi/mpi/cxx/file.h" -#endif #include "ompi/mpi/cxx/errhandler.h" #include "ompi/mpi/cxx/intracomm.h" #include "ompi/mpi/cxx/topology.h" //includes Cartcomm and Graphcomm @@ -223,9 +221,7 @@ namespace MPI { #include 
"openmpi/ompi/mpi/cxx/group.h" #include "openmpi/ompi/mpi/cxx/comm.h" #include "openmpi/ompi/mpi/cxx/win.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE #include "openmpi/ompi/mpi/cxx/file.h" -#endif #include "openmpi/ompi/mpi/cxx/errhandler.h" #include "openmpi/ompi/mpi/cxx/intracomm.h" #include "openmpi/ompi/mpi/cxx/topology.h" //includes Cartcomm and Graphcomm @@ -268,9 +264,7 @@ namespace MPI { #include "ompi/mpi/cxx/status_inln.h" #include "ompi/mpi/cxx/info_inln.h" #include "ompi/mpi/cxx/win_inln.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE #include "ompi/mpi/cxx/file_inln.h" -#endif #else #include "openmpi/ompi/mpi/cxx/datatype_inln.h" #include "openmpi/ompi/mpi/cxx/functions_inln.h" @@ -285,10 +279,8 @@ namespace MPI { #include "openmpi/ompi/mpi/cxx/status_inln.h" #include "openmpi/ompi/mpi/cxx/info_inln.h" #include "openmpi/ompi/mpi/cxx/win_inln.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE #include "openmpi/ompi/mpi/cxx/file_inln.h" #endif -#endif #endif // #if defined(c_plusplus) || defined(__cplusplus) #endif // #ifndef MPIPP_H_ diff --git a/ompi/mpi/cxx/status.h b/ompi/mpi/cxx/status.h index 872707890ff..614b93d2068 100644 --- a/ompi/mpi/cxx/status.h +++ b/ompi/mpi/cxx/status.h @@ -11,6 +11,8 @@ // Copyright (c) 2004-2005 The Regents of the University of California. // All rights reserved. // Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved. +// Copyright (c) 2017 Research Organization for Information Science +// and Technology (RIST). All rights reserved. 
// $COPYRIGHT$ // // Additional copyrights may follow @@ -25,9 +27,7 @@ class Status { #endif friend class MPI::Comm; //so I can access pmpi_status data member in comm.cc friend class MPI::Request; //and also from request.cc -#if OMPI_PROVIDE_MPI_FILE_INTERFACE friend class MPI::File; -#endif public: #if 0 /* OMPI_ENABLE_MPI_PROFILING */ diff --git a/ompi/mpi/fortran/base/Makefile.am b/ompi/mpi/fortran/base/Makefile.am index 35738b27a40..7109e453c47 100644 --- a/ompi/mpi/fortran/base/Makefile.am +++ b/ompi/mpi/fortran/base/Makefile.am @@ -10,7 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # $COPYRIGHT$ # @@ -45,7 +45,7 @@ libmpi_fortran_base_la_SOURCES = \ constants.h \ datarep.h \ fint_2_int.h \ - strings.h \ + fortran_base_strings.h \ attr_fn_f.c \ conversion_fn_null_f.c \ f90_accessors.c \ diff --git a/ompi/mpi/fortran/base/attr-fn-int-callback-interfaces.h b/ompi/mpi/fortran/base/attr-fn-int-callback-interfaces.h index 27c64cc6251..9bd5989bd46 100644 --- a/ompi/mpi/fortran/base/attr-fn-int-callback-interfaces.h +++ b/ompi/mpi/fortran/base/attr-fn-int-callback-interfaces.h @@ -4,8 +4,8 @@ ! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2013 Los Alamos National Security, LLC. All rights ! reserved. -! Copyright (c) 2015 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2015-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! Additional copyrights may follow @@ -84,35 +84,35 @@ interface !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
- subroutine MPI_TYPE_DUP_FN( oldtype, type_keyval, extra_state, & + subroutine MPI_TYPE_DUP_FN( datatype, type_keyval, extra_state, & attribute_val_in, attribute_val_out, & flag, ierr ) implicit none include 'mpif-config.h' - integer :: oldtype + integer :: datatype integer :: type_keyval integer(KIND=MPI_ADDRESS_KIND) :: extra_state, attribute_val_in, attribute_val_out logical :: flag integer :: ierr end subroutine MPI_TYPE_DUP_FN - subroutine MPI_TYPE_NULL_COPY_FN( type, type_keyval, extra_state, & + subroutine MPI_TYPE_NULL_COPY_FN( datatype, type_keyval, extra_state, & attribute_val_in, attribute_val_out, & flag, ierr ) implicit none include 'mpif-config.h' - integer :: type + integer :: datatype integer :: type_keyval integer(kind=MPI_ADDRESS_KIND) :: extra_state, attribute_val_in, attribute_val_out integer :: ierr logical :: flag end subroutine MPI_TYPE_NULL_COPY_FN - subroutine MPI_TYPE_NULL_DELETE_FN( type, type_keyval, attribute_val_out, & + subroutine MPI_TYPE_NULL_DELETE_FN( datatype, type_keyval, attribute_val_out, & extra_state, ierr ) implicit none include 'mpif-config.h' - integer :: type + integer :: datatype integer :: type_keyval integer(kind=MPI_ADDRESS_KIND) :: attribute_val_out, extra_state integer :: ierr diff --git a/ompi/mpi/fortran/base/fortran_base_strings.h b/ompi/mpi/fortran/base/fortran_base_strings.h new file mode 100644 index 00000000000..c1e4f7513e7 --- /dev/null +++ b/ompi/mpi/fortran/base/fortran_base_strings.h @@ -0,0 +1,133 @@ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2005 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. 
+ * All rights reserved. + * Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef OMPI_FORTRAN_BASE_STRINGS_H +#define OMPI_FORTRAN_BASE_STRINGS_H + +#include "ompi_config.h" + +BEGIN_C_DECLS + /** + * Convert a fortran string to a C string. + * + * @param fstr Fortran string + * @param len Fortran string length + * @param cstr Pointer to C string that will be created and returned + * + * @retval OMPI_SUCCESS upon success + * @retval OMPI_ERROR upon error + * + * This function is intended to be used in the MPI F77 bindings to + * convert fortran strings to C strings before invoking a back-end + * MPI C binding function. It will create a new C string and + * assign it to the cstr to return. The caller is responsible for + * eventually freeing the C string. + */ + OMPI_DECLSPEC int ompi_fortran_string_f2c(char *fstr, int len, char **cstr); + + /** + * Convert a C string to a fortran string. + * + * @param cstr C string + * @param fstr Fortran string (must already exist and be allocated) + * @param len Fortran string length + * + * @retval OMPI_SUCCESS upon success + * @retval OMPI_ERROR upon error + * + * This function is intended to be used in the MPI F77 bindings to + * convert C strings to fortran strings. It is assumed that the + * fortran string is already allocated and has a length of len. + */ + OMPI_DECLSPEC int ompi_fortran_string_c2f(char *cstr, char *fstr, int len); + + /** + * Convert an array of Fortran strings that are terminated with a + * blank line to an argv-style array of C strings. 
+ * + * @param farray Array of fortran strings + * @param string_len Length of each fortran string in the array + * @param advance Number of bytes to advance to get to the next string + * @param cargv Returned argv-style array of C strings + * + * @retval OMPI_SUCCESS upon success + * @retval OMPI_ERROR upon error + * + * This function is intended to be used in the MPI F77 bindings to + * convert arrays of fortran strings to argv-style arrays of C + * strings. The argv array will be allocated and returned; it is + * the caller's responsibility to invoke opal_argv_free() to free + * it later (or equivalent). + * + * For 1D Fortran string arrays, advance will == string_len. + * + * However, when this function is used (indirectly) for + * MPI_COMM_SPAWN_MULTIPLE, a 2D array of Fortran strings is + * converted to individual C-style argv vectors. In this case, + * Fortran will intertwine the strings of the different argv + * vectors in memory; the displacement between the beginning of 2 + * strings in a single argv vector is (string_len * + * number_of_argv_arrays). Hence, the advance parameter is used + * to specify this displacement. + */ + OMPI_DECLSPEC int ompi_fortran_argv_blank_f2c(char *farray, int string_len, + int advancex, char ***cargv); + + /** + * Convert an array of a specific number of Fortran strings to an + * argv-style array of C strings. + * + * @param farray Array of fortran strings + * @param farray_length Number of entries in the farray array + * @param string_len Length of each fortran string in the array + * @param advance Number of bytes to advance to get to the next string + * @param cargv Returned argv-style array of C strings + * + * @retval OMPI_SUCCESS upon success + * @retval OMPI_ERROR upon error + * + * This function is just like ompi_fortran_argv_blank_f2c(), + * except that it uses farray_length to determine the length of + * farray (vs. looking for a blank string to find the end of + * the array).
+ */ + OMPI_DECLSPEC int ompi_fortran_argv_count_f2c(char *farray, int farray_length, int string_len, + int advancex, char ***cargv); + + /** + * Convert an array of argvs to a C style array of argvs + * @param count Dimension of the array of argvs + * @param array Array of fortran argv + * @param len Length of Fortran array + * @param argv Returned C array of argvs + * + * This function is intended to be used in the MPI F77 bindings to + * convert arrays of fortran strings to argv-style arrays of C + * strings. The argv array will be allocated and returned; it is + * the caller's responsibility to invoke opal_argv_free() to free + * each element of the argv array and to call free() to deallocate the argv + * array itself. + */ + OMPI_DECLSPEC int ompi_fortran_multiple_argvs_f2c(int count, char *array, int len, + char ****argv); + +END_C_DECLS + + +#endif /* OMPI_FORTRAN_BASE_STRINGS_H */ diff --git a/ompi/mpi/fortran/base/gen-mpi-mangling.pl b/ompi/mpi/fortran/base/gen-mpi-mangling.pl index 96294f9fa9e..ab568b98ecd 100755 --- a/ompi/mpi/fortran/base/gen-mpi-mangling.pl +++ b/ompi/mpi/fortran/base/gen-mpi-mangling.pl @@ -77,20 +77,20 @@ $fortran->{argv_null} = { c_type => "char *", c_name => "mpi_fortran_argv_null", - f_type => "integer", + f_type => "character, dimension(1)", f_name => "MPI_ARGV_NULL", }; $fortran->{argvs_null} = { c_type => "char *", c_name => "mpi_fortran_argvs_null", - f_type => "integer", + f_type => "character, dimension(1, 1)", f_name => "MPI_ARGVS_NULL", }; $fortran->{errcodes_ignore} = { c_type => "int *", c_name => "mpi_fortran_errcodes_ignore", - f_type => "integer", + f_type => "integer, dimension(1)", f_name => "MPI_ERRCODES_IGNORE", }; $fortran->{status_ignore} = { diff --git a/ompi/mpi/fortran/base/strings.c b/ompi/mpi/fortran/base/strings.c index 1db122711b5..c8996afba6a 100644 --- a/ompi/mpi/fortran/base/strings.c +++ b/ompi/mpi/fortran/base/strings.c @@ -9,7 +9,9 @@ * University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -25,7 +27,7 @@ #include "ompi/constants.h" #include "opal/util/argv.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" /* @@ -99,11 +101,19 @@ int ompi_fortran_string_c2f(char *cstr, char *fstr, int len) /* - * creates a C argument vector from an F77 array of strings - * (terminated by a blank string) + * Creates a C argument vector from an F77 array of strings. The + * array is terminated by a blank string. + * + * This function is quite similar to ompi_fortran_argv_count_f2c(), + * except that it looks for a blank string to know when it has finished + * traversing the entire array (vs. having the length of the array + * passed in as a parameter). + * + * This function is used to convert "argv" in MPI_COMM_SPAWN (which is + * defined to be terminated by a blank string). */ -int ompi_fortran_argv_f2c(char *array, int string_len, int advance, - char ***argv) +int ompi_fortran_argv_blank_f2c(char *array, int string_len, int advance, + char ***argv) { int err, argc = 0; char *cstr; @@ -139,8 +149,52 @@ int ompi_fortran_argv_f2c(char *array, int string_len, int advance, /* - * Creates a set of C argv arrays from an F77 array of argv's. The - * returned arrays need to be freed by the caller. + * Creates a C argument vector from an F77 array of array_len strings. + * + * This function is quite similar to ompi_fortran_argv_blank_f2c(), + * except that the length of the array is a parameter (vs. looking for + * a blank line to end the array).
+ * + * This function is used to convert "array_of_commands" in + * MPI_COMM_SPAWN_MULTIPLE (which is not precisely defined, but is + * assumed to be of length "count", and *not* terminated by a blank + * line). + */ +int ompi_fortran_argv_count_f2c(char *array, int array_len, int string_len, int advance, + char ***argv) +{ + int err, argc = 0; + char *cstr; + + /* Convert exactly array_len Fortran strings, advancing by "advance" + bytes between them; no blank terminator string is expected. */ + + *argv = NULL; + for (int i = 0; i < array_len; ++i) { + if (OMPI_SUCCESS != (err = ompi_fortran_string_f2c(array, string_len, + &cstr))) { + opal_argv_free(*argv); + return err; + } + + if (OMPI_SUCCESS != (err = opal_argv_append(&argc, argv, cstr))) { + opal_argv_free(*argv); + free(cstr); + return err; + } + + free(cstr); + array += advance; + } + + return OMPI_SUCCESS; +} + + +/* + * Creates a set of C argv arrays from an F77 array of argv's (where + * each argv array is terminated by a blank string). The returned + * arrays need to be freed by the caller. */ int ompi_fortran_multiple_argvs_f2c(int num_argv_arrays, char *array, int string_len, char ****argv) @@ -153,9 +207,9 @@ int ompi_fortran_multiple_argvs_f2c(int num_argv_arrays, char *array, argv_array = (char ***) malloc (num_argv_arrays * sizeof(char **)); for (i = 0; i < num_argv_arrays; ++i) { - ret = ompi_fortran_argv_f2c(current_array, string_len, - string_len * num_argv_arrays, - &argv_array[i]); + ret = ompi_fortran_argv_blank_f2c(current_array, string_len, + string_len * num_argv_arrays, + &argv_array[i]); if (OMPI_SUCCESS != ret) { free(argv_array); return ret; diff --git a/ompi/mpi/fortran/base/strings.h b/ompi/mpi/fortran/base/strings.h deleted file mode 100644 index 98c3c868847..00000000000 --- a/ompi/mpi/fortran/base/strings.h +++ /dev/null @@ -1,112 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. 
All rights reserved.
- * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OMPI_FORTRAN_BASE_STRINGS_H -#define OMPI_FORTRAN_BASE_STRINGS_H - -#include "ompi_config.h" - -BEGIN_C_DECLS - /** - * Convert a fortran string to a C string. - * - * @param fstr Fortran string - * @param len Fortran string length - * @param cstr Pointer to C string that will be created and returned - * - * @retval OMPI_SUCCESS upon success - * @retval OMPI_ERROR upon error - * - * This function is intended to be used in the MPI F77 bindings to - * convert fortran strings to C strings before invoking a back-end - * MPI C binding function. It will create a new C string and - * assign it to the cstr to return. The caller is responsible for - * eventually freeing the C string. - */ - OMPI_DECLSPEC int ompi_fortran_string_f2c(char *fstr, int len, char **cstr); - - /** - * Convert a C string to a fortran string. - * - * @param cstr C string - * @param fstr Fortran string (must already exist and be allocated) - * @param len Fortran string length - * - * @retval OMPI_SUCCESS upon success - * @retval OMPI_ERROR upon error - * - * This function is intended to be used in the MPI F77 bindings to - * convert C strings to fortran strings. It is assumed that the - * fortran string is already allocated and has a length of len. - */ - OMPI_DECLSPEC int ompi_fortran_string_c2f(char *cstr, char *fstr, int len); - - /** - * Convert an array of Fortran strings to an argv-style array of C - * strings. 
- * - * @param farray Array of fortran strings - * @param string_len Length of each fortran string in the array - * @param advance Number of bytes to advance to get to the next string - * @param cargv Returned argv-style array of C strings - * - * @retval OMPI_SUCCESS upon success - * @retval OMPI_ERROR upon error - * - * This function is intented to be used in the MPI F77 bindings to - * convert arrays of fortran strings to argv-style arrays of C - * strings. The argv array will be allocated and returned; it is - * the caller's responsibility to invoke opal_argv_free() to free - * it later (or equivalent). - * - * For 1D Fortran string arrays, advance will == string_len. - * - * However, when this function is used (indirectly) for - * MPI_COMM_SPAWN_MULTIPLE, a 2D array of Fortran strings is - * converted to individual C-style argv vectors. In this case, - * Fortran will intertwine the strings of the different argv - * vectors in memory; the displacement between the beginning of 2 - * strings in a single argv vector is (string_len * - * number_of_argv_arrays). Hence, the advance parameter is used - * to specify this displacement. - */ - OMPI_DECLSPEC int ompi_fortran_argv_f2c(char *farray, int string_len, - int advancex, char ***cargv); - - /** - * Convert an array of argvs to a C style array of argvs - * @param count Dimension of the array of argvs - * @param array Array of fortran argv - * @param len Length of Fortran array - * @param argv Returned C arrray of argvs - * - * This function is intented to be used in the MPI F77 bindings to - * convert arrays of fortran strings to argv-style arrays of C - * strings. 
The argv array will be allocated and returned; it is - * the caller's responsibility to invoke opal_argv_free() to free - * each content of argv array and call free to deallocate the argv - * array itself - */ - OMPI_DECLSPEC int ompi_fortran_multiple_argvs_f2c(int count, char *array, int len, - char ****argv); - -END_C_DECLS - - -#endif /* OMPI_FORTRAN_BASE_STRINGS_H */ diff --git a/ompi/mpi/fortran/configure-fortran-output.h.in b/ompi/mpi/fortran/configure-fortran-output.h.in index 9f40f5344f3..7678966b530 100644 --- a/ompi/mpi/fortran/configure-fortran-output.h.in +++ b/ompi/mpi/fortran/configure-fortran-output.h.in @@ -3,6 +3,8 @@ ! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2017-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! ! $COPYRIGHT$ ! @@ -17,9 +19,6 @@ #ifndef OMPI_FORTRAN_CONFIGURE_OUTPUT_H #define OMPI_FORTRAN_CONFIGURE_OUTPUT_H -! Whether we're building the MPI IO interface or not -#define OMPI_PROVIDE_MPI_FILE_INTERFACE @OMPI_PROVIDE_MPI_FILE_INTERFACE@ - ! Whether we're using wrapper F08 functions or not #define OMPI_FORTRAN_NEED_WRAPPER_ROUTINES @OMPI_FORTRAN_NEED_WRAPPER_ROUTINES@ @@ -47,6 +46,8 @@ ! Line 2 of the ignore TKR syntax #define OMPI_FORTRAN_IGNORE_TKR_TYPE @OMPI_FORTRAN_IGNORE_TKR_TYPE@ + +#define OMPI_FORTRAN_BUILD_SIZEOF @OMPI_FORTRAN_BUILD_SIZEOF@ ! Integers #define OMPI_HAVE_FORTRAN_INTEGER1 @OMPI_HAVE_FORTRAN_INTEGER1@ diff --git a/ompi/mpi/fortran/mpiext/Makefile.am b/ompi/mpi/fortran/mpiext/Makefile.am index 542e7d47e19..869c2358152 100644 --- a/ompi/mpi/fortran/mpiext/Makefile.am +++ b/ompi/mpi/fortran/mpiext/Makefile.am @@ -1,5 +1,7 @@ # -# Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). 
All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -18,6 +20,7 @@ if OMPI_BUILD_FORTRAN_USEMPI_OR_USEMPIF08_EXT AM_FCFLAGS = -I$(top_builddir)/ompi/include -I$(top_srcdir)/ompi/include \ $(OMPI_FC_MODULE_FLAG)$(top_builddir)/ompi/mpi/fortran/base \ + $(OMPI_FC_MODULE_FLAG)$(top_builddir)/ompi/mpi/fortran/use-mpi-f08/mod \ -I$(top_srcdir) $(FCFLAGS_f90) flibs = @@ -61,7 +64,7 @@ libforce_usempif08_module_to_be_built_la_SOURCES = mpi-f08-ext-module.F90 # manually here. Bummer! # -mpi_f08_ext.lo: $(top_builddir)/ompi/mpi/fortran/use-mpi-f08/mpi-f08-types.lo +mpi_f08_ext.lo: $(top_builddir)/ompi/mpi/fortran/use-mpi-f08-modules/mpi-f08-types.lo mpi_f08_ext.lo: mpi-f08-ext-module.F90 endif diff --git a/ompi/mpi/fortran/mpif-h/Makefile.am b/ompi/mpi/fortran/mpif-h/Makefile.am index 437adcb1228..7dc0131fafe 100644 --- a/ompi/mpi/fortran/mpif-h/Makefile.am +++ b/ompi/mpi/fortran/mpif-h/Makefile.am @@ -14,7 +14,7 @@ # Copyright (c) 2011-2013 Universite Bordeaux 1 # Copyright (c) 2013-2014 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ @@ -206,6 +206,65 @@ lib@OMPI_LIBMPI_NAME@_mpifh_la_SOURCES += \ error_string_f.c \ exscan_f.c \ f_sync_reg_f.c \ + file_call_errhandler_f.c \ + file_close_f.c \ + file_create_errhandler_f.c \ + file_delete_f.c \ + file_get_amode_f.c \ + file_get_atomicity_f.c \ + file_get_byte_offset_f.c \ + file_get_errhandler_f.c \ + file_get_group_f.c \ + file_get_info_f.c \ + file_get_position_f.c \ + file_get_position_shared_f.c \ + file_get_size_f.c \ + file_get_type_extent_f.c \ + file_get_view_f.c \ + file_iread_at_f.c \ + file_iread_f.c \ + file_iread_at_all_f.c \ + file_iread_all_f.c \ + file_iread_shared_f.c \ + file_iwrite_at_f.c \ + file_iwrite_f.c \ + file_iwrite_at_all_f.c \ + file_iwrite_all_f.c \ + file_iwrite_shared_f.c \ + file_open_f.c \ + file_preallocate_f.c \ + file_read_all_begin_f.c \ + file_read_all_end_f.c \ + file_read_all_f.c \ + file_read_at_all_begin_f.c \ + file_read_at_all_end_f.c \ + file_read_at_all_f.c \ + file_read_at_f.c \ + file_read_f.c \ + file_read_ordered_begin_f.c \ + file_read_ordered_end_f.c \ + file_read_ordered_f.c \ + file_read_shared_f.c \ + file_seek_f.c \ + file_seek_shared_f.c \ + file_set_atomicity_f.c \ + file_set_errhandler_f.c \ + file_set_info_f.c \ + file_set_size_f.c \ + file_set_view_f.c \ + file_sync_f.c \ + file_write_all_begin_f.c \ + file_write_all_end_f.c \ + file_write_all_f.c \ + file_write_at_all_begin_f.c \ + file_write_at_all_end_f.c \ + file_write_at_all_f.c \ + file_write_at_f.c \ + file_write_f.c \ + file_write_ordered_begin_f.c \ + file_write_ordered_end_f.c \ + file_write_ordered_f.c \ + file_write_shared_f.c \ finalized_f.c \ finalize_f.c \ free_mem_f.c \ @@ -311,6 +370,7 @@ lib@OMPI_LIBMPI_NAME@_mpifh_la_SOURCES += \ reduce_local_f.c \ reduce_scatter_f.c \ reduce_scatter_block_f.c \ + register_datarep_f.c \ request_free_f.c \ request_get_status_f.c \ rsend_f.c \ @@ -431,69 +491,6 @@ lib@OMPI_LIBMPI_NAME@_mpifh_la_SOURCES += \ win_flush_local_f.c \ win_flush_local_all_f.c -if 
OMPI_PROVIDE_MPI_FILE_INTERFACE -lib@OMPI_LIBMPI_NAME@_mpifh_la_SOURCES += \ - file_call_errhandler_f.c \ - file_close_f.c \ - file_create_errhandler_f.c \ - file_delete_f.c \ - file_get_amode_f.c \ - file_get_atomicity_f.c \ - file_get_byte_offset_f.c \ - file_get_errhandler_f.c \ - file_get_group_f.c \ - file_get_info_f.c \ - file_get_position_f.c \ - file_get_position_shared_f.c \ - file_get_size_f.c \ - file_get_type_extent_f.c \ - file_get_view_f.c \ - file_iread_at_f.c \ - file_iread_f.c \ - file_iread_at_all_f.c \ - file_iread_all_f.c \ - file_iread_shared_f.c \ - file_iwrite_at_f.c \ - file_iwrite_f.c \ - file_iwrite_at_all_f.c \ - file_iwrite_all_f.c \ - file_iwrite_shared_f.c \ - file_open_f.c \ - file_preallocate_f.c \ - file_read_all_begin_f.c \ - file_read_all_end_f.c \ - file_read_all_f.c \ - file_read_at_all_begin_f.c \ - file_read_at_all_end_f.c \ - file_read_at_all_f.c \ - file_read_at_f.c \ - file_read_f.c \ - file_read_ordered_begin_f.c \ - file_read_ordered_end_f.c \ - file_read_ordered_f.c \ - file_read_shared_f.c \ - file_seek_f.c \ - file_seek_shared_f.c \ - file_set_atomicity_f.c \ - file_set_errhandler_f.c \ - file_set_info_f.c \ - file_set_size_f.c \ - file_set_view_f.c \ - file_sync_f.c \ - file_write_all_begin_f.c \ - file_write_all_end_f.c \ - file_write_all_f.c \ - file_write_at_all_begin_f.c \ - file_write_at_all_end_f.c \ - file_write_at_all_f.c \ - file_write_at_f.c \ - file_write_f.c \ - file_write_ordered_begin_f.c \ - file_write_ordered_end_f.c \ - file_write_ordered_f.c \ - file_write_shared_f.c \ - register_datarep_f.c -endif endif # diff --git a/ompi/mpi/fortran/mpif-h/add_error_string_f.c b/ompi/mpi/fortran/mpif-h/add_error_string_f.c index 24a854dd338..bb95c144a9d 100644 --- a/ompi/mpi/fortran/mpif-h/add_error_string_f.c +++ b/ompi/mpi/fortran/mpif-h/add_error_string_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
* Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -23,7 +23,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/mpi/fortran/base/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/communicator/communicator.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/attr_get_f.c b/ompi/mpi/fortran/mpif-h/attr_get_f.c index 5e4ca187691..bc4c910ca94 100644 --- a/ompi/mpi/fortran/mpif-h/attr_get_f.c +++ b/ompi/mpi/fortran/mpif-h/attr_get_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -77,10 +77,10 @@ void ompi_attr_get_f(MPI_Fint *comm, MPI_Fint *keyval, /* This stuff is very confusing. Be sure to see the comment at the top of src/attributes/attributes.c. 
*/ - c_ierr = ompi_attr_get_fortran_mpi1(c_comm->c_keyhash, - OMPI_FINT_2_INT(*keyval), - attribute_val, - OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); + c_ierr = ompi_attr_get_fint(c_comm->c_keyhash, + OMPI_FINT_2_INT(*keyval), + attribute_val, + OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); OMPI_SINGLE_INT_2_LOGICAL(flag); diff --git a/ompi/mpi/fortran/mpif-h/attr_put_f.c b/ompi/mpi/fortran/mpif-h/attr_put_f.c index f4908704aa6..db45fc7e318 100644 --- a/ompi/mpi/fortran/mpif-h/attr_put_f.c +++ b/ompi/mpi/fortran/mpif-h/attr_put_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -76,11 +76,11 @@ void ompi_attr_put_f(MPI_Fint *comm, MPI_Fint *keyval, MPI_Fint *attribute_val, /* This stuff is very confusing. Be sure to see the comment at the top of src/attributes/attributes.c. */ - c_err = ompi_attr_set_fortran_mpi1(COMM_ATTR, - c_comm, - &c_comm->c_keyhash, - OMPI_FINT_2_INT(*keyval), - *attribute_val, - false); + c_err = ompi_attr_set_fint(COMM_ATTR, + c_comm, + &c_comm->c_keyhash, + OMPI_FINT_2_INT(*keyval), + *attribute_val, + false); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_err); } diff --git a/ompi/mpi/fortran/mpif-h/close_port_f.c b/ompi/mpi/fortran/mpif-h/close_port_f.c index eaf95750e55..434b33ac9b6 100644 --- a/ompi/mpi/fortran/mpif-h/close_port_f.c +++ b/ompi/mpi/fortran/mpif-h/close_port_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/comm_accept_f.c b/ompi/mpi/fortran/mpif-h/comm_accept_f.c index 257e2c3062b..2e25674bbb9 100644 --- a/ompi/mpi/fortran/mpif-h/comm_accept_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_accept_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/comm_connect_f.c b/ompi/mpi/fortran/mpif-h/comm_connect_f.c index 3acaaa62751..6e3092c6d0f 100644 --- a/ompi/mpi/fortran/mpif-h/comm_connect_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_connect_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/comm_create_keyval_f.c b/ompi/mpi/fortran/mpif-h/comm_create_keyval_f.c index 61ca83a48fb..4ed8f95e25f 100644 --- a/ompi/mpi/fortran/mpif-h/comm_create_keyval_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_create_keyval_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -39,7 +39,7 @@ OMPI_GENERATE_F77_BINDINGS (PMPI_COMM_CREATE_KEYVAL, pmpi_comm_create_keyval_, pmpi_comm_create_keyval__, pompi_comm_create_keyval_f, - (ompi_mpi2_fortran_copy_attr_function* comm_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), + (ompi_aint_copy_attr_function* comm_copy_attr_fn, ompi_aint_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), (comm_copy_attr_fn, comm_delete_attr_fn, comm_keyval, extra_state, ierr) ) #endif #endif @@ -59,7 +59,7 @@ OMPI_GENERATE_F77_BINDINGS (MPI_COMM_CREATE_KEYVAL, mpi_comm_create_keyval_, mpi_comm_create_keyval__, ompi_comm_create_keyval_f, - (ompi_mpi2_fortran_copy_attr_function* comm_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), + (ompi_aint_copy_attr_function* comm_copy_attr_fn, ompi_aint_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), 
(comm_copy_attr_fn, comm_delete_attr_fn, comm_keyval, extra_state, ierr) ) #else #define ompi_comm_create_keyval_f pompi_comm_create_keyval_f @@ -69,8 +69,8 @@ OMPI_GENERATE_F77_BINDINGS (MPI_COMM_CREATE_KEYVAL, static const char FUNC_NAME[] = "MPI_Comm_create_keyval_f"; -void ompi_comm_create_keyval_f(ompi_mpi2_fortran_copy_attr_function* comm_copy_attr_fn, - ompi_mpi2_fortran_delete_attr_function* comm_delete_attr_fn, +void ompi_comm_create_keyval_f(ompi_aint_copy_attr_function* comm_copy_attr_fn, + ompi_aint_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr) { @@ -79,8 +79,8 @@ void ompi_comm_create_keyval_f(ompi_mpi2_fortran_copy_attr_function* comm_copy_a ompi_attribute_fn_ptr_union_t copy_fn; ompi_attribute_fn_ptr_union_t del_fn; - copy_fn.attr_mpi2_fortran_copy_fn = comm_copy_attr_fn; - del_fn.attr_mpi2_fortran_delete_fn = comm_delete_attr_fn; + copy_fn.attr_aint_copy_fn = comm_copy_attr_fn; + del_fn.attr_aint_delete_fn = comm_delete_attr_fn; /* Note that we only set the "F77" bit and exclude the "F77_OLD" bit, indicating that the callbacks should use the new MPI-2 diff --git a/ompi/mpi/fortran/mpif-h/comm_get_attr_f.c b/ompi/mpi/fortran/mpif-h/comm_get_attr_f.c index d5570d8bf11..1253256e941 100644 --- a/ompi/mpi/fortran/mpif-h/comm_get_attr_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_get_attr_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -78,10 +78,10 @@ void ompi_comm_get_attr_f(MPI_Fint *comm, MPI_Fint *comm_keyval, /* This stuff is very confusing. Be sure to see the comment at the top of src/attributes/attributes.c. 
*/ - c_ierr = ompi_attr_get_fortran_mpi2(c_comm->c_keyhash, - OMPI_FINT_2_INT(*comm_keyval), - attribute_val, - OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); + c_ierr = ompi_attr_get_aint(c_comm->c_keyhash, + OMPI_FINT_2_INT(*comm_keyval), + attribute_val, + OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); OMPI_SINGLE_INT_2_LOGICAL(flag); diff --git a/ompi/mpi/fortran/mpif-h/comm_get_name_f.c b/ompi/mpi/fortran/mpif-h/comm_get_name_f.c index af600628211..59d2808d441 100644 --- a/ompi/mpi/fortran/mpif-h/comm_get_name_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_get_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" diff --git a/ompi/mpi/fortran/mpif-h/comm_set_attr_f.c b/ompi/mpi/fortran/mpif-h/comm_set_attr_f.c index 79d14c7126e..ad85ab671df 100644 --- a/ompi/mpi/fortran/mpif-h/comm_set_attr_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_set_attr_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -76,11 +76,11 @@ void ompi_comm_set_attr_f(MPI_Fint *comm, MPI_Fint *comm_keyval, /* This stuff is very confusing. 
Be sure to see the comment at the top of src/attributes/attributes.c. */ - c_ierr = ompi_attr_set_fortran_mpi2(COMM_ATTR, - c_comm, - &c_comm->c_keyhash, - OMPI_FINT_2_INT(*comm_keyval), - *attribute_val, - false); + c_ierr = ompi_attr_set_aint(COMM_ATTR, + c_comm, + &c_comm->c_keyhash, + OMPI_FINT_2_INT(*comm_keyval), + *attribute_val, + false); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); } diff --git a/ompi/mpi/fortran/mpif-h/comm_set_name_f.c b/ompi/mpi/fortran/mpif-h/comm_set_name_f.c index 1bbfed6a779..6dbffcc9928 100644 --- a/ompi/mpi/fortran/mpif-h/comm_set_name_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_set_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/comm_spawn_f.c b/ompi/mpi/fortran/mpif-h/comm_spawn_f.c index 2ad50ec7215..17c290e561d 100644 --- a/ompi/mpi/fortran/mpif-h/comm_spawn_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_spawn_f.c @@ -9,8 +9,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). 
All rights reserved. * $COPYRIGHT$ * @@ -23,7 +23,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/mpi/fortran/base/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "opal/util/argv.h" #if OMPI_BUILD_MPI_PROFILING @@ -101,7 +101,7 @@ void ompi_comm_spawn_f(char *command, char *argv, MPI_Fint *maxprocs, if (OMPI_IS_FORTRAN_ARGV_NULL(argv)) { c_argv = MPI_ARGV_NULL; } else { - ompi_fortran_argv_f2c(argv, string_len, string_len, &c_argv); + ompi_fortran_argv_blank_f2c(argv, string_len, string_len, &c_argv); } c_ierr = PMPI_Comm_spawn(c_command, c_argv, diff --git a/ompi/mpi/fortran/mpif-h/comm_spawn_multiple_f.c b/ompi/mpi/fortran/mpif-h/comm_spawn_multiple_f.c index 867934e138a..c4b2d4270dd 100644 --- a/ompi/mpi/fortran/mpif-h/comm_spawn_multiple_f.c +++ b/ompi/mpi/fortran/mpif-h/comm_spawn_multiple_f.c @@ -9,8 +9,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. 
@@ -25,7 +25,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/mpi/fortran/base/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "opal/util/argv.h" @@ -115,8 +115,8 @@ void ompi_comm_spawn_multiple_f(MPI_Fint *count, char *array_commands, OMPI_ARRAY_FINT_2_INT(array_maxprocs, array_size); - ompi_fortran_argv_f2c(array_commands, cmd_string_len, - cmd_string_len, &c_array_commands); + ompi_fortran_argv_count_f2c(array_commands, array_size, cmd_string_len, + cmd_string_len, &c_array_commands); c_info = (MPI_Info *) malloc (array_size * sizeof(MPI_Info)); for (i = 0; i < array_size; ++i) { diff --git a/ompi/mpi/fortran/mpif-h/error_string_f.c b/ompi/mpi/fortran/mpif-h/error_string_f.c index 2462a051f30..7b5f10f9eb6 100644 --- a/ompi/mpi/fortran/mpif-h/error_string_f.c +++ b/ompi/mpi/fortran/mpif-h/error_string_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" diff --git a/ompi/mpi/fortran/mpif-h/file_delete_f.c b/ompi/mpi/fortran/mpif-h/file_delete_f.c index 8c566470802..36a6179f0c7 100644 --- a/ompi/mpi/fortran/mpif-h/file_delete_f.c +++ b/ompi/mpi/fortran/mpif-h/file_delete_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/file/file.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/file_get_view_f.c b/ompi/mpi/fortran/mpif-h/file_get_view_f.c index b5acefea4e3..4543337b119 100644 --- a/ompi/mpi/fortran/mpif-h/file_get_view_f.c +++ b/ompi/mpi/fortran/mpif-h/file_get_view_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/file/file.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/file_open_f.c b/ompi/mpi/fortran/mpif-h/file_open_f.c index eb144c6238d..8049987dda4 100644 --- a/ompi/mpi/fortran/mpif-h/file_open_f.c +++ b/ompi/mpi/fortran/mpif-h/file_open_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/file/file.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/file_set_view_f.c b/ompi/mpi/fortran/mpif-h/file_set_view_f.c index 69ced3e734f..5e301d2d698 100644 --- a/ompi/mpi/fortran/mpif-h/file_set_view_f.c +++ b/ompi/mpi/fortran/mpif-h/file_set_view_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/file/file.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/get_library_version_f.c b/ompi/mpi/fortran/mpif-h/get_library_version_f.c index a10966a0d25..429eee154d4 100644 --- a/ompi/mpi/fortran/mpif-h/get_library_version_f.c +++ b/ompi/mpi/fortran/mpif-h/get_library_version_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/get_processor_name_f.c b/ompi/mpi/fortran/mpif-h/get_processor_name_f.c index 1f36f671eec..db420f8c88d 100644 --- a/ompi/mpi/fortran/mpif-h/get_processor_name_f.c +++ b/ompi/mpi/fortran/mpif-h/get_processor_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/info_delete_f.c b/ompi/mpi/fortran/mpif-h/info_delete_f.c index 4197a53f0d0..08e3156a43a 100644 --- a/ompi/mpi/fortran/mpif-h/info_delete_f.c +++ b/ompi/mpi/fortran/mpif-h/info_delete_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/info_get_f.c b/ompi/mpi/fortran/mpif-h/info_get_f.c index 48082786fb4..8fa6eb0e7b2 100644 --- a/ompi/mpi/fortran/mpif-h/info_get_f.c +++ b/ompi/mpi/fortran/mpif-h/info_get_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/info_get_nthkey_f.c b/ompi/mpi/fortran/mpif-h/info_get_nthkey_f.c index 31fdcdc24b5..ecfd3e12ff8 100644 --- a/ompi/mpi/fortran/mpif-h/info_get_nthkey_f.c +++ b/ompi/mpi/fortran/mpif-h/info_get_nthkey_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/info_get_valuelen_f.c b/ompi/mpi/fortran/mpif-h/info_get_valuelen_f.c index 2b2b68567a7..335514d746a 100644 --- a/ompi/mpi/fortran/mpif-h/info_get_valuelen_f.c +++ b/ompi/mpi/fortran/mpif-h/info_get_valuelen_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/info_set_f.c b/ompi/mpi/fortran/mpif-h/info_set_f.c index a6eca5722e5..f08e8a29544 100644 --- a/ompi/mpi/fortran/mpif-h/info_set_f.c +++ b/ompi/mpi/fortran/mpif-h/info_set_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -24,7 +24,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/keyval_create_f.c b/ompi/mpi/fortran/mpif-h/keyval_create_f.c index 3fa0515381d..bce528b8c67 100644 --- a/ompi/mpi/fortran/mpif-h/keyval_create_f.c +++ b/ompi/mpi/fortran/mpif-h/keyval_create_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -39,7 +39,7 @@ OMPI_GENERATE_F77_BINDINGS (PMPI_KEYVAL_CREATE, pmpi_keyval_create_, pmpi_keyval_create__, pompi_keyval_create_f, - (ompi_mpi1_fortran_copy_attr_function* copy_fn, ompi_mpi1_fortran_delete_attr_function* delete_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr), + (ompi_fint_copy_attr_function* copy_fn, ompi_fint_delete_attr_function* delete_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr), (copy_fn, delete_fn, keyval, extra_state, ierr) ) #endif #endif @@ -59,7 +59,7 @@ OMPI_GENERATE_F77_BINDINGS (MPI_KEYVAL_CREATE, mpi_keyval_create_, mpi_keyval_create__, ompi_keyval_create_f, - (ompi_mpi1_fortran_copy_attr_function* copy_fn, ompi_mpi1_fortran_delete_attr_function* delete_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr), + (ompi_fint_copy_attr_function* copy_fn, ompi_fint_delete_attr_function* delete_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr), (copy_fn, delete_fn, keyval, extra_state, ierr) ) #else #define ompi_keyval_create_f pompi_keyval_create_f @@ -68,8 +68,8 @@ 
OMPI_GENERATE_F77_BINDINGS (MPI_KEYVAL_CREATE, static const char FUNC_NAME[] = "MPI_keyval_create_f"; -void ompi_keyval_create_f(ompi_mpi1_fortran_copy_attr_function* copy_attr_fn, - ompi_mpi1_fortran_delete_attr_function* delete_attr_fn, +void ompi_keyval_create_f(ompi_fint_copy_attr_function* copy_attr_fn, + ompi_fint_delete_attr_function* delete_attr_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr) { @@ -78,8 +78,8 @@ void ompi_keyval_create_f(ompi_mpi1_fortran_copy_attr_function* copy_attr_fn, ompi_attribute_fn_ptr_union_t copy_fn; ompi_attribute_fn_ptr_union_t del_fn; - copy_fn.attr_mpi1_fortran_copy_fn = copy_attr_fn; - del_fn.attr_mpi1_fortran_delete_fn = delete_attr_fn; + copy_fn.attr_fint_copy_fn = copy_attr_fn; + del_fn.attr_fint_delete_fn = delete_attr_fn; /* Set the "F77_OLD" bit to denote that the callbacks should use the old MPI-1 INTEGER-parameter functions (as opposed to the @@ -88,7 +88,7 @@ void ompi_keyval_create_f(ompi_mpi1_fortran_copy_attr_function* copy_attr_fn, ret = ompi_attr_create_keyval_fint(COMM_ATTR, copy_fn, del_fn, OMPI_SINGLE_NAME_CONVERT(keyval), *extra_state, - OMPI_KEYVAL_F77 | OMPI_KEYVAL_F77_MPI1, + OMPI_KEYVAL_F77, NULL); if (MPI_SUCCESS != ret) { diff --git a/ompi/mpi/fortran/mpif-h/lookup_name_f.c b/ompi/mpi/fortran/mpif-h/lookup_name_f.c index 766361e809f..3f17c626ea9 100644 --- a/ompi/mpi/fortran/mpif-h/lookup_name_f.c +++ b/ompi/mpi/fortran/mpif-h/lookup_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/open_port_f.c b/ompi/mpi/fortran/mpif-h/open_port_f.c index 167bf055506..60f0c553275 100644 --- a/ompi/mpi/fortran/mpif-h/open_port_f.c +++ b/ompi/mpi/fortran/mpif-h/open_port_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/pack_external_f.c b/ompi/mpi/fortran/mpif-h/pack_external_f.c index 461211064ef..3367761ee6c 100644 --- a/ompi/mpi/fortran/mpif-h/pack_external_f.c +++ b/ompi/mpi/fortran/mpif-h/pack_external_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -25,7 +25,7 @@ #include "ompi/constants.h" #include "ompi/communicator/communicator.h" #include "ompi/mpi/fortran/base/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/pack_external_size_f.c b/ompi/mpi/fortran/mpif-h/pack_external_size_f.c index 8e9913acdaf..5937b4ee200 100644 --- a/ompi/mpi/fortran/mpif-h/pack_external_size_f.c +++ b/ompi/mpi/fortran/mpif-h/pack_external_size_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -25,7 +25,7 @@ #include "ompi/constants.h" #include "ompi/communicator/communicator.h" #include "ompi/mpi/fortran/base/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/profile/Makefile.am b/ompi/mpi/fortran/mpif-h/profile/Makefile.am index bfc22b2286f..790c660e377 100644 --- a/ompi/mpi/fortran/mpif-h/profile/Makefile.am +++ b/ompi/mpi/fortran/mpif-h/profile/Makefile.am @@ -15,7 +15,7 @@ # Copyright (c) 2011-2013 Universite Bordeaux 1 # Copyright (c) 2013-2014 Los Alamos National Security, LLC. All rights # reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # @@ -122,6 +122,65 @@ linked_files = \ perror_string_f.c \ pexscan_f.c \ pf_sync_reg_f.c \ + pfile_call_errhandler_f.c \ + pfile_close_f.c \ + pfile_create_errhandler_f.c \ + pfile_delete_f.c \ + pfile_get_amode_f.c \ + pfile_get_atomicity_f.c \ + pfile_get_byte_offset_f.c \ + pfile_get_errhandler_f.c \ + pfile_get_group_f.c \ + pfile_get_info_f.c \ + pfile_get_position_f.c \ + pfile_get_position_shared_f.c \ + pfile_get_size_f.c \ + pfile_get_type_extent_f.c \ + pfile_get_view_f.c \ + pfile_iread_at_f.c \ + pfile_iread_f.c \ + pfile_iread_at_all_f.c \ + pfile_iread_all_f.c \ + pfile_iread_shared_f.c \ + pfile_iwrite_at_f.c \ + pfile_iwrite_f.c \ + pfile_iwrite_at_all_f.c \ + pfile_iwrite_all_f.c \ + pfile_iwrite_shared_f.c \ + pfile_open_f.c \ + pfile_preallocate_f.c \ + pfile_read_all_begin_f.c \ + pfile_read_all_end_f.c \ + pfile_read_all_f.c \ + pfile_read_at_all_begin_f.c \ + pfile_read_at_all_end_f.c \ + pfile_read_at_all_f.c \ + pfile_read_at_f.c \ + pfile_read_f.c \ + pfile_read_ordered_begin_f.c \ + pfile_read_ordered_end_f.c \ + pfile_read_ordered_f.c \ + pfile_read_shared_f.c \ + pfile_seek_f.c \ + pfile_seek_shared_f.c \ + pfile_set_atomicity_f.c \ + pfile_set_errhandler_f.c \ + pfile_set_info_f.c \ + pfile_set_size_f.c \ + pfile_set_view_f.c \ + pfile_sync_f.c \ + pfile_write_all_begin_f.c \ + pfile_write_all_end_f.c \ + pfile_write_all_f.c \ + pfile_write_at_all_begin_f.c \ + pfile_write_at_all_end_f.c \ + pfile_write_at_all_f.c \ + pfile_write_at_f.c \ + pfile_write_f.c \ + pfile_write_ordered_begin_f.c \ + pfile_write_ordered_end_f.c \ + pfile_write_ordered_f.c \ + pfile_write_shared_f.c \ pfinalized_f.c \ pfinalize_f.c \ pfree_mem_f.c \ @@ -301,6 +360,7 @@ linked_files = \ pwtime_f.c \ paccumulate_f.c \ praccumulate_f.c \ + pregister_datarep_f.c \ pget_f.c \ prget_f.c \ pget_accumulate_f.c \ @@ -347,71 +407,6 @@ linked_files = \ pwin_flush_local_f.c \ pwin_flush_local_all_f.c - -if OMPI_PROVIDE_MPI_FILE_INTERFACE -linked_files += \ - 
pfile_call_errhandler_f.c \ - pfile_close_f.c \ - pfile_create_errhandler_f.c \ - pfile_delete_f.c \ - pfile_get_amode_f.c \ - pfile_get_atomicity_f.c \ - pfile_get_byte_offset_f.c \ - pfile_get_errhandler_f.c \ - pfile_get_group_f.c \ - pfile_get_info_f.c \ - pfile_get_position_f.c \ - pfile_get_position_shared_f.c \ - pfile_get_size_f.c \ - pfile_get_type_extent_f.c \ - pfile_get_view_f.c \ - pfile_iread_at_f.c \ - pfile_iread_f.c \ - pfile_iread_at_all_f.c \ - pfile_iread_all_f.c \ - pfile_iread_shared_f.c \ - pfile_iwrite_at_f.c \ - pfile_iwrite_f.c \ - pfile_iwrite_at_all_f.c \ - pfile_iwrite_all_f.c \ - pfile_iwrite_shared_f.c \ - pfile_open_f.c \ - pfile_preallocate_f.c \ - pfile_read_all_begin_f.c \ - pfile_read_all_end_f.c \ - pfile_read_all_f.c \ - pfile_read_at_all_begin_f.c \ - pfile_read_at_all_end_f.c \ - pfile_read_at_all_f.c \ - pfile_read_at_f.c \ - pfile_read_f.c \ - pfile_read_ordered_begin_f.c \ - pfile_read_ordered_end_f.c \ - pfile_read_ordered_f.c \ - pfile_read_shared_f.c \ - pfile_seek_f.c \ - pfile_seek_shared_f.c \ - pfile_set_atomicity_f.c \ - pfile_set_errhandler_f.c \ - pfile_set_info_f.c \ - pfile_set_size_f.c \ - pfile_set_view_f.c \ - pfile_sync_f.c \ - pfile_write_all_begin_f.c \ - pfile_write_all_end_f.c \ - pfile_write_all_f.c \ - pfile_write_at_all_begin_f.c \ - pfile_write_at_all_end_f.c \ - pfile_write_at_all_f.c \ - pfile_write_at_f.c \ - pfile_write_f.c \ - pfile_write_ordered_begin_f.c \ - pfile_write_ordered_end_f.c \ - pfile_write_ordered_f.c \ - pfile_write_shared_f.c \ - pregister_datarep_f.c -endif - # # Sym link in the sources from the real MPI directory # diff --git a/ompi/mpi/fortran/mpif-h/prototypes_mpi.h b/ompi/mpi/fortran/mpif-h/prototypes_mpi.h index 1241e422e16..6a664e9bd2f 100644 --- a/ompi/mpi/fortran/mpif-h/prototypes_mpi.h +++ b/ompi/mpi/fortran/mpif-h/prototypes_mpi.h @@ -14,6 +14,8 @@ * Copyright (c) 2011-2013 Universite Bordeaux 1 * Copyright (c) 2013-2015 Los Alamos National Security, LLC. 
All rights * reserved. + * Copyright (c) 2016-2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -120,7 +122,7 @@ PN2(void, MPI_Comm_call_errhandler, mpi_comm_call_errhandler, MPI_COMM_CALL_ERRH PN2(void, MPI_Comm_compare, mpi_comm_compare, MPI_COMM_COMPARE, (MPI_Fint *comm1, MPI_Fint *comm2, MPI_Fint *result, MPI_Fint *ierr)); PN2(void, MPI_Comm_connect, mpi_comm_connect, MPI_COMM_CONNECT, (char *port_name, MPI_Fint *info, MPI_Fint *root, MPI_Fint *comm, MPI_Fint *newcomm, MPI_Fint *ierr, int port_name_len)); PN2(void, MPI_Comm_create_errhandler, mpi_comm_create_errhandler, MPI_COMM_CREATE_ERRHANDLER, (ompi_errhandler_fortran_handler_fn_t* function, MPI_Fint *errhandler, MPI_Fint *ierr)); -PN2(void, MPI_Comm_create_keyval, mpi_comm_create_keyval, MPI_COMM_CREATE_KEYVAL, (ompi_mpi2_fortran_copy_attr_function* comm_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr)); +PN2(void, MPI_Comm_create_keyval, mpi_comm_create_keyval, MPI_COMM_CREATE_KEYVAL, (ompi_aint_copy_attr_function* comm_copy_attr_fn, ompi_aint_delete_attr_function* comm_delete_attr_fn, MPI_Fint *comm_keyval, MPI_Aint *extra_state, MPI_Fint *ierr)); PN2(void, MPI_Comm_create, mpi_comm_create, MPI_COMM_CREATE, (MPI_Fint *comm, MPI_Fint *group, MPI_Fint *newcomm, MPI_Fint *ierr)); PN2(void, MPI_Comm_create_group, mpi_comm_create_group, MPI_COMM_CREATE_GROUP, (MPI_Fint *comm, MPI_Fint *group, MPI_Fint *tag, MPI_Fint *newcomm, MPI_Fint *ierr)); PN2(void, MPI_Comm_delete_attr, mpi_comm_delete_attr, MPI_COMM_DELETE_ATTR, (MPI_Fint *comm, MPI_Fint *comm_keyval, MPI_Fint *ierr)); @@ -303,7 +305,7 @@ PN2(void, MPI_Irsend, mpi_irsend, MPI_IRSEND, (char *buf, MPI_Fint *count, MPI_F PN2(void, MPI_Isend, mpi_isend, MPI_ISEND, (char *buf, MPI_Fint *count, MPI_Fint *datatype, MPI_Fint *dest, MPI_Fint *tag, MPI_Fint *comm, MPI_Fint 
*request, MPI_Fint *ierr)); PN2(void, MPI_Issend, mpi_issend, MPI_ISSEND, (char *buf, MPI_Fint *count, MPI_Fint *datatype, MPI_Fint *dest, MPI_Fint *tag, MPI_Fint *comm, MPI_Fint *request, MPI_Fint *ierr)); PN2(void, MPI_Is_thread_main, mpi_is_thread_main, MPI_IS_THREAD_MAIN, (ompi_fortran_logical_t *flag, MPI_Fint *ierr)); -PN2(void, MPI_Keyval_create, mpi_keyval_create, MPI_KEYVAL_CREATE, (ompi_mpi1_fortran_copy_attr_function* copy_fn, ompi_mpi1_fortran_delete_attr_function* delete_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr)); +PN2(void, MPI_Keyval_create, mpi_keyval_create, MPI_KEYVAL_CREATE, (ompi_fint_copy_attr_function* copy_fn, ompi_fint_delete_attr_function* delete_fn, MPI_Fint *keyval, MPI_Fint *extra_state, MPI_Fint *ierr)); PN2(void, MPI_Keyval_free, mpi_keyval_free, MPI_KEYVAL_FREE, (MPI_Fint *keyval, MPI_Fint *ierr)); PN2(void, MPI_Lookup_name, mpi_lookup_name, MPI_LOOKUP_NAME, (char *service_name, MPI_Fint *info, char *port_name, MPI_Fint *ierr, int service_name_len, int port_name_len)); PN2(void, MPI_Mprobe, mpi_mprobe, MPI_MPROBE, (MPI_Fint *source, MPI_Fint *tag, MPI_Fint *comm, MPI_Fint *message, MPI_Fint *status, MPI_Fint *ierr)); @@ -369,7 +371,7 @@ PN2(void, MPI_Type_create_f90_integer, mpi_type_create_f90_integer, MPI_TYPE_CRE PN2(void, MPI_Type_create_f90_real, mpi_type_create_f90_real, MPI_TYPE_CREATE_F90_REAL, (MPI_Fint *p, MPI_Fint *r, MPI_Fint *newtype, MPI_Fint *ierr)); PN2(void, MPI_Type_create_hindexed, mpi_type_create_hindexed, MPI_TYPE_CREATE_HINDEXED, (MPI_Fint *count, MPI_Fint *array_of_blocklengths, MPI_Aint *array_of_displacements, MPI_Fint *oldtype, MPI_Fint *newtype, MPI_Fint *ierr)); PN2(void, MPI_Type_create_hvector, mpi_type_create_hvector, MPI_TYPE_CREATE_HVECTOR, (MPI_Fint *count, MPI_Fint *blocklength, MPI_Aint *stride, MPI_Fint *oldtype, MPI_Fint *newtype, MPI_Fint *ierr)); -PN2(void, MPI_Type_create_keyval, mpi_type_create_keyval, MPI_TYPE_CREATE_KEYVAL, (ompi_mpi2_fortran_copy_attr_function* 
type_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr)); +PN2(void, MPI_Type_create_keyval, mpi_type_create_keyval, MPI_TYPE_CREATE_KEYVAL, (ompi_aint_copy_attr_function* type_copy_attr_fn, ompi_aint_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr)); PN2(void, MPI_Type_create_indexed_block, mpi_type_create_indexed_block, MPI_TYPE_CREATE_INDEXED_BLOCK, (MPI_Fint *count, MPI_Fint *blocklength, MPI_Fint *array_of_displacements, MPI_Fint *oldtype, MPI_Fint *newtype, MPI_Fint *ierr)); PN2(void, MPI_Type_create_hindexed_block, mpi_type_create_hindexed_block, MPI_TYPE_CREATE_HINDEXED_BLOCK, (MPI_Fint *count, MPI_Fint *blocklength, MPI_Aint *array_of_displacements, MPI_Fint *oldtype, MPI_Fint *newtype, MPI_Fint *ierr)); PN2(void, MPI_Type_create_struct, mpi_type_create_struct, MPI_TYPE_CREATE_STRUCT, (MPI_Fint *count, MPI_Fint *array_of_block_lengths, MPI_Aint *array_of_displacements, MPI_Fint *array_of_types, MPI_Fint *newtype, MPI_Fint *ierr)); @@ -417,7 +419,7 @@ PN2(void, MPI_Win_complete, mpi_win_complete, MPI_WIN_COMPLETE, (MPI_Fint *win, PN2(void, MPI_Win_create, mpi_win_create, MPI_WIN_CREATE, (char *base, MPI_Aint *size, MPI_Fint *disp_unit, MPI_Fint *info, MPI_Fint *comm, MPI_Fint *win, MPI_Fint *ierr)); PN2(void, MPI_Win_create_dynamic, mpi_win_create_dynamic, MPI_WIN_CREATE_DYNAMIC, (MPI_Fint *info, MPI_Fint *comm, MPI_Fint *win, MPI_Fint *ierr)); PN2(void, MPI_Win_create_errhandler, mpi_win_create_errhandler, MPI_WIN_CREATE_ERRHANDLER, (ompi_errhandler_fortran_handler_fn_t* function, MPI_Fint *errhandler, MPI_Fint *ierr)); -PN2(void, MPI_Win_create_keyval, mpi_win_create_keyval, MPI_WIN_CREATE_KEYVAL, (ompi_mpi2_fortran_copy_attr_function* win_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr)); +PN2(void, MPI_Win_create_keyval, 
mpi_win_create_keyval, MPI_WIN_CREATE_KEYVAL, (ompi_aint_copy_attr_function* win_copy_attr_fn, ompi_aint_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr)); PN2(void, MPI_Win_delete_attr, mpi_win_delete_attr, MPI_WIN_DELETE_ATTR, (MPI_Fint *win, MPI_Fint *win_keyval, MPI_Fint *ierr)); PN2(void, MPI_Win_detach, mpi_win_detach, MPI_WIN_DETACH, (MPI_Fint *win, char *base, MPI_Fint *ierr)); PN2(void, MPI_Win_fence, mpi_win_fence, MPI_WIN_FENCE, (MPI_Fint *assert, MPI_Fint *win, MPI_Fint *ierr)); diff --git a/ompi/mpi/fortran/mpif-h/publish_name_f.c b/ompi/mpi/fortran/mpif-h/publish_name_f.c index 21dc6191ccb..d219e564a0a 100644 --- a/ompi/mpi/fortran/mpif-h/publish_name_f.c +++ b/ompi/mpi/fortran/mpif-h/publish_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/register_datarep_f.c b/ompi/mpi/fortran/mpif-h/register_datarep_f.c index 7b9e628f60b..63e31191ba3 100644 --- a/ompi/mpi/fortran/mpif-h/register_datarep_f.c +++ b/ompi/mpi/fortran/mpif-h/register_datarep_f.c @@ -10,8 +10,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. 
- * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -26,7 +27,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/mpi/fortran/base/constants.h" #include "ompi/mpi/fortran/base/datarep.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/mpi/fortran/base/fint_2_int.h" #include "ompi/runtime/mpiruntime.h" #include "ompi/file/file.h" @@ -95,12 +96,12 @@ typedef struct intercept_extra_state { ompi_mpi2_fortran_datarep_conversion_fn_t *write_fn_f77; ompi_mpi2_fortran_datarep_extent_fn_t *extent_fn_f77; MPI_Aint *extra_state_f77; -} intercept_extra_state_t; +} ompi_intercept_extra_state_t; -OBJ_CLASS_DECLARATION(intercept_extra_state_t); +OBJ_CLASS_DECLARATION(ompi_intercept_extra_state_t); #if !OMPI_BUILD_MPI_PROFILING || OPAL_HAVE_WEAK_SYMBOLS -static void intercept_extra_state_constructor(intercept_extra_state_t *obj) +static void intercept_extra_state_constructor(ompi_intercept_extra_state_t *obj) { obj->read_fn_f77 = NULL; obj->write_fn_f77 = NULL; @@ -108,7 +109,7 @@ static void intercept_extra_state_constructor(intercept_extra_state_t *obj) obj->extra_state_f77 = NULL; } -OBJ_CLASS_INSTANCE(intercept_extra_state_t, +OBJ_CLASS_INSTANCE(ompi_intercept_extra_state_t, opal_list_item_t, intercept_extra_state_constructor, NULL); #endif /* !OMPI_BUILD_MPI_PROFILING */ @@ -137,10 +138,10 @@ void ompi_register_datarep_f(char *datarep, char *c_datarep; int c_ierr, ret; MPI_Datarep_conversion_function *read_fn_c, *write_fn_c; - intercept_extra_state_t *intercept; + ompi_intercept_extra_state_t *intercept; /* Malloc space for the intercept callback data */ - intercept = OBJ_NEW(intercept_extra_state_t); + intercept = 
OBJ_NEW(ompi_intercept_extra_state_t); if (NULL == intercept) { c_ierr = OMPI_ERRHANDLER_INVOKE(MPI_FILE_NULL, OMPI_ERR_OUT_OF_RESOURCE, FUNC_NAME); @@ -210,8 +211,8 @@ static int read_intercept_fn(void *userbuf, MPI_Datatype type_c, int count_c, { MPI_Fint ierr, count_f77 = OMPI_FINT_2_INT(count_c); MPI_Fint type_f77 = PMPI_Type_c2f(type_c); - intercept_extra_state_t *intercept_data = - (intercept_extra_state_t*) extra_state; + ompi_intercept_extra_state_t *intercept_data = + (ompi_intercept_extra_state_t*) extra_state; intercept_data->read_fn_f77((char *) userbuf, &type_f77, &count_f77, (char *) filebuf, &position, intercept_data->extra_state_f77, @@ -228,8 +229,8 @@ static int write_intercept_fn(void *userbuf, MPI_Datatype type_c, int count_c, { MPI_Fint ierr, count_f77 = OMPI_FINT_2_INT(count_c); MPI_Fint type_f77 = PMPI_Type_c2f(type_c); - intercept_extra_state_t *intercept_data = - (intercept_extra_state_t*) extra_state; + ompi_intercept_extra_state_t *intercept_data = + (ompi_intercept_extra_state_t*) extra_state; intercept_data->write_fn_f77((char *) userbuf, &type_f77, &count_f77, (char *) filebuf, &position, intercept_data->extra_state_f77, @@ -244,8 +245,8 @@ static int extent_intercept_fn(MPI_Datatype type_c, MPI_Aint *file_extent_f77, void *extra_state) { MPI_Fint ierr, type_f77 = PMPI_Type_c2f(type_c); - intercept_extra_state_t *intercept_data = - (intercept_extra_state_t*) extra_state; + ompi_intercept_extra_state_t *intercept_data = + (ompi_intercept_extra_state_t*) extra_state; intercept_data->extent_fn_f77(&type_f77, file_extent_f77, intercept_data->extra_state_f77, &ierr); diff --git a/ompi/mpi/fortran/mpif-h/type_create_keyval_f.c b/ompi/mpi/fortran/mpif-h/type_create_keyval_f.c index 11a59188ca0..dca7bcc91c9 100644 --- a/ompi/mpi/fortran/mpif-h/type_create_keyval_f.c +++ b/ompi/mpi/fortran/mpif-h/type_create_keyval_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
* Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -39,7 +39,7 @@ OMPI_GENERATE_F77_BINDINGS (PMPI_TYPE_CREATE_KEYVAL, pmpi_type_create_keyval_, pmpi_type_create_keyval__, pompi_type_create_keyval_f, - (ompi_mpi2_fortran_copy_attr_function* type_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), + (ompi_aint_copy_attr_function* type_copy_attr_fn, ompi_aint_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), (type_copy_attr_fn, type_delete_attr_fn, type_keyval, extra_state, ierr) ) #endif #endif @@ -59,7 +59,7 @@ OMPI_GENERATE_F77_BINDINGS (MPI_TYPE_CREATE_KEYVAL, mpi_type_create_keyval_, mpi_type_create_keyval__, ompi_type_create_keyval_f, - (ompi_mpi2_fortran_copy_attr_function* type_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), + (ompi_aint_copy_attr_function* type_copy_attr_fn, ompi_aint_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), (type_copy_attr_fn, type_delete_attr_fn, type_keyval, extra_state, ierr) ) #else #define ompi_type_create_keyval_f pompi_type_create_keyval_f @@ -68,8 +68,8 @@ OMPI_GENERATE_F77_BINDINGS (MPI_TYPE_CREATE_KEYVAL, static char FUNC_NAME[] = "MPI_Type_create_keyval_f"; -void ompi_type_create_keyval_f(ompi_mpi2_fortran_copy_attr_function* type_copy_attr_fn, - ompi_mpi2_fortran_delete_attr_function* type_delete_attr_fn, +void ompi_type_create_keyval_f(ompi_aint_copy_attr_function* type_copy_attr_fn, + ompi_aint_delete_attr_function* type_delete_attr_fn, MPI_Fint *type_keyval, MPI_Aint *extra_state, MPI_Fint *ierr) { int ret, 
c_ierr; @@ -77,8 +77,8 @@ void ompi_type_create_keyval_f(ompi_mpi2_fortran_copy_attr_function* type_copy_a ompi_attribute_fn_ptr_union_t copy_fn; ompi_attribute_fn_ptr_union_t del_fn; - copy_fn.attr_mpi2_fortran_copy_fn = type_copy_attr_fn; - del_fn.attr_mpi2_fortran_delete_fn = type_delete_attr_fn; + copy_fn.attr_aint_copy_fn = type_copy_attr_fn; + del_fn.attr_aint_delete_fn = type_delete_attr_fn; /* Note that we only set the "F77" bit and exclude the "F77_OLD" bit, indicating that the callbacks should use the new MPI-2 diff --git a/ompi/mpi/fortran/mpif-h/type_get_attr_f.c b/ompi/mpi/fortran/mpif-h/type_get_attr_f.c index 84e51e25e66..7b8dc979c91 100644 --- a/ompi/mpi/fortran/mpif-h/type_get_attr_f.c +++ b/ompi/mpi/fortran/mpif-h/type_get_attr_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -78,10 +78,10 @@ void ompi_type_get_attr_f(MPI_Fint *type, MPI_Fint *type_keyval, /* This stuff is very confusing. Be sure to see the comment at the top of src/attributes/attributes.c. 
*/ - c_ierr = ompi_attr_get_fortran_mpi2(c_type->d_keyhash, - OMPI_FINT_2_INT(*type_keyval), - attribute_val, - OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); + c_ierr = ompi_attr_get_aint(c_type->d_keyhash, + OMPI_FINT_2_INT(*type_keyval), + attribute_val, + OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); OMPI_SINGLE_INT_2_LOGICAL(flag); diff --git a/ompi/mpi/fortran/mpif-h/type_get_name_f.c b/ompi/mpi/fortran/mpif-h/type_get_name_f.c index 5e646bec9b2..76ce7605843 100644 --- a/ompi/mpi/fortran/mpif-h/type_get_name_f.c +++ b/ompi/mpi/fortran/mpif-h/type_get_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -23,7 +23,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/type_set_attr_f.c b/ompi/mpi/fortran/mpif-h/type_set_attr_f.c index 644d2b32ae9..bc5c27d95f8 100644 --- a/ompi/mpi/fortran/mpif-h/type_set_attr_f.c +++ b/ompi/mpi/fortran/mpif-h/type_set_attr_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -75,11 +75,11 @@ void ompi_type_set_attr_f(MPI_Fint *type, MPI_Fint *type_keyval, MPI_Aint *attri /* This stuff is very confusing. 
Be sure to see the comment at the top of src/attributes/attributes.c. */ - c_ierr = ompi_attr_set_fortran_mpi2(TYPE_ATTR, - c_type, - &c_type->d_keyhash, - OMPI_FINT_2_INT(*type_keyval), - *attribute_val, - false); + c_ierr = ompi_attr_set_aint(TYPE_ATTR, + c_type, + &c_type->d_keyhash, + OMPI_FINT_2_INT(*type_keyval), + *attribute_val, + false); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); } diff --git a/ompi/mpi/fortran/mpif-h/type_set_name_f.c b/ompi/mpi/fortran/mpif-h/type_set_name_f.c index a2333260dcd..62220192bcb 100644 --- a/ompi/mpi/fortran/mpif-h/type_set_name_f.c +++ b/ompi/mpi/fortran/mpif-h/type_set_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -25,7 +25,7 @@ #include "ompi/constants.h" #include "ompi/errhandler/errhandler.h" #include "ompi/communicator/communicator.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/unpack_external_f.c b/ompi/mpi/fortran/mpif-h/unpack_external_f.c index ad10f73ad5e..7a9ec77aced 100644 --- a/ompi/mpi/fortran/mpif-h/unpack_external_f.c +++ b/ompi/mpi/fortran/mpif-h/unpack_external_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -25,7 +25,7 @@ #include "ompi/constants.h" #include "ompi/communicator/communicator.h" #include "ompi/mpi/fortran/base/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/unpublish_name_f.c b/ompi/mpi/fortran/mpif-h/unpublish_name_f.c index 290b02dfb45..80458071f03 100644 --- a/ompi/mpi/fortran/mpif-h/unpublish_name_f.c +++ b/ompi/mpi/fortran/mpif-h/unpublish_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -22,7 +22,7 @@ #include "ompi_config.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING #if OPAL_HAVE_WEAK_SYMBOLS diff --git a/ompi/mpi/fortran/mpif-h/win_create_keyval_f.c b/ompi/mpi/fortran/mpif-h/win_create_keyval_f.c index c54db08de15..b1136806b21 100644 --- a/ompi/mpi/fortran/mpif-h/win_create_keyval_f.c +++ b/ompi/mpi/fortran/mpif-h/win_create_keyval_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -39,7 +39,7 @@ OMPI_GENERATE_F77_BINDINGS (PMPI_WIN_CREATE_KEYVAL, pmpi_win_create_keyval_, pmpi_win_create_keyval__, pompi_win_create_keyval_f, - (ompi_mpi2_fortran_copy_attr_function* win_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), + (ompi_aint_copy_attr_function* win_copy_attr_fn, ompi_aint_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), (win_copy_attr_fn, win_delete_attr_fn, win_keyval, extra_state, ierr) ) #endif #endif @@ -59,7 +59,7 @@ OMPI_GENERATE_F77_BINDINGS (MPI_WIN_CREATE_KEYVAL, mpi_win_create_keyval_, mpi_win_create_keyval__, ompi_win_create_keyval_f, - (ompi_mpi2_fortran_copy_attr_function* win_copy_attr_fn, ompi_mpi2_fortran_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), + (ompi_aint_copy_attr_function* win_copy_attr_fn, ompi_aint_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr), (win_copy_attr_fn, win_delete_attr_fn, win_keyval, extra_state, ierr) ) #else #define ompi_win_create_keyval_f pompi_win_create_keyval_f @@ -68,8 +68,8 @@ OMPI_GENERATE_F77_BINDINGS (MPI_WIN_CREATE_KEYVAL, static char FUNC_NAME[] = "MPI_Win_create_keyval"; -void ompi_win_create_keyval_f(ompi_mpi2_fortran_copy_attr_function* win_copy_attr_fn, - ompi_mpi2_fortran_delete_attr_function* win_delete_attr_fn, +void ompi_win_create_keyval_f(ompi_aint_copy_attr_function* win_copy_attr_fn, + ompi_aint_delete_attr_function* win_delete_attr_fn, MPI_Fint *win_keyval, MPI_Aint *extra_state, MPI_Fint *ierr) { int ret, c_ierr; @@ -77,8 +77,8 @@ void ompi_win_create_keyval_f(ompi_mpi2_fortran_copy_attr_function* win_copy_att ompi_attribute_fn_ptr_union_t copy_fn; ompi_attribute_fn_ptr_union_t del_fn; - copy_fn.attr_mpi2_fortran_copy_fn = win_copy_attr_fn; - del_fn.attr_mpi2_fortran_delete_fn = win_delete_attr_fn; + 
copy_fn.attr_aint_copy_fn = win_copy_attr_fn; + del_fn.attr_aint_delete_fn = win_delete_attr_fn; /* Note that we only set the "F77" bit and exclude the "F77_OLD" bit, indicating that the callbacks should use the new MPI-2 diff --git a/ompi/mpi/fortran/mpif-h/win_get_attr_f.c b/ompi/mpi/fortran/mpif-h/win_get_attr_f.c index af77810f380..2f982a48438 100644 --- a/ompi/mpi/fortran/mpif-h/win_get_attr_f.c +++ b/ompi/mpi/fortran/mpif-h/win_get_attr_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -77,10 +77,10 @@ void ompi_win_get_attr_f(MPI_Fint *win, MPI_Fint *win_keyval, /* This stuff is very confusing. Be sure to see the comment at the top of src/attributes/attributes.c. */ - c_ierr = ompi_attr_get_fortran_mpi2(c_win->w_keyhash, - OMPI_FINT_2_INT(*win_keyval), - attribute_val, - OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); + c_ierr = ompi_attr_get_aint(c_win->w_keyhash, + OMPI_FINT_2_INT(*win_keyval), + attribute_val, + OMPI_LOGICAL_SINGLE_NAME_CONVERT(flag)); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); OMPI_SINGLE_INT_2_LOGICAL(flag); } diff --git a/ompi/mpi/fortran/mpif-h/win_get_name_f.c b/ompi/mpi/fortran/mpif-h/win_get_name_f.c index 8d523ed1b45..f5b77ef8ccc 100644 --- a/ompi/mpi/fortran/mpif-h/win_get_name_f.c +++ b/ompi/mpi/fortran/mpif-h/win_get_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -23,7 +23,7 @@ #include "ompi/mpi/fortran/mpif-h/bindings.h" #include "ompi/constants.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/win_set_attr_f.c b/ompi/mpi/fortran/mpif-h/win_set_attr_f.c index 7dd9d51f93e..056c8c23e6d 100644 --- a/ompi/mpi/fortran/mpif-h/win_set_attr_f.c +++ b/ompi/mpi/fortran/mpif-h/win_set_attr_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -77,11 +77,11 @@ void ompi_win_set_attr_f(MPI_Fint *win, MPI_Fint *win_keyval, /* This stuff is very confusing. Be sure to see the comment at the top of src/attributes/attributes.c. */ - c_ierr = ompi_attr_set_fortran_mpi2(WIN_ATTR, - c_win, - &c_win->w_keyhash, - OMPI_FINT_2_INT(*win_keyval), - *attribute_val, - false); + c_ierr = ompi_attr_set_aint(WIN_ATTR, + c_win, + &c_win->w_keyhash, + OMPI_FINT_2_INT(*win_keyval), + *attribute_val, + false); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); } diff --git a/ompi/mpi/fortran/mpif-h/win_set_name_f.c b/ompi/mpi/fortran/mpif-h/win_set_name_f.c index ccec5e41eb3..4c8bf2f7cda 100644 --- a/ompi/mpi/fortran/mpif-h/win_set_name_f.c +++ b/ompi/mpi/fortran/mpif-h/win_set_name_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -23,7 +23,7 @@ #include "ompi/constants.h" #include "ompi/mpi/fortran/mpif-h/bindings.h" -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "ompi/communicator/communicator.h" #if OMPI_BUILD_MPI_PROFILING diff --git a/ompi/mpi/fortran/mpif-h/win_shared_query_f.c b/ompi/mpi/fortran/mpif-h/win_shared_query_f.c index dd847b7afeb..5a1fecaf47f 100644 --- a/ompi/mpi/fortran/mpif-h/win_shared_query_f.c +++ b/ompi/mpi/fortran/mpif-h/win_shared_query_f.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2014 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -118,7 +118,7 @@ void ompi_win_shared_query_f(MPI_Fint *win, MPI_Fint *rank, MPI_Aint *size, c_win = PMPI_Win_f2c(*win); c_ierr = PMPI_Win_shared_query(c_win, OMPI_FINT_2_INT(*rank), size, - OMPI_SINGLE_NAME_CONVERT(disp_unit), baseptr); + OMPI_SINGLE_NAME_CONVERT(disp_unit), baseptr); if (NULL != ierr) *ierr = OMPI_INT_2_FINT(c_ierr); if (MPI_SUCCESS == c_ierr) { diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/ISO_Fortran_binding.h b/ompi/mpi/fortran/use-mpi-f08-desc/ISO_Fortran_binding.h deleted file mode 100644 index d7879939cbe..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/ISO_Fortran_binding.h +++ /dev/null @@ -1,98 +0,0 @@ -/************** Example ISO_Fortran_binding.h ********************/ -#include -#include - -/* Struct CFI_dim_t for triples of bound, extent and stride information */ - -typedef struct { - intptr_t lower_bound, - extent, - sm; -} CFI_dim_t; - -typedef struct { - intptr_t lower_bound, - upper_bound, - stride; -} CFI_bounds_t; - - -/* Maximum rank supported by the companion Fortran processor */ - -/* Changed from 15 to F2003 value of 7 
(CER) */ -#define CFI_MAX_RANK 7 - -/* Struct CFI_cdesc_t for holding all the information about a - descriptor-based Fortran object */ - -typedef struct { - void * base_addr; /* base address of object */ - size_t elem_len; /* length of one element, in bytes */ - int rank; /* object rank, 0 .. CF_MAX_RANK */ - int type; /* identifier for type of object */ - int attribute; /* object attribute: 0..2, or -1 */ - int state; /* allocation/association state: 0 or 1 */ -//Removed (CER) -//void * fdesc; /* pointer to corresponding Fortran descriptor */ - CFI_dim_t dim[CFI_MAX_RANK]; /* dimension triples */ -} CFI_cdesc_t; - - -/* function prototypes */ - -int CFI_update_cdesc ( CFI_cdesc_t * ); - -int CFI_update_fdesc ( CFI_cdesc_t * ); - -int CFI_allocate ( CFI_cdesc_t *, const CFI_bounds_t bounds[] ); - -int CFI_deallocate ( CFI_cdesc_t * ); - -int CFI_is_contiguous ( const CFI_cdesc_t *, _Bool * ); - -int CFI_bounds_to_cdesc ( const CFI_bounds_t bounds[] , CFI_cdesc_t * ); - -int CFI_cdesc_to_bounds ( const CFI_cdesc_t * , CFI_bounds_t bounds[] ); - - -/* Sympolic names for attributes of objects */ - -#define CFI_attribute_assumed 0 -#define CFI_attribute_allocatable 1 -#define CFI_attribute_pointer 2 - -/* Symbolic names for type identifiers */ - -#define CFI_type_unknown 0 -#define CFI_type_struct 100 -#define CFI_type_signed_char 1 -#define CFI_type_short 3 -#define CFI_type_int 5 -#define CFI_type_long 7 -#define CFI_type_long_long 9 -#define CFI_type_size_t 11 -#define CFI_type_int8_t 12 -#define CFI_type_int16_t 14 -#define CFI_type_int32_t 16 -#define CFI_type_int64_t 18 -#define CFI_type_int_least8_t 20 -#define CFI_type_int_least16_t 22 -#define CFI_type_int_least32_t 24 -#define CFI_type_int_least64_t 26 -#define CFI_type_int_fast8_t 28 -#define CFI_type_int_fast16_t 30 -#define CFI_type_int_fast32_t 32 -#define CFI_type_int_fast64_t 34 -#define CFI_type_intmax_t 36 -#define CFI_type_intptr_t 37 -#define CFI_type_float 38 -#define CFI_type_double 39 -#define 
CFI_type_long_double 40 -#define CFI_type_float_Complex 41 -#define CFI_type_double_Complex 42 -#define CFI_type_long_double_Complex 43 -#define CFI_type_Bool 44 -#define CFI_type_char 45 - -/* End of Example ISO_Fortran_binding.h */ - diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/Makefile.am b/ompi/mpi/fortran/use-mpi-f08-desc/Makefile.am deleted file mode 100644 index 9e55e5bd36d..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/Makefile.am +++ /dev/null @@ -1,106 +0,0 @@ -# -*- makefile -*- -# -# Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2015 Research Organization for Information Science -# and Technology (RIST). All rights reserved. -# Copyright (c) 2016 IBM Corporation. All rights reserved. -# -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# This Makefile is only relevant if we're building the "use mpi_f08" -# MPI bindings. -if OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS - -AM_FCFLAGS = -I$(top_builddir)/ompi/include -I$(top_srcdir)/ompi/include \ - -I$(top_srcdir) $(FCFLAGS) - -lib_LTLIBRARIES = lib@OMPI_LIBMPI_NAME@_usempif08.la - -# -# This list is a subset of the full MPI API used for testing Fortran -# descriptors usage in MPI-3 -# -mpi_api_files = \ - comm_rank_f08.f90 \ - comm_size_f08.f90 \ - finalize_f08.f90 \ - init_f08.f90 \ - recv_f08_desc.f90 \ - send_f08_desc.f90 \ - type_commit_f08.f90 \ - type_contiguous_f08.f90 \ - type_vector_f08.f90 - -lib@OMPI_LIBMPI_NAME@_usempif08_la_SOURCES = \ - $(mpi_api_files) \ - mpi-f08-types.f90 \ - mpi-f08-interfaces.F90 \ - mpi-f-interfaces-bind.h \ - mpi-f08.f90 \ - ISO_Fortran_binding.h \ - OMPI_Fortran_binding.f90 \ - OMPI_Fortran_binding_c.c \ - constants.h \ - constants.c - -# -# Clean up all F90 module files -# - -MOSTLYCLEANFILES = *.mod - -# -# Automake doesn't do Fortran dependency analysis, so must list them -# manually here. Bummer! 
-# - -mpi-f08-types.lo: mpi-f08-types.f90 -mpi-f08-interfaces.lo: mpi-f08-interfaces.F90 mpi-f08-types.lo -OMPI_Fortran_binding.lo: OMPI_Fortran_binding.f90 mpi-f08-types.lo - - -# -# Automake doesn't do Fortran dependency analysis, so must list them -# manually here. Bummer! -# - -mpi_api_lo_files = $(mpi_api_files:.f90=.lo) - -$(mpi_api_lo_files): mpi-f08.lo - -mpi-f08.lo: mpi-f08-types.lo -mpi-f08.lo: OMPI_Fortran_binding.lo -mpi-f08.lo: mpi-f08-interfaces.lo -mpi-f08.lo: mpi-f-interfaces-bind.h -mpi-f08.lo: mpi-f08.f90 - -# Install the generated .mod files. Unfortunately, each F90 compiler -# may generate different filenames, so we have to use a glob. :-( - -install-exec-hook: - @ for file in `ls *.mod`; do \ - echo $(INSTALL) $$file $(DESTDIR)$(libdir); \ - $(INSTALL) $$file $(DESTDIR)$(libdir); \ - done - -uninstall-local: - @ for file in `ls *.mod`; do \ - echo rm -f $(DESTDIR)$(libdir)/$$file; \ - rm -f $(DESTDIR)$(libdir)/$$file; \ - done - -else - -# Need to have empty targets because AM can't handle having an -# AM_CONDITIONAL was targets in the "if" statement but not in the -# "else". :-( - -install-exec-hook: -uninstall-local: - -endif diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/OMPI_Fortran_binding.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/OMPI_Fortran_binding.f90 deleted file mode 100644 index 6b42518a71a..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/OMPI_Fortran_binding.f90 +++ /dev/null @@ -1,142 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All rights reserved. -! $COPYRIGHT$ - -!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -! Fortran equivalent of ISO_Fortran_binding.h -! - this is used temporarily until compilers support the TR -! -module OMPI_Fortran_binding - use mpi_f08_types - use, intrinsic :: ISO_C_BINDING - - ! - ! The following types and procedures are here temporarily, - ! 
for testing purposes only - ! - - integer, parameter :: INTPTR_T_KIND = C_INTPTR_T - integer, parameter :: CFI_MAX_RANK = 7 ! until F2008 compilers - - type, bind(C) :: CFI_dim_t - integer(INTPTR_T_KIND) :: lower_bound, extent, sm; - end type CFI_dim_t - - type, bind(C) :: CFI_cdesc_t - type(C_PTR) :: base_addr ! base address of object - integer(C_SIZE_T) :: elem_len ! length of one element, in bytes - integer(C_INT) :: rank ! object rank, 0 .. CF_MAX_RANK - integer(C_INT) :: type ! identifier for type of object - integer(C_INT) :: attribute ! object attribute: 0..2, or -1 - integer(C_INT) :: state ! allocation/association state: 0 or 1 - type(CFI_dim_t) :: dim(CFI_MAX_RANK) ! dimension triples - end type CFI_cdesc_t - - interface - subroutine ompi_recv_f08_desc_f(desc,count,datatype,dest,tag,comm,status,ierror) & - BIND(C, name="ompi_recv_f08_desc_f") - use mpi_f08_types, only : MPI_Status - import CFI_cdesc_t - implicit none - type(CFI_cdesc_t) :: desc - INTEGER, INTENT(IN) :: count, dest, tag - INTEGER, INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: comm - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, INTENT(OUT) :: ierror - end subroutine ompi_recv_f08_desc_f - - subroutine ompi_send_f08_desc_f(desc,count,datatype,dest,tag,comm,ierror) & - BIND(C, name="ompi_send_f08_desc_f") - import CFI_cdesc_t - implicit none - type(CFI_cdesc_t) :: desc - INTEGER, INTENT(IN) :: count, dest, tag - INTEGER, INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: ierror - end subroutine ompi_send_f08_desc_f - - function ompi_f08_addr(buf) & - BIND(C, name="ompi_f08_addr") - import :: C_PTR - type(C_PTR), value :: buf - type(C_PTR) :: ompi_f08_addr - end function ompi_f08_addr - - subroutine ompi_f08_print_addr(buf) & - BIND(C, name="ompi_f08_print_addr") - import :: C_PTR - type(C_PTR), value :: buf - end subroutine ompi_f08_print_addr - - function ompi_f08_addr_diff(buf1, buf2) & - BIND(C, name="ompi_f08_addr_diff") - import :: C_PTR, C_SIZE_T - 
type(C_PTR), value :: buf1, buf2 - integer(C_SIZE_T) :: ompi_f08_addr_diff - end function ompi_f08_addr_diff - end interface - -contains - -subroutine print_desc(desc) - implicit none - type(CFI_cdesc_t), intent(in) :: desc - type(C_PTR) :: cptr - integer :: i - - print *, "print_desc:" - call ompi_f08_print_addr(desc%base_addr) - print *, " rank =", desc%rank - print *, " elem_len =", desc%elem_len - print *, " type =", desc%type - print *, " attribute=", desc%attribute - print *, " state =", desc%attribute - print *, " dims =" - do i = 1, desc%rank - print *, desc%dim(i)%lower_bound, desc%dim(i)%extent, desc%dim(i)%sm - end do - -end subroutine print_desc - -subroutine make_desc_f(buf, desc) - use mpi_f08_types - use, intrinsic :: ISO_C_BINDING - implicit none - integer, target :: buf(:,:) - type(CFI_cdesc_t), intent(inout) :: desc - - integer :: i, shp(2) - -! print *, "row1" -! print *, buf(1,1:18) -! print *, "col1" -! print *, buf(1:18,1) -! print *, "size=", size(buf) -! print *, "shape=", shape(buf) -! print *, "lb=", lbound(buf) -! print *, "ub=", ubound(buf) - - shp = shape(buf) - - desc%base_addr = ompi_f08_addr(C_LOC(buf(1,1))) - desc%elem_len = 4 ! C_SIZEOF(buf(1,1)) ?Intel compiler doesn't have this function? - desc%rank = 2 - desc%type = 0; ! no type info for now - desc%attribute = 2; ! assumed shape - desc%state = 1; ! 
always 1 for assumed shape - - do i = 1, desc%rank - desc%dim(i)%lower_bound = 1 - desc%dim(i)%extent = shp(i) - end do - - desc%dim(1)%sm = ompi_f08_addr_diff(C_LOC(buf(1,1)), C_LOC(buf(2,1))) - desc%dim(2)%sm = ompi_f08_addr_diff(C_LOC(buf(1,1)), C_LOC(buf(1,2))) -end subroutine - -end module OMPI_Fortran_binding diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/OMPI_Fortran_binding_c.c b/ompi/mpi/fortran/use-mpi-f08-desc/OMPI_Fortran_binding_c.c deleted file mode 100644 index fbd4e3531d6..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/OMPI_Fortran_binding_c.c +++ /dev/null @@ -1,194 +0,0 @@ -/* - * Temporary file to test MPI-3 interfaces with descriptors - */ - -#include "ompi_config.h" -#include "ISO_Fortran_binding.h" -#include - -#define DEBUG_PRINT 0 - -void * ompi_f08_addr(void * buf) -{ -#if DEBUG_PRINT - printf("ompi_f08_addr = %p val=%d\n", buf, ((int*)buf)[0]); -#endif - return buf; -} - -void * ompi_f08_print_addr(void * buf) -{ - printf(" ompi_f08_addr = %p val=%d\n", buf, ((int*)buf)[0]); - return buf; -} - -size_t ompi_f08_addr_diff(void * buf1, void * buf2) -{ - size_t diff = (char*) buf2 - (char*) buf1; -#if DEBUG_PRINT - printf("ompi_f08_addr_diff buf1 = %p val=%d\n", buf1, ((int*)buf1)[0]); - printf("ompi_f08_addr_diff buf2 = %p val=%d\n", buf2, ((int*)buf2)[0]); - printf("ompi_f08_addr_diff diff = %ld\n", diff); -#endif - return diff; -} - -/* - * Returns true if the array described by desc is contiguous - */ -int isContiguous(CFI_cdesc_t * desc) -{ - int r; - size_t sm = desc->elem_len; - - for (r = 0; r < desc->rank; r++) { - if (sm == desc->dim[r].sm) { - sm *= desc->dim[r].extent; - } else { - return 0; - } - } - - return 1; -} - -/* - * Returns the number of elements in the array described by desc. - * The array may be non-contiguous. - */ -size_t numElements(CFI_cdesc_t * desc) -{ - int r; - size_t num = 1; - - /* TODO - can have 0 size arrays? 
*/ - - for (r = 0; r < desc->rank; r++) { - num *= desc->dim[r].extent; - } - return num; -} - -/* - * General routine to copy the elements from the array described by desc - * to cont_buf. The array itself may be non-contiguous. For an array - * of specific rank and type there exists more efficient methods to - * copy the buffer. Returns number of bytes copied. - */ -void * copyToContiguous(CFI_cdesc_t * desc, void * cont_buf, size_t offset, int rank) -{ - size_t b, e, num_copied; - char * next_out; - - char * in = (char *) desc->base_addr + offset; - char * out = (char *) cont_buf; - - if (rank == 0) { - /* copy scalar element */ - for (b = 0; b < desc->elem_len; b++) { - *out++ = *in++; - } - cont_buf = out; - } - else { - rank -= 1; - for (e = 0; e < desc->dim[rank].extent; e++) { - /* recur on subarrays of lesser rank */ - cont_buf = copyToContiguous(desc, cont_buf, offset, rank); - offset += desc->dim[rank].sm; - } - } - - return cont_buf; -} - -/* - * General routine to copy the elements to the array described by desc - * from cont_buf. The array itself may be non-contiguous. For an array - * of specific rank and type there exists more efficient methods to - * copy the buffer. Returns number of bytes copied. 
- */ -void * copyFromContiguous(CFI_cdesc_t * desc, void * cont_buf, size_t offset, int rank) -{ - size_t b, e, num_copied; - char * next_out; - - char * out = (char *) desc->base_addr + offset; - char * in = (char *) cont_buf; - - if (rank == 0) { - /* copy scalar element */ - for (b = 0; b < desc->elem_len; b++) { - *out++ = *in++; - } - cont_buf = in; - } - else { - rank -= 1; - for (e = 0; e < desc->dim[rank].extent; e++) { - /* recur on subarrays of lesser rank */ - cont_buf = copyFromContiguous(desc, cont_buf, offset, rank); - offset += desc->dim[rank].sm; - } - } - - return cont_buf; -} - -/* From ../mpif-h/send_f.c - */ -void ompi_recv_f(char *buf, MPI_Fint *count, MPI_Fint *datatype, - MPI_Fint *source, MPI_Fint *tag, MPI_Fint *comm, - MPI_Fint *status, MPI_Fint *ierr); - -void ompi_recv_f08_desc_f(CFI_cdesc_t *desc, MPI_Fint *count, MPI_Fint *datatype, - MPI_Fint *source, MPI_Fint *tag, MPI_Fint *comm, - MPI_Fint *status, MPI_Fint *ierr) -{ - size_t num_bytes = 0; - - if (isContiguous(desc)) { - //printf("ompi_recv_f08_desc_f: buf is contiguous\n"); - ompi_recv_f(desc->base_addr, count, datatype, source, tag, comm, status, ierr); - } else { - size_t cont_size = desc->elem_len * numElements(desc); - void * cont_buf = malloc(cont_size); - //assert(cont_buf); - - //printf("ompi_recv_f08_desc_f: buf not contiguous, # elements==%ld, receiving %ld bytes\n", numElements(desc), cont_size); - ompi_recv_f(cont_buf, count, datatype, source, tag, comm, status, ierr); - - num_bytes = (char*) copyFromContiguous(desc, cont_buf, 0, desc->rank) - (char*) cont_buf; - //printf("ompi_recv_f08_desc_f: received %d bytes\n", num_bytes); - - free(cont_buf); - } -} - -/* From ../mpif-h/send_f.c - */ -void ompi_send_f(char *buf, MPI_Fint *count, MPI_Fint *datatype, - MPI_Fint *dest, MPI_Fint *tag, MPI_Fint *comm, MPI_Fint *ierr); - -void ompi_send_f08_desc_f(CFI_cdesc_t *desc, MPI_Fint *count, MPI_Fint *datatype, - MPI_Fint *dest, MPI_Fint *tag, MPI_Fint *comm, MPI_Fint *ierr) -{ 
- size_t num_bytes = 0; - - if (isContiguous(desc)) { - //printf("ompi_send_f08_desc_f: buf is contiguous\n"); - ompi_send_f(desc->base_addr, count, datatype, dest, tag, comm, ierr); - } else { - size_t cont_size = desc->elem_len * numElements(desc); - void * cont_buf = malloc(cont_size); - //assert(cont_buf); - - num_bytes = (char*) copyToContiguous(desc, cont_buf, 0, desc->rank) - (char*) cont_buf; - - //printf("ompi_send_f08_desc_f: buf not contiguous, # elements==%ld, sending %ld bytes\n", numElements(desc), num_bytes); - ompi_send_f(cont_buf, count, datatype, dest, tag, comm, ierr); - - free(cont_buf); - } - -} diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/comm_rank_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/comm_rank_f08.f90 deleted file mode 100644 index 263f840811c..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/comm_rank_f08.f90 +++ /dev/null @@ -1,20 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - -subroutine MPI_Comm_rank_f08(comm,rank,ierror) - use :: mpi_f08_types, only : MPI_Comm - use :: mpi_f08, only : ompi_comm_rank_f - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: rank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_comm_rank_f(comm%MPI_VAL,rank,c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Comm_rank_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/comm_size_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/comm_size_f08.f90 deleted file mode 100644 index 13b69e9504a..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/comm_size_f08.f90 +++ /dev/null @@ -1,20 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! 
$COPYRIGHT$ - -subroutine MPI_Comm_size_f08(comm,size,ierror) - use :: mpi_f08_types, only : MPI_Comm - use :: mpi_f08, only : ompi_comm_size_f - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_comm_size_f(comm%MPI_VAL,size,c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Comm_size_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/constants.c b/ompi/mpi/fortran/use-mpi-f08-desc/constants.c deleted file mode 100644 index 2422424f4e5..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/constants.c +++ /dev/null @@ -1,64 +0,0 @@ -/* - * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. - * - * This file provides symbols for the derived type values needed - * in mpi3_types.f90. - */ - -#include "ompi_config.h" - -#include "constants.h" - -OMPI_DECLSPEC int ompi_f08_mpi_comm_world = OMPI_MPI_COMM_WORLD; -OMPI_DECLSPEC int ompi_f08_mpi_comm_self = OMPI_MPI_COMM_SELF; -OMPI_DECLSPEC int ompi_f08_mpi_group_empty = OMPI_MPI_GROUP_EMPTY; -OMPI_DECLSPEC int ompi_f08_mpi_errors_are_fatal = OMPI_MPI_ERRORS_ARE_FATAL; -OMPI_DECLSPEC int ompi_f08_mpi_errors_return = OMPI_MPI_ERRORS_RETURN; - -/* - * NULL "handles" (indices) - */ -OMPI_DECLSPEC int ompi_f08_mpi_group_null = OMPI_MPI_GROUP_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_comm_null = OMPI_MPI_COMM_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_datatype_null = OMPI_MPI_DATATYPE_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_request_null = OMPI_MPI_REQUEST_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_op_null = OMPI_MPI_OP_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_errhandler_null = OMPI_MPI_ERRHANDLER_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_info_null = OMPI_MPI_INFO_NULL; -OMPI_DECLSPEC int ompi_f08_mpi_win_null = OMPI_MPI_WIN_NULL; - -/* - * common block items from ompi/include/mpif-common.h - */ -OMPI_DECLSPEC int ompi_f08_mpi_byte = OMPI_MPI_BYTE; -OMPI_DECLSPEC int ompi_f08_mpi_packed = OMPI_MPI_PACKED; 
-OMPI_DECLSPEC int ompi_f08_mpi_ub = OMPI_MPI_UB; -OMPI_DECLSPEC int ompi_f08_mpi_lb = OMPI_MPI_LB; -OMPI_DECLSPEC int ompi_f08_mpi_character = OMPI_MPI_CHARACTER; -OMPI_DECLSPEC int ompi_f08_mpi_logical = OMPI_MPI_LOGICAL; -OMPI_DECLSPEC int ompi_f08_mpi_integer = OMPI_MPI_INTEGER; -OMPI_DECLSPEC int ompi_f08_mpi_integer1 = OMPI_MPI_INTEGER1; -OMPI_DECLSPEC int ompi_f08_mpi_integer2 = OMPI_MPI_INTEGER2; -OMPI_DECLSPEC int ompi_f08_mpi_integer4 = OMPI_MPI_INTEGER4; -OMPI_DECLSPEC int ompi_f08_mpi_integer8 = OMPI_MPI_INTEGER8; -OMPI_DECLSPEC int ompi_f08_mpi_integer16 = OMPI_MPI_INTEGER16; -OMPI_DECLSPEC int ompi_f08_mpi_real = OMPI_MPI_REAL; -OMPI_DECLSPEC int ompi_f08_mpi_real4 = OMPI_MPI_REAL4; -OMPI_DECLSPEC int ompi_f08_mpi_real8 = OMPI_MPI_REAL8; -OMPI_DECLSPEC int ompi_f08_mpi_real16 = OMPI_MPI_REAL16; -OMPI_DECLSPEC int ompi_f08_mpi_double_precision = OMPI_MPI_DOUBLE_PRECISION; -OMPI_DECLSPEC int ompi_f08_mpi_complex = OMPI_MPI_COMPLEX; -OMPI_DECLSPEC int ompi_f08_mpi_complex8 = OMPI_MPI_COMPLEX8; -OMPI_DECLSPEC int ompi_f08_mpi_complex16 = OMPI_MPI_COMPLEX16; -OMPI_DECLSPEC int ompi_f08_mpi_complex32 = OMPI_MPI_COMPLEX32; -OMPI_DECLSPEC int ompi_f08_mpi_double_complex = OMPI_MPI_DOUBLE_COMPLEX; -OMPI_DECLSPEC int ompi_f08_mpi_2real = OMPI_MPI_2REAL; -OMPI_DECLSPEC int ompi_f08_mpi_2double_precision = OMPI_MPI_2DOUBLE_PRECISION; -OMPI_DECLSPEC int ompi_f08_mpi_2integer = OMPI_MPI_2INTEGER; -OMPI_DECLSPEC int ompi_f08_mpi_2complex = OMPI_MPI_2COMPLEX; -OMPI_DECLSPEC int ompi_f08_mpi_2double_complex = OMPI_MPI_2DOUBLE_COMPLEX; -OMPI_DECLSPEC int ompi_f08_mpi_real2 = OMPI_MPI_REAL2; -OMPI_DECLSPEC int ompi_f08_mpi_logical1 = OMPI_MPI_LOGICAL1; -OMPI_DECLSPEC int ompi_f08_mpi_logical2 = OMPI_MPI_LOGICAL2; -OMPI_DECLSPEC int ompi_f08_mpi_logical4 = OMPI_MPI_LOGICAL4; -OMPI_DECLSPEC int ompi_f08_mpi_logical8 = OMPI_MPI_LOGICAL8; diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/constants.h b/ompi/mpi/fortran/use-mpi-f08-desc/constants.h deleted file mode 100644 index 
d8a320bea13..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/constants.h +++ /dev/null @@ -1,84 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef USE_MPI_F08_CONSTANTS_H -#define USE_MPI_F08_CONSTANTS_H - -/* - * This file contains macro definitions for parameter values used in the - * MPI_F08 Fortran bindings. The values are the same as those in - * ompi/include/mpif-common.h and are generated by the script - * TODO FIXME ompi/xxx/mpif-common.pl. 
- */ - -#define OMPI_MPI_COMM_WORLD 0 -#define OMPI_MPI_COMM_SELF 1 -#define OMPI_MPI_GROUP_EMPTY 1 -#define OMPI_MPI_ERRORS_ARE_FATAL 1 -#define OMPI_MPI_ERRORS_RETURN 2 - -/* - * NULL 'handles' (indices) - */ - -#define OMPI_MPI_GROUP_NULL 0 -#define OMPI_MPI_COMM_NULL 2 -#define OMPI_MPI_DATATYPE_NULL 0 -#define OMPI_MPI_REQUEST_NULL 0 -#define OMPI_MPI_OP_NULL 0 -#define OMPI_MPI_ERRHANDLER_NULL 0 -#define OMPI_MPI_INFO_NULL 0 -#define OMPI_MPI_WIN_NULL 0 - -#define OMPI_MPI_BYTE 1 -#define OMPI_MPI_PACKED 2 -#define OMPI_MPI_UB 3 -#define OMPI_MPI_LB 4 -#define OMPI_MPI_CHARACTER 5 -#define OMPI_MPI_LOGICAL 6 -#define OMPI_MPI_INTEGER 7 -#define OMPI_MPI_INTEGER1 8 -#define OMPI_MPI_INTEGER2 9 -#define OMPI_MPI_INTEGER4 10 -#define OMPI_MPI_INTEGER8 11 -#define OMPI_MPI_INTEGER16 12 -#define OMPI_MPI_REAL 13 -#define OMPI_MPI_REAL4 14 -#define OMPI_MPI_REAL8 15 -#define OMPI_MPI_REAL16 16 -#define OMPI_MPI_DOUBLE_PRECISION 17 -#define OMPI_MPI_COMPLEX 18 -#define OMPI_MPI_COMPLEX8 19 -#define OMPI_MPI_COMPLEX16 20 -#define OMPI_MPI_COMPLEX32 21 -#define OMPI_MPI_DOUBLE_COMPLEX 22 -#define OMPI_MPI_2REAL 23 -#define OMPI_MPI_2DOUBLE_PRECISION 24 -#define OMPI_MPI_2INTEGER 25 -#define OMPI_MPI_2COMPLEX 26 -#define OMPI_MPI_2DOUBLE_COMPLEX 27 -#define OMPI_MPI_REAL2 28 -#define OMPI_MPI_LOGICAL1 29 -#define OMPI_MPI_LOGICAL2 30 -#define OMPI_MPI_LOGICAL4 31 -#define OMPI_MPI_LOGICAL8 32 - -#endif /* USE_MPI_F08_CONSTANTS_H */ diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/constants.h.fin b/ompi/mpi/fortran/use-mpi-f08-desc/constants.h.fin deleted file mode 100644 index f6f3a33ba23..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/constants.h.fin +++ /dev/null @@ -1,33 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. 
All rights - * reserved. - * Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2009-2012 Los Alamos National Security, LLC. - * All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef USE_MPI_F08_CONSTANTS_H -#define USE_MPI_F08_CONSTANTS_H - -/* - * This file contains macro definitions for parameter values used in the - * MPI_F08 Fortran bindings. The values are the same as those in - * ompi/include/mpif-common.h and are generated by the script - * ompi/include/mpif-common.pl. - */ - diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/finalize_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/finalize_f08.f90 deleted file mode 100644 index 42b8c7be592..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/finalize_f08.f90 +++ /dev/null @@ -1,17 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - -subroutine MPI_Finalize_f08(ierror) - use :: mpi_f08, only : ompi_finalize_f - implicit none - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_finalize_f(c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Finalize_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/init_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/init_f08.f90 deleted file mode 100644 index 1b1471d3f3f..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/init_f08.f90 +++ /dev/null @@ -1,17 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. -! 
Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - -subroutine MPI_Init_f08(ierror) - use :: mpi_f08, only : ompi_init_f - implicit none - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_init_f(c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Init_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f-interfaces-bind.h b/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f-interfaces-bind.h deleted file mode 100644 index c2892f759fb..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f-interfaces-bind.h +++ /dev/null @@ -1,78 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All rights reserved. -! $COPYRIGHT$ - -! -! This file provides the interface specifications for the MPI Fortran -! API bindings. It effectively maps between public names ("MPI_Init") -! and the back-end implementation subroutine name (e.g., "ompi_init_f"). - -interface - -subroutine ompi_comm_rank_f(comm,rank,ierror) & - BIND(C, name="ompi_comm_rank_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: rank - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_comm_rank_f - -subroutine ompi_comm_size_f(comm,size,ierror) & - BIND(C, name="ompi_comm_size_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: size - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_comm_size_f - -subroutine ompi_finalize_f(ierror) & - BIND(C, name="ompi_finalize_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_finalize_f - -subroutine ompi_init_f(ierror) & - BIND(C, name="ompi_init_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_init_f - -! ompi_send_f/ompi_recv_f interfaces not needed as they are called from C -! 
- -subroutine ompi_type_commit_f(datatype,ierror) & - BIND(C, name="ompi_type_commit_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(INOUT) :: datatype - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_type_commit_f - -subroutine ompi_type_contiguous_f(count,oldtype,newtype,ierror) & - BIND(C, name="ompi_type_contiguous_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN) :: count - INTEGER, INTENT(IN) :: oldtype - INTEGER, INTENT(OUT) :: newtype - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_type_contiguous_f - -subroutine ompi_type_vector_f(count,blocklength,stride,oldtype,newtype,ierror) & - BIND(C, name="ompi_type_vector_f") - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN) :: count, blocklength, stride - INTEGER, INTENT(IN) :: oldtype - INTEGER, INTENT(OUT) :: newtype - INTEGER, INTENT(OUT) :: ierror -end subroutine ompi_type_vector_f - -end interface diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08-interfaces.F90 b/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08-interfaces.F90 deleted file mode 100644 index e6a84c0283b..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08-interfaces.F90 +++ /dev/null @@ -1,150 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All rights reserved. -! -! This file provides the interface specifications for the MPI Fortran -! API bindings. It effectively maps between public names ("MPI_Init") -! and the name for tools ("MPI_Init_f08") and the back-end implementation -! name (e.g., "MPI_Init_f08"). 
- -module mpi_f08_interfaces - -interface MPI_Comm_rank -subroutine MPI_Comm_rank_f08(comm,rank,ierror) - use :: mpi_f08_types - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: rank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_rank_f08 -end interface MPI_Comm_rank - -interface MPI_Comm_size -subroutine MPI_Comm_size_f08(comm,size,ierror) - use :: mpi_f08_types - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_size_f08 -end interface MPI_Comm_size - -interface MPI_Finalize -subroutine MPI_Finalize_f08(ierror) - use :: mpi_f08_types - implicit none - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Finalize_f08 -end interface MPI_Finalize - -interface MPI_Init -subroutine MPI_Init_f08(ierror) - use :: mpi_f08_types - implicit none - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Init_f08 -end interface MPI_Init - -! Note that send/recv only works with specific types for buffers (2D integers) -! Double precision is not implemented at this time but allows compiler to do -! type checking so that MPI_SUBARRAYS_SUPPORTED flag can be utilized in -! test_send_recv.f90. -! 
- -interface MPI_Recv -subroutine MPI_Recv_f08_desc_int_2d(buf,count,datatype,source,tag,comm,status,ierror) - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN), target :: buf(:,:) - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Recv_f08_desc_int_2d -subroutine MPI_Recv_f08_desc_dbl_1d(buf,count,datatype,source,tag,comm,status,ierror) - use :: mpi_f08_types - implicit none - DOUBLE PRECISION, INTENT(IN), target :: buf(:) - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Recv_f08_desc_dbl_1d -subroutine MPI_Recv_f08_desc_dbl_0d(buf,count,datatype,source,tag,comm,status,ierror) - use :: mpi_f08_types - implicit none - DOUBLE PRECISION, INTENT(IN), target :: buf - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Recv_f08_desc_dbl_0d -end interface MPI_Recv - -interface MPI_Send -subroutine MPI_Send_f08_desc_int_2d(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN), target :: buf(:,:) - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Send_f08_desc_int_2d -subroutine MPI_Send_f08_desc_dbl_1d(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types - implicit none - DOUBLE PRECISION, INTENT(IN), target :: buf(:) - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror 
-end subroutine MPI_Send_f08_desc_dbl_1d -subroutine MPI_Send_f08_desc_dbl_0d(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types - implicit none - DOUBLE PRECISION, INTENT(IN), target :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Send_f08_desc_dbl_0d -end interface MPI_Send - -interface MPI_Type_commit -subroutine MPI_Type_commit_f08(datatype,ierror) - use :: mpi_f08_types - implicit none - TYPE(MPI_Datatype), INTENT(INOUT) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_commit_f08 -end interface MPI_Type_commit - -interface MPI_Type_contiguous -subroutine MPI_Type_contiguous_f08(count,oldtype,newtype,ierror) - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_contiguous_f08 -end interface MPI_Type_contiguous - -interface MPI_Type_vector -subroutine MPI_Type_vector_f08(count,blocklength,stride,oldtype,newtype,ierror) - use :: mpi_f08_types - implicit none - INTEGER, INTENT(IN) :: count, blocklength, stride - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_vector_f08 -end interface MPI_Type_vector - -end module mpi_f08_interfaces diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08-types.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08-types.f90 deleted file mode 100644 index ec6fbf2ca31..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08-types.f90 +++ /dev/null @@ -1,158 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All rights reserved. -! 
Copyright (c) 2015 Research Organization for Information Science -! and Technology (RIST). All rights reserved. -! -! This file creates mappings between MPI C types (e.g., MPI_Comm) and -! variables (e.g., MPI_COMM_WORLD) and corresponding Fortran names -! (type(MPI_Comm_world) and MPI_COMM_WORLD, respectively). - -module mpi_f08_types - - use, intrinsic :: ISO_C_BINDING - - include "mpif-config.h" - - ! - ! constants (these must agree with those in mpif-common.h, mpif-config.h) - ! - integer, parameter :: MPI_SUCCESS = 0 - - ! - ! kind parameters - ! - - integer, parameter :: MPI_DOUBLE_KIND = C_DOUBLE - - ! - ! derived types - ! - - type, BIND(C) :: MPI_Comm - integer :: MPI_VAL - end type MPI_Comm - - type, BIND(C) :: MPI_Datatype - integer :: MPI_VAL - end type MPI_Datatype - - type, BIND(C) :: MPI_Errhandler - integer :: MPI_VAL - end type MPI_Errhandler - - type, BIND(C) :: MPI_File - integer :: MPI_VAL - end type MPI_File - - type, BIND(C) :: MPI_Group - integer :: MPI_VAL - end type MPI_Group - - type, BIND(C) :: MPI_Info - integer :: MPI_VAL - end type MPI_Info - - type, BIND(C) :: MPI_Message - integer :: MPI_VAL - end type MPI_Message - - type, BIND(C) :: MPI_Op - integer :: MPI_VAL - end type MPI_Op - - type, BIND(C) :: MPI_Request - integer :: MPI_VAL - end type MPI_Request - - type, BIND(C) :: MPI_Win - integer :: MPI_VAL - end type MPI_Win - - type, BIND(C) :: MPI_Status - integer :: MPI_SOURCE - integer :: MPI_TAG - integer :: MPI_ERROR - integer(C_INT), private :: c_cancelled - integer(C_SIZE_T), private :: c_count - end type MPI_Status - - ! - ! Typedefs from C - ! - -! MPI_Aint -! MPI_Offset - - ! - ! Pre-defined communicator bindings - ! 
- - type(MPI_Comm), protected, bind(C, name="ompi_f08_mpi_comm_world") :: MPI_COMM_WORLD - type(MPI_Comm), protected, bind(C, name="ompi_f08_mpi_comm_self") :: MPI_COMM_SELF - type(MPI_Group), protected, bind(C, name="ompi_f08_mpi_group_empty") :: MPI_GROUP_EMPTY - type(MPI_Errhandler), protected, bind(C, name="ompi_f08_mpi_errors_are_fatal") :: MPI_ERRORS_ARE_FATAL - type(MPI_Errhandler), protected, bind(C, name="ompi_f08_mpi_errors_return") :: MPI_ERRORS_RETURN - - ! - ! NULL "handles" (indices) - ! - - type(MPI_Comm), protected, bind(C, name="ompi_f08_mpi_comm_null") :: MPI_COMM_NULL; - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_datatype_null") :: MPI_DATATYPE_NULL; - type(MPI_Errhandler), protected, bind(C, name="ompi_f08_mpi_errhandler_null") :: MPI_ERRHANDLER_NULL; - type(MPI_Group), protected, bind(C, name="ompi_f08_mpi_group_null") :: MPI_GROUP_NULL; - type(MPI_Info), protected, bind(C, name="ompi_f08_mpi_info_null") :: MPI_INFO_NULL; - type(MPI_Message), protected, bind(C, name="ompi_f08_mpi_message_null") :: MPI_MESSAGE_NULL; - type(MPI_Op), protected, bind(C, name="ompi_f08_mpi_op_null") :: MPI_OP_NULL; - type(MPI_Request), protected, bind(C, name="ompi_f08_mpi_request_null") :: MPI_REQUEST_NULL; - type(MPI_Win), protected, bind(C, name="ompi_f08_mpi_win_null") :: MPI_WIN_NULL; - - ! - ! Pre-defined datatype bindings - ! - ! These definitions should match those in ompi/include/mpif-common.h. - ! They are defined in ompi/runtime/ompi_mpi_init.c - ! 
- - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_byte") :: MPI_BYTE - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_packed") :: MPI_PACKED - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_ub") :: MPI_UB - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_lb") :: MPI_LB - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_character") :: MPI_CHARACTER - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_logical") :: MPI_LOGICAL - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_integer") :: MPI_INTEGER - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_integer1") :: MPI_INTEGER1 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_integer2") :: MPI_INTEGER2 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_integer4") :: MPI_INTEGER4 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_integer8") :: MPI_INTEGER8 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_integer16") :: MPI_INTEGER16 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_real") :: MPI_REAL - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_real4") :: MPI_REAL4 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_real8") :: MPI_REAL8 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_real16") :: MPI_REAL16 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_double_precision") :: MPI_DOUBLE_PRECISION - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_complex") :: MPI_COMPLEX - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_complex8") :: MPI_COMPLEX8 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_complex16") :: MPI_COMPLEX16 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_complex32") :: MPI_COMPLEX32 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_double_complex") :: MPI_DOUBLE_COMPLEX - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_2real") :: MPI_2REAL - type(MPI_Datatype), protected, 
bind(C, name="ompi_f08_mpi_2double_precision") :: MPI_2DOUBLE_PRECISION - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_2integer") :: MPI_2INTEGER - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_2complex") :: MPI_2COMPLEX - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_2double_complex") :: MPI_2DOUBLE_COMPLEX - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_real2") :: MPI_REAL2 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_logical1") :: MPI_LOGICAL1 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_logical2") :: MPI_LOGICAL2 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_logical4") :: MPI_LOGICAL4 - type(MPI_Datatype), protected, bind(C, name="ompi_f08_mpi_logical8") :: MPI_LOGICAL8 - -! -! STATUS/STATUSES_IGNORE -! -#include "mpif-f08-types.h" - -end module mpi_f08_types diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08.f90 deleted file mode 100644 index ce3b197ed94..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/mpi-f08.f90 +++ /dev/null @@ -1,35 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -! University Research and Technology -! Corporation. All rights reserved. -! Copyright (c) 2004-2005 The University of Tennessee and The University -! of Tennessee Research Foundation. All rights -! reserved. -! Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -! University of Stuttgart. All rights reserved. -! Copyright (c) 2004-2005 The Regents of the University of California. -! All rights reserved. -! Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All rights reserved. -! $COPYRIGHT$ -! -! Additional copyrights may follow -! -! $HEADER$ -! - -module mpi_f08 - - use mpi_f08_types - use mpi_f08_interfaces ! 
this module contains the mpi_f08 interface declarations - use ompi_fortran_binding ! this module provides support for building descriptors - -! -! Declaration of the interfaces to the ompi impl files -! e.g., send_f.c -! - include "mpi-f-interfaces-bind.h" - -end module mpi_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/recv_f08_desc.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/recv_f08_desc.f90 deleted file mode 100644 index e7ac58d305a..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/recv_f08_desc.f90 +++ /dev/null @@ -1,84 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - - - ! This wrapper mimics how the MPI_Recv_wrapper will eventually work. - ! Eventually buf will be typed, TYPE(*), DIMENSION(..) - ! Now can only mimic with explicit type and rank for assumed-shape dummy - ! arguments. - ! - subroutine MPI_Recv_f08_desc_int_2d(buf,count,datatype,source,tag,comm,status,ierror) - use :: OMPI_Fortran_binding - implicit none - integer, INTENT(IN), target :: buf(:,:) - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - - integer :: c_ierror - type(CFI_cdesc_t) :: buf_desc - - call make_desc_f(buf, buf_desc) - !call print_desc(buf_desc) - - call ompi_recv_f08_desc_f(buf_desc, count, datatype%MPI_VAL, source, tag, comm%MPI_VAL, status, c_ierror) - - if (present(ierror)) ierror = c_ierror - - end subroutine MPI_Recv_f08_desc_int_2d - - -! WARNING, not yet implemented, stub used to test MPI_SUBARRAYS_SUPPORTED usage -! 
- subroutine MPI_Recv_f08_desc_dbl_1d(buf,count,datatype,source,tag,comm,status,ierror) - use :: OMPI_Fortran_binding - implicit none - double precision, INTENT(IN), target :: buf(:) - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - - integer :: c_ierror - - ! this hack is to remove compiler warning about no change to out variable - status = MPI_STATUS_IGNORE - - c_ierror = 1 - print *, "WARNING, testing of double precision arrays not yet supported with subarrays" - - if (present(ierror)) ierror = c_ierror - - end subroutine MPI_Recv_f08_desc_dbl_1d - -! WARNING, not yet implemented, stub used to test MPI_SUBARRAYS_SUPPORTED usage -! - subroutine MPI_Recv_f08_desc_dbl_0d(buf,count,datatype,source,tag,comm,status,ierror) - use :: OMPI_Fortran_binding - implicit none - double precision, INTENT(IN), target :: buf - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - - integer :: c_ierror - - ! this hack is to remove compiler warning about no change to out variable - status = MPI_STATUS_IGNORE - - c_ierror = 1 - print *, "WARNING, testing of double precision arrays not yet supported with subarrays" - - if (present(ierror)) ierror = c_ierror - - end subroutine MPI_Recv_f08_desc_dbl_0d - diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/send_f08_desc.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/send_f08_desc.f90 deleted file mode 100644 index 98f76e832cd..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/send_f08_desc.f90 +++ /dev/null @@ -1,73 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - - ! 
This wrapper mimics how the MPI_Send_wrapper will eventually work. - ! Eventually buf will be typed, TYPE(*), DIMENSION(..) - ! Now can only mimic with explicit type and rank for assumed-shape dummy - ! arguments. - ! - subroutine MPI_Send_f08_desc_int_2d(buf, count, datatype, dest, tag, comm, ierror) - use OMPI_Fortran_binding - implicit none - integer, intent(in), target :: buf(:,:) - integer, intent(in) :: count, dest, tag - type(MPI_Datatype), intent(in) :: datatype - type(MPI_Comm), intent(in) :: comm - integer, optional, intent(out) :: ierror - - integer :: err - type(CFI_cdesc_t) :: buf_desc - - call make_desc_f(buf, buf_desc) - !call print_desc(buf_desc) - - call ompi_send_f08_desc_f(buf_desc, count, datatype%MPI_VAL, dest, tag, comm%MPI_VAL, err) - - if (present(ierror)) ierror = err - - end subroutine MPI_Send_f08_desc_int_2d - - -! WARNING, not yet implemented, stub used to test MPI_SUBARRAYS_SUPPORTED usage -! - subroutine MPI_Send_f08_desc_dbl_1d(buf, count, datatype, dest, tag, comm, ierror) - use OMPI_Fortran_binding - implicit none - double precision, intent(in), target :: buf(:) - integer, intent(in) :: count, dest, tag - type(MPI_Datatype), intent(in) :: datatype - type(MPI_Comm), intent(in) :: comm - integer, optional, intent(out) :: ierror - - integer :: err - - print *, "WARNING, testing of double precision arrays not yet supported with subarrays" - err = 1 - - if (present(ierror)) ierror = err - - end subroutine MPI_Send_f08_desc_dbl_1d - -! WARNING, not yet implemented, stub used to test MPI_SUBARRAYS_SUPPORTED usage -! 
- subroutine MPI_Send_f08_desc_dbl_0d(buf, count, datatype, dest, tag, comm, ierror) - use OMPI_Fortran_binding - implicit none - double precision, intent(in), target :: buf - integer, intent(in) :: count, dest, tag - type(MPI_Datatype), intent(in) :: datatype - type(MPI_Comm), intent(in) :: comm - integer, optional, intent(out) :: ierror - - integer :: err - - print *, "WARNING, testing of double precision arrays not yet supported with subarrays" - err = 1 - - if (present(ierror)) ierror = err - - end subroutine MPI_Send_f08_desc_dbl_0d diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/type_commit_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/type_commit_f08.f90 deleted file mode 100644 index 5ef027f46fd..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/type_commit_f08.f90 +++ /dev/null @@ -1,19 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - -subroutine MPI_Type_commit_f08(datatype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - use :: mpi_f08, only : ompi_type_commit_f - implicit none - TYPE(MPI_Datatype), INTENT(INOUT) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_type_commit_f(datatype%MPI_VAL,c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Type_commit_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/type_contiguous_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/type_contiguous_f08.f90 deleted file mode 100644 index b3ca74001f5..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/type_contiguous_f08.f90 +++ /dev/null @@ -1,21 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! 
$COPYRIGHT$ - -subroutine MPI_Type_contiguous_f08(count,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - use :: mpi_f08, only : ompi_type_contiguous_f - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_type_contiguous_f(count,oldtype%MPI_VAL,newtype%MPI_VAL,c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Type_contiguous_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08-desc/type_vector_f08.f90 b/ompi/mpi/fortran/use-mpi-f08-desc/type_vector_f08.f90 deleted file mode 100644 index 6b015926e66..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08-desc/type_vector_f08.f90 +++ /dev/null @@ -1,22 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. -! All Rights reserved. -! $COPYRIGHT$ - -subroutine MPI_Type_vector_f08(count,blocklength,stride,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - use :: mpi_f08, only : ompi_type_vector_f - implicit none - INTEGER, INTENT(IN) :: count, blocklength, stride - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror - integer :: c_ierror - - call ompi_type_vector_f(count,blocklength,stride, & - oldtype%MPI_VAL,newtype%MPI_VAL,c_ierror) - if (present(ierror)) ierror = c_ierror - -end subroutine MPI_Type_vector_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/Makefile.am b/ompi/mpi/fortran/use-mpi-f08/Makefile.am index 75bbf7600d8..786b7640a79 100644 --- a/ompi/mpi/fortran/use-mpi-f08/Makefile.am +++ b/ompi/mpi/fortran/use-mpi-f08/Makefile.am @@ -7,9 +7,10 @@ # Copyright (c) 2012-2013 Inria. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights # reserved. 
-# Copyright (c) 2015-2016 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. # # $COPYRIGHT$ # @@ -27,7 +28,7 @@ if OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS AM_FCFLAGS = -I$(top_builddir)/ompi/include \ -I$(top_srcdir)/ompi/include \ $(OMPI_FC_MODULE_FLAG)$(top_builddir)/ompi/$(OMPI_FORTRAN_USEMPI_DIR) \ - $(OMPI_FC_MODULE_FLAG). \ + $(OMPI_FC_MODULE_FLAG)mod \ -I$(top_srcdir) $(FCFLAGS_f90) MOSTLYCLEANFILES = *.mod @@ -37,9 +38,7 @@ CLEANFILES += *.i90 lib_LTLIBRARIES = lib@OMPI_LIBMPI_NAME@_usempif08.la module_sentinel_file = \ - libforce_usempif08_internal_modules_to_be_built.la - -noinst_LTLIBRARIES = $(module_sentinel_file) + mod/libforce_usempif08_internal_modules_to_be_built.la mpi-f08.lo: $(module_sentinel_file) mpi-f08.lo: mpi-f08.F90 @@ -163,6 +162,65 @@ mpi_api_files = \ exscan_f08.F90 \ f_sync_reg_f08.F90 \ fetch_and_op_f08.F90 \ + file_call_errhandler_f08.F90 \ + file_close_f08.F90 \ + file_create_errhandler_f08.F90 \ + file_delete_f08.F90 \ + file_get_amode_f08.F90 \ + file_get_atomicity_f08.F90 \ + file_get_byte_offset_f08.F90 \ + file_get_errhandler_f08.F90 \ + file_get_group_f08.F90 \ + file_get_info_f08.F90 \ + file_get_position_f08.F90 \ + file_get_position_shared_f08.F90 \ + file_get_size_f08.F90 \ + file_get_type_extent_f08.F90 \ + file_get_view_f08.F90 \ + file_iread_at_f08.F90 \ + file_iread_f08.F90 \ + file_iread_at_all_f08.F90 \ + file_iread_all_f08.F90 \ + file_iread_shared_f08.F90 \ + file_iwrite_at_f08.F90 \ + file_iwrite_f08.F90 \ + file_iwrite_at_all_f08.F90 \ + file_iwrite_all_f08.F90 \ + file_iwrite_shared_f08.F90 \ + file_open_f08.F90 \ + file_preallocate_f08.F90 \ + file_read_all_begin_f08.F90 \ + file_read_all_end_f08.F90 \ + file_read_all_f08.F90 \ + file_read_at_all_begin_f08.F90 \ + 
file_read_at_all_end_f08.F90 \ + file_read_at_all_f08.F90 \ + file_read_at_f08.F90 \ + file_read_f08.F90 \ + file_read_ordered_begin_f08.F90 \ + file_read_ordered_end_f08.F90 \ + file_read_ordered_f08.F90 \ + file_read_shared_f08.F90 \ + file_seek_f08.F90 \ + file_seek_shared_f08.F90 \ + file_set_atomicity_f08.F90 \ + file_set_errhandler_f08.F90 \ + file_set_info_f08.F90 \ + file_set_size_f08.F90 \ + file_set_view_f08.F90 \ + file_sync_f08.F90 \ + file_write_all_begin_f08.F90 \ + file_write_all_end_f08.F90 \ + file_write_all_f08.F90 \ + file_write_at_all_begin_f08.F90 \ + file_write_at_all_end_f08.F90 \ + file_write_at_all_f08.F90 \ + file_write_at_f08.F90 \ + file_write_f08.F90 \ + file_write_ordered_begin_f08.F90 \ + file_write_ordered_end_f08.F90 \ + file_write_ordered_f08.F90 \ + file_write_shared_f08.F90 \ finalized_f08.F90 \ finalize_f08.F90 \ free_mem_f08.F90 \ @@ -270,6 +328,7 @@ mpi_api_files = \ reduce_local_f08.F90 \ reduce_scatter_f08.F90 \ reduce_scatter_block_f08.F90 \ + register_datarep_f08.F90 \ request_free_f08.F90 \ request_get_status_f08.F90 \ rget_f08.F90 \ @@ -375,70 +434,6 @@ mpi_api_files = \ win_unlock_all_f08.F90 \ win_wait_f08.F90 -if OMPI_PROVIDE_MPI_FILE_INTERFACE -mpi_api_files += \ - file_call_errhandler_f08.F90 \ - file_close_f08.F90 \ - file_create_errhandler_f08.F90 \ - file_delete_f08.F90 \ - file_get_amode_f08.F90 \ - file_get_atomicity_f08.F90 \ - file_get_byte_offset_f08.F90 \ - file_get_errhandler_f08.F90 \ - file_get_group_f08.F90 \ - file_get_info_f08.F90 \ - file_get_position_f08.F90 \ - file_get_position_shared_f08.F90 \ - file_get_size_f08.F90 \ - file_get_type_extent_f08.F90 \ - file_get_view_f08.F90 \ - file_iread_at_f08.F90 \ - file_iread_f08.F90 \ - file_iread_at_all_f08.F90 \ - file_iread_all_f08.F90 \ - file_iread_shared_f08.F90 \ - file_iwrite_at_f08.F90 \ - file_iwrite_f08.F90 \ - file_iwrite_at_all_f08.F90 \ - file_iwrite_all_f08.F90 \ - file_iwrite_shared_f08.F90 \ - file_open_f08.F90 \ - file_preallocate_f08.F90 
\ - file_read_all_begin_f08.F90 \ - file_read_all_end_f08.F90 \ - file_read_all_f08.F90 \ - file_read_at_all_begin_f08.F90 \ - file_read_at_all_end_f08.F90 \ - file_read_at_all_f08.F90 \ - file_read_at_f08.F90 \ - file_read_f08.F90 \ - file_read_ordered_begin_f08.F90 \ - file_read_ordered_end_f08.F90 \ - file_read_ordered_f08.F90 \ - file_read_shared_f08.F90 \ - file_seek_f08.F90 \ - file_seek_shared_f08.F90 \ - file_set_atomicity_f08.F90 \ - file_set_errhandler_f08.F90 \ - file_set_info_f08.F90 \ - file_set_size_f08.F90 \ - file_set_view_f08.F90 \ - file_sync_f08.F90 \ - file_write_all_begin_f08.F90 \ - file_write_all_end_f08.F90 \ - file_write_all_f08.F90 \ - file_write_at_all_begin_f08.F90 \ - file_write_at_all_end_f08.F90 \ - file_write_at_all_f08.F90 \ - file_write_at_f08.F90 \ - file_write_f08.F90 \ - file_write_ordered_begin_f08.F90 \ - file_write_ordered_end_f08.F90 \ - file_write_ordered_f08.F90 \ - file_write_shared_f08.F90 \ - register_datarep_f08.F90 -endif - # JMS Somehow this variable substitution isn't quite working, and I # don't have time to figure it out. So just wholesale copy the file # list. 
:-( @@ -522,6 +517,65 @@ pmpi_api_files = \ profile/pexscan_f08.F90 \ profile/pf_sync_reg_f08.F90 \ profile/pfetch_and_op_f08.F90 \ + profile/pfile_call_errhandler_f08.F90 \ + profile/pfile_close_f08.F90 \ + profile/pfile_create_errhandler_f08.F90 \ + profile/pfile_delete_f08.F90 \ + profile/pfile_get_amode_f08.F90 \ + profile/pfile_get_atomicity_f08.F90 \ + profile/pfile_get_byte_offset_f08.F90 \ + profile/pfile_get_errhandler_f08.F90 \ + profile/pfile_get_group_f08.F90 \ + profile/pfile_get_info_f08.F90 \ + profile/pfile_get_position_f08.F90 \ + profile/pfile_get_position_shared_f08.F90 \ + profile/pfile_get_size_f08.F90 \ + profile/pfile_get_type_extent_f08.F90 \ + profile/pfile_get_view_f08.F90 \ + profile/pfile_iread_at_f08.F90 \ + profile/pfile_iread_f08.F90 \ + profile/pfile_iread_at_all_f08.F90 \ + profile/pfile_iread_all_f08.F90 \ + profile/pfile_iread_shared_f08.F90 \ + profile/pfile_iwrite_at_f08.F90 \ + profile/pfile_iwrite_f08.F90 \ + profile/pfile_iwrite_at_all_f08.F90 \ + profile/pfile_iwrite_all_f08.F90 \ + profile/pfile_iwrite_shared_f08.F90 \ + profile/pfile_open_f08.F90 \ + profile/pfile_preallocate_f08.F90 \ + profile/pfile_read_all_begin_f08.F90 \ + profile/pfile_read_all_end_f08.F90 \ + profile/pfile_read_all_f08.F90 \ + profile/pfile_read_at_all_begin_f08.F90 \ + profile/pfile_read_at_all_end_f08.F90 \ + profile/pfile_read_at_all_f08.F90 \ + profile/pfile_read_at_f08.F90 \ + profile/pfile_read_f08.F90 \ + profile/pfile_read_ordered_begin_f08.F90 \ + profile/pfile_read_ordered_end_f08.F90 \ + profile/pfile_read_ordered_f08.F90 \ + profile/pfile_read_shared_f08.F90 \ + profile/pfile_seek_f08.F90 \ + profile/pfile_seek_shared_f08.F90 \ + profile/pfile_set_atomicity_f08.F90 \ + profile/pfile_set_errhandler_f08.F90 \ + profile/pfile_set_info_f08.F90 \ + profile/pfile_set_size_f08.F90 \ + profile/pfile_set_view_f08.F90 \ + profile/pfile_sync_f08.F90 \ + profile/pfile_write_all_begin_f08.F90 \ + profile/pfile_write_all_end_f08.F90 \ + 
profile/pfile_write_all_f08.F90 \ + profile/pfile_write_at_all_begin_f08.F90 \ + profile/pfile_write_at_all_end_f08.F90 \ + profile/pfile_write_at_all_f08.F90 \ + profile/pfile_write_at_f08.F90 \ + profile/pfile_write_f08.F90 \ + profile/pfile_write_ordered_begin_f08.F90 \ + profile/pfile_write_ordered_end_f08.F90 \ + profile/pfile_write_ordered_f08.F90 \ + profile/pfile_write_shared_f08.F90 \ profile/pfinalized_f08.F90 \ profile/pfinalize_f08.F90 \ profile/pfree_mem_f08.F90 \ @@ -629,6 +683,7 @@ pmpi_api_files = \ profile/preduce_local_f08.F90 \ profile/preduce_scatter_f08.F90 \ profile/preduce_scatter_block_f08.F90 \ + profile/pregister_datarep_f08.F90 \ profile/prequest_free_f08.F90 \ profile/prequest_get_status_f08.F90 \ profile/prget_f08.F90 \ @@ -734,66 +789,6 @@ pmpi_api_files = \ profile/pwin_unlock_all_f08.F90 \ profile/pwin_wait_f08.F90 -if OMPI_PROVIDE_MPI_FILE_INTERFACE -pmpi_api_files += \ - profile/pfile_call_errhandler_f08.F90 \ - profile/pfile_close_f08.F90 \ - profile/pfile_create_errhandler_f08.F90 \ - profile/pfile_delete_f08.F90 \ - profile/pfile_get_amode_f08.F90 \ - profile/pfile_get_atomicity_f08.F90 \ - profile/pfile_get_byte_offset_f08.F90 \ - profile/pfile_get_errhandler_f08.F90 \ - profile/pfile_get_group_f08.F90 \ - profile/pfile_get_info_f08.F90 \ - profile/pfile_get_position_f08.F90 \ - profile/pfile_get_position_shared_f08.F90 \ - profile/pfile_get_size_f08.F90 \ - profile/pfile_get_type_extent_f08.F90 \ - profile/pfile_get_view_f08.F90 \ - profile/pfile_iread_at_f08.F90 \ - profile/pfile_iread_f08.F90 \ - profile/pfile_iread_shared_f08.F90 \ - profile/pfile_iwrite_at_f08.F90 \ - profile/pfile_iwrite_f08.F90 \ - profile/pfile_iwrite_shared_f08.F90 \ - profile/pfile_open_f08.F90 \ - profile/pfile_preallocate_f08.F90 \ - profile/pfile_read_all_begin_f08.F90 \ - profile/pfile_read_all_end_f08.F90 \ - profile/pfile_read_all_f08.F90 \ - profile/pfile_read_at_all_begin_f08.F90 \ - profile/pfile_read_at_all_end_f08.F90 \ - 
profile/pfile_read_at_all_f08.F90 \ - profile/pfile_read_at_f08.F90 \ - profile/pfile_read_f08.F90 \ - profile/pfile_read_ordered_begin_f08.F90 \ - profile/pfile_read_ordered_end_f08.F90 \ - profile/pfile_read_ordered_f08.F90 \ - profile/pfile_read_shared_f08.F90 \ - profile/pfile_seek_f08.F90 \ - profile/pfile_seek_shared_f08.F90 \ - profile/pfile_set_atomicity_f08.F90 \ - profile/pfile_set_errhandler_f08.F90 \ - profile/pfile_set_info_f08.F90 \ - profile/pfile_set_size_f08.F90 \ - profile/pfile_set_view_f08.F90 \ - profile/pfile_sync_f08.F90 \ - profile/pfile_write_all_begin_f08.F90 \ - profile/pfile_write_all_end_f08.F90 \ - profile/pfile_write_all_f08.F90 \ - profile/pfile_write_at_all_begin_f08.F90 \ - profile/pfile_write_at_all_end_f08.F90 \ - profile/pfile_write_at_all_f08.F90 \ - profile/pfile_write_at_f08.F90 \ - profile/pfile_write_f08.F90 \ - profile/pfile_write_ordered_begin_f08.F90 \ - profile/pfile_write_ordered_end_f08.F90 \ - profile/pfile_write_ordered_f08.F90 \ - profile/pfile_write_shared_f08.F90 \ - profile/pregister_datarep_f08.F90 -endif - lib@OMPI_LIBMPI_NAME@_usempif08_la_SOURCES = \ $(mpi_api_files) \ $(pmpi_api_files) \ @@ -843,41 +838,6 @@ mpi-f08.lo: mpi-f-interfaces-bind.h pmpi-f-interfaces-bind.h ########################################################################### -# f08 support modules - -libforce_usempif08_internal_modules_to_be_built_la_SOURCES = \ - mpi-f08-types.F90 \ - mpi-f08-interfaces.F90 \ - mpi-f08-interfaces-callbacks.F90 \ - mpi-f08-callbacks.F90 \ - pmpi-f08-interfaces.F90 - -config_h = \ - $(top_builddir)/ompi/mpi/fortran/configure-fortran-output.h \ - $(top_srcdir)/ompi/mpi/fortran/configure-fortran-output-bottom.h - -# -# Automake doesn't do Fortran dependency analysis, so must list them -# manually here. Bummer! 
-# - -mpi-f08-types.lo: $(config_h) -mpi-f08-types.lo: mpi-f08-types.F90 -mpi-f08-interfaces.lo: $(config_h) -mpi-f08-interfaces.lo: mpi-f08-interfaces.F90 -mpi-f08-interfaces.lo: mpi-f08-interfaces-callbacks.lo -mpi-f08-interfaces-callbacks.lo: $(config_h) -mpi-f08-interfaces-callbacks.lo: mpi-f08-interfaces-callbacks.F90 -mpi-f08-interfaces-callbacks.lo: mpi-f08-types.lo -mpi-f08-callbacks.lo: $(config_h) -mpi-f08-callbacks.lo: mpi-f08-callbacks.F90 -mpi-f08-callbacks.lo: mpi-f08-types.lo -pmpi-f08-interfaces.lo: $(config_h) -pmpi-f08-interfaces.lo: pmpi-f08-interfaces.F90 -pmpi-f08-interfaces.lo: mpi-f08-interfaces-callbacks.lo - -########################################################################### - # Install the generated .mod files. Unfortunately, each F90 compiler # may generate different filenames, so we have to use a glob. :-( diff --git a/ompi/mpi/fortran/use-mpi-f08/accumulate_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/accumulate_f08.F90 index f6879e36e39..67d99414419 100644 --- a/ompi/mpi/fortran/use-mpi-f08/accumulate_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/accumulate_f08.F90 @@ -13,7 +13,7 @@ subroutine MPI_Accumulate_f08(origin_addr,origin_count,origin_datatype,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_accumulate_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/compare_and_swap_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/compare_and_swap_f08.F90 index 8a129f244ab..f9acb19e60c 100644 --- a/ompi/mpi/fortran/use-mpi-f08/compare_and_swap_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/compare_and_swap_f08.F90 @@ -13,8 +13,8 @@ subroutine 
MPI_Compare_and_swap_f08(origin_addr,compare_addr,result_addr,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_compare_and_swap_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr, compare_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr, compare_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, ASYNCHRONOUS :: result_addr TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: target_rank INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/constants.c b/ompi/mpi/fortran/use-mpi-f08/constants.c index 8b6c5353362..377b5de99e5 100644 --- a/ompi/mpi/fortran/use-mpi-f08/constants.c +++ b/ompi/mpi/fortran/use-mpi-f08/constants.c @@ -1,6 +1,6 @@ /* * Copyright (c) 2010-2015 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* * $COPYRIGHT$ @@ -62,9 +62,7 @@ OMPI_DECLSPEC ompi_fortran_08_handle_t OMPI_F08_HANDLE_ALIGNED ompi_f08_mpi_mess OMPI_DECLSPEC ompi_fortran_08_handle_t OMPI_F08_HANDLE_ALIGNED ompi_f08_mpi_op_null = {OMPI_MPI_OP_NULL}; OMPI_DECLSPEC ompi_fortran_08_handle_t OMPI_F08_HANDLE_ALIGNED ompi_f08_mpi_request_null = {OMPI_MPI_REQUEST_NULL}; OMPI_DECLSPEC ompi_fortran_08_handle_t OMPI_F08_HANDLE_ALIGNED ompi_f08_mpi_win_null = {OMPI_MPI_WIN_NULL}; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE OMPI_DECLSPEC ompi_fortran_08_handle_t OMPI_F08_HANDLE_ALIGNED ompi_f08_mpi_file_null = {OMPI_MPI_FILE_NULL}; -#endif /* * common block items from ompi/include/mpif-common.h diff --git a/ompi/mpi/fortran/use-mpi-f08/fetch_and_op_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/fetch_and_op_f08.F90 index 75f687cff10..6ef6ed56b22 100644 --- a/ompi/mpi/fortran/use-mpi-f08/fetch_and_op_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/fetch_and_op_f08.F90 @@ -12,8 +12,8 @@ subroutine MPI_Fetch_and_op_f08(origin_addr,result_addr,datatype,target_rank, & use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_fetch_and_op_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, ASYNCHRONOUS :: result_addr TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: target_rank INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/get_accumulate_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/get_accumulate_f08.F90 index 999a252128e..0302058a2cf 100644 --- a/ompi/mpi/fortran/use-mpi-f08/get_accumulate_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/get_accumulate_f08.F90 @@ -14,10 +14,10 @@ subroutine MPI_Get_accumulate_f08(origin_addr,origin_count,origin_datatype,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND use :: mpi_f08, only : 
ompi_get_accumulate_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, result_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, ASYNCHRONOUS :: result_addr TYPE(MPI_Datatype), INTENT(IN) :: result_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp TYPE(MPI_Datatype), INTENT(IN) :: target_datatype diff --git a/ompi/mpi/fortran/use-mpi-f08/get_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/get_f08.F90 index 4ef9188c1bb..075a0f71ddb 100644 --- a/ompi/mpi/fortran/use-mpi-f08/get_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/get_f08.F90 @@ -1,7 +1,7 @@ ! -*- f90 -*- ! ! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2012 Los Alamos National Security, LLC. +! Copyright (c) 2009-2018 Los Alamos National Security, LLC. ! All Rights reserved. ! $COPYRIGHT$ @@ -12,7 +12,7 @@ subroutine MPI_Get_f08(origin_addr,origin_count,origin_datatype,target_rank,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_get_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/mod/Makefile.am b/ompi/mpi/fortran/use-mpi-f08/mod/Makefile.am new file mode 100644 index 00000000000..e8dc8bfbb5c --- /dev/null +++ b/ompi/mpi/fortran/use-mpi-f08/mod/Makefile.am @@ -0,0 +1,102 @@ +# -*- makefile -*- +# +# Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2012-2013 The University of Tennessee and The University +# of Tennessee Research Foundation. 
All rights +# reserved. +# Copyright (c) 2012-2013 Inria. All rights reserved. +# Copyright (c) 2013 Los Alamos National Security, LLC. All rights +# reserved. +# Copyright (c) 2015-2016 Research Organization for Information Science +# and Technology (RIST). All rights reserved. +# Copyright (c) 2016 IBM Corporation. All rights reserved. +# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +include $(top_srcdir)/Makefile.ompi-rules + +# This Makefile is only relevant if we're building the "use mpi_f08" +# MPI bindings. +if OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS + +AM_FCFLAGS = -I$(top_builddir)/ompi/include \ + -I$(top_srcdir)/ompi/include \ + $(OMPI_FC_MODULE_FLAG)$(top_builddir)/ompi/$(OMPI_FORTRAN_USEMPI_DIR) \ + $(OMPI_FC_MODULE_FLAG). \ + -I$(top_srcdir) $(FCFLAGS_f90) + +MOSTLYCLEANFILES = *.mod + +CLEANFILES += *.i90 + +########################################################################### + +module_sentinel_file = \ + libforce_usempif08_internal_modules_to_be_built.la + +noinst_LTLIBRARIES = $(module_sentinel_file) + +# f08 support modules + +libforce_usempif08_internal_modules_to_be_built_la_SOURCES = \ + mpi-f08-types.F90 \ + mpi-f08-interfaces.F90 \ + mpi-f08-interfaces-callbacks.F90 \ + mpi-f08-callbacks.F90 \ + pmpi-f08-interfaces.F90 + +config_h = \ + $(top_builddir)/ompi/mpi/fortran/configure-fortran-output.h \ + $(top_srcdir)/ompi/mpi/fortran/configure-fortran-output-bottom.h + +# +# Automake doesn't do Fortran dependency analysis, so must list them +# manually here. Bummer! 
+# + +mpi-f08-types.lo: $(config_h) +mpi-f08-types.lo: mpi-f08-types.F90 +mpi-f08-interfaces.lo: $(config_h) +mpi-f08-interfaces.lo: mpi-f08-interfaces.F90 +mpi-f08-interfaces.lo: mpi-f08-interfaces-callbacks.lo +mpi-f08-interfaces-callbacks.lo: $(config_h) +mpi-f08-interfaces-callbacks.lo: mpi-f08-interfaces-callbacks.F90 +mpi-f08-interfaces-callbacks.lo: mpi-f08-types.lo +mpi-f08-callbacks.lo: $(config_h) +mpi-f08-callbacks.lo: mpi-f08-callbacks.F90 +mpi-f08-callbacks.lo: mpi-f08-types.lo +pmpi-f08-interfaces.lo: $(config_h) +pmpi-f08-interfaces.lo: pmpi-f08-interfaces.F90 +pmpi-f08-interfaces.lo: mpi-f08-interfaces-callbacks.lo + +########################################################################### + +# Install the generated .mod files. Unfortunately, each F90 compiler +# may generate different filenames, so we have to use a glob. :-( + +install-exec-hook: + @ for file in `ls *.mod`; do \ + echo $(INSTALL) $$file $(DESTDIR)$(libdir); \ + $(INSTALL) $$file $(DESTDIR)$(libdir); \ + done + +uninstall-local: + @ for file in `ls *.mod`; do \ + echo rm -f $(DESTDIR)$(libdir)/$$file; \ + rm -f $(DESTDIR)$(libdir)/$$file; \ + done +else + +# Need to have empty targets because AM can't handle having an +# AM_CONDITIONAL was targets in the "if" statement but not in the +# "else". 
:-( + +install-exec-hook: +uninstall-local: + +endif diff --git a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-callbacks.F90 b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-callbacks.F90 similarity index 100% rename from ompi/mpi/fortran/use-mpi-f08/mpi-f08-callbacks.F90 rename to ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-callbacks.F90 diff --git a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-interfaces-callbacks.F90 b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces-callbacks.F90 similarity index 95% rename from ompi/mpi/fortran/use-mpi-f08/mpi-f08-interfaces-callbacks.F90 rename to ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces-callbacks.F90 index 47801afefe3..d72ce1b9e2f 100644 --- a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-interfaces-callbacks.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces-callbacks.F90 @@ -2,8 +2,8 @@ ! Copyright (c) 2009-2013 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. -! Copyright (c) 2015-2016 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2015-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ #include "ompi/mpi/fortran/configure-fortran-output.h" @@ -23,13 +23,13 @@ SUBROUTINE MPI_User_function(invec, inoutvec, len, datatype) !Example of a user defined callback function ! -! subroutine my_user_function( invec, inoutvec, len, type ) bind(c) +! subroutine my_user_function( invec, inoutvec, len, datatype ) bind(c) ! use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer ! type(c_ptr), value :: invec, inoutvec ! integer, intent(in) :: len -! type(MPI_Datatype) :: type +! type(MPI_Datatype) :: datatype ! real, pointer :: invec_r(:), inoutvec_r(:) -! if (type%MPI_VAL == MPI_REAL%MPI_VAL) then +! if (datatype%MPI_VAL == MPI_REAL%MPI_VAL) then ! call c_f_pointer(invec, invec_r, (/ len /) ) ! call c_f_pointer(inoutvec, inoutvec_r, (/ len /) ) ! 
inoutvec_r = invec_r + inoutvec_r @@ -151,8 +151,6 @@ SUBROUTINE MPI_Win_errhandler_function(win, error_code) END SUBROUTINE END INTERFACE -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - OMPI_ABSTRACT INTERFACE SUBROUTINE MPI_File_errhandler_function(file, error_code) USE mpi_f08_types @@ -162,8 +160,6 @@ SUBROUTINE MPI_File_errhandler_function(file, error_code) END SUBROUTINE END INTERFACE -#endif - OMPI_ABSTRACT INTERFACE SUBROUTINE MPI_Grequest_query_function(extra_state,status,ierror) USE mpi_f08_types diff --git a/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90 b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90 new file mode 100644 index 00000000000..fd46c5d730a --- /dev/null +++ b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90 @@ -0,0 +1,4739 @@ +! -*- f90 -*- +! +! Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2015 Los Alamos National Security, LLC. +! All rights reserved. +! Copyright (c) 2012 The University of Tennessee and The University +! of Tennessee Research Foundation. All rights +! reserved. +! Copyright (c) 2012 Inria. All rights reserved. +! Copyright (c) 2015-2017 Research Organization for Information Science +! and Technology (RIST). All rights reserved. +! Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +! $COPYRIGHT$ +! +! This file provides the interface specifications for the MPI Fortran +! API bindings. It effectively maps between public names ("MPI_Init") +! and the name for tools ("MPI_Init_f08") and the back-end implementation +! name (e.g., "MPI_Init_f08"). 
+ +#include "ompi/mpi/fortran/configure-fortran-output.h" + +module mpi_f08_interfaces + +interface MPI_Bsend +subroutine MPI_Bsend_f08(buf,count,datatype,dest,tag,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Bsend_f08 +end interface MPI_Bsend + +interface MPI_Bsend_init +subroutine MPI_Bsend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Bsend_init_f08 +end interface MPI_Bsend_init + +interface MPI_Buffer_attach +subroutine MPI_Buffer_attach_f08(buffer,size,ierror) + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer + !$PRAGMA IGNORE_TKR buffer + !DIR$ IGNORE_TKR buffer + !IBM* IGNORE_TKR buffer + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer + INTEGER, INTENT(IN) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Buffer_attach_f08 +end interface MPI_Buffer_attach + +interface MPI_Buffer_detach +subroutine MPI_Buffer_detach_f08(buffer_addr,size,ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + implicit none + TYPE(C_PTR), INTENT(OUT) :: buffer_addr + INTEGER, INTENT(OUT) :: 
size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Buffer_detach_f08 +end interface MPI_Buffer_detach + +interface MPI_Cancel +subroutine MPI_Cancel_f08(request,ierror) + use :: mpi_f08_types, only : MPI_Request + implicit none + TYPE(MPI_Request), INTENT(IN) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cancel_f08 +end interface MPI_Cancel + +interface MPI_Get_count +subroutine MPI_Get_count_f08(status,datatype,count,ierror) + use :: mpi_f08_types, only : MPI_Status, MPI_Datatype + implicit none + TYPE(MPI_Status), INTENT(IN) :: status + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(OUT) :: count + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_count_f08 +end interface MPI_Get_count + +interface MPI_Ibsend +subroutine MPI_Ibsend_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ibsend_f08 +end interface MPI_Ibsend + +interface MPI_Iprobe +subroutine MPI_Iprobe_f08(source,tag,comm,flag,status,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Status + implicit none + INTEGER, INTENT(IN) :: source, tag + TYPE(MPI_Comm), INTENT(IN) :: comm + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Status), INTENT(OUT) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iprobe_f08 +end interface MPI_Iprobe + +interface MPI_Irecv +subroutine MPI_Irecv_f08(buf,count,datatype,source,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + 
implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, source, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Irecv_f08 +end interface MPI_Irecv + +interface MPI_Irsend +subroutine MPI_Irsend_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Irsend_f08 +end interface MPI_Irsend + +interface MPI_Isend +subroutine MPI_Isend_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Isend_f08 +end interface MPI_Isend + +interface MPI_Issend +subroutine MPI_Issend_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ 
ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Issend_f08 +end interface MPI_Issend + +interface MPI_Probe +subroutine MPI_Probe_f08(source,tag,comm,status,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Status + implicit none + INTEGER, INTENT(IN) :: source, tag + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Status), INTENT(OUT) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Probe_f08 +end interface MPI_Probe + +interface MPI_Recv +subroutine MPI_Recv_f08(buf,count,datatype,source,tag,comm,status,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Status + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, source, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Recv_f08 +end interface MPI_Recv + +interface MPI_Recv_init +subroutine MPI_Recv_init_f08(buf,count,datatype,source,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, source, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end 
subroutine MPI_Recv_init_f08 +end interface MPI_Recv_init + +interface MPI_Request_free +subroutine MPI_Request_free_f08(request,ierror) + use :: mpi_f08_types, only : MPI_Request + implicit none + TYPE(MPI_Request), INTENT(INOUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Request_free_f08 +end interface MPI_Request_free + +interface MPI_Request_get_status +subroutine MPI_Request_get_status_f08(request,flag,status,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + TYPE(MPI_Request), INTENT(IN) :: request + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Request_get_status_f08 +end interface MPI_Request_get_status + +interface MPI_Rsend +subroutine MPI_Rsend_f08(buf,count,datatype,dest,tag,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Rsend_f08 +end interface MPI_Rsend + +interface MPI_Rsend_init +subroutine MPI_Rsend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Rsend_init_f08 +end interface MPI_Rsend_init + 
+interface MPI_Send +subroutine MPI_Send_f08(buf,count,datatype,dest,tag,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Send_f08 +end interface MPI_Send + +interface MPI_Sendrecv +subroutine MPI_Sendrecv_f08(sendbuf,sendcount,sendtype,dest,sendtag,recvbuf, & + recvcount,recvtype,source,recvtag,comm,status,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Status + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, dest, sendtag, recvcount, source, recvtag + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Sendrecv_f08 +end interface MPI_Sendrecv + +interface MPI_Sendrecv_replace +subroutine MPI_Sendrecv_replace_f08(buf,count,datatype,dest,sendtag,source,recvtag, & + comm,status,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Status + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, dest, sendtag, source, recvtag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + 
TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Sendrecv_replace_f08 +end interface MPI_Sendrecv_replace + +interface MPI_Send_init +subroutine MPI_Send_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Send_init_f08 +end interface MPI_Send_init + +interface MPI_Ssend +subroutine MPI_Ssend_f08(buf,count,datatype,dest,tag,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ssend_f08 +end interface MPI_Ssend + +interface MPI_Ssend_init +subroutine MPI_Ssend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count, dest, tag + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine 
MPI_Ssend_init_f08 +end interface MPI_Ssend_init + +interface MPI_Start +subroutine MPI_Start_f08(request,ierror) + use :: mpi_f08_types, only : MPI_Request + implicit none + TYPE(MPI_Request), INTENT(INOUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Start_f08 +end interface MPI_Start + +interface MPI_Startall +subroutine MPI_Startall_f08(count,array_of_requests,ierror) + use :: mpi_f08_types, only : MPI_Request + implicit none + INTEGER, INTENT(IN) :: count + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Startall_f08 +end interface MPI_Startall + +interface MPI_Test +subroutine MPI_Test_f08(request,flag,status,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + TYPE(MPI_Request), INTENT(INOUT) :: request + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Test_f08 +end interface MPI_Test + +interface MPI_Testall +subroutine MPI_Testall_f08(count,array_of_requests,flag,array_of_statuses,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + INTEGER, INTENT(IN) :: count + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Status) :: array_of_statuses(*) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Testall_f08 +end interface MPI_Testall + +interface MPI_Testany +subroutine MPI_Testany_f08(count,array_of_requests,index,flag,status,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + INTEGER, INTENT(IN) :: count + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) + INTEGER, INTENT(OUT) :: index + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Testany_f08 +end interface MPI_Testany + +interface MPI_Testsome +subroutine 
MPI_Testsome_f08(incount,array_of_requests,outcount, & + array_of_indices,array_of_statuses,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + INTEGER, INTENT(IN) :: incount + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(incount) + INTEGER, INTENT(OUT) :: outcount, array_of_indices(*) + TYPE(MPI_Status) :: array_of_statuses(*) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Testsome_f08 +end interface MPI_Testsome + +interface MPI_Test_cancelled +subroutine MPI_Test_cancelled_f08(status,flag,ierror) + use :: mpi_f08_types, only : MPI_Status + implicit none + TYPE(MPI_Status), INTENT(IN) :: status + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Test_cancelled_f08 +end interface MPI_Test_cancelled + +interface MPI_Wait +subroutine MPI_Wait_f08(request,status,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + TYPE(MPI_Request), INTENT(INOUT) :: request + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Wait_f08 +end interface MPI_Wait + +interface MPI_Waitall +subroutine MPI_Waitall_f08(count,array_of_requests,array_of_statuses,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + INTEGER, INTENT(IN) :: count + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) + TYPE(MPI_Status) :: array_of_statuses(*) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Waitall_f08 +end interface MPI_Waitall + +interface MPI_Waitany +subroutine MPI_Waitany_f08(count,array_of_requests,index,status,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + INTEGER, INTENT(IN) :: count + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) + INTEGER, INTENT(OUT) :: index + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Waitany_f08 +end interface MPI_Waitany + +interface MPI_Waitsome +subroutine 
MPI_Waitsome_f08(incount,array_of_requests,outcount, & + array_of_indices,array_of_statuses,ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_Status + implicit none + INTEGER, INTENT(IN) :: incount + TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(incount) + INTEGER, INTENT(OUT) :: outcount, array_of_indices(*) + TYPE(MPI_Status) :: array_of_statuses(*) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Waitsome_f08 +end interface MPI_Waitsome + +interface MPI_Get_address +subroutine MPI_Get_address_f08(location,address,ierror) + use :: mpi_f08_types, only : MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: location + !GCC$ ATTRIBUTES NO_ARG_CHECK :: location + !$PRAGMA IGNORE_TKR location + !DIR$ IGNORE_TKR location + !IBM* IGNORE_TKR location + OMPI_FORTRAN_IGNORE_TKR_TYPE :: location + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: address + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_address_f08 +end interface MPI_Get_address + +interface MPI_Get_elements +subroutine MPI_Get_elements_f08(status,datatype,count,ierror) + use :: mpi_f08_types, only : MPI_Status, MPI_Datatype + implicit none + TYPE(MPI_Status), INTENT(IN) :: status + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(OUT) :: count + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_elements_f08 +end interface MPI_Get_elements + +interface MPI_Get_elements_x +subroutine MPI_Get_elements_x_f08(status,datatype,count,ierror) + use :: mpi_f08_types, only : MPI_Status, MPI_Datatype, MPI_COUNT_KIND + implicit none + TYPE(MPI_Status), INTENT(IN) :: status + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: count + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_elements_x_f08 +end interface MPI_Get_elements_x + +interface MPI_Pack +subroutine MPI_Pack_f08(inbuf,incount,datatype,outbuf,outsize,position,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + 
implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !$PRAGMA IGNORE_TKR inbuf, outbuf + !DIR$ IGNORE_TKR inbuf, outbuf + !IBM* IGNORE_TKR inbuf, outbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf + INTEGER, INTENT(IN) :: incount, outsize + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(INOUT) :: position + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Pack_f08 +end interface MPI_Pack + +interface MPI_Pack_external +subroutine MPI_Pack_external_f08(datarep,inbuf,incount,datatype,outbuf,outsize, & + position,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + CHARACTER(LEN=*), INTENT(IN) :: datarep + !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !$PRAGMA IGNORE_TKR inbuf, outbuf + !DIR$ IGNORE_TKR inbuf, outbuf + !IBM* IGNORE_TKR inbuf, outbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf + INTEGER, INTENT(IN) :: incount + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: outsize + INTEGER(MPI_ADDRESS_KIND), INTENT(INOUT) :: position + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Pack_external_f08 +end interface MPI_Pack_external + +interface MPI_Pack_external_size +subroutine MPI_Pack_external_size_f08(datarep,incount,datatype,size,ierror & + ) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: incount + CHARACTER(LEN=*), INTENT(IN) :: datarep + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Pack_external_size_f08 +end interface MPI_Pack_external_size + +interface MPI_Pack_size +subroutine MPI_Pack_size_f08(incount,datatype,comm,size,ierror) + use :: 
mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + INTEGER, INTENT(IN) :: incount + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Pack_size_f08 +end interface MPI_Pack_size + +interface MPI_Type_commit +subroutine MPI_Type_commit_f08(datatype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(INOUT) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_commit_f08 +end interface MPI_Type_commit + +interface MPI_Type_contiguous +subroutine MPI_Type_contiguous_f08(count,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_contiguous_f08 +end interface MPI_Type_contiguous + +interface MPI_Type_create_darray +subroutine MPI_Type_create_darray_f08(size,rank,ndims,array_of_gsizes, & + array_of_distribs,array_of_dargs,array_of_psizes,order, & + oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: size, rank, ndims, order + INTEGER, INTENT(IN) :: array_of_gsizes(ndims), array_of_distribs(ndims) + INTEGER, INTENT(IN) :: array_of_dargs(ndims), array_of_psizes(ndims) + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_darray_f08 +end interface MPI_Type_create_darray + +interface MPI_Type_create_hindexed +subroutine MPI_Type_create_hindexed_f08(count,array_of_blocklengths, & + array_of_displacements,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + INTEGER, INTENT(IN) :: count + INTEGER, INTENT(IN) :: array_of_blocklengths(count) + 
INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: array_of_displacements(count) + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_hindexed_f08 +end interface MPI_Type_create_hindexed + +interface MPI_Type_create_hvector +subroutine MPI_Type_create_hvector_f08(count,blocklength,stride,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + INTEGER, INTENT(IN) :: count, blocklength + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: stride + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_hvector_f08 +end interface MPI_Type_create_hvector + +interface MPI_Type_create_indexed_block +subroutine MPI_Type_create_indexed_block_f08(count,blocklength, & + array_of_displacements,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: count, blocklength + INTEGER, INTENT(IN) :: array_of_displacements(count) + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_indexed_block_f08 +end interface MPI_Type_create_indexed_block + +interface MPI_Type_create_hindexed_block +subroutine MPI_Type_create_hindexed_block_f08(count,blocklength, & + array_of_displacements,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + INTEGER, INTENT(IN) :: count, blocklength + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: array_of_displacements(count) + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_hindexed_block_f08 +end interface MPI_Type_create_hindexed_block + +interface MPI_Type_create_resized +subroutine 
MPI_Type_create_resized_f08(oldtype,lb,extent,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: lb, extent + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_resized_f08 +end interface MPI_Type_create_resized + +interface MPI_Type_create_struct +subroutine MPI_Type_create_struct_f08(count,array_of_blocklengths, & + array_of_displacements,array_of_types,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + INTEGER, INTENT(IN) :: count + INTEGER, INTENT(IN) :: array_of_blocklengths(count) + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: array_of_displacements(count) + TYPE(MPI_Datatype), INTENT(IN) :: array_of_types(count) + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_struct_f08 +end interface MPI_Type_create_struct + +interface MPI_Type_create_subarray +subroutine MPI_Type_create_subarray_f08(ndims,array_of_sizes,array_of_subsizes, & + array_of_starts,order,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: ndims, order + INTEGER, INTENT(IN) :: array_of_sizes(ndims), array_of_subsizes(ndims) + INTEGER, INTENT(IN) :: array_of_starts(ndims) + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_subarray_f08 +end interface MPI_Type_create_subarray + +interface MPI_Type_dup +subroutine MPI_Type_dup_f08(oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_dup_f08 +end interface MPI_Type_dup + +interface MPI_Type_free 
+subroutine MPI_Type_free_f08(datatype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(INOUT) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_free_f08 +end interface MPI_Type_free + +interface MPI_Type_get_contents +subroutine MPI_Type_get_contents_f08(datatype,max_integers,max_addresses,max_datatypes, & + array_of_integers,array_of_addresses,array_of_datatypes, & + ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: max_integers, max_addresses, max_datatypes + INTEGER, INTENT(OUT) :: array_of_integers(max_integers) + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: array_of_addresses(max_addresses) + TYPE(MPI_Datatype), INTENT(OUT) :: array_of_datatypes(max_datatypes) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_contents_f08 +end interface MPI_Type_get_contents + +interface MPI_Type_get_envelope +subroutine MPI_Type_get_envelope_f08(datatype,num_integers,num_addresses,num_datatypes, & + combiner,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(OUT) :: num_integers, num_addresses, num_datatypes, combiner + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_envelope_f08 +end interface MPI_Type_get_envelope + +interface MPI_Type_get_extent +subroutine MPI_Type_get_extent_f08(datatype,lb,extent,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: lb, extent + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_extent_f08 +end interface MPI_Type_get_extent + +interface MPI_Type_get_extent_x +subroutine MPI_Type_get_extent_x_f08(datatype,lb,extent,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND, MPI_COUNT_KIND 
+ implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: lb, extent + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_extent_x_f08 +end interface MPI_Type_get_extent_x + +interface MPI_Type_get_true_extent +subroutine MPI_Type_get_true_extent_f08(datatype,true_lb,true_extent,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: true_lb, true_extent + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_true_extent_f08 +end interface MPI_Type_get_true_extent + +interface MPI_Type_get_true_extent_x +subroutine MPI_Type_get_true_extent_x_f08(datatype,true_lb,true_extent,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND, MPI_COUNT_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: true_lb, true_extent + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_true_extent_x_f08 +end interface MPI_Type_get_true_extent_x + +interface MPI_Type_indexed +subroutine MPI_Type_indexed_f08(count,array_of_blocklengths, & + array_of_displacements,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: count + INTEGER, INTENT(IN) :: array_of_blocklengths(count), array_of_displacements(count) + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_indexed_f08 +end interface MPI_Type_indexed + +interface MPI_Type_size +subroutine MPI_Type_size_f08(datatype,size,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_size_f08 +end interface MPI_Type_size + +interface MPI_Type_size_x 
+subroutine MPI_Type_size_x_f08(datatype,size,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_COUNT_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_size_x_f08 +end interface MPI_Type_size_x + +interface MPI_Type_vector +subroutine MPI_Type_vector_f08(count,blocklength,stride,oldtype,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: count, blocklength, stride + TYPE(MPI_Datatype), INTENT(IN) :: oldtype + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_vector_f08 +end interface MPI_Type_vector + +interface MPI_Unpack +subroutine MPI_Unpack_f08(inbuf,insize,position,outbuf,outcount,datatype,comm, & + ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !$PRAGMA IGNORE_TKR inbuf, outbuf + !DIR$ IGNORE_TKR inbuf, outbuf + !IBM* IGNORE_TKR inbuf, outbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf + INTEGER, INTENT(IN) :: insize, outcount + INTEGER, INTENT(INOUT) :: position + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Unpack_f08 +end interface MPI_Unpack + +interface MPI_Unpack_external +subroutine MPI_Unpack_external_f08(datarep,inbuf,insize,position,outbuf,outcount, & + datatype,ierror & + ) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + CHARACTER(LEN=*), INTENT(IN) :: datarep + !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf + !$PRAGMA IGNORE_TKR inbuf, outbuf + !DIR$ IGNORE_TKR inbuf, outbuf + !IBM* IGNORE_TKR inbuf, outbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) 
:: inbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: insize + INTEGER(MPI_ADDRESS_KIND), INTENT(INOUT) :: position + INTEGER, INTENT(IN) :: outcount + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Unpack_external_f08 +end interface MPI_Unpack_external + +interface MPI_Allgather +subroutine MPI_Allgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Allgather_f08 +end interface MPI_Allgather + +interface MPI_Iallgather +subroutine MPI_Iallgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iallgather_f08 +end interface MPI_Iallgather + +interface MPI_Allgatherv +subroutine 
MPI_Allgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & + recvtype,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount + INTEGER, INTENT(IN) :: recvcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Allgatherv_f08 +end interface MPI_Allgatherv + +interface MPI_Iallgatherv +subroutine MPI_Iallgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & + recvtype,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount + INTEGER, INTENT(IN) :: recvcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iallgatherv_f08 +end interface MPI_Iallgatherv + +interface MPI_Allreduce +subroutine MPI_Allreduce_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf 
+ !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Allreduce_f08 +end interface MPI_Allreduce + +interface MPI_Iallreduce +subroutine MPI_Iallreduce_f08(sendbuf,recvbuf,count,datatype,op,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iallreduce_f08 +end interface MPI_Iallreduce + +interface MPI_Alltoall +subroutine MPI_Alltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Alltoall_f08 +end interface MPI_Alltoall + +interface MPI_Ialltoall +subroutine 
MPI_Ialltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ialltoall_f08 +end interface MPI_Ialltoall + +interface MPI_Alltoallv +subroutine MPI_Alltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & + rdispls,recvtype,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Alltoallv_f08 +end interface MPI_Alltoallv + +interface MPI_Ialltoallv +subroutine MPI_Ialltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & + rdispls,recvtype,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, 
recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ialltoallv_f08 +end interface MPI_Ialltoallv + +interface MPI_Alltoallw +subroutine MPI_Alltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & + rdispls,recvtypes,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Alltoallw_f08 +end interface MPI_Alltoallw + +interface MPI_Ialltoallw +subroutine MPI_Ialltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & + rdispls,recvtypes,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) + TYPE(MPI_Comm), INTENT(IN)
:: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ialltoallw_f08 +end interface MPI_Ialltoallw + +interface MPI_Barrier +subroutine MPI_Barrier_f08(comm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Barrier_f08 +end interface MPI_Barrier + +interface MPI_Ibarrier +subroutine MPI_Ibarrier_f08(comm,request,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Request + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ibarrier_f08 +end interface MPI_Ibarrier + +interface MPI_Bcast +subroutine MPI_Bcast_f08(buffer,count,datatype,root,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer + !$PRAGMA IGNORE_TKR buffer + !DIR$ IGNORE_TKR buffer + !IBM* IGNORE_TKR buffer + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer + INTEGER, INTENT(IN) :: count, root + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Bcast_f08 +end interface MPI_Bcast + +interface MPI_Ibcast +subroutine MPI_Ibcast_f08(buffer,count,datatype,root,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer + !$PRAGMA IGNORE_TKR buffer + !DIR$ IGNORE_TKR buffer + !IBM* IGNORE_TKR buffer + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer + INTEGER, INTENT(IN) :: count, root + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ibcast_f08 +end interface MPI_Ibcast + +interface
MPI_Exscan +subroutine MPI_Exscan_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Exscan_f08 +end interface MPI_Exscan + +interface MPI_Iexscan +subroutine MPI_Iexscan_f08(sendbuf,recvbuf,count,datatype,op,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iexscan_f08 +end interface MPI_Iexscan + +interface MPI_Gather +subroutine MPI_Gather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + root,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + 
OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount, root + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Gather_f08 +end interface MPI_Gather + +interface MPI_Igather +subroutine MPI_Igather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + root,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount, root + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Igather_f08 +end interface MPI_Igather + +interface MPI_Gatherv +subroutine MPI_Gatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & + recvtype,root,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, root + INTEGER, INTENT(IN) :: recvcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Gatherv_f08 +end interface MPI_Gatherv + +interface MPI_Igatherv +subroutine 
MPI_Igatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & + recvtype,root,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, root + INTEGER, INTENT(IN) :: recvcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Igatherv_f08 +end interface MPI_Igatherv + +interface MPI_Op_commutative +subroutine MPI_Op_commutative_f08(op,commute,ierror) + use :: mpi_f08_types, only : MPI_Op + implicit none + TYPE(MPI_Op), INTENT(IN) :: op + LOGICAL, INTENT(OUT) :: commute + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Op_commutative_f08 +end interface MPI_Op_commutative + +interface MPI_Op_create +subroutine MPI_Op_create_f08(user_fn,commute,op,ierror) + use :: mpi_f08_types, only : MPI_Op + use :: mpi_f08_interfaces_callbacks, only : MPI_User_function + implicit none + PROCEDURE(MPI_User_function) :: user_fn + LOGICAL, INTENT(IN) :: commute + TYPE(MPI_Op), INTENT(OUT) :: op + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Op_create_f08 +end interface MPI_Op_create + +interface MPI_Op_free +subroutine MPI_Op_free_f08(op,ierror) + use :: mpi_f08_types, only : MPI_Op + implicit none + TYPE(MPI_Op), INTENT(INOUT) :: op + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Op_free_f08 +end interface MPI_Op_free + +interface MPI_Reduce +subroutine MPI_Reduce_f08(sendbuf,recvbuf,count,datatype,op,root,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm + 
implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count, root + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Reduce_f08 +end interface MPI_Reduce + +interface MPI_Ireduce +subroutine MPI_Ireduce_f08(sendbuf,recvbuf,count,datatype,op,root,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count, root + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ireduce_f08 +end interface MPI_Ireduce + +interface MPI_Reduce_local +subroutine MPI_Reduce_local_f08(inbuf,inoutbuf,count,datatype,op,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, inoutbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, inoutbuf + !$PRAGMA IGNORE_TKR inbuf, inoutbuf + !DIR$ IGNORE_TKR inbuf, inoutbuf + !IBM* IGNORE_TKR inbuf, inoutbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: inoutbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror +end subroutine MPI_Reduce_local_f08 +end interface MPI_Reduce_local + +interface MPI_Reduce_scatter +subroutine MPI_Reduce_scatter_f08(sendbuf,recvbuf,recvcounts,datatype,op,comm, & + ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: recvcounts(*) + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Reduce_scatter_f08 +end interface MPI_Reduce_scatter + +interface MPI_Ireduce_scatter +subroutine MPI_Ireduce_scatter_f08(sendbuf,recvbuf,recvcounts,datatype,op,comm, & + request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: recvcounts(*) + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ireduce_scatter_f08 +end interface MPI_Ireduce_scatter + +interface MPI_Reduce_scatter_block +subroutine MPI_Reduce_scatter_block_f08(sendbuf,recvbuf,recvcount,datatype,op,comm, & + ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ 
ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: recvcount + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Reduce_scatter_block_f08 +end interface MPI_Reduce_scatter_block + +interface MPI_Ireduce_scatter_block +subroutine MPI_Ireduce_scatter_block_f08(sendbuf,recvbuf,recvcount,datatype,op,comm, & + request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: recvcount + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ireduce_scatter_block_f08 +end interface MPI_Ireduce_scatter_block + +interface MPI_Scan +subroutine MPI_Scan_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + 
TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Scan_f08 +end interface MPI_Scan + +interface MPI_Iscan +subroutine MPI_Iscan_f08(sendbuf,recvbuf,count,datatype,op,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iscan_f08 +end interface MPI_Iscan + +interface MPI_Scatter +subroutine MPI_Scatter_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + root,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount, root + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Scatter_f08 +end interface MPI_Scatter + +interface MPI_Iscatter +subroutine MPI_Iscatter_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + root,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: 
sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount, root + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iscatter_f08 +end interface MPI_Iscatter + +interface MPI_Scatterv +subroutine MPI_Scatterv_f08(sendbuf,sendcounts,displs,sendtype,recvbuf,recvcount, & + recvtype,root,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: recvcount, root + INTEGER, INTENT(IN) :: sendcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Scatterv_f08 +end interface MPI_Scatterv + +interface MPI_Iscatterv +subroutine MPI_Iscatterv_f08(sendbuf,sendcounts,displs,sendtype,recvbuf,recvcount, & + recvtype,root,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: recvcount, root + INTEGER, INTENT(IN) :: sendcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: 
sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Iscatterv_f08 +end interface MPI_Iscatterv + +interface MPI_Comm_compare +subroutine MPI_Comm_compare_f08(comm1,comm2,result,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm1 + TYPE(MPI_Comm), INTENT(IN) :: comm2 + INTEGER, INTENT(OUT) :: result + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_compare_f08 +end interface MPI_Comm_compare + +interface MPI_Comm_create +subroutine MPI_Comm_create_f08(comm,group,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Group + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Group), INTENT(IN) :: group + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_create_f08 +end interface MPI_Comm_create + +interface MPI_Comm_create_group +subroutine MPI_Comm_create_group_f08(comm,group,tag,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Group + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(IN) :: tag + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_create_group_f08 +end interface MPI_Comm_create_group + +interface MPI_Comm_create_keyval +subroutine MPI_Comm_create_keyval_f08(comm_copy_attr_fn,comm_delete_attr_fn,comm_keyval, & + extra_state,ierror) + use :: mpi_f08_types, only : MPI_ADDRESS_KIND + use :: mpi_f08_interfaces_callbacks, only : MPI_Comm_copy_attr_function + use :: mpi_f08_interfaces_callbacks, only : MPI_Comm_delete_attr_function + implicit none + PROCEDURE(MPI_Comm_copy_attr_function) :: comm_copy_attr_fn + PROCEDURE(MPI_Comm_delete_attr_function) :: comm_delete_attr_fn + INTEGER, INTENT(OUT) :: comm_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state + INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror +end subroutine MPI_Comm_create_keyval_f08 +end interface MPI_Comm_create_keyval + +interface MPI_Comm_delete_attr +subroutine MPI_Comm_delete_attr_f08(comm,comm_keyval,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: comm_keyval + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_delete_attr_f08 +end interface MPI_Comm_delete_attr + +interface MPI_Comm_dup +subroutine MPI_Comm_dup_f08(comm,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_dup_f08 +end interface MPI_Comm_dup + +interface MPI_Comm_dup_with_info +subroutine MPI_Comm_dup_with_info_f08(comm,info,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_dup_with_info_f08 +end interface MPI_Comm_dup_with_info + +interface MPI_Comm_idup +subroutine MPI_Comm_idup_f08(comm,newcomm,request,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Request + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_idup_f08 +end interface MPI_Comm_idup + +interface MPI_Comm_free +subroutine MPI_Comm_free_f08(comm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(INOUT) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_free_f08 +end interface MPI_Comm_free + +interface MPI_Comm_free_keyval +subroutine MPI_Comm_free_keyval_f08(comm_keyval,ierror) + implicit none + INTEGER, INTENT(INOUT) :: comm_keyval + INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror +end subroutine MPI_Comm_free_keyval_f08 +end interface MPI_Comm_free_keyval + +interface MPI_Comm_get_attr +subroutine MPI_Comm_get_attr_f08(comm,comm_keyval,attribute_val,flag,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: comm_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_get_attr_f08 +end interface MPI_Comm_get_attr + +interface MPI_Comm_get_info +subroutine MPI_Comm_get_info_f08(comm,info_used,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Info), INTENT(OUT) :: info_used + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_get_info_f08 +end interface MPI_Comm_get_info + +interface MPI_Comm_get_name +subroutine MPI_Comm_get_name_f08(comm,comm_name,resultlen,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_MAX_OBJECT_NAME + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + CHARACTER(LEN=MPI_MAX_OBJECT_NAME), INTENT(OUT) :: comm_name + INTEGER, INTENT(OUT) :: resultlen + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_get_name_f08 +end interface MPI_Comm_get_name + +interface MPI_Comm_group +subroutine MPI_Comm_group_f08(comm,group,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Group + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Group), INTENT(OUT) :: group + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_group_f08 +end interface MPI_Comm_group + +interface MPI_Comm_rank +subroutine MPI_Comm_rank_f08(comm,rank,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: rank + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_rank_f08 +end interface MPI_Comm_rank + +interface 
MPI_Comm_remote_group +subroutine MPI_Comm_remote_group_f08(comm,group,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Group + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Group), INTENT(OUT) :: group + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_remote_group_f08 +end interface MPI_Comm_remote_group + +interface MPI_Comm_remote_size +subroutine MPI_Comm_remote_size_f08(comm,size,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_remote_size_f08 +end interface MPI_Comm_remote_size + +interface MPI_Comm_set_attr +subroutine MPI_Comm_set_attr_f08(comm,comm_keyval,attribute_val,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: comm_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_set_attr_f08 +end interface MPI_Comm_set_attr + +interface MPI_Comm_set_info +subroutine MPI_Comm_set_info_f08(comm,info,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_set_info_f08 +end interface MPI_Comm_set_info + +interface MPI_Comm_set_name +subroutine MPI_Comm_set_name_f08(comm,comm_name,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + CHARACTER(LEN=*), INTENT(IN) :: comm_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_set_name_f08 +end interface MPI_Comm_set_name + +interface MPI_Comm_size +subroutine MPI_Comm_size_f08(comm,size,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: size + INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror +end subroutine MPI_Comm_size_f08 +end interface MPI_Comm_size + +interface MPI_Comm_split +subroutine MPI_Comm_split_f08(comm,color,key,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: color, key + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_split_f08 +end interface MPI_Comm_split + +interface MPI_Comm_test_inter +subroutine MPI_Comm_test_inter_f08(comm,flag,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_test_inter_f08 +end interface MPI_Comm_test_inter + +interface MPI_Group_compare +subroutine MPI_Group_compare_f08(group1,group2,result,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group1, group2 + INTEGER, INTENT(OUT) :: result + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_compare_f08 +end interface MPI_Group_compare + +interface MPI_Group_difference +subroutine MPI_Group_difference_f08(group1,group2,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group1, group2 + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_difference_f08 +end interface MPI_Group_difference + +interface MPI_Group_excl +subroutine MPI_Group_excl_f08(group,n,ranks,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(IN) :: n, ranks(n) + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_excl_f08 +end interface MPI_Group_excl + +interface MPI_Group_free +subroutine MPI_Group_free_f08(group,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit 
none + TYPE(MPI_Group), INTENT(INOUT) :: group + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_free_f08 +end interface MPI_Group_free + +interface MPI_Group_incl +subroutine MPI_Group_incl_f08(group,n,ranks,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + INTEGER, INTENT(IN) :: n, ranks(n) + TYPE(MPI_Group), INTENT(IN) :: group + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_incl_f08 +end interface MPI_Group_incl + +interface MPI_Group_intersection +subroutine MPI_Group_intersection_f08(group1,group2,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group1, group2 + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_intersection_f08 +end interface MPI_Group_intersection + +interface MPI_Group_range_excl +subroutine MPI_Group_range_excl_f08(group,n,ranges,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(IN) :: n, ranges(3,n) + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_range_excl_f08 +end interface MPI_Group_range_excl + +interface MPI_Group_range_incl +subroutine MPI_Group_range_incl_f08(group,n,ranges,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(IN) :: n, ranges(3,n) + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_range_incl_f08 +end interface MPI_Group_range_incl + +interface MPI_Group_rank +subroutine MPI_Group_rank_f08(group,rank,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(OUT) :: rank + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_rank_f08 +end 
interface MPI_Group_rank + +interface MPI_Group_size +subroutine MPI_Group_size_f08(group,size,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_size_f08 +end interface MPI_Group_size + +interface MPI_Group_translate_ranks +subroutine MPI_Group_translate_ranks_f08(group1,n,ranks1,group2,ranks2,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group1, group2 + INTEGER, INTENT(IN) :: n + INTEGER, INTENT(IN) :: ranks1(n) + INTEGER, INTENT(OUT) :: ranks2(n) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_translate_ranks_f08 +end interface MPI_Group_translate_ranks + +interface MPI_Group_union +subroutine MPI_Group_union_f08(group1,group2,newgroup,ierror) + use :: mpi_f08_types, only : MPI_Group + implicit none + TYPE(MPI_Group), INTENT(IN) :: group1, group2 + TYPE(MPI_Group), INTENT(OUT) :: newgroup + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Group_union_f08 +end interface MPI_Group_union + +interface MPI_Intercomm_create +subroutine MPI_Intercomm_create_f08(local_comm,local_leader,peer_comm,remote_leader, & + tag,newintercomm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: local_comm, peer_comm + INTEGER, INTENT(IN) :: local_leader, remote_leader, tag + TYPE(MPI_Comm), INTENT(OUT) :: newintercomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Intercomm_create_f08 +end interface MPI_Intercomm_create + +interface MPI_Intercomm_merge +subroutine MPI_Intercomm_merge_f08(intercomm,high,newintracomm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: intercomm + LOGICAL, INTENT(IN) :: high + TYPE(MPI_Comm), INTENT(OUT) :: newintracomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Intercomm_merge_f08 +end interface 
MPI_Intercomm_merge + +interface MPI_Type_create_keyval +subroutine MPI_Type_create_keyval_f08(type_copy_attr_fn,type_delete_attr_fn,type_keyval, & + extra_state,ierror) + use :: mpi_f08_types, only : MPI_ADDRESS_KIND + use :: mpi_f08_interfaces_callbacks, only : MPI_Type_copy_attr_function + use :: mpi_f08_interfaces_callbacks, only : MPI_Type_delete_attr_function + implicit none + PROCEDURE(MPI_Type_copy_attr_function) :: type_copy_attr_fn + PROCEDURE(MPI_Type_delete_attr_function) :: type_delete_attr_fn + INTEGER, INTENT(OUT) :: type_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_keyval_f08 +end interface MPI_Type_create_keyval + +interface MPI_Type_delete_attr +subroutine MPI_Type_delete_attr_f08(datatype,type_keyval,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: type_keyval + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_delete_attr_f08 +end interface MPI_Type_delete_attr + +interface MPI_Type_free_keyval +subroutine MPI_Type_free_keyval_f08(type_keyval,ierror) + implicit none + INTEGER, INTENT(INOUT) :: type_keyval + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_free_keyval_f08 +end interface MPI_Type_free_keyval + +interface MPI_Type_get_attr +subroutine MPI_Type_get_attr_f08(datatype,type_keyval,attribute_val,flag,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: type_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_attr_f08 +end interface MPI_Type_get_attr + +interface MPI_Type_get_name +subroutine MPI_Type_get_name_f08(datatype,type_name,resultlen,ierror) + use :: mpi_f08_types, only : MPI_Datatype, 
MPI_MAX_OBJECT_NAME + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + CHARACTER(LEN=MPI_MAX_OBJECT_NAME), INTENT(OUT) :: type_name + INTEGER, INTENT(OUT) :: resultlen + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_get_name_f08 +end interface MPI_Type_get_name + +interface MPI_Type_set_attr +subroutine MPI_Type_set_attr_f08(datatype,type_keyval,attribute_val,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: type_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_set_attr_f08 +end interface MPI_Type_set_attr + +interface MPI_Type_set_name +subroutine MPI_Type_set_name_f08(datatype,type_name,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + TYPE(MPI_Datatype), INTENT(IN) :: datatype + CHARACTER(LEN=*), INTENT(IN) :: type_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_set_name_f08 +end interface MPI_Type_set_name + +interface MPI_Win_allocate +subroutine MPI_Win_allocate_f08(size, disp_unit, info, comm, & + baseptr, win, ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND + INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: size + INTEGER, INTENT(IN) :: disp_unit + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(C_PTR), INTENT(OUT) :: baseptr + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_allocate_f08 +end interface MPI_Win_allocate + +interface MPI_Win_allocate_shared +subroutine MPI_Win_allocate_shared_f08(size, disp_unit, info, comm, & + baseptr, win, ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND + INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: size + INTEGER, 
INTENT(IN) :: disp_unit + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(C_PTR), INTENT(OUT) :: baseptr + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_allocate_shared_f08 +end interface MPI_Win_allocate_shared + +interface MPI_Win_create_keyval +subroutine MPI_Win_create_keyval_f08(win_copy_attr_fn,win_delete_attr_fn,win_keyval, & + extra_state,ierror) + use :: mpi_f08_types, only : MPI_ADDRESS_KIND + use :: mpi_f08_interfaces_callbacks, only : MPI_Win_copy_attr_function + use :: mpi_f08_interfaces_callbacks, only : MPI_Win_delete_attr_function + implicit none + PROCEDURE(MPI_Win_copy_attr_function) :: win_copy_attr_fn + PROCEDURE(MPI_Win_delete_attr_function) :: win_delete_attr_fn + INTEGER, INTENT(OUT) :: win_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_create_keyval_f08 +end interface MPI_Win_create_keyval + +interface MPI_Win_delete_attr +subroutine MPI_Win_delete_attr_f08(win,win_keyval,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, INTENT(IN) :: win_keyval + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_delete_attr_f08 +end interface MPI_Win_delete_attr + +interface MPI_Win_free_keyval +subroutine MPI_Win_free_keyval_f08(win_keyval,ierror) + implicit none + INTEGER, INTENT(INOUT) :: win_keyval + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_free_keyval_f08 +end interface MPI_Win_free_keyval + +interface MPI_Win_get_attr +subroutine MPI_Win_get_attr_f08(win,win_keyval,attribute_val,flag,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, INTENT(IN) :: win_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine 
MPI_Win_get_attr_f08 +end interface MPI_Win_get_attr + +interface MPI_Win_get_info +subroutine MPI_Win_get_info_f08(win,info,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Info + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Info), INTENT(OUT) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_get_info_f08 +end interface MPI_Win_get_info + +interface MPI_Win_get_name +subroutine MPI_Win_get_name_f08(win,win_name,resultlen,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_MAX_OBJECT_NAME + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + CHARACTER(LEN=MPI_MAX_OBJECT_NAME), INTENT(OUT) :: win_name + INTEGER, INTENT(OUT) :: resultlen + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_get_name_f08 +end interface MPI_Win_get_name + +interface MPI_Win_set_attr +subroutine MPI_Win_set_attr_f08(win,win_keyval,attribute_val,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, INTENT(IN) :: win_keyval + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_set_attr_f08 +end interface MPI_Win_set_attr + +interface MPI_Win_set_info +subroutine MPI_Win_set_info_f08(win,info,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Info + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_set_info_f08 +end interface MPI_Win_set_info + +interface MPI_Win_set_name +subroutine MPI_Win_set_name_f08(win,win_name,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + CHARACTER(LEN=*), INTENT(IN) :: win_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_set_name_f08 +end interface MPI_Win_set_name + +interface MPI_Cartdim_get +subroutine MPI_Cartdim_get_f08(comm,ndims,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit 
none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: ndims + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cartdim_get_f08 +end interface MPI_Cartdim_get + +interface MPI_Cart_coords +subroutine MPI_Cart_coords_f08(comm,rank,maxdims,coords,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: rank, maxdims + INTEGER, INTENT(OUT) :: coords(maxdims) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_coords_f08 +end interface MPI_Cart_coords + +interface MPI_Cart_create +subroutine MPI_Cart_create_f08(comm_old,ndims,dims,periods,reorder,comm_cart,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm_old + INTEGER, INTENT(IN) :: ndims, dims(ndims) + LOGICAL, INTENT(IN) :: periods(ndims), reorder + TYPE(MPI_Comm), INTENT(OUT) :: comm_cart + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_create_f08 +end interface MPI_Cart_create + +interface MPI_Cart_get +subroutine MPI_Cart_get_f08(comm,maxdims,dims,periods,coords,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: maxdims + INTEGER, INTENT(OUT) :: dims(maxdims), coords(maxdims) + LOGICAL, INTENT(OUT) :: periods(maxdims) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_get_f08 +end interface MPI_Cart_get + +interface MPI_Cart_map +subroutine MPI_Cart_map_f08(comm,ndims,dims,periods,newrank,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: ndims, dims(ndims) + LOGICAL, INTENT(IN) :: periods(ndims) + INTEGER, INTENT(OUT) :: newrank + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_map_f08 +end interface MPI_Cart_map + +interface MPI_Cart_rank +subroutine MPI_Cart_rank_f08(comm,coords,rank,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + 
TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: coords(*) + INTEGER, INTENT(OUT) :: rank + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_rank_f08 +end interface MPI_Cart_rank + +interface MPI_Cart_shift +subroutine MPI_Cart_shift_f08(comm,direction,disp,rank_source,rank_dest,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: direction, disp + INTEGER, INTENT(OUT) :: rank_source, rank_dest + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_shift_f08 +end interface MPI_Cart_shift + +interface MPI_Cart_sub +subroutine MPI_Cart_sub_f08(comm,remain_dims,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + LOGICAL, INTENT(IN) :: remain_dims(*) + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Cart_sub_f08 +end interface MPI_Cart_sub + +interface MPI_Dims_create +subroutine MPI_Dims_create_f08(nnodes,ndims,dims,ierror) + implicit none + INTEGER, INTENT(IN) :: nnodes, ndims + INTEGER, INTENT(INOUT) :: dims(ndims) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Dims_create_f08 +end interface MPI_Dims_create + +interface MPI_Dist_graph_create +subroutine MPI_Dist_graph_create_f08(comm_old,n,sources,degrees,destinations,weights, & + info,reorder,comm_dist_graph,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm_old + INTEGER, INTENT(IN) :: n, sources(n), degrees(n), destinations(*), weights(*) + TYPE(MPI_Info), INTENT(IN) :: info + LOGICAL, INTENT(IN) :: reorder + TYPE(MPI_Comm), INTENT(OUT) :: comm_dist_graph + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Dist_graph_create_f08 +end interface MPI_Dist_graph_create + +interface MPI_Dist_graph_create_adjacent +subroutine MPI_Dist_graph_create_adjacent_f08(comm_old,indegree,sources,sourceweights, & + 
outdegree,destinations,destweights,info,reorder, & + comm_dist_graph,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm_old + INTEGER, INTENT(IN) :: indegree, sources(indegree), outdegree, destinations(outdegree) + INTEGER, INTENT(IN) :: sourceweights(indegree), destweights(outdegree) + TYPE(MPI_Info), INTENT(IN) :: info + LOGICAL, INTENT(IN) :: reorder + TYPE(MPI_Comm), INTENT(OUT) :: comm_dist_graph + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Dist_graph_create_adjacent_f08 +end interface MPI_Dist_graph_create_adjacent + +interface MPI_Dist_graph_neighbors +subroutine MPI_Dist_graph_neighbors_f08(comm,maxindegree,sources,sourceweights, & + maxoutdegree,destinations,destweights,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: maxindegree, maxoutdegree + INTEGER, INTENT(OUT) :: sources(maxindegree), destinations(maxoutdegree) + INTEGER, INTENT(OUT) :: sourceweights(maxindegree), destweights(maxoutdegree) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Dist_graph_neighbors_f08 +end interface MPI_Dist_graph_neighbors + +interface MPI_Dist_graph_neighbors_count +subroutine MPI_Dist_graph_neighbors_count_f08(comm,indegree,outdegree,weighted,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: indegree, outdegree + LOGICAL, INTENT(OUT) :: weighted + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Dist_graph_neighbors_count_f08 +end interface MPI_Dist_graph_neighbors_count + +interface MPI_Graphdims_get +subroutine MPI_Graphdims_get_f08(comm,nnodes,nedges,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: nnodes, nedges + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Graphdims_get_f08 +end interface MPI_Graphdims_get + +interface 
MPI_Graph_create +subroutine MPI_Graph_create_f08(comm_old,nnodes,index,edges,reorder,comm_graph, & + ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm_old + INTEGER, INTENT(IN) :: nnodes, index(nnodes), edges(*) + LOGICAL, INTENT(IN) :: reorder + TYPE(MPI_Comm), INTENT(OUT) :: comm_graph + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Graph_create_f08 +end interface MPI_Graph_create + +interface MPI_Graph_get +subroutine MPI_Graph_get_f08(comm,maxindex,maxedges,index,edges,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: maxindex, maxedges + INTEGER, INTENT(OUT) :: index(maxindex), edges(maxedges) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Graph_get_f08 +end interface MPI_Graph_get + +interface MPI_Graph_map +subroutine MPI_Graph_map_f08(comm,nnodes,index,edges,newrank,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: nnodes, index(nnodes), edges(*) + INTEGER, INTENT(OUT) :: newrank + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Graph_map_f08 +end interface MPI_Graph_map + +interface MPI_Graph_neighbors +subroutine MPI_Graph_neighbors_f08(comm,rank,maxneighbors,neighbors,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: rank, maxneighbors + INTEGER, INTENT(OUT) :: neighbors(maxneighbors) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Graph_neighbors_f08 +end interface MPI_Graph_neighbors + +interface MPI_Graph_neighbors_count +subroutine MPI_Graph_neighbors_count_f08(comm,rank,nneighbors,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: rank + INTEGER, INTENT(OUT) :: nneighbors + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine 
MPI_Graph_neighbors_count_f08 +end interface MPI_Graph_neighbors_count + +interface MPI_Topo_test +subroutine MPI_Topo_test_f08(comm,status,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Status + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(OUT) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Topo_test_f08 +end interface MPI_Topo_test + +! MPI_Wtick is not a wrapper function +! +interface MPI_Wtick +function MPI_Wtick_f08( ) BIND(C,name="MPI_Wtick") + use, intrinsic :: ISO_C_BINDING + implicit none + DOUBLE PRECISION :: MPI_Wtick_f08 +end function MPI_Wtick_f08 +end interface MPI_Wtick + +! MPI_Wtime is not a wrapper function +! +interface MPI_Wtime +function MPI_Wtime_f08( ) BIND(C,name="MPI_Wtime") + use, intrinsic :: ISO_C_BINDING + implicit none + DOUBLE PRECISION :: MPI_Wtime_f08 +end function MPI_Wtime_f08 +end interface MPI_Wtime + +interface MPI_Aint_add +function MPI_Aint_add_f08(base,diff) + use :: mpi_f08_types, only : MPI_ADDRESS_KIND + implicit none + INTEGER(MPI_ADDRESS_KIND) :: base + INTEGER(MPI_ADDRESS_KIND) :: diff + INTEGER(MPI_ADDRESS_KIND) :: MPI_Aint_add_f08 +end function MPI_Aint_add_f08 +end interface MPI_Aint_add + +interface MPI_Aint_diff +function MPI_Aint_diff_f08(addr1,addr2) + use :: mpi_f08_types, only : MPI_ADDRESS_KIND + implicit none + INTEGER(MPI_ADDRESS_KIND) :: addr1 + INTEGER(MPI_ADDRESS_KIND) :: addr2 + INTEGER(MPI_ADDRESS_KIND) :: MPI_Aint_diff_f08 +end function MPI_Aint_diff_f08 +end interface MPI_Aint_diff + +interface MPI_Abort +subroutine MPI_Abort_f08(comm,errorcode,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: errorcode + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Abort_f08 +end interface MPI_Abort + +interface MPI_Add_error_class +subroutine MPI_Add_error_class_f08(errorclass,ierror) + implicit none + INTEGER, INTENT(OUT) :: errorclass + INTEGER, OPTIONAL, INTENT(OUT) 
:: ierror +end subroutine MPI_Add_error_class_f08 +end interface MPI_Add_error_class + +interface MPI_Add_error_code +subroutine MPI_Add_error_code_f08(errorclass,errorcode,ierror) + implicit none + INTEGER, INTENT(IN) :: errorclass + INTEGER, INTENT(OUT) :: errorcode + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Add_error_code_f08 +end interface MPI_Add_error_code + +interface MPI_Add_error_string +subroutine MPI_Add_error_string_f08(errorcode,string,ierror) + implicit none + integer, intent(in) :: errorcode + character(len=*), intent(in) :: string + integer, optional, intent(out) :: ierror +end subroutine MPI_Add_error_string_f08 +end interface MPI_Add_error_string + +interface MPI_Alloc_mem +subroutine MPI_Alloc_mem_f08(size,info,baseptr,ierror) + use, intrinsic :: ISO_C_BINDING, only : C_PTR + use :: mpi_f08_types, only : MPI_Info, MPI_ADDRESS_KIND + implicit none + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(C_PTR), INTENT(OUT) :: baseptr + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Alloc_mem_f08 +end interface MPI_Alloc_mem + +interface MPI_Comm_call_errhandler +subroutine MPI_Comm_call_errhandler_f08(comm,errorcode,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: errorcode + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_call_errhandler_f08 +end interface MPI_Comm_call_errhandler + +interface MPI_Comm_create_errhandler +subroutine MPI_Comm_create_errhandler_f08(comm_errhandler_fn,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Errhandler + use :: mpi_f08_interfaces_callbacks, only : MPI_Comm_errhandler_function + implicit none + PROCEDURE(MPI_Comm_errhandler_function) :: comm_errhandler_fn + TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_create_errhandler_f08 +end interface MPI_Comm_create_errhandler + 
+interface MPI_Comm_get_errhandler +subroutine MPI_Comm_get_errhandler_f08(comm,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Errhandler + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_get_errhandler_f08 +end interface MPI_Comm_get_errhandler + +interface MPI_Comm_set_errhandler +subroutine MPI_Comm_set_errhandler_f08(comm,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Errhandler + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Errhandler), INTENT(IN) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_set_errhandler_f08 +end interface MPI_Comm_set_errhandler + +interface MPI_Errhandler_free +subroutine MPI_Errhandler_free_f08(errhandler,ierror) + use :: mpi_f08_types, only : MPI_Errhandler + implicit none + TYPE(MPI_Errhandler), INTENT(INOUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Errhandler_free_f08 +end interface MPI_Errhandler_free + +interface MPI_Error_class +subroutine MPI_Error_class_f08(errorcode,errorclass,ierror) + implicit none + INTEGER, INTENT(IN) :: errorcode + INTEGER, INTENT(OUT) :: errorclass + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Error_class_f08 +end interface MPI_Error_class + +interface MPI_Error_string +subroutine MPI_Error_string_f08(errorcode,string,resultlen,ierror) + use :: mpi_f08_types, only : MPI_MAX_ERROR_STRING + implicit none + integer, intent(in) :: errorcode + character(len=MPI_MAX_ERROR_STRING), intent(out) :: string + integer, intent(out) :: resultlen + integer, optional, intent(out) :: ierror +end subroutine MPI_Error_string_f08 +end interface MPI_Error_string + +interface MPI_File_call_errhandler +subroutine MPI_File_call_errhandler_f08(fh,errorcode,ierror) + use :: mpi_f08_types, only : MPI_File + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER, INTENT(IN) 
:: errorcode + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_call_errhandler_f08 +end interface MPI_File_call_errhandler + +interface MPI_File_create_errhandler +subroutine MPI_File_create_errhandler_f08(file_errhandler_fn,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Errhandler + use :: mpi_f08_interfaces_callbacks, only : MPI_File_errhandler_function + implicit none + PROCEDURE(MPI_File_errhandler_function) :: file_errhandler_fn + TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_create_errhandler_f08 +end interface MPI_File_create_errhandler + +interface MPI_File_get_errhandler +subroutine MPI_File_get_errhandler_f08(file,errhandler,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Errhandler + implicit none + TYPE(MPI_File), INTENT(IN) :: file + TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_errhandler_f08 +end interface MPI_File_get_errhandler + +interface MPI_File_set_errhandler +subroutine MPI_File_set_errhandler_f08(file,errhandler,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Errhandler + implicit none + TYPE(MPI_File), INTENT(IN) :: file + TYPE(MPI_Errhandler), INTENT(IN) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_set_errhandler_f08 +end interface MPI_File_set_errhandler + +interface MPI_Finalize +subroutine MPI_Finalize_f08(ierror) + implicit none + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Finalize_f08 +end interface MPI_Finalize + +interface MPI_Finalized +subroutine MPI_Finalized_f08(flag,ierror) + implicit none + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Finalized_f08 +end interface MPI_Finalized + +! ASYNCHRONOUS had to be removed from the base argument because +! the dummy argument is not an assumed-shape array. This will +! be okay once the Interop TR is implemented. 
+interface MPI_Free_mem +subroutine MPI_Free_mem_f08(base,ierror) + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: base + !GCC$ ATTRIBUTES NO_ARG_CHECK :: base + !$PRAGMA IGNORE_TKR base + !DIR$ IGNORE_TKR base + !IBM* IGNORE_TKR base + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: base + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Free_mem_f08 +end interface MPI_Free_mem + +interface MPI_Get_processor_name +subroutine MPI_Get_processor_name_f08(name,resultlen,ierror) + use :: mpi_f08_types, only : MPI_MAX_PROCESSOR_NAME + implicit none + character(len=MPI_MAX_PROCESSOR_NAME), intent(out) :: name + integer, intent(out) :: resultlen + integer, optional, intent(out) :: ierror +end subroutine MPI_Get_processor_name_f08 +end interface MPI_Get_processor_name + +interface MPI_Get_version +subroutine MPI_Get_version_f08(version,subversion,ierror) + implicit none + INTEGER, INTENT(OUT) :: version, subversion + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_version_f08 +end interface MPI_Get_version + +interface MPI_Init +subroutine MPI_Init_f08(ierror) + implicit none + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Init_f08 +end interface MPI_Init + +interface MPI_Initialized +subroutine MPI_Initialized_f08(flag,ierror) + implicit none + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Initialized_f08 +end interface MPI_Initialized + +interface MPI_Win_call_errhandler +subroutine MPI_Win_call_errhandler_f08(win,errorcode,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, INTENT(IN) :: errorcode + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_call_errhandler_f08 +end interface MPI_Win_call_errhandler + +interface MPI_Win_create_errhandler +subroutine MPI_Win_create_errhandler_f08(win_errhandler_fn,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Errhandler + use :: mpi_f08_interfaces_callbacks, only : 
MPI_Win_errhandler_function + implicit none + PROCEDURE(MPI_Win_errhandler_function) :: win_errhandler_fn + TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_create_errhandler_f08 +end interface MPI_Win_create_errhandler + +interface MPI_Win_get_errhandler +subroutine MPI_Win_get_errhandler_f08(win,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Errhandler + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_get_errhandler_f08 +end interface MPI_Win_get_errhandler + +interface MPI_Win_set_errhandler +subroutine MPI_Win_set_errhandler_f08(win,errhandler,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Errhandler + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Errhandler), INTENT(IN) :: errhandler + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_set_errhandler_f08 +end interface MPI_Win_set_errhandler + +interface MPI_Info_create +subroutine MPI_Info_create_f08(info,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(OUT) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_create_f08 +end interface MPI_Info_create + +interface MPI_Info_delete +subroutine MPI_Info_delete_f08(info,key,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=*), INTENT(IN) :: key + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_delete_f08 +end interface MPI_Info_delete + +interface MPI_Info_dup +subroutine MPI_Info_dup_f08(info,newinfo,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Info), INTENT(OUT) :: newinfo + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_dup_f08 +end interface MPI_Info_dup + +interface MPI_Info_free +subroutine 
MPI_Info_free_f08(info,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(INOUT) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_free_f08 +end interface MPI_Info_free + +interface MPI_Info_get +subroutine MPI_Info_get_f08(info,key,valuelen,value,flag,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=*), INTENT(IN) :: key + INTEGER, INTENT(IN) :: valuelen + CHARACTER(LEN=valuelen), INTENT(OUT) :: value + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_get_f08 +end interface MPI_Info_get + +interface MPI_Info_get_nkeys +subroutine MPI_Info_get_nkeys_f08(info,nkeys,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, INTENT(OUT) :: nkeys + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_get_nkeys_f08 +end interface MPI_Info_get_nkeys + +interface MPI_Info_get_nthkey +subroutine MPI_Info_get_nthkey_f08(info,n,key,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, INTENT(IN) :: n + CHARACTER(LEN=*), INTENT(OUT) :: key + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_get_nthkey_f08 +end interface MPI_Info_get_nthkey + +interface MPI_Info_get_valuelen +subroutine MPI_Info_get_valuelen_f08(info,key,valuelen,flag,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=*), INTENT(IN) :: key + INTEGER, INTENT(OUT) :: valuelen + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_get_valuelen_f08 +end interface MPI_Info_get_valuelen + +interface MPI_Info_set +subroutine MPI_Info_set_f08(info,key,value,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=*), INTENT(IN) :: key, 
value + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Info_set_f08 +end interface MPI_Info_set + +interface MPI_Close_port +subroutine MPI_Close_port_f08(port_name,ierror) + implicit none + CHARACTER(LEN=*), INTENT(IN) :: port_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Close_port_f08 +end interface MPI_Close_port + +interface MPI_Comm_accept +subroutine MPI_Comm_accept_f08(port_name,info,root,comm,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm + implicit none + CHARACTER(LEN=*), INTENT(IN) :: port_name + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, INTENT(IN) :: root + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_accept_f08 +end interface MPI_Comm_accept + +interface MPI_Comm_connect +subroutine MPI_Comm_connect_f08(port_name,info,root,comm,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm + implicit none + CHARACTER(LEN=*), INTENT(IN) :: port_name + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, INTENT(IN) :: root + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_connect_f08 +end interface MPI_Comm_connect + +interface MPI_Comm_disconnect +subroutine MPI_Comm_disconnect_f08(comm,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(INOUT) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_disconnect_f08 +end interface MPI_Comm_disconnect + +interface MPI_Comm_get_parent +subroutine MPI_Comm_get_parent_f08(parent,ierror) + use :: mpi_f08_types, only : MPI_Comm + implicit none + TYPE(MPI_Comm), INTENT(OUT) :: parent + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_get_parent_f08 +end interface MPI_Comm_get_parent + +interface MPI_Comm_join +subroutine MPI_Comm_join_f08(fd,intercomm,ierror) + use :: mpi_f08_types, only : 
MPI_Comm + implicit none + INTEGER, INTENT(IN) :: fd + TYPE(MPI_Comm), INTENT(OUT) :: intercomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_join_f08 +end interface MPI_Comm_join + +interface MPI_Comm_spawn +subroutine MPI_Comm_spawn_f08(command,argv,maxprocs,info,root,comm,intercomm, & + array_of_errcodes,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm + implicit none + CHARACTER(LEN=*), INTENT(IN) :: command, argv(*) + INTEGER, INTENT(IN) :: maxprocs, root + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Comm), INTENT(OUT) :: intercomm + INTEGER :: array_of_errcodes(*) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_spawn_f08 +end interface MPI_Comm_spawn + +interface MPI_Comm_spawn_multiple +subroutine MPI_Comm_spawn_multiple_f08(count,array_of_commands,array_of_argv,array_of_maxprocs, & + array_of_info,root,comm,intercomm, & + array_of_errcodes,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm + implicit none + INTEGER, INTENT(IN) :: count, array_of_maxprocs(*), root + CHARACTER(LEN=*), INTENT(IN) :: array_of_commands(*), array_of_argv(count,*) + TYPE(MPI_Info), INTENT(IN) :: array_of_info(*) + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Comm), INTENT(OUT) :: intercomm + INTEGER :: array_of_errcodes(*) + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_spawn_multiple_f08 +end interface MPI_Comm_spawn_multiple + +interface MPI_Lookup_name +subroutine MPI_Lookup_name_f08(service_name,info,port_name,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_MAX_PORT_NAME + implicit none + CHARACTER(LEN=*), INTENT(IN) :: service_name + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=MPI_MAX_PORT_NAME), INTENT(OUT) :: port_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Lookup_name_f08 +end interface MPI_Lookup_name + +interface MPI_Open_port +subroutine MPI_Open_port_f08(info,port_name,ierror) + use :: mpi_f08_types, only : MPI_Info, 
MPI_MAX_PORT_NAME + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=MPI_MAX_PORT_NAME), INTENT(OUT) :: port_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Open_port_f08 +end interface MPI_Open_port + +interface MPI_Publish_name +subroutine MPI_Publish_name_f08(service_name,info,port_name,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + CHARACTER(LEN=*), INTENT(IN) :: service_name, port_name + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Publish_name_f08 +end interface MPI_Publish_name + +interface MPI_Unpublish_name +subroutine MPI_Unpublish_name_f08(service_name,info,port_name,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + CHARACTER(LEN=*), INTENT(IN) :: service_name, port_name + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Unpublish_name_f08 +end interface MPI_Unpublish_name + +interface MPI_Accumulate +subroutine MPI_Accumulate_f08(origin_addr,origin_count,origin_datatype,target_rank, & + target_disp,target_count,target_datatype,op,win,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !$PRAGMA IGNORE_TKR origin_addr + !DIR$ IGNORE_TKR origin_addr + !IBM* IGNORE_TKR origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + INTEGER, INTENT(IN) :: origin_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Accumulate_f08 +end interface MPI_Accumulate + +interface MPI_Raccumulate +subroutine 
MPI_Raccumulate_f08(origin_addr,origin_count,origin_datatype,target_rank, & + target_disp,target_count,target_datatype,op,win,request, & + ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_Request, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !$PRAGMA IGNORE_TKR origin_addr + !DIR$ IGNORE_TKR origin_addr + !IBM* IGNORE_TKR origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + INTEGER, INTENT(IN) :: origin_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Raccumulate_f08 +end interface MPI_Raccumulate + +interface MPI_Get +subroutine MPI_Get_f08(origin_addr,origin_count,origin_datatype,target_rank, & + target_disp,target_count,target_datatype,win,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !$PRAGMA IGNORE_TKR origin_addr + !DIR$ IGNORE_TKR origin_addr + !IBM* IGNORE_TKR origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr + INTEGER, INTENT(IN) :: origin_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_f08 +end interface MPI_Get + +interface MPI_Rget +subroutine MPI_Rget_f08(origin_addr,origin_count,origin_datatype,target_rank, & + target_disp,target_count,target_datatype,win,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, 
MPI_Request, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !$PRAGMA IGNORE_TKR origin_addr + !DIR$ IGNORE_TKR origin_addr + !IBM* IGNORE_TKR origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr + INTEGER, INTENT(IN) :: origin_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Rget_f08 +end interface MPI_Rget + +interface MPI_Get_accumulate +subroutine MPI_Get_accumulate_f08(origin_addr,origin_count,origin_datatype,result_addr, & + result_count,result_datatype,target_rank,target_disp, & + target_count,target_datatype,op,win,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr + !$PRAGMA IGNORE_TKR origin_addr,result_addr + !DIR$ IGNORE_TKR origin_addr,result_addr + !IBM* IGNORE_TKR origin_addr,result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr + INTEGER, INTENT(IN) :: origin_count, result_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + TYPE(MPI_Datatype), INTENT(IN) :: result_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Get_accumulate_f08 +end interface MPI_Get_accumulate + +interface MPI_Rget_accumulate +subroutine MPI_Rget_accumulate_f08(origin_addr,origin_count,origin_datatype,result_addr, & + 
result_count,result_datatype,target_rank,target_disp, & + target_count,target_datatype,op,win,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Request, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr + !$PRAGMA IGNORE_TKR origin_addr,result_addr + !DIR$ IGNORE_TKR origin_addr,result_addr + !IBM* IGNORE_TKR origin_addr,result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr + INTEGER, INTENT(IN) :: origin_count, result_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + TYPE(MPI_Datatype), INTENT(IN) :: result_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Rget_accumulate_f08 +end interface MPI_Rget_accumulate + +interface MPI_Put +subroutine MPI_Put_f08(origin_addr,origin_count,origin_datatype,target_rank, & + target_disp,target_count,target_datatype,win,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !$PRAGMA IGNORE_TKR origin_addr + !DIR$ IGNORE_TKR origin_addr + !IBM* IGNORE_TKR origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + INTEGER, INTENT(IN) :: origin_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Put_f08 +end interface MPI_Put + +interface MPI_Rput +subroutine 
MPI_Rput_f08(origin_addr,origin_count,origin_datatype,target_rank, & + target_disp,target_count,target_datatype,win,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_Request, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr + !$PRAGMA IGNORE_TKR origin_addr + !DIR$ IGNORE_TKR origin_addr + !IBM* IGNORE_TKR origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + INTEGER, INTENT(IN) :: origin_count, target_rank, target_count + TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Datatype), INTENT(IN) :: target_datatype + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Rput_f08 +end interface MPI_Rput + +interface MPI_Fetch_and_op +subroutine MPI_Fetch_and_op_f08(origin_addr,result_addr,datatype,target_rank, & + target_disp,op,win,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr + !$PRAGMA IGNORE_TKR origin_addr,result_addr + !DIR$ IGNORE_TKR origin_addr,result_addr + !IBM* IGNORE_TKR origin_addr,result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: target_rank + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Op), INTENT(IN) :: op + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Fetch_and_op_f08 +end interface MPI_Fetch_and_op + +interface MPI_Compare_and_swap +subroutine MPI_Compare_and_swap_f08(origin_addr,compare_addr,result_addr,datatype, & + target_rank,target_disp,win,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, 
MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,compare_addr,result_addr + !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,compare_addr,result_addr + !$PRAGMA IGNORE_TKR origin_addr,compare_addr,result_addr + !DIR$ IGNORE_TKR origin_addr,compare_addr,result_addr + !IBM* IGNORE_TKR origin_addr,compare_addr,result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr,compare_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: target_rank + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Compare_and_swap_f08 +end interface MPI_Compare_and_swap + +interface MPI_Win_complete +subroutine MPI_Win_complete_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_complete_f08 +end interface MPI_Win_complete + +interface MPI_Win_create +subroutine MPI_Win_create_f08(base,size,disp_unit,info,comm,win,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: base + !GCC$ ATTRIBUTES NO_ARG_CHECK :: base + !$PRAGMA IGNORE_TKR base + !DIR$ IGNORE_TKR base + !IBM* IGNORE_TKR base + OMPI_FORTRAN_IGNORE_TKR_TYPE :: base + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size + INTEGER, INTENT(IN) :: disp_unit + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_create_f08 +end interface MPI_Win_create + +interface MPI_Win_create_dynamic +subroutine MPI_Win_create_dynamic_f08(info,comm,win,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) 
:: comm + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_create_dynamic_f08 +end interface MPI_Win_create_dynamic + +interface MPI_Win_attach +subroutine MPI_Win_attach_f08(win,base,size,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: base + !GCC$ ATTRIBUTES NO_ARG_CHECK :: base + !$PRAGMA IGNORE_TKR base + !DIR$ IGNORE_TKR base + !IBM* IGNORE_TKR base + OMPI_FORTRAN_IGNORE_TKR_TYPE :: base + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_attach_f08 +end interface MPI_Win_attach + +interface MPI_Win_detach +subroutine MPI_Win_detach_f08(win,base,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: base + !GCC$ ATTRIBUTES NO_ARG_CHECK :: base + !$PRAGMA IGNORE_TKR base + !DIR$ IGNORE_TKR base + !IBM* IGNORE_TKR base + OMPI_FORTRAN_IGNORE_TKR_TYPE :: base + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_detach_f08 +end interface MPI_Win_detach + +interface MPI_Win_fence +subroutine MPI_Win_fence_f08(assert,win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + INTEGER, INTENT(IN) :: assert + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_fence_f08 +end interface MPI_Win_fence + +interface MPI_Win_free +subroutine MPI_Win_free_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(INOUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_free_f08 +end interface MPI_Win_free + +interface MPI_Win_get_group +subroutine MPI_Win_get_group_f08(win,group,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Group + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Group), INTENT(OUT) :: group + INTEGER, 
OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_get_group_f08 +end interface MPI_Win_get_group + +interface MPI_Win_lock +subroutine MPI_Win_lock_f08(lock_type,rank,assert,win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + INTEGER, INTENT(IN) :: lock_type, rank, assert + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_lock_f08 +end interface MPI_Win_lock + +interface MPI_Win_lock_all +subroutine MPI_Win_lock_all_f08(assert,win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + INTEGER, INTENT(IN) :: assert + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_lock_all_f08 +end interface MPI_Win_lock_all + +interface MPI_Win_post +subroutine MPI_Win_post_f08(group,assert,win,ierror) + use :: mpi_f08_types, only : MPI_Group, MPI_Win + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(IN) :: assert + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_post_f08 +end interface MPI_Win_post + +interface MPI_Win_shared_query +subroutine MPI_Win_shared_query_f08(win, rank, size, disp_unit, baseptr,& + ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, INTENT(IN) :: rank + INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(OUT) :: size + INTEGER, INTENT(OUT) :: disp_unit + TYPE(C_PTR), INTENT(OUT) :: baseptr + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_shared_query_f08 +end interface MPI_Win_shared_query + +interface MPI_Win_start +subroutine MPI_Win_start_f08(group,assert,win,ierror) + use :: mpi_f08_types, only : MPI_Group, MPI_Win + implicit none + TYPE(MPI_Group), INTENT(IN) :: group + INTEGER, INTENT(IN) :: assert + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_start_f08 +end interface 
MPI_Win_start + +interface MPI_Win_sync +subroutine MPI_Win_sync_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Group, MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_sync_f08 +end interface MPI_Win_sync + +interface MPI_Win_test +subroutine MPI_Win_test_f08(win,flag,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_test_f08 +end interface MPI_Win_test + +interface MPI_Win_unlock +subroutine MPI_Win_unlock_f08(rank,win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + INTEGER, INTENT(IN) :: rank + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_unlock_f08 +end interface MPI_Win_unlock + +interface MPI_Win_unlock_all +subroutine MPI_Win_unlock_all_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_unlock_all_f08 +end interface MPI_Win_unlock_all + +interface MPI_Win_wait +subroutine MPI_Win_wait_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_wait_f08 +end interface MPI_Win_wait + +interface MPI_Win_flush +subroutine MPI_Win_flush_f08(rank,win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + INTEGER, INTENT(IN) :: rank + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_flush_f08 +end interface MPI_Win_flush + +interface MPI_Win_flush_local +subroutine MPI_Win_flush_local_f08(rank,win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + INTEGER, INTENT(IN) :: rank + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine 
MPI_Win_flush_local_f08 +end interface MPI_Win_flush_local + +interface MPI_Win_flush_local_all +subroutine MPI_Win_flush_local_all_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_flush_local_all_f08 +end interface MPI_Win_flush_local_all + +interface MPI_Win_flush_all +subroutine MPI_Win_flush_all_f08(win,ierror) + use :: mpi_f08_types, only : MPI_Win + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Win_flush_all_f08 +end interface MPI_Win_flush_all + +interface MPI_Grequest_complete +subroutine MPI_Grequest_complete_f08(request,ierror) + use :: mpi_f08_types, only : MPI_Request + implicit none + TYPE(MPI_Request), INTENT(IN) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Grequest_complete_f08 +end interface MPI_Grequest_complete + +interface MPI_Grequest_start +subroutine MPI_Grequest_start_f08(query_fn,free_fn,cancel_fn,extra_state,request, & + ierror) + use :: mpi_f08_types, only : MPI_Request, MPI_ADDRESS_KIND + use :: mpi_f08_interfaces_callbacks, only : MPI_Grequest_query_function + use :: mpi_f08_interfaces_callbacks, only : MPI_Grequest_free_function + use :: mpi_f08_interfaces_callbacks, only : MPI_Grequest_cancel_function + implicit none + PROCEDURE(MPI_Grequest_query_function) :: query_fn + PROCEDURE(MPI_Grequest_free_function) :: free_fn + PROCEDURE(MPI_Grequest_cancel_function) :: cancel_fn + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Grequest_start_f08 +end interface MPI_Grequest_start + +interface MPI_Init_thread +subroutine MPI_Init_thread_f08(required,provided,ierror) + implicit none + INTEGER, INTENT(IN) :: required + INTEGER, INTENT(OUT) :: provided + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine 
MPI_Init_thread_f08 +end interface MPI_Init_thread + +interface MPI_Is_thread_main +subroutine MPI_Is_thread_main_f08(flag,ierror) + implicit none + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Is_thread_main_f08 +end interface MPI_Is_thread_main + +interface MPI_Query_thread +subroutine MPI_Query_thread_f08(provided,ierror) + implicit none + INTEGER, INTENT(OUT) :: provided + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Query_thread_f08 +end interface MPI_Query_thread + +interface MPI_Status_set_cancelled +subroutine MPI_Status_set_cancelled_f08(status,flag,ierror) + use :: mpi_f08_types, only : MPI_Status + implicit none + TYPE(MPI_Status), INTENT(INOUT) :: status + LOGICAL, INTENT(IN) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Status_set_cancelled_f08 +end interface MPI_Status_set_cancelled + +interface MPI_Status_set_elements +subroutine MPI_Status_set_elements_f08(status,datatype,count,ierror) + use :: mpi_f08_types, only : MPI_Status, MPI_Datatype + implicit none + TYPE(MPI_Status), INTENT(INOUT) :: status + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, INTENT(IN) :: count + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Status_set_elements_f08 +end interface MPI_Status_set_elements + +interface MPI_Status_set_elements_x +subroutine MPI_Status_set_elements_x_f08(status,datatype,count,ierror) + use :: mpi_f08_types, only : MPI_Status, MPI_Datatype, MPI_COUNT_KIND + implicit none + TYPE(MPI_Status), INTENT(INOUT) :: status + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_COUNT_KIND), INTENT(IN) :: count + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Status_set_elements_x_f08 +end interface MPI_Status_set_elements_x + +interface MPI_File_close +subroutine MPI_File_close_f08(fh,ierror) + use :: mpi_f08_types, only : MPI_File + implicit none + TYPE(MPI_File), INTENT(INOUT) :: fh + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end 
subroutine MPI_File_close_f08 +end interface MPI_File_close + +interface MPI_File_delete +subroutine MPI_File_delete_f08(filename,info,ierror) + use :: mpi_f08_types, only : MPI_Info + implicit none + CHARACTER(LEN=*), INTENT(IN) :: filename + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_delete_f08 +end interface MPI_File_delete + +interface MPI_File_get_amode +subroutine MPI_File_get_amode_f08(fh,amode,ierror) + use :: mpi_f08_types, only : MPI_File + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER, INTENT(OUT) :: amode + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_amode_f08 +end interface MPI_File_get_amode + +interface MPI_File_get_atomicity +subroutine MPI_File_get_atomicity_f08(fh,flag,ierror) + use :: mpi_f08_types, only : MPI_File + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + LOGICAL, INTENT(OUT) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_atomicity_f08 +end interface MPI_File_get_atomicity + +interface MPI_File_get_byte_offset +subroutine MPI_File_get_byte_offset_f08(fh,offset,disp,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: disp + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_byte_offset_f08 +end interface MPI_File_get_byte_offset + +interface MPI_File_get_group +subroutine MPI_File_get_group_f08(fh,group,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Group + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + TYPE(MPI_Group), INTENT(OUT) :: group + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_group_f08 +end interface MPI_File_get_group + +interface MPI_File_get_info +subroutine MPI_File_get_info_f08(fh,info_used,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Info + implicit none + TYPE(MPI_File), 
INTENT(IN) :: fh + TYPE(MPI_Info), INTENT(OUT) :: info_used + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_info_f08 +end interface MPI_File_get_info + +interface MPI_File_get_position +subroutine MPI_File_get_position_f08(fh,offset,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: offset + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_position_f08 +end interface MPI_File_get_position + +interface MPI_File_get_position_shared +subroutine MPI_File_get_position_shared_f08(fh,offset,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: offset + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_position_shared_f08 +end interface MPI_File_get_position_shared + +interface MPI_File_get_size +subroutine MPI_File_get_size_f08(fh,size,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_size_f08 +end interface MPI_File_get_size + +interface MPI_File_get_type_extent +subroutine MPI_File_get_type_extent_f08(fh,datatype,extent,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_ADDRESS_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: extent + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_type_extent_f08 +end interface MPI_File_get_type_extent + +interface MPI_File_get_view +subroutine MPI_File_get_view_f08(fh,disp,etype,filetype,datarep,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + 
INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: disp + TYPE(MPI_Datatype), INTENT(OUT) :: etype + TYPE(MPI_Datatype), INTENT(OUT) :: filetype + CHARACTER(LEN=*), INTENT(OUT) :: datarep + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_get_view_f08 +end interface MPI_File_get_view + +interface MPI_File_iread +subroutine MPI_File_iread_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iread_f08 +end interface MPI_File_iread + +interface MPI_File_iread_at +subroutine MPI_File_iread_at_f08(fh,offset,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iread_at_f08 +end interface MPI_File_iread_at + +interface MPI_File_iread_all +subroutine MPI_File_iread_all_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* 
IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iread_all_f08 +end interface MPI_File_iread_all + +interface MPI_File_iread_at_all +subroutine MPI_File_iread_at_all_f08(fh,offset,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iread_at_all_f08 +end interface MPI_File_iread_at_all + +interface MPI_File_iread_shared +subroutine MPI_File_iread_shared_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iread_shared_f08 +end interface MPI_File_iread_shared + +interface MPI_File_iwrite +subroutine MPI_File_iwrite_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR 
buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iwrite_f08 +end interface MPI_File_iwrite + +interface MPI_File_iwrite_at +subroutine MPI_File_iwrite_at_f08(fh,offset,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iwrite_at_f08 +end interface MPI_File_iwrite_at + +interface MPI_File_iwrite_all +subroutine MPI_File_iwrite_all_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iwrite_all_f08 +end interface MPI_File_iwrite_all + +interface MPI_File_iwrite_at_all +subroutine MPI_File_iwrite_at_all_f08(fh,offset,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + 
INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iwrite_at_all_f08 +end interface MPI_File_iwrite_at_all + +interface MPI_File_iwrite_shared +subroutine MPI_File_iwrite_shared_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_iwrite_shared_f08 +end interface MPI_File_iwrite_shared + +interface MPI_File_open +subroutine MPI_File_open_f08(comm,filename,amode,info,fh,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info, MPI_File + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + CHARACTER(LEN=*), INTENT(IN) :: filename + INTEGER, INTENT(IN) :: amode + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_File), INTENT(OUT) :: fh + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_open_f08 +end interface MPI_File_open + +interface MPI_File_preallocate +subroutine MPI_File_preallocate_f08(fh,size,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_preallocate_f08 +end interface MPI_File_preallocate + +interface 
MPI_File_read +subroutine MPI_File_read_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_f08 +end interface MPI_File_read + +interface MPI_File_read_all +subroutine MPI_File_read_all_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_all_f08 +end interface MPI_File_read_all + +interface MPI_File_read_all_begin +subroutine MPI_File_read_all_begin_f08(fh,buf,count,datatype,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_all_begin_f08 +end interface MPI_File_read_all_begin + +interface MPI_File_read_all_end +subroutine MPI_File_read_all_end_f08(fh,buf,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Status + implicit none + 
TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_all_end_f08 +end interface MPI_File_read_all_end + +interface MPI_File_read_at +subroutine MPI_File_read_at_f08(fh,offset,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_at_f08 +end interface MPI_File_read_at + +interface MPI_File_read_at_all +subroutine MPI_File_read_at_all_f08(fh,offset,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_at_all_f08 +end interface MPI_File_read_at_all + +interface MPI_File_read_at_all_begin +subroutine MPI_File_read_at_all_begin_f08(fh,offset,buf,count,datatype,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_OFFSET_KIND + implicit none + 
TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_at_all_begin_f08 +end interface MPI_File_read_at_all_begin + +interface MPI_File_read_at_all_end +subroutine MPI_File_read_at_all_end_f08(fh,buf,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_at_all_end_f08 +end interface MPI_File_read_at_all_end + +interface MPI_File_read_ordered +subroutine MPI_File_read_ordered_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_ordered_f08 +end interface MPI_File_read_ordered + +interface MPI_File_read_ordered_begin +subroutine MPI_File_read_ordered_begin_f08(fh,buf,count,datatype,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + 
!DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_ordered_begin_f08 +end interface MPI_File_read_ordered_begin + +interface MPI_File_read_ordered_end +subroutine MPI_File_read_ordered_end_f08(fh,buf,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_ordered_end_f08 +end interface MPI_File_read_ordered_end + +interface MPI_File_read_shared +subroutine MPI_File_read_shared_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_read_shared_f08 +end interface MPI_File_read_shared + +interface MPI_File_seek +subroutine MPI_File_seek_f08(fh,offset,whence,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + INTEGER, INTENT(IN) :: whence + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_seek_f08 +end interface MPI_File_seek + +interface MPI_File_seek_shared +subroutine MPI_File_seek_shared_f08(fh,offset,whence,ierror) + use :: mpi_f08_types, only : MPI_File, 
MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + INTEGER, INTENT(IN) :: whence + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_seek_shared_f08 +end interface MPI_File_seek_shared + +interface MPI_File_set_atomicity +subroutine MPI_File_set_atomicity_f08(fh,flag,ierror) + use :: mpi_f08_types, only : MPI_File + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + LOGICAL, INTENT(IN) :: flag + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_set_atomicity_f08 +end interface MPI_File_set_atomicity + +interface MPI_File_set_info +subroutine MPI_File_set_info_f08(fh,info,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Info + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_set_info_f08 +end interface MPI_File_set_info + +interface MPI_File_set_size +subroutine MPI_File_set_size_f08(fh,size,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: size + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_set_size_f08 +end interface MPI_File_set_size + +interface MPI_File_set_view +subroutine MPI_File_set_view_f08(fh,disp,etype,filetype,datarep,info,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Info, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: disp + TYPE(MPI_Datatype), INTENT(IN) :: etype + TYPE(MPI_Datatype), INTENT(IN) :: filetype + CHARACTER(LEN=*), INTENT(IN) :: datarep + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_set_view_f08 +end interface MPI_File_set_view + +interface MPI_File_sync +subroutine MPI_File_sync_f08(fh,ierror) + use :: mpi_f08_types, only : MPI_File + implicit none + TYPE(MPI_File), 
INTENT(IN) :: fh + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_sync_f08 +end interface MPI_File_sync + +interface MPI_File_write +subroutine MPI_File_write_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_f08 +end interface MPI_File_write + +interface MPI_File_write_all +subroutine MPI_File_write_all_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_all_f08 +end interface MPI_File_write_all + +interface MPI_File_write_all_begin +subroutine MPI_File_write_all_begin_f08(fh,buf,count,datatype,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_all_begin_f08 +end interface 
MPI_File_write_all_begin + +interface MPI_File_write_all_end +subroutine MPI_File_write_all_end_f08(fh,buf,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_all_end_f08 +end interface MPI_File_write_all_end + +interface MPI_File_write_at +subroutine MPI_File_write_at_f08(fh,offset,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_at_f08 +end interface MPI_File_write_at + +interface MPI_File_write_at_all +subroutine MPI_File_write_at_all_f08(fh,offset,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_at_all_f08 +end interface 
MPI_File_write_at_all + +interface MPI_File_write_at_all_begin +subroutine MPI_File_write_at_all_begin_f08(fh,offset,buf,count,datatype,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_OFFSET_KIND + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_at_all_begin_f08 +end interface MPI_File_write_at_all_begin + +interface MPI_File_write_at_all_end +subroutine MPI_File_write_at_all_end_f08(fh,buf,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_at_all_end_f08 +end interface MPI_File_write_at_all_end + +interface MPI_File_write_ordered +subroutine MPI_File_write_ordered_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_ordered_f08 +end interface MPI_File_write_ordered + +interface MPI_File_write_ordered_begin 
+subroutine MPI_File_write_ordered_begin_f08(fh,buf,count,datatype,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_ordered_begin_f08 +end interface MPI_File_write_ordered_begin + +interface MPI_File_write_ordered_end +subroutine MPI_File_write_ordered_end_f08(fh,buf,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_ordered_end_f08 +end interface MPI_File_write_ordered_end + +interface MPI_File_write_shared +subroutine MPI_File_write_shared_f08(fh,buf,count,datatype,status,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_File_write_shared_f08 +end interface MPI_File_write_shared + +interface MPI_Register_datarep +subroutine MPI_Register_datarep_f08(datarep,read_conversion_fn,write_conversion_fn, & + dtype_file_extent_fn,extra_state,ierror) + use :: 
mpi_f08_types, only : MPI_ADDRESS_KIND + use :: mpi_f08_interfaces_callbacks, only : MPI_Datarep_conversion_function + use :: mpi_f08_interfaces_callbacks, only : MPI_Datarep_extent_function + implicit none + CHARACTER(LEN=*), INTENT(IN) :: datarep + PROCEDURE(MPI_Datarep_conversion_function) :: read_conversion_fn + PROCEDURE(MPI_Datarep_conversion_function) :: write_conversion_fn + PROCEDURE(MPI_Datarep_extent_function) :: dtype_file_extent_fn + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Register_datarep_f08 +end interface MPI_Register_datarep + +! +! MPI_Sizeof is generic for numeric types. This ignore TKR interface +! is replaced by the specific generics. Implemented in mpi_sizeof_mod.F90. +! +!subroutine MPI_Sizeof(x,size,ierror) +! use :: mpi_f08_types +! implicit none +! !DEC$ ATTRIBUTES NO_ARG_CHECK :: x +! !GCC$ ATTRIBUTES NO_ARG_CHECK :: x +! !$PRAGMA IGNORE_TKR x +! !DIR$ IGNORE_TKR x +! !IBM* IGNORE_TKR x +! OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: x +! INTEGER, INTENT(OUT) :: size +! 
INTEGER, OPTIONAL, INTENT(OUT) :: ierror +!end subroutine MPI_Sizeof + +interface MPI_Type_create_f90_complex +subroutine MPI_Type_create_f90_complex_f08(p,r,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: p, r + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_f90_complex_f08 +end interface MPI_Type_create_f90_complex + +interface MPI_Type_create_f90_integer +subroutine MPI_Type_create_f90_integer_f08(r,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: r + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_f90_integer_f08 +end interface MPI_Type_create_f90_integer + +interface MPI_Type_create_f90_real +subroutine MPI_Type_create_f90_real_f08(p,r,newtype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: p, r + TYPE(MPI_Datatype), INTENT(OUT) :: newtype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_create_f90_real_f08 +end interface MPI_Type_create_f90_real + +interface MPI_Type_match_size +subroutine MPI_Type_match_size_f08(typeclass,size,datatype,ierror) + use :: mpi_f08_types, only : MPI_Datatype + implicit none + INTEGER, INTENT(IN) :: typeclass, size + TYPE(MPI_Datatype), INTENT(OUT) :: datatype + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Type_match_size_f08 +end interface MPI_Type_match_size + +interface MPI_Pcontrol +subroutine MPI_Pcontrol_f08(level) + implicit none + INTEGER, INTENT(IN) :: level +end subroutine MPI_Pcontrol_f08 +end interface MPI_Pcontrol + + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +! New routines to MPI-3 +! 
+ +interface MPI_Comm_split_type +subroutine MPI_Comm_split_type_f08(comm,split_type,key,info,newcomm,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Info + implicit none + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, INTENT(IN) :: split_type + INTEGER, INTENT(IN) :: key + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(OUT) :: newcomm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Comm_split_type_f08 +end interface MPI_Comm_split_type + +interface MPI_F_sync_reg +subroutine MPI_F_sync_reg_f08(buf) + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf +end subroutine MPI_F_sync_reg_f08 +end interface MPI_F_sync_reg + +interface MPI_Get_library_version +subroutine MPI_Get_library_version_f08(version,resultlen,ierror) + use :: mpi_f08_types, only : MPI_MAX_LIBRARY_VERSION_STRING + implicit none + character(len=MPI_MAX_LIBRARY_VERSION_STRING), intent(out) :: version + integer, intent(out) :: resultlen + integer, optional, intent(out) :: ierror +end subroutine MPI_Get_library_version_f08 +end interface MPI_Get_library_version + +interface MPI_Mprobe +subroutine MPI_Mprobe_f08(source,tag,comm,message,status,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Message, MPI_Status + implicit none + INTEGER, INTENT(IN) :: source, tag + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Message), INTENT(OUT) :: message + TYPE(MPI_Status), INTENT(OUT) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Mprobe_f08 +end interface MPI_Mprobe + +interface MPI_Improbe +subroutine MPI_Improbe_f08(source,tag,comm,flag,message,status,ierror) + use :: mpi_f08_types, only : MPI_Comm, MPI_Message, MPI_Status + implicit none + INTEGER, INTENT(IN) :: source, tag + TYPE(MPI_Comm), INTENT(IN) :: comm + LOGICAL, INTENT(OUT) :: flag + TYPE(MPI_Message), INTENT(OUT) :: message + TYPE(MPI_Status), 
INTENT(OUT) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Improbe_f08 +end interface MPI_Improbe + +interface MPI_Imrecv +subroutine MPI_Imrecv_f08(buf,count,datatype,message,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Message, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Message), INTENT(INOUT) :: message + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Imrecv_f08 +end interface MPI_Imrecv + +interface MPI_Mrecv +subroutine MPI_Mrecv_f08(buf,count,datatype,message,status,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Message, MPI_Status + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf + !$PRAGMA IGNORE_TKR buf + !DIR$ IGNORE_TKR buf + !IBM* IGNORE_TKR buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Message), INTENT(INOUT) :: message + TYPE(MPI_Status) :: status + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Mrecv_f08 +end interface MPI_Mrecv + +interface MPI_Neighbor_allgather +subroutine MPI_Neighbor_allgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype 
+ TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Neighbor_allgather_f08 +end interface MPI_Neighbor_allgather + +interface MPI_Ineighbor_allgather +subroutine MPI_Ineighbor_allgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ineighbor_allgather_f08 +end interface MPI_Ineighbor_allgather + +interface MPI_Neighbor_allgatherv +subroutine MPI_Neighbor_allgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & + recvtype,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount + INTEGER, INTENT(IN) :: recvcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Neighbor_allgatherv_f08 +end interface MPI_Neighbor_allgatherv + +interface MPI_Ineighbor_allgatherv +subroutine MPI_Ineighbor_allgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & + 
recvtype,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount + INTEGER, INTENT(IN) :: recvcounts(*), displs(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ineighbor_allgatherv_f08 +end interface MPI_Ineighbor_allgatherv + +interface MPI_Neighbor_alltoall +subroutine MPI_Neighbor_alltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Neighbor_alltoall_f08 +end interface MPI_Neighbor_alltoall + +interface MPI_Ineighbor_alltoall +subroutine MPI_Ineighbor_alltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & + comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR 
sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcount, recvcount + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ineighbor_alltoall_f08 +end interface MPI_Ineighbor_alltoall + +interface MPI_Neighbor_alltoallv +subroutine MPI_Neighbor_alltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & + rdispls,recvtype,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Neighbor_alltoallv_f08 +end interface MPI_Neighbor_alltoallv + +interface MPI_Ineighbor_alltoallv +subroutine MPI_Ineighbor_alltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & + rdispls,recvtype,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype + TYPE(MPI_Comm), INTENT(IN) :: comm + 
 TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Ineighbor_alltoallv_f08 +end interface MPI_Ineighbor_alltoallv + +interface MPI_Neighbor_alltoallw +subroutine MPI_Neighbor_alltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & + rdispls,recvtypes,comm,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), recvcounts(*) + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: sdispls(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) + TYPE(MPI_Comm), INTENT(IN) :: comm + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine MPI_Neighbor_alltoallw_f08 +end interface MPI_Neighbor_alltoallw + +interface MPI_Ineighbor_alltoallw +subroutine MPI_Ineighbor_alltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & + rdispls,recvtypes,comm,request,ierror) + use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf + !$PRAGMA IGNORE_TKR sendbuf, recvbuf + !DIR$ IGNORE_TKR sendbuf, recvbuf + !IBM* IGNORE_TKR sendbuf, recvbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf + INTEGER, INTENT(IN) :: sendcounts(*), recvcounts(*) + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: sdispls(*), rdispls(*) + TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine
MPI_Ineighbor_alltoallw_f08 +end interface MPI_Ineighbor_alltoallw + +end module mpi_f08_interfaces diff --git a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-types.F90 b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-types.F90 similarity index 97% rename from ompi/mpi/fortran/use-mpi-f08/mpi-f08-types.F90 rename to ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-types.F90 index 622d0ae574b..9eabb36eb7c 100644 --- a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-types.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-types.F90 @@ -3,7 +3,7 @@ ! Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. -! Copyright (c) 2015 Research Organization for Information Science +! Copyright (c) 2015-2017 Research Organization for Information Science ! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! @@ -19,9 +19,7 @@ module mpi_f08_types include "mpif-config.h" include "mpif-constants.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE include "mpif-io-constants.h" -#endif ! ! derived types @@ -39,11 +37,9 @@ module mpi_f08_types integer :: MPI_VAL end type MPI_Errhandler -#if OMPI_PROVIDE_MPI_FILE_INTERFACE type, BIND(C) :: MPI_File integer :: MPI_VAL end type MPI_File -#endif type, BIND(C) :: MPI_Group integer :: MPI_VAL @@ -120,9 +116,7 @@ module mpi_f08_types type(MPI_Op), bind(C, name="ompi_f08_mpi_op_null") OMPI_PROTECTED :: MPI_OP_NULL; type(MPI_Request), bind(C, name="ompi_f08_mpi_request_null") OMPI_PROTECTED :: MPI_REQUEST_NULL; type(MPI_Win), bind(C, name="ompi_f08_mpi_win_null") OMPI_PROTECTED :: MPI_WIN_NULL; -#if OMPI_PROVIDE_MPI_FILE_INTERFACE type(MPI_File), bind(C, name="ompi_f08_mpi_file_null") OMPI_PROTECTED :: MPI_FILE_NULL; -#endif ! ! 
Pre-defined datatype bindings @@ -175,9 +169,7 @@ module mpi_f08_types module procedure ompi_comm_op_eq module procedure ompi_datatype_op_eq module procedure ompi_errhandler_op_eq -#if OMPI_PROVIDE_MPI_FILE_INTERFACE module procedure ompi_file_op_eq -#endif module procedure ompi_group_op_eq module procedure ompi_info_op_eq module procedure ompi_message_op_eq @@ -190,9 +182,7 @@ module mpi_f08_types module procedure ompi_comm_op_ne module procedure ompi_datatype_op_ne module procedure ompi_errhandler_op_ne -#if OMPI_PROVIDE_MPI_FILE_INTERFACE module procedure ompi_file_op_ne -#endif module procedure ompi_group_op_ne module procedure ompi_info_op_ne module procedure ompi_message_op_ne @@ -220,12 +210,10 @@ logical function ompi_errhandler_op_eq(a, b) ompi_errhandler_op_eq = (a%MPI_VAL .EQ. b%MPI_VAL) end function ompi_errhandler_op_eq -#if OMPI_PROVIDE_MPI_FILE_INTERFACE logical function ompi_file_op_eq(a, b) type(MPI_File), intent(in) :: a, b ompi_file_op_eq = (a%MPI_VAL .EQ. b%MPI_VAL) end function ompi_file_op_eq -#endif logical function ompi_group_op_eq(a, b) type(MPI_Group), intent(in) :: a, b @@ -274,12 +262,10 @@ logical function ompi_errhandler_op_ne(a, b) ompi_errhandler_op_ne = (a%MPI_VAL .NE. b%MPI_VAL) end function ompi_errhandler_op_ne -#if OMPI_PROVIDE_MPI_FILE_INTERFACE logical function ompi_file_op_ne(a, b) type(MPI_File), intent(in) :: a, b ompi_file_op_ne = (a%MPI_VAL .NE. b%MPI_VAL) end function ompi_file_op_ne -#endif logical function ompi_group_op_ne(a, b) type(MPI_Group), intent(in) :: a, b diff --git a/ompi/mpi/fortran/use-mpi-f08/pmpi-f08-interfaces.F90 b/ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90 similarity index 96% rename from ompi/mpi/fortran/use-mpi-f08/pmpi-f08-interfaces.F90 rename to ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90 index 99c6306d58d..ebf4bdbf8eb 100644 --- a/ompi/mpi/fortran/use-mpi-f08/pmpi-f08-interfaces.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90 @@ -7,8 +7,9 @@ ! 
of Tennessee Research Foundation. All rights ! reserved. ! Copyright (c) 2012 Inria. All rights reserved. -! Copyright (c) 2015 Research Organization for Information Science +! Copyright (c) 2015-2017 Research Organization for Information Science ! and Technology (RIST). All rights reserved. +! Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. ! $COPYRIGHT$ ! ! This file provides the interface specifications for the MPI Fortran @@ -46,7 +47,7 @@ subroutine PMPI_Bsend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -63,7 +64,7 @@ subroutine PMPI_Buffer_attach_f08(buffer,size,ierror) !$PRAGMA IGNORE_TKR buffer !DIR$ IGNORE_TKR buffer !IBM* IGNORE_TKR buffer - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buffer + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer INTEGER, INTENT(IN) :: size INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_Buffer_attach_f08 @@ -71,13 +72,9 @@ end subroutine PMPI_Buffer_attach_f08 interface PMPI_Buffer_detach subroutine PMPI_Buffer_detach_f08(buffer_addr,size,ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer_addr - !$PRAGMA IGNORE_TKR buffer_addr - !DIR$ IGNORE_TKR buffer_addr - !IBM* IGNORE_TKR buffer_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buffer_addr + TYPE(C_PTR), INTENT(OUT) :: buffer_addr INTEGER, INTENT(OUT) :: size INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_Buffer_detach_f08 @@ -112,7 +109,7 @@ subroutine PMPI_Ibsend_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) 
OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -142,7 +139,7 @@ subroutine PMPI_Irecv_f08(buf,count,datatype,source,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count, source, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -160,7 +157,7 @@ subroutine PMPI_Irsend_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -178,7 +175,7 @@ subroutine PMPI_Isend_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -196,7 +193,7 @@ subroutine PMPI_Issend_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -243,7 +240,7 @@ subroutine PMPI_Recv_init_f08(buf,count,datatype,source,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + 
OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count, source, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -298,7 +295,7 @@ subroutine PMPI_Rsend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -372,7 +369,7 @@ subroutine PMPI_Send_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -407,7 +404,7 @@ subroutine PMPI_Ssend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count, dest, tag TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Comm), INTENT(IN) :: comm @@ -549,7 +546,7 @@ subroutine PMPI_Get_address_f08(location,address,ierror) !$PRAGMA IGNORE_TKR location !DIR$ IGNORE_TKR location !IBM* IGNORE_TKR location - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: location + OMPI_FORTRAN_IGNORE_TKR_TYPE :: location INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: address INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_Get_address_f08 @@ -1762,7 +1759,7 @@ subroutine PMPI_Comm_get_info_f08(comm,info_used,ierror) use :: mpi_f08_types, only : MPI_Comm, MPI_Info implicit none TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: info_used + TYPE(MPI_Info), INTENT(OUT) :: info_used INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine 
PMPI_Comm_get_info_f08 end interface PMPI_Comm_get_info @@ -2105,6 +2102,36 @@ subroutine PMPI_Type_set_name_f08(datatype,type_name,ierror) end subroutine PMPI_Type_set_name_f08 end interface PMPI_Type_set_name +interface PMPI_Win_allocate +subroutine PMPI_Win_allocate_f08(size, disp_unit, info, comm, & + baseptr, win, ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND + INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: size + INTEGER, INTENT(IN) :: disp_unit + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(C_PTR), INTENT(OUT) :: baseptr + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine PMPI_Win_allocate_f08 +end interface PMPI_Win_allocate + +interface PMPI_Win_allocate_shared +subroutine PMPI_Win_allocate_shared_f08(size, disp_unit, info, comm, & + baseptr, win, ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND + INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: size + INTEGER, INTENT(IN) :: disp_unit + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(C_PTR), INTENT(OUT) :: baseptr + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine PMPI_Win_allocate_shared_f08 +end interface PMPI_Win_allocate_shared + interface PMPI_Win_create_keyval subroutine PMPI_Win_create_keyval_f08(win_copy_attr_fn,win_delete_attr_fn,win_keyval, & extra_state,ierror) @@ -2150,6 +2177,16 @@ subroutine PMPI_Win_get_attr_f08(win,win_keyval,attribute_val,flag,ierror) end subroutine PMPI_Win_get_attr_f08 end interface PMPI_Win_get_attr +interface PMPI_Win_get_info +subroutine PMPI_Win_get_info_f08(win,info,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Info + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Info), INTENT(OUT) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror 
+end subroutine PMPI_Win_get_info_f08 +end interface PMPI_Win_get_info + interface PMPI_Win_get_name subroutine PMPI_Win_get_name_f08(win,win_name,resultlen,ierror) use :: mpi_f08_types, only : MPI_Win, MPI_MAX_OBJECT_NAME @@ -2172,6 +2209,16 @@ subroutine PMPI_Win_set_attr_f08(win,win_keyval,attribute_val,ierror) end subroutine PMPI_Win_set_attr_f08 end interface PMPI_Win_set_attr +interface PMPI_Win_set_info +subroutine PMPI_Win_set_info_f08(win,info,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_Info + implicit none + TYPE(MPI_Win), INTENT(IN) :: win + TYPE(MPI_Info), INTENT(IN) :: info + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine PMPI_Win_set_info_f08 +end interface PMPI_Win_set_info + interface PMPI_Win_set_name subroutine PMPI_Win_set_name_f08(win,win_name,ierror) use :: mpi_f08_types, only : MPI_Win @@ -2433,21 +2480,23 @@ end function PMPI_Wtime_f08 end interface PMPI_Wtime interface PMPI_Aint_add -subroutine PMPI_Aint_add_f08(base,diff) +function PMPI_Aint_add_f08(base,diff) use :: mpi_f08_types, only : MPI_ADDRESS_KIND implicit none INTEGER(MPI_ADDRESS_KIND) :: base INTEGER(MPI_ADDRESS_KIND) :: diff -end subroutine PMPI_Aint_add_f08 + INTEGER(MPI_ADDRESS_KIND) :: PMPI_Aint_add_f08 +end function PMPI_Aint_add_f08 end interface PMPI_Aint_add interface PMPI_Aint_diff -subroutine PMPI_Aint_diff_f08(addr1,addr2) +function PMPI_Aint_diff_f08(addr1,addr2) use :: mpi_f08_types, only : MPI_ADDRESS_KIND implicit none INTEGER(MPI_ADDRESS_KIND) :: addr1 INTEGER(MPI_ADDRESS_KIND) :: addr2 -end subroutine PMPI_Aint_diff_f08 + INTEGER(MPI_ADDRESS_KIND) :: PMPI_Aint_diff_f08 +end function PMPI_Aint_diff_f08 end interface PMPI_Aint_diff interface PMPI_Abort @@ -2568,8 +2617,6 @@ subroutine PMPI_Error_string_f08(errorcode,string,resultlen,ierror) end subroutine PMPI_Error_string_f08 end interface PMPI_Error_string -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - interface PMPI_File_call_errhandler subroutine PMPI_File_call_errhandler_f08(fh,errorcode,ierror) use 
:: mpi_f08_types, only : MPI_File @@ -2611,9 +2658,6 @@ subroutine PMPI_File_set_errhandler_f08(file,errhandler,ierror) end subroutine PMPI_File_set_errhandler_f08 end interface PMPI_File_set_errhandler -! endif for OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - interface PMPI_Finalize subroutine PMPI_Finalize_f08(ierror) implicit none @@ -2632,7 +2676,6 @@ end subroutine PMPI_Finalized_f08 ! ASYNCHRONOUS had to removed from the base argument because ! the dummy argument is not an assumed-shape array. This will ! be okay once the Interop TR is implemented. -! interface PMPI_Free_mem subroutine PMPI_Free_mem_f08(base,ierror) implicit none @@ -2958,7 +3001,7 @@ subroutine PMPI_Accumulate_f08(origin_addr,origin_count,origin_datatype,target_r !$PRAGMA IGNORE_TKR origin_addr !DIR$ IGNORE_TKR origin_addr !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp @@ -3002,7 +3045,7 @@ subroutine PMPI_Get_f08(origin_addr,origin_count,origin_datatype,target_rank, & !$PRAGMA IGNORE_TKR origin_addr !DIR$ IGNORE_TKR origin_addr !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp @@ -3092,7 +3135,7 @@ subroutine PMPI_Put_f08(origin_addr,origin_count,origin_datatype,target_rank, & !$PRAGMA IGNORE_TKR origin_addr !DIR$ IGNORE_TKR origin_addr !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count 
TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp @@ -3182,7 +3225,7 @@ subroutine PMPI_Win_create_f08(base,size,disp_unit,info,comm,win,ierror) !$PRAGMA IGNORE_TKR base !DIR$ IGNORE_TKR base !IBM* IGNORE_TKR base - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: base + OMPI_FORTRAN_IGNORE_TKR_TYPE :: base INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size INTEGER, INTENT(IN) :: disp_unit TYPE(MPI_Info), INTENT(IN) :: info @@ -3192,6 +3235,48 @@ subroutine PMPI_Win_create_f08(base,size,disp_unit,info,comm,win,ierror) end subroutine PMPI_Win_create_f08 end interface PMPI_Win_create +interface PMPI_Win_create_dynamic +subroutine PMPI_Win_create_dynamic_f08(info,comm,win,ierror) + use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win + implicit none + TYPE(MPI_Info), INTENT(IN) :: info + TYPE(MPI_Comm), INTENT(IN) :: comm + TYPE(MPI_Win), INTENT(OUT) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine PMPI_Win_create_dynamic_f08 +end interface PMPI_Win_create_dynamic + +interface PMPI_Win_attach +subroutine PMPI_Win_attach_f08(win,base,size,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: base + !GCC$ ATTRIBUTES NO_ARG_CHECK :: base + !$PRAGMA IGNORE_TKR base + !DIR$ IGNORE_TKR base + !IBM* IGNORE_TKR base + OMPI_FORTRAN_IGNORE_TKR_TYPE :: base + INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine PMPI_Win_attach_f08 +end interface PMPI_Win_attach + +interface PMPI_Win_detach +subroutine PMPI_Win_detach_f08(win,base,ierror) + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + implicit none + !DEC$ ATTRIBUTES NO_ARG_CHECK :: base + !GCC$ ATTRIBUTES NO_ARG_CHECK :: base + !$PRAGMA IGNORE_TKR base + !DIR$ IGNORE_TKR base + !IBM* IGNORE_TKR base + OMPI_FORTRAN_IGNORE_TKR_TYPE :: base + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror +end subroutine PMPI_Win_detach_f08 +end interface PMPI_Win_detach + interface PMPI_Win_fence subroutine PMPI_Win_fence_f08(assert,win,ierror) use :: mpi_f08_types, only : MPI_Win @@ -3252,6 +3337,20 @@ subroutine PMPI_Win_post_f08(group,assert,win,ierror) end subroutine PMPI_Win_post_f08 end interface PMPI_Win_post +interface PMPI_Win_shared_query +subroutine PMPI_Win_shared_query_f08(win, rank, size, disp_unit, baseptr,& + ierror) + USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR + use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND + TYPE(MPI_Win), INTENT(IN) :: win + INTEGER, INTENT(IN) :: rank + INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(OUT) :: size + INTEGER, INTENT(OUT) :: disp_unit + TYPE(C_PTR), INTENT(OUT) :: baseptr + INTEGER, OPTIONAL, INTENT(OUT) :: ierror +end subroutine PMPI_Win_shared_query_f08 +end interface PMPI_Win_shared_query + interface PMPI_Win_start subroutine PMPI_Win_start_f08(group,assert,win,ierror) use :: mpi_f08_types, only : MPI_Group, MPI_Win @@ -3431,8 +3530,6 @@ subroutine PMPI_Status_set_elements_x_f08(status,datatype,count,ierror) end subroutine PMPI_Status_set_elements_x_f08 end interface PMPI_Status_set_elements_x -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - interface PMPI_File_close subroutine PMPI_File_close_f08(fh,ierror) use :: mpi_f08_types, only : MPI_File @@ -3567,7 +3664,7 @@ subroutine PMPI_File_iread_f08(fh,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3586,7 +3683,7 @@ subroutine PMPI_File_iread_at_f08(fh,offset,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count 
TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3604,7 +3701,7 @@ subroutine PMPI_File_iread_all_f08(fh,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3623,7 +3720,7 @@ subroutine PMPI_File_iread_at_all_f08(fh,offset,buf,count,datatype,request,ierro !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3641,7 +3738,7 @@ subroutine PMPI_File_iread_shared_f08(fh,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3659,7 +3756,7 @@ subroutine PMPI_File_iwrite_f08(fh,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3678,7 +3775,7 @@ subroutine PMPI_File_iwrite_at_f08(fh,offset,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ 
-3696,7 +3793,7 @@ subroutine PMPI_File_iwrite_all_f08(fh,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3715,7 +3812,7 @@ subroutine PMPI_File_iwrite_at_all_f08(fh,offset,buf,count,datatype,request,ierr !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Request), INTENT(OUT) :: request @@ -3732,7 +3829,7 @@ subroutine PMPI_File_iwrite_shared_f08(fh,buf,count,datatype,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf TYPE(MPI_File), INTENT(IN) :: fh INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype @@ -3810,7 +3907,7 @@ subroutine PMPI_File_read_all_begin_f08(fh,buf,count,datatype,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror @@ -3827,7 +3924,7 @@ subroutine PMPI_File_read_all_end_f08(fh,buf,status,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf TYPE(MPI_Status) :: status INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_File_read_all_end_f08 @@ -3882,7 +3979,7 @@ subroutine PMPI_File_read_at_all_begin_f08(fh,offset,buf,count,datatype,ierror) 
!$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror @@ -3899,7 +3996,7 @@ subroutine PMPI_File_read_at_all_end_f08(fh,buf,status,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf TYPE(MPI_Status) :: status INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_File_read_at_all_end_f08 @@ -3933,7 +4030,7 @@ subroutine PMPI_File_read_ordered_begin_f08(fh,buf,count,datatype,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror @@ -3950,7 +4047,7 @@ subroutine PMPI_File_read_ordered_end_f08(fh,buf,status,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf TYPE(MPI_Status) :: status INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_File_read_ordered_end_f08 @@ -4095,7 +4192,7 @@ subroutine PMPI_File_write_all_begin_f08(fh,buf,count,datatype,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror @@ -4112,7 +4209,7 @@ subroutine PMPI_File_write_all_end_f08(fh,buf,status,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, 
INTENT(IN) :: buf TYPE(MPI_Status) :: status INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_File_write_all_end_f08 @@ -4167,7 +4264,7 @@ subroutine PMPI_File_write_at_all_begin_f08(fh,offset,buf,count,datatype,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror @@ -4184,7 +4281,7 @@ subroutine PMPI_File_write_at_all_end_f08(fh,buf,status,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf TYPE(MPI_Status) :: status INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_File_write_at_all_end_f08 @@ -4218,7 +4315,7 @@ subroutine PMPI_File_write_ordered_begin_f08(fh,buf,count,datatype,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror @@ -4235,7 +4332,7 @@ subroutine PMPI_File_write_ordered_end_f08(fh,buf,status,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf TYPE(MPI_Status) :: status INTEGER, OPTIONAL, INTENT(OUT) :: ierror end subroutine PMPI_File_write_ordered_end_f08 @@ -4259,9 +4356,6 @@ subroutine PMPI_File_write_shared_f08(fh,buf,count,datatype,status,ierror) end subroutine PMPI_File_write_shared_f08 end interface PMPI_File_write_shared -! 
endif for OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - interface PMPI_Register_datarep subroutine PMPI_Register_datarep_f08(datarep,read_conversion_fn,write_conversion_fn, & dtype_file_extent_fn,extra_state,ierror) @@ -4368,7 +4462,7 @@ subroutine PMPI_F_sync_reg_f08(buf) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf end subroutine PMPI_F_sync_reg_f08 end interface PMPI_F_sync_reg @@ -4416,7 +4510,7 @@ subroutine PMPI_Imrecv_f08(buf,count,datatype,message,request,ierror) !$PRAGMA IGNORE_TKR buf !DIR$ IGNORE_TKR buf !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE OMPI_ASYNCHRONOUS :: buf + OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf INTEGER, INTENT(IN) :: count TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Message), INTENT(INOUT) :: message diff --git a/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h b/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h index 9e2ad6060aa..478ecbf9e29 100644 --- a/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h +++ b/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h @@ -7,8 +7,8 @@ ! of Tennessee Research Foundation. All rights ! reserved. ! Copyright (c) 2012 Inria. All rights reserved. -! Copyright (c) 2015 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2015-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! 
This file provides the interface specifications for the MPI Fortran @@ -655,10 +655,10 @@ subroutine ompi_type_create_subarray_f(ndims,array_of_sizes, & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_create_subarray_f -subroutine ompi_type_dup_f(type,newtype,ierror) & +subroutine ompi_type_dup_f(oldtype,newtype,ierror) & BIND(C, name="ompi_type_dup_f") implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: oldtype INTEGER, INTENT(OUT) :: newtype INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_dup_f @@ -1536,10 +1536,10 @@ subroutine ompi_type_create_keyval_f(type_copy_attr_fn,type_delete_attr_fn, & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_create_keyval_f -subroutine ompi_type_delete_attr_f(type,type_keyval,ierror) & +subroutine ompi_type_delete_attr_f(datatype,type_keyval,ierror) & BIND(C, name="ompi_type_delete_attr_f") implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_delete_attr_f @@ -1551,32 +1551,32 @@ subroutine ompi_type_free_keyval_f(type_keyval,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_free_keyval_f -subroutine ompi_type_get_name_f(type,type_name,resultlen,ierror,type_name_len) & +subroutine ompi_type_get_name_f(datatype,type_name,resultlen,ierror,type_name_len) & BIND(C, name="ompi_type_get_name_f") use, intrinsic :: ISO_C_BINDING, only : C_CHAR implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype CHARACTER(KIND=C_CHAR), DIMENSION(*), INTENT(OUT) :: type_name INTEGER, INTENT(OUT) :: resultlen INTEGER, INTENT(OUT) :: ierror INTEGER, VALUE, INTENT(IN) :: type_name_len end subroutine ompi_type_get_name_f -subroutine ompi_type_set_attr_f(type,type_keyval,attribute_val,ierror) & +subroutine ompi_type_set_attr_f(datatype,type_keyval,attribute_val,ierror) & BIND(C, name="ompi_type_set_attr_f") use :: mpi_f08_types, only : MPI_ADDRESS_KIND implicit none 
- INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_set_attr_f -subroutine ompi_type_set_name_f(type,type_name,ierror,type_name_len) & +subroutine ompi_type_set_name_f(datatype,type_name,ierror,type_name_len) & BIND(C, name="ompi_type_set_name_f") use, intrinsic :: ISO_C_BINDING, only : C_CHAR implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype CHARACTER(KIND=C_CHAR), DIMENSION(*), INTENT(IN) :: type_name INTEGER, INTENT(OUT) :: ierror INTEGER, VALUE, INTENT(IN) :: type_name_len @@ -1908,8 +1908,6 @@ subroutine ompi_error_string_f(errorcode,string,resultlen,ierror,str_len) & INTEGER, VALUE, INTENT(IN) :: str_len end subroutine ompi_error_string_f -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - subroutine ompi_file_call_errhandler_f(fh,errorcode,ierror) & BIND(C, name="ompi_file_call_errhandler_f") implicit none @@ -1943,9 +1941,6 @@ subroutine ompi_file_set_errhandler_f(file,errhandler,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_file_set_errhandler_f -! OMPI_PROFILE_MPI_FILE_INTERFACE -#endif - subroutine ompi_finalize_f(ierror) & BIND(C, name="ompi_finalize_f") implicit none @@ -2625,8 +2620,6 @@ subroutine ompi_status_set_elements_x_f(status,datatype,count,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_status_set_elements_x_f -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - subroutine ompi_file_close_f(fh,ierror) & BIND(C, name="ompi_file_close_f") implicit none @@ -3200,9 +3193,6 @@ subroutine ompi_file_write_shared_f(fh,buf,count,datatype,status,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_file_write_shared_f -! 
OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - subroutine ompi_register_datarep_f(datarep,read_conversion_fn, & write_conversion_fn,dtype_file_extent_fn, & extra_state,ierror,datarep_len) & @@ -3256,11 +3246,11 @@ subroutine ompi_type_create_f90_real_f(p,r,newtype,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_create_f90_real_f -subroutine ompi_type_match_size_f(typeclass,size,type,ierror) & +subroutine ompi_type_match_size_f(typeclass,size,datatype,ierror) & BIND(C, name="ompi_type_match_size_f") implicit none INTEGER, INTENT(IN) :: typeclass, size - INTEGER, INTENT(OUT) :: type + INTEGER, INTENT(OUT) :: datatype INTEGER, INTENT(OUT) :: ierror end subroutine ompi_type_match_size_f diff --git a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-interfaces.F90 b/ompi/mpi/fortran/use-mpi-f08/mpi-f08-interfaces.F90 deleted file mode 100644 index 0ef30a25139..00000000000 --- a/ompi/mpi/fortran/use-mpi-f08/mpi-f08-interfaces.F90 +++ /dev/null @@ -1,4748 +0,0 @@ -! -*- f90 -*- -! -! Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2009-2015 Los Alamos National Security, LLC. -! All rights reserved. -! Copyright (c) 2012 The University of Tennessee and The University -! of Tennessee Research Foundation. All rights -! reserved. -! Copyright (c) 2012 Inria. All rights reserved. -! Copyright (c) 2015 Research Organization for Information Science -! and Technology (RIST). All rights reserved. -! $COPYRIGHT$ -! -! This file provides the interface specifications for the MPI Fortran -! API bindings. It effectively maps between public names ("MPI_Init") -! and the name for tools ("MPI_Init_f08") and the back-end implementation -! name (e.g., "MPI_Init_f08"). 
- -#include "ompi/mpi/fortran/configure-fortran-output.h" - -module mpi_f08_interfaces - -interface MPI_Bsend -subroutine MPI_Bsend_f08(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Bsend_f08 -end interface MPI_Bsend - -interface MPI_Bsend_init -subroutine MPI_Bsend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Bsend_init_f08 -end interface MPI_Bsend_init - -interface MPI_Buffer_attach -subroutine MPI_Buffer_attach_f08(buffer,size,ierror) - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer - !$PRAGMA IGNORE_TKR buffer - !DIR$ IGNORE_TKR buffer - !IBM* IGNORE_TKR buffer - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer - INTEGER, INTENT(IN) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Buffer_attach_f08 -end interface MPI_Buffer_attach - -interface MPI_Buffer_detach -subroutine MPI_Buffer_detach_f08(buffer_addr,size,ierror) - USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR - implicit none - TYPE(C_PTR), INTENT(OUT) :: buffer_addr - INTEGER, INTENT(OUT) :: 
size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Buffer_detach_f08 -end interface MPI_Buffer_detach - -interface MPI_Cancel -subroutine MPI_Cancel_f08(request,ierror) - use :: mpi_f08_types, only : MPI_Request - implicit none - TYPE(MPI_Request), INTENT(IN) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cancel_f08 -end interface MPI_Cancel - -interface MPI_Get_count -subroutine MPI_Get_count_f08(status,datatype,count,ierror) - use :: mpi_f08_types, only : MPI_Status, MPI_Datatype - implicit none - TYPE(MPI_Status), INTENT(IN) :: status - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(OUT) :: count - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_count_f08 -end interface MPI_Get_count - -interface MPI_Ibsend -subroutine MPI_Ibsend_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ibsend_f08 -end interface MPI_Ibsend - -interface MPI_Iprobe -subroutine MPI_Iprobe_f08(source,tag,comm,flag,status,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Status - implicit none - INTEGER, INTENT(IN) :: source, tag - TYPE(MPI_Comm), INTENT(IN) :: comm - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iprobe_f08 -end interface MPI_Iprobe - -interface MPI_Irecv -subroutine MPI_Irecv_f08(buf,count,datatype,source,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - 
implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Irecv_f08 -end interface MPI_Irecv - -interface MPI_Irsend -subroutine MPI_Irsend_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Irsend_f08 -end interface MPI_Irsend - -interface MPI_Isend -subroutine MPI_Isend_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Isend_f08 -end interface MPI_Isend - -interface MPI_Issend -subroutine MPI_Issend_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ 
ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Issend_f08 -end interface MPI_Issend - -interface MPI_Probe -subroutine MPI_Probe_f08(source,tag,comm,status,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Status - implicit none - INTEGER, INTENT(IN) :: source, tag - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Probe_f08 -end interface MPI_Probe - -interface MPI_Recv -subroutine MPI_Recv_f08(buf,count,datatype,source,tag,comm,status,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Status - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Recv_f08 -end interface MPI_Recv - -interface MPI_Recv_init -subroutine MPI_Recv_init_f08(buf,count,datatype,source,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, source, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end 
subroutine MPI_Recv_init_f08 -end interface MPI_Recv_init - -interface MPI_Request_free -subroutine MPI_Request_free_f08(request,ierror) - use :: mpi_f08_types, only : MPI_Request - implicit none - TYPE(MPI_Request), INTENT(INOUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Request_free_f08 -end interface MPI_Request_free - -interface MPI_Request_get_status -subroutine MPI_Request_get_status_f08(request,flag,status,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - TYPE(MPI_Request), INTENT(IN) :: request - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Request_get_status_f08 -end interface MPI_Request_get_status - -interface MPI_Rsend -subroutine MPI_Rsend_f08(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Rsend_f08 -end interface MPI_Rsend - -interface MPI_Rsend_init -subroutine MPI_Rsend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Rsend_init_f08 -end interface MPI_Rsend_init - 
-interface MPI_Send -subroutine MPI_Send_f08(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Send_f08 -end interface MPI_Send - -interface MPI_Sendrecv -subroutine MPI_Sendrecv_f08(sendbuf,sendcount,sendtype,dest,sendtag,recvbuf, & - recvcount,recvtype,source,recvtag,comm,status,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Status - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, dest, sendtag, recvcount, source, recvtag - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Sendrecv_f08 -end interface MPI_Sendrecv - -interface MPI_Sendrecv_replace -subroutine MPI_Sendrecv_replace_f08(buf,count,datatype,dest,sendtag,source,recvtag, & - comm,status,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Status - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, dest, sendtag, source, recvtag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - 
TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Sendrecv_replace_f08 -end interface MPI_Sendrecv_replace - -interface MPI_Send_init -subroutine MPI_Send_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Send_init_f08 -end interface MPI_Send_init - -interface MPI_Ssend -subroutine MPI_Ssend_f08(buf,count,datatype,dest,tag,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ssend_f08 -end interface MPI_Ssend - -interface MPI_Ssend_init -subroutine MPI_Ssend_init_f08(buf,count,datatype,dest,tag,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count, dest, tag - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Ssend_init_f08 -end interface MPI_Ssend_init - -interface MPI_Start -subroutine MPI_Start_f08(request,ierror) - use :: mpi_f08_types, only : MPI_Request - implicit none - TYPE(MPI_Request), INTENT(INOUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Start_f08 -end interface MPI_Start - -interface MPI_Startall -subroutine MPI_Startall_f08(count,array_of_requests,ierror) - use :: mpi_f08_types, only : MPI_Request - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Startall_f08 -end interface MPI_Startall - -interface MPI_Test -subroutine MPI_Test_f08(request,flag,status,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - TYPE(MPI_Request), INTENT(INOUT) :: request - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Test_f08 -end interface MPI_Test - -interface MPI_Testall -subroutine MPI_Testall_f08(count,array_of_requests,flag,array_of_statuses,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Status) :: array_of_statuses(*) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Testall_f08 -end interface MPI_Testall - -interface MPI_Testany -subroutine MPI_Testany_f08(count,array_of_requests,index,flag,status,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) - INTEGER, INTENT(OUT) :: index - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Testany_f08 -end interface MPI_Testany - -interface MPI_Testsome -subroutine 
MPI_Testsome_f08(incount,array_of_requests,outcount, & - array_of_indices,array_of_statuses,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - INTEGER, INTENT(IN) :: incount - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(incount) - INTEGER, INTENT(OUT) :: outcount, array_of_indices(*) - TYPE(MPI_Status) :: array_of_statuses(*) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Testsome_f08 -end interface MPI_Testsome - -interface MPI_Test_cancelled -subroutine MPI_Test_cancelled_f08(status,flag,ierror) - use :: mpi_f08_types, only : MPI_Status - implicit none - TYPE(MPI_Status), INTENT(IN) :: status - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Test_cancelled_f08 -end interface MPI_Test_cancelled - -interface MPI_Wait -subroutine MPI_Wait_f08(request,status,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - TYPE(MPI_Request), INTENT(INOUT) :: request - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Wait_f08 -end interface MPI_Wait - -interface MPI_Waitall -subroutine MPI_Waitall_f08(count,array_of_requests,array_of_statuses,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) - TYPE(MPI_Status) :: array_of_statuses(*) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Waitall_f08 -end interface MPI_Waitall - -interface MPI_Waitany -subroutine MPI_Waitany_f08(count,array_of_requests,index,status,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(count) - INTEGER, INTENT(OUT) :: index - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Waitany_f08 -end interface MPI_Waitany - -interface MPI_Waitsome -subroutine 
MPI_Waitsome_f08(incount,array_of_requests,outcount, & - array_of_indices,array_of_statuses,ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_Status - implicit none - INTEGER, INTENT(IN) :: incount - TYPE(MPI_Request), INTENT(INOUT) :: array_of_requests(incount) - INTEGER, INTENT(OUT) :: outcount, array_of_indices(*) - TYPE(MPI_Status) :: array_of_statuses(*) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Waitsome_f08 -end interface MPI_Waitsome - -interface MPI_Get_address -subroutine MPI_Get_address_f08(location,address,ierror) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: location - !GCC$ ATTRIBUTES NO_ARG_CHECK :: location - !$PRAGMA IGNORE_TKR location - !DIR$ IGNORE_TKR location - !IBM* IGNORE_TKR location - OMPI_FORTRAN_IGNORE_TKR_TYPE :: location - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: address - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_address_f08 -end interface MPI_Get_address - -interface MPI_Get_elements -subroutine MPI_Get_elements_f08(status,datatype,count,ierror) - use :: mpi_f08_types, only : MPI_Status, MPI_Datatype - implicit none - TYPE(MPI_Status), INTENT(IN) :: status - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(OUT) :: count - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_elements_f08 -end interface MPI_Get_elements - -interface MPI_Get_elements_x -subroutine MPI_Get_elements_x_f08(status,datatype,count,ierror) - use :: mpi_f08_types, only : MPI_Status, MPI_Datatype, MPI_COUNT_KIND - implicit none - TYPE(MPI_Status), INTENT(IN) :: status - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: count - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_elements_x_f08 -end interface MPI_Get_elements_x - -interface MPI_Pack -subroutine MPI_Pack_f08(inbuf,incount,datatype,outbuf,outsize,position,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - 
implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !$PRAGMA IGNORE_TKR inbuf, outbuf - !DIR$ IGNORE_TKR inbuf, outbuf - !IBM* IGNORE_TKR inbuf, outbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf - INTEGER, INTENT(IN) :: incount, outsize - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(INOUT) :: position - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Pack_f08 -end interface MPI_Pack - -interface MPI_Pack_external -subroutine MPI_Pack_external_f08(datarep,inbuf,incount,datatype,outbuf,outsize, & - position,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - CHARACTER(LEN=*), INTENT(IN) :: datarep - !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !$PRAGMA IGNORE_TKR inbuf, outbuf - !DIR$ IGNORE_TKR inbuf, outbuf - !IBM* IGNORE_TKR inbuf, outbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf - INTEGER, INTENT(IN) :: incount - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: outsize - INTEGER(MPI_ADDRESS_KIND), INTENT(INOUT) :: position - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Pack_external_f08 -end interface MPI_Pack_external - -interface MPI_Pack_external_size -subroutine MPI_Pack_external_size_f08(datarep,incount,datatype,size,ierror & - ) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: incount - CHARACTER(LEN=*), INTENT(IN) :: datarep - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Pack_external_size_f08 -end interface MPI_Pack_external_size - -interface MPI_Pack_size -subroutine MPI_Pack_size_f08(incount,datatype,comm,size,ierror) - use :: 
mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - INTEGER, INTENT(IN) :: incount - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Pack_size_f08 -end interface MPI_Pack_size - -interface MPI_Type_commit -subroutine MPI_Type_commit_f08(datatype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(INOUT) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_commit_f08 -end interface MPI_Type_commit - -interface MPI_Type_contiguous -subroutine MPI_Type_contiguous_f08(count,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_contiguous_f08 -end interface MPI_Type_contiguous - -interface MPI_Type_create_darray -subroutine MPI_Type_create_darray_f08(size,rank,ndims,array_of_gsizes, & - array_of_distribs,array_of_dargs,array_of_psizes,order, & - oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: size, rank, ndims, order - INTEGER, INTENT(IN) :: array_of_gsizes(ndims), array_of_distribs(ndims) - INTEGER, INTENT(IN) :: array_of_dargs(ndims), array_of_psizes(ndims) - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_darray_f08 -end interface MPI_Type_create_darray - -interface MPI_Type_create_hindexed -subroutine MPI_Type_create_hindexed_f08(count,array_of_blocklengths, & - array_of_displacements,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - INTEGER, INTENT(IN) :: count - INTEGER, INTENT(IN) :: array_of_blocklengths(count) - 
INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: array_of_displacements(count) - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_hindexed_f08 -end interface MPI_Type_create_hindexed - -interface MPI_Type_create_hvector -subroutine MPI_Type_create_hvector_f08(count,blocklength,stride,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - INTEGER, INTENT(IN) :: count, blocklength - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: stride - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_hvector_f08 -end interface MPI_Type_create_hvector - -interface MPI_Type_create_indexed_block -subroutine MPI_Type_create_indexed_block_f08(count,blocklength, & - array_of_displacements,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: count, blocklength - INTEGER, INTENT(IN) :: array_of_displacements(count) - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_indexed_block_f08 -end interface MPI_Type_create_indexed_block - -interface MPI_Type_create_hindexed_block -subroutine MPI_Type_create_hindexed_block_f08(count,blocklength, & - array_of_displacements,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - INTEGER, INTENT(IN) :: count, blocklength - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: array_of_displacements(count) - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_hindexed_block_f08 -end interface MPI_Type_create_hindexed_block - -interface MPI_Type_create_resized -subroutine 
MPI_Type_create_resized_f08(oldtype,lb,extent,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: lb, extent - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_resized_f08 -end interface MPI_Type_create_resized - -interface MPI_Type_create_struct -subroutine MPI_Type_create_struct_f08(count,array_of_blocklengths, & - array_of_displacements,array_of_types,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - INTEGER, INTENT(IN) :: count - INTEGER, INTENT(IN) :: array_of_blocklengths(count) - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: array_of_displacements(count) - TYPE(MPI_Datatype), INTENT(IN) :: array_of_types(count) - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_struct_f08 -end interface MPI_Type_create_struct - -interface MPI_Type_create_subarray -subroutine MPI_Type_create_subarray_f08(ndims,array_of_sizes,array_of_subsizes, & - array_of_starts,order,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: ndims, order - INTEGER, INTENT(IN) :: array_of_sizes(ndims), array_of_subsizes(ndims) - INTEGER, INTENT(IN) :: array_of_starts(ndims) - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_subarray_f08 -end interface MPI_Type_create_subarray - -interface MPI_Type_dup -subroutine MPI_Type_dup_f08(oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_dup_f08 -end interface MPI_Type_dup - -interface MPI_Type_free 
-subroutine MPI_Type_free_f08(datatype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(INOUT) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_free_f08 -end interface MPI_Type_free - -interface MPI_Type_get_contents -subroutine MPI_Type_get_contents_f08(datatype,max_integers,max_addresses,max_datatypes, & - array_of_integers,array_of_addresses,array_of_datatypes, & - ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: max_integers, max_addresses, max_datatypes - INTEGER, INTENT(OUT) :: array_of_integers(max_integers) - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: array_of_addresses(max_addresses) - TYPE(MPI_Datatype), INTENT(OUT) :: array_of_datatypes(max_datatypes) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_contents_f08 -end interface MPI_Type_get_contents - -interface MPI_Type_get_envelope -subroutine MPI_Type_get_envelope_f08(datatype,num_integers,num_addresses,num_datatypes, & - combiner,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(OUT) :: num_integers, num_addresses, num_datatypes, combiner - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_envelope_f08 -end interface MPI_Type_get_envelope - -interface MPI_Type_get_extent -subroutine MPI_Type_get_extent_f08(datatype,lb,extent,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: lb, extent - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_extent_f08 -end interface MPI_Type_get_extent - -interface MPI_Type_get_extent_x -subroutine MPI_Type_get_extent_x_f08(datatype,lb,extent,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND, MPI_COUNT_KIND 
- implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: lb, extent - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_extent_x_f08 -end interface MPI_Type_get_extent_x - -interface MPI_Type_get_true_extent -subroutine MPI_Type_get_true_extent_f08(datatype,true_lb,true_extent,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: true_lb, true_extent - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_true_extent_f08 -end interface MPI_Type_get_true_extent - -interface MPI_Type_get_true_extent_x -subroutine MPI_Type_get_true_extent_x_f08(datatype,true_lb,true_extent,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND, MPI_COUNT_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: true_lb, true_extent - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_true_extent_x_f08 -end interface MPI_Type_get_true_extent_x - -interface MPI_Type_indexed -subroutine MPI_Type_indexed_f08(count,array_of_blocklengths, & - array_of_displacements,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: count - INTEGER, INTENT(IN) :: array_of_blocklengths(count), array_of_displacements(count) - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_indexed_f08 -end interface MPI_Type_indexed - -interface MPI_Type_size -subroutine MPI_Type_size_f08(datatype,size,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_size_f08 -end interface MPI_Type_size - -interface MPI_Type_size_x 
-subroutine MPI_Type_size_x_f08(datatype,size,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_COUNT_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_COUNT_KIND), INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_size_x_f08 -end interface MPI_Type_size_x - -interface MPI_Type_vector -subroutine MPI_Type_vector_f08(count,blocklength,stride,oldtype,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: count, blocklength, stride - TYPE(MPI_Datatype), INTENT(IN) :: oldtype - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_vector_f08 -end interface MPI_Type_vector - -interface MPI_Unpack -subroutine MPI_Unpack_f08(inbuf,insize,position,outbuf,outcount,datatype,comm, & - ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !$PRAGMA IGNORE_TKR inbuf, outbuf - !DIR$ IGNORE_TKR inbuf, outbuf - !IBM* IGNORE_TKR inbuf, outbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf - INTEGER, INTENT(IN) :: insize, outcount - INTEGER, INTENT(INOUT) :: position - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Unpack_f08 -end interface MPI_Unpack - -interface MPI_Unpack_external -subroutine MPI_Unpack_external_f08(datarep,inbuf,insize,position,outbuf,outcount, & - datatype,ierror & - ) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - CHARACTER(LEN=*), INTENT(IN) :: datarep - !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf - !$PRAGMA IGNORE_TKR inbuf, outbuf - !DIR$ IGNORE_TKR inbuf, outbuf - !IBM* IGNORE_TKR inbuf, outbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) 
:: inbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: outbuf - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: insize - INTEGER(MPI_ADDRESS_KIND), INTENT(INOUT) :: position - INTEGER, INTENT(IN) :: outcount - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Unpack_external_f08 -end interface MPI_Unpack_external - -interface MPI_Allgather -subroutine MPI_Allgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Allgather_f08 -end interface MPI_Allgather - -interface MPI_Iallgather -subroutine MPI_Iallgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iallgather_f08 -end interface MPI_Iallgather - -interface MPI_Allgatherv -subroutine 
MPI_Allgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & - recvtype,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount - INTEGER, INTENT(IN) :: recvcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Allgatherv_f08 -end interface MPI_Allgatherv - -interface MPI_Iallgatherv -subroutine MPI_Iallgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & - recvtype,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount - INTEGER, INTENT(IN) :: recvcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iallgatherv_f08 -end interface MPI_Iallgatherv - -interface MPI_Allreduce -subroutine MPI_Allreduce_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf 
- !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Allreduce_f08 -end interface MPI_Allreduce - -interface MPI_Iallreduce -subroutine MPI_Iallreduce_f08(sendbuf,recvbuf,count,datatype,op,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iallreduce_f08 -end interface MPI_Iallreduce - -interface MPI_Alltoall -subroutine MPI_Alltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Alltoall_f08 -end interface MPI_Alltoall - -interface MPI_Ialltoall -subroutine 
MPI_Ialltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ialltoall_f08 -end interface MPI_Ialltoall - -interface MPI_Alltoallv -subroutine MPI_Alltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & - rdispls,recvtype,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Alltoallv_f08 -end interface MPI_Alltoallv - -interface MPI_Ialltoallv -subroutine MPI_Ialltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & - rdispls,recvtype,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, 
recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ialltoallv_f08 -end interface MPI_Ialltoallv - -interface MPI_Alltoallw -subroutine MPI_Alltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & - rdispls,recvtypes,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Alltoallw_f08 -end interface MPI_Alltoallw - -interface MPI_Ialltoallw -subroutine MPI_Ialltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & - rdispls,recvtypes,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) - TYPE(MPI_Comm), INTENT(IN)
:: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ialltoallw_f08 -end interface MPI_Ialltoallw - -interface MPI_Barrier -subroutine MPI_Barrier_f08(comm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Barrier_f08 -end interface MPI_Barrier - -interface MPI_Ibarrier -subroutine MPI_Ibarrier_f08(comm,request,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Request - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ibarrier_f08 -end interface MPI_Ibarrier - -interface MPI_Bcast -subroutine MPI_Bcast_f08(buffer,count,datatype,root,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer - !$PRAGMA IGNORE_TKR buffer - !DIR$ IGNORE_TKR buffer - !IBM* IGNORE_TKR buffer - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer - INTEGER, INTENT(IN) :: count, root - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Bcast_f08 -end interface MPI_Bcast - -interface MPI_Ibcast -subroutine MPI_Ibcast_f08(buffer,count,datatype,root,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buffer - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buffer - !$PRAGMA IGNORE_TKR buffer - !DIR$ IGNORE_TKR buffer - !IBM* IGNORE_TKR buffer - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buffer - INTEGER, INTENT(IN) :: count, root - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ibcast_f08 -end interface MPI_Ibcast - -interface
MPI_Exscan -subroutine MPI_Exscan_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Exscan_f08 -end interface MPI_Exscan - -interface MPI_Iexscan -subroutine MPI_Iexscan_f08(sendbuf,recvbuf,count,datatype,op,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iexscan_f08 -end interface MPI_Iexscan - -interface MPI_Gather -subroutine MPI_Gather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - root,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - 
OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount, root - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Gather_f08 -end interface MPI_Gather - -interface MPI_Igather -subroutine MPI_Igather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - root,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount, root - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Igather_f08 -end interface MPI_Igather - -interface MPI_Gatherv -subroutine MPI_Gatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & - recvtype,root,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, root - INTEGER, INTENT(IN) :: recvcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Gatherv_f08 -end interface MPI_Gatherv - -interface MPI_Igatherv -subroutine 
MPI_Igatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & - recvtype,root,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, root - INTEGER, INTENT(IN) :: recvcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Igatherv_f08 -end interface MPI_Igatherv - -interface MPI_Op_commutative -subroutine MPI_Op_commutative_f08(op,commute,ierror) - use :: mpi_f08_types, only : MPI_Op - implicit none - TYPE(MPI_Op), INTENT(IN) :: op - LOGICAL, INTENT(OUT) :: commute - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Op_commutative_f08 -end interface MPI_Op_commutative - -interface MPI_Op_create -subroutine MPI_Op_create_f08(user_fn,commute,op,ierror) - use :: mpi_f08_types, only : MPI_Op - use :: mpi_f08_interfaces_callbacks, only : MPI_User_function - implicit none - PROCEDURE(MPI_User_function) :: user_fn - LOGICAL, INTENT(IN) :: commute - TYPE(MPI_Op), INTENT(OUT) :: op - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Op_create_f08 -end interface MPI_Op_create - -interface MPI_Op_free -subroutine MPI_Op_free_f08(op,ierror) - use :: mpi_f08_types, only : MPI_Op - implicit none - TYPE(MPI_Op), INTENT(INOUT) :: op - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Op_free_f08 -end interface MPI_Op_free - -interface MPI_Reduce -subroutine MPI_Reduce_f08(sendbuf,recvbuf,count,datatype,op,root,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm - 
implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count, root - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Reduce_f08 -end interface MPI_Reduce - -interface MPI_Ireduce -subroutine MPI_Ireduce_f08(sendbuf,recvbuf,count,datatype,op,root,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count, root - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ireduce_f08 -end interface MPI_Ireduce - -interface MPI_Reduce_local -subroutine MPI_Reduce_local_f08(inbuf,inoutbuf,count,datatype,op,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, inoutbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, inoutbuf - !$PRAGMA IGNORE_TKR inbuf, inoutbuf - !DIR$ IGNORE_TKR inbuf, inoutbuf - !IBM* IGNORE_TKR inbuf, inoutbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: inbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: inoutbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_Reduce_local_f08 -end interface MPI_Reduce_local - -interface MPI_Reduce_scatter -subroutine MPI_Reduce_scatter_f08(sendbuf,recvbuf,recvcounts,datatype,op,comm, & - ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: recvcounts(*) - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Reduce_scatter_f08 -end interface MPI_Reduce_scatter - -interface MPI_Ireduce_scatter -subroutine MPI_Ireduce_scatter_f08(sendbuf,recvbuf,recvcounts,datatype,op,comm, & - request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: recvcounts(*) - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ireduce_scatter_f08 -end interface MPI_Ireduce_scatter - -interface MPI_Reduce_scatter_block -subroutine MPI_Reduce_scatter_block_f08(sendbuf,recvbuf,recvcount,datatype,op,comm, & - ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ 
ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: recvcount - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Reduce_scatter_block_f08 -end interface MPI_Reduce_scatter_block - -interface MPI_Ireduce_scatter_block -subroutine MPI_Ireduce_scatter_block_f08(sendbuf,recvbuf,recvcount,datatype,op,comm, & - request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: recvcount - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ireduce_scatter_block_f08 -end interface MPI_Ireduce_scatter_block - -interface MPI_Scan -subroutine MPI_Scan_f08(sendbuf,recvbuf,count,datatype,op,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - 
TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Scan_f08 -end interface MPI_Scan - -interface MPI_Iscan -subroutine MPI_Iscan_f08(sendbuf,recvbuf,count,datatype,op,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iscan_f08 -end interface MPI_Iscan - -interface MPI_Scatter -subroutine MPI_Scatter_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - root,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount, root - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Scatter_f08 -end interface MPI_Scatter - -interface MPI_Iscatter -subroutine MPI_Iscatter_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - root,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: 
sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount, root - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iscatter_f08 -end interface MPI_Iscatter - -interface MPI_Scatterv -subroutine MPI_Scatterv_f08(sendbuf,sendcounts,displs,sendtype,recvbuf,recvcount, & - recvtype,root,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: recvcount, root - INTEGER, INTENT(IN) :: sendcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Scatterv_f08 -end interface MPI_Scatterv - -interface MPI_Iscatterv -subroutine MPI_Iscatterv_f08(sendbuf,sendcounts,displs,sendtype,recvbuf,recvcount, & - recvtype,root,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: recvcount, root - INTEGER, INTENT(IN) :: sendcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: 
sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Iscatterv_f08 -end interface MPI_Iscatterv - -interface MPI_Comm_compare -subroutine MPI_Comm_compare_f08(comm1,comm2,result,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm1 - TYPE(MPI_Comm), INTENT(IN) :: comm2 - INTEGER, INTENT(OUT) :: result - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_compare_f08 -end interface MPI_Comm_compare - -interface MPI_Comm_create -subroutine MPI_Comm_create_f08(comm,group,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Group - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Group), INTENT(IN) :: group - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_create_f08 -end interface MPI_Comm_create - -interface MPI_Comm_create_group -subroutine MPI_Comm_create_group_f08(comm,group,tag,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Group - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(IN) :: tag - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_create_group_f08 -end interface MPI_Comm_create_group - -interface MPI_Comm_create_keyval -subroutine MPI_Comm_create_keyval_f08(comm_copy_attr_fn,comm_delete_attr_fn,comm_keyval, & - extra_state,ierror) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - use :: mpi_f08_interfaces_callbacks, only : MPI_Comm_copy_attr_function - use :: mpi_f08_interfaces_callbacks, only : MPI_Comm_delete_attr_function - implicit none - PROCEDURE(MPI_Comm_copy_attr_function) :: comm_copy_attr_fn - PROCEDURE(MPI_Comm_delete_attr_function) :: comm_delete_attr_fn - INTEGER, INTENT(OUT) :: comm_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_Comm_create_keyval_f08 -end interface MPI_Comm_create_keyval - -interface MPI_Comm_delete_attr -subroutine MPI_Comm_delete_attr_f08(comm,comm_keyval,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: comm_keyval - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_delete_attr_f08 -end interface MPI_Comm_delete_attr - -interface MPI_Comm_dup -subroutine MPI_Comm_dup_f08(comm,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_dup_f08 -end interface MPI_Comm_dup - -interface MPI_Comm_dup_with_info -subroutine MPI_Comm_dup_with_info_f08(comm,info,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_dup_with_info_f08 -end interface MPI_Comm_dup_with_info - -interface MPI_Comm_idup -subroutine MPI_Comm_idup_f08(comm,newcomm,request,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Request - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_idup_f08 -end interface MPI_Comm_idup - -interface MPI_Comm_free -subroutine MPI_Comm_free_f08(comm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(INOUT) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_free_f08 -end interface MPI_Comm_free - -interface MPI_Comm_free_keyval -subroutine MPI_Comm_free_keyval_f08(comm_keyval,ierror) - implicit none - INTEGER, INTENT(INOUT) :: comm_keyval - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_Comm_free_keyval_f08 -end interface MPI_Comm_free_keyval - -interface MPI_Comm_get_attr -subroutine MPI_Comm_get_attr_f08(comm,comm_keyval,attribute_val,flag,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: comm_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_get_attr_f08 -end interface MPI_Comm_get_attr - -interface MPI_Comm_get_info -subroutine MPI_Comm_get_info_f08(comm,info_used,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Info), INTENT(OUT) :: info_used - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_get_info_f08 -end interface MPI_Comm_get_info - -interface MPI_Comm_get_name -subroutine MPI_Comm_get_name_f08(comm,comm_name,resultlen,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_MAX_OBJECT_NAME - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - CHARACTER(LEN=MPI_MAX_OBJECT_NAME), INTENT(OUT) :: comm_name - INTEGER, INTENT(OUT) :: resultlen - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_get_name_f08 -end interface MPI_Comm_get_name - -interface MPI_Comm_group -subroutine MPI_Comm_group_f08(comm,group,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Group - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Group), INTENT(OUT) :: group - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_group_f08 -end interface MPI_Comm_group - -interface MPI_Comm_rank -subroutine MPI_Comm_rank_f08(comm,rank,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: rank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_rank_f08 -end interface MPI_Comm_rank - -interface 
MPI_Comm_remote_group -subroutine MPI_Comm_remote_group_f08(comm,group,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Group - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Group), INTENT(OUT) :: group - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_remote_group_f08 -end interface MPI_Comm_remote_group - -interface MPI_Comm_remote_size -subroutine MPI_Comm_remote_size_f08(comm,size,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_remote_size_f08 -end interface MPI_Comm_remote_size - -interface MPI_Comm_set_attr -subroutine MPI_Comm_set_attr_f08(comm,comm_keyval,attribute_val,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: comm_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_set_attr_f08 -end interface MPI_Comm_set_attr - -interface MPI_Comm_set_info -subroutine MPI_Comm_set_info_f08(comm,info,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_set_info_f08 -end interface MPI_Comm_set_info - -interface MPI_Comm_set_name -subroutine MPI_Comm_set_name_f08(comm,comm_name,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - CHARACTER(LEN=*), INTENT(IN) :: comm_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_set_name_f08 -end interface MPI_Comm_set_name - -interface MPI_Comm_size -subroutine MPI_Comm_size_f08(comm,size,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_Comm_size_f08 -end interface MPI_Comm_size - -interface MPI_Comm_split -subroutine MPI_Comm_split_f08(comm,color,key,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: color, key - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_split_f08 -end interface MPI_Comm_split - -interface MPI_Comm_test_inter -subroutine MPI_Comm_test_inter_f08(comm,flag,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_test_inter_f08 -end interface MPI_Comm_test_inter - -interface MPI_Group_compare -subroutine MPI_Group_compare_f08(group1,group2,result,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group1, group2 - INTEGER, INTENT(OUT) :: result - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_compare_f08 -end interface MPI_Group_compare - -interface MPI_Group_difference -subroutine MPI_Group_difference_f08(group1,group2,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group1, group2 - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_difference_f08 -end interface MPI_Group_difference - -interface MPI_Group_excl -subroutine MPI_Group_excl_f08(group,n,ranks,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(IN) :: n, ranks(n) - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_excl_f08 -end interface MPI_Group_excl - -interface MPI_Group_free -subroutine MPI_Group_free_f08(group,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit 
none - TYPE(MPI_Group), INTENT(INOUT) :: group - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_free_f08 -end interface MPI_Group_free - -interface MPI_Group_incl -subroutine MPI_Group_incl_f08(group,n,ranks,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - INTEGER, INTENT(IN) :: n, ranks(n) - TYPE(MPI_Group), INTENT(IN) :: group - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_incl_f08 -end interface MPI_Group_incl - -interface MPI_Group_intersection -subroutine MPI_Group_intersection_f08(group1,group2,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group1, group2 - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_intersection_f08 -end interface MPI_Group_intersection - -interface MPI_Group_range_excl -subroutine MPI_Group_range_excl_f08(group,n,ranges,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(IN) :: n, ranges(3,n) - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_range_excl_f08 -end interface MPI_Group_range_excl - -interface MPI_Group_range_incl -subroutine MPI_Group_range_incl_f08(group,n,ranges,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(IN) :: n, ranges(3,n) - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_range_incl_f08 -end interface MPI_Group_range_incl - -interface MPI_Group_rank -subroutine MPI_Group_rank_f08(group,rank,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(OUT) :: rank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_rank_f08 -end 
interface MPI_Group_rank - -interface MPI_Group_size -subroutine MPI_Group_size_f08(group,size,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_size_f08 -end interface MPI_Group_size - -interface MPI_Group_translate_ranks -subroutine MPI_Group_translate_ranks_f08(group1,n,ranks1,group2,ranks2,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group1, group2 - INTEGER, INTENT(IN) :: n - INTEGER, INTENT(IN) :: ranks1(n) - INTEGER, INTENT(OUT) :: ranks2(n) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_translate_ranks_f08 -end interface MPI_Group_translate_ranks - -interface MPI_Group_union -subroutine MPI_Group_union_f08(group1,group2,newgroup,ierror) - use :: mpi_f08_types, only : MPI_Group - implicit none - TYPE(MPI_Group), INTENT(IN) :: group1, group2 - TYPE(MPI_Group), INTENT(OUT) :: newgroup - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Group_union_f08 -end interface MPI_Group_union - -interface MPI_Intercomm_create -subroutine MPI_Intercomm_create_f08(local_comm,local_leader,peer_comm,remote_leader, & - tag,newintercomm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: local_comm, peer_comm - INTEGER, INTENT(IN) :: local_leader, remote_leader, tag - TYPE(MPI_Comm), INTENT(OUT) :: newintercomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Intercomm_create_f08 -end interface MPI_Intercomm_create - -interface MPI_Intercomm_merge -subroutine MPI_Intercomm_merge_f08(intercomm,high,newintracomm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: intercomm - LOGICAL, INTENT(IN) :: high - TYPE(MPI_Comm), INTENT(OUT) :: newintracomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Intercomm_merge_f08 -end interface 
MPI_Intercomm_merge - -interface MPI_Type_create_keyval -subroutine MPI_Type_create_keyval_f08(type_copy_attr_fn,type_delete_attr_fn,type_keyval, & - extra_state,ierror) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - use :: mpi_f08_interfaces_callbacks, only : MPI_Type_copy_attr_function - use :: mpi_f08_interfaces_callbacks, only : MPI_Type_delete_attr_function - implicit none - PROCEDURE(MPI_Type_copy_attr_function) :: type_copy_attr_fn - PROCEDURE(MPI_Type_delete_attr_function) :: type_delete_attr_fn - INTEGER, INTENT(OUT) :: type_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_keyval_f08 -end interface MPI_Type_create_keyval - -interface MPI_Type_delete_attr -subroutine MPI_Type_delete_attr_f08(datatype,type_keyval,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: type_keyval - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_delete_attr_f08 -end interface MPI_Type_delete_attr - -interface MPI_Type_free_keyval -subroutine MPI_Type_free_keyval_f08(type_keyval,ierror) - implicit none - INTEGER, INTENT(INOUT) :: type_keyval - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_free_keyval_f08 -end interface MPI_Type_free_keyval - -interface MPI_Type_get_attr -subroutine MPI_Type_get_attr_f08(datatype,type_keyval,attribute_val,flag,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: type_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_attr_f08 -end interface MPI_Type_get_attr - -interface MPI_Type_get_name -subroutine MPI_Type_get_name_f08(datatype,type_name,resultlen,ierror) - use :: mpi_f08_types, only : MPI_Datatype, 
MPI_MAX_OBJECT_NAME - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - CHARACTER(LEN=MPI_MAX_OBJECT_NAME), INTENT(OUT) :: type_name - INTEGER, INTENT(OUT) :: resultlen - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_get_name_f08 -end interface MPI_Type_get_name - -interface MPI_Type_set_attr -subroutine MPI_Type_set_attr_f08(datatype,type_keyval,attribute_val,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: type_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_set_attr_f08 -end interface MPI_Type_set_attr - -interface MPI_Type_set_name -subroutine MPI_Type_set_name_f08(datatype,type_name,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - TYPE(MPI_Datatype), INTENT(IN) :: datatype - CHARACTER(LEN=*), INTENT(IN) :: type_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_set_name_f08 -end interface MPI_Type_set_name - -interface MPI_Win_allocate -subroutine MPI_Win_allocate_f08(size, disp_unit, info, comm, & - baseptr, win, ierror) - USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR - use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND - INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: size - INTEGER, INTENT(IN) :: disp_unit - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(C_PTR), INTENT(OUT) :: baseptr - TYPE(MPI_Win), INTENT(OUT) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_allocate_f08 -end interface MPI_Win_allocate - -interface MPI_Win_allocate_shared -subroutine MPI_Win_allocate_shared_f08(size, disp_unit, info, comm, & - baseptr, win, ierror) - USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR - use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND - INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: size - INTEGER, 
INTENT(IN) :: disp_unit - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(C_PTR), INTENT(OUT) :: baseptr - TYPE(MPI_Win), INTENT(OUT) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_allocate_shared_f08 -end interface MPI_Win_allocate_shared - -interface MPI_Win_create_keyval -subroutine MPI_Win_create_keyval_f08(win_copy_attr_fn,win_delete_attr_fn,win_keyval, & - extra_state,ierror) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - use :: mpi_f08_interfaces_callbacks, only : MPI_Win_copy_attr_function - use :: mpi_f08_interfaces_callbacks, only : MPI_Win_delete_attr_function - implicit none - PROCEDURE(MPI_Win_copy_attr_function) :: win_copy_attr_fn - PROCEDURE(MPI_Win_delete_attr_function) :: win_delete_attr_fn - INTEGER, INTENT(OUT) :: win_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_create_keyval_f08 -end interface MPI_Win_create_keyval - -interface MPI_Win_delete_attr -subroutine MPI_Win_delete_attr_f08(win,win_keyval,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, INTENT(IN) :: win_keyval - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_delete_attr_f08 -end interface MPI_Win_delete_attr - -interface MPI_Win_free_keyval -subroutine MPI_Win_free_keyval_f08(win_keyval,ierror) - implicit none - INTEGER, INTENT(INOUT) :: win_keyval - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_free_keyval_f08 -end interface MPI_Win_free_keyval - -interface MPI_Win_get_attr -subroutine MPI_Win_get_attr_f08(win,win_keyval,attribute_val,flag,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, INTENT(IN) :: win_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Win_get_attr_f08 -end interface MPI_Win_get_attr - -interface MPI_Win_get_info -subroutine MPI_Win_get_info_f08(win,info,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_Info - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Info), INTENT(OUT) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_get_info_f08 -end interface MPI_Win_get_info - -interface MPI_Win_get_name -subroutine MPI_Win_get_name_f08(win,win_name,resultlen,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_MAX_OBJECT_NAME - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - CHARACTER(LEN=MPI_MAX_OBJECT_NAME), INTENT(OUT) :: win_name - INTEGER, INTENT(OUT) :: resultlen - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_get_name_f08 -end interface MPI_Win_get_name - -interface MPI_Win_set_attr -subroutine MPI_Win_set_attr_f08(win,win_keyval,attribute_val,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, INTENT(IN) :: win_keyval - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_set_attr_f08 -end interface MPI_Win_set_attr - -interface MPI_Win_set_info -subroutine MPI_Win_set_info_f08(win,info,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_Info - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_set_info_f08 -end interface MPI_Win_set_info - -interface MPI_Win_set_name -subroutine MPI_Win_set_name_f08(win,win_name,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - CHARACTER(LEN=*), INTENT(IN) :: win_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_set_name_f08 -end interface MPI_Win_set_name - -interface MPI_Cartdim_get -subroutine MPI_Cartdim_get_f08(comm,ndims,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit 
none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: ndims - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cartdim_get_f08 -end interface MPI_Cartdim_get - -interface MPI_Cart_coords -subroutine MPI_Cart_coords_f08(comm,rank,maxdims,coords,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: rank, maxdims - INTEGER, INTENT(OUT) :: coords(maxdims) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_coords_f08 -end interface MPI_Cart_coords - -interface MPI_Cart_create -subroutine MPI_Cart_create_f08(comm_old,ndims,dims,periods,reorder,comm_cart,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm_old - INTEGER, INTENT(IN) :: ndims, dims(ndims) - LOGICAL, INTENT(IN) :: periods(ndims), reorder - TYPE(MPI_Comm), INTENT(OUT) :: comm_cart - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_create_f08 -end interface MPI_Cart_create - -interface MPI_Cart_get -subroutine MPI_Cart_get_f08(comm,maxdims,dims,periods,coords,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: maxdims - INTEGER, INTENT(OUT) :: dims(maxdims), coords(maxdims) - LOGICAL, INTENT(OUT) :: periods(maxdims) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_get_f08 -end interface MPI_Cart_get - -interface MPI_Cart_map -subroutine MPI_Cart_map_f08(comm,ndims,dims,periods,newrank,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: ndims, dims(ndims) - LOGICAL, INTENT(IN) :: periods(ndims) - INTEGER, INTENT(OUT) :: newrank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_map_f08 -end interface MPI_Cart_map - -interface MPI_Cart_rank -subroutine MPI_Cart_rank_f08(comm,coords,rank,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - 
TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: coords(*) - INTEGER, INTENT(OUT) :: rank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_rank_f08 -end interface MPI_Cart_rank - -interface MPI_Cart_shift -subroutine MPI_Cart_shift_f08(comm,direction,disp,rank_source,rank_dest,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: direction, disp - INTEGER, INTENT(OUT) :: rank_source, rank_dest - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_shift_f08 -end interface MPI_Cart_shift - -interface MPI_Cart_sub -subroutine MPI_Cart_sub_f08(comm,remain_dims,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - LOGICAL, INTENT(IN) :: remain_dims(*) - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Cart_sub_f08 -end interface MPI_Cart_sub - -interface MPI_Dims_create -subroutine MPI_Dims_create_f08(nnodes,ndims,dims,ierror) - implicit none - INTEGER, INTENT(IN) :: nnodes, ndims - INTEGER, INTENT(INOUT) :: dims(ndims) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Dims_create_f08 -end interface MPI_Dims_create - -interface MPI_Dist_graph_create -subroutine MPI_Dist_graph_create_f08(comm_old,n,sources,degrees,destinations,weights, & - info,reorder,comm_dist_graph,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm_old - INTEGER, INTENT(IN) :: n, sources(n), degrees(n), destinations(*), weights(*) - TYPE(MPI_Info), INTENT(IN) :: info - LOGICAL, INTENT(IN) :: reorder - TYPE(MPI_Comm), INTENT(OUT) :: comm_dist_graph - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Dist_graph_create_f08 -end interface MPI_Dist_graph_create - -interface MPI_Dist_graph_create_adjacent -subroutine MPI_Dist_graph_create_adjacent_f08(comm_old,indegree,sources,sourceweights, & - 
outdegree,destinations,destweights,info,reorder, & - comm_dist_graph,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm_old - INTEGER, INTENT(IN) :: indegree, sources(indegree), outdegree, destinations(outdegree) - INTEGER, INTENT(IN) :: sourceweights(indegree), destweights(outdegree) - TYPE(MPI_Info), INTENT(IN) :: info - LOGICAL, INTENT(IN) :: reorder - TYPE(MPI_Comm), INTENT(OUT) :: comm_dist_graph - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Dist_graph_create_adjacent_f08 -end interface MPI_Dist_graph_create_adjacent - -interface MPI_Dist_graph_neighbors -subroutine MPI_Dist_graph_neighbors_f08(comm,maxindegree,sources,sourceweights, & - maxoutdegree,destinations,destweights,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: maxindegree, maxoutdegree - INTEGER, INTENT(OUT) :: sources(maxindegree), destinations(maxoutdegree) - INTEGER, INTENT(OUT) :: sourceweights(maxindegree), destweights(maxoutdegree) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Dist_graph_neighbors_f08 -end interface MPI_Dist_graph_neighbors - -interface MPI_Dist_graph_neighbors_count -subroutine MPI_Dist_graph_neighbors_count_f08(comm,indegree,outdegree,weighted,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: indegree, outdegree - LOGICAL, INTENT(OUT) :: weighted - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Dist_graph_neighbors_count_f08 -end interface MPI_Dist_graph_neighbors_count - -interface MPI_Graphdims_get -subroutine MPI_Graphdims_get_f08(comm,nnodes,nedges,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: nnodes, nedges - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Graphdims_get_f08 -end interface MPI_Graphdims_get - -interface 
MPI_Graph_create -subroutine MPI_Graph_create_f08(comm_old,nnodes,index,edges,reorder,comm_graph, & - ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm_old - INTEGER, INTENT(IN) :: nnodes, index(nnodes), edges(*) - LOGICAL, INTENT(IN) :: reorder - TYPE(MPI_Comm), INTENT(OUT) :: comm_graph - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Graph_create_f08 -end interface MPI_Graph_create - -interface MPI_Graph_get -subroutine MPI_Graph_get_f08(comm,maxindex,maxedges,index,edges,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: maxindex, maxedges - INTEGER, INTENT(OUT) :: index(maxindex), edges(maxedges) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Graph_get_f08 -end interface MPI_Graph_get - -interface MPI_Graph_map -subroutine MPI_Graph_map_f08(comm,nnodes,index,edges,newrank,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: nnodes, index(nnodes), edges(*) - INTEGER, INTENT(OUT) :: newrank - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Graph_map_f08 -end interface MPI_Graph_map - -interface MPI_Graph_neighbors -subroutine MPI_Graph_neighbors_f08(comm,rank,maxneighbors,neighbors,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: rank, maxneighbors - INTEGER, INTENT(OUT) :: neighbors(maxneighbors) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Graph_neighbors_f08 -end interface MPI_Graph_neighbors - -interface MPI_Graph_neighbors_count -subroutine MPI_Graph_neighbors_count_f08(comm,rank,nneighbors,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: rank - INTEGER, INTENT(OUT) :: nneighbors - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Graph_neighbors_count_f08 -end interface MPI_Graph_neighbors_count - -interface MPI_Topo_test -subroutine MPI_Topo_test_f08(comm,status,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Status - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Topo_test_f08 -end interface MPI_Topo_test - -! MPI_Wtick is not a wrapper function -! -interface MPI_Wtick -function MPI_Wtick_f08( ) BIND(C,name="MPI_Wtick") - use, intrinsic :: ISO_C_BINDING - implicit none - DOUBLE PRECISION :: MPI_Wtick_f08 -end function MPI_Wtick_f08 -end interface MPI_Wtick - -! MPI_Wtime is not a wrapper function -! -interface MPI_Wtime -function MPI_Wtime_f08( ) BIND(C,name="MPI_Wtime") - use, intrinsic :: ISO_C_BINDING - implicit none - DOUBLE PRECISION :: MPI_Wtime_f08 -end function MPI_Wtime_f08 -end interface MPI_Wtime - -interface MPI_Aint_add -function MPI_Aint_add_f08(base,diff) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - implicit none - INTEGER(MPI_ADDRESS_KIND) :: base - INTEGER(MPI_ADDRESS_KIND) :: diff - INTEGER(MPI_ADDRESS_KIND) :: MPI_Aint_add_f08 -end function MPI_Aint_add_f08 -end interface MPI_Aint_add - -interface MPI_Aint_diff -function MPI_Aint_diff_f08(addr1,addr2) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - implicit none - INTEGER(MPI_ADDRESS_KIND) :: addr1 - INTEGER(MPI_ADDRESS_KIND) :: addr2 - INTEGER(MPI_ADDRESS_KIND) :: MPI_Aint_diff_f08 -end function MPI_Aint_diff_f08 -end interface MPI_Aint_diff - -interface MPI_Abort -subroutine MPI_Abort_f08(comm,errorcode,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: errorcode - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Abort_f08 -end interface MPI_Abort - -interface MPI_Add_error_class -subroutine MPI_Add_error_class_f08(errorclass,ierror) - implicit none - INTEGER, INTENT(OUT) :: errorclass - INTEGER, OPTIONAL, INTENT(OUT) 
:: ierror -end subroutine MPI_Add_error_class_f08 -end interface MPI_Add_error_class - -interface MPI_Add_error_code -subroutine MPI_Add_error_code_f08(errorclass,errorcode,ierror) - implicit none - INTEGER, INTENT(IN) :: errorclass - INTEGER, INTENT(OUT) :: errorcode - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Add_error_code_f08 -end interface MPI_Add_error_code - -interface MPI_Add_error_string -subroutine MPI_Add_error_string_f08(errorcode,string,ierror) - implicit none - integer, intent(in) :: errorcode - character(len=*), intent(in) :: string - integer, optional, intent(out) :: ierror -end subroutine MPI_Add_error_string_f08 -end interface MPI_Add_error_string - -interface MPI_Alloc_mem -subroutine MPI_Alloc_mem_f08(size,info,baseptr,ierror) - use, intrinsic :: ISO_C_BINDING, only : C_PTR - use :: mpi_f08_types, only : MPI_Info, MPI_ADDRESS_KIND - implicit none - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(C_PTR), INTENT(OUT) :: baseptr - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Alloc_mem_f08 -end interface MPI_Alloc_mem - -interface MPI_Comm_call_errhandler -subroutine MPI_Comm_call_errhandler_f08(comm,errorcode,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: errorcode - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_call_errhandler_f08 -end interface MPI_Comm_call_errhandler - -interface MPI_Comm_create_errhandler -subroutine MPI_Comm_create_errhandler_f08(comm_errhandler_fn,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Errhandler - use :: mpi_f08_interfaces_callbacks, only : MPI_Comm_errhandler_function - implicit none - PROCEDURE(MPI_Comm_errhandler_function) :: comm_errhandler_fn - TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_create_errhandler_f08 -end interface MPI_Comm_create_errhandler - 
-interface MPI_Comm_get_errhandler -subroutine MPI_Comm_get_errhandler_f08(comm,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Errhandler - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_get_errhandler_f08 -end interface MPI_Comm_get_errhandler - -interface MPI_Comm_set_errhandler -subroutine MPI_Comm_set_errhandler_f08(comm,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Errhandler - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Errhandler), INTENT(IN) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_set_errhandler_f08 -end interface MPI_Comm_set_errhandler - -interface MPI_Errhandler_free -subroutine MPI_Errhandler_free_f08(errhandler,ierror) - use :: mpi_f08_types, only : MPI_Errhandler - implicit none - TYPE(MPI_Errhandler), INTENT(INOUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Errhandler_free_f08 -end interface MPI_Errhandler_free - -interface MPI_Error_class -subroutine MPI_Error_class_f08(errorcode,errorclass,ierror) - implicit none - INTEGER, INTENT(IN) :: errorcode - INTEGER, INTENT(OUT) :: errorclass - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Error_class_f08 -end interface MPI_Error_class - -interface MPI_Error_string -subroutine MPI_Error_string_f08(errorcode,string,resultlen,ierror) - use :: mpi_f08_types, only : MPI_MAX_ERROR_STRING - implicit none - integer, intent(in) :: errorcode - character(len=MPI_MAX_ERROR_STRING), intent(out) :: string - integer, intent(out) :: resultlen - integer, optional, intent(out) :: ierror -end subroutine MPI_Error_string_f08 -end interface MPI_Error_string - -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - -interface MPI_File_call_errhandler -subroutine MPI_File_call_errhandler_f08(fh,errorcode,ierror) - use :: mpi_f08_types, only : MPI_File - implicit none - TYPE(MPI_File), 
INTENT(IN) :: fh - INTEGER, INTENT(IN) :: errorcode - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_call_errhandler_f08 -end interface MPI_File_call_errhandler - -interface MPI_File_create_errhandler -subroutine MPI_File_create_errhandler_f08(file_errhandler_fn,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Errhandler - use :: mpi_f08_interfaces_callbacks, only : MPI_File_errhandler_function - implicit none - PROCEDURE(MPI_File_errhandler_function) :: file_errhandler_fn - TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_create_errhandler_f08 -end interface MPI_File_create_errhandler - -interface MPI_File_get_errhandler -subroutine MPI_File_get_errhandler_f08(file,errhandler,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Errhandler - implicit none - TYPE(MPI_File), INTENT(IN) :: file - TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_errhandler_f08 -end interface MPI_File_get_errhandler - -interface MPI_File_set_errhandler -subroutine MPI_File_set_errhandler_f08(file,errhandler,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Errhandler - implicit none - TYPE(MPI_File), INTENT(IN) :: file - TYPE(MPI_Errhandler), INTENT(IN) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_set_errhandler_f08 -end interface MPI_File_set_errhandler - -! endif for OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - -interface MPI_Finalize -subroutine MPI_Finalize_f08(ierror) - implicit none - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Finalize_f08 -end interface MPI_Finalize - -interface MPI_Finalized -subroutine MPI_Finalized_f08(flag,ierror) - implicit none - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Finalized_f08 -end interface MPI_Finalized - -! ASYNCHRONOUS had to removed from the base argument because -! 
the dummy argument is not an assumed-shape array. This will -! be okay once the Interop TR is implemented. -interface MPI_Free_mem -subroutine MPI_Free_mem_f08(base,ierror) - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: base - !GCC$ ATTRIBUTES NO_ARG_CHECK :: base - !$PRAGMA IGNORE_TKR base - !DIR$ IGNORE_TKR base - !IBM* IGNORE_TKR base - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: base - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Free_mem_f08 -end interface MPI_Free_mem - -interface MPI_Get_processor_name -subroutine MPI_Get_processor_name_f08(name,resultlen,ierror) - use :: mpi_f08_types, only : MPI_MAX_PROCESSOR_NAME - implicit none - character(len=MPI_MAX_PROCESSOR_NAME), intent(out) :: name - integer, intent(out) :: resultlen - integer, optional, intent(out) :: ierror -end subroutine MPI_Get_processor_name_f08 -end interface MPI_Get_processor_name - -interface MPI_Get_version -subroutine MPI_Get_version_f08(version,subversion,ierror) - implicit none - INTEGER, INTENT(OUT) :: version, subversion - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_version_f08 -end interface MPI_Get_version - -interface MPI_Init -subroutine MPI_Init_f08(ierror) - implicit none - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Init_f08 -end interface MPI_Init - -interface MPI_Initialized -subroutine MPI_Initialized_f08(flag,ierror) - implicit none - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Initialized_f08 -end interface MPI_Initialized - -interface MPI_Win_call_errhandler -subroutine MPI_Win_call_errhandler_f08(win,errorcode,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, INTENT(IN) :: errorcode - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_call_errhandler_f08 -end interface MPI_Win_call_errhandler - -interface MPI_Win_create_errhandler -subroutine 
MPI_Win_create_errhandler_f08(win_errhandler_fn,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Errhandler - use :: mpi_f08_interfaces_callbacks, only : MPI_Win_errhandler_function - implicit none - PROCEDURE(MPI_Win_errhandler_function) :: win_errhandler_fn - TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_create_errhandler_f08 -end interface MPI_Win_create_errhandler - -interface MPI_Win_get_errhandler -subroutine MPI_Win_get_errhandler_f08(win,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_Errhandler - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Errhandler), INTENT(OUT) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_get_errhandler_f08 -end interface MPI_Win_get_errhandler - -interface MPI_Win_set_errhandler -subroutine MPI_Win_set_errhandler_f08(win,errhandler,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_Errhandler - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Errhandler), INTENT(IN) :: errhandler - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_set_errhandler_f08 -end interface MPI_Win_set_errhandler - -interface MPI_Info_create -subroutine MPI_Info_create_f08(info,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(OUT) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_create_f08 -end interface MPI_Info_create - -interface MPI_Info_delete -subroutine MPI_Info_delete_f08(info,key,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=*), INTENT(IN) :: key - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_delete_f08 -end interface MPI_Info_delete - -interface MPI_Info_dup -subroutine MPI_Info_dup_f08(info,newinfo,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Info), INTENT(OUT) :: 
newinfo - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_dup_f08 -end interface MPI_Info_dup - -interface MPI_Info_free -subroutine MPI_Info_free_f08(info,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(INOUT) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_free_f08 -end interface MPI_Info_free - -interface MPI_Info_get -subroutine MPI_Info_get_f08(info,key,valuelen,value,flag,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=*), INTENT(IN) :: key - INTEGER, INTENT(IN) :: valuelen - CHARACTER(LEN=valuelen), INTENT(OUT) :: value - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_get_f08 -end interface MPI_Info_get - -interface MPI_Info_get_nkeys -subroutine MPI_Info_get_nkeys_f08(info,nkeys,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, INTENT(OUT) :: nkeys - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_get_nkeys_f08 -end interface MPI_Info_get_nkeys - -interface MPI_Info_get_nthkey -subroutine MPI_Info_get_nthkey_f08(info,n,key,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, INTENT(IN) :: n - CHARACTER(lEN=*), INTENT(OUT) :: key - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_get_nthkey_f08 -end interface MPI_Info_get_nthkey - -interface MPI_Info_get_valuelen -subroutine MPI_Info_get_valuelen_f08(info,key,valuelen,flag,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=*), INTENT(IN) :: key - INTEGER, INTENT(OUT) :: valuelen - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_get_valuelen_f08 -end interface MPI_Info_get_valuelen - -interface MPI_Info_set -subroutine 
MPI_Info_set_f08(info,key,value,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=*), INTENT(IN) :: key, value - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Info_set_f08 -end interface MPI_Info_set - -interface MPI_Close_port -subroutine MPI_Close_port_f08(port_name,ierror) - implicit none - CHARACTER(LEN=*), INTENT(IN) :: port_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Close_port_f08 -end interface MPI_Close_port - -interface MPI_Comm_accept -subroutine MPI_Comm_accept_f08(port_name,info,root,comm,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm - implicit none - CHARACTER(LEN=*), INTENT(IN) :: port_name - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, INTENT(IN) :: root - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_accept_f08 -end interface MPI_Comm_accept - -interface MPI_Comm_connect -subroutine MPI_Comm_connect_f08(port_name,info,root,comm,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm - implicit none - CHARACTER(LEN=*), INTENT(IN) :: port_name - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, INTENT(IN) :: root - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_connect_f08 -end interface MPI_Comm_connect - -interface MPI_Comm_disconnect -subroutine MPI_Comm_disconnect_f08(comm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(INOUT) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_disconnect_f08 -end interface MPI_Comm_disconnect - -interface MPI_Comm_get_parent -subroutine MPI_Comm_get_parent_f08(parent,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - TYPE(MPI_Comm), INTENT(OUT) :: parent - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Comm_get_parent_f08 -end interface MPI_Comm_get_parent - -interface MPI_Comm_join -subroutine MPI_Comm_join_f08(fd,intercomm,ierror) - use :: mpi_f08_types, only : MPI_Comm - implicit none - INTEGER, INTENT(IN) :: fd - TYPE(MPI_Comm), INTENT(OUT) :: intercomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_join_f08 -end interface MPI_Comm_join - -interface MPI_Comm_spawn -subroutine MPI_Comm_spawn_f08(command,argv,maxprocs,info,root,comm,intercomm, & - array_of_errcodes,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm - implicit none - CHARACTER(LEN=*), INTENT(IN) :: command, argv(*) - INTEGER, INTENT(IN) :: maxprocs, root - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: intercomm - INTEGER :: array_of_errcodes(*) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_spawn_f08 -end interface MPI_Comm_spawn - -interface MPI_Comm_spawn_multiple -subroutine MPI_Comm_spawn_multiple_f08(count,array_of_commands,array_of_argv,array_of_maxprocs, & - array_of_info,root,comm,intercomm, & - array_of_errcodes,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm - implicit none - INTEGER, INTENT(IN) :: count, array_of_maxprocs(*), root - CHARACTER(LEN=*), INTENT(IN) :: array_of_commands(*), array_of_argv(count,*) - TYPE(MPI_Info), INTENT(IN) :: array_of_info(*) - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Comm), INTENT(OUT) :: intercomm - INTEGER :: array_of_errcodes(*) - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_spawn_multiple_f08 -end interface MPI_Comm_spawn_multiple - -interface MPI_Lookup_name -subroutine MPI_Lookup_name_f08(service_name,info,port_name,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_MAX_PORT_NAME - implicit none - CHARACTER(LEN=*), INTENT(IN) :: service_name - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=MPI_MAX_PORT_NAME), INTENT(OUT) :: port_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Lookup_name_f08 -end interface MPI_Lookup_name - -interface MPI_Open_port -subroutine MPI_Open_port_f08(info,port_name,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_MAX_PORT_NAME - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=MPI_MAX_PORT_NAME), INTENT(OUT) :: port_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Open_port_f08 -end interface MPI_Open_port - -interface MPI_Publish_name -subroutine MPI_Publish_name_f08(service_name,info,port_name,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - TYPE(MPI_Info), INTENT(IN) :: info - CHARACTER(LEN=*), INTENT(IN) :: service_name, port_name - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Publish_name_f08 -end interface MPI_Publish_name - -interface MPI_Unpublish_name -subroutine MPI_Unpublish_name_f08(service_name,info,port_name,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - CHARACTER(LEN=*), INTENT(IN) :: service_name, port_name - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Unpublish_name_f08 -end interface MPI_Unpublish_name - -interface MPI_Accumulate -subroutine MPI_Accumulate_f08(origin_addr,origin_count,origin_datatype,target_rank, & - target_disp,target_count,target_datatype,op,win,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !$PRAGMA IGNORE_TKR origin_addr - !DIR$ IGNORE_TKR origin_addr - !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr - INTEGER, INTENT(IN) :: origin_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end 
subroutine MPI_Accumulate_f08 -end interface MPI_Accumulate - -interface MPI_Raccumulate -subroutine MPI_Raccumulate_f08(origin_addr,origin_count,origin_datatype,target_rank, & - target_disp,target_count,target_datatype,op,win,request, & - ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_Request, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !$PRAGMA IGNORE_TKR origin_addr - !DIR$ IGNORE_TKR origin_addr - !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr - INTEGER, INTENT(IN) :: origin_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Raccumulate_f08 -end interface MPI_Raccumulate - -interface MPI_Get -subroutine MPI_Get_f08(origin_addr,origin_count,origin_datatype,target_rank, & - target_disp,target_count,target_datatype,win,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !$PRAGMA IGNORE_TKR origin_addr - !DIR$ IGNORE_TKR origin_addr - !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr - INTEGER, INTENT(IN) :: origin_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_f08 -end interface MPI_Get - -interface MPI_Rget -subroutine MPI_Rget_f08(origin_addr,origin_count,origin_datatype,target_rank, & - 
target_disp,target_count,target_datatype,win,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Request, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !$PRAGMA IGNORE_TKR origin_addr - !DIR$ IGNORE_TKR origin_addr - !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr - INTEGER, INTENT(IN) :: origin_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Rget_f08 -end interface MPI_Rget - -interface MPI_Get_accumulate -subroutine MPI_Get_accumulate_f08(origin_addr,origin_count,origin_datatype,result_addr, & - result_count,result_datatype,target_rank,target_disp, & - target_count,target_datatype,op,win,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr - !$PRAGMA IGNORE_TKR origin_addr,result_addr - !DIR$ IGNORE_TKR origin_addr,result_addr - !IBM* IGNORE_TKR origin_addr,result_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr - INTEGER, INTENT(IN) :: origin_count, result_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr - TYPE(MPI_Datatype), INTENT(IN) :: result_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Get_accumulate_f08 -end interface MPI_Get_accumulate - -interface MPI_Rget_accumulate -subroutine 
MPI_Rget_accumulate_f08(origin_addr,origin_count,origin_datatype,result_addr, & - result_count,result_datatype,target_rank,target_disp, & - target_count,target_datatype,op,win,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Request, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr - !$PRAGMA IGNORE_TKR origin_addr,result_addr - !DIR$ IGNORE_TKR origin_addr,result_addr - !IBM* IGNORE_TKR origin_addr,result_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: origin_addr - INTEGER, INTENT(IN) :: origin_count, result_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr - TYPE(MPI_Datatype), INTENT(IN) :: result_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Rget_accumulate_f08 -end interface MPI_Rget_accumulate - -interface MPI_Put -subroutine MPI_Put_f08(origin_addr,origin_count,origin_datatype,target_rank, & - target_disp,target_count,target_datatype,win,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !$PRAGMA IGNORE_TKR origin_addr - !DIR$ IGNORE_TKR origin_addr - !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr - INTEGER, INTENT(IN) :: origin_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Put_f08 -end interface 
MPI_Put - -interface MPI_Rput -subroutine MPI_Rput_f08(origin_addr,origin_count,origin_datatype,target_rank, & - target_disp,target_count,target_datatype,win,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_Request, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr - !$PRAGMA IGNORE_TKR origin_addr - !DIR$ IGNORE_TKR origin_addr - !IBM* IGNORE_TKR origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr - INTEGER, INTENT(IN) :: origin_count, target_rank, target_count - TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Datatype), INTENT(IN) :: target_datatype - TYPE(MPI_Win), INTENT(IN) :: win - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Rput_f08 -end interface MPI_Rput - -interface MPI_Fetch_and_op -subroutine MPI_Fetch_and_op_f08(origin_addr,result_addr,datatype,target_rank, & - target_disp,op,win,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,result_addr - !$PRAGMA IGNORE_TKR origin_addr,result_addr - !DIR$ IGNORE_TKR origin_addr,result_addr - !IBM* IGNORE_TKR origin_addr,result_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: target_rank - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Op), INTENT(IN) :: op - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Fetch_and_op_f08 -end interface MPI_Fetch_and_op - -interface MPI_Compare_and_swap -subroutine MPI_Compare_and_swap_f08(origin_addr,compare_addr,result_addr,datatype, & - target_rank,target_disp,win,ierror) - use :: 
mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,compare_addr,result_addr - !GCC$ ATTRIBUTES NO_ARG_CHECK :: origin_addr,compare_addr,result_addr - !$PRAGMA IGNORE_TKR origin_addr,compare_addr,result_addr - !DIR$ IGNORE_TKR origin_addr,compare_addr,result_addr - !IBM* IGNORE_TKR origin_addr,compare_addr,result_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr,compare_addr - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: target_rank - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Compare_and_swap_f08 -end interface MPI_Compare_and_swap - -interface MPI_Win_complete -subroutine MPI_Win_complete_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_complete_f08 -end interface MPI_Win_complete - -interface MPI_Win_create -subroutine MPI_Win_create_f08(base,size,disp_unit,info,comm,win,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: base - !GCC$ ATTRIBUTES NO_ARG_CHECK :: base - !$PRAGMA IGNORE_TKR base - !DIR$ IGNORE_TKR base - !IBM* IGNORE_TKR base - OMPI_FORTRAN_IGNORE_TKR_TYPE :: base - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size - INTEGER, INTENT(IN) :: disp_unit - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Win), INTENT(OUT) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_create_f08 -end interface MPI_Win_create - -interface MPI_Win_create_dynamic -subroutine MPI_Win_create_dynamic_f08(info,comm,win,ierror) - use :: mpi_f08_types, only : MPI_Info, MPI_Comm, MPI_Win - implicit none - TYPE(MPI_Info), 
INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Win), INTENT(OUT) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_create_dynamic_f08 -end interface MPI_Win_create_dynamic - -interface MPI_Win_attach -subroutine MPI_Win_attach_f08(win,base,size,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: base - !GCC$ ATTRIBUTES NO_ARG_CHECK :: base - !$PRAGMA IGNORE_TKR base - !DIR$ IGNORE_TKR base - !IBM* IGNORE_TKR base - OMPI_FORTRAN_IGNORE_TKR_TYPE :: base - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: size - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_attach_f08 -end interface MPI_Win_attach - -interface MPI_Win_detach -subroutine MPI_Win_detach_f08(win,base,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: base - !GCC$ ATTRIBUTES NO_ARG_CHECK :: base - !$PRAGMA IGNORE_TKR base - !DIR$ IGNORE_TKR base - !IBM* IGNORE_TKR base - OMPI_FORTRAN_IGNORE_TKR_TYPE :: base - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_detach_f08 -end interface MPI_Win_detach - -interface MPI_Win_fence -subroutine MPI_Win_fence_f08(assert,win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - INTEGER, INTENT(IN) :: assert - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_fence_f08 -end interface MPI_Win_fence - -interface MPI_Win_free -subroutine MPI_Win_free_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(INOUT) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_free_f08 -end interface MPI_Win_free - -interface MPI_Win_get_group -subroutine MPI_Win_get_group_f08(win,group,ierror) - use :: mpi_f08_types, only : MPI_Win, MPI_Group - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - 
TYPE(MPI_Group), INTENT(OUT) :: group - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_get_group_f08 -end interface MPI_Win_get_group - -interface MPI_Win_lock -subroutine MPI_Win_lock_f08(lock_type,rank,assert,win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - INTEGER, INTENT(IN) :: lock_type, rank, assert - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_lock_f08 -end interface MPI_Win_lock - -interface MPI_Win_lock_all -subroutine MPI_Win_lock_all_f08(assert,win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - INTEGER, INTENT(IN) :: assert - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_lock_all_f08 -end interface MPI_Win_lock_all - -interface MPI_Win_post -subroutine MPI_Win_post_f08(group,assert,win,ierror) - use :: mpi_f08_types, only : MPI_Group, MPI_Win - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(IN) :: assert - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_post_f08 -end interface MPI_Win_post - -interface MPI_Win_shared_query -subroutine MPI_Win_shared_query_f08(win, rank, size, disp_unit, baseptr,& - ierror) - USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR - use :: mpi_f08_types, only : MPI_Win, MPI_ADDRESS_KIND - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, INTENT(IN) :: rank - INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(OUT) :: size - INTEGER, INTENT(OUT) :: disp_unit - TYPE(C_PTR), INTENT(OUT) :: baseptr - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_shared_query_f08 -end interface - -interface MPI_Win_start -subroutine MPI_Win_start_f08(group,assert,win,ierror) - use :: mpi_f08_types, only : MPI_Group, MPI_Win - implicit none - TYPE(MPI_Group), INTENT(IN) :: group - INTEGER, INTENT(IN) :: assert - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Win_start_f08 -end interface MPI_Win_start - -interface MPI_Win_sync -subroutine MPI_Win_sync_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Group, MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_sync_f08 -end interface MPI_Win_sync - -interface MPI_Win_test -subroutine MPI_Win_test_f08(win,flag,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_test_f08 -end interface MPI_Win_test - -interface MPI_Win_unlock -subroutine MPI_Win_unlock_f08(rank,win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - INTEGER, INTENT(IN) :: rank - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_unlock_f08 -end interface MPI_Win_unlock - -interface MPI_Win_unlock_all -subroutine MPI_Win_unlock_all_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_unlock_all_f08 -end interface MPI_Win_unlock_all - -interface MPI_Win_wait -subroutine MPI_Win_wait_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_wait_f08 -end interface MPI_Win_wait - -interface MPI_Win_flush -subroutine MPI_Win_flush_f08(rank,win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - INTEGER, INTENT(IN) :: rank - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_flush_f08 -end interface MPI_Win_flush - -interface MPI_Win_flush_local -subroutine MPI_Win_flush_local_f08(rank,win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - INTEGER, INTENT(IN) :: rank - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_Win_flush_local_f08 -end interface MPI_Win_flush_local - -interface MPI_Win_flush_local_all -subroutine MPI_Win_flush_local_all_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_flush_local_all_f08 -end interface MPI_Win_flush_local_all - -interface MPI_Win_flush_all -subroutine MPI_Win_flush_all_f08(win,ierror) - use :: mpi_f08_types, only : MPI_Win - implicit none - TYPE(MPI_Win), INTENT(IN) :: win - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Win_flush_all_f08 -end interface MPI_Win_flush_all - -interface MPI_Grequest_complete -subroutine MPI_Grequest_complete_f08(request,ierror) - use :: mpi_f08_types, only : MPI_Request - implicit none - TYPE(MPI_Request), INTENT(IN) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Grequest_complete_f08 -end interface MPI_Grequest_complete - -interface MPI_Grequest_start -subroutine MPI_Grequest_start_f08(query_fn,free_fn,cancel_fn,extra_state,request, & - ierror) - use :: mpi_f08_types, only : MPI_Request, MPI_ADDRESS_KIND - use :: mpi_f08_interfaces_callbacks, only : MPI_Grequest_query_function - use :: mpi_f08_interfaces_callbacks, only : MPI_Grequest_free_function - use :: mpi_f08_interfaces_callbacks, only : MPI_Grequest_cancel_function - implicit none - PROCEDURE(MPI_Grequest_query_function) :: query_fn - PROCEDURE(MPI_Grequest_free_function) :: free_fn - PROCEDURE(MPI_Grequest_cancel_function) :: cancel_fn - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Grequest_start_f08 -end interface MPI_Grequest_start - -interface MPI_Init_thread -subroutine MPI_Init_thread_f08(required,provided,ierror) - implicit none - INTEGER, INTENT(IN) :: required - INTEGER, INTENT(OUT) :: provided - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_Init_thread_f08 -end interface MPI_Init_thread - -interface MPI_Is_thread_main -subroutine MPI_Is_thread_main_f08(flag,ierror) - implicit none - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Is_thread_main_f08 -end interface MPI_Is_thread_main - -interface MPI_Query_thread -subroutine MPI_Query_thread_f08(provided,ierror) - implicit none - INTEGER, INTENT(OUT) :: provided - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Query_thread_f08 -end interface MPI_Query_thread - -interface MPI_Status_set_cancelled -subroutine MPI_Status_set_cancelled_f08(status,flag,ierror) - use :: mpi_f08_types, only : MPI_Status - implicit none - TYPE(MPI_Status), INTENT(INOUT) :: status - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Status_set_cancelled_f08 -end interface MPI_Status_set_cancelled - -interface MPI_Status_set_elements -subroutine MPI_Status_set_elements_f08(status,datatype,count,ierror) - use :: mpi_f08_types, only : MPI_Status, MPI_Datatype - implicit none - TYPE(MPI_Status), INTENT(INOUT) :: status - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, INTENT(IN) :: count - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Status_set_elements_f08 -end interface MPI_Status_set_elements - -interface MPI_Status_set_elements_x -subroutine MPI_Status_set_elements_x_f08(status,datatype,count,ierror) - use :: mpi_f08_types, only : MPI_Status, MPI_Datatype, MPI_COUNT_KIND - implicit none - TYPE(MPI_Status), INTENT(INOUT) :: status - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_COUNT_KIND), INTENT(IN) :: count - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Status_set_elements_x_f08 -end interface MPI_Status_set_elements_x - -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - -interface MPI_File_close -subroutine MPI_File_close_f08(fh,ierror) - use :: mpi_f08_types, only : MPI_File - implicit none - 
TYPE(MPI_File), INTENT(INOUT) :: fh - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_close_f08 -end interface MPI_File_close - -interface MPI_File_delete -subroutine MPI_File_delete_f08(filename,info,ierror) - use :: mpi_f08_types, only : MPI_Info - implicit none - CHARACTER(LEN=*), INTENT(IN) :: filename - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_delete_f08 -end interface MPI_File_delete - -interface MPI_File_get_amode -subroutine MPI_File_get_amode_f08(fh,amode,ierror) - use :: mpi_f08_types, only : MPI_File - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER, INTENT(OUT) :: amode - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_amode_f08 -end interface MPI_File_get_amode - -interface MPI_File_get_atomicity -subroutine MPI_File_get_atomicity_f08(fh,flag,ierror) - use :: mpi_f08_types, only : MPI_File - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - LOGICAL, INTENT(OUT) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_atomicity_f08 -end interface MPI_File_get_atomicity - -interface MPI_File_get_byte_offset -subroutine MPI_File_get_byte_offset_f08(fh,offset,disp,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: disp - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_byte_offset_f08 -end interface MPI_File_get_byte_offset - -interface MPI_File_get_group -subroutine MPI_File_get_group_f08(fh,group,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Group - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - TYPE(MPI_Group), INTENT(OUT) :: group - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_group_f08 -end interface MPI_File_get_group - -interface MPI_File_get_info -subroutine MPI_File_get_info_f08(fh,info_used,ierror) - 
use :: mpi_f08_types, only : MPI_File, MPI_Info - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - TYPE(MPI_Info), INTENT(OUT) :: info_used - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_info_f08 -end interface MPI_File_get_info - -interface MPI_File_get_position -subroutine MPI_File_get_position_f08(fh,offset,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: offset - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_position_f08 -end interface MPI_File_get_position - -interface MPI_File_get_position_shared -subroutine MPI_File_get_position_shared_f08(fh,offset,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: offset - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_position_shared_f08 -end interface MPI_File_get_position_shared - -interface MPI_File_get_size -subroutine MPI_File_get_size_f08(fh,size,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_size_f08 -end interface MPI_File_get_size - -interface MPI_File_get_type_extent -subroutine MPI_File_get_type_extent_f08(fh,datatype,extent,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_ADDRESS_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: extent - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_type_extent_f08 -end interface MPI_File_get_type_extent - -interface MPI_File_get_view -subroutine MPI_File_get_view_f08(fh,disp,etype,filetype,datarep,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, 
MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(OUT) :: disp - TYPE(MPI_Datatype), INTENT(OUT) :: etype - TYPE(MPI_Datatype), INTENT(OUT) :: filetype - CHARACTER(LEN=*), INTENT(OUT) :: datarep - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_get_view_f08 -end interface MPI_File_get_view - -interface MPI_File_iread -subroutine MPI_File_iread_f08(fh,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iread_f08 -end interface MPI_File_iread - -interface MPI_File_iread_at -subroutine MPI_File_iread_at_f08(fh,offset,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iread_at_f08 -end interface MPI_File_iread_at - -interface MPI_File_iread_all -subroutine MPI_File_iread_all_f08(fh,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES 
NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iread_all_f08 -end interface MPI_File_iread_all - -interface MPI_File_iread_at_all -subroutine MPI_File_iread_at_all_f08(fh,offset,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iread_at_all_f08 -end interface MPI_File_iread_at_all - -interface MPI_File_iread_shared -subroutine MPI_File_iread_shared_f08(fh,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iread_shared_f08 -end interface MPI_File_iread_shared - -interface MPI_File_iwrite -subroutine MPI_File_iwrite_f08(fh,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES 
NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iwrite_f08 -end interface MPI_File_iwrite - -interface MPI_File_iwrite_at -subroutine MPI_File_iwrite_at_f08(fh,offset,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iwrite_at_f08 -end interface MPI_File_iwrite_at - -interface MPI_File_iwrite_all -subroutine MPI_File_iwrite_all_f08(fh,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iwrite_all_f08 -end interface MPI_File_iwrite_all - -interface MPI_File_iwrite_at_all -subroutine MPI_File_iwrite_at_all_f08(fh,offset,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, 
MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iwrite_at_all_f08 -end interface MPI_File_iwrite_at_all - -interface MPI_File_iwrite_shared -subroutine MPI_File_iwrite_shared_f08(fh,buf,count,datatype,request,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_iwrite_shared_f08 -end interface MPI_File_iwrite_shared - -interface MPI_File_open -subroutine MPI_File_open_f08(comm,filename,amode,info,fh,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info, MPI_File - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - CHARACTER(LEN=*), INTENT(IN) :: filename - INTEGER, INTENT(IN) :: amode - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_File), INTENT(OUT) :: fh - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_open_f08 -end interface MPI_File_open - -interface MPI_File_preallocate -subroutine MPI_File_preallocate_f08(fh,size,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_File_preallocate_f08 -end interface MPI_File_preallocate - -interface MPI_File_read -subroutine MPI_File_read_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_f08 -end interface MPI_File_read - -interface MPI_File_read_all -subroutine MPI_File_read_all_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_all_f08 -end interface MPI_File_read_all - -interface MPI_File_read_all_begin -subroutine MPI_File_read_all_begin_f08(fh,buf,count,datatype,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_all_begin_f08 -end interface MPI_File_read_all_begin - -interface MPI_File_read_all_end -subroutine MPI_File_read_all_end_f08(fh,buf,status,ierror) - use 
:: mpi_f08_types, only : MPI_File, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_all_end_f08 -end interface MPI_File_read_all_end - -interface MPI_File_read_at -subroutine MPI_File_read_at_f08(fh,offset,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_at_f08 -end interface MPI_File_read_at - -interface MPI_File_read_at_all -subroutine MPI_File_read_at_all_f08(fh,offset,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_at_all_f08 -end interface MPI_File_read_at_all - -interface MPI_File_read_at_all_begin -subroutine MPI_File_read_at_all_begin_f08(fh,offset,buf,count,datatype,ierror) - use :: mpi_f08_types, only : 
MPI_File, MPI_Datatype, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_at_all_begin_f08 -end interface MPI_File_read_at_all_begin - -interface MPI_File_read_at_all_end -subroutine MPI_File_read_at_all_end_f08(fh,buf,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_at_all_end_f08 -end interface MPI_File_read_at_all_end - -interface MPI_File_read_ordered -subroutine MPI_File_read_ordered_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_ordered_f08 -end interface MPI_File_read_ordered - -interface MPI_File_read_ordered_begin -subroutine MPI_File_read_ordered_begin_f08(fh,buf,count,datatype,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ 
ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_ordered_begin_f08 -end interface MPI_File_read_ordered_begin - -interface MPI_File_read_ordered_end -subroutine MPI_File_read_ordered_end_f08(fh,buf,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_ordered_end_f08 -end interface MPI_File_read_ordered_end - -interface MPI_File_read_shared -subroutine MPI_File_read_shared_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_read_shared_f08 -end interface MPI_File_read_shared - -interface MPI_File_seek -subroutine MPI_File_seek_f08(fh,offset,whence,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - INTEGER, INTENT(IN) :: whence - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_seek_f08 -end interface MPI_File_seek - -interface MPI_File_seek_shared -subroutine 
MPI_File_seek_shared_f08(fh,offset,whence,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - INTEGER, INTENT(IN) :: whence - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_seek_shared_f08 -end interface MPI_File_seek_shared - -interface MPI_File_set_atomicity -subroutine MPI_File_set_atomicity_f08(fh,flag,ierror) - use :: mpi_f08_types, only : MPI_File - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - LOGICAL, INTENT(IN) :: flag - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_set_atomicity_f08 -end interface MPI_File_set_atomicity - -interface MPI_File_set_info -subroutine MPI_File_set_info_f08(fh,info,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Info - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_set_info_f08 -end interface MPI_File_set_info - -interface MPI_File_set_size -subroutine MPI_File_set_size_f08(fh,size,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: size - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_set_size_f08 -end interface MPI_File_set_size - -interface MPI_File_set_view -subroutine MPI_File_set_view_f08(fh,disp,etype,filetype,datarep,info,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Info, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: disp - TYPE(MPI_Datatype), INTENT(IN) :: etype - TYPE(MPI_Datatype), INTENT(IN) :: filetype - CHARACTER(LEN=*), INTENT(IN) :: datarep - TYPE(MPI_Info), INTENT(IN) :: info - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_set_view_f08 -end interface MPI_File_set_view - -interface MPI_File_sync -subroutine 
MPI_File_sync_f08(fh,ierror) - use :: mpi_f08_types, only : MPI_File - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_sync_f08 -end interface MPI_File_sync - -interface MPI_File_write -subroutine MPI_File_write_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_f08 -end interface MPI_File_write - -interface MPI_File_write_all -subroutine MPI_File_write_all_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_all_f08 -end interface MPI_File_write_all - -interface MPI_File_write_all_begin -subroutine MPI_File_write_all_begin_f08(fh,buf,count,datatype,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, 
OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_all_begin_f08 -end interface MPI_File_write_all_begin - -interface MPI_File_write_all_end -subroutine MPI_File_write_all_end_f08(fh,buf,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_all_end_f08 -end interface MPI_File_write_all_end - -interface MPI_File_write_at -subroutine MPI_File_write_at_f08(fh,offset,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_at_f08 -end interface MPI_File_write_at - -interface MPI_File_write_at_all -subroutine MPI_File_write_at_all_f08(fh,offset,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, 
INTENT(OUT) :: ierror -end subroutine MPI_File_write_at_all_f08 -end interface MPI_File_write_at_all - -interface MPI_File_write_at_all_begin -subroutine MPI_File_write_at_all_begin_f08(fh,offset,buf,count,datatype,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_OFFSET_KIND - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_at_all_begin_f08 -end interface MPI_File_write_at_all_begin - -interface MPI_File_write_at_all_end -subroutine MPI_File_write_at_all_end_f08(fh,buf,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_at_all_end_f08 -end interface MPI_File_write_at_all_end - -interface MPI_File_write_ordered -subroutine MPI_File_write_ordered_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_ordered_f08 -end 
interface MPI_File_write_ordered - -interface MPI_File_write_ordered_begin -subroutine MPI_File_write_ordered_begin_f08(fh,buf,count,datatype,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_ordered_begin_f08 -end interface MPI_File_write_ordered_begin - -interface MPI_File_write_ordered_end -subroutine MPI_File_write_ordered_end_f08(fh,buf,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_ordered_end_f08 -end interface MPI_File_write_ordered_end - -interface MPI_File_write_shared -subroutine MPI_File_write_shared_f08(fh,buf,count,datatype,status,ierror) - use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Status - implicit none - TYPE(MPI_File), INTENT(IN) :: fh - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_File_write_shared_f08 -end interface MPI_File_write_shared - -! 
endif for OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - -interface MPI_Register_datarep -subroutine MPI_Register_datarep_f08(datarep,read_conversion_fn,write_conversion_fn, & - dtype_file_extent_fn,extra_state,ierror) - use :: mpi_f08_types, only : MPI_ADDRESS_KIND - use :: mpi_f08_interfaces_callbacks, only : MPI_Datarep_conversion_function - use :: mpi_f08_interfaces_callbacks, only : MPI_Datarep_extent_function - implicit none - CHARACTER(LEN=*), INTENT(IN) :: datarep - PROCEDURE(MPI_Datarep_conversion_function) :: read_conversion_fn - PROCEDURE(MPI_Datarep_conversion_function) :: write_conversion_fn - PROCEDURE(MPI_Datarep_extent_function) :: dtype_file_extent_fn - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: extra_state - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Register_datarep_f08 -end interface MPI_Register_datarep - -! -! MPI_Sizeof is generic for numeric types. This ignore TKR interface -! is replaced by the specific generics. Implemented in mpi_sizeof_mod.F90. -! -!subroutine MPI_Sizeof(x,size,ierror) -! use :: mpi_f08_types -! implicit none -! !DEC$ ATTRIBUTES NO_ARG_CHECK :: x -! !GCC$ ATTRIBUTES NO_ARG_CHECK :: x -! !$PRAGMA IGNORE_TKR x -! !DIR$ IGNORE_TKR x -! !IBM* IGNORE_TKR x -! OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: x -! INTEGER, INTENT(OUT) :: size -! 
INTEGER, OPTIONAL, INTENT(OUT) :: ierror -!end subroutine MPI_Sizeof - -interface MPI_Type_create_f90_complex -subroutine MPI_Type_create_f90_complex_f08(p,r,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: p, r - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_f90_complex_f08 -end interface MPI_Type_create_f90_complex - -interface MPI_Type_create_f90_integer -subroutine MPI_Type_create_f90_integer_f08(r,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: r - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_f90_integer_f08 -end interface MPI_Type_create_f90_integer - -interface MPI_Type_create_f90_real -subroutine MPI_Type_create_f90_real_f08(p,r,newtype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: p, r - TYPE(MPI_Datatype), INTENT(OUT) :: newtype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_create_f90_real_f08 -end interface MPI_Type_create_f90_real - -interface MPI_Type_match_size -subroutine MPI_Type_match_size_f08(typeclass,size,datatype,ierror) - use :: mpi_f08_types, only : MPI_Datatype - implicit none - INTEGER, INTENT(IN) :: typeclass, size - TYPE(MPI_Datatype), INTENT(OUT) :: datatype - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Type_match_size_f08 -end interface MPI_Type_match_size - -interface MPI_Pcontrol -subroutine MPI_Pcontrol_f08(level) - implicit none - INTEGER, INTENT(IN) :: level -end subroutine MPI_Pcontrol_f08 -end interface MPI_Pcontrol - - -!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -! New routines to MPI-3 -! 
- -interface MPI_Comm_split_type -subroutine MPI_Comm_split_type_f08(comm,split_type,key,info,newcomm,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Info - implicit none - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, INTENT(IN) :: split_type - INTEGER, INTENT(IN) :: key - TYPE(MPI_Info), INTENT(IN) :: info - TYPE(MPI_Comm), INTENT(OUT) :: newcomm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Comm_split_type_f08 -end interface MPI_Comm_split_type - -interface MPI_F_sync_reg -subroutine MPI_F_sync_reg_f08(buf) - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf -end subroutine MPI_F_sync_reg_f08 -end interface MPI_F_sync_reg - -interface MPI_Get_library_version -subroutine MPI_Get_library_version_f08(version,resultlen,ierror) - use :: mpi_f08_types, only : MPI_MAX_LIBRARY_VERSION_STRING - implicit none - character(len=MPI_MAX_LIBRARY_VERSION_STRING), intent(out) :: version - integer, intent(out) :: resultlen - integer, optional, intent(out) :: ierror -end subroutine MPI_Get_library_version_f08 -end interface MPI_Get_library_version - -interface MPI_Mprobe -subroutine MPI_Mprobe_f08(source,tag,comm,message,status,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Message, MPI_Status - implicit none - INTEGER, INTENT(IN) :: source, tag - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Message), INTENT(OUT) :: message - TYPE(MPI_Status), INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Mprobe_f08 -end interface MPI_Mprobe - -interface MPI_Improbe -subroutine MPI_Improbe_f08(source,tag,comm,flag,message,status,ierror) - use :: mpi_f08_types, only : MPI_Comm, MPI_Message, MPI_Status - implicit none - INTEGER, INTENT(IN) :: source, tag - TYPE(MPI_Comm), INTENT(IN) :: comm - LOGICAL, INTENT(OUT) :: flag - TYPE(MPI_Message), INTENT(OUT) :: message - TYPE(MPI_Status), 
INTENT(OUT) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Improbe_f08 -end interface MPI_Improbe - -interface MPI_Imrecv -subroutine MPI_Imrecv_f08(buf,count,datatype,message,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Message, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Message), INTENT(INOUT) :: message - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Imrecv_f08 -end interface MPI_Imrecv - -interface MPI_Mrecv -subroutine MPI_Mrecv_f08(buf,count,datatype,message,status,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Message, MPI_Status - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf - !$PRAGMA IGNORE_TKR buf - !DIR$ IGNORE_TKR buf - !IBM* IGNORE_TKR buf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: buf - INTEGER, INTENT(IN) :: count - TYPE(MPI_Datatype), INTENT(IN) :: datatype - TYPE(MPI_Message), INTENT(INOUT) :: message - TYPE(MPI_Status) :: status - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Mrecv_f08 -end interface MPI_Mrecv - -interface MPI_Neighbor_allgather -subroutine MPI_Neighbor_allgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype 
- TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Neighbor_allgather_f08 -end interface MPI_Neighbor_allgather - -interface MPI_Ineighbor_allgather -subroutine MPI_Ineighbor_allgather_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ineighbor_allgather_f08 -end interface MPI_Ineighbor_allgather - -interface MPI_Neighbor_allgatherv -subroutine MPI_Neighbor_allgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & - recvtype,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount - INTEGER, INTENT(IN) :: recvcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Neighbor_allgatherv_f08 -end interface MPI_Neighbor_allgatherv - -interface MPI_Ineighbor_allgatherv -subroutine MPI_Ineighbor_allgatherv_f08(sendbuf,sendcount,sendtype,recvbuf,recvcounts,displs, & - 
recvtype,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount - INTEGER, INTENT(IN) :: recvcounts(*), displs(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ineighbor_allgatherv_f08 -end interface MPI_Ineighbor_allgatherv - -interface MPI_Neighbor_alltoall -subroutine MPI_Neighbor_alltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Neighbor_alltoall_f08 -end interface MPI_Neighbor_alltoall - -interface MPI_Ineighbor_alltoall -subroutine MPI_Ineighbor_alltoall_f08(sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype, & - comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR 
sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcount, recvcount - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(OUT) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ineighbor_alltoall_f08 -end interface MPI_Ineighbor_alltoall - -interface MPI_Neighbor_alltoallv -subroutine MPI_Neighbor_alltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & - rdispls,recvtype,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Neighbor_alltoallv_f08 -end interface MPI_Neighbor_alltoallv - -interface MPI_Ineighbor_alltoallv -subroutine MPI_Ineighbor_alltoallv_f08(sendbuf,sendcounts,sdispls,sendtype,recvbuf,recvcounts, & - rdispls,recvtype,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype - TYPE(MPI_Comm), INTENT(IN) :: comm - 
TYPE(MPI_Request), INTENT(IN) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Ineighbor_alltoallv_f08 -end interface MPI_Ineighbor_alltoallv - -interface MPI_Neighbor_alltoallw -subroutine MPI_Neighbor_alltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & - rdispls,recvtypes,comm,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), recvcounts(*) - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: sdispls(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) - TYPE(MPI_Comm), INTENT(IN) :: comm - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine MPI_Neighbor_alltoallw_f08 -end interface MPI_Neighbor_alltoallw - -interface MPI_Ineighbor_alltoallw -subroutine MPI_Ineighbor_alltoallw_f08(sendbuf,sendcounts,sdispls,sendtypes,recvbuf,recvcounts, & - rdispls,recvtypes,comm,request,ierror) - use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm, MPI_Request, MPI_ADDRESS_KIND - implicit none - !DEC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !GCC$ ATTRIBUTES NO_ARG_CHECK :: sendbuf, recvbuf - !$PRAGMA IGNORE_TKR sendbuf, recvbuf - !DIR$ IGNORE_TKR sendbuf, recvbuf - !IBM* IGNORE_TKR sendbuf, recvbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: sendbuf - OMPI_FORTRAN_IGNORE_TKR_TYPE :: recvbuf - INTEGER, INTENT(IN) :: sendcounts(*), recvcounts(*) - INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: sdispls(*), rdispls(*) - TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*) - TYPE(MPI_Comm), INTENT(IN) :: comm - TYPE(MPI_Request), INTENT(IN) :: request - INTEGER, OPTIONAL, INTENT(OUT) :: ierror -end subroutine 
MPI_Ineighbor_alltoallw_f08 -end interface MPI_Ineighbor_alltoallw - -end module mpi_f08_interfaces diff --git a/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90 b/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90 index 43b6cb09109..2cd04596e09 100644 --- a/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/mpi-f08.F90 @@ -13,7 +13,7 @@ ! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. -! Copyright (c) 2016 Research Organization for Information Science +! Copyright (c) 2016-2017 Research Organization for Information Science ! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! @@ -30,6 +30,7 @@ module mpi_f08 use mpi_f08_interfaces ! this module contains the mpi_f08 interface declarations use pmpi_f08_interfaces ! this module contains the pmpi_f08 interface declarations use mpi_f08_callbacks ! this module contains the mpi_f08 attribute callback subroutines + use mpi_f08_interfaces_callbacks ! this module contains the mpi_f08 callback interfaces ! ! Declaration of the interfaces to the ompi impl files diff --git a/ompi/mpi/fortran/use-mpi-f08/pmpi-f-interfaces-bind.h b/ompi/mpi/fortran/use-mpi-f08/pmpi-f-interfaces-bind.h index 695ff644046..5a95b883058 100644 --- a/ompi/mpi/fortran/use-mpi-f08/pmpi-f-interfaces-bind.h +++ b/ompi/mpi/fortran/use-mpi-f08/pmpi-f-interfaces-bind.h @@ -7,8 +7,8 @@ ! of Tennessee Research Foundation. All rights ! reserved. ! Copyright (c) 2012 Inria. All rights reserved. -! Copyright (c) 2015 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2015-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! 
This file provides the interface specifications for the MPI Fortran @@ -560,10 +560,10 @@ subroutine pompi_type_create_subarray_f(ndims,array_of_sizes, & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_create_subarray_f -subroutine pompi_type_dup_f(type,newtype,ierror) & +subroutine pompi_type_dup_f(oldtype,newtype,ierror) & BIND(C, name="pompi_type_dup_f") implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: oldtype INTEGER, INTENT(OUT) :: newtype INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_dup_f @@ -1370,10 +1370,10 @@ subroutine pompi_type_create_keyval_f(type_copy_attr_fn,type_delete_attr_fn, & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_create_keyval_f -subroutine pompi_type_delete_attr_f(type,type_keyval,ierror) & +subroutine pompi_type_delete_attr_f(datatype,type_keyval,ierror) & BIND(C, name="pompi_type_delete_attr_f") implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_delete_attr_f @@ -1385,32 +1385,32 @@ subroutine pompi_type_free_keyval_f(type_keyval,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_free_keyval_f -subroutine pompi_type_get_name_f(type,type_name,resultlen,ierror,type_name_len) & +subroutine pompi_type_get_name_f(datatype,type_name,resultlen,ierror,type_name_len) & BIND(C, name="pompi_type_get_name_f") use, intrinsic :: ISO_C_BINDING, only : C_CHAR implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype CHARACTER(KIND=C_CHAR), DIMENSION(*), INTENT(OUT) :: type_name INTEGER, INTENT(OUT) :: resultlen INTEGER, INTENT(OUT) :: ierror INTEGER, VALUE, INTENT(IN) :: type_name_len end subroutine pompi_type_get_name_f -subroutine pompi_type_set_attr_f(type,type_keyval,attribute_val,ierror) & +subroutine pompi_type_set_attr_f(datatype,type_keyval,attribute_val,ierror) & BIND(C, name="pompi_type_set_attr_f") use :: mpi_f08_types, only : 
MPI_ADDRESS_KIND implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_set_attr_f -subroutine pompi_type_set_name_f(type,type_name,ierror,type_name_len) & +subroutine pompi_type_set_name_f(datatype,type_name,ierror,type_name_len) & BIND(C, name="pompi_type_set_name_f") use, intrinsic :: ISO_C_BINDING, only : C_CHAR implicit none - INTEGER, INTENT(IN) :: type + INTEGER, INTENT(IN) :: datatype CHARACTER(KIND=C_CHAR), DIMENSION(*), INTENT(IN) :: type_name INTEGER, INTENT(OUT) :: ierror INTEGER, VALUE, INTENT(IN) :: type_name_len @@ -1745,8 +1745,6 @@ subroutine pompi_error_string_f(errorcode,string,resultlen,ierror,str_len) & INTEGER, VALUE, INTENT(IN) :: str_len end subroutine pompi_error_string_f -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - subroutine pompi_file_call_errhandler_f(fh,errorcode,ierror) & BIND(C, name="pompi_file_call_errhandler_f") implicit none @@ -1780,9 +1778,6 @@ subroutine pompi_file_set_errhandler_f(file,errhandler,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_file_set_errhandler_f -! OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - subroutine pompi_finalize_f(ierror) & BIND(C, name="pompi_finalize_f") implicit none @@ -2408,8 +2403,6 @@ subroutine pompi_status_set_elements_f(status,datatype,count,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_status_set_elements_f -#if OMPI_PROVIDE_MPI_FILE_INTERFACE - subroutine pompi_file_close_f(fh,ierror) & BIND(C, name="pompi_file_close_f") implicit none @@ -2983,9 +2976,6 @@ subroutine pompi_file_write_shared_f(fh,buf,count,datatype,status,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_file_write_shared_f -! 
OMPI_PROVIDE_MPI_FILE_INTERFACE -#endif - subroutine pompi_register_datarep_f(datarep,read_conversion_fn, & write_conversion_fn,dtype_file_extent_fn, & extra_state,ierror,datarep_len) & @@ -3039,11 +3029,11 @@ subroutine pompi_type_create_f90_real_f(p,r,newtype,ierror) & INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_create_f90_real_f -subroutine pompi_type_match_size_f(typeclass,size,type,ierror) & +subroutine pompi_type_match_size_f(typeclass,size,datatype,ierror) & BIND(C, name="pompi_type_match_size_f") implicit none INTEGER, INTENT(IN) :: typeclass, size - INTEGER, INTENT(OUT) :: type + INTEGER, INTENT(OUT) :: datatype INTEGER, INTENT(OUT) :: ierror end subroutine pompi_type_match_size_f diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iread_all_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iread_all_f08.F90 new file mode 100644 index 00000000000..3d935f98ae4 --- /dev/null +++ b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iread_all_f08.F90 @@ -0,0 +1,26 @@ +! -*- f90 -*- +! +! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2012 Los Alamos National Security, LLC. +! All Rights reserved. +! Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +! 
$COPYRIGHT$ + +#include "ompi/mpi/fortran/configure-fortran-output.h" + +subroutine PMPI_File_iread_all_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + use :: mpi_f08, only : ompi_file_iread_all_f + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror + integer :: c_ierror + + call ompi_file_iread_all_f(fh%MPI_VAL,buf,count,datatype%MPI_VAL,request%MPI_VAL,c_ierror) + if (present(ierror)) ierror = c_ierror + +end subroutine PMPI_File_iread_all_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iread_at_all_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iread_at_all_f08.F90 new file mode 100644 index 00000000000..1a627fd7399 --- /dev/null +++ b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iread_at_all_f08.F90 @@ -0,0 +1,28 @@ +! -*- f90 -*- +! +! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2012 Los Alamos National Security, LLC. +! All Rights reserved. +! Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +! 
$COPYRIGHT$ + +#include "ompi/mpi/fortran/configure-fortran-output.h" + +subroutine PMPI_File_iread_at_all_f08(fh,offset,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND + use :: mpi_f08, only : ompi_file_iread_at_all_f + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror + integer :: c_ierror + + call ompi_file_iread_at_all_f(fh%MPI_VAL,offset,buf,count,& + datatype%MPI_VAL,request%MPI_VAL,c_ierror) + if (present(ierror)) ierror = c_ierror + +end subroutine PMPI_File_iread_at_all_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iwrite_all_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iwrite_all_f08.F90 new file mode 100644 index 00000000000..f176b17d9e2 --- /dev/null +++ b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iwrite_all_f08.F90 @@ -0,0 +1,27 @@ +! -*- f90 -*- +! +! Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2012 Los Alamos National Security, LLC. +! All Rights reserved. +! Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +! 
$COPYRIGHT$ + +#include "ompi/mpi/fortran/configure-fortran-output.h" + +subroutine PMPI_File_iwrite_all_f08(fh,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request + use :: mpi_f08, only : ompi_file_iwrite_all_f + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror + integer :: c_ierror + + call ompi_file_iwrite_all_f(fh%MPI_VAL,buf,count,& + datatype%MPI_VAL,request%MPI_VAL,c_ierror) + if (present(ierror)) ierror = c_ierror + +end subroutine PMPI_File_iwrite_all_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iwrite_at_all_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iwrite_at_all_f08.F90 new file mode 100644 index 00000000000..ff5116f5d85 --- /dev/null +++ b/ompi/mpi/fortran/use-mpi-f08/profile/pfile_iwrite_at_all_f08.F90 @@ -0,0 +1,28 @@ +! -*- f90 -*- +! +! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2012 Los Alamos National Security, LLC. +! All Rights reserved. +! Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +! 
$COPYRIGHT$ + +#include "ompi/mpi/fortran/configure-fortran-output.h" + +subroutine PMPI_File_iwrite_at_all_f08(fh,offset,buf,count,datatype,request,ierror) + use :: mpi_f08_types, only : MPI_File, MPI_Datatype, MPI_Request, MPI_OFFSET_KIND + use :: mpi_f08, only : ompi_file_iwrite_at_all_f + implicit none + TYPE(MPI_File), INTENT(IN) :: fh + INTEGER(MPI_OFFSET_KIND), INTENT(IN) :: offset + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: buf + INTEGER, INTENT(IN) :: count + TYPE(MPI_Datatype), INTENT(IN) :: datatype + TYPE(MPI_Request), INTENT(OUT) :: request + INTEGER, OPTIONAL, INTENT(OUT) :: ierror + integer :: c_ierror + + call ompi_file_iwrite_at_all_f(fh%MPI_VAL,offset,buf,count,& + datatype%MPI_VAL,request%MPI_VAL,c_ierror) + if (present(ierror)) ierror = c_ierror + +end subroutine PMPI_File_iwrite_at_all_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/pstatus_set_cancelled_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/pstatus_set_cancelled_f08.F90 index 8e05c6bed89..620e85a7c94 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/pstatus_set_cancelled_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/pstatus_set_cancelled_f08.F90 @@ -16,12 +16,12 @@ subroutine PMPI_Status_set_cancelled_f08(status,flag,ierror) ! See note in mpi-f-interfaces-bind.h for why we include an ! interface here and call a PMPI_* subroutine below. 
interface - subroutine MPI_Status_set_cancelled(status, flag, ierror) + subroutine PMPI_Status_set_cancelled(status, flag, ierror) use :: mpi_f08_types, only : MPI_Status type(MPI_Status), intent(inout) :: status logical, intent(in) :: flag integer, intent(out) :: ierror - end subroutine MPI_Status_set_cancelled + end subroutine PMPI_Status_set_cancelled end interface call PMPI_Status_set_cancelled(status,flag,c_ierror) diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_delete_attr_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_delete_attr_f08.F90 index b0de0c9b1a8..7a862f52650 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_delete_attr_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_delete_attr_f08.F90 @@ -3,18 +3,20 @@ ! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ -subroutine PMPI_Type_delete_attr_f08(type,type_keyval,ierror) +subroutine PMPI_Type_delete_attr_f08(datatype,type_keyval,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_delete_attr_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_delete_attr_f(type%MPI_VAL,type_keyval,c_ierror) + call ompi_type_delete_attr_f(datatype%MPI_VAL,type_keyval,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_delete_attr_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_dup_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_dup_f08.F90 index 440b4dfa740..1afa8e3d0cb 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_dup_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_dup_f08.F90 @@ -3,18 +3,20 @@ ! 
Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ -subroutine PMPI_Type_dup_f08(type,newtype,ierror) +subroutine PMPI_Type_dup_f08(oldtype,newtype,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_dup_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: oldtype TYPE(MPI_Datatype), INTENT(OUT) :: newtype INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_dup_f(type%MPI_VAL,newtype%MPI_VAL,c_ierror) + call ompi_type_dup_f(oldtype%MPI_VAL,newtype%MPI_VAL,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_dup_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_attr_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_attr_f08.F90 index eb6c7dffede..4ad25f8e667 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_attr_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_attr_f08.F90 @@ -3,21 +3,23 @@ ! Copyright (c) 2009-2013 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ -subroutine PMPI_Type_get_attr_f08(type,type_keyval,attribute_val,flag,ierror) +subroutine PMPI_Type_get_attr_f08(datatype,type_keyval,attribute_val,flag,ierror) use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND ! See note in mpi-f-interfaces-bind.h for why we "use mpi" here and ! call a PMPI_* subroutine below. 
use :: mpi, only : PMPI_Type_get_attr implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val LOGICAL, INTENT(OUT) :: flag INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call PMPI_Type_get_attr(type%MPI_VAL,type_keyval,attribute_val,flag,c_ierror) + call PMPI_Type_get_attr(datatype%MPI_VAL,type_keyval,attribute_val,flag,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_get_attr_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_name_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_name_f08.F90 index dd3a87a2a6e..8947f690ab5 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_name_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_get_name_f08.F90 @@ -3,19 +3,21 @@ ! Copyright (c) 2010-2011 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! 
$COPYRIGHT$ -subroutine PMPI_Type_get_name_f08(type,type_name,resultlen,ierror) +subroutine PMPI_Type_get_name_f08(datatype,type_name,resultlen,ierror) use :: mpi_f08_types, only : MPI_Datatype, MPI_MAX_OBJECT_NAME use :: mpi_f08, only : ompi_type_get_name_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype CHARACTER(LEN=*), INTENT(OUT) :: type_name INTEGER, INTENT(OUT) :: resultlen INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_get_name_f(type%MPI_VAL,type_name,resultlen,c_ierror,len(type_name)) + call ompi_type_get_name_f(datatype%MPI_VAL,type_name,resultlen,c_ierror,len(type_name)) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_get_name_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_match_size_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_match_size_f08.F90 index 5843afab052..1b4219760e9 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_match_size_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_match_size_f08.F90 @@ -3,18 +3,20 @@ ! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All Rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! 
$COPYRIGHT$ -subroutine PMPI_Type_match_size_f08(typeclass,size,type,ierror) +subroutine PMPI_Type_match_size_f08(typeclass,size,datatype,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_match_size_f implicit none INTEGER, INTENT(IN) :: typeclass, size - TYPE(MPI_Datatype), INTENT(OUT) :: type + TYPE(MPI_Datatype), INTENT(OUT) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_match_size_f(typeclass,size,type%MPI_VAL,c_ierror) + call ompi_type_match_size_f(typeclass,size,datatype%MPI_VAL,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_match_size_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_attr_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_attr_f08.F90 index e00226cc6a3..92db37557aa 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_attr_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_attr_f08.F90 @@ -3,19 +3,21 @@ ! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! 
$COPYRIGHT$ -subroutine PMPI_Type_set_attr_f08(type,type_keyval,attribute_val,ierror) +subroutine PMPI_Type_set_attr_f08(datatype,type_keyval,attribute_val,ierror) use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_type_set_attr_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_set_attr_f(type%MPI_VAL,type_keyval,attribute_val,c_ierror) + call ompi_type_set_attr_f(datatype%MPI_VAL,type_keyval,attribute_val,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_set_attr_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_name_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_name_f08.F90 index 9306462f7ff..a6ae8a17ce7 100644 --- a/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_name_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/profile/ptype_set_name_f08.F90 @@ -3,18 +3,20 @@ ! Copyright (c) 2010-2011 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. +! Copyright (c) 2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! 
$COPYRIGHT$ -subroutine PMPI_Type_set_name_f08(type,type_name,ierror) +subroutine PMPI_Type_set_name_f08(datatype,type_name,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_set_name_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype CHARACTER(LEN=*), INTENT(IN) :: type_name INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_set_name_f(type%MPI_VAL,type_name,c_ierror,len(type_name)) + call ompi_type_set_name_f(datatype%MPI_VAL,type_name,c_ierror,len(type_name)) if (present(ierror)) ierror = c_ierror end subroutine PMPI_Type_set_name_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/put_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/put_f08.F90 index b139867cf66..aa9a4fb88bf 100644 --- a/ompi/mpi/fortran/use-mpi-f08/put_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/put_f08.F90 @@ -12,7 +12,7 @@ subroutine MPI_Put_f08(origin_addr,origin_count,origin_datatype,target_rank,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_put_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/raccumulate_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/raccumulate_f08.F90 index 5749437681d..3c51b689b3d 100644 --- a/ompi/mpi/fortran/use-mpi-f08/raccumulate_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/raccumulate_f08.F90 @@ -13,7 +13,7 @@ subroutine MPI_Raccumulate_f08(origin_addr,origin_count,origin_datatype,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_Request, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_raccumulate_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, 
INTENT(IN),ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/rget_accumulate_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/rget_accumulate_f08.F90 index 5aeba68b045..a8ba2c95536 100644 --- a/ompi/mpi/fortran/use-mpi-f08/rget_accumulate_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/rget_accumulate_f08.F90 @@ -14,10 +14,10 @@ subroutine MPI_Rget_accumulate_f08(origin_addr,origin_count,origin_datatype,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Op, MPI_Win, MPI_Request, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_rget_accumulate_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, result_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype - OMPI_FORTRAN_IGNORE_TKR_TYPE :: result_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, ASYNCHRONOUS :: result_addr TYPE(MPI_Datatype), INTENT(IN) :: result_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp TYPE(MPI_Datatype), INTENT(IN) :: target_datatype diff --git a/ompi/mpi/fortran/use-mpi-f08/rget_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/rget_f08.F90 index 167fe8d2ef9..5d398fe436a 100644 --- a/ompi/mpi/fortran/use-mpi-f08/rget_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/rget_f08.F90 @@ -12,7 +12,7 @@ subroutine MPI_Rget_f08(origin_addr,origin_count,origin_datatype,target_rank,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_Request, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_rget_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff 
--git a/ompi/mpi/fortran/use-mpi-f08/rput_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/rput_f08.F90 index 6012f95eab0..f0007699afb 100644 --- a/ompi/mpi/fortran/use-mpi-f08/rput_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/rput_f08.F90 @@ -12,7 +12,7 @@ subroutine MPI_Rput_f08(origin_addr,origin_count,origin_datatype,target_rank,& use :: mpi_f08_types, only : MPI_Datatype, MPI_Win, MPI_Request, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_rput_f implicit none - OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN) :: origin_addr + OMPI_FORTRAN_IGNORE_TKR_TYPE, INTENT(IN), ASYNCHRONOUS :: origin_addr INTEGER, INTENT(IN) :: origin_count, target_rank, target_count TYPE(MPI_Datatype), INTENT(IN) :: origin_datatype INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: target_disp diff --git a/ompi/mpi/fortran/use-mpi-f08/type_delete_attr_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_delete_attr_f08.F90 index cdb5ddcf080..794f0e4b41e 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_delete_attr_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_delete_attr_f08.F90 @@ -1,20 +1,20 @@ ! -*- f90 -*- ! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. ! 
$COPYRIGHT$ -subroutine MPI_Type_delete_attr_f08(type,type_keyval,ierror) +subroutine MPI_Type_delete_attr_f08(datatype,type_keyval,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_delete_attr_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_delete_attr_f(type%MPI_VAL,type_keyval,c_ierror) + call ompi_type_delete_attr_f(datatype%MPI_VAL,type_keyval,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_delete_attr_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/type_dup_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_dup_f08.F90 index 5e76d89877d..589068bc7d7 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_dup_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_dup_f08.F90 @@ -1,20 +1,20 @@ ! -*- f90 -*- ! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. ! 
$COPYRIGHT$ -subroutine MPI_Type_dup_f08(type,newtype,ierror) +subroutine MPI_Type_dup_f08(datatype,newtype,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_dup_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype TYPE(MPI_Datatype), INTENT(OUT) :: newtype INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_dup_f(type%MPI_VAL,newtype%MPI_VAL,c_ierror) + call ompi_type_dup_f(datatype%MPI_VAL,newtype%MPI_VAL,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_dup_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/type_get_attr_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_get_attr_f08.F90 index 2f413880797..4ddb6a0a8a5 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_get_attr_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_get_attr_f08.F90 @@ -1,23 +1,23 @@ ! -*- f90 -*- ! -! Copyright (c) 2009-2013 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. ! $COPYRIGHT$ -subroutine MPI_Type_get_attr_f08(type,type_keyval,attribute_val,flag,ierror) +subroutine MPI_Type_get_attr_f08(datatype,type_keyval,attribute_val,flag,ierror) use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND ! See note in mpi-f-interfaces-bind.h for why we "use mpi" here and ! call a PMPI_* subroutine below. 
use :: mpi, only : PMPI_Type_get_attr implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER(MPI_ADDRESS_KIND), INTENT(OUT) :: attribute_val LOGICAL, INTENT(OUT) :: flag INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call PMPI_Type_get_attr(type%MPI_VAL,type_keyval,attribute_val,flag,c_ierror) + call PMPI_Type_get_attr(datatype%MPI_VAL,type_keyval,attribute_val,flag,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_get_attr_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/type_get_name_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_get_name_f08.F90 index 63b47d97eac..abf1af3530e 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_get_name_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_get_name_f08.F90 @@ -1,21 +1,21 @@ ! -*- f90 -*- ! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. ! 
$COPYRIGHT$ -subroutine MPI_Type_get_name_f08(type,type_name,resultlen,ierror) +subroutine MPI_Type_get_name_f08(datatype,type_name,resultlen,ierror) use :: mpi_f08_types, only : MPI_Datatype, MPI_MAX_OBJECT_NAME use :: mpi_f08, only : ompi_type_get_name_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype CHARACTER(LEN=*), INTENT(OUT) :: type_name INTEGER, INTENT(OUT) :: resultlen INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_get_name_f(type%MPI_VAL,type_name,resultlen,c_ierror,len(type_name)) + call ompi_type_get_name_f(datatype%MPI_VAL,type_name,resultlen,c_ierror,len(type_name)) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_get_name_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/type_match_size_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_match_size_f08.F90 index bcc95ec5777..a5839d563c5 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_match_size_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_match_size_f08.F90 @@ -1,20 +1,20 @@ ! -*- f90 -*- ! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All Rights reserved. ! 
$COPYRIGHT$ -subroutine MPI_Type_match_size_f08(typeclass,size,type,ierror) +subroutine MPI_Type_match_size_f08(typeclass,size,datatype,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_match_size_f implicit none INTEGER, INTENT(IN) :: typeclass, size - TYPE(MPI_Datatype), INTENT(OUT) :: type + TYPE(MPI_Datatype), INTENT(OUT) :: datatype INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_match_size_f(typeclass,size,type%MPI_VAL,c_ierror) + call ompi_type_match_size_f(typeclass,size,datatype%MPI_VAL,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_match_size_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/type_set_attr_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_set_attr_f08.F90 index 580f0002b30..3b52871460f 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_set_attr_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_set_attr_f08.F90 @@ -1,21 +1,21 @@ ! -*- f90 -*- ! -! Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. ! 
$COPYRIGHT$ -subroutine MPI_Type_set_attr_f08(type,type_keyval,attribute_val,ierror) +subroutine MPI_Type_set_attr_f08(datatype,type_keyval,attribute_val,ierror) use :: mpi_f08_types, only : MPI_Datatype, MPI_ADDRESS_KIND use :: mpi_f08, only : ompi_type_set_attr_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype INTEGER, INTENT(IN) :: type_keyval INTEGER(MPI_ADDRESS_KIND), INTENT(IN) :: attribute_val INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_set_attr_f(type%MPI_VAL,type_keyval,attribute_val,c_ierror) + call ompi_type_set_attr_f(datatype%MPI_VAL,type_keyval,attribute_val,c_ierror) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_set_attr_f08 diff --git a/ompi/mpi/fortran/use-mpi-f08/type_set_name_f08.F90 b/ompi/mpi/fortran/use-mpi-f08/type_set_name_f08.F90 index bd67f3deec7..1b0167aaa11 100644 --- a/ompi/mpi/fortran/use-mpi-f08/type_set_name_f08.F90 +++ b/ompi/mpi/fortran/use-mpi-f08/type_set_name_f08.F90 @@ -1,20 +1,20 @@ ! -*- f90 -*- ! -! Copyright (c) 2010-2011 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved ! Copyright (c) 2009-2012 Los Alamos National Security, LLC. ! All rights reserved. ! 
$COPYRIGHT$ -subroutine MPI_Type_set_name_f08(type,type_name,ierror) +subroutine MPI_Type_set_name_f08(datatype,type_name,ierror) use :: mpi_f08_types, only : MPI_Datatype use :: mpi_f08, only : ompi_type_set_name_f implicit none - TYPE(MPI_Datatype), INTENT(IN) :: type + TYPE(MPI_Datatype), INTENT(IN) :: datatype CHARACTER(LEN=*), INTENT(IN) :: type_name INTEGER, OPTIONAL, INTENT(OUT) :: ierror integer :: c_ierror - call ompi_type_set_name_f(type%MPI_VAL,type_name,c_ierror,len(type_name)) + call ompi_type_set_name_f(datatype%MPI_VAL,type_name,c_ierror,len(type_name)) if (present(ierror)) ierror = c_ierror end subroutine MPI_Type_set_name_f08 diff --git a/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-interfaces.h.in b/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-interfaces.h.in index 756ba19d062..c346551d417 100644 --- a/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-interfaces.h.in +++ b/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-interfaces.h.in @@ -1,6 +1,6 @@ ! -*- fortran -*- ! -! Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2006-2018 Cisco Systems, Inc. All rights reserved. ! Copyright (c) 2007 Los Alamos National Security, LLC. All rights ! reserved. ! Copyright (c) 2012 The University of Tennessee and The University @@ -9,8 +9,8 @@ ! Copyright (c) 2012 Inria. All rights reserved. ! Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights ! reserved. -! Copyright (c) 2015 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2015-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! 
Additional copyrights may follow @@ -6335,8 +6335,8 @@ end interface interface MPI_Type_commit -subroutine MPI_Type_commit(type, ierror) - integer, intent(inout) :: type +subroutine MPI_Type_commit(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine MPI_Type_commit @@ -6344,8 +6344,8 @@ end interface interface PMPI_Type_commit -subroutine PMPI_Type_commit(type, ierror) - integer, intent(inout) :: type +subroutine PMPI_Type_commit(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine PMPI_Type_commit @@ -6723,8 +6723,8 @@ end interface interface MPI_Type_delete_attr -subroutine MPI_Type_delete_attr(type, type_keyval, ierror) - integer, intent(in) :: type +subroutine MPI_Type_delete_attr(datatype, type_keyval, ierror) + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer, intent(out) :: ierror end subroutine MPI_Type_delete_attr @@ -6733,8 +6733,8 @@ end interface interface PMPI_Type_delete_attr -subroutine PMPI_Type_delete_attr(type, type_keyval, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_delete_attr(datatype, type_keyval, ierror) + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer, intent(out) :: ierror end subroutine PMPI_Type_delete_attr @@ -6744,8 +6744,8 @@ end interface interface MPI_Type_dup -subroutine MPI_Type_dup(type, newtype, ierror) - integer, intent(in) :: type +subroutine MPI_Type_dup(datatype, newtype, ierror) + integer, intent(in) :: datatype integer, intent(out) :: newtype integer, intent(out) :: ierror end subroutine MPI_Type_dup @@ -6754,8 +6754,8 @@ end interface interface PMPI_Type_dup -subroutine PMPI_Type_dup(type, newtype, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_dup(datatype, newtype, ierror) + integer, intent(in) :: datatype integer, intent(out) :: newtype integer, intent(out) :: ierror end subroutine PMPI_Type_dup @@ -6765,8 +6765,8 @@ end interface interface 
MPI_Type_extent -subroutine MPI_Type_extent(type, extent, ierror) - integer, intent(in) :: type +subroutine MPI_Type_extent(datatype, extent, ierror) + integer, intent(in) :: datatype integer, intent(out) :: extent integer, intent(out) :: ierror end subroutine MPI_Type_extent @@ -6775,8 +6775,8 @@ end interface interface PMPI_Type_extent -subroutine PMPI_Type_extent(type, extent, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_extent(datatype, extent, ierror) + integer, intent(in) :: datatype integer, intent(out) :: extent integer, intent(out) :: ierror end subroutine PMPI_Type_extent @@ -6786,8 +6786,8 @@ end interface interface MPI_Type_free -subroutine MPI_Type_free(type, ierror) - integer, intent(inout) :: type +subroutine MPI_Type_free(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine MPI_Type_free @@ -6795,8 +6795,8 @@ end interface interface PMPI_Type_free -subroutine PMPI_Type_free(type, ierror) - integer, intent(inout) :: type +subroutine PMPI_Type_free(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine PMPI_Type_free @@ -6824,9 +6824,9 @@ end interface interface MPI_Type_get_attr -subroutine MPI_Type_get_attr(type, type_keyval, attribute_val, flag, ierror) +subroutine MPI_Type_get_attr(datatype, type_keyval, attribute_val, flag, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(out) :: attribute_val logical, intent(out) :: flag @@ -6837,9 +6837,9 @@ end interface interface PMPI_Type_get_attr -subroutine PMPI_Type_get_attr(type, type_keyval, attribute_val, flag, ierror) +subroutine PMPI_Type_get_attr(datatype, type_keyval, attribute_val, flag, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(out) :: 
attribute_val logical, intent(out) :: flag @@ -6851,10 +6851,10 @@ end interface interface MPI_Type_get_contents -subroutine MPI_Type_get_contents(mtype, max_integers, max_addresses, max_datatypes, array_of_integers, & +subroutine MPI_Type_get_contents(datatype, max_integers, max_addresses, max_datatypes, array_of_integers, & array_of_addresses, array_of_datatypes, ierror) include 'mpif-config.h' - integer, intent(in) :: mtype + integer, intent(in) :: datatype integer, intent(in) :: max_integers integer, intent(in) :: max_addresses integer, intent(in) :: max_datatypes @@ -6868,10 +6868,10 @@ end interface interface PMPI_Type_get_contents -subroutine PMPI_Type_get_contents(mtype, max_integers, max_addresses, max_datatypes, array_of_integers, & +subroutine PMPI_Type_get_contents(datatype, max_integers, max_addresses, max_datatypes, array_of_integers, & array_of_addresses, array_of_datatypes, ierror) include 'mpif-config.h' - integer, intent(in) :: mtype + integer, intent(in) :: datatype integer, intent(in) :: max_integers integer, intent(in) :: max_addresses integer, intent(in) :: max_datatypes @@ -6886,9 +6886,9 @@ end interface interface MPI_Type_get_envelope -subroutine MPI_Type_get_envelope(type, num_integers, num_addresses, num_datatypes, combiner& +subroutine MPI_Type_get_envelope(datatype, num_integers, num_addresses, num_datatypes, combiner& , ierror) - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(out) :: num_integers integer, intent(out) :: num_addresses integer, intent(out) :: num_datatypes @@ -6900,9 +6900,9 @@ end interface interface PMPI_Type_get_envelope -subroutine PMPI_Type_get_envelope(type, num_integers, num_addresses, num_datatypes, combiner& +subroutine PMPI_Type_get_envelope(datatype, num_integers, num_addresses, num_datatypes, combiner& , ierror) - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(out) :: num_integers integer, intent(out) :: num_addresses integer, intent(out) :: 
num_datatypes @@ -6915,9 +6915,9 @@ end interface interface MPI_Type_get_extent -subroutine MPI_Type_get_extent(type, lb, extent, ierror) +subroutine MPI_Type_get_extent(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_ADDRESS_KIND), intent(out) :: lb integer(kind=MPI_ADDRESS_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -6927,9 +6927,9 @@ end interface interface PMPI_Type_get_extent -subroutine PMPI_Type_get_extent(type, lb, extent, ierror) +subroutine PMPI_Type_get_extent(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_ADDRESS_KIND), intent(out) :: lb integer(kind=MPI_ADDRESS_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -6940,9 +6940,9 @@ end interface interface MPI_Type_get_extent_x -subroutine MPI_Type_get_extent_x(type, lb, extent, ierror) +subroutine MPI_Type_get_extent_x(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: lb integer(kind=MPI_COUNT_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -6952,9 +6952,9 @@ end interface interface PMPI_Type_get_extent_x -subroutine PMPI_Type_get_extent_x(type, lb, extent, ierror) +subroutine PMPI_Type_get_extent_x(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: lb integer(kind=MPI_COUNT_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -6965,8 +6965,8 @@ end interface interface MPI_Type_get_name -subroutine MPI_Type_get_name(type, type_name, resultlen, ierror) - integer, intent(in) :: type +subroutine MPI_Type_get_name(datatype, type_name, resultlen, ierror) + integer, intent(in) :: datatype character(len=*), intent(out) :: type_name integer, intent(out) :: 
resultlen integer, intent(out) :: ierror @@ -6976,8 +6976,8 @@ end interface interface PMPI_Type_get_name -subroutine PMPI_Type_get_name(type, type_name, resultlen, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_get_name(datatype, type_name, resultlen, ierror) + integer, intent(in) :: datatype character(len=*), intent(out) :: type_name integer, intent(out) :: resultlen integer, intent(out) :: ierror @@ -7125,8 +7125,8 @@ end interface interface MPI_Type_lb -subroutine MPI_Type_lb(type, lb, ierror) - integer, intent(in) :: type +subroutine MPI_Type_lb(datatype, lb, ierror) + integer, intent(in) :: datatype integer, intent(out) :: lb integer, intent(out) :: ierror end subroutine MPI_Type_lb @@ -7135,8 +7135,8 @@ end interface interface PMPI_Type_lb -subroutine PMPI_Type_lb(type, lb, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_lb(datatype, lb, ierror) + integer, intent(in) :: datatype integer, intent(out) :: lb integer, intent(out) :: ierror end subroutine PMPI_Type_lb @@ -7146,10 +7146,10 @@ end interface interface MPI_Type_match_size -subroutine MPI_Type_match_size(typeclass, size, type, ierror) +subroutine MPI_Type_match_size(typeclass, size, datatype, ierror) integer, intent(in) :: typeclass integer, intent(in) :: size - integer, intent(out) :: type + integer, intent(out) :: datatype integer, intent(out) :: ierror end subroutine MPI_Type_match_size @@ -7157,10 +7157,10 @@ end interface interface PMPI_Type_match_size -subroutine PMPI_Type_match_size(typeclass, size, type, ierror) +subroutine PMPI_Type_match_size(typeclass, size, datatype, ierror) integer, intent(in) :: typeclass integer, intent(in) :: size - integer, intent(out) :: type + integer, intent(out) :: datatype integer, intent(out) :: ierror end subroutine PMPI_Type_match_size @@ -7169,9 +7169,9 @@ end interface interface MPI_Type_set_attr -subroutine MPI_Type_set_attr(type, type_keyval, attr_val, ierror) +subroutine MPI_Type_set_attr(datatype, type_keyval, attr_val, ierror) 
include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(in) :: attr_val integer, intent(out) :: ierror @@ -7181,9 +7181,9 @@ end interface interface PMPI_Type_set_attr -subroutine PMPI_Type_set_attr(type, type_keyval, attr_val, ierror) +subroutine PMPI_Type_set_attr(datatype, type_keyval, attr_val, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(in) :: attr_val integer, intent(out) :: ierror @@ -7194,8 +7194,8 @@ end interface interface MPI_Type_set_name -subroutine MPI_Type_set_name(type, type_name, ierror) - integer, intent(in) :: type +subroutine MPI_Type_set_name(datatype, type_name, ierror) + integer, intent(in) :: datatype character(len=*), intent(in) :: type_name integer, intent(out) :: ierror end subroutine MPI_Type_set_name @@ -7204,8 +7204,8 @@ end interface interface PMPI_Type_set_name -subroutine PMPI_Type_set_name(type, type_name, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_set_name(datatype, type_name, ierror) + integer, intent(in) :: datatype character(len=*), intent(in) :: type_name integer, intent(out) :: ierror end subroutine PMPI_Type_set_name @@ -7215,8 +7215,8 @@ end interface interface MPI_Type_size -subroutine MPI_Type_size(type, size, ierror) - integer, intent(in) :: type +subroutine MPI_Type_size(datatype, size, ierror) + integer, intent(in) :: datatype integer, intent(out) :: size integer, intent(out) :: ierror end subroutine MPI_Type_size @@ -7225,8 +7225,8 @@ end interface interface PMPI_Type_size -subroutine PMPI_Type_size(type, size, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_size(datatype, size, ierror) + integer, intent(in) :: datatype integer, intent(out) :: size integer, intent(out) :: ierror end subroutine PMPI_Type_size @@ -7236,9 +7236,9 @@ end interface interface 
MPI_Type_size_x -subroutine MPI_Type_size_x(type, size, ierror) +subroutine MPI_Type_size_x(datatype, size, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: size integer, intent(out) :: ierror end subroutine MPI_Type_size_x @@ -7247,9 +7247,9 @@ end interface interface PMPI_Type_size_x -subroutine PMPI_Type_size_x(type, size, ierror) +subroutine PMPI_Type_size_x(datatype, size, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: size integer, intent(out) :: ierror end subroutine PMPI_Type_size_x @@ -7288,8 +7288,8 @@ end interface interface MPI_Type_ub -subroutine MPI_Type_ub(mtype, ub, ierror) - integer, intent(in) :: mtype +subroutine MPI_Type_ub(datatype, ub, ierror) + integer, intent(in) :: datatype integer, intent(out) :: ub integer, intent(out) :: ierror end subroutine MPI_Type_ub @@ -7298,8 +7298,8 @@ end interface interface PMPI_Type_ub -subroutine PMPI_Type_ub(mtype, ub, ierror) - integer, intent(in) :: mtype +subroutine PMPI_Type_ub(datatype, ub, ierror) + integer, intent(in) :: datatype integer, intent(out) :: ub integer, intent(out) :: ierror end subroutine PMPI_Type_ub diff --git a/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr.F90 b/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr.F90 index 4120d7d6b3b..c3acea62cad 100644 --- a/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr.F90 +++ b/ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr.F90 @@ -11,6 +11,8 @@ ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. ! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2017 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! 
Additional copyrights may follow @@ -25,10 +27,8 @@ module mpi include "mpif-config.h" include "mpif-constants.h" include "mpif-handles.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE include "mpif-io-constants.h" include "mpif-io-handles.h" -#endif include "mpif-sentinels.h" ! The MPI attribute callback functions @@ -42,9 +42,7 @@ module mpi ! The ignore-TKR version of the MPI interfaces include "ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-interfaces.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE include "ompi/mpi/fortran/use-mpi-ignore-tkr/mpi-ignore-tkr-file-interfaces.h" -#endif include 'mpi-ignore-tkr-sizeof.h' diff --git a/ompi/mpi/fortran/use-mpi-tkr/Makefile.am b/ompi/mpi/fortran/use-mpi-tkr/Makefile.am index a1f3105ddd8..df609dd859c 100644 --- a/ompi/mpi/fortran/use-mpi-tkr/Makefile.am +++ b/ompi/mpi/fortran/use-mpi-tkr/Makefile.am @@ -10,7 +10,7 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2006-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2006-2018 Cisco Systems, Inc. All rights reserved # Copyright (c) 2007 Los Alamos National Security, LLC. All rights # reserved. # Copyright (c) 2014-2016 Research Organization for Information Science @@ -90,6 +90,7 @@ nodist_lib@OMPI_LIBMPI_NAME@_usempi_la_SOURCES += \ mpi-tkr-sizeof.h \ mpi-tkr-sizeof.f90 endif +mpi.lo: $(nodist_lib@OMPI_LIBMPI_NAME@_usempi_la_SOURCES) # Note that we invoke some OPAL functions directly in # libmpi_usempi.la, so we need to link in the OPAL library directly diff --git a/ompi/mpi/fortran/use-mpi-tkr/mpi-f90-interfaces.h b/ompi/mpi/fortran/use-mpi-tkr/mpi-f90-interfaces.h index db318bda77c..e78fb27c754 100644 --- a/ompi/mpi/fortran/use-mpi-tkr/mpi-f90-interfaces.h +++ b/ompi/mpi/fortran/use-mpi-tkr/mpi-f90-interfaces.h @@ -11,8 +11,8 @@ ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. ! 
Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2016 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2016-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! Additional copyrights may follow @@ -1481,8 +1481,8 @@ end interface interface MPI_Type_commit -subroutine MPI_Type_commit(type, ierror) - integer, intent(inout) :: type +subroutine MPI_Type_commit(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine MPI_Type_commit @@ -1666,8 +1666,8 @@ end interface interface MPI_Type_delete_attr -subroutine MPI_Type_delete_attr(type, type_keyval, ierror) - integer, intent(in) :: type +subroutine MPI_Type_delete_attr(datatype, type_keyval, ierror) + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer, intent(out) :: ierror end subroutine MPI_Type_delete_attr @@ -1677,8 +1677,8 @@ end interface interface MPI_Type_dup -subroutine MPI_Type_dup(type, newtype, ierror) - integer, intent(in) :: type +subroutine MPI_Type_dup(oldtype, newtype, ierror) + integer, intent(in) :: oldtype integer, intent(out) :: newtype integer, intent(out) :: ierror end subroutine MPI_Type_dup @@ -1688,8 +1688,8 @@ end interface interface MPI_Type_extent -subroutine MPI_Type_extent(type, extent, ierror) - integer, intent(in) :: type +subroutine MPI_Type_extent(datatype, extent, ierror) + integer, intent(in) :: datatype integer, intent(out) :: extent integer, intent(out) :: ierror end subroutine MPI_Type_extent @@ -1699,8 +1699,8 @@ end interface interface MPI_Type_free -subroutine MPI_Type_free(type, ierror) - integer, intent(inout) :: type +subroutine MPI_Type_free(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine MPI_Type_free @@ -1719,9 +1719,9 @@ end interface interface MPI_Type_get_attr -subroutine MPI_Type_get_attr(type, 
type_keyval, attribute_val, flag, ierror) +subroutine MPI_Type_get_attr(datatype, type_keyval, attribute_val, flag, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(out) :: attribute_val logical, intent(out) :: flag @@ -1733,10 +1733,10 @@ end interface interface MPI_Type_get_contents -subroutine MPI_Type_get_contents(mtype, max_integers, max_addresses, max_datatypes, array_of_integers, & +subroutine MPI_Type_get_contents(datatype, max_integers, max_addresses, max_datatypes, array_of_integers, & array_of_addresses, array_of_datatypes, ierror) include 'mpif-config.h' - integer, intent(in) :: mtype + integer, intent(in) :: datatype integer, intent(in) :: max_integers integer, intent(in) :: max_addresses integer, intent(in) :: max_datatypes @@ -1751,9 +1751,9 @@ end interface interface MPI_Type_get_envelope -subroutine MPI_Type_get_envelope(type, num_integers, num_addresses, num_datatypes, combiner& +subroutine MPI_Type_get_envelope(datatype, num_integers, num_addresses, num_datatypes, combiner& , ierror) - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(out) :: num_integers integer, intent(out) :: num_addresses integer, intent(out) :: num_datatypes @@ -1766,9 +1766,9 @@ end interface interface MPI_Type_get_extent -subroutine MPI_Type_get_extent(type, lb, extent, ierror) +subroutine MPI_Type_get_extent(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_ADDRESS_KIND), intent(out) :: lb integer(kind=MPI_ADDRESS_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -1779,9 +1779,9 @@ end interface interface MPI_Type_get_extent_x -subroutine MPI_Type_get_extent_x(type, lb, extent, ierror) +subroutine MPI_Type_get_extent_x(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: 
datatype integer(kind=MPI_COUNT_KIND), intent(out) :: lb integer(kind=MPI_COUNT_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -1792,8 +1792,8 @@ end interface interface MPI_Type_get_name -subroutine MPI_Type_get_name(type, type_name, resultlen, ierror) - integer, intent(in) :: type +subroutine MPI_Type_get_name(datatype, type_name, resultlen, ierror) + integer, intent(in) :: datatype character(len=*), intent(out) :: type_name integer, intent(out) :: resultlen integer, intent(out) :: ierror @@ -1875,8 +1875,8 @@ end interface interface MPI_Type_lb -subroutine MPI_Type_lb(type, lb, ierror) - integer, intent(in) :: type +subroutine MPI_Type_lb(datatype, lb, ierror) + integer, intent(in) :: datatype integer, intent(out) :: lb integer, intent(out) :: ierror end subroutine MPI_Type_lb @@ -1886,10 +1886,10 @@ end interface interface MPI_Type_match_size -subroutine MPI_Type_match_size(typeclass, size, type, ierror) +subroutine MPI_Type_match_size(typeclass, size, datatype, ierror) integer, intent(in) :: typeclass integer, intent(in) :: size - integer, intent(out) :: type + integer, intent(out) :: datatype integer, intent(out) :: ierror end subroutine MPI_Type_match_size @@ -1898,9 +1898,9 @@ end interface interface MPI_Type_set_attr -subroutine MPI_Type_set_attr(type, type_keyval, attr_val, ierror) +subroutine MPI_Type_set_attr(datatype, type_keyval, attr_val, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(in) :: attr_val integer, intent(out) :: ierror @@ -1911,8 +1911,8 @@ end interface interface MPI_Type_set_name -subroutine MPI_Type_set_name(type, type_name, ierror) - integer, intent(in) :: type +subroutine MPI_Type_set_name(datatype, type_name, ierror) + integer, intent(in) :: datatype character(len=*), intent(in) :: type_name integer, intent(out) :: ierror end subroutine MPI_Type_set_name @@ -1922,8 +1922,8 @@ end interface 
interface MPI_Type_size -subroutine MPI_Type_size(type, size, ierror) - integer, intent(in) :: type +subroutine MPI_Type_size(datatype, size, ierror) + integer, intent(in) :: datatype integer, intent(out) :: size integer, intent(out) :: ierror end subroutine MPI_Type_size @@ -1933,9 +1933,9 @@ end interface interface MPI_Type_size_x -subroutine MPI_Type_size_x(type, size, ierror) +subroutine MPI_Type_size_x(datatype, size, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: size integer, intent(out) :: ierror end subroutine MPI_Type_size_x @@ -1960,8 +1960,8 @@ end interface interface MPI_Type_ub -subroutine MPI_Type_ub(mtype, ub, ierror) - integer, intent(in) :: mtype +subroutine MPI_Type_ub(datatype, ub, ierror) + integer, intent(in) :: datatype integer, intent(out) :: ub integer, intent(out) :: ierror end subroutine MPI_Type_ub diff --git a/ompi/mpi/fortran/use-mpi-tkr/mpi.F90 b/ompi/mpi/fortran/use-mpi-tkr/mpi.F90 index be54f43f5b9..9ac593a8d70 100644 --- a/ompi/mpi/fortran/use-mpi-tkr/mpi.F90 +++ b/ompi/mpi/fortran/use-mpi-tkr/mpi.F90 @@ -11,7 +11,7 @@ ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. ! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2016 Research Organization for Information Science +! Copyright (c) 2016-2017 Research Organization for Information Science ! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! @@ -27,10 +27,8 @@ module mpi include "mpif-config.h" include "mpif-constants.h" include "mpif-handles.h" -#if OMPI_PROVIDE_MPI_FILE_INTERFACE include "mpif-io-constants.h" include "mpif-io-handles.h" -#endif include "mpif-sentinels.h" ! 
The MPI attribute callback functions diff --git a/ompi/mpi/fortran/use-mpi-tkr/pmpi-f90-interfaces.h b/ompi/mpi/fortran/use-mpi-tkr/pmpi-f90-interfaces.h index 3f9d3291f6d..1baa4bf76cc 100644 --- a/ompi/mpi/fortran/use-mpi-tkr/pmpi-f90-interfaces.h +++ b/ompi/mpi/fortran/use-mpi-tkr/pmpi-f90-interfaces.h @@ -10,9 +10,9 @@ ! University of Stuttgart. All rights reserved. ! Copyright (c) 2004-2005 The Regents of the University of California. ! All rights reserved. -! Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. -! Copyright (c) 2016 Research Organization for Information Science -! and Technology (RIST). All rights reserved. +! Copyright (c) 2006-2018 Cisco Systems, Inc. All rights reserved. +! Copyright (c) 2016-2018 Research Organization for Information Science +! and Technology (RIST). All rights reserved. ! $COPYRIGHT$ ! ! Additional copyrights may follow @@ -1481,8 +1481,8 @@ end interface interface PMPI_Type_commit -subroutine PMPI_Type_commit(type, ierror) - integer, intent(inout) :: type +subroutine PMPI_Type_commit(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine PMPI_Type_commit @@ -1666,8 +1666,8 @@ end interface interface PMPI_Type_delete_attr -subroutine PMPI_Type_delete_attr(type, type_keyval, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_delete_attr(datatype, type_keyval, ierror) + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer, intent(out) :: ierror end subroutine PMPI_Type_delete_attr @@ -1677,8 +1677,8 @@ end interface interface PMPI_Type_dup -subroutine PMPI_Type_dup(type, newtype, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_dup(datatype, newtype, ierror) + integer, intent(in) :: datatype integer, intent(out) :: newtype integer, intent(out) :: ierror end subroutine PMPI_Type_dup @@ -1688,8 +1688,8 @@ end interface interface PMPI_Type_extent -subroutine PMPI_Type_extent(type, extent, ierror) - integer, intent(in) :: type 
+subroutine PMPI_Type_extent(datatype, extent, ierror) + integer, intent(in) :: datatype integer, intent(out) :: extent integer, intent(out) :: ierror end subroutine PMPI_Type_extent @@ -1699,8 +1699,8 @@ end interface interface PMPI_Type_free -subroutine PMPI_Type_free(type, ierror) - integer, intent(inout) :: type +subroutine PMPI_Type_free(datatype, ierror) + integer, intent(inout) :: datatype integer, intent(out) :: ierror end subroutine PMPI_Type_free @@ -1719,9 +1719,9 @@ end interface interface PMPI_Type_get_attr -subroutine PMPI_Type_get_attr(type, type_keyval, attribute_val, flag, ierror) +subroutine PMPI_Type_get_attr(datatype, type_keyval, attribute_val, flag, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(out) :: attribute_val logical, intent(out) :: flag @@ -1733,10 +1733,10 @@ end interface interface PMPI_Type_get_contents -subroutine PMPI_Type_get_contents(mtype, max_integers, max_addresses, max_datatypes, array_of_integers, & +subroutine PMPI_Type_get_contents(datatype, max_integers, max_addresses, max_datatypes, array_of_integers, & array_of_addresses, array_of_datatypes, ierror) include 'mpif-config.h' - integer, intent(in) :: mtype + integer, intent(in) :: datatype integer, intent(in) :: max_integers integer, intent(in) :: max_addresses integer, intent(in) :: max_datatypes @@ -1751,9 +1751,9 @@ end interface interface PMPI_Type_get_envelope -subroutine PMPI_Type_get_envelope(type, num_integers, num_addresses, num_datatypes, combiner& +subroutine PMPI_Type_get_envelope(datatype, num_integers, num_addresses, num_datatypes, combiner& , ierror) - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(out) :: num_integers integer, intent(out) :: num_addresses integer, intent(out) :: num_datatypes @@ -1766,9 +1766,9 @@ end interface interface PMPI_Type_get_extent -subroutine PMPI_Type_get_extent(type, lb, 
extent, ierror) +subroutine PMPI_Type_get_extent(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_ADDRESS_KIND), intent(out) :: lb integer(kind=MPI_ADDRESS_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -1779,9 +1779,9 @@ end interface interface PMPI_Type_get_extent_x -subroutine PMPI_Type_get_extent_x(type, lb, extent, ierror) +subroutine PMPI_Type_get_extent_x(datatype, lb, extent, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: lb integer(kind=MPI_COUNT_KIND), intent(out) :: extent integer, intent(out) :: ierror @@ -1792,8 +1792,8 @@ end interface interface PMPI_Type_get_name -subroutine PMPI_Type_get_name(type, type_name, resultlen, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_get_name(datatype, type_name, resultlen, ierror) + integer, intent(in) :: datatype character(len=*), intent(out) :: type_name integer, intent(out) :: resultlen integer, intent(out) :: ierror @@ -1875,8 +1875,8 @@ end interface interface PMPI_Type_lb -subroutine PMPI_Type_lb(type, lb, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_lb(datatype, lb, ierror) + integer, intent(in) :: datatype integer, intent(out) :: lb integer, intent(out) :: ierror end subroutine PMPI_Type_lb @@ -1886,10 +1886,10 @@ end interface interface PMPI_Type_match_size -subroutine PMPI_Type_match_size(typeclass, size, type, ierror) +subroutine PMPI_Type_match_size(typeclass, size, datatype, ierror) integer, intent(in) :: typeclass integer, intent(in) :: size - integer, intent(out) :: type + integer, intent(out) :: datatype integer, intent(out) :: ierror end subroutine PMPI_Type_match_size @@ -1898,9 +1898,9 @@ end interface interface PMPI_Type_set_attr -subroutine PMPI_Type_set_attr(type, type_keyval, attr_val, ierror) +subroutine PMPI_Type_set_attr(datatype, type_keyval, attr_val, ierror) include 
'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer, intent(in) :: type_keyval integer(kind=MPI_ADDRESS_KIND), intent(in) :: attr_val integer, intent(out) :: ierror @@ -1911,8 +1911,8 @@ end interface interface PMPI_Type_set_name -subroutine PMPI_Type_set_name(type, type_name, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_set_name(datatype, type_name, ierror) + integer, intent(in) :: datatype character(len=*), intent(in) :: type_name integer, intent(out) :: ierror end subroutine PMPI_Type_set_name @@ -1922,8 +1922,8 @@ end interface interface PMPI_Type_size -subroutine PMPI_Type_size(type, size, ierror) - integer, intent(in) :: type +subroutine PMPI_Type_size(datatype, size, ierror) + integer, intent(in) :: datatype integer, intent(out) :: size integer, intent(out) :: ierror end subroutine PMPI_Type_size @@ -1933,9 +1933,9 @@ end interface interface PMPI_Type_size_x -subroutine PMPI_Type_size_x(type, size, ierror) +subroutine PMPI_Type_size_x(datatype, size, ierror) include 'mpif-config.h' - integer, intent(in) :: type + integer, intent(in) :: datatype integer(kind=MPI_COUNT_KIND), intent(out) :: size integer, intent(out) :: ierror end subroutine PMPI_Type_size_x @@ -1960,8 +1960,8 @@ end interface interface PMPI_Type_ub -subroutine PMPI_Type_ub(mtype, ub, ierror) - integer, intent(in) :: mtype +subroutine PMPI_Type_ub(datatype, ub, ierror) + integer, intent(in) :: datatype integer, intent(out) :: ub integer, intent(out) :: ierror end subroutine PMPI_Type_ub diff --git a/ompi/mpi/java/c/Makefile.am b/ompi/mpi/java/c/Makefile.am index 2fee2dc0611..96552eb93e0 100644 --- a/ompi/mpi/java/c/Makefile.am +++ b/ompi/mpi/java/c/Makefile.am @@ -1,6 +1,6 @@ # -*- makefile -*- # -# Copyright (c) 2011-2013 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2011-2018 Cisco Systems, Inc. All rights reserved # Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. 
# Copyright (c) 2015 Los Alamos National Security, LLC. All rights # reserved. @@ -17,7 +17,7 @@ if OMPI_WANT_JAVA_BINDINGS # Get the include files that were generated from the .java source files -AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOMPI_LIBMPI_NAME=\"$(OMPI_LIBMPI_NAME)\" -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" +AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OMPI_JDK_CPPFLAGS) -DOMPI_LIBMPI_NAME=\"$(OMPI_LIBMPI_NAME)\" -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" headers = \ mpiJava.h diff --git a/ompi/mpi/java/c/mpi_Comm.c b/ompi/mpi/java/c/mpi_Comm.c index 81510879016..89f819cd587 100644 --- a/ompi/mpi/java/c/mpi_Comm.c +++ b/ompi/mpi/java/c/mpi_Comm.c @@ -13,6 +13,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -708,6 +709,13 @@ JNIEXPORT jlong JNICALL Java_mpi_Comm_getErrhandler( return (jlong)errhandler; } +JNIEXPORT void JNICALL Java_mpi_Comm_callErrhandler( + JNIEnv *env, jobject jthis, jlong comm, jint errorCode) +{ + int rc = MPI_Comm_call_errhandler((MPI_Comm)comm, errorCode); + ompi_java_exceptionCheck(env, rc); +} + static int commCopyAttr(MPI_Comm oldcomm, int keyval, void *extraState, void *attrValIn, void *attrValOut, int *flag) { diff --git a/ompi/mpi/java/c/mpi_File.c b/ompi/mpi/java/c/mpi_File.c index fe15a70b842..237b522776b 100644 --- a/ompi/mpi/java/c/mpi_File.c +++ b/ompi/mpi/java/c/mpi_File.c @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2016 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -236,6 +237,20 @@ JNIEXPORT jlong JNICALL Java_mpi_File_iReadAt( return (jlong)request; } +JNIEXPORT jlong JNICALL Java_mpi_File_iReadAtAll( + JNIEnv *env, jobject jthis, jlong fh, jlong offset, + jobject buf, jint count, jlong type) +{ + void *ptr = (*env)->GetDirectBufferAddress(env, buf); + MPI_Request request; + + int rc = MPI_File_iread_at_all((MPI_File)fh, (MPI_Offset)offset, + ptr, count, (MPI_Datatype)type, &request); + + ompi_java_exceptionCheck(env, rc); + return (jlong)request; +} + JNIEXPORT jlong JNICALL Java_mpi_File_iWriteAt( JNIEnv *env, jobject jthis, jlong fh, jlong offset, jobject buf, jint count, jlong type) @@ -250,6 +265,20 @@ JNIEXPORT jlong JNICALL Java_mpi_File_iWriteAt( return (jlong)request; } +JNIEXPORT jlong JNICALL Java_mpi_File_iWriteAtAll( + JNIEnv *env, jobject jthis, jlong fh, jlong offset, + jobject buf, jint count, jlong type) +{ + void *ptr = (*env)->GetDirectBufferAddress(env, buf); + MPI_Request request; + + int rc = MPI_File_iwrite_at_all((MPI_File)fh, (MPI_Offset)offset, + ptr, count, (MPI_Datatype)type, &request); + + ompi_java_exceptionCheck(env, rc); + return (jlong)request; +} + JNIEXPORT void JNICALL Java_mpi_File_read( JNIEnv *env, jobject jthis, jlong fh, jobject buf, jboolean db, jint off, jint count, jlong jType, jint bType, jlongArray stat) @@ -336,6 +365,20 @@ JNIEXPORT jlong JNICALL Java_mpi_File_iRead( return (jlong)request; } +JNIEXPORT jlong JNICALL Java_mpi_File_iReadAll( + JNIEnv *env, jobject jthis, jlong fh, + jobject buf, jint count, jlong type) +{ + void *ptr = (*env)->GetDirectBufferAddress(env, buf); + MPI_Request request; + + int rc = MPI_File_iread_all((MPI_File)fh, ptr, count, + (MPI_Datatype)type, &request); + + ompi_java_exceptionCheck(env, rc); + return (jlong)request; +} + JNIEXPORT jlong JNICALL Java_mpi_File_iWrite( JNIEnv *env, jobject jthis, jlong fh, jobject buf, jint count, jlong type) @@ -350,6 +393,20 @@ JNIEXPORT jlong JNICALL 
Java_mpi_File_iWrite( return (jlong)request; } +JNIEXPORT jlong JNICALL Java_mpi_File_iWriteAll( + JNIEnv *env, jobject jthis, jlong fh, + jobject buf, jint count, jlong type) +{ + void *ptr = (*env)->GetDirectBufferAddress(env, buf); + MPI_Request request; + + int rc = MPI_File_iwrite_all((MPI_File)fh, ptr, count, + (MPI_Datatype)type, &request); + + ompi_java_exceptionCheck(env, rc); + return (jlong)request; +} + JNIEXPORT void JNICALL Java_mpi_File_seek( JNIEnv *env, jobject jthis, jlong fh, jlong offset, jint whence) { @@ -646,9 +703,43 @@ JNIEXPORT void JNICALL Java_mpi_File_setAtomicity( ompi_java_exceptionCheck(env, rc); } +JNIEXPORT jboolean JNICALL Java_mpi_File_getAtomicity( + JNIEnv *env, jobject jthis, jlong fh) +{ + int atomicity; + int rc = MPI_File_get_atomicity((MPI_File)fh, &atomicity); + ompi_java_exceptionCheck(env, rc); + return atomicity ? JNI_TRUE : JNI_FALSE; +} + JNIEXPORT void JNICALL Java_mpi_File_sync( JNIEnv *env, jobject jthis, jlong fh) { int rc = MPI_File_sync((MPI_File)fh); ompi_java_exceptionCheck(env, rc); } + +JNIEXPORT void JNICALL Java_mpi_File_setErrhandler( + JNIEnv *env, jobject jthis, jlong fh, jlong errhandler) +{ + int rc = MPI_File_set_errhandler( + (MPI_File)fh, (MPI_Errhandler)errhandler); + + ompi_java_exceptionCheck(env, rc); +} + +JNIEXPORT jlong JNICALL Java_mpi_File_getErrhandler( + JNIEnv *env, jobject jthis, jlong fh) +{ + MPI_Errhandler errhandler; + int rc = MPI_File_get_errhandler((MPI_File)fh, &errhandler); + ompi_java_exceptionCheck(env, rc); + return (jlong)errhandler; +} + +JNIEXPORT void JNICALL Java_mpi_File_callErrhandler( + JNIEnv *env, jobject jthis, jlong fh, jint errorCode) +{ + int rc = MPI_File_call_errhandler((MPI_File)fh, errorCode); + ompi_java_exceptionCheck(env, rc); +} diff --git a/ompi/mpi/java/c/mpi_MPI.c b/ompi/mpi/java/c/mpi_MPI.c index af4a1f66149..a2d4c4e6725 100644 --- a/ompi/mpi/java/c/mpi_MPI.c +++ b/ompi/mpi/java/c/mpi_MPI.c @@ -16,7 +16,7 @@ * Copyright (c) 2015 Intel, Inc. 
All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 IBM Corporation. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -87,7 +87,6 @@ ompi_java_globals_t ompi_java = {0}; int ompi_mpi_java_eager = 65536; opal_free_list_t ompi_java_buffers = {{{0}}}; -static void *libmpi = NULL; static void bufferConstructor(ompi_java_buffer_t *item) { @@ -108,27 +107,6 @@ OBJ_CLASS_INSTANCE(ompi_java_buffer_t, * Class: mpi_MPI * Method: loadGlobalLibraries * - * Java implementations typically default to loading dynamic - * libraries strictly to a local namespace. This breaks the - * Open MPI model where components reference back up to the - * base libraries (e.g., libmpi) as it requires that the - * symbols in those base libraries be globally available. - * - * One option, of course, is to build with --disable-dlopen. - * However, this would preclude the ability to pickup 3rd-party - * binary plug-ins at time of execution. This is a valuable - * capability that would be a negative factor towards use of - * the Java bindings. - * - * The other option is to explicitly dlopen libmpi ourselves - * and instruct dlopen to add all those symbols to the global - * namespace. This must be done prior to calling any MPI - * function (e.g., MPI_Init) or else Java will have already - * loaded the library to the local namespace. So create a - * special JNI entry point that just loads the required libmpi - * to the global namespace and call it first (see MPI.java), - * thus making all symbols available to subsequent dlopen calls - * when opening OMPI components. */ jint JNI_OnLoad(JavaVM *vm, void *reserved) { @@ -136,43 +114,9 @@ jint JNI_OnLoad(JavaVM *vm, void *reserved) // the library (see comment in the function for more detail). opal_init_psm(); - libmpi = dlopen("lib" OMPI_LIBMPI_NAME "." 
OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL); - -#if defined(HAVE_DL_INFO) && defined(HAVE_LIBGEN_H) - /* - * OS X El Capitan does not propagate DYLD_LIBRARY_PATH to children any more - * so if previous dlopen failed, try to open libmpi in the same directory - * than the current libmpi_java - */ - if(NULL == libmpi) { - Dl_info info; - if(0 != dladdr((void *)JNI_OnLoad, &info)) { - char libmpipath[OPAL_PATH_MAX]; - char *libmpijavapath = strdup(info.dli_fname); - if (NULL != libmpijavapath) { - snprintf(libmpipath, OPAL_PATH_MAX-1, "%s/lib" OMPI_LIBMPI_NAME "." OPAL_DYN_LIB_SUFFIX, dirname(libmpijavapath)); - free(libmpijavapath); - libmpi = dlopen(libmpipath, RTLD_NOW | RTLD_GLOBAL); - } - } - } -#endif - - if(NULL == libmpi) - { - fprintf(stderr, "Java bindings failed to load lib" OMPI_LIBMPI_NAME ": %s\n",dlerror()); - exit(1); - } - return JNI_VERSION_1_6; } -void JNI_OnUnload(JavaVM *vm, void *reserved) -{ - if(libmpi != NULL) - dlclose(libmpi); -} - static void initFreeList(void) { OBJ_CONSTRUCT(&ompi_java_buffers, opal_free_list_t); diff --git a/ompi/mpi/java/c/mpi_Win.c b/ompi/mpi/java/c/mpi_Win.c index 95bb919c0f8..551b6e258e6 100644 --- a/ompi/mpi/java/c/mpi_Win.c +++ b/ompi/mpi/java/c/mpi_Win.c @@ -13,6 +13,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -221,11 +222,20 @@ JNIEXPORT void JNICALL Java_mpi_Win_setErrhandler( JNIEnv *env, jobject jthis, jlong win, jlong errhandler) { int rc = MPI_Win_set_errhandler( - (MPI_Win)win, (MPI_Errhandler)MPI_ERRORS_RETURN); + (MPI_Win)win, (MPI_Errhandler)errhandler); ompi_java_exceptionCheck(env, rc); } +JNIEXPORT jlong JNICALL Java_mpi_Win_getErrhandler( + JNIEnv *env, jobject jthis, jlong win) +{ + MPI_Errhandler errhandler; + int rc = MPI_Win_get_errhandler((MPI_Win)win, &errhandler); + ompi_java_exceptionCheck(env, rc); + return (jlong)errhandler; +} + JNIEXPORT void JNICALL Java_mpi_Win_callErrhandler( JNIEnv *env, jobject jthis, jlong win, jint errorCode) { diff --git a/ompi/mpi/java/java/Comm.java b/ompi/mpi/java/java/Comm.java index 938dcce2dbf..ea08bb09245 100644 --- a/ompi/mpi/java/java/Comm.java +++ b/ompi/mpi/java/java/Comm.java @@ -13,6 +13,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017-2018 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -1136,8 +1137,8 @@ public final void deleteAttr(int keyval) throws MPIException /** * Returns the type of topology associated with the communicator. *
<p>Java binding of the MPI operation {@code MPI_TOPO_TEST}. - *
<p>The return value will be one of {@code MPI.GRAPH}, {@code MPI.CART} - * or {@code MPI.UNDEFINED}. + *
<p>The return value will be one of {@code MPI.GRAPH}, {@code MPI.CART}, + * {@code MPI.DIST_GRAPH} or {@code MPI.UNDEFINED}. * @return topology type of communicator * @throws MPIException Signals that an MPI exception of some sort has occurred. */ @@ -1169,7 +1170,7 @@ public final void abort(int errorcode) throws MPIException /** * Associates a new error handler with communicator at the calling process. - *
<p>Java binding of the MPI operation {@code MPI_ERRHANDLER_SET}. + *
<p>Java binding of the MPI operation {@code MPI_COMM_SET_ERRHANDLER}. * @param errhandler new MPI error handler for communicator * @throws MPIException Signals that an MPI exception of some sort has occurred. */ @@ -1184,7 +1185,7 @@ private native void setErrhandler(long comm, long errhandler) /** * Returns the error handler currently associated with the communicator. - *
<p>Java binding of the MPI operation {@code MPI_ERRHANDLER_GET}. + *
<p>Java binding of the MPI operation {@code MPI_COMM_GET_ERRHANDLER}. * @return MPI error handler currently associated with communicator * @throws MPIException Signals that an MPI exception of some sort has occurred. */ @@ -1196,6 +1197,20 @@ public final Errhandler getErrhandler() throws MPIException private native long getErrhandler(long comm); + /** + * Calls the error handler currently associated with the communicator. + * <p>
Java binding of the MPI operation {@code MPI_COMM_CALL_ERRHANDLER}. + * @param errorCode error code + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public void callErrhandler(int errorCode) throws MPIException + { + callErrhandler(handle, errorCode); + } + + private native void callErrhandler(long handle, int errorCode) + throws MPIException; + // Collective Communication /** @@ -2370,9 +2385,9 @@ private native long iAllToAllv(long comm, throws MPIException; /** - * Adds flexibility to {@code allToAll}: location of data for send is //here - * specified by {@code sDispls} and location to place data on receive - * side is specified by {@code rDispls}. + * Adds more flexibility to {@code allToAllv}: datatypes for send are + * specified by {@code sendTypes} and datatypes for receive are specified + * by {@code recvTypes} per process. *
<p>Java binding of the MPI operation {@code MPI_ALLTOALLW}. * @param sendBuf send buffer * @param sendCount number of items sent to each buffer @@ -2406,9 +2421,9 @@ private native void allToAllw(long comm, throws MPIException; /** - * Adds flexibility to {@code iAllToAll}: location of data for send is - * specified by {@code sDispls} and location to place data on receive - * side is specified by {@code rDispls}. + * Adds more flexibility to {@code iAllToAllv}: datatypes for send are + * specified by {@code sendTypes} and datatypes for receive are specified + * by {@code recvTypes} per process. * <p>
Java binding of the MPI operation {@code MPI_IALLTOALLW}. * @param sendBuf send buffer * @param sendCount number of items sent to each buffer diff --git a/ompi/mpi/java/java/Datatype.java b/ompi/mpi/java/java/Datatype.java index a8e113d1cdb..992670abc6f 100644 --- a/ompi/mpi/java/java/Datatype.java +++ b/ompi/mpi/java/java/Datatype.java @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -419,7 +420,7 @@ private static native long getHIndexed( /** * The most general type constructor. - *
<p>Java binding of the MPI operation {@code MPI_TYPE_STRUCT}. + *
<p>Java binding of the MPI operation {@code MPI_TYPE_CREATE_STRUCT}. * <p>
The number of blocks is taken to be size of the {@code blockLengths} * argument. The second and third arguments, {@code displacements}, * and {@code types}, should be the same size. diff --git a/ompi/mpi/java/java/File.java b/ompi/mpi/java/java/File.java index 3309c623770..34506adc83d 100644 --- a/ompi/mpi/java/java/File.java +++ b/ompi/mpi/java/java/File.java @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017-2018 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -152,7 +153,7 @@ public long getSize() throws MPIException /** * Java binding of {@code MPI_FILE_GET_GROUP}. - * @return group wich opened the file + * @return group which opened the file * @throws MPIException Signals that an MPI exception of some sort has occurred. */ public Group getGroup() throws MPIException @@ -405,6 +406,29 @@ private native long iReadAt( long fh, long offset, Buffer buf, int count, long type) throws MPIException; + /** + * Java binding of {@code MPI_FILE_IREAD_AT_ALL}. + * @param offset file offset + * @param buf buffer + * @param count number of items in buffer + * @param type datatype of each buffer element + * @return request object + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public Request iReadAtAll(long offset, Buffer buf, int count, Datatype type) + throws MPIException + { + MPI.check(); + assertDirectBuffer(buf); + Request req = new Request(iReadAtAll(handle, offset, buf, count, type.handle)); + req.addRecvBufRef(buf); + return req; + } + + private native long iReadAtAll( + long fh, long offset, Buffer buf, int count, long type) + throws MPIException; + /** * Java binding of {@code MPI_FILE_IWRITE_AT}. 
* @param offset file offset @@ -428,6 +452,29 @@ private native long iWriteAt( long fh, long offset, Buffer buf, int count, long type) throws MPIException; + /** + * Java binding of {@code MPI_FILE_IWRITE_AT_ALL}. + * @param offset file offset + * @param buf buffer + * @param count number of items in buffer + * @param type datatype of each buffer element + * @return request object + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public Request iWriteAtAll(long offset, Buffer buf, int count, Datatype type) + throws MPIException + { + MPI.check(); + assertDirectBuffer(buf); + Request req = new Request(iWriteAtAll(handle, offset, buf, count, type.handle)); + req.addSendBufRef(buf); + return req; + } + + private native long iWriteAtAll( + long fh, long offset, Buffer buf, int count, long type) + throws MPIException; + /** * Java binding of {@code MPI_FILE_READ}. * @param buf buffer @@ -564,6 +611,26 @@ public Request iRead(Buffer buf, int count, Datatype type) throws MPIException private native long iRead(long fh, Buffer buf, int count, long type) throws MPIException; + /** + * Java binding of {@code MPI_FILE_IREAD_ALL}. + * @param buf buffer + * @param count number of items in buffer + * @param type datatype of each buffer element + * @return request object + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public Request iReadAll(Buffer buf, int count, Datatype type) throws MPIException + { + MPI.check(); + assertDirectBuffer(buf); + Request req = new Request(iReadAll(handle, buf, count, type.handle)); + req.addRecvBufRef(buf); + return req; + } + + private native long iReadAll(long fh, Buffer buf, int count, long type) + throws MPIException; + /** * Java binding of {@code MPI_FILE_IWRITE}. 
* @param buf buffer @@ -584,6 +651,26 @@ public Request iWrite(Buffer buf, int count, Datatype type) throws MPIException private native long iWrite(long fh, Buffer buf, int count, long type) throws MPIException; + /** + * Java binding of {@code MPI_FILE_IWRITE_ALL}. + * @param buf buffer + * @param count number of items in buffer + * @param type datatype of each buffer element + * @return request object + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public Request iWriteAll(Buffer buf, int count, Datatype type) throws MPIException + { + MPI.check(); + assertDirectBuffer(buf); + Request req = new Request(iWriteAll(handle, buf, count, type.handle)); + req.addRecvBufRef(buf); + return req; + } + + private native long iWriteAll(long fh, Buffer buf, int count, long type) + throws MPIException; + /** * Java binding of {@code MPI_FILE_SEEK}. * @param offset file offset @@ -1234,6 +1321,19 @@ public void setAtomicity(boolean atomicity) throws MPIException private native void setAtomicity(long fh, boolean atomicity) throws MPIException; + /** + * Java binding of {@code MPI_FILE_GET_ATOMICITY}. + * @return current consistency of the file + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public boolean getAtomicity() throws MPIException + { + MPI.check(); + return getAtomicity(handle); + } + + private native boolean getAtomicity(long fh) throws MPIException; + /** * Java binding of {@code MPI_FILE_SYNC}. * @throws MPIException Signals that an MPI exception of some sort has occurred. @@ -1246,4 +1346,44 @@ public void sync() throws MPIException private native void sync(long handle) throws MPIException; + /** + * Java binding of the MPI operation {@code MPI_FILE_SET_ERRHANDLER}. + * @param errhandler new MPI error handler for file + * @throws MPIException Signals that an MPI exception of some sort has occurred. 
+ */ + public void setErrhandler(Errhandler errhandler) throws MPIException + { + MPI.check(); + setErrhandler(handle, errhandler.handle); + } + + private native void setErrhandler(long fh, long errhandler) + throws MPIException; + + /** + * Java binding of the MPI operation {@code MPI_FILE_GET_ERRHANDLER}. + * @return MPI error handler currently associated with file + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public Errhandler getErrhandler() throws MPIException + { + MPI.check(); + return new Errhandler(getErrhandler(handle)); + } + + private native long getErrhandler(long fh); + + /** + * Java binding of the MPI operation {@code MPI_FILE_CALL_ERRHANDLER}. + * @param errorCode error code + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public void callErrhandler(int errorCode) throws MPIException + { + callErrhandler(handle, errorCode); + } + + private native void callErrhandler(long handle, int errorCode) + throws MPIException; + } // File diff --git a/ompi/mpi/java/java/Info.java b/ompi/mpi/java/java/Info.java index 82c3f668a5c..ba3bb3a54e8 100644 --- a/ompi/mpi/java/java/Info.java +++ b/ompi/mpi/java/java/Info.java @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -69,7 +70,7 @@ private native void set(long handle, String key, String value) throws MPIException; /** - * Java binding of the MPI operation {@code MPI_INFO_SET}. + * Java binding of the MPI operation {@code MPI_INFO_GET}. * @param key key * @return value or {@code null} if key is not defined * @throws MPIException Signals that an MPI exception of some sort has occurred. 
@@ -83,7 +84,7 @@ public String get(String key) throws MPIException private native String get(long handle, String key) throws MPIException; /** - * Java binding of the MPI operation {@code MPI_INFO_SET}. + * Java binding of the MPI operation {@code MPI_INFO_DELETE}. * @param key key * @throws MPIException Signals that an MPI exception of some sort has occurred. */ diff --git a/ompi/mpi/java/java/MPI.java b/ompi/mpi/java/java/MPI.java index 35317c15b23..a5e96e0b045 100644 --- a/ompi/mpi/java/java/MPI.java +++ b/ompi/mpi/java/java/MPI.java @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -528,7 +529,7 @@ public static double wtime() throws MPIException /** * Returns resolution of timer. - *
<p>Java binding of the MPI operation {MPI_WTICK}. + * <p>
Java binding of the MPI operation {@code MPI_WTICK}. * @return resolution of {@code wtime} in seconds. * @throws MPIException Signals that an MPI exception of some sort has occurred. */ diff --git a/ompi/mpi/java/java/Makefile.am b/ompi/mpi/java/java/Makefile.am index bf7d2aaa3e5..eb818ea0eeb 100644 --- a/ompi/mpi/java/java/Makefile.am +++ b/ompi/mpi/java/java/Makefile.am @@ -1,8 +1,11 @@ # -*- makefile -*- # -# Copyright (c) 2011-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2011-2018 Cisco Systems, Inc. All rights reserved # Copyright (c) 2015 Los Alamos National Security, LLC. All rights # reserved. +# Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +# Copyright (c) 2018 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -73,7 +76,6 @@ if OMPI_WANT_JAVA_BINDINGS # from JAVA_SRC_FILES. JAVA_H = \ mpi_MPI.h \ - mpi_CartParms.h \ mpi_CartComm.h \ mpi_Comm.h \ mpi_Constant.h \ @@ -81,7 +83,6 @@ JAVA_H = \ mpi_Datatype.h \ mpi_Errhandler.h \ mpi_File.h \ - mpi_GraphParms.h \ mpi_GraphComm.h \ mpi_Group.h \ mpi_Info.h \ @@ -91,9 +92,7 @@ JAVA_H = \ mpi_Op.h \ mpi_Prequest.h \ mpi_Request.h \ - mpi_ShiftParms.h \ mpi_Status.h \ - mpi_Version.h \ mpi_Win.h # A little verbosity magic; see Makefile.ompi-rules for an explanation. @@ -139,6 +138,7 @@ ompi__v_JAVADOC_QUIET_0 = -quiet # in. This, along with the fact that the .java files seem to have # circular references, prevents us from using a .foo.bar: generic # Makefile rule. :-( +if OMPI_HAVE_JAVAH_SUPPORT mpi/MPI.class: $(JAVA_SRC_FILES) $(OMPI_V_JAVAC) CLASSPATH=. ; \ export CLASSPATH ; \ @@ -147,11 +147,18 @@ mpi/MPI.class: $(JAVA_SRC_FILES) # Similar to above, all the generated .h files are dependent upon the # token mpi/MPI.class file. Hence, all the classes will be generated # first, then we'll individually generate each of the .h files. 
+ $(JAVA_H): mpi/MPI.class $(OMPI_V_JAVAH) sourcename=mpi.`echo $@ | sed -e s/^mpi_// -e s/.h$$//`; \ CLASSPATH=. ; \ export CLASSPATH ; \ $(JAVAH) -d . -jni $$sourcename +else +mpi/MPI.class: $(JAVA_SRC_FILES) + $(OMPI_V_JAVAC) CLASSPATH=. ; \ + export CLASSPATH ; \ + $(JAVAC) -h . -d . $(top_srcdir)/ompi/mpi/java/java/*.java +endif # OMPI_HAVE_JAVAH_SUPPORT # Generate the .jar file from all the class files. List mpi/MPI.class # as a dependency so that it fires the rule above that will generate @@ -170,7 +177,11 @@ java_DATA = mpi.jar # List all the header files in BUILT_SOURCES so that Automake's "all" # target will build them. This will also force the building of the # mpi/*.class files (for the jar file). +if OMPI_HAVE_JAVAH_SUPPORT BUILT_SOURCES = $(JAVA_H) doc +else +BUILT_SOURCES = mpi/MPI.class doc +endif # Convenience for building Javadoc docs jdoc: doc @@ -179,7 +190,7 @@ jdoc: doc # mpi.jar is ever rebuilt, then also make the docs eligible to be # rebuilt. doc: mpi/MPI.class - $(OMPI_V_JAVADOC) javadoc $(OMPI_V_JAVADOC_QUIET) -d doc $(srcdir)/*.java + $(OMPI_V_JAVADOC) $(JAVADOC) $(OMPI_V_JAVADOC_QUIET) -d doc $(srcdir)/*.java @touch doc jdoc-install: doc diff --git a/ompi/mpi/java/java/Op.java b/ompi/mpi/java/java/Op.java index eb3ccd86638..a10559825b2 100644 --- a/ompi/mpi/java/java/Op.java +++ b/ompi/mpi/java/java/Op.java @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -109,7 +110,7 @@ protected void call(Object invec, Object inoutvec, int count) } /** - * Test if the operation is conmutative. + * Test if the operation is commutative. *
<p>
Java binding of the MPI operation {@code MPI_OP_COMMUTATIVE}. * @return {@code true} if commutative, {@code false} otherwise */ diff --git a/ompi/mpi/java/java/Win.java b/ompi/mpi/java/java/Win.java index 91b09f58776..d3a7d7c1682 100644 --- a/ompi/mpi/java/java/Win.java +++ b/ompi/mpi/java/java/Win.java @@ -13,6 +13,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -427,6 +428,19 @@ public void setErrhandler(Errhandler errhandler) throws MPIException private native void setErrhandler(long win, long errhandler) throws MPIException; + /** + * Java binding of the MPI operation {@code MPI_WIN_GET_ERRHANDLER}. + * @return MPI error handler currently associated with window + * @throws MPIException Signals that an MPI exception of some sort has occurred. + */ + public Errhandler getErrhandler() throws MPIException + { + MPI.check(); + return new Errhandler(getErrhandler(handle)); + } + + private native long getErrhandler(long win); + /** * Java binding of the MPI operation {@code MPI_WIN_CALL_ERRHANDLER}. * @param errorCode error code diff --git a/ompi/mpi/man/man3/MPI_Alltoallv.3in b/ompi/mpi/man/man3/MPI_Alltoallv.3in index 79fed316094..678b3f4bf8d 100644 --- a/ompi/mpi/man/man3/MPI_Alltoallv.3in +++ b/ompi/mpi/man/man3/MPI_Alltoallv.3in @@ -30,7 +30,6 @@ int MPI_Ialltoallv(const void *\fIsendbuf\fP, const int \fIsendcounts\fP[], .nf USE MPI ! or the older form: INCLUDE 'mpif.h' - MPI_ALLTOALLV(\fISENDBUF, SENDCOUNTS, SDISPLS, SENDTYPE, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPE, COMM, IERROR\fP) diff --git a/ompi/mpi/man/man3/MPI_Attr_get.3in b/ompi/mpi/man/man3/MPI_Attr_get.3in index f355acad902..def22fcb7c0 100644 --- a/ompi/mpi/man/man3/MPI_Attr_get.3in +++ b/ompi/mpi/man/man3/MPI_Attr_get.3in @@ -50,7 +50,7 @@ Fortran only: Error status (integer). 
.SH DESCRIPTION .ft R Note that use of this routine is \fIdeprecated\fP as of MPI-2, and -was \fIdeleted\fP in MPI-3. Please use MPI_Comm_create_attr. This +was \fIdeleted\fP in MPI-3. Please use MPI_Comm_get_attr. This function does not have a C++ or mpi_f08 binding. .sp Retrieves attribute value by key. The call is erroneous if there is no key diff --git a/ompi/mpi/man/man3/MPI_Cart_create.3in b/ompi/mpi/man/man3/MPI_Cart_create.3in index 6a7cb599e2a..16ed8f3a137 100644 --- a/ompi/mpi/man/man3/MPI_Cart_create.3in +++ b/ompi/mpi/man/man3/MPI_Cart_create.3in @@ -41,7 +41,7 @@ MPI_Cart_create(\fIcomm_old\fP, \fIndims\fP, \fIdims\fP, \fIperiods\fP, \fIreord .SH C++ Syntax .nf #include -Cartcomm Intracomm.Create_cart(int[] \fIndims\fP, int[] \fIdims\fP[], +Cartcomm Intracomm.Create_cart(int \fIndims\fP, int[] \fIdims\fP[], const bool \fIperiods\fP[], bool \fIreorder\fP) const .fi diff --git a/ompi/mpi/man/man3/MPI_Cart_shift.3in b/ompi/mpi/man/man3/MPI_Cart_shift.3in index 142e4b6a8cf..9b1de40a971 100644 --- a/ompi/mpi/man/man3/MPI_Cart_shift.3in +++ b/ompi/mpi/man/man3/MPI_Cart_shift.3in @@ -83,7 +83,7 @@ Depending on the periodicity of the Cartesian group in the specified coordinate .nf \&.... C find process rank - CALL MPI_COMM_RANK(comm, rank, ierr)) + CALL MPI_COMM_RANK(comm, rank, ierr) C find Cartesian coordinates CALL MPI_CART_COORDS(comm, rank, maxdims, coords, ierr) diff --git a/ompi/mpi/man/man3/MPI_Comm_dup_with_info.3in b/ompi/mpi/man/man3/MPI_Comm_dup_with_info.3in index dcad64f539e..fd69e403c46 100644 --- a/ompi/mpi/man/man3/MPI_Comm_dup_with_info.3in +++ b/ompi/mpi/man/man3/MPI_Comm_dup_with_info.3in @@ -60,6 +60,10 @@ MPI_Comm_dup_with_info acts exactly like MPI_Comm_dup except that the info hints associated with the communicator \fIcomm\fP are not duplicated in \fInewcomm\fP. The hints provided by the argument \fIinfo\fP are associated with the output communicator \fInewcomm\fP instead. 
+.sp +See +.BR MPI_Comm_set_info (3) +for the list of recognized info keys. .SH NOTES This operation is used to provide a parallel @@ -82,3 +86,4 @@ called. By default, this error handler aborts the MPI job, except for I/O functi .SH SEE ALSO MPI_Comm_dup MPI_Comm_idup +MPI_Comm_set_info diff --git a/ompi/mpi/man/man3/MPI_Comm_set_info.3in b/ompi/mpi/man/man3/MPI_Comm_set_info.3in index d768ec51318..38bee95c823 100644 --- a/ompi/mpi/man/man3/MPI_Comm_set_info.3in +++ b/ompi/mpi/man/man3/MPI_Comm_set_info.3in @@ -58,6 +58,31 @@ requires to be the same on all processes must appear with the same value in each process's .I info object. +.sp +The following info key assertions may be accepted by Open MPI: +.sp +\fImpi_assert_no_any_tag\fP (boolean): If set to true, then the +implementation may assume that the process will not use the +MPI_ANY_TAG wildcard on the given +communicator. +.sp +\fImpi_assert_no_any_source\fP (boolean): If set to true, then +the implementation may assume that the process will not use the +MPI_ANY_SOURCE wildcard on the given communicator. +.sp +\fImpi_assert_exact_length\fP (boolean): If set to true, then the +implementation may assume that the lengths of messages received by the +process are equal to the lengths of the corresponding receive buffers, +for point-to-point communication operations on the given communicator. +.sp +\fImpi_assert_allow_overtaking\fP (boolean): If set to true, then the +implementation may assume that point-to-point communications on the +given communicator do not rely on the non-overtaking rule specified in +MPI-3.1 Section 3.5. In other words, the application asserts that send +operations are not required to be matched at the receiver in the order +in which the send operations were performed by the sender, and receive +operations are not required to be matched in the order in which they +were performed by the receiver. . 
.SH ERRORS Almost all MPI routines return an error value; C routines as the value diff --git a/ompi/mpi/man/man3/MPI_Comm_spawn.3in b/ompi/mpi/man/man3/MPI_Comm_spawn.3in index 5580353de19..69966e81e07 100644 --- a/ompi/mpi/man/man3/MPI_Comm_spawn.3in +++ b/ompi/mpi/man/man3/MPI_Comm_spawn.3in @@ -173,7 +173,7 @@ ompi_preload_files char * A comma-separated list of files that \fIompi_preload_binary\fP - files can be moved to the target even if an executable is not moved. -ompi_stdin_target char* Comma-delimited list of ranks to +ompi_stdin_target char * Comma-delimited list of ranks to receive stdin when forwarded. ompi_non_mpi bool If set to true, launching a non-MPI application; the returned communicator @@ -186,25 +186,25 @@ ompi_param char * Pass an OMPI MCA parameter to the exists in the environment, the value will be overwritten by the provided value. -mapper char* Mapper to be used for this job -map_by char* Mapping directive indicating how +mapper char * Mapper to be used for this job +map_by char * Mapping directive indicating how processes are to be mapped (slot, node, socket, etc.). -rank_by char * Ranking directive indicating how +rank_by char * Ranking directive indicating how processes are to be ranked (slot, node, socket, etc.). -bind_to char * Binding directive indicating how +bind_to char * Binding directive indicating how processes are to be bound (core, slot, node, socket, etc.). 
-path char* List of directories to search for +path char * List of directories to search for the executable -npernode char* Number of processes to spawn on +npernode char * Number of processes to spawn on each node of the allocation -pernode bool Equivalent to npernode of 1 -ppr char* Spawn specified number of processes - on each of the identified object type -env char* Newline-delimited list of envars to - be passed to the spawned procs +pernode bool Equivalent to npernode of 1 +ppr char * Spawn specified number of processes + on each of the identified object type +env char * Newline-delimited list of envars to + be passed to the spawned procs .fi \fIbool\fP info keys are actually strings but are evaluated as diff --git a/ompi/mpi/man/man3/MPI_Comm_spawn_multiple.3in b/ompi/mpi/man/man3/MPI_Comm_spawn_multiple.3in index 41ba586cc5f..e7d47de3ea7 100644 --- a/ompi/mpi/man/man3/MPI_Comm_spawn_multiple.3in +++ b/ompi/mpi/man/man3/MPI_Comm_spawn_multiple.3in @@ -1,6 +1,6 @@ .\" -*- nroff -*- .\" Copyright 2013 Los Alamos National Security, LLC. All rights reserved. -.\" Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +.\" Copyright (c) 2010-2018 Cisco Systems, Inc. All rights reserved .\" Copyright 2006-2008 Sun Microsystems, Inc. .\" Copyright (c) 1996 Thinking Machines Corporation .\" $COPYRIGHT$ @@ -179,7 +179,7 @@ ompi_preload_files char * A comma-separated list of files that \fIompi_preload_binary\fP - files can be moved to the target even if an executable is not moved. -ompi_stdin_target char* Comma-delimited list of ranks to +ompi_stdin_target char * Comma-delimited list of ranks to receive stdin when forwarded. ompi_non_mpi bool If set to true, launching a non-MPI application; the returned communicator @@ -192,25 +192,25 @@ ompi_param char * Pass an OMPI MCA parameter to the exists in the environment, the value will be overwritten by the provided value. 
-mapper char* Mapper to be used for this job -map_by char* Mapping directive indicating how +mapper char * Mapper to be used for this job +map_by char * Mapping directive indicating how processes are to be mapped (slot, node, socket, etc.). -rank_by char * Ranking directive indicating how +rank_by char * Ranking directive indicating how processes are to be ranked (slot, node, socket, etc.). -bind_to char * Binding directive indicating how +bind_to char * Binding directive indicating how processes are to be bound (core, slot, node, socket, etc.). -path char* List of directories to search for +path char * List of directories to search for the executable -npernode char* Number of processes to spawn on +npernode char * Number of processes to spawn on each node of the allocation -pernode bool Equivalent to npernode of 1 -ppr char* Spawn specified number of processes - on each of the identified object type -env char* Newline-delimited list of envars to - be passed to the spawned procs +pernode bool Equivalent to npernode of 1 +ppr char * Spawn specified number of processes + on each of the identified object type +env char * Newline-delimited list of envars to + be passed to the spawned procs .fi .sp @@ -249,6 +249,15 @@ parameter; see MPI_Comm_spawn(3)'s description of the .I argv parameter for more details. .sp +MPI-3.1 implies (but does not directly state) that the argument +\fIarray_of_commands\fP must be an array of strings of length +\fIcount\fP. Unlike the \fIarray_of_argv\fP parameter, +\fIarray_of_commands\fP does not need to be terminated with a NULL +pointer in C or a blank string in Fortran. Older versions of Open MPI +required that \fIarray_of_commands\fP be terminated with a blank +string in Fortran; that is no longer required in this version of Open +MPI. 
+.sp
 Calling MPI_Comm_spawn(3) many times would create many sets of children with different MPI_COMM_WORLDs, whereas MPI_Comm_spawn_multiple creates children with a single MPI_COMM_WORLD,
diff --git a/ompi/mpi/man/man3/MPI_Imrecv.3in b/ompi/mpi/man/man3/MPI_Imrecv.3in
index be032498464..b453e7db056 100644
--- a/ompi/mpi/man/man3/MPI_Imrecv.3in
+++ b/ompi/mpi/man/man3/MPI_Imrecv.3in
@@ -22,7 +22,7 @@ USE MPI
 ! or the older form: INCLUDE 'mpif.h'
 MPI_IMRECV(\fIBUF, COUNT, DATATYPE, MESSAGE, REQUEST, IERROR\fP)
  \fIBUF(*)\fP
- INTEGER \fCOUNT, DATATYPE, MESSAGE, REQUEST, IERROR\fP
+ INTEGER \fICOUNT, DATATYPE, MESSAGE, REQUEST, IERROR\fP
 .fi
 .SH Fortran 2008 Syntax
diff --git a/ompi/mpi/man/man3/MPI_Mrecv.3in b/ompi/mpi/man/man3/MPI_Mrecv.3in
index e0f34f8ed60..96037e0a560 100644
--- a/ompi/mpi/man/man3/MPI_Mrecv.3in
+++ b/ompi/mpi/man/man3/MPI_Mrecv.3in
@@ -22,7 +22,7 @@ USE MPI
 ! or the older form: INCLUDE 'mpif.h'
 MPI_MRECV(\fIBUF, COUNT, DATATYPE, MESSAGE, STATUS, IERROR\fP)
  \fIBUF(*)\fP
- INTEGER \fCOUNT, DATATYPE, MESSAGE\fP
+ INTEGER \fICOUNT, DATATYPE, MESSAGE\fP
  INTEGER \fISTATUS(MPI_STATUS_SIZE), IERROR\fP
 .fi
diff --git a/ompi/mpi/man/man3/MPI_Neighbor_alltoallv.3in b/ompi/mpi/man/man3/MPI_Neighbor_alltoallv.3in
index ae211b84adb..aaf678813ab 100644
--- a/ompi/mpi/man/man3/MPI_Neighbor_alltoallv.3in
+++ b/ompi/mpi/man/man3/MPI_Neighbor_alltoallv.3in
@@ -30,7 +30,6 @@ int MPI_Ineighbor_alltoallv(const void *\fIsendbuf\fP, const int \fIsendcounts\f
 .nf
 USE MPI
 ! or the older form: INCLUDE 'mpif.h'
-
 MPI_NEIGHBOR_ALLTOALLV(\fISENDBUF, SENDCOUNTS, SDISPLS, SENDTYPE, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPE, COMM, IERROR\fP)
diff --git a/ompi/mpi/man/man3/MPI_Sizeof.3in b/ompi/mpi/man/man3/MPI_Sizeof.3in
index e6fbf64aaca..de9a3175810 100644
--- a/ompi/mpi/man/man3/MPI_Sizeof.3in
+++ b/ompi/mpi/man/man3/MPI_Sizeof.3in
@@ -23,7 +23,7 @@ INTEGER \fISIZE, IERROR\fP
 .SH Fortran 2008 Syntax
 .nf
 USE mpi_f08
-MPI_Sizeof(\fx\fP, \fIsize\fP, \fIierror\fP)
+MPI_Sizeof(\fIx\fP, \fIsize\fP, \fIierror\fP)
  TYPE(*), DIMENSION(..) :: \fIx\fP
  INTEGER, INTENT(OUT) :: \fIsize\fP
  INTEGER, OPTIONAL, INTENT(OUT) :: \fIierror\fP
diff --git a/ompi/mpi/man/man3/MPI_Type_create_subarray.3in b/ompi/mpi/man/man3/MPI_Type_create_subarray.3in
index 36fd5de3448..ee21a0b9de1 100644
--- a/ompi/mpi/man/man3/MPI_Type_create_subarray.3in
+++ b/ompi/mpi/man/man3/MPI_Type_create_subarray.3in
@@ -13,7 +13,7 @@
 .SH C Syntax
 .nf
 #include <mpi.h>
-int MPI_Type_create_subarray(int \fIndims\fP, const int \fIarray_of_sizes[]\fP, const int \fIarray_of_subsizes[]\fP, const int \fIarray_of_starts[]\fP, int \fIorder\fP, MPI_Datatype \fIoldtype\fO, MPI_Datatype \fI*newtype\fP)
+int MPI_Type_create_subarray(int \fIndims\fP, const int \fIarray_of_sizes[]\fP, const int \fIarray_of_subsizes[]\fP, const int \fIarray_of_starts[]\fP, int \fIorder\fP, MPI_Datatype \fIoldtype\fP, MPI_Datatype \fI*newtype\fP)
 .fi
 .SH Fortran Syntax
diff --git a/ompi/mpi/man/man3/MPI_Win_allocate.3in b/ompi/mpi/man/man3/MPI_Win_allocate.3in
index 0115c4aa662..6f90f807bdc 100644
--- a/ompi/mpi/man/man3/MPI_Win_allocate.3in
+++ b/ompi/mpi/man/man3/MPI_Win_allocate.3in
@@ -22,7 +22,7 @@ int MPI_Win_allocate (MPI_Aint \fIsize\fP, int \fIdisp_unit\fP, MPI_Info \fIinfo
 .nf
 USE MPI
 ! or the older form: INCLUDE 'mpif.h'
-MPI_WIN_ALLOCATE(\fSIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR\fP)
+MPI_WIN_ALLOCATE(\fISIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR\fP)
  INTEGER(KIND=MPI_ADDRESS_KIND) \fISIZE, BASEPTR\fP
  INTEGER \fIDISP_UNIT, INFO, COMM, WIN, IERROR\fP
diff --git a/ompi/mpi/man/man3/MPI_Win_allocate_shared.3in b/ompi/mpi/man/man3/MPI_Win_allocate_shared.3in
index 7ad410ff3b7..8c995fb186d 100644
--- a/ompi/mpi/man/man3/MPI_Win_allocate_shared.3in
+++ b/ompi/mpi/man/man3/MPI_Win_allocate_shared.3in
@@ -22,7 +22,7 @@ int MPI_Win_allocate_shared (MPI_Aint \fIsize\fP, int \fIdisp_unit\fP, MPI_Info
 .nf
 USE MPI
 ! or the older form: INCLUDE 'mpif.h'
-MPI_WIN_ALLOCATE_SHARED(\fSIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR\fP)
+MPI_WIN_ALLOCATE_SHARED(\fISIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR\fP)
  INTEGER(KIND=MPI_ADDRESS_KIND) \fISIZE, BASEPTR\fP
  INTEGER \fIDISP_UNIT, INFO, COMM, WIN, IERROR\fP
diff --git a/ompi/mpi/man/man3/MPI_Win_flush.3in b/ompi/mpi/man/man3/MPI_Win_flush.3in
index 770b4873917..1b41798b0ba 100644
--- a/ompi/mpi/man/man3/MPI_Win_flush.3in
+++ b/ompi/mpi/man/man3/MPI_Win_flush.3in
@@ -25,7 +25,7 @@ USE MPI
 MPI_WIN_FLUSH(\fIRANK, WIN, IERROR\fP)
  INTEGER \fIRANK, WIN, IERROR\fP

-MPI_WIN_FLUSH_ALL(\fWIN, IERROR\fP)
+MPI_WIN_FLUSH_ALL(\fIWIN, IERROR\fP)
  INTEGER \fIWIN, IERROR\fP
 .fi
diff --git a/ompi/mpi/man/man3/MPI_Win_flush_local.3in b/ompi/mpi/man/man3/MPI_Win_flush_local.3in
index dc6044f7a93..440fbfe41f8 100644
--- a/ompi/mpi/man/man3/MPI_Win_flush_local.3in
+++ b/ompi/mpi/man/man3/MPI_Win_flush_local.3in
@@ -25,7 +25,7 @@ USE MPI
 MPI_WIN_FLUSH_LOCAL(\fIRANK, WIN, IERROR\fP)
  INTEGER \fIRANK, WIN, IERROR\fP

-MPI_WIN_FLUSH_LOCAL_ALL(\fWIN, IERROR\fP)
+MPI_WIN_FLUSH_LOCAL_ALL(\fIWIN, IERROR\fP)
  INTEGER \fIWIN, IERROR\fP
 .fi
diff --git a/ompi/mpi/man/man3/MPI_Win_unlock_all.3in b/ompi/mpi/man/man3/MPI_Win_unlock_all.3in
index 6dfe84e0117..480fe0dbc05 100644
--- a/ompi/mpi/man/man3/MPI_Win_unlock_all.3in
+++ b/ompi/mpi/man/man3/MPI_Win_unlock_all.3in
@@ -20,8 +20,8 @@ int MPI_Win_unlock_all(MPI_Win \fIwin\fP)
 .nf
 USE MPI
 ! or the older form: INCLUDE 'mpif.h'
-MPI_WIN_UNLOCK_ALL(\fWIN, IERROR\fP)
- INTEGER \fWIN, IERROR\fP
+MPI_WIN_UNLOCK_ALL(\fIWIN, IERROR\fP)
+ INTEGER \fIWIN, IERROR\fP
 .fi
 .SH Fortran 2008 Syntax
diff --git a/ompi/mpi/tool/category_changed.c b/ompi/mpi/tool/category_changed.c
index d6a18c8ba80..aed854ba669 100644
--- a/ompi/mpi/tool/category_changed.c
+++ b/ompi/mpi/tool/category_changed.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -27,9 +28,9 @@ int MPI_T_category_changed(int *stamp)
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();
     *stamp = mca_base_var_group_get_stamp ();
-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return MPI_SUCCESS;
 }
diff --git a/ompi/mpi/tool/category_get_categories.c b/ompi/mpi/tool/category_get_categories.c
index 5be82880b4e..0e85d9edd42 100644
--- a/ompi/mpi/tool/category_get_categories.c
+++ b/ompi/mpi/tool/category_get_categories.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -32,7 +33,7 @@ int MPI_T_category_get_categories(int cat_index, int len, int indices[])
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = mca_base_var_group_get (cat_index, &group);
@@ -49,7 +50,7 @@ int MPI_T_category_get_categories(int cat_index, int len, int indices[])
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/category_get_cvars.c b/ompi/mpi/tool/category_get_cvars.c
index ea9424f5ca1..9983958aeff 100644
--- a/ompi/mpi/tool/category_get_cvars.c
+++ b/ompi/mpi/tool/category_get_cvars.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -32,7 +33,7 @@ int MPI_T_category_get_cvars(int cat_index, int len, int indices[])
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = mca_base_var_group_get (cat_index, &group);
@@ -49,7 +50,7 @@ int MPI_T_category_get_cvars(int cat_index, int len, int indices[])
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/category_get_index.c b/ompi/mpi/tool/category_get_index.c
index 6edb6f2af4d..f25473c7b8a 100644
--- a/ompi/mpi/tool/category_get_index.c
+++ b/ompi/mpi/tool/category_get_index.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2014 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -33,9 +34,9 @@ int MPI_T_category_get_index (const char *name, int *category_index)
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();
     ret = mca_base_var_group_find_by_name (name, category_index);
-    mpit_unlock ();
+    ompi_mpit_unlock ();
     if (OPAL_SUCCESS != ret) {
         return MPI_T_ERR_INVALID_NAME;
     }
diff --git a/ompi/mpi/tool/category_get_info.c b/ompi/mpi/tool/category_get_info.c
index c10a2aa708d..2b6766e54f2 100644
--- a/ompi/mpi/tool/category_get_info.c
+++ b/ompi/mpi/tool/category_get_info.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -32,7 +33,7 @@ int MPI_T_category_get_info(int cat_index, char *name, int *name_len,
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = mca_base_var_group_get (cat_index, &group);
@@ -57,7 +58,7 @@ int MPI_T_category_get_info(int cat_index, char *name, int *name_len,
         mpit_copy_string (desc, desc_len, group->group_description);
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/category_get_num.c b/ompi/mpi/tool/category_get_num.c
index dbab0b2bf60..cfbfcd8b0e6 100644
--- a/ompi/mpi/tool/category_get_num.c
+++ b/ompi/mpi/tool/category_get_num.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -31,9 +32,9 @@ int MPI_T_category_get_num (int *num_cat)
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();
     *num_cat = mca_base_var_group_get_count ();
-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return MPI_SUCCESS;
 }
diff --git a/ompi/mpi/tool/category_get_pvars.c b/ompi/mpi/tool/category_get_pvars.c
index 3936fb9b022..e6337ed2fe2 100644
--- a/ompi/mpi/tool/category_get_pvars.c
+++ b/ompi/mpi/tool/category_get_pvars.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-213 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -32,7 +33,7 @@ int MPI_T_category_get_pvars(int cat_index, int len, int indices[])
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = mca_base_var_group_get (cat_index, &group);
@@ -49,7 +50,7 @@ int MPI_T_category_get_pvars(int cat_index, int len, int indices[])
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/cvar_get_index.c b/ompi/mpi/tool/cvar_get_index.c
index e587adf7f34..2445d0462c4 100644
--- a/ompi/mpi/tool/cvar_get_index.c
+++ b/ompi/mpi/tool/cvar_get_index.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2014 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -33,9 +34,9 @@ int MPI_T_cvar_get_index (const char *name, int *cvar_index)
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();
     ret = mca_base_var_find_by_name (name, cvar_index);
-    mpit_unlock ();
+    ompi_mpit_unlock ();
     if (OPAL_SUCCESS != ret) {
         return MPI_T_ERR_INVALID_NAME;
     }
diff --git a/ompi/mpi/tool/cvar_get_info.c b/ompi/mpi/tool/cvar_get_info.c
index e6f70c0c749..ba3bde12f8e 100644
--- a/ompi/mpi/tool/cvar_get_info.c
+++ b/ompi/mpi/tool/cvar_get_info.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -32,7 +33,7 @@ int MPI_T_cvar_get_info(int cvar_index, char *name, int *name_len, int *verbosit
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = mca_base_var_get (cvar_index, &var);
@@ -69,7 +70,7 @@ int MPI_T_cvar_get_info(int cvar_index, char *name, int *name_len, int *verbosit
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/cvar_get_num.c b/ompi/mpi/tool/cvar_get_num.c
index 7ece8df6d84..10e04514eee 100644
--- a/ompi/mpi/tool/cvar_get_num.c
+++ b/ompi/mpi/tool/cvar_get_num.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -30,9 +31,9 @@ int MPI_T_cvar_get_num (int *num_cvar) {
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();
     *num_cvar = mca_base_var_get_count();
-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return MPI_SUCCESS;
 }
diff --git a/ompi/mpi/tool/cvar_handle_alloc.c b/ompi/mpi/tool/cvar_handle_alloc.c
index 0ef8eea42de..6e0ae41dd3f 100644
--- a/ompi/mpi/tool/cvar_handle_alloc.c
+++ b/ompi/mpi/tool/cvar_handle_alloc.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -35,7 +36,7 @@ int MPI_T_cvar_handle_alloc (int cvar_index, void *obj_handle,
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     *handle = NULL;

@@ -68,7 +69,7 @@ int MPI_T_cvar_handle_alloc (int cvar_index, void *obj_handle,
         *handle = (MPI_T_cvar_handle) new_handle;
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/cvar_read.c b/ompi/mpi/tool/cvar_read.c
index e79df41f81a..2246c5f88be 100644
--- a/ompi/mpi/tool/cvar_read.c
+++ b/ompi/mpi/tool/cvar_read.c
@@ -1,9 +1,10 @@
 /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
- * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2012-2018 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2016 Intel, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -35,7 +36,7 @@ int MPI_T_cvar_read (MPI_T_cvar_handle handle, void *buf)
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = mca_base_var_get_value(handle->var->mbv_index, &value, NULL, NULL);
@@ -50,6 +51,15 @@ int MPI_T_cvar_read (MPI_T_cvar_handle handle, void *buf)
         case MCA_BASE_VAR_TYPE_UNSIGNED_INT:
             ((int *) buf)[0] = value->intval;
             break;
+        case MCA_BASE_VAR_TYPE_INT32_T:
+        case MCA_BASE_VAR_TYPE_UINT32_T:
+            ((int32_t *) buf)[0] = value->int32tval;
+            break;
+        case MCA_BASE_VAR_TYPE_INT64_T:
+        case MCA_BASE_VAR_TYPE_UINT64_T:
+            ((int64_t *) buf)[0] = value->int64tval;
+            break;
+        case MCA_BASE_VAR_TYPE_LONG:
         case MCA_BASE_VAR_TYPE_UNSIGNED_LONG:
             ((unsigned long *) buf)[0] = value->ulval;
             break;
@@ -60,7 +70,7 @@ int MPI_T_cvar_read (MPI_T_cvar_handle handle, void *buf)
             ((size_t *) buf)[0] = value->sizetval;
             break;
         case MCA_BASE_VAR_TYPE_BOOL:
-            ((int *) buf)[0] = value->boolval;
+            ((bool *) buf)[0] = value->boolval;
             break;
         case MCA_BASE_VAR_TYPE_DOUBLE:
             ((double *) buf)[0] = value->lfval;
@@ -78,7 +88,7 @@ int MPI_T_cvar_read (MPI_T_cvar_handle handle, void *buf)
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/cvar_write.c b/ompi/mpi/tool/cvar_write.c
index a76e6a39c55..4d660416e0a 100644
--- a/ompi/mpi/tool/cvar_write.c
+++ b/ompi/mpi/tool/cvar_write.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -33,7 +34,7 @@ int MPI_T_cvar_write (MPI_T_cvar_handle handle, const void *buf)
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         if (MCA_BASE_VAR_SCOPE_CONSTANT == handle->var->mbv_scope ||
@@ -53,7 +54,7 @@ int MPI_T_cvar_write (MPI_T_cvar_handle handle, const void *buf)
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/enum_get_info.c b/ompi/mpi/tool/enum_get_info.c
index 129682c2d3f..4e87bd0a676 100644
--- a/ompi/mpi/tool/enum_get_info.c
+++ b/ompi/mpi/tool/enum_get_info.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -29,7 +30,7 @@ int MPI_T_enum_get_info(MPI_T_enum enumtype, int *num, char *name, int *name_len
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         if (num) {
@@ -43,7 +44,7 @@ int MPI_T_enum_get_info(MPI_T_enum enumtype, int *num, char *name, int *name_len
         mpit_copy_string (name, name_len, enumtype->enum_name);
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/enum_get_item.c b/ompi/mpi/tool/enum_get_item.c
index f86f3abecd4..e9e8fff9ac2 100644
--- a/ompi/mpi/tool/enum_get_item.c
+++ b/ompi/mpi/tool/enum_get_item.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -31,7 +32,7 @@ int MPI_T_enum_get_item(MPI_T_enum enumtype, int index, int *value, char *name,
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         rc = enumtype->get_count (enumtype, &count);
@@ -54,7 +55,7 @@ int MPI_T_enum_get_item(MPI_T_enum enumtype, int index, int *value, char *name,
         mpit_copy_string(name, name_len, tmp);
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/finalize.c b/ompi/mpi/tool/finalize.c
index 38a0ce31ee9..27abe888b3d 100644
--- a/ompi/mpi/tool/finalize.c
+++ b/ompi/mpi/tool/finalize.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -28,14 +29,14 @@

 int MPI_T_finalize (void)
 {
-    mpit_lock ();
+    ompi_mpit_lock ();

     if (!mpit_is_initialized ()) {
-        mpit_unlock ();
+        ompi_mpit_unlock ();
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    if (0 == --mpit_init_count) {
+    if (0 == --ompi_mpit_init_count) {
         (void) ompi_info_close_components ();

         if ((!ompi_mpi_initialized || ompi_mpi_finalized) &&
@@ -49,7 +50,7 @@ int MPI_T_finalize (void)
         (void) opal_finalize_util ();
     }

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return MPI_SUCCESS;
 }
diff --git a/ompi/mpi/tool/init_thread.c b/ompi/mpi/tool/init_thread.c
index 8f0fb6b3c62..53c8e4cf988 100644
--- a/ompi/mpi/tool/init_thread.c
+++ b/ompi/mpi/tool/init_thread.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -24,9 +25,9 @@
 #include "ompi/mpi/tool/profile/defines.h"
 #endif

-extern opal_mutex_t mpit_big_lock;
+extern opal_mutex_t ompi_mpit_big_lock;

-extern volatile uint32_t mpit_init_count;
+extern volatile uint32_t ompi_mpit_init_count;

 extern volatile int32_t initted;

@@ -34,10 +35,10 @@ int MPI_T_init_thread (int required, int *provided)
 {
     int rc = MPI_SUCCESS;

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
-        if (0 != mpit_init_count++) {
+        if (0 != ompi_mpit_init_count++) {
             break;
         }

@@ -60,7 +61,7 @@ int MPI_T_init_thread (int required, int *provided)
         ompi_mpi_thread_level (required, provided);
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return rc;
 }
diff --git a/ompi/mpi/tool/mpit-internal.h b/ompi/mpi/tool/mpit-internal.h
index 557472743b6..fb6c6b68684 100644
--- a/ompi/mpi/tool/mpit-internal.h
+++ b/ompi/mpi/tool/mpit-internal.h
@@ -3,6 +3,7 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2011 UT-Battelle, LLC. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -31,17 +32,17 @@ typedef struct ompi_mpit_cvar_handle_t {
     void *bound_object;
 } ompi_mpit_cvar_handle_t;

-void mpit_lock (void);
-void mpit_unlock (void);
+void ompi_mpit_lock (void);
+void ompi_mpit_unlock (void);

-extern volatile uint32_t mpit_init_count;
+extern volatile uint32_t ompi_mpit_init_count;

 int ompit_var_type_to_datatype (mca_base_var_type_t type, MPI_Datatype *datatype);
 int ompit_opal_to_mpit_error (int rc);

 static inline int mpit_is_initialized (void)
 {
-    return !!mpit_init_count;
+    return !!ompi_mpit_init_count;
 }

 static inline void mpit_copy_string (char *dest, int *len, const char *source)
diff --git a/ompi/mpi/tool/mpit_common.c b/ompi/mpi/tool/mpit_common.c
index 9443402c207..d30e1b89c94 100644
--- a/ompi/mpi/tool/mpit_common.c
+++ b/ompi/mpi/tool/mpit_common.c
@@ -1,9 +1,10 @@
 /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
- * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2012-2018 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2015 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -13,67 +14,55 @@

 #include "ompi/mpi/tool/mpit-internal.h"

-opal_mutex_t mpit_big_lock = OPAL_MUTEX_STATIC_INIT;
+opal_mutex_t ompi_mpit_big_lock = OPAL_MUTEX_STATIC_INIT;

-volatile uint32_t mpit_init_count = 0;
+volatile uint32_t ompi_mpit_init_count = 0;

-void mpit_lock (void)
+void ompi_mpit_lock (void)
 {
-    opal_mutex_lock (&mpit_big_lock);
+    opal_mutex_lock (&ompi_mpit_big_lock);
 }

-void mpit_unlock (void)
+void ompi_mpit_unlock (void)
 {
-    opal_mutex_unlock (&mpit_big_lock);
+    opal_mutex_unlock (&ompi_mpit_big_lock);
 }

+static MPI_Datatype mca_to_mpi_datatypes[MCA_BASE_VAR_TYPE_MAX] = {
+    [MCA_BASE_VAR_TYPE_INT] = MPI_INT,
+    [MCA_BASE_VAR_TYPE_UNSIGNED_INT] = MPI_UNSIGNED,
+    [MCA_BASE_VAR_TYPE_UNSIGNED_LONG] = MPI_UNSIGNED_LONG,
+    [MCA_BASE_VAR_TYPE_UNSIGNED_LONG_LONG] = MPI_UNSIGNED_LONG_LONG,
+
+#if SIZEOF_SIZE_T == SIZEOF_UNSIGNED_INT
+    [MCA_BASE_VAR_TYPE_SIZE_T] = MPI_UNSIGNED,
+#elif SIZEOF_SIZE_T == SIZEOF_UNSIGNED_LONG
+    [MCA_BASE_VAR_TYPE_SIZE_T] = MPI_UNSIGNED_LONG,
+#elif SIZEOF_SIZE_T == SIZEOF_LONG_LONG
+    [MCA_BASE_VAR_TYPE_SIZE_T] = MPI_UNSIGNED_LONG_LONG,
+#else
+    [MCA_BASE_VAR_TYPE_SIZE_T] = NULL,
+#endif
+
+    [MCA_BASE_VAR_TYPE_STRING] = MPI_CHAR,
+    [MCA_BASE_VAR_TYPE_VERSION_STRING] = MPI_CHAR,
+    [MCA_BASE_VAR_TYPE_BOOL] = MPI_C_BOOL,
+    [MCA_BASE_VAR_TYPE_DOUBLE] = MPI_DOUBLE,
+    [MCA_BASE_VAR_TYPE_LONG] = MPI_LONG,
+    [MCA_BASE_VAR_TYPE_INT32_T] = MPI_INT32_T,
+    [MCA_BASE_VAR_TYPE_UINT32_T] = MPI_UINT32_T,
+    [MCA_BASE_VAR_TYPE_INT64_T] = MPI_INT64_T,
+    [MCA_BASE_VAR_TYPE_UINT64_T] = MPI_UINT64_T,
+};
+
 int ompit_var_type_to_datatype (mca_base_var_type_t type, MPI_Datatype *datatype)
 {
     if (!datatype) {
         return OMPI_SUCCESS;
     }

-    switch (type) {
-    case MCA_BASE_VAR_TYPE_INT:
-        *datatype = MPI_INT;
-        break;
-    case MCA_BASE_VAR_TYPE_UNSIGNED_INT:
-        *datatype = MPI_UNSIGNED;
-        break;
-    case MCA_BASE_VAR_TYPE_UNSIGNED_LONG:
-        *datatype = MPI_UNSIGNED_LONG;
-        break;
-    case MCA_BASE_VAR_TYPE_UNSIGNED_LONG_LONG:
-        *datatype = MPI_UNSIGNED_LONG_LONG;
-        break;
-    case MCA_BASE_VAR_TYPE_SIZE_T:
-        if (sizeof (size_t) == sizeof (unsigned)) {
-            *datatype = MPI_UNSIGNED;
-        } else if (sizeof (size_t) == sizeof (unsigned long)) {
-            *datatype = MPI_UNSIGNED_LONG;
-        } else if (sizeof (size_t) == sizeof (unsigned long long)) {
-            *datatype = MPI_UNSIGNED_LONG_LONG;
-        } else {
-            /* not supported -- fixme */
-            assert (0);
-        }
-
-        break;
-    case MCA_BASE_VAR_TYPE_STRING:
-    case MCA_BASE_VAR_TYPE_VERSION_STRING:
-        *datatype = MPI_CHAR;
-        break;
-    case MCA_BASE_VAR_TYPE_BOOL:
-        *datatype = MPI_INT;
-        break;
-    case MCA_BASE_VAR_TYPE_DOUBLE:
-        *datatype = MPI_DOUBLE;
-        break;
-    default:
-        /* not supported -- fixme */
-        assert (0);
-        break;
-    }
+    *datatype = mca_to_mpi_datatypes[type];
+    assert (*datatype);

     return OMPI_SUCCESS;
 }
diff --git a/ompi/mpi/tool/pvar_get_index.c b/ompi/mpi/tool/pvar_get_index.c
index 88e71c5b4fe..b7d5d5e5244 100644
--- a/ompi/mpi/tool/pvar_get_index.c
+++ b/ompi/mpi/tool/pvar_get_index.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -33,9 +34,9 @@ int MPI_T_pvar_get_index (const char *name, int var_class, int *pvar_index)
         return MPI_ERR_ARG;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();
     ret = mca_base_pvar_find_by_name (name, var_class, pvar_index);
-    mpit_unlock ();
+    ompi_mpit_unlock ();
     if (OPAL_SUCCESS != ret) {
         return MPI_T_ERR_INVALID_NAME;
     }
diff --git a/ompi/mpi/tool/pvar_get_info.c b/ompi/mpi/tool/pvar_get_info.c
index 92aec5bea7b..8121558f49c 100644
--- a/ompi/mpi/tool/pvar_get_info.c
+++ b/ompi/mpi/tool/pvar_get_info.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -33,7 +34,7 @@ int MPI_T_pvar_get_info(int pvar_index, char *name, int *name_len,
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         /* Find the performance variable. mca_base_pvar_get() handles the
@@ -88,7 +89,7 @@ int MPI_T_pvar_get_info(int pvar_index, char *name, int *name_len,
         }
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ret;
 }
diff --git a/ompi/mpi/tool/pvar_handle_alloc.c b/ompi/mpi/tool/pvar_handle_alloc.c
index 504fc6f74f0..770f51323a4 100644
--- a/ompi/mpi/tool/pvar_handle_alloc.c
+++ b/ompi/mpi/tool/pvar_handle_alloc.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -31,7 +32,7 @@ int MPI_T_pvar_handle_alloc(MPI_T_pvar_session session, int pvar_index,
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         /* Find the performance variable. mca_base_pvar_get() handles the
@@ -52,7 +53,7 @@ int MPI_T_pvar_handle_alloc(MPI_T_pvar_session session, int pvar_index,
                                           handle, count);
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ompit_opal_to_mpit_error(ret);
 }
diff --git a/ompi/mpi/tool/pvar_handle_free.c b/ompi/mpi/tool/pvar_handle_free.c
index 9e50577d5b0..095964778ff 100644
--- a/ompi/mpi/tool/pvar_handle_free.c
+++ b/ompi/mpi/tool/pvar_handle_free.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -29,7 +30,7 @@ int MPI_T_pvar_handle_free(MPI_T_pvar_session session, MPI_T_pvar_handle *handle
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     do {
         /* Check that this is a valid handle */
@@ -49,7 +50,7 @@ int MPI_T_pvar_handle_free(MPI_T_pvar_session session, MPI_T_pvar_handle *handle
         *handle = MPI_T_PVAR_HANDLE_NULL;
     } while (0);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ret;
 }
diff --git a/ompi/mpi/tool/pvar_read.c b/ompi/mpi/tool/pvar_read.c
index 6710a3018e8..8314c9d4291 100644
--- a/ompi/mpi/tool/pvar_read.c
+++ b/ompi/mpi/tool/pvar_read.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -34,11 +35,11 @@ int MPI_T_pvar_read(MPI_T_pvar_session session, MPI_T_pvar_handle handle,
         return MPI_T_ERR_INVALID_HANDLE;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     ret = mca_base_pvar_handle_read_value (handle, buf);

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ompit_opal_to_mpit_error (ret);
 }
diff --git a/ompi/mpi/tool/pvar_reset.c b/ompi/mpi/tool/pvar_reset.c
index cf05f58ea82..80e0bdeded5 100644
--- a/ompi/mpi/tool/pvar_reset.c
+++ b/ompi/mpi/tool/pvar_reset.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -29,7 +30,7 @@ int MPI_T_pvar_reset(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     if (MPI_T_PVAR_ALL_HANDLES == handle) {
         OPAL_LIST_FOREACH(handle, &session->handles, mca_base_pvar_handle_t) {
@@ -44,7 +45,7 @@ int MPI_T_pvar_reset(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
         ret = mca_base_pvar_handle_reset (handle);
     }

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ompit_opal_to_mpit_error (ret);
 }
diff --git a/ompi/mpi/tool/pvar_session_create.c b/ompi/mpi/tool/pvar_session_create.c
index 204a27d3fc0..6389125d529 100644
--- a/ompi/mpi/tool/pvar_session_create.c
+++ b/ompi/mpi/tool/pvar_session_create.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -29,14 +30,14 @@ int MPI_T_pvar_session_create(MPI_T_pvar_session *session)
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     *session = OBJ_NEW(mca_base_pvar_session_t);
     if (NULL == *session) {
         ret = MPI_ERR_NO_MEM;
     }

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ret;
 }
diff --git a/ompi/mpi/tool/pvar_start.c b/ompi/mpi/tool/pvar_start.c
index 667c3cc486c..d2fce3fa2a6 100644
--- a/ompi/mpi/tool/pvar_start.c
+++ b/ompi/mpi/tool/pvar_start.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -38,7 +39,7 @@ int MPI_T_pvar_start(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     if (MPI_T_PVAR_ALL_HANDLES == handle) {
         OPAL_LIST_FOREACH(handle, &session->handles, mca_base_pvar_handle_t) {
@@ -53,7 +54,7 @@ int MPI_T_pvar_start(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
         ret = pvar_handle_start (handle);
     }

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ompit_opal_to_mpit_error (ret);
 }
diff --git a/ompi/mpi/tool/pvar_stop.c b/ompi/mpi/tool/pvar_stop.c
index 0866ac46a03..8923bbbf7b6 100644
--- a/ompi/mpi/tool/pvar_stop.c
+++ b/ompi/mpi/tool/pvar_stop.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -38,7 +39,7 @@ int MPI_T_pvar_stop(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
         return MPI_T_ERR_NOT_INITIALIZED;
     }

-    mpit_lock ();
+    ompi_mpit_lock ();

     if (MPI_T_PVAR_ALL_HANDLES == handle) {
         OPAL_LIST_FOREACH(handle, &session->handles, mca_base_pvar_handle_t) {
@@ -55,7 +56,7 @@ int MPI_T_pvar_stop(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
         ret = pvar_handle_stop (handle);
     }

-    mpit_unlock ();
+    ompi_mpit_unlock ();

     return ompit_opal_to_mpit_error (ret);
 }
diff --git a/ompi/mpi/tool/pvar_write.c b/ompi/mpi/tool/pvar_write.c
index 3f5368d552e..5bd17213600 100644
--- a/ompi/mpi/tool/pvar_write.c
+++ b/ompi/mpi/tool/pvar_write.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -34,11 +35,11 @@ int MPI_T_pvar_write(MPI_T_pvar_session session, MPI_T_pvar_handle handle, return MPI_T_ERR_INVALID_HANDLE; } - mpit_lock (); + ompi_mpit_lock (); ret = mca_base_pvar_handle_write_value (handle, buf); - mpit_unlock (); + ompi_mpit_unlock (); return ompit_opal_to_mpit_error (ret); } diff --git a/ompi/mpiext/affinity/c/mpiext_affinity_str.c b/ompi/mpiext/affinity/c/mpiext_affinity_str.c index bc6412da665..6ccfa551a47 100644 --- a/ompi/mpiext/affinity/c/mpiext_affinity_str.c +++ b/ompi/mpiext/affinity/c/mpiext_affinity_str.c @@ -8,7 +8,7 @@ * reserved. * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -34,6 +34,7 @@ #include "ompi/communicator/communicator.h" #include "ompi/errhandler/errhandler.h" +#include "ompi/mca/rte/rte.h" #include "ompi/mpi/c/bindings.h" #include "ompi/mpiext/affinity/c/mpiext_affinity_c.h" @@ -104,12 +105,12 @@ static int get_rsrc_ompi_bound(char str[OMPI_AFFINITY_STRING_MAX]) return OMPI_SUCCESS; } - if (NULL == orte_proc_applied_binding) { + if (NULL == ompi_proc_applied_binding) { ret = OPAL_ERR_NOT_BOUND; } else { ret = opal_hwloc_base_cset2str(str, OMPI_AFFINITY_STRING_MAX, opal_hwloc_topology, - orte_proc_applied_binding); + ompi_proc_applied_binding); } if (OPAL_ERR_NOT_BOUND == ret) { strncpy(str, not_bound_str, OMPI_AFFINITY_STRING_MAX - 1); @@ -131,7 +132,7 @@ static int get_rsrc_current_binding(char str[OMPI_AFFINITY_STRING_MAX]) /* get our root object */ root = hwloc_get_root_obj(opal_hwloc_topology); - rootset = opal_hwloc_base_get_available_cpus(opal_hwloc_topology, root); + rootset = root->cpuset; /* get our bindings */ boundset = hwloc_bitmap_alloc(); @@ -297,12 +298,12 @@ static int get_layout_ompi_bound(char 
str[OMPI_AFFINITY_STRING_MAX]) } /* Find out what OMPI bound us to and prettyprint it */ - if (NULL == orte_proc_applied_binding) { + if (NULL == ompi_proc_applied_binding) { ret = OPAL_ERR_NOT_BOUND; } else { ret = opal_hwloc_base_cset2mapstr(str, OMPI_AFFINITY_STRING_MAX, opal_hwloc_topology, - orte_proc_applied_binding); + ompi_proc_applied_binding); } if (OPAL_ERR_NOT_BOUND == ret) { strncpy(str, not_bound_str, OMPI_AFFINITY_STRING_MAX - 1); @@ -324,7 +325,7 @@ static int get_layout_current_binding(char str[OMPI_AFFINITY_STRING_MAX]) /* get our root object */ root = hwloc_get_root_obj(opal_hwloc_topology); - rootset = opal_hwloc_base_get_available_cpus(opal_hwloc_topology, root); + rootset = root->cpuset; /* get our bindings */ boundset = hwloc_bitmap_alloc(); diff --git a/ompi/mpiext/cr/c/quiesce_start.c b/ompi/mpiext/cr/c/quiesce_start.c index 9b61ebe6d0a..3c15ab2964a 100644 --- a/ompi/mpiext/cr/c/quiesce_start.c +++ b/ompi/mpiext/cr/c/quiesce_start.c @@ -6,6 +6,7 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -76,7 +77,7 @@ int OMPI_CR_Quiesce_start(MPI_Comm commP, MPI_Info *info) /* * (Old) info logic */ - /*ompi_info_set((ompi_info_t*)*info, "target", cur_datum.target_dir);*/ + /*opal_info_set((opal_info_t*)*info, "target", cur_datum.target_dir);*/ return ret; } @@ -123,7 +124,7 @@ int OMPI_CR_Quiesce_start(MPI_Comm commP, MPI_Info *info) * 1 = Memory must be in user space (i.e., not on network card * */ -static int extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t *datum) +static int extract_info_into_datum(opal_info_t *info, orte_snapc_base_quiesce_t *datum) { int info_flag = false; int max_crs_len = 32; @@ -135,7 +136,7 @@ static int extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t /* * Key: crs */ - ompi_info_get(info, "crs", max_crs_len, info_char, &info_flag); + opal_info_get(info, "crs", max_crs_len, info_char, &info_flag); if( info_flag) { datum->crs_name = strdup(info_char); } @@ -143,7 +144,7 @@ static int extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t /* * Key: cmdline */ - ompi_info_get(info, "cmdline", OPAL_PATH_MAX, info_char, &info_flag); + opal_info_get(info, "cmdline", OPAL_PATH_MAX, info_char, &info_flag); if( info_flag) { datum->cmdline = strdup(info_char); } @@ -151,7 +152,7 @@ static int extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t /* * Key: handle */ - ompi_info_get(info, "handle", OPAL_PATH_MAX, info_char, &info_flag); + opal_info_get(info, "handle", OPAL_PATH_MAX, info_char, &info_flag); if( info_flag) { datum->handle = strdup(info_char); } @@ -159,7 +160,7 @@ static int extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t /* * Key: target */ - ompi_info_get(info, "target", OPAL_PATH_MAX, info_char, &info_flag); + opal_info_get(info, "target", OPAL_PATH_MAX, info_char, &info_flag); if( info_flag) { datum->target_dir = strdup(info_char); } @@ -167,7 +168,7 @@ static int 
extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t /* * Key: restarting */ - ompi_info_get_bool(info, "restarting", &info_bool, &info_flag); + opal_info_get_bool(info, "restarting", &info_bool, &info_flag); if( info_flag ) { datum->restarting = info_bool; } else { @@ -177,7 +178,7 @@ static int extract_info_into_datum(ompi_info_t *info, orte_snapc_base_quiesce_t /* * Key: checkpointing */ - ompi_info_get_bool(info, "checkpointing", &info_bool, &info_flag); + opal_info_get_bool(info, "checkpointing", &info_bool, &info_flag); if( info_flag ) { datum->checkpointing = info_bool; } else { diff --git a/ompi/mpiext/example/use-mpi-f08/Makefile.am b/ompi/mpiext/example/use-mpi-f08/Makefile.am index 656a036f098..f495b4414d6 100644 --- a/ompi/mpiext/example/use-mpi-f08/Makefile.am +++ b/ompi/mpiext/example/use-mpi-f08/Makefile.am @@ -1,5 +1,7 @@ # # Copyright (c) 2011-2012 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -12,7 +14,7 @@ # We must set these #defines and include paths so that the inner OMPI # MPI prototype header files do the Right Thing. -AM_FCFLAGS = $(OMPI_FC_MODULE_FLAG)$(top_builddir)/ompi/mpi/fortran/base \ +AM_FCFLAGS = $(OMPI_FC_MODULE_FLAG)$(top_builddir)/ompi/mpi/fortran/use-mpi-f08/mod \ -I$(top_srcdir) $(FCFLAGS_f90) # Note that the mpi_f08-based bindings are optional -- they can only diff --git a/ompi/op/op.h b/ompi/op/op.h index a99f64e9521..aa52688cb27 100644 --- a/ompi/op/op.h +++ b/ompi/op/op.h @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008 UT-Battelle, LLC - * Copyright (c) 2008-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. 
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. @@ -199,7 +199,7 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_op_t); * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. */ -#define PREDEFINED_OP_PAD (sizeof(void*) * 256) +#define PREDEFINED_OP_PAD 2048 struct ompi_predefined_op_t { struct ompi_op_t op; diff --git a/ompi/patterns/comm/allgather.c b/ompi/patterns/comm/allgather.c index 48321bf3cf4..1dbaafae770 100644 --- a/ompi/patterns/comm/allgather.c +++ b/ompi/patterns/comm/allgather.c @@ -3,8 +3,9 @@ * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2014 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -27,7 +28,7 @@ /** * All-reduce - subgroup in communicator */ -OMPI_DECLSPEC int comm_allgather_pml(void *src_buf, void *dest_buf, int count, +OMPI_DECLSPEC int ompi_comm_allgather_pml(void *src_buf, void *dest_buf, int count, ompi_datatype_t *dtype, int my_rank_in_group, int n_peers, int *ranks_in_comm,ompi_communicator_t *comm) { @@ -40,7 +41,7 @@ OMPI_DECLSPEC int comm_allgather_pml(void *src_buf, void *dest_buf, int count, netpatterns_pair_exchange_node_t my_exchange_node; size_t message_extent,current_data_extent,current_data_count; size_t dt_size; - OPAL_PTRDIFF_TYPE dt_extent; + ptrdiff_t dt_extent; char *src_buf_current; char *dest_buf_current; struct iovec send_iov[2] = {{0,0},{0,0}}, @@ -76,7 +77,7 @@ OMPI_DECLSPEC int comm_allgather_pml(void *src_buf, void *dest_buf, int count, /* get my reduction communication pattern */ memset(&my_exchange_node, 0, sizeof(netpatterns_pair_exchange_node_t)); - rc = netpatterns_setup_recursive_doubling_tree_node(n_peers, + rc = ompi_netpatterns_setup_recursive_doubling_tree_node(n_peers, my_rank_in_group, &my_exchange_node); if(OMPI_SUCCESS != rc){ return rc; @@ -283,7 +284,7 @@ OMPI_DECLSPEC int comm_allgather_pml(void *src_buf, void *dest_buf, int count, } } - netpatterns_cleanup_recursive_doubling_tree_node(&my_exchange_node); + ompi_netpatterns_cleanup_recursive_doubling_tree_node(&my_exchange_node); /* return */ return OMPI_SUCCESS; diff --git a/ompi/patterns/comm/allreduce.c b/ompi/patterns/comm/allreduce.c index 2fbf9e21773..7bd779a3554 100644 --- a/ompi/patterns/comm/allreduce.c +++ b/ompi/patterns/comm/allreduce.c @@ -3,8 +3,9 @@ * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2014 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). 
All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -22,13 +23,14 @@ #include "opal/include/opal/sys/atomic.h" #include "ompi/mca/pml/pml.h" #include "ompi/patterns/net/netpatterns.h" +#include "ompi/mca/coll/base/coll_base_util.h" #include "coll_ops.h" #include "commpatterns.h" /** * All-reduce for contigous primitive types */ -OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, +OMPI_DECLSPEC int ompi_comm_allreduce_pml(void *sbuf, void *rbuf, int count, ompi_datatype_t *dtype, int my_rank_in_group, struct ompi_op_t *op, int n_peers,int *ranks_in_comm, ompi_communicator_t *comm) @@ -42,7 +44,6 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, char scratch_bufers[2][MAX_TMP_BUFFER]; int send_buffer=0,recv_buffer=1; char *sbuf_current, *rbuf_current; - ompi_request_t *requests[2]; /* get size of data needed - same layout as user data, so that * we can apply the reudction routines directly on these buffers @@ -51,7 +52,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, if( OMPI_SUCCESS != rc ) { goto Error; } - rc = ompi_datatype_type_extent(dtype, (OPAL_PTRDIFF_TYPE *)&dt_extent); + rc = ompi_datatype_type_extent(dtype, (ptrdiff_t *)&dt_extent); if( OMPI_SUCCESS != rc ) { goto Error; } @@ -79,7 +80,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, /* get my reduction communication pattern */ memset(&my_exchange_node, 0, sizeof(netpatterns_pair_exchange_node_t)); - rc = netpatterns_setup_recursive_doubling_tree_node(n_peers, + rc = ompi_netpatterns_setup_recursive_doubling_tree_node(n_peers, my_rank_in_group, &my_exchange_node); if(OMPI_SUCCESS != rc){ return rc; @@ -118,7 +119,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, -OMPI_COMMON_TAG_ALLREDUCE, comm, MPI_STATUSES_IGNORE)); if( 0 > rc ) { - fprintf(stderr," first recv failed in comm_allreduce_pml \n"); 
+ fprintf(stderr," first recv failed in ompi_comm_allreduce_pml \n"); fflush(stderr); goto Error; } @@ -144,7 +145,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, -OMPI_COMMON_TAG_ALLREDUCE, MCA_PML_BASE_SEND_STANDARD, comm)); if( 0 > rc ) { - fprintf(stderr," first send failed in comm_allreduce_pml \n"); + fprintf(stderr," first send failed in ompi_comm_allreduce_pml \n"); fflush(stderr); goto Error; } @@ -165,32 +166,20 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, /* is the remote data read */ pair_rank=my_exchange_node.rank_exchanges[exchange]; - /* post non-blocking receive */ - rc=MCA_PML_CALL(irecv(scratch_bufers[recv_buffer], - count_this_stripe,dtype,ranks_in_comm[pair_rank], - -OMPI_COMMON_TAG_ALLREDUCE, - comm,&(requests[0]))); + rc=ompi_coll_base_sendrecv_actual(scratch_bufers[send_buffer], + count_this_stripe,dtype, ranks_in_comm[pair_rank], + -OMPI_COMMON_TAG_ALLREDUCE, + scratch_bufers[recv_buffer], + count_this_stripe,dtype,ranks_in_comm[pair_rank], + -OMPI_COMMON_TAG_ALLREDUCE, + comm, MPI_STATUS_IGNORE); if( 0 > rc ) { - fprintf(stderr," irecv failed in comm_allreduce_pml at iterations %d \n", + fprintf(stderr," irecv failed in ompi_comm_allreduce_pml at iterations %d \n", exchange); fflush(stderr); goto Error; } - /* post non-blocking send */ - rc=MCA_PML_CALL(isend(scratch_bufers[send_buffer], - count_this_stripe,dtype, ranks_in_comm[pair_rank], - -OMPI_COMMON_TAG_ALLREDUCE,MCA_PML_BASE_SEND_STANDARD, - comm,&(requests[1]))); - if( 0 > rc ) { - fprintf(stderr," isend failed in comm_allreduce_pml at iterations %d \n", - exchange); - fflush(stderr); - goto Error; - } - /* wait on send and receive completion */ - ompi_request_wait_all(2,requests,MPI_STATUSES_IGNORE); - /* reduce the data */ if( 0 < count_this_stripe ) { ompi_op_reduce(op, @@ -217,7 +206,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, -OMPI_COMMON_TAG_ALLREDUCE, comm, MPI_STATUSES_IGNORE)); if( 0 
> rc ) { - fprintf(stderr," last recv failed in comm_allreduce_pml \n"); + fprintf(stderr," last recv failed in ompi_comm_allreduce_pml \n"); fflush(stderr); goto Error; } @@ -235,7 +224,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, -OMPI_COMMON_TAG_ALLREDUCE, MCA_PML_BASE_SEND_STANDARD, comm)); if( 0 > rc ) { - fprintf(stderr," last send failed in comm_allreduce_pml \n"); + fprintf(stderr," last send failed in ompi_comm_allreduce_pml \n"); fflush(stderr); goto Error; } @@ -250,7 +239,7 @@ OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, count_processed += count_this_stripe; } - netpatterns_cleanup_recursive_doubling_tree_node(&my_exchange_node); + ompi_netpatterns_cleanup_recursive_doubling_tree_node(&my_exchange_node); /* return */ return OMPI_SUCCESS; diff --git a/ompi/patterns/comm/bcast.c b/ompi/patterns/comm/bcast.c index 2a25d495db6..bc54613cc01 100644 --- a/ompi/patterns/comm/bcast.c +++ b/ompi/patterns/comm/bcast.c @@ -5,6 +5,7 @@ * All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -29,7 +30,7 @@ * This is a very simple algorithm - binary tree, transmitting the full * message at each step. 
*/ -OMPI_DECLSPEC int comm_bcast_pml(void *buffer, int root, int count, +OMPI_DECLSPEC int ompi_comm_bcast_pml(void *buffer, int root, int count, ompi_datatype_t *dtype, int my_rank_in_group, int n_peers, int *ranks_in_comm,ompi_communicator_t *comm) { @@ -47,7 +48,7 @@ OMPI_DECLSPEC int comm_bcast_pml(void *buffer, int root, int count, /* * compute my communication pattern - binary tree */ - rc=netpatterns_setup_narray_tree(2, node_rank, n_peers, + rc=ompi_netpatterns_setup_narray_tree(2, node_rank, n_peers, &node_data); if( OMPI_SUCCESS != rc ) { goto Error; diff --git a/ompi/patterns/comm/coll_ops.h b/ompi/patterns/comm/coll_ops.h index 846e5660cc4..5acb66c1e69 100644 --- a/ompi/patterns/comm/coll_ops.h +++ b/ompi/patterns/comm/coll_ops.h @@ -3,6 +3,7 @@ * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. * All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -26,14 +27,14 @@ BEGIN_C_DECLS -OMPI_DECLSPEC int comm_allgather_pml(void *src_buf, void *dest_buf, int count, +OMPI_DECLSPEC int ompi_comm_allgather_pml(void *src_buf, void *dest_buf, int count, ompi_datatype_t *dtype, int my_rank_in_group, int n_peers, int *ranks_in_comm,ompi_communicator_t *comm); -OMPI_DECLSPEC int comm_allreduce_pml(void *sbuf, void *rbuf, int count, +OMPI_DECLSPEC int ompi_comm_allreduce_pml(void *sbuf, void *rbuf, int count, ompi_datatype_t *dtype, int my_rank_in_group, struct ompi_op_t *op, int n_peers,int *ranks_in_comm, ompi_communicator_t *comm); -OMPI_DECLSPEC int comm_bcast_pml(void *buffer, int root, int count, +OMPI_DECLSPEC int ompi_comm_bcast_pml(void *buffer, int root, int count, ompi_datatype_t *dtype, int my_rank_in_group, int n_peers, int *ranks_in_comm,ompi_communicator_t *comm); diff --git a/ompi/patterns/net/allreduce.c b/ompi/patterns/net/allreduce.c index 1f0cc0b4a89..ecf95bfd977 100644 --- 
a/ompi/patterns/net/allreduce.c +++ b/ompi/patterns/net/allreduce.c @@ -3,6 +3,7 @@ * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. * All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -152,7 +153,7 @@ comm_allreduce(void *sbuf, void *rbuf, int count, opal_datatype_t *dtype, } /* get my reduction communication pattern */ - ret=netpatterns_setup_recursive_doubling_tree_node(n_peers,my_rank,&my_exchange_node); + ret=ompi_netpatterns_setup_recursive_doubling_tree_node(n_peers,my_rank,&my_exchange_node); if(OMPI_SUCCESS != ret){ return ret; } diff --git a/ompi/patterns/net/netpatterns.h b/ompi/patterns/net/netpatterns.h index 1759fd8e646..d75c721dd5a 100644 --- a/ompi/patterns/net/netpatterns.h +++ b/ompi/patterns/net/netpatterns.h @@ -3,6 +3,7 @@ * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. * All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -20,21 +21,21 @@ BEGIN_C_DECLS -int netpatterns_base_err(const char* fmt, ...); -int netpatterns_register_mca_params(void); +int ompi_netpatterns_base_err(const char* fmt, ...); +int ompi_netpatterns_register_mca_params(void); #if OPAL_ENABLE_DEBUG -extern int netpatterns_base_verbose; /* disabled by default */ -OMPI_DECLSPEC extern int netpatterns_base_err(const char*, ...) __opal_attribute_format__(__printf__, 1, 2); +extern int ompi_netpatterns_base_verbose; /* disabled by default */ +OMPI_DECLSPEC extern int ompi_netpatterns_base_err(const char*, ...) 
__opal_attribute_format__(__printf__, 1, 2); #define NETPATTERNS_VERBOSE(args) \ do { \ - if(netpatterns_base_verbose > 0) { \ - netpatterns_base_err("[%s]%s[%s:%d:%s] ",\ + if(ompi_netpatterns_base_verbose > 0) { \ + ompi_netpatterns_base_err("[%s]%s[%s:%d:%s] ",\ ompi_process_info.nodename, \ OMPI_NAME_PRINT(OMPI_PROC_MY_NAME), \ __FILE__, __LINE__, __func__); \ - netpatterns_base_err args; \ - netpatterns_base_err("\n"); \ + ompi_netpatterns_base_err args; \ + ompi_netpatterns_base_err("\n"); \ } \ } while(0); #else @@ -121,24 +122,24 @@ netpatterns_narray_knomial_tree_node_t; /* Init code for common_netpatterns */ -OMPI_DECLSPEC int netpatterns_init(void); +OMPI_DECLSPEC int ompi_netpatterns_init(void); /* setup an n-array tree */ -OMPI_DECLSPEC int netpatterns_setup_narray_tree(int tree_order, int my_rank, int num_nodes, +OMPI_DECLSPEC int ompi_netpatterns_setup_narray_tree(int tree_order, int my_rank, int num_nodes, netpatterns_tree_node_t *my_node); /* setup an n-array tree with k-nomial levels */ -OMPI_DECLSPEC int netpatterns_setup_narray_knomial_tree( int tree_order, int my_rank, int num_nodes, +OMPI_DECLSPEC int ompi_netpatterns_setup_narray_knomial_tree( int tree_order, int my_rank, int num_nodes, netpatterns_narray_knomial_tree_node_t *my_node); /* cleanup an n-array tree setup by the above function */ -OMPI_DECLSPEC void netpatterns_cleanup_narray_knomial_tree (netpatterns_narray_knomial_tree_node_t *my_node); +OMPI_DECLSPEC void ompi_netpatterns_cleanup_narray_knomial_tree (netpatterns_narray_knomial_tree_node_t *my_node); /* setup an multi-nomial tree - for each node in the tree * this returns it's parent, and it's children */ -OMPI_DECLSPEC int netpatterns_setup_multinomial_tree(int tree_order, int num_nodes, +OMPI_DECLSPEC int ompi_netpatterns_setup_multinomial_tree(int tree_order, int num_nodes, netpatterns_tree_node_t *tree_nodes); -OMPI_DECLSPEC int netpatterns_setup_narray_tree_contigous_ranks(int tree_order, +OMPI_DECLSPEC int 
ompi_netpatterns_setup_narray_tree_contigous_ranks(int tree_order, int num_nodes, netpatterns_tree_node_t **tree_nodes); /* calculate the nearest power of radix that is equal to or greater diff --git a/ompi/patterns/net/netpatterns_base.c b/ompi/patterns/net/netpatterns_base.c index bc51490def5..62669533a1e 100644 --- a/ompi/patterns/net/netpatterns_base.c +++ b/ompi/patterns/net/netpatterns_base.c @@ -2,6 +2,7 @@ * * Copyright (c) 2009-2012 Mellanox Technologies. All rights reserved. * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -13,22 +14,22 @@ #include "ompi/include/ompi/constants.h" #include "netpatterns.h" -int netpatterns_base_verbose = 0; /* disabled by default */ +int ompi_netpatterns_base_verbose = 0; /* disabled by default */ -int netpatterns_register_mca_params(void) +int ompi_netpatterns_register_mca_params(void) { - netpatterns_base_verbose = 0; + ompi_netpatterns_base_verbose = 0; mca_base_var_register("ompi", "common", "netpatterns", "base_verbose", "Verbosity level of the NETPATTERNS framework", MCA_BASE_VAR_TYPE_INT, NULL, 0, 0, OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY, - &netpatterns_base_verbose); + &ompi_netpatterns_base_verbose); return OMPI_SUCCESS; } -int netpatterns_base_err(const char* fmt, ...) +int ompi_netpatterns_base_err(const char* fmt, ...) { va_list list; int ret; @@ -39,16 +40,16 @@ int netpatterns_base_err(const char* fmt, ...) 
return ret; } -int netpatterns_init(void) +int ompi_netpatterns_init(void) { /* There is no component for common_netpatterns so every component that uses it - should call netpatterns_init, still we want to run it only once */ + should call ompi_netpatterns_init, still we want to run it only once */ static int was_called = 0; if (0 == was_called) { was_called = 1; - return netpatterns_register_mca_params(); + return ompi_netpatterns_register_mca_params(); } return OMPI_SUCCESS; diff --git a/ompi/patterns/net/netpatterns_knomial_tree.c b/ompi/patterns/net/netpatterns_knomial_tree.c index f09ef968fb7..09b45cc7428 100644 --- a/ompi/patterns/net/netpatterns_knomial_tree.c +++ b/ompi/patterns/net/netpatterns_knomial_tree.c @@ -6,6 +6,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -33,7 +34,7 @@ /* setup recursive doubleing tree node */ -OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_allgather_tree_node( +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_knomial_allgather_tree_node( int num_nodes, int node_rank, int tree_order, int *hier_ranks, netpatterns_k_exchange_node_t *exchange_node) { @@ -52,7 +53,7 @@ OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_allgather_tree_node( NETPATTERNS_VERBOSE( - ("Enter netpatterns_setup_recursive_knomial_tree_node(num_nodes=%d, node_rank=%d, tree_order=%d)", + ("Enter ompi_netpatterns_setup_recursive_knomial_tree_node(num_nodes=%d, node_rank=%d, tree_order=%d)", num_nodes, node_rank, tree_order)); assert(num_nodes > 1); @@ -504,7 +505,7 @@ OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_allgather_tree_node( return OMPI_ERROR; } -OMPI_DECLSPEC void netpatterns_cleanup_recursive_knomial_allgather_tree_node( +OMPI_DECLSPEC void ompi_netpatterns_cleanup_recursive_knomial_allgather_tree_node( netpatterns_k_exchange_node_t 
*exchange_node) { int i; @@ -531,7 +532,7 @@ OMPI_DECLSPEC void netpatterns_cleanup_recursive_knomial_allgather_tree_node( free(exchange_node->payload_info); } -OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_tree_node( +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_knomial_tree_node( int num_nodes, int node_rank, int tree_order, netpatterns_k_exchange_node_t *exchange_node) { @@ -541,7 +542,7 @@ OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_tree_node( int k_base, kpow_num, peer; NETPATTERNS_VERBOSE( - ("Enter netpatterns_setup_recursive_knomial_tree_node(num_nodes=%d, node_rank=%d, tree_order=%d)", + ("Enter ompi_netpatterns_setup_recursive_knomial_tree_node(num_nodes=%d, node_rank=%d, tree_order=%d)", num_nodes, node_rank, tree_order)); assert(num_nodes > 1); @@ -669,13 +670,13 @@ OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_tree_node( Error: - netpatterns_cleanup_recursive_knomial_tree_node (exchange_node); + ompi_netpatterns_cleanup_recursive_knomial_tree_node (exchange_node); /* error return */ return OMPI_ERROR; } -OMPI_DECLSPEC void netpatterns_cleanup_recursive_knomial_tree_node( +OMPI_DECLSPEC void ompi_netpatterns_cleanup_recursive_knomial_tree_node( netpatterns_k_exchange_node_t *exchange_node) { int i; @@ -697,7 +698,7 @@ OMPI_DECLSPEC void netpatterns_cleanup_recursive_knomial_tree_node( } #if 1 -OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes, int node_rank, int tree_order, +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes, int node_rank, int tree_order, netpatterns_pair_exchange_node_t *exchange_node) { /* local variables */ @@ -705,7 +706,7 @@ OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes int n_levels; int shift, mask; - NETPATTERNS_VERBOSE(("Enter netpatterns_setup_recursive_doubling_n_tree_node(num_nodes=%d, node_rank=%d, tree_order=%d)", num_nodes, node_rank, tree_order)); + NETPATTERNS_VERBOSE(("Enter 
ompi_netpatterns_setup_recursive_doubling_n_tree_node(num_nodes=%d, node_rank=%d, tree_order=%d)", num_nodes, node_rank, tree_order)); assert(num_nodes > 1); while (tree_order > num_nodes) { @@ -838,7 +839,7 @@ OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes return OMPI_ERROR; } -OMPI_DECLSPEC void netpatterns_cleanup_recursive_doubling_tree_node( +OMPI_DECLSPEC void ompi_netpatterns_cleanup_recursive_doubling_tree_node( netpatterns_pair_exchange_node_t *exchange_node) { NETPATTERNS_VERBOSE(("About to release rank_extra_sources_array and rank_exchanges")); @@ -852,15 +853,15 @@ OMPI_DECLSPEC void netpatterns_cleanup_recursive_doubling_tree_node( } #endif -OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_tree_node(int num_nodes, int node_rank, +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_doubling_tree_node(int num_nodes, int node_rank, netpatterns_pair_exchange_node_t *exchange_node) { - return netpatterns_setup_recursive_doubling_n_tree_node(num_nodes, node_rank, 2, exchange_node); + return ompi_netpatterns_setup_recursive_doubling_n_tree_node(num_nodes, node_rank, 2, exchange_node); } #if 0 /*OMPI_DECLSPEC int old_netpatterns_setup_recursive_doubling_tree_node(int num_nodes, int node_rank,*/ -OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes, int node_rank,int tree_order, +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes, int node_rank,int tree_order, netpatterns_pair_exchange_node_t *exchange_node) { /* local variables */ diff --git a/ompi/patterns/net/netpatterns_knomial_tree.h b/ompi/patterns/net/netpatterns_knomial_tree.h index a5736a1d877..16dd6d81868 100644 --- a/ompi/patterns/net/netpatterns_knomial_tree.h +++ b/ompi/patterns/net/netpatterns_knomial_tree.h @@ -5,6 +5,7 @@ * All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. 
All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -110,27 +111,27 @@ struct netpatterns_k_exchange_node_t { typedef struct netpatterns_k_exchange_node_t netpatterns_k_exchange_node_t; -OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes, int node_rank, int tree_order, +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_doubling_n_tree_node(int num_nodes, int node_rank, int tree_order, netpatterns_pair_exchange_node_t *exchange_node); -OMPI_DECLSPEC void netpatterns_cleanup_recursive_doubling_tree_node( +OMPI_DECLSPEC void ompi_netpatterns_cleanup_recursive_doubling_tree_node( netpatterns_pair_exchange_node_t *exchange_node); -OMPI_DECLSPEC int netpatterns_setup_recursive_doubling_tree_node(int num_nodes, int node_rank, +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_doubling_tree_node(int num_nodes, int node_rank, netpatterns_pair_exchange_node_t *exchange_node); -OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_tree_node( +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_knomial_tree_node( int num_nodes, int node_rank, int tree_order, netpatterns_k_exchange_node_t *exchange_node); -OMPI_DECLSPEC void netpatterns_cleanup_recursive_knomial_tree_node( +OMPI_DECLSPEC void ompi_netpatterns_cleanup_recursive_knomial_tree_node( netpatterns_k_exchange_node_t *exchange_node); -OMPI_DECLSPEC int netpatterns_setup_recursive_knomial_allgather_tree_node( +OMPI_DECLSPEC int ompi_netpatterns_setup_recursive_knomial_allgather_tree_node( int num_nodes, int node_rank, int tree_order, int *hier_ranks, netpatterns_k_exchange_node_t *exchange_node); -OMPI_DECLSPEC void netpatterns_cleanup_recursive_knomial_allgather_tree_node( +OMPI_DECLSPEC void ompi_netpatterns_cleanup_recursive_knomial_allgather_tree_node( netpatterns_k_exchange_node_t *exchange_node); /* Input: k_exchange_node structure diff --git a/ompi/patterns/net/netpatterns_multinomial_tree.c b/ompi/patterns/net/netpatterns_multinomial_tree.c index 
54fc41f4c98..bb397c91238 100644 --- a/ompi/patterns/net/netpatterns_multinomial_tree.c +++ b/ompi/patterns/net/netpatterns_multinomial_tree.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2009-2012 Mellanox Technologies. All rights reserved. * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -26,7 +27,7 @@ /* setup an multi-nomial tree - for each node in the tree * this returns it's parent, and it's children */ -OMPI_DECLSPEC int netpatterns_setup_multinomial_tree(int tree_order, int num_nodes, +OMPI_DECLSPEC int ompi_netpatterns_setup_multinomial_tree(int tree_order, int num_nodes, netpatterns_tree_node_t *tree_nodes) { /* local variables */ diff --git a/ompi/patterns/net/netpatterns_nary_tree.c b/ompi/patterns/net/netpatterns_nary_tree.c index 6ab4b5be6e3..08f1543173d 100644 --- a/ompi/patterns/net/netpatterns_nary_tree.c +++ b/ompi/patterns/net/netpatterns_nary_tree.c @@ -4,6 +4,7 @@ * Copyright (c) 2009-2012 Oak Ridge National Laboratory. All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -32,7 +33,7 @@ /* setup an n-array tree */ -int netpatterns_setup_narray_tree(int tree_order, int my_rank, int num_nodes, +int ompi_netpatterns_setup_narray_tree(int tree_order, int my_rank, int num_nodes, netpatterns_tree_node_t *my_node) { /* local variables */ @@ -159,7 +160,7 @@ int netpatterns_setup_narray_tree(int tree_order, int my_rank, int num_nodes, return OMPI_ERROR; } -void netpatterns_cleanup_narray_knomial_tree (netpatterns_narray_knomial_tree_node_t *my_node) +void ompi_netpatterns_cleanup_narray_knomial_tree (netpatterns_narray_knomial_tree_node_t *my_node) { if (my_node->children_ranks) { free (my_node->children_ranks); @@ -167,11 +168,11 @@ void netpatterns_cleanup_narray_knomial_tree (netpatterns_narray_knomial_tree_no } if (0 != my_node->my_rank) { - netpatterns_cleanup_recursive_knomial_tree_node (&my_node->k_node); + ompi_netpatterns_cleanup_recursive_knomial_tree_node (&my_node->k_node); } } -int netpatterns_setup_narray_knomial_tree( +int ompi_netpatterns_setup_narray_knomial_tree( int tree_order, int my_rank, int num_nodes, netpatterns_narray_knomial_tree_node_t *my_node) { @@ -231,7 +232,7 @@ int netpatterns_setup_narray_knomial_tree( my_rank-cum_cnt; my_node->level_size = cnt; - rc = netpatterns_setup_recursive_knomial_tree_node( + rc = ompi_netpatterns_setup_recursive_knomial_tree_node( my_node->level_size, my_node->rank_on_level, tree_order, &my_node->k_node); if (OMPI_SUCCESS != rc) { @@ -430,7 +431,7 @@ static int fill_in_node_data(int tree_order, int num_nodes, int my_node, * ranks may be rotated based on who the actual root is, to obtain the * appropriate communication pattern for such roots. 
*/ -OMPI_DECLSPEC int netpatterns_setup_narray_tree_contigous_ranks( +OMPI_DECLSPEC int ompi_netpatterns_setup_narray_tree_contigous_ranks( int tree_order, int num_nodes, netpatterns_tree_node_t **tree_nodes) { diff --git a/ompi/proc/proc.c b/ompi/proc/proc.c index 961e8c5f9b9..5b712bf25e1 100644 --- a/ompi/proc/proc.c +++ b/ompi/proc/proc.c @@ -116,6 +116,8 @@ static int ompi_proc_allocate (ompi_jobid_t jobid, ompi_vpid_t vpid, ompi_proc_t opal_hash_table_set_value_ptr (&ompi_proc_hash, &proc->super.proc_name, sizeof (proc->super.proc_name), proc); + /* by default we consider process to be remote */ + proc->super.proc_flags = OPAL_PROC_NON_LOCAL; *procp = proc; return OMPI_SUCCESS; @@ -133,26 +135,14 @@ static int ompi_proc_allocate (ompi_jobid_t jobid, ompi_vpid_t vpid, ompi_proc_t */ int ompi_proc_complete_init_single (ompi_proc_t *proc) { - uint16_t u16, *u16ptr; int ret; - u16ptr = &u16; - if ((OMPI_CAST_RTE_NAME(&proc->super.proc_name)->jobid == OMPI_PROC_MY_NAME->jobid) && (OMPI_CAST_RTE_NAME(&proc->super.proc_name)->vpid == OMPI_PROC_MY_NAME->vpid)) { /* nothing else to do */ return OMPI_SUCCESS; } - /* get the locality information - all RTEs are required - * to provide this information at startup */ - OPAL_MODEX_RECV_VALUE_OPTIONAL(ret, OPAL_PMIX_LOCALITY, &proc->super.proc_name, &u16ptr, OPAL_UINT16); - if (OPAL_SUCCESS != ret) { - proc->super.proc_flags = OPAL_PROC_NON_LOCAL; - } else { - proc->super.proc_flags = u16; - } - /* we can retrieve the hostname at no cost because it * was provided at startup - but make it optional so * we don't chase after it if some system doesn't @@ -287,20 +277,6 @@ int ompi_proc_init(void) } #endif - if (ompi_process_info.num_procs < ompi_add_procs_cutoff) { - /* create proc structures and find self */ - for (ompi_vpid_t i = 0 ; i < ompi_process_info.num_procs ; ++i ) { - if (i == OMPI_PROC_MY_NAME->vpid) { - continue; - } - - ret = ompi_proc_allocate (OMPI_PROC_MY_NAME->jobid, i, &proc); - if (OMPI_SUCCESS != ret) { - 
return ret; - } - } - } - return OMPI_SUCCESS; } @@ -329,11 +305,44 @@ static int ompi_proc_compare_vid (opal_list_item_t **a, opal_list_item_t **b) */ int ompi_proc_complete_init(void) { + opal_process_name_t wildcard_rank; ompi_proc_t *proc; int ret, errcode = OMPI_SUCCESS; + char *val; opal_mutex_lock (&ompi_proc_lock); + /* Add all local peers first */ + wildcard_rank.jobid = OMPI_PROC_MY_NAME->jobid; + wildcard_rank.vpid = OMPI_NAME_WILDCARD->vpid; + /* retrieve the local peers */ + OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_LOCAL_PEERS, + &wildcard_rank, &val, OPAL_STRING); + if (OPAL_SUCCESS == ret && NULL != val) { + char **peers = opal_argv_split(val, ','); + int i; + free(val); + for (i=0; NULL != peers[i]; i++) { + ompi_vpid_t local_rank = strtoul(peers[i], NULL, 10); + uint16_t u16, *u16ptr = &u16; + if (OMPI_PROC_MY_NAME->vpid == local_rank) { + continue; + } + ret = ompi_proc_allocate (OMPI_PROC_MY_NAME->jobid, local_rank, &proc); + if (OMPI_SUCCESS != ret) { + return ret; + } + /* get the locality information - all RTEs are required + * to provide this information at startup */ + OPAL_MODEX_RECV_VALUE_OPTIONAL(ret, OPAL_PMIX_LOCALITY, &proc->super.proc_name, &u16ptr, OPAL_UINT16); + if (OPAL_SUCCESS == ret) { + proc->super.proc_flags = u16; + } + } + opal_argv_free(peers); + } + + /* Complete initialization of node-local procs */ OPAL_LIST_FOREACH(proc, &ompi_proc_list, ompi_proc_t) { ret = ompi_proc_complete_init_single (proc); if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { @@ -341,35 +350,32 @@ int ompi_proc_complete_init(void) break; } } - opal_mutex_unlock (&ompi_proc_lock); - if (ompi_process_info.num_procs >= ompi_add_procs_cutoff) { - char *val = NULL; - opal_process_name_t wildcard_rank; - wildcard_rank.jobid = OMPI_PROC_MY_NAME->jobid; - wildcard_rank.vpid = OMPI_NAME_WILDCARD->vpid; - /* retrieve the local peers */ - OPAL_MODEX_RECV_VALUE(ret, OPAL_PMIX_LOCAL_PEERS, - &wildcard_rank, &val, OPAL_STRING); - if (OPAL_SUCCESS == ret && NULL != val) { - 
char **peers = opal_argv_split(val, ','); - int i; - free(val); - for (i=0; NULL != peers[i]; i++) { - ompi_vpid_t local_rank = strtoul(peers[i], NULL, 10); - opal_process_name_t proc_name = {.vpid = local_rank, .jobid = OMPI_PROC_MY_NAME->jobid}; - - if (OMPI_PROC_MY_NAME->vpid == local_rank) { - continue; - } - (void) ompi_proc_for_name (proc_name); - } - opal_argv_free(peers); + /* if cutoff is larger than # of procs - add all processes + * NOTE that local procs will be automatically skipped as they + * are already in the hash table + */ + if (ompi_process_info.num_procs < ompi_add_procs_cutoff) { + /* sinse ompi_proc_for_name is locking internally - + * we need to release lock here + */ + opal_mutex_unlock (&ompi_proc_lock); + + for (ompi_vpid_t i = 0 ; i < ompi_process_info.num_procs ; ++i ) { + opal_process_name_t proc_name; + proc_name.jobid = OMPI_PROC_MY_NAME->jobid; + proc_name.vpid = i; + (void) ompi_proc_for_name (proc_name); } + + /* acquire lock back for the next step - sort */ + opal_mutex_lock (&ompi_proc_lock); } opal_list_sort (&ompi_proc_list, ompi_proc_compare_vid); + opal_mutex_unlock (&ompi_proc_lock); + return errcode; } diff --git a/ompi/request/req_wait.c b/ompi/request/req_wait.c index 411863ef760..e4d4d5e68a6 100644 --- a/ompi/request/req_wait.c +++ b/ompi/request/req_wait.c @@ -13,7 +13,7 @@ * Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2010-2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Mellanox Technologies. All rights reserved. 
* Copyright (c) 2016 Research Organization for Information Science @@ -100,6 +100,8 @@ int ompi_request_default_wait_any(size_t count, num_requests_null_inactive = 0; for (i = 0; i < count; i++) { + void *_tmp_ptr = REQUEST_PENDING; + request = requests[i]; /* Check for null or completed persistent request. For @@ -110,7 +112,7 @@ int ompi_request_default_wait_any(size_t count, continue; } - if( !OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, REQUEST_PENDING, &sync) ) { + if( !OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, &sync) ) { assert(REQUEST_COMPLETE(request)); completed = i; *index = i; @@ -136,6 +138,8 @@ int ompi_request_default_wait_any(size_t count, * user. */ for(i = completed-1; (i+1) > 0; i--) { + void *tmp_ptr = &sync; + request = requests[i]; if( request->req_state == OMPI_REQUEST_INACTIVE ) { @@ -146,7 +150,7 @@ int ompi_request_default_wait_any(size_t count, * Otherwise, the request has been completed meanwhile, and it * has been atomically marked as REQUEST_COMPLETE. 
*/ - if( !OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, &sync, REQUEST_PENDING) ) { + if( !OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &tmp_ptr, REQUEST_PENDING) ) { *index = i; } } @@ -211,6 +215,8 @@ int ompi_request_default_wait_all( size_t count, WAIT_SYNC_INIT(&sync, count); rptr = requests; for (i = 0; i < count; i++) { + void *_tmp_ptr = REQUEST_PENDING; + request = *rptr++; if( request->req_state == OMPI_REQUEST_INACTIVE ) { @@ -218,7 +224,7 @@ int ompi_request_default_wait_all( size_t count, continue; } - if (!OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, REQUEST_PENDING, &sync)) { + if (!OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, &sync)) { if( OPAL_UNLIKELY( MPI_SUCCESS != request->req_status.MPI_ERROR ) ) { failed++; } @@ -246,6 +252,8 @@ int ompi_request_default_wait_all( size_t count, if (MPI_STATUSES_IGNORE != statuses) { /* fill out status and free request if required */ for( i = 0; i < count; i++, rptr++ ) { + void *_tmp_ptr = &sync; + request = *rptr; if( request->req_state == OMPI_REQUEST_INACTIVE ) { @@ -260,7 +268,7 @@ int ompi_request_default_wait_all( size_t count, * mark the request as pending then it is neither failed nor complete, and * we must stop altering it. */ - if( OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, &sync, REQUEST_PENDING ) ) { + if( OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, REQUEST_PENDING ) ) { /* * Per MPI 2.2 p 60: * Allows requests to be marked as MPI_ERR_PENDING if they are @@ -306,6 +314,8 @@ int ompi_request_default_wait_all( size_t count, int rc; /* free request if required */ for( i = 0; i < count; i++, rptr++ ) { + void *_tmp_ptr = &sync; + request = *rptr; if( request->req_state == OMPI_REQUEST_INACTIVE ) { @@ -320,7 +330,7 @@ int ompi_request_default_wait_all( size_t count, /* If the request is still pending due to a failed request * then skip it in this loop. 
*/ - if( OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, &sync, REQUEST_PENDING ) ) { + if( OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, REQUEST_PENDING ) ) { /* * Per MPI 2.2 p 60: * Allows requests to be marked as MPI_ERR_PENDING if they are @@ -398,6 +408,8 @@ int ompi_request_default_wait_some(size_t count, num_requests_null_inactive = 0; num_requests_done = 0; for (size_t i = 0; i < count; i++, rptr++) { + void *_tmp_ptr = REQUEST_PENDING; + request = *rptr; /* * Check for null or completed persistent request. @@ -407,7 +419,7 @@ int ompi_request_default_wait_some(size_t count, num_requests_null_inactive++; continue; } - indices[i] = OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, REQUEST_PENDING, &sync); + indices[i] = OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, &sync); if( !indices[i] ) { /* If the request is completed go ahead and mark it as such */ assert( REQUEST_COMPLETE(request) ); @@ -434,6 +446,8 @@ int ompi_request_default_wait_some(size_t count, rptr = requests; num_requests_done = 0; for (size_t i = 0; i < count; i++, rptr++) { + void *_tmp_ptr = &sync; + request = *rptr; if( request->req_state == OMPI_REQUEST_INACTIVE ) { @@ -454,7 +468,7 @@ int ompi_request_default_wait_some(size_t count, */ if( !indices[i] ){ indices[num_requests_done++] = i; - } else if( !OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, &sync, REQUEST_PENDING) ) { + } else if( !OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, REQUEST_PENDING) ) { indices[num_requests_done++] = i; } } diff --git a/ompi/request/request.c b/ompi/request/request.c index 8a73624ba36..6c37008473b 100644 --- a/ompi/request/request.c +++ b/ompi/request/request.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, @@ -55,6 +55,7 @@ static void ompi_request_construct(ompi_request_t* req) req->req_state = OMPI_REQUEST_INVALID; req->req_complete = false; req->req_persistent = false; + req->req_start = NULL; req->req_free = NULL; req->req_cancel = NULL; req->req_complete_cb = NULL; @@ -108,7 +109,7 @@ int ompi_request_init(void) OBJ_CONSTRUCT(&ompi_request_null, ompi_request_t); OBJ_CONSTRUCT(&ompi_request_f_to_c_table, opal_pointer_array_t); if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_request_f_to_c_table, - 0, OMPI_FORTRAN_HANDLE_MAX, 64) ) { + 0, OMPI_FORTRAN_HANDLE_MAX, 32) ) { return OMPI_ERROR; } ompi_request_null.request.req_type = OMPI_REQUEST_NULL; @@ -123,6 +124,7 @@ int ompi_request_init(void) ompi_request_null.request.req_persistent = false; ompi_request_null.request.req_f_to_c_index = opal_pointer_array_add(&ompi_request_f_to_c_table, &ompi_request_null); + ompi_request_null.request.req_start = NULL; /* should not be called */ ompi_request_null.request.req_free = ompi_request_null_free; ompi_request_null.request.req_cancel = ompi_request_null_cancel; ompi_request_null.request.req_mpi_object.comm = &ompi_mpi_comm_world.comm; @@ -155,6 +157,7 @@ int ompi_request_init(void) ompi_request_empty.req_persistent = false; ompi_request_empty.req_f_to_c_index = opal_pointer_array_add(&ompi_request_f_to_c_table, &ompi_request_empty); + ompi_request_empty.req_start = NULL; /* should not be called */ ompi_request_empty.req_free = ompi_request_empty_free; ompi_request_empty.req_cancel = ompi_request_null_cancel; ompi_request_empty.req_mpi_object.comm = &ompi_mpi_comm_world.comm; diff --git a/ompi/request/request.h b/ompi/request/request.h index 9587486ec8c..5a1c02c4b65 100644 --- 
a/ompi/request/request.h +++ b/ompi/request/request.h @@ -10,10 +10,10 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2009-2012 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2015-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -55,6 +55,26 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_request_t); struct ompi_request_t; +/** + * Initiate one or more persistent requests. + * + * This function is called by MPI_START and MPI_STARTALL. + * + * When called by MPI_START, count is 1. + * + * When called by MPI_STARTALL, multiple requests which have the same + * req_start value are passed. This may help scheduling optimization + * of multiple communications. + * + * @param count (IN) Number of requests + * @param requests (IN/OUT) Array of persistent requests + * @return OMPI_SUCCESS or failure status. + */ +typedef int (*ompi_request_start_fn_t)( + size_t count, + struct ompi_request_t ** requests +); + /* * Required function to free the request and any associated resources. 
*/ @@ -109,6 +129,7 @@ struct ompi_request_t { volatile ompi_request_state_t req_state; /**< enum indicate state of the request */ bool req_persistent; /**< flag indicating if the this is a persistent request */ int req_f_to_c_index; /**< Index in Fortran <-> C translation array */ + ompi_request_start_fn_t req_start; /**< Called by MPI_START and MPI_STARTALL */ ompi_request_free_fn_t req_free; /**< Called by free */ ompi_request_cancel_fn_t req_cancel; /**< Optional function to cancel the request */ ompi_request_complete_fn_t req_complete_cb; /**< Called when the request is MPI completed */ @@ -127,7 +148,7 @@ typedef struct ompi_request_t ompi_request_t; * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. */ -#define PREDEFINED_REQUEST_PAD (sizeof(void*) * 32) +#define PREDEFINED_REQUEST_PAD 256 struct ompi_predefined_request_t { struct ompi_request_t request; @@ -375,10 +396,12 @@ static inline int ompi_request_free(ompi_request_t** request) static inline void ompi_request_wait_completion(ompi_request_t *req) { if (opal_using_threads () && !REQUEST_COMPLETE(req)) { + void *_tmp_ptr = REQUEST_PENDING; ompi_wait_sync_t sync; + WAIT_SYNC_INIT(&sync, 1); - if (OPAL_ATOMIC_CMPSET_PTR(&req->req_complete, REQUEST_PENDING, &sync)) { + if (OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&req->req_complete, &_tmp_ptr, &sync)) { SYNC_WAIT(&sync); } else { /* completed before we had a chance to swap in the sync object */ @@ -418,7 +441,9 @@ static inline int ompi_request_complete(ompi_request_t* request, bool with_signa if (0 == rc) { if( OPAL_LIKELY(with_signal) ) { - if(!OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, REQUEST_PENDING, REQUEST_COMPLETED)) { + void *_tmp_ptr = REQUEST_PENDING; + + if(!OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR(&request->req_complete, &_tmp_ptr, REQUEST_COMPLETED)) { ompi_wait_sync_t *tmp_sync = (ompi_wait_sync_t *) 
OPAL_ATOMIC_SWAP_PTR(&request->req_complete, REQUEST_COMPLETED); /* In the case where another thread concurrently changed the request to REQUEST_PENDING */ diff --git a/ompi/runtime/help-mpi-runtime.txt b/ompi/runtime/help-mpi-runtime.txt index f2028417b98..ee0e29d6da0 100644 --- a/ompi/runtime/help-mpi-runtime.txt +++ b/ompi/runtime/help-mpi-runtime.txt @@ -12,6 +12,7 @@ # All rights reserved. # Copyright (c) 2007-2015 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2013 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -93,3 +94,13 @@ Open MPI with --enable-heterogeneous. [no cuda support] The user requested CUDA support with the --mca mpi_cuda_support 1 flag but the library was not compiled with any support. +# +[noconxcpt] +The user has called an operation involving MPI_Connect and/or MPI_Accept, +but this environment lacks the necessary infrastructure support for +that operation. Open MPI relies on the PMIx_Publish/Lookup (or one of +its predecessors) APIs for this operation. + +This typically happens when launching outside of mpirun where the underlying +resource manager does not provide publish/lookup support. One way of solving +the problem is to simply use mpirun to start the application. diff --git a/ompi/runtime/ompi_mpi_abort.c b/ompi/runtime/ompi_mpi_abort.c index db96d98e864..672203d4c27 100644 --- a/ompi/runtime/ompi_mpi_abort.c +++ b/ompi/runtime/ompi_mpi_abort.c @@ -18,6 +18,7 @@ * reserved. * Copyright (c) 2015 Mellanox Technologies, Inc. * All rights reserved. + * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -42,6 +43,7 @@ #include #include "opal/mca/backtrace/backtrace.h" +#include "opal/util/error.h" #include "opal/runtime/opal_params.h" #include "ompi/communicator/communicator.h" @@ -146,7 +148,7 @@ ompi_mpi_abort(struct ompi_communicator_t* comm, if (OPAL_SUCCESS == opal_backtrace_buffer(&messages, &len)) { for (i = 0; i < len; ++i) { - fprintf(stderr, "[%s:%d] [%d] func:%s\n", host, (int) pid, + fprintf(stderr, "[%s:%05d] [%d] func:%s\n", host, (int) pid, i, messages[i]); fflush(stderr); } @@ -159,29 +161,13 @@ ompi_mpi_abort(struct ompi_communicator_t* comm, } } - /* Should we wait for a while before aborting? */ - - if (0 != opal_abort_delay) { - if (opal_abort_delay < 0) { - fprintf(stderr ,"[%s:%d] Looping forever (MCA parameter opal_abort_delay is < 0)\n", - host, (int) pid); - fflush(stderr); - while (1) { - sleep(5); - } - } else { - fprintf(stderr, "[%s:%d] Delaying for %d seconds before aborting\n", - host, (int) pid, opal_abort_delay); - do { - sleep(1); - } while (--opal_abort_delay > 0); - } - } + /* Wait for a while before aborting */ + opal_delay_abort(); /* If the RTE isn't setup yet/any more, then don't even try killing everyone. Sorry, Charlie... */ if (!ompi_rte_initialized) { - fprintf(stderr, "[%s:%d] Local abort %s completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!\n", + fprintf(stderr, "[%s:%05d] Local abort %s completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!\n", host, (int) pid, ompi_mpi_finalized ? "after MPI_FINALIZE started" : "before MPI_INIT completed"); _exit(errcode == 0 ? 
1 : errcode); diff --git a/ompi/runtime/ompi_mpi_finalize.c b/ompi/runtime/ompi_mpi_finalize.c index efa3f7fbb2d..1a143c1e860 100644 --- a/ompi/runtime/ompi_mpi_finalize.c +++ b/ompi/runtime/ompi_mpi_finalize.c @@ -16,10 +16,11 @@ * Copyright (c) 2006 University of Houston. All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -91,12 +92,13 @@ #include "ompi/runtime/ompi_cr.h" extern bool ompi_enable_timing; -extern bool ompi_enable_timing_ext; static void fence_cbfunc(int status, void *cbdata) { - volatile bool *active = (volatile bool*)cbdata; - *active = false; + volatile bool *active = (volatile bool*)cbdata; + OPAL_ACQUIRE_OBJECT(active); + *active = false; + OPAL_POST_OBJECT(active); } int ompi_mpi_finalize(void) @@ -108,8 +110,8 @@ int ompi_mpi_finalize(void) volatile bool active; uint32_t key; ompi_datatype_t * datatype; - OPAL_TIMING_DECLARE(tm); - OPAL_TIMING_INIT_EXT(&tm, OPAL_TIMING_GET_TIME_OF_DAY); + //OPAL_TIMING_DECLARE(tm); + //OPAL_TIMING_INIT_EXT(&tm, OPAL_TIMING_GET_TIME_OF_DAY); ompi_hook_base_mpi_finalize_top(); @@ -176,7 +178,7 @@ int ompi_mpi_finalize(void) opal_progress_event_users_increment(); /* check to see if we want timing information */ - OPAL_TIMING_MSTART((&tm,"time to execute finalize barrier")); + //OPAL_TIMING_MSTART((&tm,"time to execute finalize barrier")); /* NOTE: MPI-2.1 requires that MPI_FINALIZE is "collective" across *all* connected processes. 
This only means that all processes @@ -256,6 +258,7 @@ int ompi_mpi_finalize(void) if (!ompi_async_mpi_finalize) { if (NULL != opal_pmix.fence_nb) { active = true; + OPAL_POST_OBJECT(&active); /* Note that use of the non-blocking PMIx fence will * allow us to lazily cycle calling * opal_progress(), which will allow any other pending @@ -279,10 +282,7 @@ int ompi_mpi_finalize(void) /* check for timing request - get stop time and report elapsed time if so */ - OPAL_TIMING_MSTOP(&tm); - OPAL_TIMING_DELTAS(ompi_enable_timing, &tm); - OPAL_TIMING_REPORT(ompi_enable_timing_ext, &tm); - OPAL_TIMING_RELEASE(&tm); + //OPAL_TIMING_DELTAS(ompi_enable_timing, &tm); /* * Shutdown the Checkpoint/Restart Mech. @@ -427,7 +427,7 @@ int ompi_mpi_finalize(void) } /* free info resources */ - if (OMPI_SUCCESS != (ret = ompi_info_finalize())) { + if (OMPI_SUCCESS != (ret = ompi_mpiinfo_finalize())) { goto done; } diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c index a39424ff80e..e707c428125 100644 --- a/ompi/runtime/ompi_mpi_init.c +++ b/ompi/runtime/ompi_mpi_init.c @@ -20,8 +20,9 @@ * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved. + * Copyright (c) 2016-2018 Mellanox Technologies Ltd. All rights reserved. * + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -58,7 +59,7 @@ #include "opal/mca/rcache/rcache.h" #include "opal/mca/mpool/base/base.h" #include "opal/mca/btl/base/base.h" -#include "opal/mca/pmix/pmix.h" +#include "opal/mca/pmix/base/base.h" #include "opal/util/timings.h" #include "opal/util/opal_environ.h" @@ -70,6 +71,7 @@ #include "ompi/info/info.h" #include "ompi/errhandler/errcode.h" #include "ompi/errhandler/errhandler.h" +#include "ompi/interlib/interlib.h" #include "ompi/request/request.h" #include "ompi/message/message.h" #include "ompi/op/op.h" @@ -93,6 +95,7 @@ #include "ompi/dpm/dpm.h" #include "ompi/mpiext/mpiext.h" #include "ompi/mca/hook/base/base.h" +#include "ompi/util/timings.h" #if OPAL_ENABLE_FT_CR == 1 #include "ompi/mca/crcp/crcp.h" @@ -279,7 +282,7 @@ opal_hash_table_t ompi_mpi_f90_complex_hashtable = {{0}}; */ opal_list_t ompi_registered_datareps = {{0}}; -bool ompi_enable_timing = false, ompi_enable_timing_ext = false; +bool ompi_enable_timing = false; extern bool ompi_mpi_yield_when_idle; extern int ompi_mpi_event_tick_rate; @@ -314,7 +317,6 @@ static int _convert_process_name_to_string(char** name_string, return ompi_rte_convert_process_name_to_string(name_string, name); } - void ompi_mpi_thread_level(int requested, int *provided) { /** @@ -348,6 +350,9 @@ static int ompi_register_mca_variables(void) } /* check to see if we want timing information */ + /* TODO: enable OMPI init and OMPI finalize timings if + * this variable was set to 1! 
+ */ ompi_enable_timing = false; (void) mca_base_var_register("ompi", "ompi", NULL, "timing", "Request that critical timing loops be measured", @@ -356,20 +361,15 @@ static int ompi_register_mca_variables(void) MCA_BASE_VAR_SCOPE_READONLY, &ompi_enable_timing); - ompi_enable_timing_ext = false; - (void) mca_base_var_register("ompi", "ompi", NULL, "timing_ext", - "Request that critical timing loops be measured", - MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - &ompi_enable_timing_ext); return OMPI_SUCCESS; } static void fence_release(int status, void *cbdata) { volatile bool *active = (volatile bool*)cbdata; + OPAL_ACQUIRE_OBJECT(active); *active = false; + OPAL_POST_OBJECT(active); } int ompi_mpi_init(int argc, char **argv, int requested, int *provided) @@ -379,15 +379,12 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) size_t nprocs; char *error = NULL; ompi_errhandler_errtrk_t errtrk; - volatile bool active; opal_list_t info; opal_value_t *kv; - OPAL_TIMING_DECLARE(tm); - OPAL_TIMING_INIT_EXT(&tm, OPAL_TIMING_GET_TIME_OF_DAY); + volatile bool active; + bool background_fence = false; - /* bitflag of the thread level support provided. To be used - * for the modex in order to work in heterogeneous environments. */ - uint8_t threadlevel_bf; + OMPI_TIMING_INIT(64); ompi_hook_base_mpi_init_top(argc, argv, requested, provided); @@ -426,6 +423,7 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) error = "ompi_mpi_init: opal_init_util failed"; goto error; } + OMPI_TIMING_IMPORT_OPAL("opal_init_util"); /* If thread support was enabled, then setup OPAL to allow for them. This must be done * early to prevent a race condition that can occur with orte_init(). 
*/ @@ -486,7 +484,7 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) ompi_hook_base_mpi_init_top_post_opal(argc, argv, requested, provided); - OPAL_TIMING_MSTART((&tm,"time from start to completion of rte_init")); + OMPI_TIMING_NEXT("initialization"); /* if we were not externally started, then we need to setup * some envars so the MPI_INFO_ENV can get the cmd name @@ -515,10 +513,11 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) error = "ompi_mpi_init: ompi_rte_init failed"; goto error; } - ompi_rte_initialized = true; + OMPI_TIMING_NEXT("rte_init"); + OMPI_TIMING_IMPORT_OPAL("orte_ess_base_app_setup"); + OMPI_TIMING_IMPORT_OPAL("rte_init"); - /* check for timing request - get stop time and report elapsed time if so */ - OPAL_TIMING_MNEXT((&tm,"time from completion of rte_init to modex")); + ompi_rte_initialized = true; /* Register the default errhandler callback */ errtrk.status = OPAL_ERROR; @@ -528,6 +527,12 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) kv = OBJ_NEW(opal_value_t); kv->key = strdup(OPAL_PMIX_EVENT_ORDER_PREPEND); opal_list_append(&info, &kv->super); + /* give it a name so we can distinguish it */ + kv = OBJ_NEW(opal_value_t); + kv->key = strdup(OPAL_PMIX_EVENT_HDLR_NAME); + kv->type = OPAL_STRING; + kv->data.string = strdup("MPI-Default"); + opal_list_append(&info, &kv->super); opal_pmix.register_evhandler(NULL, &info, ompi_errhandler_callback, ompi_errhandler_registration_callback, (void*)&errtrk); @@ -540,16 +545,10 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) goto error; } - - /* determine the bitflag belonging to the threadlevel_support provided */ - memset ( &threadlevel_bf, 0, sizeof(uint8_t)); - OMPI_THREADLEVEL_SET_BITFLAG ( ompi_mpi_thread_provided, threadlevel_bf ); - - /* add this bitflag to the modex */ - OPAL_MODEX_SEND_STRING(ret, OPAL_PMIX_GLOBAL, - "MPI_THREAD_LEVEL", &threadlevel_bf, sizeof(uint8_t)); - if (OPAL_SUCCESS != 
ret) { - error = "ompi_mpi_init: modex send thread level"; + /* declare our presence for interlib coordination, and + * register for callbacks when other libs declare */ + if (OMPI_SUCCESS != (ret = ompi_interlib_declare(*provided, OMPI_IDENT_STRING))) { + error = "ompi_interlib_declare"; goto error; } @@ -645,26 +644,59 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) goto error; } - /* check for timing request - get stop time and report elapsed time if so */ - OPAL_TIMING_MNEXT((&tm,"time to execute modex")); + OMPI_TIMING_IMPORT_OPAL("orte_init"); + OMPI_TIMING_NEXT("rte_init-commit"); /* exchange connection info - this function may also act as a barrier * if data exchange is required. The modex occurs solely across procs * in our job. If a barrier is required, the "modex" function will * perform it internally */ opal_pmix.commit(); - if (!opal_pmix_base_async_modex) { - if (NULL != opal_pmix.fence_nb) { + OMPI_TIMING_NEXT("commit"); +#if (OPAL_ENABLE_TIMING) + if (OMPI_TIMING_ENABLED && !opal_pmix_base_async_modex && + opal_pmix_collect_all_data) { + opal_pmix.fence(NULL, 0); + OMPI_TIMING_NEXT("pmix-barrier-1"); + opal_pmix.fence(NULL, 0); + OMPI_TIMING_NEXT("pmix-barrier-2"); + } +#endif + + /* If we have a non-blocking fence: + * if we are doing an async modex, but we are collecting all + * data, then execute the non-blocking modex in the background. + * All calls to modex_recv will be cached until the background + * modex completes. If collect_all_data is false, then we skip + * the fence completely and retrieve data on-demand from the + * source node. + * + * If we do not have a non-blocking fence, then we must always + * execute the blocking fence as the system does not support + * later data retrieval. 
*/ + if (NULL != opal_pmix.fence_nb) { + if (opal_pmix_base_async_modex && opal_pmix_collect_all_data) { + /* execute the fence_nb in the background to collect + * the data */ + background_fence = true; + active = true; + OPAL_POST_OBJECT(&active); + opal_pmix.fence_nb(NULL, true, fence_release, (void*)&active); + } else if (!opal_pmix_base_async_modex) { + /* we want to do the modex */ active = true; + OPAL_POST_OBJECT(&active); opal_pmix.fence_nb(NULL, opal_pmix_collect_all_data, fence_release, (void*)&active); + /* cannot just wait on thread as we need to call opal_progress */ OMPI_LAZY_WAIT_FOR_COMPLETION(active); - } else { - opal_pmix.fence(NULL, opal_pmix_collect_all_data); } + /* otherwise, we don't want to do the modex, so fall thru */ + } else if (!opal_pmix_base_async_modex || opal_pmix_collect_all_data) { + opal_pmix.fence(NULL, opal_pmix_collect_all_data); } - OPAL_TIMING_MNEXT((&tm,"time from modex to first barrier")); + OMPI_TIMING_NEXT("modex"); /* select buffered send allocator component to be used */ if( OMPI_SUCCESS != @@ -710,7 +742,7 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) } /* initialize info */ - if (OMPI_SUCCESS != (ret = ompi_info_init())) { + if (OMPI_SUCCESS != (ret = ompi_mpiinfo_init())) { error = "ompi_info_init() failed"; goto error; } @@ -825,14 +857,20 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) ompi_rte_wait_for_debugger(); /* Next timing measurement */ - OPAL_TIMING_MNEXT((&tm,"time to execute barrier")); - - /* wait for everyone to reach this point - this is a hard - * barrier requirement at this time, though we hope to relax - * it at a later point */ - if (!ompi_async_mpi_init) { - active = true; + OMPI_TIMING_NEXT("modex-barrier"); + + /* if we executed the above fence in the background, then + * we have to wait here for it to complete. However, there + * is no reason to do two barriers! 
*/ + if (background_fence) { + OMPI_LAZY_WAIT_FOR_COMPLETION(active); + } else if (!ompi_async_mpi_init) { + /* wait for everyone to reach this point - this is a hard + * barrier requirement at this time, though we hope to relax + * it at a later point */ if (NULL != opal_pmix.fence_nb) { + active = true; + OPAL_POST_OBJECT(&active); opal_pmix.fence_nb(NULL, false, fence_release, (void*)&active); OMPI_LAZY_WAIT_FOR_COMPLETION(active); @@ -843,7 +881,7 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) /* check for timing request - get stop time and report elapsed time if so, then start the clock again */ - OPAL_TIMING_MNEXT((&tm,"time from barrier to complete mpi_init")); + OMPI_TIMING_NEXT("barrier"); #if OPAL_ENABLE_PROGRESS_THREADS == 0 /* Start setting up the event engine for MPI operations. Don't @@ -944,7 +982,7 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) error: if (ret != OMPI_SUCCESS) { /* Only print a message if one was not already printed */ - if (NULL != error) { + if (NULL != error && OMPI_ERR_SILENT != ret) { const char *err_msg = opal_strerror(ret); opal_show_help("help-mpi-runtime.txt", "mpi_init:startup:internal-failure", true, @@ -952,6 +990,7 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) } opal_mutex_unlock(&ompi_mpi_bootstrap_mutex); ompi_hook_base_mpi_init_error(argc, argv, requested, provided); + OMPI_TIMING_FINALIZE; return ret; } @@ -976,10 +1015,9 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided) /* Finish last measurement, output results * and clear timing structure */ - OPAL_TIMING_MSTOP(&tm); - OPAL_TIMING_DELTAS(ompi_enable_timing, &tm); - OPAL_TIMING_REPORT(ompi_enable_timing_ext, &tm); - OPAL_TIMING_RELEASE(&tm); + OMPI_TIMING_NEXT("barrier-finish"); + OMPI_TIMING_OUT; + OMPI_TIMING_FINALIZE; opal_mutex_unlock(&ompi_mpi_bootstrap_mutex); diff --git a/ompi/runtime/ompi_mpi_params.c b/ompi/runtime/ompi_mpi_params.c index 
6d799032c74..f8376db633d 100644 --- a/ompi/runtime/ompi_mpi_params.c +++ b/ompi/runtime/ompi_mpi_params.c @@ -62,7 +62,7 @@ bool ompi_mpi_keep_fqdn_hostnames = false; bool ompi_have_sparse_group_storage = OPAL_INT_TO_BOOL(OMPI_GROUP_SPARSE); bool ompi_use_sparse_group_storage = OPAL_INT_TO_BOOL(OMPI_GROUP_SPARSE); -bool ompi_mpi_yield_when_idle = true; +bool ompi_mpi_yield_when_idle = false; int ompi_mpi_event_tick_rate = -1; char *ompi_mpi_show_mca_params_string = NULL; bool ompi_mpi_have_sparse_group_storage = !!(OMPI_GROUP_SPARSE); @@ -107,7 +107,7 @@ int ompi_mpi_register_params(void) */ /* JMS: Need ORTE data here -- set this to 0 when exactly/under-subscribed, or 1 when oversubscribed */ - ompi_mpi_yield_when_idle = true; + ompi_mpi_yield_when_idle = false; (void) mca_base_var_register("ompi", "mpi", NULL, "yield_when_idle", "Yield the processor when waiting for MPI communication (for MPI processes, will default to 1 when oversubscribing nodes)", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, @@ -287,11 +287,7 @@ int ompi_mpi_register_params(void) MCA_BASE_VAR_SCOPE_READONLY, &ompi_mpi_dynamics_enabled); - if (opal_pmix_base_async_modex) { - ompi_async_mpi_init = true; - } else { - ompi_async_mpi_init = false; - } + ompi_async_mpi_init = false; (void) mca_base_var_register("ompi", "async", "mpi", "init", "Do not perform a barrier at the end of MPI_Init", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, diff --git a/ompi/runtime/ompi_mpi_preconnect.c b/ompi/runtime/ompi_mpi_preconnect.c index 0fac35d5178..6b4d207419a 100644 --- a/ompi/runtime/ompi_mpi_preconnect.c +++ b/ompi/runtime/ompi_mpi_preconnect.c @@ -8,6 +8,8 @@ * Copyright (c) 2007 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -22,8 +24,8 @@ #include "ompi/constants.h" #include "ompi/mca/pml/pml.h" #include "ompi/communicator/communicator.h" -#include "ompi/request/request.h" #include "ompi/runtime/mpiruntime.h" +#include "ompi/mca/coll/base/coll_base_util.h" int ompi_init_preconnect_mpi(void) @@ -31,7 +33,6 @@ ompi_init_preconnect_mpi(void) int comm_size = ompi_comm_size(MPI_COMM_WORLD); int comm_rank = ompi_comm_rank(MPI_COMM_WORLD); int param, next, prev, i, ret = OMPI_SUCCESS; - struct ompi_request_t * requests[2]; char inbuf[1], outbuf[1]; const bool *value = NULL; @@ -58,21 +59,12 @@ ompi_init_preconnect_mpi(void) next = (comm_rank + i) % comm_size; prev = (comm_rank - i + comm_size) % comm_size; - ret = MCA_PML_CALL(isend(outbuf, 1, MPI_CHAR, - next, 1, - MCA_PML_BASE_SEND_COMPLETE, - MPI_COMM_WORLD, - &requests[1])); - if (OMPI_SUCCESS != ret) return ret; - - ret = MCA_PML_CALL(irecv(inbuf, 1, MPI_CHAR, - prev, 1, - MPI_COMM_WORLD, - &requests[0])); + ret = ompi_coll_base_sendrecv_actual(outbuf, 1, MPI_CHAR, + next, 1, + inbuf, 1, MPI_CHAR, + prev, 1, + MPI_COMM_WORLD, MPI_STATUS_IGNORE); if(OMPI_SUCCESS != ret) return ret; - - ret = ompi_request_wait_all(2, requests, MPI_STATUSES_IGNORE); - if (OMPI_SUCCESS != ret) return ret; } return ret; diff --git a/ompi/tools/mpisync/Makefile.am b/ompi/tools/mpisync/Makefile.am index 50619e0aad8..3514afcc59f 100644 --- a/ompi/tools/mpisync/Makefile.am +++ b/ompi/tools/mpisync/Makefile.am @@ -15,6 +15,8 @@ # All rights reserved. # Copyright (c) 2014 Artem Polyakov # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. 
# # $COPYRIGHT$ # @@ -30,7 +32,7 @@ AM_CFLAGS = \ -DOPAL_CONFIGURE_HOST="\"@OPAL_CONFIGURE_HOST@\"" \ -DOPAL_CONFIGURE_DATE="\"@OPAL_CONFIGURE_DATE@\"" \ -DOMPI_BUILD_USER="\"$$USER\"" \ - -DOMPI_BUILD_HOST="\"`hostname`\"" \ + -DOMPI_BUILD_HOST="\"`(hostname || uname -n) | sed 1q`\"" \ -DOMPI_BUILD_DATE="\"`date`\"" \ -DOMPI_BUILD_CFLAGS="\"@CFLAGS@\"" \ -DOMPI_BUILD_CPPFLAGS="\"@CPPFLAGS@\"" \ diff --git a/ompi/tools/mpisync/sync.c b/ompi/tools/mpisync/sync.c index 658ada2df7e..bcedadcb4ad 100644 --- a/ompi/tools/mpisync/sync.c +++ b/ompi/tools/mpisync/sync.c @@ -1,6 +1,6 @@ /* * Copyright (C) 2014 Artem Polyakov - * Copyright (c) 2014 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -139,7 +139,6 @@ int main(int argc, char **argv) MPI_Gather(hname,sizeof(hname),MPI_CHAR,hnames,sizeof(hname),MPI_CHAR, 0, MPI_COMM_WORLD); MPI_Gather(send,2,MPI_DOUBLE,measure,2, MPI_DOUBLE, 0, MPI_COMM_WORLD); - char tmpname[128]; FILE *fp = fopen(filename,"w"); if( fp == NULL ){ fprintf(stderr, "Fail to open the file %s. Abort\n", filename); diff --git a/ompi/tools/ompi_info/Makefile.am b/ompi/tools/ompi_info/Makefile.am index 58ab9dd0c0b..296d8ba283a 100644 --- a/ompi/tools/ompi_info/Makefile.am +++ b/ompi/tools/ompi_info/Makefile.am @@ -14,6 +14,8 @@ # Copyright (c) 2012 Los Alamos National Security, LLC. # All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -26,7 +28,7 @@ AM_CFLAGS = \ -DOPAL_CONFIGURE_HOST="\"@OPAL_CONFIGURE_HOST@\"" \ -DOPAL_CONFIGURE_DATE="\"@OPAL_CONFIGURE_DATE@\"" \ -DOMPI_BUILD_USER="\"$$USER\"" \ - -DOMPI_BUILD_HOST="\"`hostname`\"" \ + -DOMPI_BUILD_HOST="\"`(hostname || uname -n) 2> /dev/null | sed 1q`\"" \ -DOMPI_BUILD_DATE="\"`date`\"" \ -DOMPI_BUILD_CFLAGS="\"@CFLAGS@\"" \ -DOMPI_BUILD_CPPFLAGS="\"@CPPFLAGS@\"" \ diff --git a/ompi/tools/ompi_info/ompi_info.c b/ompi/tools/ompi_info/ompi_info.c index 547e6264af5..faf9ad6e9b1 100644 --- a/ompi/tools/ompi_info/ompi_info.c +++ b/ompi/tools/ompi_info/ompi_info.c @@ -2,7 +2,7 @@ * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University + * Copyright (c) 2004-2016 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, @@ -110,7 +110,7 @@ int main(int argc, char *argv[]) /* setup the mca_types array */ OBJ_CONSTRUCT(&mca_types, opal_pointer_array_t); - opal_pointer_array_init(&mca_types, 256, INT_MAX, 128); + opal_pointer_array_init(&mca_types, 128, INT_MAX, 64); /* add in the opal frameworks */ opal_info_register_types(&mca_types); @@ -124,7 +124,7 @@ int main(int argc, char *argv[]) /* init the component map */ OBJ_CONSTRUCT(&component_map, opal_pointer_array_t); - opal_pointer_array_init(&component_map, 256, INT_MAX, 128); + opal_pointer_array_init(&component_map, 64, INT_MAX, 32); /* Register OMPI's params */ if (OMPI_SUCCESS != (ret = ompi_info_register_framework_params(&component_map))) { diff --git a/ompi/tools/ompi_info/param.c b/ompi/tools/ompi_info/param.c index a61c8e7d37c..17e2cc42e28 100644 --- a/ompi/tools/ompi_info/param.c +++ b/ompi/tools/ompi_info/param.c @@ -9,9 +9,9 @@ * University of Stuttgart. 
All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2007-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Intel, Inc. All rights reserved * $COPYRIGHT$ @@ -126,12 +126,12 @@ void ompi_info_do_config(bool want_all) char *mpirun_prefix_by_default; #endif char *sparse_groups; - char *have_mpi_io; char *wtime_support; char *symbol_visibility; char *ft_support; char *crdebug_support; char *topology_support; + char *ipv6_support; /* Do a little preprocessor trickery here to figure opal_info_out the * tri-state of MPI_PARAM_CHECK (which will be either 0, 1, or @@ -157,6 +157,16 @@ void ompi_info_do_config(bool want_all) paramcheck = "runtime"; #endif + /* The current mpi_f08 implementation does not support Fortran + subarrays. However, someday it may/will. Hence, I'm leaving + in all the logic that checks to see whether subarrays are + supported, but I'm just hard-coding + OMPI_BUILD_FORTRAN_F08_SUBARRAYS to 0 (we used to have a + prototype mpi_f08 module that implemented a handful of + descriptor-based interfaces and supported subarrays, but that + has been removed). */ + const int OMPI_BUILD_FORTRAN_F08_SUBARRAYS = 0; + /* setup the strings that don't require allocations*/ cxx = OMPI_BUILD_CXX_BINDINGS ? 
"yes" : "no"; if (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_USEMPI_BINDINGS) { @@ -218,7 +228,7 @@ void ompi_info_do_config(bool want_all) } else { int first = 1; snprintf(f08_msg, sizeof(f08_msg), - "The mpi_f08 module is available, but due to limitations in the %s compiler, does not support the following: ", + "The mpi_f08 module is available, but due to limitations in the %s compiler and/or Open MPI, does not support the following: ", OMPI_FC); if (!OMPI_BUILD_FORTRAN_F08_SUBARRAYS) { append(f08_msg, sizeof(f08_msg), &first, "array subsections"); @@ -271,10 +281,10 @@ void ompi_info_do_config(bool want_all) mpirun_prefix_by_default = ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT ? "yes" : "no"; #endif sparse_groups = OMPI_GROUP_SPARSE ? "yes" : "no"; - have_mpi_io = OMPI_PROVIDE_MPI_FILE_INTERFACE ? "yes" : "no"; wtime_support = OPAL_TIMER_USEC_NATIVE ? "native" : "gettimeofday"; symbol_visibility = OPAL_C_HAVE_VISIBILITY ? "yes" : "no"; topology_support = "yes"; + ipv6_support = OPAL_ENABLE_IPV6 ? "yes" : "no"; /* setup strings that require allocation */ if (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_MPIFH_BINDINGS) { @@ -628,11 +638,11 @@ void ompi_info_do_config(bool want_all) opal_info_out("mpirun default --prefix", "mpirun:prefix_by_default", mpirun_prefix_by_default); #endif - opal_info_out("MPI I/O support", "options:mpi-io", have_mpi_io); opal_info_out("MPI_WTIME support", "options:mpi-wtime", wtime_support); opal_info_out("Symbol vis. 
support", "options:visibility", symbol_visibility); opal_info_out("Host topology support", "options:host-topology", topology_support); + opal_info_out("IPv6 support", "options:ipv6", ipv6_support); opal_info_out("MPI extensions", "options:mpi_ext", OMPI_MPIEXT_COMPONENTS); @@ -654,12 +664,7 @@ void ompi_info_do_config(bool want_all) MPI_MAX_INFO_VAL); opal_info_out_int("MPI_MAX_PORT_NAME", "options:mpi-max-port-name", MPI_MAX_PORT_NAME); -#if OMPI_PROVIDE_MPI_FILE_INTERFACE opal_info_out_int("MPI_MAX_DATAREP_STRING", "options:mpi-max-datarep-string", MPI_MAX_DATAREP_STRING); -#else - opal_info_out("MPI_MAX_DATAREP_STRING", "options:mpi-max-datarep-string", - "IO interface not provided"); -#endif } diff --git a/ompi/tools/wrappers/Makefile.am b/ompi/tools/wrappers/Makefile.am index 9f973785048..933eb3d7620 100644 --- a/ompi/tools/wrappers/Makefile.am +++ b/ompi/tools/wrappers/Makefile.am @@ -14,6 +14,7 @@ # Copyright (c) 2013 Intel, Inc. All rights reserved. # Copyright (c) 2014 Research Organization for Information Science # and Technology (RIST). All rights reserved. +# Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -24,14 +25,15 @@ include $(top_srcdir)/Makefile.ompi-rules generated_man_pages = mpicc.1 mpic++.1 mpicxx.1 mpifort.1 mpif77.1 mpif90.1 -man_pages = $(generated_man_pages) - -EXTRA_DIST = mpif77.1in mpijavac.1 mpijavac.pl.in if OMPI_WANT_JAVA_BINDINGS -man_pages += mpijavac.1 +generated_man_pages += mpijavac.1 endif +man_pages = $(generated_man_pages) + +EXTRA_DIST = mpif77.1in mpijavac.1in mpijavac.pl.in + if OPAL_WANT_SCRIPT_WRAPPER_COMPILERS bin_SCRIPTS = ompi_wrapper_script diff --git a/ompi/tools/wrappers/mpijavac.1 b/ompi/tools/wrappers/mpijavac.1 deleted file mode 100644 index 15ffe26ef16..00000000000 --- a/ompi/tools/wrappers/mpijavac.1 +++ /dev/null @@ -1,146 +0,0 @@ -.\" Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved. 
-.TH mpijava 1 "Unreleased developer copy" "1.7a1r25839M" "Open MPI" -. -.SH NAME -mpijava -- Open MPI Java wrapper compiler -. -.SH SYNTAX -mpijava [-showme|-showme:compile|-showme:link] ... -. -.SH OPTIONS -.TP ---showme -This option comes in several different variants (see below). None of -the variants invokes the underlying compiler; they all provide -information on how the underlying compiler would have been invoked had -.I --showme -not been used. -The basic -.I --showme -option outputs the command line that would be executed to compile the -program. \fBNOTE:\fR If a non-filename argument is passed on the -command line, the \fI-showme\fR option will \fInot\fR display any -additional flags. For example, both "mpijava --showme" and -"mpijava --showme my_source.java" will show all the wrapper-supplied -flags. But "mpijava --showme -v" will only show the underlying -compiler name and "-v". -.TP ---showme:compile -Output the compiler flags that would have been supplied to the -java compiler. -.TP ---showme:link -Output the linker flags that would have been supplied to the -java compiler. -.TP ---showme:command -Outputs the underlying java compiler command (which may be one -or more tokens). -.TP ---showme:incdirs -Outputs a space-delimited (but otherwise undecorated) list of -directories that the wrapper compiler would have provided to the -underlying java compiler to indicate where relevant header files -are located. -.TP ---showme:libdirs -Outputs a space-delimited (but otherwise undecorated) list of -directories that the wrapper compiler would have provided to the -underlying linker to indicate where relevant libraries are located. -.TP ---showme:libs -Outputs a space-delimited (but otherwise undecorated) list of library -names that the wrapper compiler would have used to link an -application. For example: "mpi open-rte open-pal util". -.TP ---showme:version -Outputs the version number of Open MPI. 
-.PP -See the man page for your underlying java compiler for other -options that can be passed through mpijava. -. -. -.SH DESCRIPTION -.PP -Conceptually, the role of these commands is quite simple: -transparently add relevant compiler and linker flags to the user's -command line that are necessary to compile / link Open MPI -programs, and then invoke the underlying compiler to actually perform -the command. -. -.PP -As such, these commands are frequently referred to as "wrapper" -compilers because they do not actually compile or link applications -themselves; they only add in command line flags and invoke the -back-end compiler. -. -. -.SS Overview -\fImpijava\fR is a convenience wrapper for the underlying -java compiler. Translation of an Open MPI program requires the -linkage of the Open MPI-specific libraries which may not reside in -one of the standard search directories of ld(1). It also often -requires the inclusion of header files what may also not be found in a -standard location. -. -.PP -\fImpijava\fR passes its arguments to the underlying java -compiler along with the -I, -L and -l options required by Open MPI -programs. -. -.PP -The Open MPI Team \fIstrongly\fR encourages using the wrapper -compilers instead of attempting to link to the Open MPI libraries -manually. This allows the specific implementation of Open MPI to -change without forcing changes to linker directives in users' -Makefiles. Indeed, the specific set of flags and libraries used by -the wrapper compilers depends on how Open MPI was configured and -built; the values can change between different installations of the -same version of Open MPI. -. -.PP -Indeed, since the wrappers are simply thin shells on top of an -underlying compiler, there are very, very few compelling reasons -\fInot\fR to use \fImpijava\fR. 
When it is not possible to use the -wrappers directly, the \fI-showme:compile\fR and \fI-showme:link\fR -options should be used to determine what flags the wrappers would have -used. -. -. -.SH NOTES -.PP -It is possible to make the wrapper compilers multi-lib aware. That -is, the libraries and includes specified may differ based on the -compiler flags specified (for example, with the GNU compilers on -Linux, a different library path may be used if -m32 is seen versus --m64 being seen). This is not the default behavior in a standard -build, but can be activated (for example, in a binary package -providing both 32 and 64 bit support). More information can be found -at: -.PP - https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264 -. -. -.SH FILES -.PP -The string that the wrapper compilers insert into the command line -before invoking the underlying compiler are stored in a text file -created by Open MPI and installed to -\fI$pkgdata/mpijava-wrapper-data.txt\fR, where \fI$pkgdata\fR -is typically \fI$prefix/share/openmpi\fR, and \fI$prefix\fR is the top -installation directory of Open MPI. -. -.PP -It is rarely necessary to edit this file, but it can be examined to -gain insight into what flags the wrappers are placing on the command -line. -. -. -.SH ENVIRONMENT VARIABLES -.PP -By default, the wrappers use the compilers that were selected when -Open MPI was configured. These compilers were either found -automatically by Open MPI's "configure" script, or were selected by -the user in the CC, CXX, F77, JAVAC, and/or FC environment variables -before "configure" was invoked. Additionally, other arguments -specific to the compiler may have been selected by configure. diff --git a/ompi/tools/wrappers/mpijavac.1in b/ompi/tools/wrappers/mpijavac.1in new file mode 100644 index 00000000000..e95016e6aa5 --- /dev/null +++ b/ompi/tools/wrappers/mpijavac.1in @@ -0,0 +1,147 @@ +.\" Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved. 
+.\" Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. +.TH mpijava 1 "#OPAL_DATE#" "#PACKAGE_VERSION#" "#PACKAGE_NAME#" +. +.SH NAME +mpijava -- Open MPI Java wrapper compiler +. +.SH SYNTAX +mpijava [-showme|-showme:compile|-showme:link] ... +. +.SH OPTIONS +.TP +--showme +This option comes in several different variants (see below). None of +the variants invokes the underlying compiler; they all provide +information on how the underlying compiler would have been invoked had +.I --showme +not been used. +The basic +.I --showme +option outputs the command line that would be executed to compile the +program. \fBNOTE:\fR If a non-filename argument is passed on the +command line, the \fI-showme\fR option will \fInot\fR display any +additional flags. For example, both "mpijava --showme" and +"mpijava --showme my_source.java" will show all the wrapper-supplied +flags. But "mpijava --showme -v" will only show the underlying +compiler name and "-v". +.TP +--showme:compile +Output the compiler flags that would have been supplied to the +java compiler. +.TP +--showme:link +Output the linker flags that would have been supplied to the +java compiler. +.TP +--showme:command +Outputs the underlying java compiler command (which may be one +or more tokens). +.TP +--showme:incdirs +Outputs a space-delimited (but otherwise undecorated) list of +directories that the wrapper compiler would have provided to the +underlying java compiler to indicate where relevant header files +are located. +.TP +--showme:libdirs +Outputs a space-delimited (but otherwise undecorated) list of +directories that the wrapper compiler would have provided to the +underlying linker to indicate where relevant libraries are located. +.TP +--showme:libs +Outputs a space-delimited (but otherwise undecorated) list of library +names that the wrapper compiler would have used to link an +application. For example: "mpi open-rte open-pal util". +.TP +--showme:version +Outputs the version number of Open MPI. 
+.PP +See the man page for your underlying java compiler for other +options that can be passed through mpijava. +. +. +.SH DESCRIPTION +.PP +Conceptually, the role of these commands is quite simple: +transparently add relevant compiler and linker flags to the user's +command line that are necessary to compile / link Open MPI +programs, and then invoke the underlying compiler to actually perform +the command. +. +.PP +As such, these commands are frequently referred to as "wrapper" +compilers because they do not actually compile or link applications +themselves; they only add in command line flags and invoke the +back-end compiler. +. +. +.SS Overview +\fImpijava\fR is a convenience wrapper for the underlying +java compiler. Translation of an Open MPI program requires the +linkage of the Open MPI-specific libraries which may not reside in +one of the standard search directories of ld(1). It also often +requires the inclusion of header files what may also not be found in a +standard location. +. +.PP +\fImpijava\fR passes its arguments to the underlying java +compiler along with the -I, -L and -l options required by Open MPI +programs. +. +.PP +The Open MPI Team \fIstrongly\fR encourages using the wrapper +compilers instead of attempting to link to the Open MPI libraries +manually. This allows the specific implementation of Open MPI to +change without forcing changes to linker directives in users' +Makefiles. Indeed, the specific set of flags and libraries used by +the wrapper compilers depends on how Open MPI was configured and +built; the values can change between different installations of the +same version of Open MPI. +. +.PP +Indeed, since the wrappers are simply thin shells on top of an +underlying compiler, there are very, very few compelling reasons +\fInot\fR to use \fImpijava\fR. 
When it is not possible to use the +wrappers directly, the \fI-showme:compile\fR and \fI-showme:link\fR +options should be used to determine what flags the wrappers would have +used. +. +. +.SH NOTES +.PP +It is possible to make the wrapper compilers multi-lib aware. That +is, the libraries and includes specified may differ based on the +compiler flags specified (for example, with the GNU compilers on +Linux, a different library path may be used if -m32 is seen versus +-m64 being seen). This is not the default behavior in a standard +build, but can be activated (for example, in a binary package +providing both 32 and 64 bit support). More information can be found +at: +.PP + https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264 +. +. +.SH FILES +.PP +The string that the wrapper compilers insert into the command line +before invoking the underlying compiler are stored in a text file +created by Open MPI and installed to +\fI$pkgdata/mpijava-wrapper-data.txt\fR, where \fI$pkgdata\fR +is typically \fI$prefix/share/openmpi\fR, and \fI$prefix\fR is the top +installation directory of Open MPI. +. +.PP +It is rarely necessary to edit this file, but it can be examined to +gain insight into what flags the wrappers are placing on the command +line. +. +. +.SH ENVIRONMENT VARIABLES +.PP +By default, the wrappers use the compilers that were selected when +Open MPI was configured. These compilers were either found +automatically by Open MPI's "configure" script, or were selected by +the user in the CC, CXX, F77, JAVAC, and/or FC environment variables +before "configure" was invoked. Additionally, other arguments +specific to the compiler may have been selected by configure. diff --git a/ompi/util/Makefile.am b/ompi/util/Makefile.am new file mode 100644 index 00000000000..45f01c77069 --- /dev/null +++ b/ompi/util/Makefile.am @@ -0,0 +1,13 @@ +# -*- makefile -*- +# +# Copyright (c) 2017 Mellanox Technologies Ltd. All rights reserved. 
+# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# Source code files +headers += \ + util/timings.h diff --git a/ompi/util/timings.h b/ompi/util/timings.h new file mode 100644 index 00000000000..be870665529 --- /dev/null +++ b/ompi/util/timings.h @@ -0,0 +1,284 @@ +/* + * Copyright (c) 2017-2018 Mellanox Technologies Ltd. All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef OMPI_UTIL_TIMING_H +#define OMPI_UTIL_TIMING_H + +#include "opal/util/timings.h" +/* TODO: we need access to MPI_* functions */ + +#if (OPAL_ENABLE_TIMING) + +typedef struct { + char desc[OPAL_TIMING_STR_LEN]; + double ts; + char *file; + char *prefix; + int imported; +} ompi_timing_val_t; + +typedef struct { + ompi_timing_val_t *val; + int use; + struct ompi_timing_list_t *next; +} ompi_timing_list_t; + +typedef struct ompi_timing_t { + double ts; + const char *prefix; + int size; + int cnt; + int error; + int enabled; + int import_cnt; + opal_timing_ts_func_t get_ts; + ompi_timing_list_t *timing; + ompi_timing_list_t *cur_timing; +} ompi_timing_t; + +#define OMPI_TIMING_ENABLED \ + (getenv("OMPI_TIMING_ENABLE") ? 
atoi(getenv("OMPI_TIMING_ENABLE")) : 0) + +#define OMPI_TIMING_INIT(_size) \ + ompi_timing_t OMPI_TIMING; \ + OMPI_TIMING.prefix = __func__; \ + OMPI_TIMING.size = _size; \ + OMPI_TIMING.get_ts = opal_timing_ts_func(OPAL_TIMING_AUTOMATIC_TIMER); \ + OMPI_TIMING.cnt = 0; \ + OMPI_TIMING.error = 0; \ + OMPI_TIMING.ts = OMPI_TIMING.get_ts(); \ + OMPI_TIMING.enabled = 0; \ + OMPI_TIMING.import_cnt = 0; \ + { \ + char *ptr; \ + ptr = getenv("OMPI_TIMING_ENABLE"); \ + if (NULL != ptr) { \ + OMPI_TIMING.enabled = atoi(ptr); \ + } \ + if (OMPI_TIMING.enabled) { \ + setenv("OPAL_TIMING_ENABLE", "1", 1); \ + OMPI_TIMING.timing = (ompi_timing_list_t*)malloc(sizeof(ompi_timing_list_t)); \ + memset(OMPI_TIMING.timing, 0, sizeof(ompi_timing_list_t)); \ + OMPI_TIMING.timing->val = (ompi_timing_val_t*)malloc(sizeof(ompi_timing_val_t) * _size); \ + OMPI_TIMING.cur_timing = OMPI_TIMING.timing; \ + } \ + } + +#define OMPI_TIMING_ITEM_EXTEND \ + do { \ + if (OMPI_TIMING.enabled) { \ + OMPI_TIMING.cur_timing->next = (struct ompi_timing_list_t*)malloc(sizeof(ompi_timing_list_t)); \ + OMPI_TIMING.cur_timing = (ompi_timing_list_t*)OMPI_TIMING.cur_timing->next; \ + memset(OMPI_TIMING.cur_timing, 0, sizeof(ompi_timing_list_t)); \ + OMPI_TIMING.cur_timing->val = malloc(sizeof(ompi_timing_val_t) * OMPI_TIMING.size); \ + } \ + } while(0) + +#define OMPI_TIMING_FINALIZE \ + do { \ + if (OMPI_TIMING.enabled) { \ + ompi_timing_list_t *t = OMPI_TIMING.timing, *tmp; \ + while ( NULL != t) { \ + tmp = t; \ + t = (ompi_timing_list_t*)t->next; \ + free(tmp->val); \ + free(tmp); \ + } \ + OMPI_TIMING.timing = NULL; \ + OMPI_TIMING.cur_timing = NULL; \ + OMPI_TIMING.cnt = 0; \ + } \ + } while(0) + +#define OMPI_TIMING_NEXT(...) \ + do { \ + if (!OMPI_TIMING.error && OMPI_TIMING.enabled) { \ + char *f = strrchr(__FILE__, '/'); \ + f = (f == NULL) ? 
strdup(__FILE__) : f+1; \ + int len = 0; \ + if (OMPI_TIMING.cur_timing->use >= OMPI_TIMING.size){ \ + OMPI_TIMING_ITEM_EXTEND; \ + } \ + len = snprintf(OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].desc, \ + OPAL_TIMING_STR_LEN, ##__VA_ARGS__); \ + if (len >= OPAL_TIMING_STR_LEN) { \ + OMPI_TIMING.error = 1; \ + } \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].file = strdup(f); \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].prefix = strdup(__func__); \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use++].ts = \ + OMPI_TIMING.get_ts() - OMPI_TIMING.ts; \ + OMPI_TIMING.cnt++; \ + OMPI_TIMING.ts = OMPI_TIMING.get_ts(); \ + } \ + } while(0) + +#define OMPI_TIMING_APPEND(filename,func,desc,ts) \ + do { \ + if (OMPI_TIMING.cur_timing->use >= OMPI_TIMING.size){ \ + OMPI_TIMING_ITEM_EXTEND; \ + } \ + int len = snprintf(OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].desc, \ + OPAL_TIMING_STR_LEN, "%s", desc); \ + if (len >= OPAL_TIMING_STR_LEN) { \ + OMPI_TIMING.error = 1; \ + } \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].prefix = func; \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].file = filename; \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use++].ts = ts; \ + OMPI_TIMING.cnt++; \ + } while(0) + +#define OMPI_TIMING_IMPORT_OPAL_PREFIX(_prefix, func) \ + do { \ + if (!OMPI_TIMING.error && OMPI_TIMING.enabled) { \ + int cnt; \ + int i; \ + double ts; \ + OMPI_TIMING.import_cnt++; \ + OPAL_TIMING_ENV_CNT(func, cnt); \ + OPAL_TIMING_ENV_ERROR_PREFIX(_prefix, func, OMPI_TIMING.error); \ + for(i = 0; i < cnt; i++){ \ + char *desc, *filename; \ + OMPI_TIMING.cur_timing->val[OMPI_TIMING.cur_timing->use].imported= \ + OMPI_TIMING.import_cnt; \ + OPAL_TIMING_ENV_GETDESC_PREFIX(_prefix, &filename, func, i, &desc, ts); \ + OMPI_TIMING_APPEND(filename, func, desc, ts); \ + } \ + } \ + } while(0) + +#define OMPI_TIMING_IMPORT_OPAL(func) \ + OMPI_TIMING_IMPORT_OPAL_PREFIX("", func); + 
+#define OMPI_TIMING_OUT \ + do { \ + if (OMPI_TIMING.enabled) { \ + int i, size, rank; \ + MPI_Comm_size(MPI_COMM_WORLD, &size); \ + MPI_Comm_rank(MPI_COMM_WORLD, &rank); \ + int error = 0; \ + int imported = 0; \ + \ + MPI_Reduce(&OMPI_TIMING.error, &error, 1, \ + MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); \ + \ + if (error) { \ + if (0 == rank) { \ + printf("==OMPI_TIMING== error: something went wrong, timings doesn't work\n"); \ + } \ + } \ + else { \ + double *avg = (double*)malloc(sizeof(double) * OMPI_TIMING.cnt); \ + double *min = (double*)malloc(sizeof(double) * OMPI_TIMING.cnt); \ + double *max = (double*)malloc(sizeof(double) * OMPI_TIMING.cnt); \ + char **desc = (char**)malloc(sizeof(char*) * OMPI_TIMING.cnt); \ + char **prefix = (char**)malloc(sizeof(char*) * OMPI_TIMING.cnt); \ + char **file = (char**)malloc(sizeof(char*) * OMPI_TIMING.cnt); \ + double total_avg = 0, total_min = 0, total_max = 0; \ + \ + if( OMPI_TIMING.cnt > 0 ) { \ + OMPI_TIMING.ts = OMPI_TIMING.get_ts(); \ + ompi_timing_list_t *timing = OMPI_TIMING.timing; \ + i = 0; \ + do { \ + int use; \ + for (use = 0; use < timing->use; use++) { \ + MPI_Reduce(&timing->val[use].ts, avg + i, 1, \ + MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); \ + MPI_Reduce(&timing->val[use].ts, min + i, 1, \ + MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD); \ + MPI_Reduce(&timing->val[use].ts, max + i, 1, \ + MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD); \ + desc[i] = timing->val[use].desc; \ + prefix[i] = timing->val[use].prefix; \ + file[i] = timing->val[use].file; \ + i++; \ + } \ + timing = (ompi_timing_list_t*)timing->next; \ + } while (timing != NULL); \ + \ + if( 0 == rank ) { \ + if (OMPI_TIMING.timing->next) { \ + printf("==OMPI_TIMING== warning: added the extra timings allocation that might misrepresent the results.\n" \ + "==OMPI_TIMING== Increase the inited size of timings to avoid extra allocation during runtime.\n"); \ + } \ + \ + printf("------------------ %s ------------------\n", \ + OMPI_TIMING.prefix); \ + 
imported = OMPI_TIMING.timing->val[0].imported; \ + for(i=0; i< OMPI_TIMING.cnt; i++){ \ + bool print_total = 0; \ + imported = OMPI_TIMING.timing->val[i].imported; \ + avg[i] /= size; \ + printf("%s[%s:%s:%s]: %lf / %lf / %lf\n", \ + imported ? " -- " : "", \ + file[i], prefix[i], desc[i], avg[i], min[i], max[i]); \ + if (OMPI_TIMING.timing->val[i].imported) { \ + total_avg += avg[i]; \ + total_min += min[i]; \ + total_max += max[i]; \ + } \ + if (i == (OMPI_TIMING.cnt-1)) { \ + print_total = true; \ + } else { \ + print_total = imported != OMPI_TIMING.timing->val[i+1].imported; \ + } \ + if (print_total && OMPI_TIMING.timing->val[i].imported) { \ + printf("%s[%s:%s:%s]: %lf / %lf / %lf\n", \ + imported ? " !! " : "", \ + file[i], prefix[i], "total", \ + total_avg, total_min, total_max); \ + total_avg = 0; total_min = 0; total_max = 0; \ + } \ + } \ + total_avg = 0; total_min = 0; total_max = 0; \ + for(i=0; i< OMPI_TIMING.cnt; i++) { \ + if (!OMPI_TIMING.timing->val[i].imported) { \ + total_avg += avg[i]; \ + total_min += min[i]; \ + total_max += max[i]; \ + } \ + } \ + printf("[%s:total] %lf / %lf / %lf\n", \ + OMPI_TIMING.prefix, \ + total_avg, total_min, total_max); \ + printf("[%s:overhead]: %lf \n", OMPI_TIMING.prefix, \ + OMPI_TIMING.get_ts() - OMPI_TIMING.ts); \ + } \ + } \ + free(avg); \ + free(min); \ + free(max); \ + free(desc); \ + free(prefix); \ + free(file); \ + } \ + } \ + } while(0) + +#else +#define OMPI_TIMING_INIT(size) + +#define OMPI_TIMING_NEXT(...) + +#define OMPI_TIMING_APPEND(desc,ts) + +#define OMPI_TIMING_OUT + +#define OMPI_TIMING_IMPORT_OPAL(func) + +#define OMPI_TIMING_FINALIZE + +#define OMPI_TIMING_ENABLED 0 + +#endif + +#endif diff --git a/ompi/win/win.c b/ompi/win/win.c index 3b3d2b9ba04..bd388f967ec 100644 --- a/ompi/win/win.c +++ b/ompi/win/win.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -14,8 +14,9 @@ * Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -25,6 +26,8 @@ #include "ompi_config.h" +#include "opal/util/info_subscriber.h" + #include "mpi.h" #include "ompi/win/win.h" #include "ompi/errhandler/errhandler.h" @@ -43,7 +46,7 @@ */ opal_pointer_array_t ompi_mpi_windows = {{0}}; -ompi_predefined_win_t ompi_mpi_win_null = {{{0}}}; +ompi_predefined_win_t ompi_mpi_win_null = {{{{0}}}}; ompi_predefined_win_t *ompi_mpi_win_null_addr = &ompi_mpi_win_null; mca_base_var_enum_t *ompi_win_accumulate_ops = NULL; mca_base_var_enum_flag_t *ompi_win_accumulate_order = NULL; @@ -67,7 +70,7 @@ static mca_base_var_enum_value_flag_t accumulate_order_flags[] = { static void ompi_win_construct(ompi_win_t *win); static void ompi_win_destruct(ompi_win_t *win); -OBJ_CLASS_INSTANCE(ompi_win_t, opal_object_t, +OBJ_CLASS_INSTANCE(ompi_win_t, opal_infosubscriber_t, ompi_win_construct, ompi_win_destruct); int @@ -79,8 +82,8 @@ ompi_win_init(void) /* setup window Fortran array */ OBJ_CONSTRUCT(&ompi_mpi_windows, opal_pointer_array_t); - if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_mpi_windows, 0, - OMPI_FORTRAN_HANDLE_MAX, 64) ) { + if( OPAL_SUCCESS != opal_pointer_array_init(&ompi_mpi_windows, 4, + OMPI_FORTRAN_HANDLE_MAX, 16) ) { return OMPI_ERROR; } @@ -136,7 +139,7 @@ int 
ompi_win_finalize(void) return OMPI_SUCCESS; } -static int alloc_window(struct ompi_communicator_t *comm, ompi_info_t *info, int flavor, ompi_win_t **win_out) +static int alloc_window(struct ompi_communicator_t *comm, opal_info_t *info, int flavor, ompi_win_t **win_out) { ompi_win_t *win; ompi_group_t *group; @@ -148,7 +151,7 @@ static int alloc_window(struct ompi_communicator_t *comm, ompi_info_t *info, int return OMPI_ERR_OUT_OF_RESOURCE; } - ret = ompi_info_get_value_enum (info, "accumulate_ops", &acc_ops, + ret = opal_info_get_value_enum (info, "accumulate_ops", &acc_ops, OMPI_WIN_ACCUMULATE_OPS_SAME_OP_NO_OP, ompi_win_accumulate_ops, &flag); if (OMPI_SUCCESS != ret) { @@ -158,7 +161,7 @@ static int alloc_window(struct ompi_communicator_t *comm, ompi_info_t *info, int win->w_acc_ops = (ompi_win_accumulate_ops_t)acc_ops; - ret = ompi_info_get_value_enum (info, "accumulate_order", &acc_order, + ret = opal_info_get_value_enum (info, "accumulate_order", &acc_order, OMPI_WIN_ACC_ORDER_RAR | OMPI_WIN_ACC_ORDER_WAR | OMPI_WIN_ACC_ORDER_RAW | OMPI_WIN_ACC_ORDER_WAW, &(ompi_win_accumulate_order->super), &flag); @@ -176,6 +179,12 @@ static int alloc_window(struct ompi_communicator_t *comm, ompi_info_t *info, int OBJ_RETAIN(group); win->w_group = group; + /* Copy the info for the info layer */ + win->super.s_info = OBJ_NEW(opal_info_t); + if (info) { + opal_info_dup(info, &(win->super.s_info)); + } + *win_out = win; return OMPI_SUCCESS; @@ -191,25 +200,25 @@ config_window(void *base, size_t size, int disp_unit, MPI_WIN_BASE, base, true); if (OMPI_SUCCESS != ret) return ret; - ret = ompi_attr_set_fortran_mpi2(WIN_ATTR, win, - &win->w_keyhash, - MPI_WIN_SIZE, size, true); + ret = ompi_attr_set_aint(WIN_ATTR, win, + &win->w_keyhash, + MPI_WIN_SIZE, size, true); if (OMPI_SUCCESS != ret) return ret; - ret = ompi_attr_set_fortran_mpi1(WIN_ATTR, win, - &win->w_keyhash, - MPI_WIN_DISP_UNIT, disp_unit, - true); + ret = ompi_attr_set_int(WIN_ATTR, win, + &win->w_keyhash, + 
MPI_WIN_DISP_UNIT, disp_unit, + true); if (OMPI_SUCCESS != ret) return ret; - ret = ompi_attr_set_fortran_mpi1(WIN_ATTR, win, - &win->w_keyhash, - MPI_WIN_CREATE_FLAVOR, flavor, true); + ret = ompi_attr_set_int(WIN_ATTR, win, + &win->w_keyhash, + MPI_WIN_CREATE_FLAVOR, flavor, true); if (OMPI_SUCCESS != ret) return ret; - ret = ompi_attr_set_fortran_mpi1(WIN_ATTR, win, - &win->w_keyhash, - MPI_WIN_MODEL, model, true); + ret = ompi_attr_set_int(WIN_ATTR, win, + &win->w_keyhash, + MPI_WIN_MODEL, model, true); if (OMPI_SUCCESS != ret) return ret; win->w_f_to_c_index = opal_pointer_array_add(&ompi_mpi_windows, win); @@ -221,7 +230,7 @@ config_window(void *base, size_t size, int disp_unit, int ompi_win_create(void *base, size_t size, int disp_unit, ompi_communicator_t *comm, - ompi_info_t *info, + opal_info_t *info, ompi_win_t** newwin) { ompi_win_t *win; @@ -252,7 +261,7 @@ ompi_win_create(void *base, size_t size, int -ompi_win_allocate(size_t size, int disp_unit, ompi_info_t *info, +ompi_win_allocate(size_t size, int disp_unit, opal_info_t *info, ompi_communicator_t *comm, void *baseptr, ompi_win_t **newwin) { ompi_win_t *win; @@ -285,7 +294,7 @@ ompi_win_allocate(size_t size, int disp_unit, ompi_info_t *info, int -ompi_win_allocate_shared(size_t size, int disp_unit, ompi_info_t *info, +ompi_win_allocate_shared(size_t size, int disp_unit, opal_info_t *info, ompi_communicator_t *comm, void *baseptr, ompi_win_t **newwin) { ompi_win_t *win; @@ -318,7 +327,7 @@ ompi_win_allocate_shared(size_t size, int disp_unit, ompi_info_t *info, int -ompi_win_create_dynamic(ompi_info_t *info, ompi_communicator_t *comm, ompi_win_t **newwin) +ompi_win_create_dynamic(opal_info_t *info, ompi_communicator_t *comm, ompi_win_t **newwin) { ompi_win_t *win; int model; @@ -358,6 +367,10 @@ ompi_win_free(ompi_win_t *win) NULL); } + if (NULL != (win->super.s_info)) { + OBJ_RELEASE(win->super.s_info); + } + if (OMPI_SUCCESS == ret) { OBJ_RELEASE(win); } diff --git a/ompi/win/win.h b/ompi/win/win.h 
index ab4af8fc43e..63aec9de14a 100644 --- a/ompi/win/win.h +++ b/ompi/win/win.h @@ -10,10 +10,11 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2006-2012 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -29,6 +30,7 @@ #include "opal/class/opal_object.h" #include "opal/class/opal_hash_table.h" +#include "opal/util/info_subscriber.h" #include "ompi/errhandler/errhandler.h" #include "ompi/info/info.h" #include "ompi/communicator/communicator.h" @@ -73,12 +75,12 @@ OMPI_DECLSPEC extern mca_base_var_enum_flag_t *ompi_win_accumulate_order; OMPI_DECLSPEC extern opal_pointer_array_t ompi_mpi_windows; struct ompi_win_t { - opal_object_t w_base; + opal_infosubscriber_t super; opal_mutex_t w_lock; char w_name[MPI_MAX_OBJECT_NAME]; - + /* Group associated with this window. */ ompi_group_t *w_group; @@ -117,7 +119,7 @@ OMPI_DECLSPEC OBJ_CLASS_DECLARATION(ompi_win_t); * See ompi/communicator/communicator.h comments with struct ompi_communicator_t * for full explanation why we chose the following padding construct for predefines. 
*/ -#define PREDEFINED_WIN_PAD (sizeof(void*) * 64) +#define PREDEFINED_WIN_PAD 512 struct ompi_predefined_win_t { struct ompi_win_t win; @@ -132,13 +134,13 @@ int ompi_win_init(void); int ompi_win_finalize(void); int ompi_win_create(void *base, size_t size, int disp_unit, - ompi_communicator_t *comm, ompi_info_t *info, + ompi_communicator_t *comm, opal_info_t *info, ompi_win_t **newwin); -int ompi_win_allocate(size_t size, int disp_unit, ompi_info_t *info, +int ompi_win_allocate(size_t size, int disp_unit, opal_info_t *info, ompi_communicator_t *comm, void *baseptr, ompi_win_t **newwin); -int ompi_win_allocate_shared(size_t size, int disp_unit, ompi_info_t *info, +int ompi_win_allocate_shared(size_t size, int disp_unit, opal_info_t *info, ompi_communicator_t *comm, void *baseptr, ompi_win_t **newwin); -int ompi_win_create_dynamic(ompi_info_t *info, ompi_communicator_t *comm, ompi_win_t **newwin); +int ompi_win_create_dynamic(opal_info_t *info, ompi_communicator_t *comm, ompi_win_t **newwin); int ompi_win_free(ompi_win_t *win); @@ -162,7 +164,7 @@ static inline int ompi_win_invalid(ompi_win_t *win) { } static inline int ompi_win_peer_invalid(ompi_win_t *win, int peer) { - if (win->w_group->grp_proc_count <= peer) return true; + if (win->w_group->grp_proc_count <= peer || peer < 0) return true; return false; } diff --git a/opal/Makefile.am b/opal/Makefile.am index b1954255457..b657794eabc 100644 --- a/opal/Makefile.am +++ b/opal/Makefile.am @@ -22,7 +22,6 @@ SUBDIRS = \ include \ - asm \ datatype \ etc \ util \ @@ -37,7 +36,6 @@ SUBDIRS = \ # therefore make distclean will fail). 
DIST_SUBDIRS = \ include \ - asm \ datatype \ etc \ util \ @@ -50,7 +48,6 @@ DIST_SUBDIRS = \ lib_LTLIBRARIES = lib@OPAL_LIB_PREFIX@open-pal.la lib@OPAL_LIB_PREFIX@open_pal_la_SOURCES = lib@OPAL_LIB_PREFIX@open_pal_la_LIBADD = \ - asm/libasm.la \ datatype/libdatatype.la \ mca/base/libmca_base.la \ util/libopalutil.la \ diff --git a/opal/asm/Makefile.am b/opal/asm/Makefile.am deleted file mode 100644 index 73eebed27d6..00000000000 --- a/opal/asm/Makefile.am +++ /dev/null @@ -1,93 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2011-2014 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2017 Research Organization for Information Science -# and Technology (RIST). All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -include $(top_srcdir)/Makefile.ompi-rules - -###################################################################### -# -# This is a bit complicated. If there is anything in the library, -# it will always be atomic-asm.S. We just symlink atomic-asm.S to -# the best atomic operations available (as determined at configure -# time) -# -###################################################################### -generated/@OPAL_ASM_FILE@: base/@OPAL_ASSEMBLY_ARCH@.asm - @ if test ! 
-f "$(top_srcdir)/opal/asm/$@" ; then \ - cmd="$(PERL) '$(top_srcdir)/opal/asm/generate-asm.pl' '@OPAL_ASSEMBLY_ARCH@' '@OPAL_ASSEMBLY_FORMAT@' '$(top_srcdir)/opal/asm/base' '$(top_builddir)/opal/asm/generated/@OPAL_ASM_FILE@'" ; \ - echo "$$cmd" ; \ - eval $$cmd ; \ - fi - -atomic-asm.S: generated/@OPAL_ASM_FILE@ - rm -f atomic-asm.S - @ if test -f "$(top_builddir)/opal/asm/generated/@OPAL_ASM_FILE@" ; then \ - cmd="ln -s \"$(top_builddir)/opal/asm/generated/@OPAL_ASM_FILE@\" atomic-asm.S" ; \ - echo "$$cmd" ; \ - eval $$cmd ; \ - else \ - cmd="ln -s \"$(top_srcdir)/opal/asm/generated/@OPAL_ASM_FILE@\" atomic-asm.S" ; \ - echo "$$cmd" ; \ - eval $$cmd ; \ - fi - -if OPAL_HAVE_ASM_FILE -nodist_libasm_la_SOURCES = atomic-asm.S -libasm_la_DEPENDENCIES = generated/@OPAL_ASM_FILE@ -else -nodist_libasm_la_SOURCES = -libasm_la_DEPENDENCIES = -endif - -noinst_LTLIBRARIES = libasm.la -dist_libasm_la_SOURCES = asm.c - -EXTRA_DIST = \ - asm-data.txt \ - generate-asm.pl \ - generate-all-asm.pl \ - base/aix.conf \ - base/default.conf \ - base/X86_64.asm \ - base/ARM.asm \ - base/IA32.asm \ - base/IA64.asm \ - base/MIPS.asm \ - base/POWERPC32.asm \ - base/POWERPC64.asm \ - base/SPARCV9_32.asm \ - base/SPARCV9_64.asm - -###################################################################### - -clean-local: - rm -f atomic-asm.S - -distclean-local: - rm -f generated/atomic-local.s - -###################################################################### - -# -# Copy over all the generated files -# -dist-hook: - mkdir "${distdir}/generated" - $(PERL) "$(top_srcdir)/opal/asm/generate-all-asm.pl" "$(PERL)" "$(srcdir)" "$(distdir)" diff --git a/opal/asm/asm-data.txt b/opal/asm/asm-data.txt deleted file mode 100644 index 198c9f6c886..00000000000 --- a/opal/asm/asm-data.txt +++ /dev/null @@ -1,133 +0,0 @@ -# -*- sh -*- -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. 
-# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2014 Intel, Inc. All rights reserved. -# Copyright (c) 2017 Research Organization for Information Science -# and Technology (RIST). All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# -# Database for mapping architecture and assembly format to prebuilt -# assembly files. For explination of the assembly operations, see -# the inline assembly header files in src/include/sys/. -# -# FORMAT: -# ARCHITECTURE ASSEMBLY FORMAT BASE FILENAME -# -# Assembly Format field: -# config_file-text-global-label_suffix-gsym-lsym-type-size-align_log-ppc_r_reg-64_bit-gnu_stack - -###################################################################### -# -# AMD Opteron / Intel EM64T -# -###################################################################### - -X86_64 default-.text-.globl-:--.L-@-1-0-1-1-1 x86_64-linux -X86_64 default-.text-.globl-:--.L-@-1-0-1-1-0 x86_64-linux-nongas - - -###################################################################### -# -# ARM (ARMv7 and later) -# -###################################################################### - -ARM default-.text-.globl-:--.L-#-1-1-1-1-1 arm-linux - - -###################################################################### -# -# Intel Pentium Class -# -###################################################################### - -IA32 default-.text-.globl-:--.L-@-1-0-1-1-1 ia32-linux -IA32 default-.text-.globl-:--.L-@-1-0-1-1-0 ia32-linux-nongas -IA32 default-.text-.globl-:-_-L--0-1-1-1-0 ia32-osx -IA32 default-.text-.globl-:-_-L--0-0-1-1-1 ia32-cygwin -IA32 default-.text-.globl-:-_-L--0-0-1-1-0 ia32-cygwin-nongas - - 
-###################################################################### -# -# IA64 (Intel Itanium) -# -###################################################################### - -IA64 default-.text-.globl-:--.L-@-1-0-1-1-1 ia64-linux -IA64 default-.text-.globl-:--.L-@-1-0-1-1-0 ia64-linux-nongas - - -###################################################################### -# -# PowerPC / POWER -# -###################################################################### - -# standard ppc instruction set (AIX calls it ppc). This is not the -# true intersection of all the POWER / PowerPC machines, but works -# on PowerPCs since the 601 and on at least POWER 3 and above. -POWERPC32 default-.text-.globl-:-_-L--0-1-1-0-0 powerpc32-osx -POWERPC32 default-.text-.globl-:--.L-@-1-1-0-0-1 powerpc32-linux -POWERPC32 default-.text-.globl-:--.L-@-1-1-0-0-0 powerpc32-linux-nongas -POWERPC32 aix-.csect .text[PR]-.globl-:-.-L--0-1-0-0-0 powerpc32-aix - -# The ppc code above, plus support for the 64 bit operations. This -# mode is really only available on OS X when using the OS X 10.3 -# compiler chain with the -mcpu=970 option. -POWERPC32 default-.text-.globl-:-_-L--0-1-1-1-0 powerpc32-64-osx - -# PowerPC / POWER 64bit machines. sizeof(void*) == 8. -POWERPC64 default-.text-.globl-:-_-L--0-1-1-1-0 powerpc64-osx -POWERPC64 default-.text-.globl-:-.-.L-@-1-1-0-1-1 powerpc64-linux -POWERPC64 default-.text-.globl-:-.-.L-@-1-1-0-1-0 powerpc64-linux-nongas -POWERPC64 aix-.csect .text[PR]-.globl-:-.-L--0-1-0-1-0 powerpc64-aix - - -###################################################################### -# -# SPARC / UltraSPARC (Scalalable Processor ARChitecture) -# -###################################################################### - -# Usually compiled with -xarch=v8plus. Basically Sparc V9, but with -# sizeof(void*) == 4 instead of 8. 
Different from V9_64 because still -# uses 2 registers to pass in a 64bit integer -SPARCV9_32 default-.text-.globl-:--.L-#-1-0-1-1-0 sparcv9-32-solaris - -# The Sparc v9 (aka Ultra Sparc). Sizeof(void*) == 8. -SPARCV9_64 default-.text-.globl-:--.L-#-1-0-1-1-0 sparcv9-64-solaris - - -###################################################################### -# -# MIPS III (Microprocessor without Interlocked Pipeline Stages) -# R4000 and above -# -###################################################################### - -# So MIPS, in it's infinite wisdom (thank you!) decided that when -# compiling in 32bit mode and passing in a 64bit integer, it is done -# in one register (instead of SPARC and POWER, who use two). Which -# means that we can use the same code either way. Woo hoo! - -MIPS default-.text-.globl-:--L--1-1-1-1-0 mips-irix -MIPS default-.text-.globl-:--L--1-1-1-1-0 mips64el -MIPS default-.text-.globl-:--L-@-1-1-1-1-1 mips64-linux - -# However, this doesn't hold true for 32-bit MIPS as used on Linux. -MIPS default-.text-.globl-:--L-@-1-1-1-0-1 mips-linux diff --git a/opal/asm/asm.c b/opal/asm/asm.c deleted file mode 100644 index 766f50f394c..00000000000 --- a/opal/asm/asm.c +++ /dev/null @@ -1,74 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "opal_config.h" - -#include "opal/sys/atomic.h" -#include "opal/sys/architecture.h" - -#if OPAL_ASSEMBLY_ARCH == OPAL_SPARC - -#define LOCKS_TABLE_SIZE 8 -/* make sure to get into reasonably useful bits (so shift at least 5) */ -#define FIND_LOCK(addr) (&(locks_table[(((unsigned long) addr) >> 8) & \ - (LOCKS_TABLE_SIZE - 1)])) - -/* have to fix if you change LOCKS_TABLE_SIZE */ -static opal_atomic_lock_t locks_table[LOCKS_TABLE_SIZE] = { - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } }, - { { OPAL_ATOMIC_UNLOCKED } } -}; - - -int32_t -opal_atomic_add_32(volatile int32_t *addr, int delta) -{ - int32_t ret; - - opal_atomic_lock(FIND_LOCK(addr)); - - ret = (*addr += delta); - - opal_atomic_unlock(FIND_LOCK(addr)); - - return ret; -} - - -int32_t -opal_atomic_sub_32(volatile int32_t *addr, int delta) -{ - int32_t ret; - - opal_atomic_lock(FIND_LOCK(addr)); - - ret = (*addr -= delta); - - opal_atomic_unlock(FIND_LOCK(addr)); - - return ret; -} - - -#endif /* OPAL_ASSEMBLY_ARCH == OPAL_SPARC32 */ diff --git a/opal/asm/base/ARM.asm b/opal/asm/base/ARM.asm deleted file mode 100644 index 3f545f49754..00000000000 --- a/opal/asm/base/ARM.asm +++ /dev/null @@ -1,153 +0,0 @@ -START_FILE - TEXT - - ALIGN(4) -START_FUNC(opal_atomic_mb) - dmb - bx lr -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - dmb - bx lr -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - dmb - bx lr -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - LSYM(1) - ldrex r3, [r0] - cmp r1, r3 - bne REFLSYM(2) - strex r12, r2, [r0] - cmp r12, #0 - bne REFLSYM(1) - mov r0, #1 - LSYM(2) - movne r0, #0 - bx lr -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_acq_32) - LSYM(3) - ldrex r3, [r0] - cmp 
r1, r3 - bne REFLSYM(4) - strex r12, r2, [r0] - cmp r12, #0 - bne REFLSYM(3) - dmb - mov r0, #1 - LSYM(4) - movne r0, #0 - bx lr -END_FUNC(opal_atomic_cmpset_acq_32) - - -START_FUNC(opal_atomic_cmpset_rel_32) - LSYM(5) - ldrex r3, [r0] - cmp r1, r3 - bne REFLSYM(6) - dmb - strex r12, r2, [r0] - cmp r12, #0 - bne REFLSYM(4) - mov r0, #1 - LSYM(6) - movne r0, #0 - bx lr -END_FUNC(opal_atomic_cmpset_rel_32) - -#START_64BIT -START_FUNC(opal_atomic_cmpset_64) - push {r4-r7} - ldrd r6, r7, [sp, #16] - LSYM(7) - ldrexd r4, r5, [r0] - cmp r4, r2 - it eq - cmpeq r5, r3 - bne REFLSYM(8) - strexd r1, r6, r7, [r0] - cmp r1, #0 - bne REFLSYM(7) - mov r0, #1 - LSYM(8) - movne r0, #0 - pop {r4-r7} - bx lr -END_FUNC(opal_atomic_cmpset_64) - -START_FUNC(opal_atomic_cmpset_acq_64) - push {r4-r7} - ldrd r6, r7, [sp, #16] - LSYM(9) - ldrexd r4, r5, [r0] - cmp r4, r2 - it eq - cmpeq r5, r3 - bne REFLSYM(10) - strexd r1, r6, r7, [r0] - cmp r1, #0 - bne REFLSYM(9) - dmb - mov r0, #1 - LSYM(10) - movne r0, #0 - pop {r4-r7} - bx lr -END_FUNC(opal_atomic_cmpset_acq_64) - - -START_FUNC(opal_atomic_cmpset_rel_64) - push {r4-r7} - ldrd r6, r7, [sp, #16] - LSYM(11) - ldrexd r4, r5, [r0] - cmp r4, r2 - it eq - cmpeq r5, r3 - bne REFLSYM(12) - dmb - strexd r1, r6, r7, [r0] - cmp r1, #0 - bne REFLSYM(11) - mov r0, #1 - LSYM(12) - movne r0, #0 - pop {r4-r7} - bx lr -END_FUNC(opal_atomic_cmpset_rel_64) -#END_64BIT - - -START_FUNC(opal_atomic_add_32) - LSYM(13) - ldrex r2, [r0] - add r2, r2, r1 - strex r3, r2, [r0] - cmp r3, #0 - bne REFLSYM(13) - mov r0, r2 - bx lr -END_FUNC(opal_atomic_add_32) - - -START_FUNC(opal_atomic_sub_32) - LSYM(14) - ldrex r2, [r0] - sub r2, r2, r1 - strex r3, r2, [r0] - cmp r3, #0 - bne REFLSYM(14) - mov r0, r2 - bx lr -END_FUNC(opal_atomic_sub_32) diff --git a/opal/asm/base/IA32.asm b/opal/asm/base/IA32.asm deleted file mode 100644 index f82b9ce2d15..00000000000 --- a/opal/asm/base/IA32.asm +++ /dev/null @@ -1,110 +0,0 @@ -START_FILE - TEXT - -START_FUNC(opal_atomic_mb) - 
pushl %ebp - movl %esp, %ebp - leave - ret -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - pushl %ebp - movl %esp, %ebp - leave - ret -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - pushl %ebp - movl %esp, %ebp - leave - ret -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - pushl %ebp - movl %esp, %ebp - movl 8(%ebp), %edx - movl 16(%ebp), %ecx - movl 12(%ebp), %eax - lock; cmpxchgl %ecx,(%edx) - sete %dl - - movzbl %dl, %eax - leave - ret -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_64) - pushl %ebp - movl %esp, %ebp - subl $32, %esp - movl %ebx, -12(%ebp) - movl %esi, -8(%ebp) - movl %edi, -4(%ebp) - movl 8(%ebp), %edi - movl 12(%ebp), %eax - movl 16(%ebp), %edx - movl %eax, -24(%ebp) - movl %edx, -20(%ebp) - movl 20(%ebp), %eax - movl 24(%ebp), %edx - movl %eax, -32(%ebp) - movl %edx, -28(%ebp) - movl -24(%ebp), %ebx - movl -20(%ebp), %edx - movl -32(%ebp), %esi - movl -28(%ebp), %ecx - movl %ebx, %eax - push %ebx - movl %esi, %ebx - lock; cmpxchg8b (%edi) - sete %dl - pop %ebx - - movzbl %dl, %eax - movl -12(%ebp), %ebx - movl -8(%ebp), %esi - movl -4(%ebp), %edi - movl %ebp, %esp - popl %ebp - ret -END_FUNC(opal_atomic_cmpset_64) - - -START_FUNC(opal_atomic_add_32) - pushl %ebp - movl %esp, %ebp - movl 8(%ebp), %eax - movl 12(%ebp), %edx - lock; addl %edx,(%eax) - movl (%eax), %eax - leave - ret -END_FUNC(opal_atomic_add_32) - - -START_FUNC(opal_atomic_sub_32) - pushl %ebp - movl %esp, %ebp - movl 8(%ebp), %eax - movl 12(%ebp), %edx - lock; subl %edx,(%eax) - movl (%eax), %eax - leave - ret -END_FUNC(opal_atomic_sub_32) - - -START_FUNC(opal_sys_timer_get_cycles) - pushl %ebp - movl %esp, %ebp - rdtsc - popl %ebp - ret -END_FUNC(opal_sys_timer_get_cycles) diff --git a/opal/asm/base/IA64.asm b/opal/asm/base/IA64.asm deleted file mode 100644 index 686af30b613..00000000000 --- a/opal/asm/base/IA64.asm +++ /dev/null @@ -1,109 +0,0 @@ -START_FILE - - .pred.safe_across_calls p1-p5,p16-p63 - .text - 
.align 16 - .global opal_atomic_mb# - .proc opal_atomic_mb# -opal_atomic_mb: - .prologue - .body - mf - br.ret.sptk.many b0 - ;; - .endp opal_atomic_mb# - .align 16 - .global opal_atomic_rmb# - .proc opal_atomic_rmb# -opal_atomic_rmb: - .prologue - .body - mf - br.ret.sptk.many b0 - ;; - .endp opal_atomic_rmb# - .align 16 - .global opal_atomic_wmb# - .proc opal_atomic_wmb# -opal_atomic_wmb: - .prologue - .body - mf - br.ret.sptk.many b0 - ;; - .endp opal_atomic_wmb# - .align 16 - .global opal_atomic_cmpset_acq_32# - .proc opal_atomic_cmpset_acq_32# -opal_atomic_cmpset_acq_32: - .prologue - .body - mov ar.ccv=r33;; - cmpxchg4.acq r32=[r32],r34,ar.ccv - ;; - cmp4.eq p6, p7 = r32, r33 - ;; - (p6) addl r8 = 1, r0 - (p7) mov r8 = r0 - br.ret.sptk.many b0 - ;; - .endp opal_atomic_cmpset_acq_32# - .align 16 - .global opal_atomic_cmpset_rel_32# - .proc opal_atomic_cmpset_rel_32# -opal_atomic_cmpset_rel_32: - .prologue - .body - mov ar.ccv=r33;; - cmpxchg4.rel r32=[r32],r34,ar.ccv - ;; - cmp4.eq p6, p7 = r32, r33 - ;; - (p6) addl r8 = 1, r0 - (p7) mov r8 = r0 - br.ret.sptk.many b0 - ;; - .endp opal_atomic_cmpset_rel_32# - .align 16 - .global opal_atomic_cmpset_acq_64# - .proc opal_atomic_cmpset_acq_64# -opal_atomic_cmpset_acq_64: - .prologue - .body - mov ar.ccv=r33;; - cmpxchg8.acq r32=[r32],r34,ar.ccv - ;; - cmp.eq p6, p7 = r33, r32 - ;; - (p6) addl r8 = 1, r0 - (p7) mov r8 = r0 - br.ret.sptk.many b0 - ;; - .endp opal_atomic_cmpset_acq_64# - .align 16 - .global opal_atomic_cmpset_rel_64# - .proc opal_atomic_cmpset_rel_64# -opal_atomic_cmpset_rel_64: - .prologue - .body - mov ar.ccv=r33;; - cmpxchg8.rel r32=[r32],r34,ar.ccv - ;; - cmp.eq p6, p7 = r33, r32 - ;; - (p6) addl r8 = 1, r0 - (p7) mov r8 = r0 - br.ret.sptk.many b0 - ;; - .endp opal_atomic_cmpset_rel_64# - .align 16 - .global opal_sys_timer_get_cycles# - .proc opal_sys_timer_get_cycles# -opal_sys_timer_get_cycles: - .prologue - .body - mov r8=ar.itc - br.ret.sptk.many b0 - ;; - .endp opal_sys_timer_get_cycles# - 
.ident "GCC: (GNU) 3.2.3 20030502 (Red Hat Linux 3.2.3-49)" diff --git a/opal/asm/base/MIPS.asm b/opal/asm/base/MIPS.asm deleted file mode 100644 index 0a82a173dbc..00000000000 --- a/opal/asm/base/MIPS.asm +++ /dev/null @@ -1,196 +0,0 @@ -START_FILE - -#ifdef __linux__ -#include -#else -#include -#endif -#include - - TEXT - - ALIGN(8) -LEAF(opal_atomic_mb) -#ifdef __linux__ - .set mips2 -#endif - sync -#ifdef __linux__ - .set mips0 -#endif - j ra -END(opal_atomic_mb) - - - ALIGN(8) -LEAF(opal_atomic_rmb) -#ifdef __linux__ - .set mips2 -#endif - sync -#ifdef __linux__ - .set mips0 -#endif - j ra -END(opal_atomic_rmb) - - -LEAF(opal_atomic_wmb) -#ifdef __linux__ - .set mips2 -#endif - sync -#ifdef __linux__ - .set mips0 -#endif - j ra -END(opal_atomic_wmb) - - -LEAF(opal_atomic_cmpset_32) - .set noreorder -retry1: -#ifdef __linux__ - .set mips2 -#endif - ll $3, 0($4) -#ifdef __linux__ - .set mips0 -#endif - bne $3, $5, done1 - or $2, $6, 0 -#ifdef __linux__ - .set mips2 -#endif - sc $2, 0($4) -#ifdef __linux__ - .set mips0 -#endif - beqz $2, retry1 -done1: - xor $3,$3,$5 - j ra - sltu $2,$3,1 - .set reorder -END(opal_atomic_cmpset_32) - - -LEAF(opal_atomic_cmpset_acq_32) - .set noreorder -retry2: -#ifdef __linux__ - .set mips2 -#endif - ll $3, 0($4) -#ifdef __linux__ - .set mips0 -#endif - bne $3, $5, done2 - or $2, $6, 0 -#ifdef __linux__ - .set mips2 -#endif - sc $2, 0($4) -#ifdef __linux__ - .set mips0 -#endif - beqz $2, retry2 -done2: -#ifdef __linux__ - .set mips2 -#endif - sync -#ifdef __linux__ - .set mips0 -#endif - xor $3,$3,$5 - j ra - sltu $2,$3,1 - .set reorder -END(opal_atomic_cmpset_acq_32) - - -LEAF(opal_atomic_cmpset_rel_32) - .set noreorder -#ifdef __linux__ - .set mips2 -#endif - sync -#ifdef __linux__ - .set mips0 -#endif -retry3: -#ifdef __linux__ - .set mips2 -#endif - ll $3, 0($4) -#ifdef __linux__ - .set mips0 -#endif - bne $3, $5, done3 - or $2, $6, 0 -#ifdef __linux__ - .set mips2 -#endif - sc $2, 0($4) -#ifdef __linux__ - .set mips0 -#endif 
- beqz $2, retry3 -done3: - xor $3,$3,$5 - j ra - sltu $2,$3,1 - .set reorder -END(opal_atomic_cmpset_rel_32) - -#ifdef __mips64 -LEAF(opal_atomic_cmpset_64) - .set noreorder -retry4: - lld $3, 0($4) - bne $3, $5, done4 - or $2, $6, 0 - scd $2, 0($4) - beqz $2, retry4 -done4: - xor $3,$3,$5 - j ra - sltu $2,$3,1 - .set reorder -END(opal_atomic_cmpset_64) - - -LEAF(opal_atomic_cmpset_acq_64) - .set noreorder -retry5: - lld $3, 0($4) - bne $3, $5, done5 - or $2, $6, 0 - scd $2, 0($4) - beqz $2, retry5 -done5: - sync - xor $3,$3,$5 - j ra - sltu $2,$3,1 - .set reorder -END(opal_atomic_cmpset_acq_64) - - -LEAF(opal_atomic_cmpset_rel_64) - .set noreorder - sync -retry6: - lld $3, 0($4) - bne $3, $5, done6 - or $2, $6, 0 - scd $2, 0($4) - beqz $2, retry6 -done6: - xor $3,$3,$5 - j ra - sltu $2,$3,1 - .set reorder -END(opal_atomic_cmpset_rel_64) -#endif /* __mips64 */ diff --git a/opal/asm/base/POWERPC32.asm b/opal/asm/base/POWERPC32.asm deleted file mode 100644 index 6939fef8f86..00000000000 --- a/opal/asm/base/POWERPC32.asm +++ /dev/null @@ -1,168 +0,0 @@ -START_FILE - TEXT - - ALIGN(4) -START_FUNC(opal_atomic_mb) - sync - blr -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - lwsync - blr -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - eieio - blr -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - LSYM(1) lwarx r0, 0, r3 - cmpw 0, r0, r4 - bne- REFLSYM(2) - stwcx. r5, 0, r3 - bne- REFLSYM(1) - LSYM(2) - xor r3,r0,r4 - subfic r5,r3,0 - adde r3,r5,r3 - blr -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_acq_32) - LSYM(3) lwarx r0, 0, r3 - cmpw 0, r0, r4 - bne- REFLSYM(4) - stwcx. r5, 0, r3 - bne- REFLSYM(3) - sync - LSYM(4) - xor r3,r0,r4 - subfic r5,r3,0 - adde r3,r5,r3 - lwsync - blr -END_FUNC(opal_atomic_cmpset_acq_32) - - -START_FUNC(opal_atomic_cmpset_rel_32) - eieio - LSYM(5) lwarx r0, 0, r3 - cmpw 0, r0, r4 - bne- REFLSYM(6) - stwcx. 
r5, 0, r3 - bne- REFLSYM(5) - sync - LSYM(6) - xor r3,r0,r4 - subfic r5,r3,0 - adde r3,r5,r3 - blr -END_FUNC(opal_atomic_cmpset_rel_32) - -#START_64BIT -START_FUNC(opal_atomic_cmpset_64) - stw r4,-32(r1) - stw r5,-28(r1) - stw r6,-24(r1) - stw r7,-20(r1) - ld r5,-32(r1) - ld r7,-24(r1) - LSYM(7) ldarx r9, 0, r3 - cmpd 0, r9, r5 - bne- REFLSYM(8) - stdcx. r7, 0, r3 - bne- REFLSYM(7) - LSYM(8) - xor r3,r5,r9 - subfic r5,r3,0 - adde r3,r5,r3 - blr -END_FUNC(opal_atomic_cmpset_64) - - -START_FUNC(opal_atomic_cmpset_acq_64) - stw r4,-32(r1) - stw r5,-28(r1) - stw r6,-24(r1) - stw r7,-20(r1) - ld r5,-32(r1) - ld r7,-24(r1) - - LSYM(9) ldarx r9, 0, r3 - cmpd 0, r9, r5 - bne- REFLSYM(10) - stdcx. r7, 0, r3 - bne- REFLSYM(9) - LSYM(10) - xor r3,r5,r9 - subfic r5,r3,0 - adde r3,r5,r3 - blr - lwsync - blr -END_FUNC(opal_atomic_cmpset_acq_64) - - -START_FUNC(opal_atomic_cmpset_rel_64) - stw r4,-32(r1) - stw r5,-28(r1) - stw r6,-24(r1) - stw r7,-20(r1) - ld r5,-32(r1) - ld r7,-24(r1) - - eieio - LSYM(11) ldarx r9, 0, r3 - cmpd 0, r9, r5 - bne- REFLSYM(12) - stdcx. r7, 0, r3 - bne- REFLSYM(11) - LSYM(12) - xor r3,r5,r9 - subfic r5,r3,0 - adde r3,r5,r3 - blr - lwsync - blr -END_FUNC(opal_atomic_cmpset_rel_64) -#END_64BIT - - -START_FUNC(opal_atomic_add_32) - LSYM(13) lwarx r0, 0, r3 - add r0, r4, r0 - stwcx. r0, 0, r3 - bne- REFLSYM(13) - mr r3,r0 - blr -END_FUNC(opal_atomic_add_32) - - -START_FUNC(opal_atomic_sub_32) - LSYM(14) lwarx r0,0,r3 - subf r0,r4,r0 - stwcx. 
r0,0,r3 - bne- REFLSYM(14) - mr r3,r0 - blr -END_FUNC(opal_atomic_sub_32) - -START_FUNC(opal_sys_timer_get_cycles) - LSYM(15) - mftbu r0 - mftb r11 - mftbu r2 - cmpw cr7,r2,r0 - bne+ cr7,REFLSYM(15) - li r4,0 - li r9,0 - or r3,r2,r9 - or r4,r4,r11 - blr -END_FUNC(opal_sys_timer_get_cycles) diff --git a/opal/asm/base/POWERPC64.asm b/opal/asm/base/POWERPC64.asm deleted file mode 100644 index 28da3f4d8e0..00000000000 --- a/opal/asm/base/POWERPC64.asm +++ /dev/null @@ -1,157 +0,0 @@ -START_FILE - TEXT - - ALIGN(4) -START_FUNC(opal_atomic_mb) - sync - blr -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - lwsync - blr -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - eieio - blr -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - LSYM(1) lwarx r0, 0, r3 - cmpw 0, r0, r4 - bne- REFLSYM(2) - stwcx. r5, 0, r3 - bne- REFLSYM(1) - LSYM(2) - cmpw cr7,r0,r4 - mfcr r3 - rlwinm r3,r3,31,1 - blr -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_acq_32) - mflr r0 - std r29,-24(r1) - std r0,16(r1) - stdu r1,-144(r1) - bl REFGSYM(opal_atomic_cmpset_32) - mr r29,r3 - bl REFGSYM(opal_atomic_rmb) - mr r3,r29 - addi r1,r1,144 - ld r0,16(r1) - mtlr r0 - ld r29,-24(r1) - blr -END_FUNC(opal_atomic_cmpset_acq_32) - - -START_FUNC(opal_atomic_cmpset_rel_32) - mflr r0 - std r27,-40(r1) - std r28,-32(r1) - std r29,-24(r1) - std r0,16(r1) - stdu r1,-160(r1) - mr r29,r3 - mr r28,r4 - mr r27,r5 - bl REFGSYM(opal_atomic_wmb) - mr r3,r29 - mr r4,r28 - mr r5,r27 - bl REFGSYM(opal_atomic_cmpset_32) - addi r1,r1,160 - ld r0,16(r1) - mtlr r0 - ld r27,-40(r1) - ld r28,-32(r1) - ld r29,-24(r1) - blr -END_FUNC(opal_atomic_cmpset_rel_32) - - -START_FUNC(opal_atomic_cmpset_64) - LSYM(3) ldarx r0, 0, r3 - cmpd 0, r0, r4 - bne- REFLSYM(4) - stdcx. 
r5, 0, r3 - bne- REFLSYM(3) - LSYM(4) - xor r3,r4,r0 - subfic r5,r3,0 - adde r3,r5,r3 - blr -END_FUNC(opal_atomic_cmpset_64) - - -START_FUNC(opal_atomic_cmpset_acq_64) - LSYM(7) ldarx r0, 0, r3 - cmpd 0, r0, r4 - bne- REFLSYM(8) - stdcx. r5, 0, r3 - bne- REFLSYM(7) - LSYM(8) - lwsync - xor r3,r4,r0 - subfic r5,r3,0 - adde r3,r5,r3 - blr -END_FUNC(opal_atomic_cmpset_acq_64) - - -START_FUNC(opal_atomic_cmpset_rel_64) - eieio - LSYM(9) ldarx r0, 0, r3 - cmpd 0, r0, r4 - bne- REFLSYM(10) - stdcx. r5, 0, r3 - bne- REFLSYM(9) - LSYM(10) - xor r3,r4,r0 - subfic r5,r3,0 - adde r3,r5,r3 - blr -END_FUNC(opal_atomic_cmpset_rel_64) - - -START_FUNC(opal_atomic_add_32) - LSYM(5) lwarx r0, 0, r3 - add r0, r4, r0 - stwcx. r0, 0, r3 - bne- REFLSYM(5) - - mr r3,r0 - blr -END_FUNC(opal_atomic_add_32) - - -START_FUNC(opal_atomic_sub_32) - LSYM(6) lwarx r0,0,r3 - subf r0,r4,r0 - stwcx. r0,0,r3 - bne- REFLSYM(6) - - mr r3,r0 - blr -END_FUNC(opal_atomic_sub_32) - -START_FUNC(opal_sys_timer_get_cycles) - LSYM(11) - mftbu r2 - rldicl r2,r2,0,32 - mftb r0 - rldicl r9,r0,0,32 - mftbu r0 - rldicl r0,r0,0,32 - cmpw cr7,r0,r2 - bne cr7,REFLSYM(11) - sldi r3,r0,32 - or r3,r3,r9 - blr -END_FUNC(opal_sys_timer_get_cycles) diff --git a/opal/asm/base/SPARCV9_32.asm b/opal/asm/base/SPARCV9_32.asm deleted file mode 100644 index eb004a80653..00000000000 --- a/opal/asm/base/SPARCV9_32.asm +++ /dev/null @@ -1,171 +0,0 @@ -START_FILE - TEXT - - ALIGN(4) - - -START_FUNC(opal_atomic_mb) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #LoadLoad | #LoadStore | #StoreStore | #StoreLoad - retl - nop -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #LoadLoad - retl - nop -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #StoreStore - retl - nop -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - casa [%o0] 0x80, %o1, %o2 - xor %o2, %o1, %o2 - subcc %g0, %o2, %g0 - retl - 
subx %g0, -1, %o0 -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_acq_32) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - casa [%o0] 0x80, %o1, %o2 - xor %o2, %o1, %o2 - subcc %g0, %o2, %g0 - subx %g0, -1, %o0 - membar #LoadLoad - retl - sra %o0, 0, %o0 -END_FUNC(opal_atomic_cmpset_acq_32) - - -START_FUNC(opal_atomic_cmpset_rel_32) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #StoreStore - casa [%o0] 0x80, %o1, %o2 - xor %o2, %o1, %o2 - subcc %g0, %o2, %g0 - retl - subx %g0, -1, %o0 -END_FUNC(opal_atomic_cmpset_rel_32) - - -START_FUNC(opal_atomic_cmpset_64) - !#PROLOGUE# 0 - save %sp, -128, %sp - !#PROLOGUE# 1 - mov %i3, %o4 - mov %i4, %o5 - st %i1, [%fp-32] - st %i2, [%fp-28] - std %o4, [%fp-24] - ldx [%fp-24], %g1 - ldx [%fp-32], %g2 - casxa [%i0] 0x80, %g2, %g1 - stx %g1, [%fp-24] - - ld [%fp-24], %i5 - ld [%fp-32], %g1 - cmp %i5, %g1 - bne REFLSYM(12) - mov 0, %i0 - ld [%fp-20], %i2 - ld [%fp-28], %i1 - cmp %i2, %i1 - be,a REFLSYM(12) - mov 1, %i0 -LSYM(12) - ret - restore -END_FUNC(opal_atomic_cmpset_64) - - -START_FUNC(opal_atomic_cmpset_acq_64) - !#PROLOGUE# 0 - save %sp, -128, %sp - !#PROLOGUE# 1 - mov %i1, %o4 - mov %i2, %o5 - mov %i3, %o2 - mov %i4, %o3 - std %o4, [%fp-32] - std %o2, [%fp-24] - ldx [%fp-24], %g1 - ldx [%fp-32], %g2 - casxa [%i0] 0x80, %g2, %g1 - stx %g1, [%fp-24] - - ld [%fp-24], %i5 - ld [%fp-32], %g1 - cmp %i5, %g1 - bne REFLSYM(16) - mov 0, %i0 - ld [%fp-20], %i2 - ld [%fp-28], %i1 - cmp %i2, %i1 - be,a REFLSYM(16) - mov 1, %i0 -LSYM(16) - membar #LoadLoad - ret - restore -END_FUNC(opal_atomic_cmpset_acq_64) - - -START_FUNC(opal_atomic_cmpset_rel_64) - !#PROLOGUE# 0 - save %sp, -128, %sp - !#PROLOGUE# 1 - mov %i1, %o4 - mov %i2, %o5 - mov %i3, %o2 - mov %i4, %o3 - membar #StoreStore - std %o4, [%fp-32] - std %o2, [%fp-24] - ldx [%fp-24], %g1 - ldx [%fp-32], %g2 - casxa [%i0] 0x80, %g2, %g1 - stx %g1, [%fp-24] - - ld [%fp-24], %i5 - ld [%fp-32], %g1 - cmp %i5, %g1 - bne REFLSYM(21) - mov 0, %i0 - ld [%fp-20], %i2 - ld [%fp-28], %i1 - 
cmp %i2, %i1 - be,a REFLSYM(21) - mov 1, %i0 -LSYM(21) - ret - restore -END_FUNC(opal_atomic_cmpset_rel_64) - - -START_FUNC(opal_sys_timer_get_cycles) - save %sp,-96,%sp - rd %tick,%o0 - srlx %o0,32,%o1 - or %g0,%o1,%i0 - ret ! Result = %i0 - restore %o0,0,%o1 -END_FUNC(opal_sys_timer_get_cycles) diff --git a/opal/asm/base/SPARCV9_64.asm b/opal/asm/base/SPARCV9_64.asm deleted file mode 100644 index 9820ab34ce1..00000000000 --- a/opal/asm/base/SPARCV9_64.asm +++ /dev/null @@ -1,111 +0,0 @@ -START_FILE - TEXT - - ALIGN(4) - - -START_FUNC(opal_atomic_mb) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #LoadLoad | #LoadStore | #StoreStore | #StoreLoad - retl - nop -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #LoadLoad - retl - nop -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #StoreStore - retl - nop -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - casa [%o0] 0x80, %o1, %o2 - xor %o2, %o1, %o2 - subcc %g0, %o2, %g0 - retl - subx %g0, -1, %o0 -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_acq_32) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - casa [%o0] 0x80, %o1, %o2 - xor %o2, %o1, %o2 - subcc %g0, %o2, %g0 - subx %g0, -1, %o0 - membar #LoadLoad - retl - sra %o0, 0, %o0 -END_FUNC(opal_atomic_cmpset_acq_32) - - -START_FUNC(opal_atomic_cmpset_rel_32) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #StoreStore - casa [%o0] 0x80, %o1, %o2 - xor %o2, %o1, %o2 - subcc %g0, %o2, %g0 - retl - subx %g0, -1, %o0 -END_FUNC(opal_atomic_cmpset_rel_32) - - -START_FUNC(opal_atomic_cmpset_64) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - casxa [%o0] 0x80, %o1, %o2 - mov 0, %o0 - xor %o2, %o1, %o2 - retl - movre %o2, 1, %o0 -END_FUNC(opal_atomic_cmpset_64) - - -START_FUNC(opal_atomic_cmpset_acq_64) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - casxa [%o0] 0x80, %o1, %o2 - mov 0, %o0 - xor %o2, %o1, %o2 - movre %o2, 1, %o0 - membar #LoadLoad - retl - sra 
%o0, 0, %o0 -END_FUNC(opal_atomic_cmpset_acq_64) - - -START_FUNC(opal_atomic_cmpset_rel_64) - !#PROLOGUE# 0 - !#PROLOGUE# 1 - membar #StoreStore - casxa [%o0] 0x80, %o1, %o2 - mov 0, %o0 - xor %o2, %o1, %o2 - retl - movre %o2, 1, %o0 -END_FUNC(opal_atomic_cmpset_rel_64) - - -START_FUNC(opal_sys_timer_get_cycles) - save %sp,-176,%sp - rd %tick,%o0 - ret ! Result = %i0 - restore %o0,0,%o0 -END_FUNC(opal_sys_timer_get_cycles) diff --git a/opal/asm/base/X86_64.asm b/opal/asm/base/X86_64.asm deleted file mode 100644 index 2468b638f64..00000000000 --- a/opal/asm/base/X86_64.asm +++ /dev/null @@ -1,52 +0,0 @@ -START_FILE - TEXT - -START_FUNC(opal_atomic_mb) - pushq %rbp - movq %rsp, %rbp - leave - ret -END_FUNC(opal_atomic_mb) - - -START_FUNC(opal_atomic_rmb) - pushq %rbp - movq %rsp, %rbp - leave - ret -END_FUNC(opal_atomic_rmb) - - -START_FUNC(opal_atomic_wmb) - pushq %rbp - movq %rsp, %rbp - leave - ret -END_FUNC(opal_atomic_wmb) - - -START_FUNC(opal_atomic_cmpset_32) - movl %esi, %eax - lock; cmpxchgl %edx,(%rdi) - sete %dl - movzbl %dl, %eax - ret -END_FUNC(opal_atomic_cmpset_32) - - -START_FUNC(opal_atomic_cmpset_64) - movq %rsi, %rax - lock; cmpxchgq %rdx,(%rdi) - sete %dl - movzbl %dl, %eax - ret -END_FUNC(opal_atomic_cmpset_64) - - -START_FUNC(opal_sys_timer_get_cycles) - rdtsc - salq $32, %rdx - mov %eax, %eax - orq %rdx, %rax - ret -END_FUNC(opal_sys_timer_get_cycles) diff --git a/opal/asm/base/aix.conf b/opal/asm/base/aix.conf deleted file mode 100644 index 482aabdd418..00000000000 --- a/opal/asm/base/aix.conf +++ /dev/null @@ -1,44 +0,0 @@ -sub start_file() -{ - my $ret = ""; - if ($IS64BIT == 1) { - $ret .= "\t.machine \"ppc64\"\n"; - } else { - $ret .= "\t.machine \"ppc\"\n"; - } - $ret .= "\t.toc\n"; - return $ret; -} - - -sub start_func($) -{ - my $func_name = shift; - my $ret = ""; - - $ret = "\t$GLOBAL $func_name\n"; - $ret .= "\t$GLOBAL $GSYM$func_name\n"; - $ret .= "\t.csect [DS],3\n"; - - $ret .= "$func_name$SUFFIX\n"; - - if ($IS64BIT == 1) { - $ret 
.= "\t.llong .$func_name, TOC[tc0], 0\n"; - } else { - $ret .= "\t.long .$func_name, TOC[tc0], 0\n"; - } - $ret .= "\t.csect [PR]\n"; - - $ret .= "\t.align 2\n"; - $ret .= "$GSYM$func_name$SUFFIX\n"; - - return $ret; -} - - -sub end_func($) -{ - return ""; -} - -1 diff --git a/opal/asm/base/default.conf b/opal/asm/base/default.conf deleted file mode 100644 index c54f085cf99..00000000000 --- a/opal/asm/base/default.conf +++ /dev/null @@ -1,34 +0,0 @@ -sub start_file -{ - return ""; -} - - -sub start_func($) -{ - my $func_name = shift; - my $ret = ""; - - $ret = "\t$GLOBAL $GSYM$func_name\n"; - if (! $TYPE eq "") { - $ret .= "\t.type $GSYM$func_name, $TYPE" . "function\n"; - } - $ret .= "$GSYM$func_name$SUFFIX\n"; - - return $ret; -} - - -sub end_func($) -{ - my $func_name = shift; - my $ret = ""; - - if ($SIZE != 0) { - $ret = "\t.size $GSYM$func_name, .-$GSYM$func_name\n"; - } - - return $ret; -} - -1 diff --git a/opal/asm/generate-all-asm.pl b/opal/asm/generate-all-asm.pl deleted file mode 100644 index e452cbeaf2e..00000000000 --- a/opal/asm/generate-all-asm.pl +++ /dev/null @@ -1,27 +0,0 @@ -#!/usr/bin/perl -w - -my $perl = shift; -my $srcdir = shift; -my $destdir = shift; - -if (! $perl || ! $srcdir || ! $destdir) { - print "ERROR: invalid argument to generate-all-asm.pl\n"; - print "usage: generate-all-asm.pl [PERL] [SRCDIR] [DESTDIR]\n"; - exit 1; -} - -open(DATAFILE, "$srcdir/asm-data.txt") || die "Could not open data file: $!\n"; - -my $ASMARCH = ""; -my $ASMFORMAT = ""; -my $ASMFILE = ""; - -while() { - if (/^#/) { next; } - ($ASMARCH, $ASMFORMAT, $ASMFILE) = /(.*)\t(.*)\t(.*)/; - if (! $ASMARCH || ! 
$ASMFORMAT) { next; } - - print "--> Generating assembly for \"$ASMARCH\" \"$ASMFORMAT\"\n"; - system("$perl \'$srcdir/generate-asm.pl\' \'$ASMARCH\' \'$ASMFORMAT\' \'$srcdir/base\' \'$destdir/generated/atomic-$ASMFILE.s\'"); - -} diff --git a/opal/asm/generate-asm.pl b/opal/asm/generate-asm.pl deleted file mode 100644 index 6c904a77f36..00000000000 --- a/opal/asm/generate-asm.pl +++ /dev/null @@ -1,122 +0,0 @@ -#!/usr/bin/perl -w -# -# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - - -my $asmarch = shift; -my $asmformat = shift; -my $basedir = shift; -my $output = shift; - -if ( ! $asmarch) { - print "usage: generate-asm.pl [ASMARCH] [ASMFORMAT] [BASEDIR] [OUTPUT NAME]\n"; - exit(1); -} - -open(INPUT, "$basedir/$asmarch.asm") || - die "Could not open $basedir/$asmarch.asm: $!\n"; -open(OUTPUT, ">$output") || die "Could not open $output: $!\n"; - -$CONFIG = "default"; -$TEXT = ""; -$GLOBAL = ""; -$SUFFIX = ""; -$GSYM = ""; -$LSYM = ""; -$TYPE = ""; -$SIZE = 0; -$ALIGN_LOG = 0; -$DEL_R_REG = 0; -$IS64BIT = 0; - -($CONFIG, $TEXT, $GLOBAL, $SUFFIX, $GSYM, $LSYM, $TYPE, $SIZE, $ALIGN_LOG, $DEL_R_REG, $IS64BIT, $GNU_STACK) = ( - $asmformat =~ /(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)\-(.*)/); - -if (0) { -print "$asmformat\n"; -print "CONFIG: $CONFIG\n"; -print "TEXT: $TEXT\n"; -print "GLOBAL: $GLOBAL\n"; -print "SUFFIX: $SUFFIX\n"; -print "GSYM: $GSYM\n"; -print "LSYM: $LSYM\n"; -print "GNU_STACK: $GNU_STACK\n"; -} - -my $current_func = ""; -my $delete = 0; - -# load our configuration -do "$basedir/$CONFIG.conf" or die "Could not open config file $basedir/$CONFIG.conf: $!\n"; - -while () { - s/TEXT/$TEXT/g; - s/GLOBAL/$GLOBAL/g; - s/REFGSYM\((.*)\)/$GSYM$1/g; - s/REFLSYM\((.*)\)/$LSYM$1/g; - s/GSYM\((.*)\)/$GSYM$1$SUFFIX/g; - s/LSYM\((.*)\)/$LSYM$1$SUFFIX/g; - - if ($DEL_R_REG == 0) { - s/cr([0-9][0-9]?)/$1/g; - s/r([0-9][0-9]?)/$1/g; - } - - if (/START_FILE/) 
{ - $_ = start_file(); - } - - if (/START_FUNC\((.*)\)/) { - $current_func = $1; - $_ = start_func($current_func); - } - - if (/END_FUNC\((.*)\)/) { - $current_func = $1; - $_ = end_func($current_func); - } - - if ($ALIGN_LOG == 0) { - s/ALIGN\((\d*)\)/.align $1/g; - } else { - # Ugh... - if (m/ALIGN\((\d*)\)/) { - $val = $1; - $result = 0; - while ($val > 1) { $val /= 2; $result++ } - s/ALIGN\((\d*)\)/.align $result/; - } - } - - if (/^\#START_64BIT/) { - $_ = ""; - if ($IS64BIT == 0) { - $delete = 1; - } - } - if (/^\#END_64BIT/) { - $_ = ""; - $delete = 0; - } - - if ($delete == 0) { - print OUTPUT $_; - } -} - -if ($GNU_STACK == 1) { - if ($asmarch eq "ARM") { - print OUTPUT "\n\t.section\t.note.GNU-stack,\"\",\%progbits\n"; - } else { - print OUTPUT "\n\t.section\t.note.GNU-stack,\"\",\@progbits\n"; - } -} - -close(INPUT); -close(OUTPUT); diff --git a/opal/class/Makefile.am b/opal/class/Makefile.am index e98f955de8d..c3f3c8cb041 100644 --- a/opal/class/Makefile.am +++ b/opal/class/Makefile.am @@ -11,7 +11,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights +# Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights # reserved. 
# $COPYRIGHT$ # @@ -38,7 +38,8 @@ headers += \ class/opal_pointer_array.h \ class/opal_value_array.h \ class/opal_ring_buffer.h \ - class/opal_rb_tree.h + class/opal_rb_tree.h \ + class/opal_interval_tree.h lib@OPAL_LIB_PREFIX@open_pal_la_SOURCES += \ class/opal_bitmap.c \ @@ -54,4 +55,5 @@ lib@OPAL_LIB_PREFIX@open_pal_la_SOURCES += \ class/opal_pointer_array.c \ class/opal_value_array.c \ class/opal_ring_buffer.c \ - class/opal_rb_tree.c + class/opal_rb_tree.c \ + class/opal_interval_tree.c diff --git a/opal/class/opal_bitmap.c b/opal/class/opal_bitmap.c index 11d2a21bb38..d4e1e9f6b50 100644 --- a/opal/class/opal_bitmap.c +++ b/opal/class/opal_bitmap.c @@ -12,7 +12,7 @@ * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2010-2012 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2014 Intel, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -123,7 +123,7 @@ opal_bitmap_set_bit(opal_bitmap_t *bm, int bit) out of range. We don't throw any error here, because this is valid and we simply expand the bitmap */ - new_size = (int)(((size_t)index / bm->array_size + 1 ) * bm->array_size); + new_size = index + 1; if( new_size > bm->max_size ) new_size = bm->max_size; diff --git a/opal/class/opal_fifo.h b/opal/class/opal_fifo.h index ad9cbdbcbb4..ad67c77a6ff 100644 --- a/opal/class/opal_fifo.h +++ b/opal/class/opal_fifo.h @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2007 Voltaire All rights reserved. * Copyright (c) 2010 IBM Corporation. All rights reserved. - * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reseved. 
* $COPYRIGHT$ * @@ -76,7 +76,7 @@ static inline bool opal_fifo_is_empty( opal_fifo_t* fifo ) return opal_fifo_head (fifo) == &fifo->opal_fifo_ghost; } -#if OPAL_HAVE_ATOMIC_CMPSET_128 +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 /* Add one element to the FIFO. We will return the last head of the list * to allow the upper level to detect if this element is the first one in the @@ -85,14 +85,12 @@ static inline bool opal_fifo_is_empty( opal_fifo_t* fifo ) static inline opal_list_item_t *opal_fifo_push_atomic (opal_fifo_t *fifo, opal_list_item_t *item) { - opal_counted_pointer_t tail; + opal_counted_pointer_t tail = {.value = fifo->opal_fifo_tail.value}; item->opal_list_next = &fifo->opal_fifo_ghost; do { - tail.value = fifo->opal_fifo_tail.value; - - if (opal_update_counted_pointer (&fifo->opal_fifo_tail, tail, item)) { + if (opal_update_counted_pointer (&fifo->opal_fifo_tail, &tail, item)) { break; } } while (1); @@ -102,7 +100,7 @@ static inline opal_list_item_t *opal_fifo_push_atomic (opal_fifo_t *fifo, if (&fifo->opal_fifo_ghost == tail.data.item) { /* update the head */ opal_counted_pointer_t head = {.value = fifo->opal_fifo_head.value}; - opal_update_counted_pointer (&fifo->opal_fifo_head, head, item); + opal_update_counted_pointer (&fifo->opal_fifo_head, &head, item); } else { /* update previous item */ tail.data.item->opal_list_next = item; @@ -116,29 +114,28 @@ static inline opal_list_item_t *opal_fifo_push_atomic (opal_fifo_t *fifo, */ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) { - opal_list_item_t *item, *next; - opal_counted_pointer_t head, tail; + opal_list_item_t *item, *next, *ghost = &fifo->opal_fifo_ghost; + opal_counted_pointer_t head = {.value = fifo->opal_fifo_head.value}, tail; do { - head.value = fifo->opal_fifo_head.value; tail.value = fifo->opal_fifo_tail.value; opal_atomic_rmb (); item = (opal_list_item_t *) head.data.item; next = (opal_list_item_t *) item->opal_list_next; - if (&fifo->opal_fifo_ghost == 
tail.data.item && &fifo->opal_fifo_ghost == item) { + if (ghost == tail.data.item && ghost == item) { return NULL; } /* the head or next pointer are in an inconsistent state. keep looping. */ - if (tail.data.item != item && &fifo->opal_fifo_ghost != tail.data.item && - &fifo->opal_fifo_ghost == next) { + if (tail.data.item != item && ghost != tail.data.item && ghost == next) { + head.value = fifo->opal_fifo_head.value; continue; } /* try popping the head */ - if (opal_update_counted_pointer (&fifo->opal_fifo_head, head, next)) { + if (opal_update_counted_pointer (&fifo->opal_fifo_head, &head, next)) { break; } } while (1); @@ -146,14 +143,14 @@ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) opal_atomic_wmb (); /* check for tail and head consistency */ - if (&fifo->opal_fifo_ghost == next) { + if (ghost == next) { /* the head was just set to &fifo->opal_fifo_ghost. try to update the tail as well */ - if (!opal_update_counted_pointer (&fifo->opal_fifo_tail, tail, &fifo->opal_fifo_ghost)) { + if (!opal_update_counted_pointer (&fifo->opal_fifo_tail, &tail, ghost)) { /* tail was changed by a push operation. 
wait for the item's next pointer to be se then * update the head */ /* wait for next pointer to be updated by push */ - while (&fifo->opal_fifo_ghost == item->opal_list_next) { + while (ghost == item->opal_list_next) { opal_atomic_rmb (); } @@ -166,7 +163,7 @@ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) head.value = fifo->opal_fifo_head.value; next = (opal_list_item_t *) item->opal_list_next; - assert (&fifo->opal_fifo_ghost == head.data.item); + assert (ghost == head.data.item); fifo->opal_fifo_head.data.item = next; opal_atomic_wmb (); @@ -215,14 +212,14 @@ static inline opal_list_item_t *opal_fifo_push_atomic (opal_fifo_t *fifo, */ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) { - opal_list_item_t *item, *next; + opal_list_item_t *item, *next, *ghost = &fifo->opal_fifo_ghost; #if OPAL_HAVE_ATOMIC_LLSC_PTR /* use load-linked store-conditional to avoid ABA issues */ do { item = opal_atomic_ll_ptr (&fifo->opal_fifo_head.data.item); - if (&fifo->opal_fifo_ghost == item) { - if (&fifo->opal_fifo_ghost == fifo->opal_fifo_tail.data.item) { + if (ghost == item) { + if (ghost == fifo->opal_fifo_tail.data.item) { return NULL; } @@ -239,7 +236,7 @@ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) #else /* protect against ABA issues by "locking" the head */ do { - if (opal_atomic_cmpset_32 ((int32_t *) &fifo->opal_fifo_head.data.counter, 0, 1)) { + if (!opal_atomic_swap_32 ((volatile int32_t *) &fifo->opal_fifo_head.data.counter, 1)) { break; } @@ -249,7 +246,7 @@ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) opal_atomic_wmb(); item = opal_fifo_head (fifo); - if (&fifo->opal_fifo_ghost == item) { + if (ghost == item) { fifo->opal_fifo_head.data.counter = 0; return NULL; } @@ -258,9 +255,11 @@ static inline opal_list_item_t *opal_fifo_pop_atomic (opal_fifo_t *fifo) fifo->opal_fifo_head.data.item = next; #endif - if (&fifo->opal_fifo_ghost == next) { - if 
(!opal_atomic_cmpset_ptr (&fifo->opal_fifo_tail.data.item, item, &fifo->opal_fifo_ghost)) { - while (&fifo->opal_fifo_ghost == item->opal_list_next) { + if (ghost == next) { + void *tmp = item; + + if (!opal_atomic_compare_exchange_strong_ptr (&fifo->opal_fifo_tail.data.item, &tmp, ghost)) { + while (ghost == item->opal_list_next) { opal_atomic_rmb (); } diff --git a/opal/class/opal_free_list.h b/opal/class/opal_free_list.h index 3a196141cc1..1e1de3e8e83 100644 --- a/opal/class/opal_free_list.h +++ b/opal/class/opal_free_list.h @@ -248,7 +248,7 @@ static inline opal_free_list_item_t *opal_free_list_get (opal_free_list_t *flist static inline opal_free_list_item_t *opal_free_list_wait_mt (opal_free_list_t *fl) { opal_free_list_item_t *item = - (opal_free_list_item_t *) opal_lifo_pop (&fl->super); + (opal_free_list_item_t *) opal_lifo_pop_atomic (&fl->super); while (NULL == item) { if (!opal_mutex_trylock (&fl->fl_lock)) { @@ -274,7 +274,7 @@ static inline opal_free_list_item_t *opal_free_list_wait_mt (opal_free_list_t *f opal_mutex_lock (&fl->fl_lock); } opal_mutex_unlock (&fl->fl_lock); - item = (opal_free_list_item_t *) opal_lifo_pop (&fl->super); + item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&fl->super); } return item; diff --git a/opal/class/opal_graph.c b/opal/class/opal_graph.c index 66aec9e9f74..8ee0e88702f 100644 --- a/opal/class/opal_graph.c +++ b/opal/class/opal_graph.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007 Voltaire All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. 
* $COPYRIGHT$ @@ -186,25 +186,16 @@ static void opal_adjacency_list_destruct(opal_adjacency_list_t *aj_list) static void delete_all_edges_conceded_to_vertex(opal_graph_t *graph, opal_graph_vertex_t *vertex) { opal_adjacency_list_t *aj_list; - opal_list_item_t *aj_list_item; - opal_graph_edge_t *edge; - opal_list_item_t *edge_item; + opal_graph_edge_t *edge, *next; /** * for all the adjacency list in the graph */ - for (aj_list_item = opal_list_get_first(graph->adjacency_list); - aj_list_item != opal_list_get_end(graph->adjacency_list); - aj_list_item = opal_list_get_next(aj_list_item)) { - aj_list = (opal_adjacency_list_t *) aj_list_item; + OPAL_LIST_FOREACH(aj_list, graph->adjacency_list, opal_adjacency_list_t) { /** * for all the edges in the adjacency list */ - edge_item = opal_list_get_first(aj_list->edges); - while (edge_item != opal_list_get_end(aj_list->edges)) { - edge = (opal_graph_edge_t *)edge_item; - edge_item = opal_list_get_next(edge_item); - + OPAL_LIST_FOREACH_SAFE(edge, next, aj_list->edges, opal_graph_edge_t) { /** * if the edge is ended in the vertex */ @@ -228,15 +219,11 @@ static void delete_all_edges_conceded_to_vertex(opal_graph_t *graph, opal_graph_ void opal_graph_add_vertex(opal_graph_t *graph, opal_graph_vertex_t *vertex) { opal_adjacency_list_t *aj_list; - opal_list_item_t *item; /** * Find if this vertex already exists in the graph. */ - for (item = opal_list_get_first(graph->adjacency_list); - item != opal_list_get_end(graph->adjacency_list); - item = opal_list_get_next(item)) { - aj_list = (opal_adjacency_list_t *) item; + OPAL_LIST_FOREACH(aj_list, graph->adjacency_list, opal_adjacency_list_t) { if (aj_list->vertex == vertex) { /* If this vertex exists, dont do anything. 
*/ return; @@ -270,19 +257,14 @@ void opal_graph_add_vertex(opal_graph_t *graph, opal_graph_vertex_t *vertex) int opal_graph_add_edge(opal_graph_t *graph, opal_graph_edge_t *edge) { opal_adjacency_list_t *aj_list, *start_aj_list= NULL; - opal_list_item_t *item; - bool start_found = false, end_found = false; + bool end_found = false; /** * find the vertices that this edge should connect. */ - for (item = opal_list_get_first(graph->adjacency_list); - item != opal_list_get_end(graph->adjacency_list); - item = opal_list_get_next(item)) { - aj_list = (opal_adjacency_list_t *) item; + OPAL_LIST_FOREACH(aj_list, graph->adjacency_list, opal_adjacency_list_t) { if (aj_list->vertex == edge->start) { - start_found = true; start_aj_list = aj_list; } if (aj_list->vertex == edge->end) { @@ -293,7 +275,7 @@ int opal_graph_add_edge(opal_graph_t *graph, opal_graph_edge_t *edge) * if one of the vertices either the start or the end is not * found - return an error. */ - if (false == start_found && false == end_found) { + if (NULL == start_aj_list || false == end_found) { return OPAL_ERROR; } /* point the edge to the adjacency list of the start vertex (for easy search) */ @@ -372,7 +354,6 @@ void opal_graph_remove_vertex(opal_graph_t *graph, opal_graph_vertex_t *vertex) uint32_t opal_graph_adjacent(opal_graph_t *graph, opal_graph_vertex_t *vertex1, opal_graph_vertex_t *vertex2) { opal_adjacency_list_t *adj_list; - opal_list_item_t *item; opal_graph_edge_t *edge; /** @@ -401,10 +382,7 @@ uint32_t opal_graph_adjacent(opal_graph_t *graph, opal_graph_vertex_t *vertex1, * vertex. 
*/ adj_list = (opal_adjacency_list_t *) vertex1->in_adj_list; - for (item = opal_list_get_first(adj_list->edges); - item != opal_list_get_end(adj_list->edges); - item = opal_list_get_next(item)) { - edge = (opal_graph_edge_t *)item; + OPAL_LIST_FOREACH(edge, adj_list->edges, opal_graph_edge_t) { if (edge->end == vertex2) { /* if the second vertex was found in the adjacency list of the first one, return the weight */ return edge->weight; @@ -452,15 +430,11 @@ int opal_graph_get_size(opal_graph_t *graph) opal_graph_vertex_t *opal_graph_find_vertex(opal_graph_t *graph, void *vertex_data) { opal_adjacency_list_t *aj_list; - opal_list_item_t *item; /** * Run on all the vertices of the graph */ - for (item = opal_list_get_first(graph->adjacency_list); - item != opal_list_get_end(graph->adjacency_list); - item = opal_list_get_next(item)) { - aj_list = (opal_adjacency_list_t *) item; + OPAL_LIST_FOREACH(aj_list, graph->adjacency_list, opal_adjacency_list_t) { if (NULL != aj_list->vertex->compare_vertex) { /* if the vertex data of a vertex is equal to the vertex data */ if (0 == aj_list->vertex->compare_vertex(aj_list->vertex->vertex_data, vertex_data)) { @@ -489,8 +463,6 @@ opal_graph_vertex_t *opal_graph_find_vertex(opal_graph_t *graph, void *vertex_da int opal_graph_get_graph_vertices(opal_graph_t *graph, opal_pointer_array_t *vertices_list) { opal_adjacency_list_t *aj_list; - opal_list_item_t *item; - int i; /** * If the graph order is 0, return NULL. 
@@ -499,10 +471,7 @@ int opal_graph_get_graph_vertices(opal_graph_t *graph, opal_pointer_array_t *ver return 0; } /* Run on all the vertices of the graph */ - for (item = opal_list_get_first(graph->adjacency_list), i = 0; - item != opal_list_get_end(graph->adjacency_list); - item = opal_list_get_next(item), i++) { - aj_list = (opal_adjacency_list_t *) item; + OPAL_LIST_FOREACH(aj_list, graph->adjacency_list, opal_adjacency_list_t) { /* Add the vertex to the vertices array */ opal_pointer_array_add(vertices_list,(void *)aj_list->vertex); } @@ -528,9 +497,7 @@ int opal_graph_get_adjacent_vertices(opal_graph_t *graph, opal_graph_vertex_t *v opal_adjacency_list_t *adj_list; opal_graph_edge_t *edge; int adjacents_number; - opal_list_item_t *item; vertex_distance_from_t distance_from; - int i; /** * Verify that the vertex belongs to the graph. @@ -546,10 +513,7 @@ int opal_graph_get_adjacent_vertices(opal_graph_t *graph, opal_graph_vertex_t *v /* find the number of adjcents of this vertex */ adjacents_number = opal_list_get_size(adj_list->edges); /* Run on all the edges from this vertex */ - for (item = opal_list_get_first(adj_list->edges), i = 0; - item != opal_list_get_end(adj_list->edges); - item = opal_list_get_next(item), i++) { - edge = (opal_graph_edge_t *)item; + OPAL_LIST_FOREACH(edge, adj_list->edges, opal_graph_edge_t) { /* assign vertices and their weight in the adjcents list */ distance_from.vertex = edge->end; distance_from.weight = edge->weight; @@ -663,7 +627,6 @@ uint32_t opal_graph_dijkstra(opal_graph_t *graph, opal_graph_vertex_t *vertex, o { int graph_order; vertex_distance_from_t *Q, *q_start, *current_vertex; - opal_list_item_t *adj_list_item; opal_adjacency_list_t *adj_list; int number_of_items_in_q; int i; @@ -683,22 +646,15 @@ uint32_t opal_graph_dijkstra(opal_graph_t *graph, opal_graph_vertex_t *vertex, o /* assign a pointer to the start of the queue */ q_start = Q; /* run on all the vertices of the graph */ - for (adj_list_item = 
opal_list_get_first(graph->adjacency_list), i=0; - adj_list_item != opal_list_get_end(graph->adjacency_list); - adj_list_item = opal_list_get_next(adj_list_item), i++) { - adj_list = (opal_adjacency_list_t *)adj_list_item; + i = 0; + OPAL_LIST_FOREACH(adj_list, graph->adjacency_list, opal_adjacency_list_t) { /* insert the vertices pointes to the working queue */ Q[i].vertex = adj_list->vertex; /** * assign an infinity distance to all the vertices in the queue * except the reference vertex which its distance should be 0. */ - if (Q[i].vertex == vertex) { - Q[i].weight = 0; - } - else { - Q[i].weight = DISTANCE_INFINITY; - } + Q[i++].weight = (adj_list->vertex == vertex) ? 0 : DISTANCE_INFINITY; } number_of_items_in_q = i; /* sort the working queue according the distance from the reference vertex */ @@ -750,17 +706,13 @@ uint32_t opal_graph_dijkstra(opal_graph_t *graph, opal_graph_vertex_t *vertex, o void opal_graph_duplicate(opal_graph_t **dest, opal_graph_t *src) { opal_adjacency_list_t *aj_list; - opal_list_item_t *aj_list_item, *edg_item; opal_graph_vertex_t *vertex; opal_graph_edge_t *edge, *new_edge; /* construct a new graph */ *dest = OBJ_NEW(opal_graph_t); /* Run on all the vertices of the src graph */ - for (aj_list_item = opal_list_get_first(src->adjacency_list); - aj_list_item != opal_list_get_end(src->adjacency_list); - aj_list_item = opal_list_get_next(aj_list_item)) { - aj_list = (opal_adjacency_list_t *) aj_list_item; + OPAL_LIST_FOREACH(aj_list, src->adjacency_list, opal_adjacency_list_t) { /* for each vertex in the src graph, construct a new vertex */ vertex = OBJ_NEW(opal_graph_vertex_t); /* associate the new vertex to a vertex from the original graph */ @@ -789,15 +741,9 @@ void opal_graph_duplicate(opal_graph_t **dest, opal_graph_t *src) * Now, copy all the edges from the source graph */ /* Run on all the adjscency lists in the graph */ - for (aj_list_item = opal_list_get_first(src->adjacency_list); - aj_list_item != 
opal_list_get_end(src->adjacency_list); - aj_list_item = opal_list_get_next(aj_list_item)) { - aj_list = (opal_adjacency_list_t *) aj_list_item; + OPAL_LIST_FOREACH(aj_list, src->adjacency_list, opal_adjacency_list_t) { /* for all the edges in the adjscency list */ - for (edg_item = opal_list_get_first(aj_list->edges); - edg_item != opal_list_get_end(aj_list->edges); - edg_item = opal_list_get_next(edg_item)) { - edge = (opal_graph_edge_t *)edg_item; + OPAL_LIST_FOREACH(edge, aj_list->edges, opal_graph_edge_t) { /* construct new edge for the new graph */ new_edge = OBJ_NEW(opal_graph_edge_t); /* copy the edge weight from the original edge */ @@ -818,9 +764,7 @@ void opal_graph_duplicate(opal_graph_t **dest, opal_graph_t *src) void opal_graph_print(opal_graph_t *graph) { opal_adjacency_list_t *aj_list; - opal_list_item_t *aj_list_item; opal_graph_edge_t *edge; - opal_list_item_t *edge_item; char *tmp_str1, *tmp_str2; bool need_free1, need_free2; @@ -828,10 +772,7 @@ void opal_graph_print(opal_graph_t *graph) opal_output(0, " Graph "); opal_output(0, "===================="); /* run on all the vertices of the graph */ - for (aj_list_item = opal_list_get_first(graph->adjacency_list); - aj_list_item != opal_list_get_end(graph->adjacency_list); - aj_list_item = opal_list_get_next(aj_list_item)) { - aj_list = (opal_adjacency_list_t *) aj_list_item; + OPAL_LIST_FOREACH(aj_list, graph->adjacency_list, opal_adjacency_list_t) { /* print vertex data to temporary string*/ if (NULL != aj_list->vertex->print_vertex) { need_free1 = true; @@ -844,10 +785,7 @@ void opal_graph_print(opal_graph_t *graph) /* print vertex */ opal_output(0, "V(%s) Connections:",tmp_str1); /* run on all the edges of the vertex */ - for (edge_item = opal_list_get_first(aj_list->edges); - edge_item != opal_list_get_end(aj_list->edges); - edge_item = opal_list_get_next(edge_item)) { - edge = (opal_graph_edge_t *)edge_item; + OPAL_LIST_FOREACH(edge, aj_list->edges, opal_graph_edge_t) { /* print the vertex 
data of the vertex in the end of the edge to a temporary string */ if (NULL != edge->end->print_vertex) { need_free2 = true; diff --git a/opal/class/opal_interval_tree.c b/opal/class/opal_interval_tree.c new file mode 100644 index 00000000000..e8ccda2024b --- /dev/null +++ b/opal/class/opal_interval_tree.c @@ -0,0 +1,909 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2013 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2015-2018 Los Alamos National Security, LLC. All rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ +/* + * @file + */ + +#include "opal_config.h" + +#include "opal/class/opal_interval_tree.h" +#include + +/* Private functions */ +static void opal_interval_tree_insert_node (opal_interval_tree_t *tree, opal_interval_tree_node_t *node); + +/* tree rebalancing functions */ +static void opal_interval_tree_delete_fixup (opal_interval_tree_t *tree, opal_interval_tree_node_t *node, + opal_interval_tree_node_t *parent); +static void opal_interval_tree_insert_fixup (opal_interval_tree_t *tree, opal_interval_tree_node_t *x); + +static opal_interval_tree_node_t *opal_interval_tree_next (opal_interval_tree_t *tree, + opal_interval_tree_node_t *node); +static opal_interval_tree_node_t * opal_interval_tree_find_node(opal_interval_tree_t *tree, + uint64_t low, uint64_t high, + bool exact, void *data); + +static opal_interval_tree_node_t *left_rotate (opal_interval_tree_t *tree, opal_interval_tree_node_t *x); +static 
opal_interval_tree_node_t *right_rotate (opal_interval_tree_t *tree, opal_interval_tree_node_t *x); + +static void inorder_destroy(opal_interval_tree_t *tree, opal_interval_tree_node_t * node); + +#define max(x,y) (((x) > (y)) ? (x) : (y)) + +/** + * the constructor function. creates the free list to get the nodes from + * + * @param tree the tree that is to be used + * + * @retval NONE + */ +static void opal_interval_tree_construct (opal_interval_tree_t *tree) +{ + OBJ_CONSTRUCT(&tree->root, opal_interval_tree_node_t); + OBJ_CONSTRUCT(&tree->nill, opal_interval_tree_node_t); + OBJ_CONSTRUCT(&tree->free_list, opal_free_list_t); + OBJ_CONSTRUCT(&tree->gc_list, opal_list_t); + + /* initialize sentinel */ + tree->nill.color = OPAL_INTERVAL_TREE_COLOR_BLACK; + tree->nill.left = tree->nill.right = tree->nill.parent = &tree->nill; + tree->nill.max = 0; + tree->nill.data = NULL; + + /* initialize root sentinel */ + tree->root.color = OPAL_INTERVAL_TREE_COLOR_BLACK; + tree->root.left = tree->root.right = tree->root.parent = &tree->nill; + /* this simplifies inserting at the root as we only have to check the + * low value. */ + tree->root.low = (uint64_t) -1; + tree->root.data = NULL; + + /* set the tree size to zero */ + tree->tree_size = 0; + tree->lock = 0; + tree->reader_count = 0; + tree->epoch = 0; + + /* set all reader epochs to UINT_MAX. this value is used to simplify + * checks against the current epoch. */ + for (int i = 0 ; i < OPAL_INTERVAL_TREE_MAX_READERS ; ++i) { + tree->reader_epochs[i] = UINT_MAX; + } +} + +/** + * the destructor function. Frees the tree and destroys the free list.
+ * + * @param tree the tree object + */ +static void opal_interval_tree_destruct (opal_interval_tree_t *tree) +{ + opal_interval_tree_destroy (tree); + + OBJ_DESTRUCT(&tree->free_list); + OBJ_DESTRUCT(&tree->root); + OBJ_DESTRUCT(&tree->nill); +} + +/* declare the instance of the classes */ +OBJ_CLASS_INSTANCE(opal_interval_tree_node_t, opal_free_list_item_t, NULL, NULL); +OBJ_CLASS_INSTANCE(opal_interval_tree_t, opal_object_t, opal_interval_tree_construct, + opal_interval_tree_destruct); + +typedef int32_t opal_interval_tree_token_t; + +/** + * @brief pick and return a reader slot + */ +static opal_interval_tree_token_t opal_interval_tree_reader_get_token (opal_interval_tree_t *tree) +{ + opal_interval_tree_token_t token = -1; + + if (token < 0) { + int32_t reader_count = tree->reader_count; + /* NTH: could have used an atomic here but all we are after is some distribution of threads + * across the reader slots. with high thread counts I see no real performance difference + * using atomics.
*/ + token = tree->reader_id++ % OPAL_INTERVAL_TREE_MAX_READERS; + while (OPAL_UNLIKELY(reader_count <= token)) { + if (opal_atomic_compare_exchange_strong_32 (&tree->reader_count, &reader_count, token + 1)) { + break; + } + } + } + + while (!OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_32((volatile int32_t *) &tree->reader_epochs[token], + &(int32_t) {UINT_MAX}, tree->epoch)); + + return token; +} + +static void opal_interval_tree_reader_return_token (opal_interval_tree_t *tree, opal_interval_tree_token_t token) +{ + tree->reader_epochs[token] = UINT_MAX; +} + +/* Create the tree */ +int opal_interval_tree_init (opal_interval_tree_t *tree) +{ + return opal_free_list_init (&tree->free_list, sizeof(opal_interval_tree_node_t), + opal_cache_line_size, OBJ_CLASS(opal_interval_tree_node_t), + 0, opal_cache_line_size, 0, -1 , 128, NULL, 0, NULL, NULL, NULL); +} + +static bool opal_interval_tree_write_trylock (opal_interval_tree_t *tree) +{ + opal_atomic_rmb (); + return !(tree->lock || opal_atomic_swap_32 (&tree->lock, 1)); +} + +static void opal_interval_tree_write_lock (opal_interval_tree_t *tree) +{ + while (!opal_interval_tree_write_trylock (tree)); +} + +static void opal_interval_tree_write_unlock (opal_interval_tree_t *tree) +{ + opal_atomic_wmb (); + tree->lock = 0; +} + +static void opal_interval_tree_insert_fixup_helper (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) { + opal_interval_tree_node_t *y, *parent = node->parent; + bool rotate_right = false; + + if (parent->color == OPAL_INTERVAL_TREE_COLOR_BLACK) { + return; + } + + if (parent == parent->parent->left) { + y = parent->parent->right; + rotate_right = true; + } else { + y = parent->parent->left; + } + + if (y->color == OPAL_INTERVAL_TREE_COLOR_RED) { + parent->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + y->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + parent->parent->color = OPAL_INTERVAL_TREE_COLOR_RED; + opal_interval_tree_insert_fixup_helper (tree, parent->parent); + return; + } + + if (rotate_right) 
{ + if (node == parent->right) { + node = left_rotate (tree, parent); + parent = node->parent; + } + + parent->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + parent->parent->color = OPAL_INTERVAL_TREE_COLOR_RED; + (void) right_rotate(tree, parent->parent); + } else { + if (node == parent->left) { + node = right_rotate(tree, parent); + parent = node->parent; + } + parent->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + parent->parent->color = OPAL_INTERVAL_TREE_COLOR_RED; + (void) left_rotate(tree, parent->parent); + } + + opal_interval_tree_insert_fixup_helper (tree, node); +} + +static void opal_interval_tree_insert_fixup (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) { + /* do the rotations */ + /* usually one would have to check for NULL, but because of the sentinel, + * we don't have to */ + opal_interval_tree_insert_fixup_helper (tree, node); + + /* after the rotations the root is black */ + tree->root.left->color = OPAL_INTERVAL_TREE_COLOR_BLACK; +} + +/** + * @brief Guts of the delete fixup + * + * @param[in] tree opal interval tree + * @param[in] node node to fixup + * @param[in] left true if the node is a left child of its parent + * + * @returns the next node to fixup or root if done + */ +static inline opal_interval_tree_node_t * +opal_interval_tree_delete_fixup_helper (opal_interval_tree_t *tree, opal_interval_tree_node_t *node, + opal_interval_tree_node_t *parent, const bool left) +{ + opal_interval_tree_node_t *w; + + /* get sibling */ + w = left ?
parent->right : parent->left; + if (w->color == OPAL_INTERVAL_TREE_COLOR_RED) { + w->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + parent->color = OPAL_INTERVAL_TREE_COLOR_RED; + if (left) { + (void) left_rotate(tree, parent); + w = parent->right; + } else { + (void) right_rotate(tree, parent); + w = parent->left; + } + } + + if ((w->left->color == OPAL_INTERVAL_TREE_COLOR_BLACK) && (w->right->color == OPAL_INTERVAL_TREE_COLOR_BLACK)) { + w->color = OPAL_INTERVAL_TREE_COLOR_RED; + return parent; + } + + if (left) { + if (w->right->color == OPAL_INTERVAL_TREE_COLOR_BLACK) { + w->left->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + w->color = OPAL_INTERVAL_TREE_COLOR_RED; + (void) right_rotate(tree, w); + w = parent->right; + } + w->color = parent->color; + parent->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + w->right->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + (void) left_rotate(tree, parent); + } else { + if (w->left->color == OPAL_INTERVAL_TREE_COLOR_BLACK) { + w->right->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + w->color = OPAL_INTERVAL_TREE_COLOR_RED; + (void) left_rotate(tree, w); + w = parent->left; + } + w->color = parent->color; + parent->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + w->left->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + (void) right_rotate(tree, parent); + } + + /* return the root */ + return tree->root.left; + } + +/* Fixup the balance of the red-black tree after deletion */ +static void opal_interval_tree_delete_fixup (opal_interval_tree_t *tree, opal_interval_tree_node_t *node, + opal_interval_tree_node_t *parent) +{ + while ((node != tree->root.left) && (node->color == OPAL_INTERVAL_TREE_COLOR_BLACK)) { + node = opal_interval_tree_delete_fixup_helper (tree, node, parent, node == parent->left); + parent = node->parent; + } + + node->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + tree->nill.color = OPAL_INTERVAL_TREE_COLOR_BLACK; +} + +/* traverse the garbage-collection list and return any nodes that can not have any + * references.
this function MUST be called with the writer lock held. */ +static void opal_interval_tree_gc_clean (opal_interval_tree_t *tree) +{ + opal_interval_tree_node_t *node, *next; + uint32_t oldest_epoch = UINT_MAX; + + if (0 == opal_list_get_size (&tree->gc_list)) { + return; + } + + for (int i = 0 ; i < tree->reader_count ; ++i) { + oldest_epoch = (oldest_epoch < tree->reader_epochs[i]) ? oldest_epoch : tree->reader_epochs[i]; + } + + OPAL_LIST_FOREACH_SAFE(node, next, &tree->gc_list, opal_interval_tree_node_t) { + if (node->epoch < oldest_epoch) { + opal_list_remove_item (&tree->gc_list, &node->super.super); + opal_free_list_return_st (&tree->free_list, &node->super); + } + } +} + +/* This inserts a node into the tree based on the passed values. */ +int opal_interval_tree_insert (opal_interval_tree_t *tree, void *value, uint64_t low, uint64_t high) +{ + opal_interval_tree_node_t * node; + + if (low > high) { + return OPAL_ERR_BAD_PARAM; + } + + opal_interval_tree_write_lock (tree); + + opal_interval_tree_gc_clean (tree); + + /* get the memory for a node */ + node = (opal_interval_tree_node_t *) opal_free_list_get (&tree->free_list); + if (OPAL_UNLIKELY(NULL == node)) { + opal_interval_tree_write_unlock (tree); + return OPAL_ERR_OUT_OF_RESOURCE; + } + + /* insert the data into the node */ + node->data = value; + node->low = low; + node->high = high; + node->max = high; + node->epoch = tree->epoch; + + /* insert the node into the tree */ + opal_interval_tree_insert_node (tree, node); + + opal_interval_tree_insert_fixup (tree, node); + opal_interval_tree_write_unlock (tree); + + return OPAL_SUCCESS; +} + +static opal_interval_tree_node_t *opal_interval_tree_find_interval(opal_interval_tree_t *tree, opal_interval_tree_node_t *node, uint64_t low, + uint64_t high, bool exact, void *data) +{ + if (node == &tree->nill) { + return NULL; + } + + if (((exact && node->low == low && node->high == high) || (!exact && node->low <= low && node->high >= high)) && + (!data || 
node->data == data)) { + return node; + } + + if (low <= node->low) { + return opal_interval_tree_find_interval (tree, node->left, low, high, exact, data); + } + + return opal_interval_tree_find_interval (tree, node->right, low, high, exact, data); +} + +/* Finds the node in the tree matching the given interval (and data, if + * data is not NULL) and returns a pointer to the node. The search starts at + * the real root (the left child of the root sentinel). */ +static opal_interval_tree_node_t *opal_interval_tree_find_node(opal_interval_tree_t *tree, uint64_t low, uint64_t high, bool exact, void *data) +{ + return opal_interval_tree_find_interval (tree, tree->root.left, low, high, exact, data); +} + +void *opal_interval_tree_find_overlapping (opal_interval_tree_t *tree, uint64_t low, uint64_t high) +{ + opal_interval_tree_token_t token; + opal_interval_tree_node_t *node; + + token = opal_interval_tree_reader_get_token (tree); + node = opal_interval_tree_find_node (tree, low, high, true, NULL); + opal_interval_tree_reader_return_token (tree, token); + + return node ?
node->data : NULL; +} + +static size_t opal_interval_tree_depth_node (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + if (&tree->nill == node) { + return 0; + } + + return 1 + max (opal_interval_tree_depth_node (tree, node->right), opal_interval_tree_depth_node (tree, node->left)); +} + +size_t opal_interval_tree_depth (opal_interval_tree_t *tree) +{ + opal_interval_tree_token_t token; + size_t depth; + + token = opal_interval_tree_reader_get_token (tree); + depth = opal_interval_tree_depth_node (tree, &tree->root); + opal_interval_tree_reader_return_token (tree, token); + + return depth; +} + +/* update the value of a tree pointer */ +static inline void rp_publish (opal_interval_tree_node_t **ptr, opal_interval_tree_node_t *node) +{ + /* ensure all writes complete before continuing */ + opal_atomic_wmb (); + /* just set the value */ + *ptr = node; +} + + +static inline void rp_wait_for_readers (opal_interval_tree_t *tree) +{ + uint32_t epoch_id = ++tree->epoch; + + /* wait until every reader that entered before the epoch was advanced has returned its token */ + for (int i = 0 ; i < tree->reader_count ; ++i) { + while (tree->reader_epochs[i] < epoch_id); + } +} + +/* waits for all readers to finish with the node then releases the last reference */ +static inline void rp_free_wait (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + rp_wait_for_readers (tree); + /* no other threads are working on this node so go ahead and return it */ + opal_free_list_return_st (&tree->free_list, &node->super); +} + +/* schedules the node for releasing */ +static inline void rp_free (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + opal_list_append (&tree->gc_list, &node->super.super); +} + +static opal_interval_tree_node_t *opal_interval_tree_node_copy (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + opal_interval_tree_node_t *copy = (opal_interval_tree_node_t *) 
opal_free_list_wait_st (&tree->free_list); + size_t color_offset =
offsetof(opal_interval_tree_node_t, color); + assert (NULL != copy); + memcpy ((unsigned char *) copy + color_offset, (unsigned char *) node + color_offset, + sizeof (*node) - color_offset); + return copy; +} + +/* this function deletes a node that is either a left or right leaf (or both) */ +static void opal_interval_tree_delete_leaf (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + const opal_interval_tree_node_t *nill = &tree->nill; + opal_interval_tree_node_t **parent_ptr, *next, *parent = node->parent; + opal_interval_tree_nodecolor_t color = node->color; + + assert (node->left == nill || node->right == nill); + + parent_ptr = (parent->right == node) ? &parent->right : &parent->left; + + next = (node->right == nill) ? node->left : node->right; + + next->parent = node->parent; + rp_publish (parent_ptr, next); + + rp_free (tree, node); + + if (OPAL_INTERVAL_TREE_COLOR_BLACK == color) { + if (OPAL_INTERVAL_TREE_COLOR_RED == next->color) { + next->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + } else { + opal_interval_tree_delete_fixup (tree, next, parent); + } + } +} + +static void opal_interval_tree_delete_interior (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + opal_interval_tree_node_t **parent_ptr, *next, *next_copy, *parent = node->parent; + opal_interval_tree_nodecolor_t color = node->color, next_color; + + parent_ptr = (parent->right == node) ? &parent->right : &parent->left; + next = opal_interval_tree_next (tree, node); + next_color = next->color; + + if (next != node->right) { + /* case 3 */ + next_copy = opal_interval_tree_node_copy (tree, next); + next_copy->color = node->color; + next_copy->left = node->left; + next_copy->left->parent = next_copy; + next_copy->right = node->right; + next_copy->right->parent = next_copy; + next_copy->parent = node->parent; + + rp_publish (parent_ptr, next_copy); + rp_free_wait (tree, node); + + opal_interval_tree_delete_leaf (tree, next); + } else { + /* case 2. 
no copies are needed */ + next->color = color; + next->left = node->left; + next->left->parent = next; + next->parent = node->parent; + rp_publish (parent_ptr, next); + rp_free (tree, node); + + /* since we are actually "deleting" the next node the fixup needs to happen on the + * right child of next (by definition next has no left child) */ + if (OPAL_INTERVAL_TREE_COLOR_BLACK == next_color) { + if (OPAL_INTERVAL_TREE_COLOR_RED == next->right->color) { + next->right->color = OPAL_INTERVAL_TREE_COLOR_BLACK; + } else { + opal_interval_tree_delete_fixup (tree, next->right, next); + } + } + } +} + +/* Delete a node from the tree based on its interval (and data, if not NULL) */ +int opal_interval_tree_delete (opal_interval_tree_t *tree, uint64_t low, uint64_t high, void *data) +{ + opal_interval_tree_node_t *node; + + opal_interval_tree_write_lock (tree); + node = opal_interval_tree_find_node (tree, low, high, true, data); + if (NULL == node) { + opal_interval_tree_write_unlock (tree); + return OPAL_ERR_NOT_FOUND; + } + + /* there are three cases that have to be handled: + * 1) the node p is a left leaf or a right leaf (one of p's children is nill) + * in this case we can delete p and we can replace it with one of its children + * or nill (if both children are nill).
+ * 2) the right child of p is a left leaf (node->right->left == nill) + * in this case we can set node->right->left = node->left and replace node with node->right + * 3) p is an interior node + * we replace node with next(node) + */ + + if ((node->left == &tree->nill) || (node->right == &tree->nill)) { + /* handle case 1 */ + opal_interval_tree_delete_leaf (tree, node); + } else { + /* handle cases 2 and 3 */ + opal_interval_tree_delete_interior (tree, node); + } + + --tree->tree_size; + + opal_interval_tree_write_unlock (tree); + + return OPAL_SUCCESS; +} + +int opal_interval_tree_destroy (opal_interval_tree_t *tree) +{ + /* Recursive postorder traversal for delete */ + inorder_destroy(tree, &tree->root); + tree->tree_size = 0; + return OPAL_SUCCESS; +} + + +/* Find the next inorder successor of a node */ +static opal_interval_tree_node_t *opal_interval_tree_next (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + opal_interval_tree_node_t *p = node->right; + + if (p == &tree->nill) { + p = node->parent; + while (node == p->right) { + node = p; + p = p->parent; + } + + if (p == &tree->root) { + return &tree->nill; + } + + return p; + } + + while (p->left != &tree->nill) { + p = p->left; + } + + return p; +} + +/* Insert an element in the normal binary search tree fashion */ +/* this function goes through the tree and finds the leaf where + * the node will be inserted */ +static void opal_interval_tree_insert_node (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + opal_interval_tree_node_t *parent = &tree->root; + opal_interval_tree_node_t *n = parent->left; /* the real root of the tree */ + opal_interval_tree_node_t *nill = &tree->nill; + + /* set up initial values for the node */ + node->color = OPAL_INTERVAL_TREE_COLOR_RED; + node->parent = NULL; + node->left = nill; + node->right = nill; + + /* find the leaf where we will insert the node */ + while (n != nill) { + if (n->max < node->high) { + n->max = node->high; + } + + 
parent = n;
+ n = ((node->low < n->low) ? n->left : n->right); + assert (nill == n || n->parent == parent); + } + + /* place it on either the left or the right */ + if ((node->low < parent->low)) { + parent->left = node; + } else { + parent->right = node; + } + + /* set its parent and children */ + node->parent = parent; + + ++tree->tree_size; +} + +static int inorder_traversal (opal_interval_tree_t *tree, uint64_t low, uint64_t high, + bool partial_ok, opal_interval_tree_action_fn_t action, + opal_interval_tree_node_t * node, void *ctx) +{ + int rc; + + if (node == &tree->nill) { + return OPAL_SUCCESS; + } + + rc = inorder_traversal(tree, low, high, partial_ok, action, node->left, ctx); + if (OPAL_SUCCESS != rc) { + return rc; + } + + if ((!partial_ok && (node->low <= low && node->high >= high)) || + (partial_ok && ((low >= node->low && low <= node->high) || + (high >= node->low && high <= node->high) || + (node->low >= low && node->low <= high) || + (node->high >= low && node->high <= high)))) { + rc = action (node->low, node->high, node->data, ctx); + if (OPAL_SUCCESS != rc) { + return rc; + } + } + + return inorder_traversal(tree, low, high, partial_ok, action, node->right, ctx); +} + +/* Free the nodes in postorder fashion (children before their parent) */ + +static void inorder_destroy (opal_interval_tree_t *tree, opal_interval_tree_node_t *node) +{ + if (node == &tree->nill) { + return; + } + + inorder_destroy(tree, node->left); + inorder_destroy(tree, node->right); + + if (node->left != &tree->nill) { + opal_free_list_return_st (&tree->free_list, &node->left->super); + } + + if (node->right != &tree->nill) { + opal_free_list_return_st (&tree->free_list, &node->right->super); + } +} + +/* Call action on each interval that contains [low, high] (or, if partial_ok, overlaps it) */ + +int opal_interval_tree_traverse (opal_interval_tree_t *tree, uint64_t low, uint64_t high, + bool partial_ok, opal_interval_tree_action_fn_t action, void *ctx) +{ + opal_interval_tree_token_t 
token; + int rc; + + if (action == NULL) { + return
OPAL_ERR_BAD_PARAM; + } + + token = opal_interval_tree_reader_get_token (tree); + rc = inorder_traversal (tree, low, high, partial_ok, action, tree->root.left, ctx); + opal_interval_tree_reader_return_token (tree, token); + return rc; +} + +/* Left rotate the tree */ +/* basically what we want to do is to make x be the left child + * of its right child */ +static opal_interval_tree_node_t *left_rotate (opal_interval_tree_t *tree, opal_interval_tree_node_t *x) +{ + opal_interval_tree_node_t *x_copy = x; + opal_interval_tree_node_t *y = x->right; + opal_interval_tree_node_t *parent = x->parent; + + /* y's left child becomes x's right child; update its parent unless it is the sentinel node */ + if (y->left != &tree->nill) { + y->left->parent = x_copy; + } + + /* x's parent is now y */ + x_copy->parent = y; + x_copy->right = y->left; + x_copy->max = max (x_copy->high, max (x_copy->left->max, x_copy->right->max)); + + rp_publish (&y->left, x_copy); + + /* normally we would have to check to see if we are at the root.
+ * however, the root sentinel takes care of it for us */ + if (x == parent->left) { + rp_publish (&parent->left, y); + } else { + rp_publish (&parent->right, y); + } + + /* the old parent of x is now y's parent */ + y->parent = parent; + + return x_copy; +} + + +/* Right rotate the tree */ +/* basically what we want to do is to make x be the right child + * of its left child */ +static opal_interval_tree_node_t *right_rotate (opal_interval_tree_t *tree, opal_interval_tree_node_t *x) +{ + opal_interval_tree_node_t *x_copy = x; + opal_interval_tree_node_t *y = x->left; + opal_interval_tree_node_t *parent = x->parent; + + /* y's right child becomes x's left child; update its parent unless it is the sentinel node */ + if (y->right != &tree->nill) { + y->right->parent = x_copy; + } + + x_copy->left = y->right; + x_copy->parent = y; + + rp_publish (&y->right, x_copy); + + /* the maximum value in the subtree rooted at y is now the value it + * was at x */ + y->max = x->max; + y->parent = parent; + + if (parent->left == x) { + rp_publish (&parent->left, y); + } else { + rp_publish (&parent->right, y); + } + + return x_copy; +} + +/* returns the size of the tree */ +size_t opal_interval_tree_size(opal_interval_tree_t *tree) +{ + return tree->tree_size; +} + +static bool opal_interval_tree_verify_node (opal_interval_tree_t *tree, opal_interval_tree_node_t *node, int black_depth, + int current_black_depth) +{ + if (node == &tree->nill) { + return true; + } + + if (OPAL_INTERVAL_TREE_COLOR_RED == node->color && + (OPAL_INTERVAL_TREE_COLOR_BLACK != node->left->color || + OPAL_INTERVAL_TREE_COLOR_BLACK != node->right->color)) { + fprintf (stderr, "Red node has a red child!\n"); + return false; + } + + if (OPAL_INTERVAL_TREE_COLOR_BLACK == node->color) { + current_black_depth++; + } + + if (node->left == &tree->nill && node->right == &tree->nill) { + if (black_depth != current_black_depth) { + fprintf (stderr, "Found leaf with unexpected black depth: %d, expected: %d\n", 
current_black_depth,
black_depth); + return false; + } + + return true; + } + + return opal_interval_tree_verify_node (tree, node->left, black_depth, current_black_depth) && + opal_interval_tree_verify_node (tree, node->right, black_depth, current_black_depth); +} + +static int opal_interval_tree_black_depth (opal_interval_tree_t *tree, opal_interval_tree_node_t *node, int depth) +{ + if (node == &tree->nill) { + return depth; + } + + /* suffices to always go left */ + if (OPAL_INTERVAL_TREE_COLOR_BLACK == node->color) { + depth++; + } + + return opal_interval_tree_black_depth (tree, node->left, depth); +} + +bool opal_interval_tree_verify (opal_interval_tree_t *tree) +{ + int black_depth; + + if (OPAL_INTERVAL_TREE_COLOR_BLACK != tree->root.left->color) { + fprintf (stderr, "Root node of tree is NOT black!\n"); + return false; + } + + if (OPAL_INTERVAL_TREE_COLOR_BLACK != tree->nill.color) { + fprintf (stderr, "Leaf node color is NOT black!\n"); + return false; + } + + black_depth = opal_interval_tree_black_depth (tree, tree->root.left, 0); + + return opal_interval_tree_verify_node (tree, tree->root.left, black_depth, 0); +} + +static void opal_interval_tree_dump_node (opal_interval_tree_t *tree, opal_interval_tree_node_t *node, int black_rank, FILE *fh) +{ + const char *color = (node->color == OPAL_INTERVAL_TREE_COLOR_BLACK) ?
"black" : "red"; + uintptr_t left = (uintptr_t) node->left, right = (uintptr_t) node->right; + opal_interval_tree_node_t *nill = &tree->nill; + + if (node->color == OPAL_INTERVAL_TREE_COLOR_BLACK) { + ++black_rank; + } + + if (nill == node) { + return; + } + + /* print out nill nodes if any */ + if ((uintptr_t) nill == left) { + left = (uintptr_t) node | 0x1; + fprintf (fh, " Node%lx [color=black,label=nill];\n\n", left); + } else { + left = (uintptr_t) node->left; + } + + if ((uintptr_t) nill == right) { + right = (uintptr_t) node | 0x2; + fprintf (fh, " Node%lx [color=black,label=nill];\n\n", right); + } else { + right = (uintptr_t) node->right; + } + + /* print out this node and its edges */ + fprintf (fh, " Node%lx [color=%s,shape=box,label=\"[0x%" PRIx64 ",0x%" PRIx64 "]\\nmax=0x%" PRIx64 + "\\ndata=0x%lx\\nblack rank=%d\"];\n", (uintptr_t) node, color, node->low, node->high, node->max, + (uintptr_t) node->data, black_rank); + fprintf (fh, " Node%lx -> Node%lx;\n", (uintptr_t) node, left); + fprintf (fh, " Node%lx -> Node%lx;\n\n", (uintptr_t) node, right); + if (node != tree->root.left) { + fprintf (fh, " Node%lx -> Node%lx;\n\n", (uintptr_t) node, (uintptr_t) node->parent); + } + opal_interval_tree_dump_node (tree, node->left, black_rank, fh); + opal_interval_tree_dump_node (tree, node->right, black_rank, fh); +} + +int opal_interval_tree_dump (opal_interval_tree_t *tree, const char *path) +{ + FILE *fh; + + fh = fopen (path, "w"); + if (NULL == fh) { + return OPAL_ERR_BAD_PARAM; + } + + fprintf (fh, "digraph {\n"); + fprintf (fh, " graph [ordering=\"out\"];"); + opal_interval_tree_dump_node (tree, tree->root.left, 0, fh); + fprintf (fh, "}\n"); + + fclose (fh); + + return OPAL_SUCCESS; +} diff --git a/opal/class/opal_interval_tree.h b/opal/class/opal_interval_tree.h new file mode 100644 index 00000000000..cbb36eb1cc8 --- /dev/null +++ b/opal/class/opal_interval_tree.h @@ -0,0 +1,241 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * 
Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2005 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2015-2018 Los Alamos National Security, LLC. All rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + * + */ + +/** @file + * + * A thread-safe interval tree derived from opal_rb_tree_t + */ + +#ifndef OPAL_INTERVAL_TREE_H +#define OPAL_INTERVAL_TREE_H + +#include "opal_config.h" +#include +#include "opal/constants.h" +#include "opal/class/opal_object.h" +#include "opal/class/opal_free_list.h" + +BEGIN_C_DECLS +/* + * Data structures and datatypes + */ + +/** + * red and black enum + */ +typedef enum {OPAL_INTERVAL_TREE_COLOR_RED, OPAL_INTERVAL_TREE_COLOR_BLACK} opal_interval_tree_nodecolor_t; + +/** + * node data structure + */ +struct opal_interval_tree_node_t +{ + opal_free_list_item_t super; /**< the parent class */ + opal_interval_tree_nodecolor_t color; /**< the node color */ + struct opal_interval_tree_node_t *parent;/**< the parent node, can be NULL */ + struct opal_interval_tree_node_t *left; /**< the left child - can be nill */ + struct opal_interval_tree_node_t *right; /**< the right child - can be nill */ + /** edit epoch associated with this node */ + uint32_t epoch; + /** data for this interval */ + void *data; + /** low value of this interval */ + uint64_t low; + /** high value of this interval */ + uint64_t high; + /** maximum value of all intervals in tree rooted at this node */ + uint64_t max; +}; +typedef struct opal_interval_tree_node_t opal_interval_tree_node_t; + +/** maximum number of 
simultaneous readers */ +#define OPAL_INTERVAL_TREE_MAX_READERS 128 + +/** + * the data structure that holds all the needed information about the tree. + */ +struct opal_interval_tree_t { + opal_object_t super; /**< the parent class */ + /* this root node is not actually the root of the tree. + * rather, it is a sentinel node whose left branch is the real + * root of the tree. This is done to eliminate special cases */ + opal_interval_tree_node_t root; /**< sentinel whose left branch is the root of the tree */ + opal_interval_tree_node_t nill; /**< the nill sentinel node */ + opal_free_list_t free_list; /**< the free list to get the memory from */ + opal_list_t gc_list; /**< list of nodes that need to be released */ + uint32_t epoch; /**< current update epoch */ + volatile size_t tree_size; /**< the current size of the tree */ + volatile int32_t lock; /**< update lock */ + volatile int32_t reader_count; /**< current highest reader slot to check */ + volatile uint32_t reader_id; /**< next reader slot to check */ + volatile uint32_t reader_epochs[OPAL_INTERVAL_TREE_MAX_READERS]; +}; +typedef struct opal_interval_tree_t opal_interval_tree_t; + +/** declare the tree node as a class */ +OPAL_DECLSPEC OBJ_CLASS_DECLARATION(opal_interval_tree_node_t); +/** declare the tree as a class */ +OPAL_DECLSPEC OBJ_CLASS_DECLARATION(opal_interval_tree_t); + +/* Function pointers for map traversal function */ +/** + * this function is used for the opal_interval_tree_traverse function. + * it is passed a pointer to the value for each node and, if it returns + * one, the action function is called on that node. Otherwise, the node is ignored. + */ +typedef int (*opal_interval_tree_condition_fn_t)(void *); +/** + * this function is used for the user to perform any action on the passed + * values. The arguments are the interval bounds (low, high), the data stored + * with the interval, and the user-supplied context. This function SHOULD NOT + * modify the intervals in the tree, as that would mess up the tree.
+ */ +typedef int (*opal_interval_tree_action_fn_t)(uint64_t low, uint64_t high, void *data, void *ctx); + +/* + * Public function prototypes + */ + +/** + * initializes a new tree + * + * @param tree a pointer to an allocated area of memory for the main + * tree data structure. + * + * @retval OPAL_SUCCESS if it is successful + * @retval OPAL_ERR_TEMP_OUT_OF_RESOURCE if unsuccessful + */ +OPAL_DECLSPEC int opal_interval_tree_init(opal_interval_tree_t * tree); + + +/** + * inserts an interval into the tree + * + * @param tree a pointer to the tree data structure + * @param value the data to associate with the interval + * @param low low value of the interval + * @param high high value of the interval + * + * @retval OPAL_SUCCESS + * @retval OPAL_ERR_TEMP_OUT_OF_RESOURCE if unsuccessful + */ +OPAL_DECLSPEC int opal_interval_tree_insert(opal_interval_tree_t *tree, void *value, uint64_t low, uint64_t high); + +/** + * finds the value of an interval in the tree that overlaps the range + * given by low and high + * + * @param tree a pointer to the tree data structure + * @param low low value of the range + * @param high high value of the range + * + * @retval pointer to the value if found + * @retval NULL if not found + */ +OPAL_DECLSPEC void *opal_interval_tree_find_overlapping (opal_interval_tree_t *tree, uint64_t low, uint64_t high); + +/** + * deletes a node based on its interval + * + * @param tree a pointer to the tree data structure + * @param low low value of interval + * @param high high value of interval + * @param data data to match (NULL for any) + * + * @retval OPAL_SUCCESS if the node is found and deleted + * @retval OPAL_ERR_NOT_FOUND if the node is not found + * + * This function finds and deletes an interval from the tree that exactly matches + * the given range.
+ */ +OPAL_DECLSPEC int opal_interval_tree_delete(opal_interval_tree_t *tree, uint64_t low, uint64_t high, void *data); + +/** + * frees all the nodes in the tree + * + * @param tree a pointer to the tree data structure + * + * @retval OPAL_SUCCESS + */ +OPAL_DECLSPEC int opal_interval_tree_destroy(opal_interval_tree_t *tree); + +/** + * traverses the tree and performs the action function on the intervals that + * lie within (or, with partial_ok, partially overlap) the range [low, high] + * + * @param tree a pointer to the tree + * @param low low value of interval + * @param high high value of interval + * @param partial_ok traverse nodes that partially overlap the given range + * @param action a pointer to the action function + * @param ctx context to pass to action function + * + * @retval OPAL_SUCCESS + * @retval OPAL_ERROR if there is an error + */ +OPAL_DECLSPEC int opal_interval_tree_traverse (opal_interval_tree_t *tree, uint64_t low, uint64_t high, + bool complete, opal_interval_tree_action_fn_t action, void *ctx); + +/** + * returns the size of the tree + * + * @param tree a pointer to the tree data structure + * + * @retval the number of items in the tree + */ +OPAL_DECLSPEC size_t opal_interval_tree_size (opal_interval_tree_t *tree); + +/** + * Diagnostic function to get the max depth of an interval tree. + * + * @param[in] tree opal interval tree pointer + * + * This is an expensive function that walks the entire tree to find the + * maximum depth. For a valid interval tree this depth will always be + * O(log(n)) where n is the number of intervals in the tree. + */ +OPAL_DECLSPEC size_t opal_interval_tree_depth (opal_interval_tree_t *tree); + +/** + * Diagnostic function that can be used to verify that an interval tree + * is valid.
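The `max` field of `opal_interval_tree_node_t` is what makes overlap queries such as `opal_interval_tree_find_overlapping()` cheap. A subtree whose largest `high` is below the query's `low` cannot contain an overlapping interval, so it can be skipped. The sketch below illustrates that pruning rule on a plain, unbalanced binary search tree. It is not the Open MPI implementation. The names (`itree_node_t`, `itree_insert`, `itree_find_overlapping`) are hypothetical, and the sketch assumes closed intervals `[low, high]`. It omits red-black balancing, the `nill` sentinel, and the reader/epoch protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node: an unbalanced binary search tree keyed
 * on `low` and augmented with `max`, the largest `high` in the subtree.
 * It mirrors the low/high/max fields of opal_interval_tree_node_t but
 * omits balancing, sentinels and thread safety. Intervals are closed. */
typedef struct itree_node {
    uint64_t low, high, max;
    void *data;
    struct itree_node *left, *right;
} itree_node_t;

/* Insert `node` below `root` and return the (possibly new) subtree root. */
static itree_node_t *itree_insert(itree_node_t *root, itree_node_t *node)
{
    assert(node->low <= node->high);
    if (NULL == root) {
        node->left = node->right = NULL;
        node->max = node->high;
        return node;
    }
    if (node->low < root->low) {
        root->left = itree_insert(root->left, node);
    } else {
        root->right = itree_insert(root->right, node);
    }
    /* keep the augmented field up to date on the way back up */
    if (node->high > root->max) {
        root->max = node->high;
    }
    return root;
}

/* Return some node whose interval overlaps [low, high], or NULL. */
static itree_node_t *itree_find_overlapping(itree_node_t *n, uint64_t low, uint64_t high)
{
    while (NULL != n) {
        if (n->low <= high && low <= n->high) {
            return n;
        }
        /* If the left subtree's max reaches `low`, searching left is
         * sufficient: if it holds no overlap, neither does the right
         * subtree (the classic interval-search argument). */
        if (NULL != n->left && n->left->max >= low) {
            n = n->left;
        } else {
            n = n->right;
        }
    }
    return NULL;
}
```

In the real tree, red-black balancing bounds this descent by the O(log(n)) depth noted for `opal_interval_tree_depth()`. In this unbalanced sketch, the worst case is linear.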
+ * + * @param[in] tree opal interval tree pointer + * + * @returns true if the tree is a valid interval tree + * @returns false otherwise + */ +OPAL_DECLSPEC bool opal_interval_tree_verify (opal_interval_tree_t *tree); + +/** + * Dump a DOT representation of the interval tree + * + * @param[in] tree opal interval tree pointer + * @param[in] path output file path + * + * This function dumps the tree and includes: color, data value, interval, and sub-tree + * min and max. + */ +OPAL_DECLSPEC int opal_interval_tree_dump (opal_interval_tree_t *tree, const char *path); + +END_C_DECLS +#endif /* OPAL_INTERVAL_TREE_H */ diff --git a/opal/class/opal_lifo.h b/opal/class/opal_lifo.h index 0bf2cd20960..aea21fc97e7 100644 --- a/opal/class/opal_lifo.h +++ b/opal/class/opal_lifo.h @@ -12,10 +12,10 @@ * All rights reserved. * Copyright (c) 2007 Voltaire All rights reserved. * Copyright (c) 2010 IBM Corporation. All rights reserved. - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reseved. - * Copyright (c) 2016 Research Organization for Information Science - * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
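The `opal_lifo.h` changes replace `opal_atomic_cmpset_*` with `opal_atomic_compare_exchange_strong_*`, which take the expected value by pointer. The updated loops suggest that a failed exchange writes the observed value back into the expected value, as C11 `atomic_compare_exchange_strong` does. For example, the code does `item = head;` after a failed exchange, and the load of the head moves out of the `do` loop. The sketch below shows the same retry pattern with standard `<stdatomic.h>` rather than the OPAL wrappers. The names `lifo_t`, `item_t`, `lifo_push` and `lifo_pop` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal LIFO using C11 atomics instead of the
 * opal_atomic_* wrappers. An empty list is NULL here; opal_lifo_t
 * uses a ghost item instead. */
typedef struct item {
    struct item *next;
    int value;
} item_t;

typedef struct {
    _Atomic(item_t *) head;
} lifo_t;

/* Push: read the head once. When the CAS fails, C11 stores the head it
 * observed into `next`, so the loop retries without re-reading it. */
static item_t *lifo_push(lifo_t *lifo, item_t *it)
{
    assert(NULL != it);
    item_t *next = atomic_load(&lifo->head);
    do {
        it->next = next;
    } while (!atomic_compare_exchange_strong(&lifo->head, &next, it));
    return next; /* previous head; NULL means the list was empty */
}

/* Pop with the same retry pattern. Unlike opal_lifo_pop_atomic this has
 * no ABA protection (no counter, no item_free flag), so it is only safe
 * when items are never freed or reused while other threads may pop. */
static item_t *lifo_pop(lifo_t *lifo)
{
    item_t *head = atomic_load(&lifo->head);
    while (NULL != head &&
           !atomic_compare_exchange_strong(&lifo->head, &head, head->next)) {
        /* `head` now holds the current top of the stack; try again */
    }
    return head;
}
```

The counted-pointer variant in the patch pairs the pointer with a counter in one 128-bit exchange so that a recycled item cannot be mistaken for the old head. This sketch deliberately leaves that out.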
* $COPYRIGHT$ * * Additional copyrights may follow @@ -36,8 +36,8 @@ BEGIN_C_DECLS /* NTH: temporarily suppress warnings about this not being defined */ -#if !defined(OPAL_HAVE_ATOMIC_CMPSET_128) -#define OPAL_HAVE_ATOMIC_CMPSET_128 0 +#if !defined(OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128) +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 0 #endif /** @@ -48,9 +48,9 @@ union opal_counted_pointer_t { /** update counter used when cmpset_128 is available */ uint64_t counter; /** list item pointer */ - opal_list_item_t *item; + volatile opal_list_item_t * volatile item; } data; -#if OPAL_HAVE_ATOMIC_CMPSET_128 && HAVE_OPAL_INT128_T +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 && HAVE_OPAL_INT128_T /** used for atomics when there is a cmpset that can operate on * two 64-bit values */ opal_int128_t value; @@ -59,19 +59,19 @@ union opal_counted_pointer_t { typedef union opal_counted_pointer_t opal_counted_pointer_t; -#if OPAL_HAVE_ATOMIC_CMPSET_128 +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 /* Add one element to the FIFO. We will return the last head of the list * to allow the upper level to detect if this element is the first one in the * list (if the list was empty before this operation). */ -static inline bool opal_update_counted_pointer (volatile opal_counted_pointer_t *addr, opal_counted_pointer_t old, +static inline bool opal_update_counted_pointer (volatile opal_counted_pointer_t *addr, opal_counted_pointer_t *old, opal_list_item_t *item) { opal_counted_pointer_t new_p; new_p.data.item = item; - new_p.data.counter = old.data.counter + 1; - return opal_atomic_cmpset_128 (&addr->value, old.value, new_p.value); + new_p.data.counter = old->data.counter + 1; + return opal_atomic_compare_exchange_strong_128 (&addr->value, &old->value, new_p.value); } #endif @@ -110,7 +110,7 @@ static inline bool opal_lifo_is_empty( opal_lifo_t* lifo ) } -#if OPAL_HAVE_ATOMIC_CMPSET_128 +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 /* Add one element to the LIFO. 
We will return the last head of the list * to allow the upper level to detect if this element is the first one in the @@ -119,14 +119,14 @@ static inline bool opal_lifo_is_empty( opal_lifo_t* lifo ) static inline opal_list_item_t *opal_lifo_push_atomic (opal_lifo_t *lifo, opal_list_item_t *item) { - do { - opal_list_item_t *next = (opal_list_item_t *) lifo->opal_lifo_head.data.item; + opal_list_item_t *next = (opal_list_item_t *) lifo->opal_lifo_head.data.item; + do { item->opal_list_next = next; opal_atomic_wmb (); /* to protect against ABA issues it is sufficient to only update the counter in pop */ - if (opal_atomic_cmpset_ptr (&lifo->opal_lifo_head.data.item, next, item)) { + if (opal_atomic_compare_exchange_strong_ptr (&lifo->opal_lifo_head.data.item, &next, item)) { return next; } /* DO some kind of pause to release the bus */ @@ -138,20 +138,20 @@ static inline opal_list_item_t *opal_lifo_push_atomic (opal_lifo_t *lifo, */ static inline opal_list_item_t *opal_lifo_pop_atomic (opal_lifo_t* lifo) { + opal_counted_pointer_t old_head; opal_list_item_t *item; - do { - opal_counted_pointer_t old_head; - - old_head.data.counter = lifo->opal_lifo_head.data.counter; - opal_atomic_rmb (); - item = old_head.data.item = lifo->opal_lifo_head.data.item; + old_head.data.counter = lifo->opal_lifo_head.data.counter; + opal_atomic_rmb (); + old_head.data.item = (opal_list_item_t *) lifo->opal_lifo_head.data.item; + do { + item = (opal_list_item_t *) old_head.data.item; if (item == &lifo->opal_lifo_ghost) { return NULL; } - if (opal_update_counted_pointer (&lifo->opal_lifo_head, old_head, + if (opal_update_counted_pointer (&lifo->opal_lifo_head, &old_head, (opal_list_item_t *) item->opal_list_next)) { opal_atomic_wmb (); item->opal_list_next = NULL; @@ -169,13 +169,15 @@ static inline opal_list_item_t *opal_lifo_pop_atomic (opal_lifo_t* lifo) static inline opal_list_item_t *opal_lifo_push_atomic (opal_lifo_t *lifo, opal_list_item_t *item) { + opal_list_item_t *next = 
(opal_list_item_t *) lifo->opal_lifo_head.data.item; + /* item free acts as a mini lock to avoid ABA problems */ item->item_free = 1; + do { - opal_list_item_t *next = (opal_list_item_t *) lifo->opal_lifo_head.data.item; item->opal_list_next = next; opal_atomic_wmb(); - if (opal_atomic_cmpset_ptr (&lifo->opal_lifo_head.data.item, next, item)) { + if (opal_atomic_compare_exchange_strong_ptr (&lifo->opal_lifo_head.data.item, &next, item)) { opal_atomic_wmb (); /* now safe to pop this item */ item->item_free = 0; @@ -236,8 +238,9 @@ static inline opal_list_item_t *opal_lifo_pop_atomic (opal_lifo_t* lifo) */ static inline opal_list_item_t *opal_lifo_pop_atomic (opal_lifo_t* lifo) { - opal_list_item_t *item; - while ((item = (opal_list_item_t *) lifo->opal_lifo_head.data.item) != &lifo->opal_lifo_ghost) { + opal_list_item_t *item, *head, *ghost = &lifo->opal_lifo_ghost; + + while ((item=(opal_list_item_t *)lifo->opal_lifo_head.data.item) != ghost) { /* ensure it is safe to pop the head */ if (opal_atomic_swap_32((volatile int32_t *) &item->item_free, 1)) { continue; @@ -245,14 +248,16 @@ static inline opal_list_item_t *opal_lifo_pop_atomic (opal_lifo_t* lifo) opal_atomic_wmb (); + head = item; /* try to swap out the head pointer */ - if (opal_atomic_cmpset_ptr (&lifo->opal_lifo_head.data.item, item, - (void *) item->opal_list_next)) { + if (opal_atomic_compare_exchange_strong_ptr (&lifo->opal_lifo_head.data.item, &head, + (void *) item->opal_list_next)) { break; } /* NTH: don't need another atomic here */ item->item_free = 0; + item = head; /* Do some kind of pause to release the bus */ } diff --git a/opal/class/opal_list.c b/opal/class/opal_list.c index e0a5112c38a..87cb1192b1b 100644 --- a/opal/class/opal_list.c +++ b/opal/class/opal_list.c @@ -144,7 +144,7 @@ bool opal_list_insert(opal_list_t *list, opal_list_item_t *item, long long idx) /* Spot check: ensure this item is only on the list that we just insertted it into */ - (void)opal_atomic_add( 
&(item->opal_list_item_refcount), 1 ); + opal_atomic_add ( &(item->opal_list_item_refcount), 1 ); assert(1 == item->opal_list_item_refcount); item->opal_list_item_belong_to = list; #endif diff --git a/opal/class/opal_list.h b/opal/class/opal_list.h index 1e91604ca9f..5edd6730d54 100644 --- a/opal/class/opal_list.h +++ b/opal/class/opal_list.h @@ -103,9 +103,9 @@ struct opal_list_item_t { opal_object_t super; /**< Generic parent class for all Open MPI objects */ - volatile struct opal_list_item_t *opal_list_next; + volatile struct opal_list_item_t * volatile opal_list_next; /**< Pointer to next list item */ - volatile struct opal_list_item_t *opal_list_prev; + volatile struct opal_list_item_t * volatile opal_list_prev; /**< Pointer to previous list item */ int32_t item_free; @@ -509,7 +509,7 @@ static inline opal_list_item_t *opal_list_remove_item #if OPAL_ENABLE_DEBUG /* Spot check: ensure that this item is still only on one list */ - OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), -1 ); + OPAL_THREAD_ADD_FETCH32( &(item->opal_list_item_refcount), -1 ); assert(0 == item->opal_list_item_refcount); item->opal_list_item_belong_to = NULL; #endif @@ -575,7 +575,7 @@ static inline void _opal_list_append(opal_list_t *list, opal_list_item_t *item /* Spot check: ensure this item is only on the list that we just appended it to */ - OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), 1 ); + OPAL_THREAD_ADD_FETCH32( &(item->opal_list_item_refcount), 1 ); assert(1 == item->opal_list_item_refcount); item->opal_list_item_belong_to = list; #endif @@ -625,7 +625,7 @@ static inline void opal_list_prepend(opal_list_t *list, /* Spot check: ensure this item is only on the list that we just prepended it to */ - OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), 1 ); + OPAL_THREAD_ADD_FETCH32( &(item->opal_list_item_refcount), 1 ); assert(1 == item->opal_list_item_refcount); item->opal_list_item_belong_to = list; #endif @@ -686,7 +686,7 @@ static inline opal_list_item_t 
*opal_list_remove_first(opal_list_t *list) /* Spot check: ensure that the item we're returning is now on no lists */ - OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), -1 ); + OPAL_THREAD_ADD_FETCH32( &(item->opal_list_item_refcount), -1 ); assert(0 == item->opal_list_item_refcount); #endif @@ -746,7 +746,7 @@ static inline opal_list_item_t *opal_list_remove_last(opal_list_t *list) /* Spot check: ensure that the item we're returning is now on no lists */ - OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), -1 ); + OPAL_THREAD_ADD_FETCH32( &(item->opal_list_item_refcount), -1 ); assert(0 == item->opal_list_item_refcount); item->opal_list_item_belong_to = NULL; #endif @@ -789,7 +789,7 @@ static inline void opal_list_insert_pos(opal_list_t *list, opal_list_item_t *pos /* Spot check: double check that this item is only on the list that we just added it to */ - OPAL_THREAD_ADD32( &(item->opal_list_item_refcount), 1 ); + OPAL_THREAD_ADD_FETCH32( &(item->opal_list_item_refcount), 1 ); assert(1 == item->opal_list_item_refcount); item->opal_list_item_belong_to = list; #endif diff --git a/opal/class/opal_object.c b/opal/class/opal_object.c index d36ab208b6c..cd09d647f66 100644 --- a/opal/class/opal_object.c +++ b/opal/class/opal_object.c @@ -55,7 +55,7 @@ int opal_class_init_epoch = 1; /* * Local variables */ -static opal_atomic_lock_t class_lock = { { OPAL_ATOMIC_UNLOCKED } }; +static opal_atomic_lock_t class_lock = { { OPAL_ATOMIC_LOCK_UNLOCKED } }; static void** classes = NULL; static int num_classes = 0; static int max_classes = 0; diff --git a/opal/class/opal_object.h b/opal/class/opal_object.h index 8539f2bf872..4e2da95c204 100644 --- a/opal/class/opal_object.h +++ b/opal/class/opal_object.h @@ -510,7 +510,7 @@ static inline opal_object_t *opal_obj_new(opal_class_t * cls) static inline int opal_obj_update(opal_object_t *object, int inc) __opal_attribute_always_inline__; static inline int opal_obj_update(opal_object_t *object, int inc) { - return 
OPAL_THREAD_ADD32(&object->obj_reference_count, inc); + return OPAL_THREAD_ADD_FETCH32(&object->obj_reference_count, inc); } END_C_DECLS diff --git a/opal/class/opal_pointer_array.c b/opal/class/opal_pointer_array.c index 0bbbb5a2277..b28337a616c 100644 --- a/opal/class/opal_pointer_array.c +++ b/opal/class/opal_pointer_array.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -27,11 +27,9 @@ #include "opal/class/opal_pointer_array.h" #include "opal/util/output.h" -enum { TABLE_INIT = 1, TABLE_GROW = 2 }; - static void opal_pointer_array_construct(opal_pointer_array_t *); static void opal_pointer_array_destruct(opal_pointer_array_t *); -static bool grow_table(opal_pointer_array_t *table, int soft, int hard); +static bool grow_table(opal_pointer_array_t *table, int at_least); OBJ_CLASS_INSTANCE(opal_pointer_array_t, opal_object_t, opal_pointer_array_construct, @@ -47,8 +45,9 @@ static void opal_pointer_array_construct(opal_pointer_array_t *array) array->number_free = 0; array->size = 0; array->max_size = INT_MAX; - array->block_size = 0; - array->addr = 0; + array->block_size = 8; + array->free_bits = NULL; + array->addr = NULL; } /* @@ -57,7 +56,11 @@ static void opal_pointer_array_construct(opal_pointer_array_t *array) static void opal_pointer_array_destruct(opal_pointer_array_t *array) { /* free table */ - if( NULL != array->addr) { + if( NULL != array->free_bits) { + free(array->free_bits); + array->free_bits = NULL; + } + if( NULL != array->addr ) { free(array->addr); array->addr = NULL; } @@ -67,6 +70,108 @@ static void opal_pointer_array_destruct(opal_pointer_array_t 
*array) OBJ_DESTRUCT(&array->lock); } +#define TYPE_ELEM_COUNT(TYPE, CAP) (((CAP) + 8 * sizeof(TYPE) - 1) / (8 * sizeof(TYPE))) + +/** + * Translate an index into 2 values: the index of the 64-bit word in the + * free bits array and the position of the bit within that word. + */ +#define GET_BIT_POS(IDX, BIDX, PIDX) \ + do { \ + uint32_t __idx = (uint32_t)(IDX); \ + (BIDX) = (__idx / (8 * sizeof(uint64_t))); \ + (PIDX) = (__idx % (8 * sizeof(uint64_t))); \ + } while(0) + +/** + * A classical find first zero bit (ffz) on a large array. It checks starting + * from the word that contains the indicated position until it finds a zero + * bit. The position of the bit is returned in STORE. + * + * According to Section 6.4.4.1 of the C standard we don't need to append a size + * suffix such as ULL to constants (the type is inferred by the compiler according to + * the number of bits necessary to represent it). + */ +#define FIND_FIRST_ZERO(START_IDX, STORE) \ + do { \ + uint32_t __b_idx, __b_pos; \ + if( 0 == table->number_free ) { \ + (STORE) = table->size; \ + break; \ + } \ + GET_BIT_POS((START_IDX), __b_idx, __b_pos); \ + for (; table->free_bits[__b_idx] == 0xFFFFFFFFFFFFFFFFu; __b_idx++); \ + assert(__b_idx < (uint32_t)table->size); \ + uint64_t __check_value = table->free_bits[__b_idx]; \ + __b_pos = 0; \ + \ + if( 0x00000000FFFFFFFFu == (__check_value & 0x00000000FFFFFFFFu) ) { \ + __check_value >>= 32; __b_pos += 32; \ + } \ + if( 0x000000000000FFFFu == (__check_value & 0x000000000000FFFFu) ) { \ + __check_value >>= 16; __b_pos += 16; \ + } \ + if( 0x00000000000000FFu == (__check_value & 0x00000000000000FFu) ) { \ + __check_value >>= 8; __b_pos += 8; \ + } \ + if( 0x000000000000000Fu == (__check_value & 0x000000000000000Fu) ) { \ + __check_value >>= 4; __b_pos += 4; \ + } \ + if( 0x0000000000000003u == (__check_value & 0x0000000000000003u) ) { \ + __check_value >>= 2; __b_pos += 2; \ + } \ + if( 0x0000000000000001u == (__check_value & 0x0000000000000001u) ) { \
__b_pos += 1; \ + } \ + (STORE) = (__b_idx * 8 * sizeof(uint64_t)) + __b_pos; \ + } while(0) + +/** + * Set the IDX bit in the free_bits array. The bit must not already be set. + */ +#define SET_BIT(IDX) \ + do { \ + uint32_t __b_idx, __b_pos; \ + GET_BIT_POS((IDX), __b_idx, __b_pos); \ + assert( 0 == (table->free_bits[__b_idx] & (((uint64_t)1) << __b_pos))); \ + table->free_bits[__b_idx] |= (((uint64_t)1) << __b_pos); \ + } while(0) + +/** + * Unset the IDX bit in the free_bits array. The bit must already be set. + */ +#define UNSET_BIT(IDX) \ + do { \ + uint32_t __b_idx, __b_pos; \ + GET_BIT_POS((IDX), __b_idx, __b_pos); \ + assert( (table->free_bits[__b_idx] & (((uint64_t)1) << __b_pos))); \ + table->free_bits[__b_idx] ^= (((uint64_t)1) << __b_pos); \ + } while(0) + +#if 0 +/** + * Validate the pointer array by making sure that the elements and + * the free bits array are in sync. It also checks that the number + * of remaining free elements is consistent. + */ +static void opal_pointer_array_validate(opal_pointer_array_t *array) +{ + int i, cnt = 0; + uint32_t b_idx, p_idx; + + for( i = 0; i < array->size; i++ ) { + GET_BIT_POS(i, b_idx, p_idx); + if( NULL == array->addr[i] ) { + cnt++; + assert( 0 == (array->free_bits[b_idx] & (((uint64_t)1) << p_idx)) ); + } else { + assert( 0 != (array->free_bits[b_idx] & (((uint64_t)1) << p_idx)) ); + } + } + assert(cnt == array->number_free); +} +#endif + + /** * initialize an array object */ @@ -82,18 +187,24 @@ int opal_pointer_array_init(opal_pointer_array_t* array, } array->max_size = max_size; - array->block_size = block_size; + array->block_size = (0 == block_size ? 8 : block_size); + array->lowest_free = 0; num_bytes = (0 < initial_allocation ?
initial_allocation : block_size); - array->number_free = num_bytes; - array->size = num_bytes; - num_bytes *= sizeof(void*); /* Allocate and set the array to NULL */ - array->addr = (void **)calloc(num_bytes, 1); + array->addr = (void **)calloc(num_bytes, sizeof(void*)); if (NULL == array->addr) { /* out of memory */ return OPAL_ERR_OUT_OF_RESOURCE; } + array->free_bits = (uint64_t*)calloc(TYPE_ELEM_COUNT(uint64_t, num_bytes), sizeof(uint64_t)); + if (NULL == array->free_bits) { /* out of memory */ + free(array->addr); + array->addr = NULL; + return OPAL_ERR_OUT_OF_RESOURCE; + } + array->number_free = num_bytes; + array->size = num_bytes; return OPAL_SUCCESS; } @@ -108,15 +219,13 @@ int opal_pointer_array_init(opal_pointer_array_t* array, */ int opal_pointer_array_add(opal_pointer_array_t *table, void *ptr) { - int i, index; + int index = table->size + 1; OPAL_THREAD_LOCK(&(table->lock)); if (table->number_free == 0) { /* need to grow table */ - if (!grow_table(table, - (NULL == table->addr ? 
TABLE_INIT : table->size * TABLE_GROW), - INT_MAX)) { + if (!grow_table(table, index) ) { OPAL_THREAD_UNLOCK(&(table->lock)); return OPAL_ERR_OUT_OF_RESOURCE; } @@ -131,21 +240,19 @@ int opal_pointer_array_add(opal_pointer_array_t *table, void *ptr) */ index = table->lowest_free; - assert(table->addr[index] == NULL); + assert(NULL == table->addr[index]); table->addr[index] = ptr; table->number_free--; + SET_BIT(index); if (table->number_free > 0) { - for (i = table->lowest_free + 1; i < table->size; i++) { - if (table->addr[i] == NULL) { - table->lowest_free = i; - break; - } - } - } - else { + FIND_FIRST_ZERO(index, table->lowest_free); + } else { table->lowest_free = table->size; } +#if 0 + opal_pointer_array_validate(table); +#endif OPAL_THREAD_UNLOCK(&(table->lock)); return index; } @@ -174,41 +281,37 @@ int opal_pointer_array_set_item(opal_pointer_array_t *table, int index, OPAL_THREAD_LOCK(&(table->lock)); if (table->size <= index) { - if (!grow_table(table, ((index / TABLE_GROW) + 1) * TABLE_GROW, - index)) { + if (!grow_table(table, index)) { OPAL_THREAD_UNLOCK(&(table->lock)); return OPAL_ERROR; } } - + assert(table->size > index); /* mark element as free, if NULL element */ if( NULL == value ) { - if (index < table->lowest_free) { - table->lowest_free = index; - } if( NULL != table->addr[index] ) { + if (index < table->lowest_free) { + table->lowest_free = index; + } table->number_free++; + UNSET_BIT(index); } } else { if (NULL == table->addr[index]) { table->number_free--; - } - /* Reset lowest_free if required */ - if ( index == table->lowest_free ) { - int i; - - table->lowest_free = table->size; - for ( i=index + 1; i<table->size; i++) { - if ( NULL == table->addr[i] ){ - table->lowest_free = i; - break; - } + SET_BIT(index); + /* Reset lowest_free if required */ + if ( index == table->lowest_free ) { + FIND_FIRST_ZERO(index, table->lowest_free); } + } else { + assert( index != table->lowest_free ); } } table->addr[index] = value; #if 0 +
opal_pointer_array_validate(table); opal_output(0,"opal_pointer_array_set_item: OUT: " " table %p (size %ld, lowest free %ld, number free %ld)" " addr[%d] = %p\n", @@ -259,8 +362,7 @@ bool opal_pointer_array_test_and_set_item (opal_pointer_array_t *table, /* Do we need to grow the table? */ if (table->size <= index) { - if (!grow_table(table, (((index / TABLE_GROW) + 1) * TABLE_GROW), - index)) { + if (!grow_table(table, index)) { OPAL_THREAD_UNLOCK(&(table->lock)); return false; } @@ -269,22 +371,21 @@ bool opal_pointer_array_test_and_set_item (opal_pointer_array_t *table, /* * allow a specific index to be changed. */ + assert(NULL == table->addr[index]); table->addr[index] = value; table->number_free--; + SET_BIT(index); /* Reset lowest_free if required */ - if ( index == table->lowest_free ) { - int i; - - table->lowest_free = table->size; - for ( i=index; i<table->size; i++) { - if ( NULL == table->addr[i] ){ - table->lowest_free = i; - break; - } + if( table->number_free > 0 ) { + if ( index == table->lowest_free ) { + FIND_FIRST_ZERO(index, table->lowest_free); } + } else { + table->lowest_free = table->size; } #if 0 + opal_pointer_array_validate(table); opal_output(0,"opal_pointer_array_test_and_set_item: OUT: " " table %p (size %ld, lowest free %ld, number free %ld)" " addr[%d] = %p\n", @@ -300,7 +401,7 @@ int opal_pointer_array_set_size(opal_pointer_array_t *array, int new_size) { OPAL_THREAD_LOCK(&(array->lock)); if(new_size > array->size) { - if (!grow_table(array, new_size, new_size)) { + if (!grow_table(array, new_size)) { OPAL_THREAD_UNLOCK(&(array->lock)); return OPAL_ERROR; } @@ -309,37 +410,45 @@ int opal_pointer_array_set_size(opal_pointer_array_t *array, int new_size) return OPAL_SUCCESS; } -static bool grow_table(opal_pointer_array_t *table, int soft, int hard) +static bool grow_table(opal_pointer_array_t *table, int at_least) { - int new_size; - int i, new_size_int; + int i, new_size, new_size_int; void *p; - /* new_size = ((table->size + num_needed +
table->block_size - 1) / - table->block_size) * table->block_size; */ - new_size = soft; - if( soft > table->max_size ) { - if( hard > table->max_size ) { + new_size = table->block_size * ((at_least + 1 + table->block_size - 1) / table->block_size); + if( new_size >= table->max_size ) { + new_size = table->max_size; + if( at_least >= table->max_size ) { return false; } - new_size = hard; - } - if( new_size >= table->max_size ) { - return false; } p = (void **) realloc(table->addr, new_size * sizeof(void *)); - if (p == NULL) { + if (NULL == p) { return false; } - new_size_int = (int) new_size; - table->number_free += new_size_int - table->size; + table->number_free += (new_size - table->size); table->addr = (void**)p; - for (i = table->size; i < new_size_int; ++i) { + for (i = table->size; i < new_size; ++i) { table->addr[i] = NULL; } - table->size = new_size_int; - + new_size_int = TYPE_ELEM_COUNT(uint64_t, new_size); + if( (int)(TYPE_ELEM_COUNT(uint64_t, table->size)) != new_size_int ) { + p = (uint64_t*)realloc(table->free_bits, new_size_int * sizeof(uint64_t)); + if (NULL == p) { + return false; + } + table->free_bits = (uint64_t*)p; + for (i = TYPE_ELEM_COUNT(uint64_t, table->size); + i < new_size_int; i++ ) { + table->free_bits[i] = 0; + } + } + table->size = new_size; +#if 0 + opal_output(0, "grow_table %p to %d (max_size %d, block %d, number_free %d)\n", + (void*)table, table->size, table->max_size, table->block_size, table->number_free); +#endif return true; } diff --git a/opal/class/opal_pointer_array.h b/opal/class/opal_pointer_array.h index 87b45b1a337..5900243b043 100644 --- a/opal/class/opal_pointer_array.h +++ b/opal/class/opal_pointer_array.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
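`FIND_FIRST_ZERO` locates the first free slot in two steps. It first skips fully set 64-bit words in `free_bits`, and then it halves the search inside the first word that still has a zero bit. The sketch below reproduces that strategy as plain functions so it can be tested in isolation. The names are hypothetical. The macro relies on `number_free > 0` to stop its word scan, whereas this sketch takes an explicit word count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit words needed for `cap` bits (same arithmetic as
 * TYPE_ELEM_COUNT(uint64_t, cap)). */
static size_t bitmap_words(size_t cap)
{
    return (cap + 63) / 64;
}

/* Index of the lowest zero bit in `w`, which must not be all ones.
 * If the low half of the current window is all ones, the zero bit must
 * be in the high half, so shift it down and remember the offset. */
static unsigned lowest_zero_bit(uint64_t w)
{
    unsigned pos = 0;

    assert(UINT64_MAX != w);
    if ((w & 0xFFFFFFFFu) == 0xFFFFFFFFu) { w >>= 32; pos += 32; }
    if ((w & 0xFFFFu) == 0xFFFFu)         { w >>= 16; pos += 16; }
    if ((w & 0xFFu) == 0xFFu)             { w >>= 8;  pos += 8;  }
    if ((w & 0xFu) == 0xFu)               { w >>= 4;  pos += 4;  }
    if ((w & 0x3u) == 0x3u)               { w >>= 2;  pos += 2;  }
    if ((w & 0x1u) == 0x1u)               { pos += 1; }
    return pos;
}

/* First zero bit in the word holding `start` or any later word;
 * returns -1 if every remaining word is full. Like the macro, it does
 * not skip zero bits below `start` within that first word. */
static long find_first_zero(const uint64_t *bits, size_t nwords, size_t start)
{
    for (size_t i = start / 64; i < nwords; i++) {
        if (UINT64_MAX != bits[i]) {
            return (long)(i * 64 + lowest_zero_bit(bits[i]));
        }
    }
    return -1;
}
```

On GCC and Clang, `__builtin_ctzll(~w)` returns the same index for any word `w` that is not all ones. The explicit halving avoids depending on compiler builtins.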
- * Copyright (c) 2004-2008 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -58,6 +58,8 @@ struct opal_pointer_array_t { int max_size; /** block size for each allocation */ int block_size; + /** pointer to an array of bits to speed up the search for an empty position. */ + uint64_t* free_bits; /** pointer to array of pointers */ void **addr; }; @@ -195,9 +197,12 @@ static inline void opal_pointer_array_remove_all(opal_pointer_array_t *array) OPAL_THREAD_LOCK(&array->lock); array->lowest_free = 0; array->number_free = array->size; - for(i=0; i<array->size; i++) { + for(i = 0; i < array->size; i++) { array->addr[i] = NULL; } + for(i = 0; i < (int)((array->size + 8*sizeof(uint64_t) - 1) / (8*sizeof(uint64_t))); i++) { + array->free_bits[i] = 0; + } OPAL_THREAD_UNLOCK(&array->lock); } diff --git a/opal/class/opal_tree.c b/opal/class/opal_tree.c index fdd41ea20a1..d56813f1dd3 100644 --- a/opal/class/opal_tree.c +++ b/opal/class/opal_tree.c @@ -210,7 +210,7 @@ void opal_tree_add_child(opal_tree_item_t *parent_item, /* Spot check: ensure this item is only on the list that we just appended it to */ - OPAL_THREAD_ADD32( &(new_item->opal_tree_item_refcount), 1 ); + OPAL_THREAD_ADD_FETCH32( &(new_item->opal_tree_item_refcount), 1 ); assert(1 == new_item->opal_tree_item_refcount); new_item->opal_tree_item_belong_to = new_item->opal_tree_container; #endif diff --git a/opal/datatype/Makefile.am b/opal/datatype/Makefile.am index 6002a739f20..daaaa8e4b07 100644 --- a/opal/datatype/Makefile.am +++ b/opal/datatype/Makefile.am @@ -15,6 +15,8 @@ # Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved.
+# Copyright (c) 2018 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -59,6 +61,7 @@ libdatatype_la_SOURCES = \ opal_datatype_fake_stack.c \ opal_datatype_get_count.c \ opal_datatype_module.c \ + opal_datatype_monotonic.c \ opal_datatype_optimize.c \ opal_datatype_pack.c \ opal_datatype_position.c \ diff --git a/opal/datatype/opal_convertor.c b/opal/datatype/opal_convertor.c index 46aff829723..63b4d714084 100644 --- a/opal/datatype/opal_convertor.c +++ b/opal/datatype/opal_convertor.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -12,8 +12,9 @@ * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2013-2016 Research Organization for Information Science + * Copyright (c) 2013-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Intel, Inc. 
All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -43,9 +44,6 @@ CONVERTOR->cbmemcpy( (DST), (SRC), (BLENGTH), (CONVERTOR) ) #endif -extern int opal_convertor_create_stack_with_pos_general( opal_convertor_t* convertor, - int starting_point, const int* sizes ); - static void opal_convertor_construct( opal_convertor_t* convertor ) { convertor->pStack = convertor->static_stack; @@ -226,7 +224,7 @@ int32_t opal_convertor_pack( opal_convertor_t* pConv, if( OPAL_LIKELY(pConv->flags & CONVERTOR_NO_OP) ) { /** * We are doing conversion on a contiguous datatype on a homogeneous - * environment. The convertor contain minimal informations, we only + * environment. The convertor contain minimal information, we only * use the bConverted to manage the conversion. */ uint32_t i; @@ -333,7 +331,7 @@ static inline int opal_convertor_create_stack_with_pos_contig( opal_convertor_t* const opal_datatype_t* pData = pConvertor->pDesc; dt_elem_desc_t* pElems; uint32_t count; - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; pStack = pConvertor->pStack; /** @@ -447,31 +445,55 @@ int32_t opal_convertor_set_position_nocheck( opal_convertor_t* convertor, return rc; } +static size_t +opal_datatype_compute_remote_size( const opal_datatype_t* pData, + const size_t* sizes ) +{ + uint32_t typeMask = pData->bdt_used; + size_t length = 0; + + if (opal_datatype_is_predefined(pData)) { + return sizes[pData->desc.desc->elem.common.type]; + } + + if( OPAL_UNLIKELY(NULL == pData->ptypes) ) { + /* Allocate and fill the array of types used in the datatype description */ + opal_datatype_compute_ptypes( (opal_datatype_t*)pData ); + } + + for( int i = OPAL_DATATYPE_FIRST_TYPE; typeMask && (i < OPAL_DATATYPE_MAX_PREDEFINED); i++ ) { + if( typeMask & ((uint32_t)1 << i) ) { + length += (pData->ptypes[i] * sizes[i]); + typeMask ^= ((uint32_t)1 << i); + } + } + return length; +} /** * Compute the remote size. 
If necessary remove the homogeneous flag * and redirect the convertor description toward the non-optimized * datatype representation. */ -#define OPAL_CONVERTOR_COMPUTE_REMOTE_SIZE(convertor, datatype, bdt_mask) \ -{ \ - if( OPAL_UNLIKELY(0 != (bdt_mask)) ) { \ - opal_convertor_master_t* master; \ - int i; \ - uint32_t mask = datatype->bdt_used; \ - convertor->flags &= (~CONVERTOR_HOMOGENEOUS); \ - master = convertor->master; \ - convertor->remote_size = 0; \ - for( i = OPAL_DATATYPE_FIRST_TYPE; mask && (i < OPAL_DATATYPE_MAX_PREDEFINED); i++ ) { \ - if( mask & ((uint32_t)1 << i) ) { \ - convertor->remote_size += (datatype->btypes[i] * \ - master->remote_sizes[i]); \ - mask ^= ((uint32_t)1 << i); \ - } \ - } \ - convertor->remote_size *= convertor->count; \ - convertor->use_desc = &(datatype->desc); \ - } \ +size_t opal_convertor_compute_remote_size( opal_convertor_t* pConvertor ) +{ + opal_datatype_t* datatype = (opal_datatype_t*)pConvertor->pDesc; + + pConvertor->remote_size = pConvertor->local_size; + if( OPAL_UNLIKELY(datatype->bdt_used & pConvertor->master->hetero_mask) ) { + pConvertor->flags &= (~CONVERTOR_HOMOGENEOUS); + if (!(pConvertor->flags & CONVERTOR_SEND && pConvertor->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS)) { + pConvertor->use_desc = &(datatype->desc); + } + if( 0 == (pConvertor->flags & CONVERTOR_HAS_REMOTE_SIZE) ) { + /* This is for a single datatype, we must update it with the count */ + pConvertor->remote_size = opal_datatype_compute_remote_size(datatype, + pConvertor->master->remote_sizes); + pConvertor->remote_size *= pConvertor->count; + } + } + pConvertor->flags |= CONVERTOR_HAS_REMOTE_SIZE; + return pConvertor->remote_size; } /** @@ -483,29 +505,26 @@ int32_t opal_convertor_set_position_nocheck( opal_convertor_t* convertor, */ #define OPAL_CONVERTOR_PREPARE( convertor, datatype, count, pUserBuf ) \ { \ - uint32_t bdt_mask; \ - \ + convertor->local_size = count * datatype->size; \ + convertor->pBaseBuf = (unsigned char*)pUserBuf; \ + 
convertor->count = count; \ + convertor->pDesc = (opal_datatype_t*)datatype; \ + convertor->bConverted = 0; \ + convertor->use_desc = &(datatype->opt_desc); \ /* If the data is empty we just mark the convertor as \ * completed. With this flag set the pack and unpack functions \ * will not do anything. \ */ \ if( OPAL_UNLIKELY((0 == count) || (0 == datatype->size)) ) { \ - convertor->flags |= OPAL_DATATYPE_FLAG_NO_GAPS | CONVERTOR_COMPLETED; \ + convertor->flags |= (OPAL_DATATYPE_FLAG_NO_GAPS | CONVERTOR_COMPLETED | CONVERTOR_HAS_REMOTE_SIZE); \ convertor->local_size = convertor->remote_size = 0; \ return OPAL_SUCCESS; \ } \ - /* Compute the local in advance */ \ - convertor->local_size = count * datatype->size; \ - convertor->pBaseBuf = (unsigned char*)pUserBuf; \ - convertor->count = count; \ \ /* Grab the datatype part of the flags */ \ convertor->flags &= CONVERTOR_TYPE_MASK; \ convertor->flags |= (CONVERTOR_DATATYPE_MASK & datatype->flags); \ convertor->flags |= (CONVERTOR_NO_OP | CONVERTOR_HOMOGENEOUS); \ - convertor->pDesc = (opal_datatype_t*)datatype; \ - convertor->bConverted = 0; \ - convertor->use_desc = &(datatype->opt_desc); \ \ convertor->remote_size = convertor->local_size; \ if( OPAL_LIKELY(convertor->remoteArch == opal_local_arch) ) { \ @@ -516,9 +535,8 @@ int32_t opal_convertor_set_position_nocheck( opal_convertor_t* convertor, } \ } \ \ - bdt_mask = datatype->bdt_used & convertor->master->hetero_mask; \ - OPAL_CONVERTOR_COMPUTE_REMOTE_SIZE( convertor, datatype, \ - bdt_mask ); \ + assert( (convertor)->pDesc == (datatype) ); \ + opal_convertor_compute_remote_size( convertor ); \ assert( NULL != convertor->use_desc->desc ); \ /* For predefined datatypes (contiguous) do nothing more */ \ /* if checksum is enabled then always continue */ \ @@ -530,7 +548,7 @@ int32_t opal_convertor_set_position_nocheck( opal_convertor_t* convertor, } \ convertor->flags &= ~CONVERTOR_NO_OP; \ { \ - uint32_t required_stack_length = datatype->btypes[OPAL_DATATYPE_LOOP] + 
1; \ + uint32_t required_stack_length = datatype->loops + 1; \ \ if( required_stack_length > convertor->stack_size ) { \ assert(convertor->pStack == convertor->static_stack); \ @@ -552,9 +570,12 @@ int32_t opal_convertor_prepare_for_recv( opal_convertor_t* convertor, convertor->flags |= CONVERTOR_RECV; #if OPAL_CUDA_SUPPORT - mca_cuda_convertor_init(convertor, pUserBuf); + if (!( convertor->flags & CONVERTOR_SKIP_CUDA_INIT )) { + mca_cuda_convertor_init(convertor, pUserBuf); + } #endif + assert(! (convertor->flags & CONVERTOR_SEND)); OPAL_CONVERTOR_PREPARE( convertor, datatype, count, pUserBuf ); if( convertor->flags & CONVERTOR_WITH_CHECKSUM ) { @@ -589,7 +610,9 @@ int32_t opal_convertor_prepare_for_send( opal_convertor_t* convertor, { convertor->flags |= CONVERTOR_SEND; #if OPAL_CUDA_SUPPORT - mca_cuda_convertor_init(convertor, pUserBuf); + if (!( convertor->flags & CONVERTOR_SKIP_CUDA_INIT )) { + mca_cuda_convertor_init(convertor, pUserBuf); + } #endif OPAL_CONVERTOR_PREPARE( convertor, datatype, count, pUserBuf ); @@ -599,7 +622,7 @@ int32_t opal_convertor_prepare_for_send( opal_convertor_t* convertor, convertor->fAdvance = opal_pack_general_checksum; } else { if( datatype->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { - if( ((datatype->ub - datatype->lb) == (OPAL_PTRDIFF_TYPE)datatype->size) + if( ((datatype->ub - datatype->lb) == (ptrdiff_t)datatype->size) || (1 >= convertor->count) ) convertor->fAdvance = opal_pack_homogeneous_contig_checksum; else @@ -613,7 +636,7 @@ int32_t opal_convertor_prepare_for_send( opal_convertor_t* convertor, convertor->fAdvance = opal_pack_general; } else { if( datatype->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { - if( ((datatype->ub - datatype->lb) == (OPAL_PTRDIFF_TYPE)datatype->size) + if( ((datatype->ub - datatype->lb) == (ptrdiff_t)datatype->size) || (1 >= convertor->count) ) convertor->fAdvance = opal_pack_homogeneous_contig; else @@ -714,8 +737,8 @@ void opal_datatype_dump_stack( const dt_stack_t* pStack, int stack_pos, 
opal_output( 0, "%d: pos %d count %d disp %ld ", stack_pos, pStack[stack_pos].index, (int)pStack[stack_pos].count, (long)pStack[stack_pos].disp ); if( pStack->index != -1 ) - opal_output( 0, "\t[desc count %d disp %ld extent %ld]\n", - pDesc[pStack[stack_pos].index].elem.count, + opal_output( 0, "\t[desc count %lu disp %ld extent %ld]\n", + (unsigned long)pDesc[pStack[stack_pos].index].elem.count, (long)pDesc[pStack[stack_pos].index].elem.disp, (long)pDesc[pStack[stack_pos].index].elem.extent ); else diff --git a/opal/datatype/opal_convertor.h b/opal/datatype/opal_convertor.h index 7c5de1af39b..22a2bb1de3f 100644 --- a/opal/datatype/opal_convertor.h +++ b/opal/datatype/opal_convertor.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2014 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -12,6 +12,9 @@ * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2014 NVIDIA Corporation. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Intel, Inc. 
All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -52,6 +55,8 @@ BEGIN_C_DECLS #define CONVERTOR_STATE_ALLOC 0x04000000 #define CONVERTOR_COMPLETED 0x08000000 #define CONVERTOR_CUDA_UNIFIED 0x10000000 +#define CONVERTOR_HAS_REMOTE_SIZE 0x20000000 +#define CONVERTOR_SKIP_CUDA_INIT 0x40000000 union dt_elem_desc; typedef struct opal_convertor_t opal_convertor_t; @@ -70,7 +75,7 @@ struct dt_stack_t { int32_t index; /**< index in the element description */ int16_t type; /**< the type used for the last pack/unpack (original or OPAL_DATATYPE_UINT1) */ size_t count; /**< number of times we still have to do it */ - OPAL_PTRDIFF_TYPE disp; /**< actual displacement depending on the count field */ + ptrdiff_t disp; /**< actual displacement depending on the count field */ }; typedef struct dt_stack_t dt_stack_t; @@ -184,9 +189,16 @@ static inline int32_t opal_convertor_need_buffers( const opal_convertor_t* pConv return 1; } +/** + * Update the size of the remote datatype representation. The size will + * depend on the configuration of the master convertor. In homogeneous + * environments, the local and remote sizes are identical. + */ +size_t +opal_convertor_compute_remote_size( opal_convertor_t* pConv ); -/* - * +/** + * Return the local size of the convertor (count times the size of the datatype). */ static inline void opal_convertor_get_packed_size( const opal_convertor_t* pConv, size_t* pSize ) @@ -195,16 +207,25 @@ static inline void opal_convertor_get_packed_size( const opal_convertor_t* pConv } -/* - * +/** + * Return the remote size of the convertor (count times the remote size of the + * datatype). On homogeneous environments the local and remote sizes are + * identical. */ static inline void opal_convertor_get_unpacked_size( const opal_convertor_t* pConv, size_t* pSize ) { + if( pConv->flags & CONVERTOR_HOMOGENEOUS ) { + *pSize = pConv->local_size; + return; + } + if( 0 == (CONVERTOR_HAS_REMOTE_SIZE & pConv->flags) ) { + assert(! 
(pConv->flags & CONVERTOR_SEND)); + opal_convertor_compute_remote_size( (opal_convertor_t*)pConv); + } *pSize = pConv->remote_size; } - /** * Return the current absolute position of the next pack/unpack. This function is * mostly useful for contiguous datatypes, when we need to get the pointer to the @@ -277,6 +298,7 @@ opal_convertor_raw( opal_convertor_t* convertor, /* [IN/OUT] */ uint32_t* iov_count, /* [IN/OUT] */ size_t* length ); /* [OUT] */ + /* * Upper level does not need to call the _nocheck function directly. */ diff --git a/opal/datatype/opal_convertor_internal.h b/opal/datatype/opal_convertor_internal.h index 8c7f9f05da3..025633cb7e7 100644 --- a/opal/datatype/opal_convertor_internal.h +++ b/opal/datatype/opal_convertor_internal.h @@ -4,7 +4,9 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -21,9 +23,9 @@ BEGIN_C_DECLS typedef int32_t (*conversion_fct_t)( opal_convertor_t* pConvertor, uint32_t count, - const void* from, size_t from_len, OPAL_PTRDIFF_TYPE from_extent, - void* to, size_t to_length, OPAL_PTRDIFF_TYPE to_extent, - OPAL_PTRDIFF_TYPE *advance ); + const void* from, size_t from_len, ptrdiff_t from_extent, + void* to, size_t to_length, ptrdiff_t to_extent, + ptrdiff_t *advance ); typedef struct opal_convertor_master_t { struct opal_convertor_master_t* next; diff --git a/opal/datatype/opal_convertor_raw.c b/opal/datatype/opal_convertor_raw.c index ce0eaf33305..09019388127 100644 --- a/opal/datatype/opal_convertor_raw.c +++ b/opal/datatype/opal_convertor_raw.c @@ -4,7 +4,9 @@ * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -164,7 +166,7 @@ opal_convertor_raw( opal_convertor_t* pConvertor, pos_desc, (long)pStack->disp, (unsigned long)raw_data ); ); } if( OPAL_DATATYPE_LOOP == pElem->elem.common.type ) { - OPAL_PTRDIFF_TYPE local_disp = (OPAL_PTRDIFF_TYPE)source_base; + ptrdiff_t local_disp = (ptrdiff_t)source_base; ddt_endloop_desc_t* end_loop = (ddt_endloop_desc_t*)(pElem + pElem->loop.items); if( pElem->loop.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { @@ -185,7 +187,7 @@ opal_convertor_raw( opal_convertor_t* pConvertor, goto update_loop_description; } } - local_disp = (OPAL_PTRDIFF_TYPE)source_base - local_disp; + local_disp = (ptrdiff_t)source_base - local_disp; PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_LOOP, count_desc, pStack->disp + local_disp); pos_desc++; diff --git a/opal/datatype/opal_copy_functions.c b/opal/datatype/opal_copy_functions.c index 433cf4173e3..221d07a920c 100644 --- a/opal/datatype/opal_copy_functions.c +++ b/opal/datatype/opal_copy_functions.c @@ -4,9 +4,9 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -40,9 +40,9 @@ */ #define COPY_TYPE( TYPENAME, TYPE, COUNT ) \ static int copy_##TYPENAME( opal_convertor_t *pConvertor, uint32_t count, \ - char* from, size_t from_len, OPAL_PTRDIFF_TYPE from_extent, \ - char* to, size_t to_len, OPAL_PTRDIFF_TYPE to_extent, \ - OPAL_PTRDIFF_TYPE *advance) \ + char* from, size_t from_len, ptrdiff_t from_extent, \ + char* to, size_t to_len, ptrdiff_t to_extent, \ + ptrdiff_t *advance) \ { \ uint32_t i; \ size_t remote_TYPE_size = sizeof(TYPE) * (COUNT); /* TODO */ \ @@ -61,8 +61,8 @@ static int copy_##TYPENAME( opal_convertor_t *pConvertor, uint32_t count, DUMP( " copy %s count %d from buffer %p with length %d to %p space %d\n", \ #TYPE, count, from, from_len, to, to_len ); \ \ - if( (from_extent == (OPAL_PTRDIFF_TYPE)local_TYPE_size) && \ - (to_extent == (OPAL_PTRDIFF_TYPE)remote_TYPE_size) ) { \ + if( (from_extent == (ptrdiff_t)local_TYPE_size) && \ + (to_extent == (ptrdiff_t)remote_TYPE_size) ) { \ /* copy of contigous data at both source and destination */ \ MEMCPY( to, from, count * local_TYPE_size ); \ } else { \ @@ -93,9 +93,9 @@ static int copy_##TYPENAME( opal_convertor_t *pConvertor, uint32_t count, */ #define COPY_CONTIGUOUS_BYTES( TYPENAME, COUNT ) \ static int copy_##TYPENAME##_##COUNT( opal_convertor_t *pConvertor, uint32_t count, \ - char* from, size_t from_len, OPAL_PTRDIFF_TYPE from_extent, \ - char* to, size_t to_len, OPAL_PTRDIFF_TYPE to_extent, \ - OPAL_PTRDIFF_TYPE *advance ) \ + char* from, size_t from_len, ptrdiff_t from_extent, \ + char* to, size_t to_len, ptrdiff_t to_extent, \ + ptrdiff_t *advance ) \ { \ uint32_t i; \ size_t remote_TYPE_size = (size_t)(COUNT); /* TODO */ \ @@ -113,8 +113,8 @@ static int copy_##TYPENAME##_##COUNT( opal_convertor_t *pConvertor, uint32_t cou DUMP( " copy %s count %d from buffer %p with length %d to %p space %d\n", \ #TYPENAME, count, from, from_len, to, to_len ); \ \ - if( (from_extent == 
(OPAL_PTRDIFF_TYPE)local_TYPE_size) && \ - (to_extent == (OPAL_PTRDIFF_TYPE)remote_TYPE_size) ) { \ + if( (from_extent == (ptrdiff_t)local_TYPE_size) && \ + (to_extent == (ptrdiff_t)remote_TYPE_size) ) { \ MEMCPY( to, from, count * local_TYPE_size ); \ } else { \ for( i = 0; i < count; i++ ) { \ diff --git a/opal/datatype/opal_copy_functions_heterogeneous.c b/opal/datatype/opal_copy_functions_heterogeneous.c index 956a1d46bcb..a46e87b4dde 100644 --- a/opal/datatype/opal_copy_functions_heterogeneous.c +++ b/opal/datatype/opal_copy_functions_heterogeneous.c @@ -4,7 +4,8 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -15,6 +16,10 @@ #include "opal_config.h" +#ifdef HAVE_IEEE754_H +#include +#endif + #include #include @@ -62,6 +67,68 @@ opal_dt_swap_bytes(void *to_p, const void *from_p, const size_t size, size_t cou } } +#ifdef HAVE_IEEE754_H +struct bit128 { + unsigned int mantissa3:32; + unsigned int mantissa2:32; + unsigned int mantissa1:32; + unsigned int mantissa0:16; + unsigned int exponent:15; + unsigned int negative:1; +}; + +struct bit80 { + unsigned int pad:32; + unsigned int empty:16; + unsigned int negative:1; + unsigned int exponent:15; + unsigned int mantissa0:32; + unsigned int mantissa1:32; +}; + +static inline void +opal_dt_swap_long_double(void *to_p, const void *from_p, const size_t size, size_t count, uint32_t remoteArch) +{ +#ifdef HAVE_IEEE754_H + size_t i; + long double*to = (long double *) to_p; + + if ((opal_local_arch&OPAL_ARCH_LDISINTEL) && !(remoteArch&OPAL_ARCH_LDISINTEL)) { +#ifdef __x86_64 + for (i=0; imantissa0 << 15) & 0x7FFF8000) | ((b->mantissa1 >> 17) & 0x00007FFF); + 
ld.ieee.mantissa1 = ((b->mantissa1 << 15) & 0xFFFF8000) | ((b->mantissa2 << 17) & 0x000007FFF); + ld.ieee.exponent = b->exponent; + ld.ieee.negative = b->negative; + MEMCPY( to, &ld, sizeof(long double)); + } +#endif + } else if (!(opal_local_arch&OPAL_ARCH_LDISINTEL) && (remoteArch&OPAL_ARCH_LDISINTEL)) { +#ifdef __sparcv9 + for (i=0; imantissa0 << 1) | (b->mantissa1 & 0x80000000); + ld.ieee.mantissa1 = (b->mantissa1 << 1) & 0xFFFFFFFE; + ld.ieee.exponent = b->exponent; + ld.ieee.negative = b->negative; + MEMCPY( to, &ld, sizeof(long double)); + } +#endif + } +#else + assert(0); +#endif +} +#else +#define opal_dt_swap_long_double(to_p, from_p, size, count, remoteArch) +#endif + /** * BEWARE: Do not use the following macro with composed types such as * complex. As the swap is done using the entire type sizeof, the @@ -69,11 +136,14 @@ opal_dt_swap_bytes(void *to_p, const void *from_p, const size_t size, size_t cou * COPY_2SAMETYPE_HETEROGENEOUS. */ #define COPY_TYPE_HETEROGENEOUS( TYPENAME, TYPE ) \ + COPY_TYPE_HETEROGENEOUS_INTERNAL( TYPENAME, TYPE, 0 ) + +#define COPY_TYPE_HETEROGENEOUS_INTERNAL( TYPENAME, TYPE, LONG_DOUBLE ) \ static int32_t \ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, \ - const char* from, size_t from_len, OPAL_PTRDIFF_TYPE from_extent, \ - char* to, size_t to_length, OPAL_PTRDIFF_TYPE to_extent, \ - OPAL_PTRDIFF_TYPE *advance) \ + const char* from, size_t from_len, ptrdiff_t from_extent, \ + char* to, size_t to_length, ptrdiff_t to_extent, \ + ptrdiff_t *advance) \ { \ uint32_t i; \ \ @@ -85,15 +155,21 @@ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, (opal_local_arch & OPAL_ARCH_ISBIGENDIAN)) { \ if( (to_extent == from_extent) && (to_extent == sizeof(TYPE)) ) { \ opal_dt_swap_bytes(to, from, sizeof(TYPE), count); \ + if (LONG_DOUBLE) { \ + opal_dt_swap_long_double(to, from, sizeof(TYPE), count, pConvertor->remoteArch);\ + } \ } else { \ for( i = 0; i < count; i++ ) { \ 
opal_dt_swap_bytes(to, from, sizeof(TYPE), 1); \ + if (LONG_DOUBLE) { \ + opal_dt_swap_long_double(to, from, sizeof(TYPE), 1, pConvertor->remoteArch);\ + } \ to += to_extent; \ from += from_extent; \ } \ } \ - } else if ((OPAL_PTRDIFF_TYPE)sizeof(TYPE) == to_extent && \ - (OPAL_PTRDIFF_TYPE)sizeof(TYPE) == from_extent) { \ + } else if ((ptrdiff_t)sizeof(TYPE) == to_extent && \ + (ptrdiff_t)sizeof(TYPE) == from_extent) { \ MEMCPY( to, from, count * sizeof(TYPE) ); \ } else { \ /* source or destination are non-contigous */ \ @@ -108,11 +184,14 @@ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, } #define COPY_2SAMETYPE_HETEROGENEOUS( TYPENAME, TYPE ) \ + COPY_2SAMETYPE_HETEROGENEOUS_INTERNAL( TYPENAME, TYPE, 0) + +#define COPY_2SAMETYPE_HETEROGENEOUS_INTERNAL( TYPENAME, TYPE, LONG_DOUBLE) \ static int32_t \ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, \ - const char* from, size_t from_len, OPAL_PTRDIFF_TYPE from_extent, \ - char* to, size_t to_length, OPAL_PTRDIFF_TYPE to_extent, \ - OPAL_PTRDIFF_TYPE *advance) \ + const char* from, size_t from_len, ptrdiff_t from_extent, \ + char* to, size_t to_length, ptrdiff_t to_extent, \ + ptrdiff_t *advance) \ { \ uint32_t i; \ \ @@ -122,17 +201,23 @@ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, \ if ((pConvertor->remoteArch & OPAL_ARCH_ISBIGENDIAN) != \ (opal_local_arch & OPAL_ARCH_ISBIGENDIAN)) { \ - if( (to_extent == from_extent) && (to_extent == sizeof(TYPE)) ) { \ + if( (to_extent == from_extent) && (to_extent == (2 * sizeof(TYPE))) ) { \ opal_dt_swap_bytes(to, from, sizeof(TYPE), 2 * count); \ + if (LONG_DOUBLE) { \ + opal_dt_swap_long_double(to, from, sizeof(TYPE), 2*count, pConvertor->remoteArch);\ + } \ } else { \ for( i = 0; i < count; i++ ) { \ opal_dt_swap_bytes(to, from, sizeof(TYPE), 2); \ + if (LONG_DOUBLE) { \ + opal_dt_swap_long_double(to, from, sizeof(TYPE), 2, pConvertor->remoteArch);\ + } \ to += to_extent; \ from 
+= from_extent; \ } \ } \ - } else if ((OPAL_PTRDIFF_TYPE)sizeof(TYPE) == to_extent && \ - (OPAL_PTRDIFF_TYPE)sizeof(TYPE) == from_extent) { \ + } else if ((ptrdiff_t)sizeof(TYPE) == to_extent && \ + (ptrdiff_t)sizeof(TYPE) == from_extent) { \ MEMCPY( to, from, count * sizeof(TYPE) ); \ } else { \ /* source or destination are non-contigous */ \ @@ -149,9 +234,9 @@ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, #define COPY_2TYPE_HETEROGENEOUS( TYPENAME, TYPE1, TYPE2 ) \ static int32_t \ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, \ - const char* from, uint32_t from_len, OPAL_PTRDIFF_TYPE from_extent, \ - char* to, uint32_t to_length, OPAL_PTRDIFF_TYPE to_extent, \ - OPAL_PTRDIFF_TYPE *advance) \ + const char* from, uint32_t from_len, ptrdiff_t from_extent, \ + char* to, uint32_t to_length, ptrdiff_t to_extent, \ + ptrdiff_t *advance) \ { \ uint32_t i; \ \ @@ -173,8 +258,8 @@ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, \ to += to_extent; \ from += from_extent; \ } \ - } else if ((OPAL_PTRDIFF_TYPE)(sizeof(TYPE1) + sizeof(TYPE2)) == to_extent && \ - (OPAL_PTRDIFF_TYPE)(sizeof(TYPE1) + sizeof(TYPE2)) == from_extent) { \ + } else if ((ptrdiff_t)(sizeof(TYPE1) + sizeof(TYPE2)) == to_extent && \ + (ptrdiff_t)(sizeof(TYPE1) + sizeof(TYPE2)) == from_extent) { \ /* source and destination are contigous */ \ MEMCPY( to, from, count * (sizeof(TYPE1) + sizeof(TYPE2)) ); \ } else { \ @@ -192,8 +277,8 @@ copy_##TYPENAME##_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, \ static inline void datatype_check(char *type, size_t local_size, size_t remote_size, uint32_t *count, - const char* from, size_t from_len, OPAL_PTRDIFF_TYPE from_extent, - char* to, size_t to_len, OPAL_PTRDIFF_TYPE to_extent) + const char* from, size_t from_len, ptrdiff_t from_extent, + char* to, size_t to_len, ptrdiff_t to_extent) { /* make sure the remote buffer is large enough to hold the data */ if( 
(remote_size * *count) > from_len ) { @@ -219,9 +304,9 @@ datatype_check(char *type, size_t local_size, size_t remote_size, uint32_t *coun } static int32_t copy_cxx_bool_heterogeneous(opal_convertor_t *pConvertor, uint32_t count, - const char* from, uint32_t from_len, OPAL_PTRDIFF_TYPE from_extent, - char* to, uint32_t to_length, OPAL_PTRDIFF_TYPE to_extent, - OPAL_PTRDIFF_TYPE *advance) + const char* from, uint32_t from_len, ptrdiff_t from_extent, + char* to, uint32_t to_length, ptrdiff_t to_extent, + ptrdiff_t *advance) { uint32_t i; @@ -333,7 +418,7 @@ COPY_TYPE_HETEROGENEOUS( float16, float ) #elif SIZEOF_DOUBLE == 16 COPY_TYPE_HETEROGENEOUS( float16, double ) #elif HAVE_LONG_DOUBLE && SIZEOF_LONG_DOUBLE == 16 -COPY_TYPE_HETEROGENEOUS( float16, long double ) +COPY_TYPE_HETEROGENEOUS_INTERNAL( float16, long double, 1) #else /* #error No basic type for copy function for opal_datatype_float16 found */ #define copy_float16_heterogeneous NULL @@ -354,7 +439,7 @@ COPY_2SAMETYPE_HETEROGENEOUS( double_complex, double ) #endif #if HAVE_LONG_DOUBLE__COMPLEX -COPY_2SAMETYPE_HETEROGENEOUS( long_double_complex, long double ) +COPY_2SAMETYPE_HETEROGENEOUS_INTERNAL( long_double_complex, long double, 1) #else /* #error No basic type for copy function for opal_datatype_long_double_complex found */ #define copy_long_double_complex_heterogeneous NULL diff --git a/opal/datatype/opal_datatype.h b/opal/datatype/opal_datatype.h index 34c7b4e1b66..3605660fa1f 100644 --- a/opal/datatype/opal_datatype.h +++ b/opal/datatype/opal_datatype.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2015 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -14,7 +14,7 @@ * reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2017 Research Organization for Information Science + * Copyright (c) 2017-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -53,9 +53,10 @@ BEGIN_C_DECLS #endif /* * No more than this number of _Basic_ datatypes in C/CPP or Fortran - * are supported (in order to not change setup and usage of btypes). + * are supported (in order to not change setup and usage of the predefined + * datatypes). * - * XXX TODO Adapt to whatever the OMPI-layer needs + * BEWARE: This constant should reflect whatever the OMPI-layer needs. */ #define OPAL_DATATYPE_MAX_SUPPORTED 47 @@ -63,7 +64,7 @@ BEGIN_C_DECLS /* flags for the datatypes. */ #define OPAL_DATATYPE_FLAG_UNAVAILABLE 0x0001 /**< datatypes unavailable on the build (OS or compiler dependant) */ #define OPAL_DATATYPE_FLAG_PREDEFINED 0x0002 /**< cannot be removed: initial and predefined datatypes */ -#define OPAL_DATATYPE_FLAG_COMMITTED 0x0004 /**< ready to be used for a send/recv operation */ +#define OPAL_DATATYPE_FLAG_COMMITTED 0x0004 /**< ready to be used for a send/recv operation */ #define OPAL_DATATYPE_FLAG_OVERLAP 0x0008 /**< datatype is unpropper for a recv operation */ #define OPAL_DATATYPE_FLAG_CONTIGUOUS 0x0010 /**< contiguous datatype */ #define OPAL_DATATYPE_FLAG_NO_GAPS 0x0020 /**< no gaps around the datatype, aka OPAL_DATATYPE_FLAG_CONTIGUOUS and extent == size */ @@ -80,7 +81,6 @@ BEGIN_C_DECLS OPAL_DATATYPE_FLAG_DATA | \ OPAL_DATATYPE_FLAG_COMMITTED) - /** * The number of supported entries in the data-type definition and the * associated type. 
@@ -108,13 +108,14 @@ struct opal_datatype_t { uint32_t bdt_used; /**< bitset of which basic datatypes are used in the data description */ size_t size; /**< total size in bytes of the memory used by the data if the data is put on a contiguous buffer */ - OPAL_PTRDIFF_TYPE true_lb; /**< the true lb of the data without user defined lb and ub */ - OPAL_PTRDIFF_TYPE true_ub; /**< the true ub of the data without user defined lb and ub */ - OPAL_PTRDIFF_TYPE lb; /**< lower bound in memory */ - OPAL_PTRDIFF_TYPE ub; /**< upper bound in memory */ + ptrdiff_t true_lb; /**< the true lb of the data without user defined lb and ub */ + ptrdiff_t true_ub; /**< the true ub of the data without user defined lb and ub */ + ptrdiff_t lb; /**< lower bound in memory */ + ptrdiff_t ub; /**< upper bound in memory */ /* --- cacheline 1 boundary (64 bytes) --- */ size_t nbElems; /**< total number of elements inside the datatype */ uint32_t align; /**< data should be aligned to */ + uint32_t loops; /**< number of loops on the internal type stack */ /* Attribute fields */ char name[OPAL_MAX_OBJECT_NAME]; /**< name of the datatype */ @@ -123,11 +124,12 @@ struct opal_datatype_t { dt_type_desc_t opt_desc; /**< short description of the data used when conversion is useless or in the send case (without conversion) */ - uint32_t btypes[OPAL_DATATYPE_MAX_SUPPORTED]; - /**< basic elements count used to compute the size of the - datatype for remote nodes. The length of the array is dependent on - the maximum number of datatypes of all top layers. - Reason being is that Fortran is not at the OPAL layer. */ + size_t *ptypes; /**< array of basic predefined types that facilitate the computing + of the remote size in heterogeneous environments. The length of the + array is dependent on the maximum number of predefined datatypes of + all language interfaces (because Fortran is not known at the OPAL + layer).
This field should never be initialized in homogeneous + environments */ /* --- cacheline 5 boundary (320 bytes) was 32-36 bytes ago --- */ /* size: 352, cachelines: 6, members: 15 */ @@ -184,6 +186,7 @@ OPAL_DECLSPEC opal_datatype_t* opal_datatype_create( int32_t expectedSize ); OPAL_DECLSPEC int32_t opal_datatype_create_desc( opal_datatype_t * datatype, int32_t expectedSize ); OPAL_DECLSPEC int32_t opal_datatype_commit( opal_datatype_t * pData ); OPAL_DECLSPEC int32_t opal_datatype_destroy( opal_datatype_t** ); +OPAL_DECLSPEC int32_t opal_datatype_is_monotonic( opal_datatype_t* type); static inline int32_t opal_datatype_is_committed( const opal_datatype_t* type ) @@ -226,19 +229,19 @@ OPAL_DECLSPEC void opal_datatype_dump( const opal_datatype_t* pData ); /* data creation functions */ OPAL_DECLSPEC int32_t opal_datatype_clone( const opal_datatype_t * src_type, opal_datatype_t * dest_type ); OPAL_DECLSPEC int32_t opal_datatype_create_contiguous( int count, const opal_datatype_t* oldType, opal_datatype_t** newType ); -OPAL_DECLSPEC int32_t opal_datatype_resize( opal_datatype_t* type, OPAL_PTRDIFF_TYPE lb, OPAL_PTRDIFF_TYPE extent ); -OPAL_DECLSPEC int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtAdd, uint32_t count, - OPAL_PTRDIFF_TYPE disp, OPAL_PTRDIFF_TYPE extent ); +OPAL_DECLSPEC int32_t opal_datatype_resize( opal_datatype_t* type, ptrdiff_t lb, ptrdiff_t extent ); +OPAL_DECLSPEC int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtAdd, size_t count, + ptrdiff_t disp, ptrdiff_t extent ); static inline int32_t -opal_datatype_type_lb( const opal_datatype_t* pData, OPAL_PTRDIFF_TYPE* disp ) +opal_datatype_type_lb( const opal_datatype_t* pData, ptrdiff_t* disp ) { *disp = pData->lb; return 0; } static inline int32_t -opal_datatype_type_ub( const opal_datatype_t* pData, OPAL_PTRDIFF_TYPE* disp ) +opal_datatype_type_ub( const opal_datatype_t* pData, ptrdiff_t* disp ) { *disp = pData->ub; return 0; @@ -252,21 
+255,21 @@ opal_datatype_type_size( const opal_datatype_t* pData, size_t *size ) } static inline int32_t -opal_datatype_type_extent( const opal_datatype_t* pData, OPAL_PTRDIFF_TYPE* extent ) +opal_datatype_type_extent( const opal_datatype_t* pData, ptrdiff_t* extent ) { *extent = pData->ub - pData->lb; return 0; } static inline int32_t -opal_datatype_get_extent( const opal_datatype_t* pData, OPAL_PTRDIFF_TYPE* lb, OPAL_PTRDIFF_TYPE* extent) +opal_datatype_get_extent( const opal_datatype_t* pData, ptrdiff_t* lb, ptrdiff_t* extent) { *lb = pData->lb; *extent = pData->ub - pData->lb; return 0; } static inline int32_t -opal_datatype_get_true_extent( const opal_datatype_t* pData, OPAL_PTRDIFF_TYPE* true_lb, OPAL_PTRDIFF_TYPE* true_extent) +opal_datatype_get_true_extent( const opal_datatype_t* pData, ptrdiff_t* true_lb, ptrdiff_t* true_extent) { *true_lb = pData->true_lb; *true_extent = (pData->true_ub - pData->true_lb); @@ -281,6 +284,8 @@ OPAL_DECLSPEC int32_t opal_datatype_copy_content_same_ddt( const opal_datatype_t* pData, int32_t count, char* pDestBuf, char* pSrcBuf ); +OPAL_DECLSPEC int opal_datatype_compute_ptypes( opal_datatype_t* datatype ); + OPAL_DECLSPEC const opal_datatype_t* opal_datatype_match_size( int size, uint16_t datakind, uint16_t datalang ); @@ -297,12 +302,12 @@ opal_datatype_sndrcv( void *sbuf, int32_t scount, const opal_datatype_t* sdtype, OPAL_DECLSPEC int32_t opal_datatype_get_args( const opal_datatype_t* pData, int32_t which, int32_t * ci, int32_t * i, - int32_t * ca, OPAL_PTRDIFF_TYPE* a, + int32_t * ca, ptrdiff_t* a, int32_t * cd, opal_datatype_t** d, int32_t * type); OPAL_DECLSPEC int32_t opal_datatype_set_args( opal_datatype_t* pData, int32_t ci, int32_t ** i, - int32_t ca, OPAL_PTRDIFF_TYPE* a, + int32_t ca, ptrdiff_t* a, int32_t cd, opal_datatype_t** d,int32_t type); OPAL_DECLSPEC int32_t opal_datatype_copy_args( const opal_datatype_t* source_data, @@ -340,16 +345,17 @@ opal_datatype_create_from_packed_description( void** packed_buffer, 
* Returns: the memory span of count repetition of the datatype, and in the gap * argument, the number of bytes of the gap at the beginning. */ -static inline OPAL_PTRDIFF_TYPE +static inline ptrdiff_t opal_datatype_span( const opal_datatype_t* pData, int64_t count, - OPAL_PTRDIFF_TYPE* gap) + ptrdiff_t* gap) { - OPAL_PTRDIFF_TYPE extent = (pData->ub - pData->lb); - OPAL_PTRDIFF_TYPE true_extent = (pData->true_ub - pData->true_lb); if (OPAL_UNLIKELY(0 == pData->size) || (0 == count)) { + *gap = 0; return 0; } *gap = pData->true_lb; + ptrdiff_t extent = (pData->ub - pData->lb); + ptrdiff_t true_extent = (pData->true_ub - pData->true_lb); return true_extent + (count - 1) * extent; } diff --git a/opal/datatype/opal_datatype_add.c b/opal/datatype/opal_datatype_add.c index 890f5503bbd..146ce12afe2 100644 --- a/opal/datatype/opal_datatype_add.c +++ b/opal/datatype/opal_datatype_add.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2016 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -11,7 +11,9 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -34,19 +36,19 @@ #define UNSET_CONTIGUOUS_FLAG( INT_VALUE ) (INT_VALUE) = (INT_VALUE) & (~(OPAL_DATATYPE_FLAG_CONTIGUOUS | OPAL_DATATYPE_FLAG_NO_GAPS)) #if defined(__GNUC__) && !defined(__STDC__) -#define LMAX(A,B) ({ OPAL_PTRDIFF_TYPE _a = (A), _b = (B); (_a < _b ? _b : _a) }) -#define LMIN(A,B) ({ OPAL_PTRDIFF_TYPE _a = (A), _b = (B); (_a < _b ? _a : _b); }) +#define LMAX(A,B) ({ ptrdiff_t _a = (A), _b = (B); (_a < _b ? _b : _a); }) +#define LMIN(A,B) ({ ptrdiff_t _a = (A), _b = (B); (_a < _b ? _a : _b); }) #define IMAX(A,B) ({ int _a = (A), _b = (B); (_a < _b ? _b : _a); }) #else -static inline OPAL_PTRDIFF_TYPE LMAX( OPAL_PTRDIFF_TYPE a, OPAL_PTRDIFF_TYPE b ) { return ( a < b ? b : a ); } -static inline OPAL_PTRDIFF_TYPE LMIN( OPAL_PTRDIFF_TYPE a, OPAL_PTRDIFF_TYPE b ) { return ( a < b ? a : b ); } +static inline ptrdiff_t LMAX( ptrdiff_t a, ptrdiff_t b ) { return ( a < b ? b : a ); } +static inline ptrdiff_t LMIN( ptrdiff_t a, ptrdiff_t b ) { return ( a < b ? a : b ); } static inline int IMAX( int a, int b ) { return ( a < b ? b : a ); } #endif /* __GNU__ */ #define OPAL_DATATYPE_COMPUTE_REQUIRED_ENTRIES( _pdtAdd, _count, _extent, _place_needed) \ { \ if( (_pdtAdd)->flags & OPAL_DATATYPE_FLAG_PREDEFINED ) { /* add a basic datatype */ \ - (_place_needed) = ((_extent) == (OPAL_PTRDIFF_TYPE)(_pdtAdd)->size ? 1 : 3); \ + (_place_needed) = ((_extent) == (ptrdiff_t)(_pdtAdd)->size ? 1 : 3); \ } else { \ (_place_needed) = (_pdtAdd)->desc.used; \ if( (_count) != 1 ) { \ @@ -70,7 +72,7 @@ static inline int IMAX( int a, int b ) { return ( a < b ? b : a ); } _new_lb = (_old_lb) + (_disp); \ _new_ub = (_old_ub) + (_disp); \ } else { \ - OPAL_PTRDIFF_TYPE lower, upper; \ + ptrdiff_t lower, upper; \ upper = (_disp) + (_old_extent) * ((_count) - 1); \ lower = (_disp); \ if( lower < upper ) { \ @@ -101,12 +103,12 @@ static inline int IMAX( int a, int b ) { return ( a < b ? b : a ); } * set to ZERO if it's a empty datatype.
*/ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtAdd, - uint32_t count, OPAL_PTRDIFF_TYPE disp, OPAL_PTRDIFF_TYPE extent ) + size_t count, ptrdiff_t disp, ptrdiff_t extent ) { uint32_t newLength, place_needed = 0, i; short localFlags = 0; /* no specific options yet */ dt_elem_desc_t *pLast, *pLoop = NULL; - OPAL_PTRDIFF_TYPE lb, ub, true_lb, true_ub, epsilon, old_true_ub; + ptrdiff_t lb, ub, true_lb, true_ub, epsilon, old_true_ub; /** * From MPI-3, page 84, lines 18-20: Most datatype constructors have @@ -130,7 +132,7 @@ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtA pdtBase->lb = disp; pdtBase->flags |= OPAL_DATATYPE_FLAG_USER_LB; } - if( (pdtBase->ub - pdtBase->lb) != (OPAL_PTRDIFF_TYPE)pdtBase->size ) { + if( (pdtBase->ub - pdtBase->lb) != (ptrdiff_t)pdtBase->size ) { pdtBase->flags &= ~OPAL_DATATYPE_FLAG_NO_GAPS; } return OPAL_SUCCESS; /* Just ignore the OPAL_DATATYPE_LOOP and OPAL_DATATYPE_END_LOOP */ @@ -142,7 +144,7 @@ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtA pdtBase->ub = disp; pdtBase->flags |= OPAL_DATATYPE_FLAG_USER_UB; } - if( (pdtBase->ub - pdtBase->lb) != (OPAL_PTRDIFF_TYPE)pdtBase->size ) { + if( (pdtBase->ub - pdtBase->lb) != (ptrdiff_t)pdtBase->size ) { pdtBase->flags &= ~OPAL_DATATYPE_FLAG_NO_GAPS; } return OPAL_SUCCESS; /* Just ignore the OPAL_DATATYPE_LOOP and OPAL_DATATYPE_END_LOOP */ @@ -277,25 +279,26 @@ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtA * predefined non contiguous datatypes (like MPI_SHORT_INT). 
*/ if( (pdtAdd->flags & (OPAL_DATATYPE_FLAG_PREDEFINED | OPAL_DATATYPE_FLAG_DATA)) == (OPAL_DATATYPE_FLAG_PREDEFINED | OPAL_DATATYPE_FLAG_DATA) ) { - pdtBase->btypes[pdtAdd->id] += count; + if( NULL != pdtBase->ptypes ) + pdtBase->ptypes[pdtAdd->id] += count; pLast->elem.common.type = pdtAdd->id; pLast->elem.count = count; pLast->elem.disp = disp; pLast->elem.extent = extent; pdtBase->desc.used++; pLast->elem.common.flags = pdtAdd->flags & ~(OPAL_DATATYPE_FLAG_COMMITTED); - if( (extent != (OPAL_PTRDIFF_TYPE)pdtAdd->size) && (count > 1) ) { /* gaps around the datatype */ + if( (extent != (ptrdiff_t)pdtAdd->size) && (count > 1) ) { /* gaps around the datatype */ pLast->elem.common.flags &= ~(OPAL_DATATYPE_FLAG_CONTIGUOUS | OPAL_DATATYPE_FLAG_NO_GAPS); } } else { /* keep trace of the total number of basic datatypes in the datatype definition */ - pdtBase->btypes[OPAL_DATATYPE_LOOP] += pdtAdd->btypes[OPAL_DATATYPE_LOOP]; - pdtBase->btypes[OPAL_DATATYPE_END_LOOP] += pdtAdd->btypes[OPAL_DATATYPE_END_LOOP]; - pdtBase->btypes[OPAL_DATATYPE_LB] |= pdtAdd->btypes[OPAL_DATATYPE_LB]; - pdtBase->btypes[OPAL_DATATYPE_UB] |= pdtAdd->btypes[OPAL_DATATYPE_UB]; - for( i = 4; i < OPAL_DATATYPE_MAX_PREDEFINED; i++ ) - if( pdtAdd->btypes[i] != 0 ) pdtBase->btypes[i] += (count * pdtAdd->btypes[i]); - + pdtBase->loops += pdtAdd->loops; + pdtBase->flags |= (pdtAdd->flags & OPAL_DATATYPE_FLAG_USER_LB); + pdtBase->flags |= (pdtAdd->flags & OPAL_DATATYPE_FLAG_USER_UB); + if( (NULL != pdtBase->ptypes) && (NULL != pdtAdd->ptypes) ) { + for( i = OPAL_DATATYPE_FIRST_TYPE; i < OPAL_DATATYPE_MAX_PREDEFINED; i++ ) + if( pdtAdd->ptypes[i] != 0 ) pdtBase->ptypes[i] += (count * pdtAdd->ptypes[i]); + } if( (1 == pdtAdd->desc.used) && (extent == (pdtAdd->ub - pdtAdd->lb)) && (extent == pdtAdd->desc.desc[0].elem.extent) ){ pLast->elem = pdtAdd->desc.desc[0].elem; @@ -310,7 +313,7 @@ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtA pLoop = pLast; CREATE_LOOP_START( pLast, 
count, pdtAdd->desc.used + 1, extent, (pdtAdd->flags & ~(OPAL_DATATYPE_FLAG_COMMITTED)) ); - pdtBase->btypes[OPAL_DATATYPE_LOOP] += 2; + pdtBase->loops += 2; pdtBase->desc.used += 2; pLast++; } @@ -344,11 +347,11 @@ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pdtA UNSET_CONTIGUOUS_FLAG(pdtBase->flags); if( (localFlags & OPAL_DATATYPE_FLAG_CONTIGUOUS) /* both type were contiguous */ && ((disp + pdtAdd->true_lb) == old_true_ub) /* and there is no gap between them */ - && ( ((OPAL_PTRDIFF_TYPE)pdtAdd->size == extent) /* the size and the extent of the + && ( ((ptrdiff_t)pdtAdd->size == extent) /* the size and the extent of the * added type have to match */ || (count < 2)) ) { /* if the count is bigger than 2 */ SET_CONTIGUOUS_FLAG(pdtBase->flags); - if( (OPAL_PTRDIFF_TYPE)pdtBase->size == (pdtBase->ub - pdtBase->lb) ) + if( (ptrdiff_t)pdtBase->size == (pdtBase->ub - pdtBase->lb) ) SET_NO_GAP_FLAG(pdtBase->flags); } diff --git a/opal/datatype/opal_datatype_clone.c b/opal/datatype/opal_datatype_clone.c index 05f57c88cd8..fa4479982d0 100644 --- a/opal/datatype/opal_datatype_clone.c +++ b/opal/datatype/opal_datatype_clone.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2009 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -61,6 +61,9 @@ int32_t opal_datatype_clone( const opal_datatype_t * src_type, opal_datatype_t * dest_type->opt_desc.used = src_type->opt_desc.used; memcpy( dest_type->opt_desc.desc, src_type->opt_desc.desc, desc_length * sizeof(dt_elem_desc_t) ); } + } else { + assert( NULL == dest_type->opt_desc.desc ); + assert( 0 == dest_type->opt_desc.length ); } } dest_type->id = src_type->id; /* preserve the default id. This allow us to diff --git a/opal/datatype/opal_datatype_copy.c b/opal/datatype/opal_datatype_copy.c index d1027a2d63e..7bf94ef97b9 100644 --- a/opal/datatype/opal_datatype_copy.c +++ b/opal/datatype/opal_datatype_copy.c @@ -12,8 +12,8 @@ * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -99,7 +99,7 @@ static size_t opal_datatype_memop_block_size = 128 * 1024; int32_t opal_datatype_copy_content_same_ddt( const opal_datatype_t* datatype, int32_t count, char* destination_base, char* source_base ) { - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; int32_t (*fct)( const opal_datatype_t*, int32_t, char*, char*); #if OPAL_CUDA_SUPPORT diff --git a/opal/datatype/opal_datatype_copy.h b/opal/datatype/opal_datatype_copy.h index 5557142b1fd..5dcfe2ec5d3 100644 --- a/opal/datatype/opal_datatype_copy.h +++ b/opal/datatype/opal_datatype_copy.h @@ -1,10 +1,10 @@ /* -*- Mode: C; c-basic-offset:4 ; -*- */ /* - * Copyright (c) 2004-2012 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -98,7 +98,7 @@ static inline void _contiguous_loop( const dt_elem_desc_t* ELEM, size_t _copy_loops = (COUNT); uint32_t _i; - if( _loop->extent == (OPAL_PTRDIFF_TYPE)_end_loop->size ) { /* the loop is contiguous */ + if( _loop->extent == (ptrdiff_t)_end_loop->size ) { /* the loop is contiguous */ _copy_loops *= _end_loop->size; OPAL_DATATYPE_SAFEGUARD_POINTER( _source, _copy_loops, (SOURCE_BASE), (DATATYPE), (TOTAL_COUNT) ); @@ -140,13 +140,13 @@ static inline int32_t _copy_content_same_ddt( const opal_datatype_t* datatype, i * do a MEM_OP. */ if( datatype->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { - OPAL_PTRDIFF_TYPE extent = (datatype->ub - datatype->lb); + ptrdiff_t extent = (datatype->ub - datatype->lb); /* Now that we know the datatype is contiguous, we should move the 2 pointers * source and destination to the correct displacement. 
*/ destination += datatype->true_lb; source += datatype->true_lb; - if( (OPAL_PTRDIFF_TYPE)datatype->size == extent ) { /* all contiguous == no gaps around */ + if( (ptrdiff_t)datatype->size == extent ) { /* all contiguous == no gaps around */ size_t total_length = iov_len_local; size_t memop_chunk = opal_datatype_memop_block_size; while( total_length > 0 ) { @@ -179,7 +179,7 @@ static inline int32_t _copy_content_same_ddt( const opal_datatype_t* datatype, i return 0; /* completed */ } - pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->btypes[OPAL_DATATYPE_LOOP] + 1) ); + pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->loops + 1) ); pStack->count = count; pStack->index = -1; pStack->disp = 0; @@ -233,14 +233,14 @@ static inline int32_t _copy_content_same_ddt( const opal_datatype_t* datatype, i (int)pStack->count, stack_pos, pos_desc, (long)pStack->disp, (unsigned long)iov_len_local ); ); } if( OPAL_DATATYPE_LOOP == pElem->elem.common.type ) { - OPAL_PTRDIFF_TYPE local_disp = (OPAL_PTRDIFF_TYPE)source; + ptrdiff_t local_disp = (ptrdiff_t)source; if( pElem->loop.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { _contiguous_loop( pElem, datatype, (unsigned char*)source_base, count, count_desc, source, destination, &iov_len_local ); pos_desc += pElem->loop.items + 1; goto update_loop_description; } - local_disp = (OPAL_PTRDIFF_TYPE)source - local_disp; + local_disp = (ptrdiff_t)source - local_disp; PUSH_STACK( pStack, stack_pos, pos_desc, OPAL_DATATYPE_LOOP, count_desc, pStack->disp + local_disp); pos_desc++; diff --git a/opal/datatype/opal_datatype_create.c b/opal/datatype/opal_datatype_create.c index e64e1f04190..0e6d49b9bd7 100644 --- a/opal/datatype/opal_datatype_create.c +++ b/opal/datatype/opal_datatype_create.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2013 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -30,8 +30,6 @@ static void opal_datatype_construct( opal_datatype_t* pData ) { - int i; - pData->size = 0; pData->flags = OPAL_DATATYPE_FLAG_CONTIGUOUS; pData->id = 0; @@ -53,32 +51,36 @@ static void opal_datatype_construct( opal_datatype_t* pData ) pData->opt_desc.length = 0; pData->opt_desc.used = 0; - for( i = 0; i < OPAL_DATATYPE_MAX_SUPPORTED; i++ ) - pData->btypes[i] = 0; + pData->ptypes = NULL; + pData->loops = 0; } static void opal_datatype_destruct( opal_datatype_t* datatype ) { + /** + * As the default description and the optimized description might point to the + * same data description, we should start by cleaning the optimized description. + */ + if( NULL != datatype->opt_desc.desc ) { + if( datatype->opt_desc.desc != datatype->desc.desc ) + free( datatype->opt_desc.desc ); + datatype->opt_desc.length = 0; + datatype->opt_desc.used = 0; + datatype->opt_desc.desc = NULL; + } if (!opal_datatype_is_predefined(datatype)) { - if( datatype->desc.desc != NULL ) { + if( NULL != datatype->desc.desc ) { free( datatype->desc.desc ); datatype->desc.length = 0; datatype->desc.used = 0; + datatype->desc.desc = NULL; } } - if( datatype->opt_desc.desc != NULL ) { - if( datatype->opt_desc.desc != datatype->desc.desc ) - free( datatype->opt_desc.desc ); - datatype->opt_desc.length = 0; - datatype->opt_desc.used = 0; - datatype->opt_desc.desc = NULL; + /* don't free the ptypes of predefined types (they are not dynamically allocated) */ + if( (NULL != datatype->ptypes) && (datatype->id >= OPAL_DATATYPE_MAX_PREDEFINED) ) { + free(datatype->ptypes); + datatype->ptypes = NULL; } - /** - * As the default description and the optimized description can point to the - * same memory location we should keep
the default location pointer until we - * know what we should do with the optimized description. - */ - datatype->desc.desc = NULL; /* make sure the name is set to empty */ datatype->name[0] = '\0'; diff --git a/opal/datatype/opal_datatype_dump.c b/opal/datatype/opal_datatype_dump.c index 30575674196..d469f8291dc 100644 --- a/opal/datatype/opal_datatype_dump.c +++ b/opal/datatype/opal_datatype_dump.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2009 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -12,6 +12,7 @@ * All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. + * Copyright (c) 2018 Cisco Systems, Inc. 
All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -42,8 +43,14 @@ int opal_datatype_contain_basic_datatypes( const opal_datatype_t* pData, char* p if( pData->flags & OPAL_DATATYPE_FLAG_USER_LB ) index += snprintf( ptr, length - index, "lb " ); if( pData->flags & OPAL_DATATYPE_FLAG_USER_UB ) index += snprintf( ptr + index, length - index, "ub " ); for( i = 0; i < OPAL_DATATYPE_MAX_PREDEFINED; i++ ) { - if( pData->bdt_used & mask ) - index += snprintf( ptr + index, length - index, "%s ", opal_datatype_basicDatatypes[i]->name ); + if( pData->bdt_used & mask ) { + if( NULL == pData->ptypes ) { + index += snprintf( ptr + index, length - index, "%s:* ", opal_datatype_basicDatatypes[i]->name ); + } else { + index += snprintf( ptr + index, length - index, "%s:%" PRIsize_t " ", opal_datatype_basicDatatypes[i]->name, + pData->ptypes[i]); + } + } mask <<= 1; if( length <= (size_t)index ) break; } @@ -115,7 +122,7 @@ void opal_datatype_dump( const opal_datatype_t* pData ) (void*)pData, pData->name, (long)pData->size, (int)pData->align, pData->id, (int)pData->desc.length, (int)pData->desc.used, (long)pData->true_lb, (long)pData->true_ub, (long)(pData->true_ub - pData->true_lb), (long)pData->lb, (long)pData->ub, (long)(pData->ub - pData->lb), - (int)pData->nbElems, (int)pData->btypes[OPAL_DATATYPE_LOOP], (int)pData->flags ); + (int)pData->nbElems, (int)pData->loops, (int)pData->flags ); /* dump the flags */ if( pData->flags == OPAL_DATATYPE_FLAG_PREDEFINED ) index += snprintf( buffer + index, length - index, "predefined " ); diff --git a/opal/datatype/opal_datatype_fake_stack.c b/opal/datatype/opal_datatype_fake_stack.c index 4f72b343672..1cc05fe8860 100644 --- a/opal/datatype/opal_datatype_fake_stack.c +++ b/opal/datatype/opal_datatype_fake_stack.c @@ -3,14 +3,16 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2009 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, + * Copyright (c) 2004-2017 High Performance Computing Center Stuttgart, * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -32,21 +34,8 @@ #include "opal/datatype/opal_datatype_internal.h" -int opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, - size_t starting_point, - const size_t* sizes ); - -static inline size_t -opal_convertor_compute_remote_size( const opal_datatype_t* pData, const size_t* sizes ) -{ - uint32_t i; - size_t length = 0; - - for( i = OPAL_DATATYPE_FIRST_TYPE; i < OPAL_DATATYPE_MAX_PREDEFINED; i++ ) { - length += (pData->btypes[i] * sizes[i]); - } - return length; -} +extern int opal_convertor_create_stack_with_pos_general( opal_convertor_t* convertor, + size_t starting_point, const size_t* sizes ); int opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, size_t starting_point, const size_t* sizes ) @@ -78,7 +67,7 @@ int opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, if( (pConvertor->flags & CONVERTOR_HOMOGENEOUS) && (pData->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS) ) { /* Special case for contiguous datatypes */ int32_t cnt = (int32_t)(starting_point / pData->size); - OPAL_PTRDIFF_TYPE extent = pData->ub - pData->lb; + ptrdiff_t extent = pData->ub - pData->lb; loop_length = GET_FIRST_NON_LOOP( pElems ); pStack[0].disp = pElems[loop_length].elem.disp; @@ -90,7 +79,7 @@ int 
opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, pStack[1].disp = pStack[0].disp; pStack[1].count = pData->size - cnt; - if( (OPAL_PTRDIFF_TYPE)pData->size == extent ) { /* all elements are contiguous */ + if( (ptrdiff_t)pData->size == extent ) { /* all elements are contiguous */ pStack[1].disp += starting_point; } else { /* each is contiguous but there are gaps inbetween */ pStack[1].disp += (pConvertor->count - pStack[0].count) * extent + cnt; @@ -102,7 +91,8 @@ int opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, } /* remove from the main loop all the complete datatypes */ - remote_size = opal_convertor_compute_remote_size( pData, sizes ); + assert (! (pConvertor->flags & CONVERTOR_SEND)); + remote_size = opal_convertor_compute_remote_size( pConvertor ); count = (int32_t)(starting_point / remote_size); resting_place -= (remote_size * count); pStack->count = pConvertor->count - count; @@ -112,7 +102,7 @@ int opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, pStack->disp = count * (pData->ub - pData->lb) + pElems[loop_length].elem.disp; pos_desc = 0; - remoteLength = (size_t*)alloca( sizeof(size_t) * (pConvertor->pDesc->btypes[OPAL_DATATYPE_LOOP] + 1)); + remoteLength = (size_t*)alloca( sizeof(size_t) * (pConvertor->pDesc->loops + 1)); remoteLength[0] = 0; /* initial value set to ZERO */ loop_length = 0; @@ -122,7 +112,7 @@ int opal_convertor_create_stack_with_pos_general( opal_convertor_t* pConvertor, while( pos_desc < (int32_t)pConvertor->use_desc->used ) { if( OPAL_DATATYPE_END_LOOP == pElems->elem.common.type ) { /* end of the current loop */ ddt_endloop_desc_t* end_loop = (ddt_endloop_desc_t*)pElems; - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; if( (loop_length * pStack->count) > resting_place ) { /* We will stop somewhere on this loop. 
To avoid moving inside the loop diff --git a/opal/datatype/opal_datatype_get_count.c b/opal/datatype/opal_datatype_get_count.c index 7b539fbec81..ae085c42704 100644 --- a/opal/datatype/opal_datatype_get_count.c +++ b/opal/datatype/opal_datatype_get_count.c @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; -*- */ /* - * Copyright (c) 2004-2009 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. @@ -39,9 +39,9 @@ ssize_t opal_datatype_get_element_count( const opal_datatype_t* datatype, size_t /* Normally the size should be less or equal to the size of the datatype. * This function does not support a iSize bigger than the size of the datatype. */ - assert( (uint32_t)iSize <= datatype->size ); - DUMP( "dt_count_elements( %p, %d )\n", (void*)datatype, iSize ); - pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->btypes[OPAL_DATATYPE_LOOP] + 2) ); + assert( iSize <= datatype->size ); + DUMP( "dt_count_elements( %p, %lu )\n", (void*)datatype, (unsigned long)iSize ); + pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->loops + 2) ); pStack->count = 1; pStack->index = -1; pStack->disp = 0; @@ -53,14 +53,15 @@ ssize_t opal_datatype_get_element_count( const opal_datatype_t* datatype, size_t if( --(pStack->count) == 0 ) { /* end of loop */ stack_pos--; pStack--; if( stack_pos == -1 ) return nbElems; /* completed */ + pos_desc++; /* advance to the next element after the end loop */ + } else { + pos_desc = pStack->index + 1; /* go back to the beginning of the loop */ } - pos_desc = pStack->index + 1; continue; } if( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ) { - ddt_loop_desc_t* loop = &(pElems[pos_desc].loop); do { - PUSH_STACK( pStack, stack_pos, pos_desc,
OPAL_DATATYPE_LOOP, pElems[pos_desc].loop.loops, 0 ); pos_desc++; } while( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ); /* let's start another loop */ DDT_DUMP_STACK( pStack, stack_pos, pElems, "advance loops" ); @@ -93,9 +94,7 @@ int32_t opal_datatype_set_element_count( const opal_datatype_t* datatype, size_t /** * Handle all complete multiple of the datatype. */ - for( pos_desc = 4; pos_desc < OPAL_DATATYPE_MAX_PREDEFINED; pos_desc++ ) { - local_length += datatype->btypes[pos_desc]; - } + local_length = datatype->nbElems; pos_desc = count / local_length; count = count % local_length; *length = datatype->size * pos_desc; @@ -104,7 +103,7 @@ int32_t opal_datatype_set_element_count( const opal_datatype_t* datatype, size_t } DUMP( "dt_set_element_count( %p, %d )\n", (void*)datatype, count ); - pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->btypes[OPAL_DATATYPE_LOOP] + 2) ); + pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->loops + 2) ); pStack->count = 1; pStack->index = -1; pStack->disp = 0; @@ -116,14 +115,15 @@ int32_t opal_datatype_set_element_count( const opal_datatype_t* datatype, size_t if( --(pStack->count) == 0 ) { /* end of loop */ stack_pos--; pStack--; if( stack_pos == -1 ) return 0; + pos_desc++; /* advance to the next element after the end loop */ + } else { + pos_desc = pStack->index + 1; /* go back to the beginning of the loop */ } - pos_desc = pStack->index + 1; continue; } if( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ) { - ddt_loop_desc_t* loop = &(pElems[pos_desc].loop); do { - PUSH_STACK( pStack, stack_pos, pos_desc, OPAL_DATATYPE_LOOP, loop->loops, 0 ); + PUSH_STACK( pStack, stack_pos, pos_desc, OPAL_DATATYPE_LOOP, pElems[pos_desc].loop.loops, 0 ); pos_desc++; } while( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ); /* let's start another loop */ DDT_DUMP_STACK( pStack, stack_pos, pElems, "advance loops" ); @@ -143,3 +143,58 @@ int32_t opal_datatype_set_element_count( const
opal_datatype_t* datatype, size_t } } +/** + * Compute the array of counts of the predefined datatypes contained in + * the datatype. This array is expensive to build and only sporadically + * needed (in heterogeneous environments or when calling get_element_count), + * so it is computed lazily: the cost is paid at most once per datatype, + * and only if/when the array is needed. + */ +int opal_datatype_compute_ptypes( opal_datatype_t* datatype ) +{ + dt_stack_t* pStack; /* pointer to the position on the stack */ + uint32_t pos_desc; /* current position in the description of the derived datatype */ + ssize_t nbElems = 0, stack_pos = 0; + dt_elem_desc_t* pElems; + + if( NULL != datatype->ptypes ) return 0; + datatype->ptypes = (size_t*)calloc(OPAL_DATATYPE_MAX_SUPPORTED, sizeof(size_t)); + + DUMP( "opal_datatype_compute_ptypes( %p )\n", (void*)datatype ); + pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * (datatype->loops + 2) ); + pStack->count = 1; + pStack->index = -1; + pStack->disp = 0; + pElems = datatype->desc.desc; + pos_desc = 0; + + while( 1 ) { /* loop forever; the exit condition is on the last OPAL_DATATYPE_END_LOOP */ + if( OPAL_DATATYPE_END_LOOP == pElems[pos_desc].elem.common.type ) { /* end of the current loop */ + if( --(pStack->count) == 0 ) { /* end of loop */ + stack_pos--; pStack--; + if( stack_pos == -1 ) return 0; /* completed */ + pos_desc++; /* advance to the next element after the end loop */ + } else { + pos_desc = pStack->index + 1; /* go back to the beginning of the loop */ + } + continue; + } + if( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ) { + do { + PUSH_STACK( pStack, stack_pos, pos_desc, OPAL_DATATYPE_LOOP, pElems[pos_desc].loop.loops, 0 ); + pos_desc++; + } while( OPAL_DATATYPE_LOOP == pElems[pos_desc].elem.common.type ); /* let's start another loop */ + DDT_DUMP_STACK( pStack, stack_pos, pElems, "advance loops" ); + } + while( pElems[pos_desc].elem.common.flags & OPAL_DATATYPE_FLAG_DATA ) { + /* now
here we have a basic datatype */ + datatype->ptypes[pElems[pos_desc].elem.common.type] += pElems[pos_desc].elem.count; + nbElems += pElems[pos_desc].elem.count; + + DUMP( " compute_ptypes-add: type %d count %"PRIsize_t" (total type %"PRIsize_t" total %lld)\n", + pElems[pos_desc].elem.common.type, pElems[pos_desc].elem.count, + datatype->ptypes[pElems[pos_desc].elem.common.type], (long long)nbElems ); + pos_desc++; /* advance to the next data */ + } + } +} diff --git a/opal/datatype/opal_datatype_internal.h b/opal/datatype/opal_datatype_internal.h index 5fdd2c59d96..bc3f8aa7cab 100644 --- a/opal/datatype/opal_datatype_internal.h +++ b/opal/datatype/opal_datatype_internal.h @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2012 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -13,7 +13,9 @@ * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -47,10 +49,9 @@ static inline void DUMP( char* fmt, ...
) va_list list; va_start( list, fmt ); - opal_output( opal_datatype_dfd, fmt, list ); + opal_output_vverbose( 0, opal_datatype_dfd, fmt, list ); va_end( list ); } -# define DUMP printf # endif /* __GNUC__ && !__STDC__ */ # endif /* ACCEPT_C99 */ #else @@ -153,10 +154,10 @@ typedef struct ddt_elem_id_description ddt_elem_id_description; */ struct ddt_elem_desc { ddt_elem_id_description common; /**< basic data description and flags */ - uint32_t count; /**< number of blocks */ uint32_t blocklen; /**< number of elements on each block */ - OPAL_PTRDIFF_TYPE extent; /**< extent of each block (in bytes) */ - OPAL_PTRDIFF_TYPE disp; /**< displacement of the first block */ + size_t count; /**< number of blocks */ + ptrdiff_t extent; /**< extent of each block (in bytes) */ + ptrdiff_t disp; /**< displacement of the first block */ }; typedef struct ddt_elem_desc ddt_elem_desc_t; @@ -170,10 +171,10 @@ typedef struct ddt_elem_desc ddt_elem_desc_t; */ struct ddt_loop_desc { ddt_elem_id_description common; /**< basic data description and flags */ - uint32_t loops; /**< number of elements */ uint32_t items; /**< number of items in the loop */ + uint32_t loops; /**< number of elements */ size_t unused; /**< not used right now */ - OPAL_PTRDIFF_TYPE extent; /**< extent of the whole loop */ + ptrdiff_t extent; /**< extent of the whole loop */ }; typedef struct ddt_loop_desc ddt_loop_desc_t; @@ -182,7 +183,7 @@ struct ddt_endloop_desc { uint32_t items; /**< number of elements */ uint32_t unused; /**< not used right now */ size_t size; /**< real size of the data in the loop */ - OPAL_PTRDIFF_TYPE first_elem_disp; /**< the displacement of the first block in the loop */ + ptrdiff_t first_elem_disp; /**< the displacement of the first block in the loop */ }; typedef struct ddt_endloop_desc ddt_endloop_desc_t; @@ -212,13 +213,20 @@ union dt_elem_desc { (_place)->end_loop.unused = -1; \ } while(0) + +/** + * Create one or more elements depending on the value of _count. 
If the value + * is too large for the type of elem.count then use both elem.count and + * elem.blocklen to create it. If the number is prime then create a second + * element to account for the difference. + */ #define CREATE_ELEM( _place, _type, _flags, _count, _disp, _extent ) \ do { \ (_place)->elem.common.flags = (_flags) | OPAL_DATATYPE_FLAG_DATA; \ (_place)->elem.common.type = (_type); \ - (_place)->elem.count = (_count); \ (_place)->elem.disp = (_disp); \ (_place)->elem.extent = (_extent); \ + (_place)->elem.count = (_count); \ (_place)->elem.blocklen = 1; \ } while(0) /* @@ -236,8 +244,8 @@ struct opal_datatype_t; * OPAL_DATATYPE_INIT_BTYPES_ARRAY_[0-21], then order and naming would _not_ matter.... */ -#define OPAL_DATATYPE_INIT_BTYPES_ARRAY_UNAVAILABLE { 0 } -#define OPAL_DATATYPE_INIT_BTYPES_ARRAY(NAME) { [OPAL_DATATYPE_ ## NAME] = 1 } +#define OPAL_DATATYPE_INIT_PTYPES_ARRAY_UNAVAILABLE NULL +#define OPAL_DATATYPE_INIT_PTYPES_ARRAY(NAME) (size_t[OPAL_DATATYPE_MAX_PREDEFINED]){ [OPAL_DATATYPE_ ## NAME] = 1, [OPAL_DATATYPE_MAX_PREDEFINED-1] = 0 } #define OPAL_DATATYPE_INIT_NAME(NAME) "OPAL_" #NAME @@ -266,7 +274,7 @@ struct opal_datatype_t; .name = OPAL_DATATYPE_INIT_NAME(NAME), \ .desc = OPAL_DATATYPE_INIT_DESC_PREDEFINED(UNAVAILABLE), \ .opt_desc = OPAL_DATATYPE_INIT_DESC_PREDEFINED(UNAVAILABLE), \ - .btypes = OPAL_DATATYPE_INIT_BTYPES_ARRAY_UNAVAILABLE \ + .ptypes = OPAL_DATATYPE_INIT_PTYPES_ARRAY_UNAVAILABLE \ } #define OPAL_DATATYPE_INITIALIZER_UNAVAILABLE( FLAGS ) \ @@ -285,7 +293,7 @@ struct opal_datatype_t; .name = OPAL_DATATYPE_INIT_NAME(EMPTY), \ .desc = OPAL_DATATYPE_INIT_DESC_NULL, \ .opt_desc = OPAL_DATATYPE_INIT_DESC_NULL, \ - .btypes = OPAL_DATATYPE_INIT_BTYPES_ARRAY_UNAVAILABLE \ + .ptypes = OPAL_DATATYPE_INIT_PTYPES_ARRAY_UNAVAILABLE \ } #define OPAL_DATATYPE_INIT_BASIC_TYPE( TYPE, NAME, FLAGS ) \ @@ -301,7 +309,7 @@ struct opal_datatype_t; .name = OPAL_DATATYPE_INIT_NAME(NAME), \ .desc = OPAL_DATATYPE_INIT_DESC_NULL, \ .opt_desc = 
OPAL_DATATYPE_INIT_DESC_NULL, \ - .btypes = OPAL_DATATYPE_INIT_BTYPES_ARRAY(NAME) \ + .ptypes = OPAL_DATATYPE_INIT_PTYPES_ARRAY_UNAVAILABLE \ } #define OPAL_DATATYPE_INIT_BASIC_DATATYPE( TYPE, ALIGN, NAME, FLAGS ) \ @@ -317,11 +325,11 @@ struct opal_datatype_t; .name = OPAL_DATATYPE_INIT_NAME(NAME), \ .desc = OPAL_DATATYPE_INIT_DESC_PREDEFINED(NAME), \ .opt_desc = OPAL_DATATYPE_INIT_DESC_PREDEFINED(NAME), \ - .btypes = OPAL_DATATYPE_INIT_BTYPES_ARRAY(NAME) \ + .ptypes = OPAL_DATATYPE_INIT_PTYPES_ARRAY_UNAVAILABLE \ } -#define OPAL_DATATYPE_INITIALIZER_LOOP(FLAGS) OPAL_DATATYPE_INIT_BASIC_TYPE( OPAL_DATATYPE_LOOP, LOOP, FLAGS ) -#define OPAL_DATATYPE_INITIALIZER_END_LOOP(FLAGS) OPAL_DATATYPE_INIT_BASIC_TYPE( OPAL_DATATYPE_END_LOOP, END_LOOP, FLAGS ) +#define OPAL_DATATYPE_INITIALIZER_LOOP(FLAGS) OPAL_DATATYPE_INIT_BASIC_TYPE( OPAL_DATATYPE_LOOP, LOOP_S, FLAGS ) +#define OPAL_DATATYPE_INITIALIZER_END_LOOP(FLAGS) OPAL_DATATYPE_INIT_BASIC_TYPE( OPAL_DATATYPE_END_LOOP, LOOP_E, FLAGS ) #define OPAL_DATATYPE_INITIALIZER_LB(FLAGS) OPAL_DATATYPE_INIT_BASIC_TYPE( OPAL_DATATYPE_LB, LB, FLAGS ) #define OPAL_DATATYPE_INITIALIZER_UB(FLAGS) OPAL_DATATYPE_INIT_BASIC_TYPE( OPAL_DATATYPE_UB, UB, FLAGS ) #define OPAL_DATATYPE_INITIALIZER_INT1(FLAGS) OPAL_DATATYPE_INIT_BASIC_DATATYPE( int8_t, OPAL_ALIGNMENT_INT8, INT1, FLAGS ) @@ -474,7 +482,10 @@ static inline int GET_FIRST_NON_LOOP( const union dt_elem_desc* _pElem ) #define UPDATE_INTERNAL_COUNTERS( DESCRIPTION, POSITION, ELEMENT, COUNTER ) \ do { \ (ELEMENT) = &((DESCRIPTION)[(POSITION)]); \ - (COUNTER) = (ELEMENT)->elem.count; \ + if( OPAL_DATATYPE_LOOP == (ELEMENT)->elem.common.type ) \ + (COUNTER) = (ELEMENT)->loop.loops; \ + else \ + (COUNTER) = (ELEMENT)->elem.count; \ } while (0) OPAL_DECLSPEC int opal_datatype_contain_basic_datatypes( const struct opal_datatype_t* pData, char* ptr, size_t length ); diff --git a/opal/datatype/opal_datatype_module.c b/opal/datatype/opal_datatype_module.c index 7de8fae5b08..2d8dedc94e7 100644 
--- a/opal/datatype/opal_datatype_module.c +++ b/opal/datatype/opal_datatype_module.c @@ -3,14 +3,14 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2013 The University of Tennessee and The University + * Copyright (c) 2004-2018 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2007-2018 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights @@ -29,6 +29,7 @@ #include #include "opal/util/arch.h" +#include "opal/util/output.h" #include "opal/datatype/opal_datatype_internal.h" #include "opal/datatype/opal_datatype.h" #include "opal/datatype/opal_convertor_internal.h" @@ -40,6 +41,7 @@ bool opal_unpack_debug = false; bool opal_pack_debug = false; bool opal_position_debug = false; bool opal_copy_debug = false; +int opal_ddt_verbose = -1; /* Verbosity of the datatype's private output stream (opened only when > 0) */ extern int opal_cuda_verbose; @@ -177,6 +179,14 @@ int opal_datatype_register_params(void) return ret; } + ret = mca_base_var_register ("opal", "opal", NULL, "ddt_verbose", + "Set level of opal datatype verbosity", + MCA_BASE_VAR_TYPE_INT, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE, + OPAL_INFO_LVL_8, MCA_BASE_VAR_SCOPE_LOCAL, + &opal_ddt_verbose); + if (0 > ret) { + return ret; + } #if OPAL_CUDA_SUPPORT /* Set different levels of verbosity in the cuda related code. 
*/ ret = mca_base_var_register ("opal", "opal", NULL, "cuda_verbose", @@ -226,6 +236,12 @@ int32_t opal_datatype_init( void ) datatype->desc.desc[1].end_loop.size = datatype->size; } + /* Enable a private output stream for datatype */ + if( opal_ddt_verbose > 0 ) { + opal_datatype_dfd = opal_output_open(NULL); + opal_output_set_verbosity(opal_datatype_dfd, opal_ddt_verbose); + } + return OPAL_SUCCESS; } diff --git a/opal/datatype/opal_datatype_monotonic.c b/opal/datatype/opal_datatype_monotonic.c new file mode 100644 index 00000000000..b467d95ecbe --- /dev/null +++ b/opal/datatype/opal_datatype_monotonic.c @@ -0,0 +1,57 @@ +/* -*- Mode: C; c-basic-offset:4 ; -*- */ +/* + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "opal_config.h" + +#include + +#include "opal/constants.h" +#include "opal/datatype/opal_datatype.h" +#include "opal/datatype/opal_datatype_internal.h" +#include "opal/datatype/opal_convertor.h" + +int32_t opal_datatype_is_monotonic(opal_datatype_t* type ) +{ + opal_convertor_t *pConv; + uint32_t iov_count; + struct iovec iov[5]; + size_t max_data = 0; + long prev = -1; + int rc; + bool monotonic = true; + + pConv = opal_convertor_create( opal_local_arch, 0 ); + if (OPAL_UNLIKELY(NULL == pConv)) { + return 0; + } + rc = opal_convertor_prepare_for_send( pConv, type, 1, NULL ); + if( OPAL_UNLIKELY(OPAL_SUCCESS != rc)) { + OBJ_RELEASE(pConv); + return 0; + } + + do { + iov_count = 5; + rc = opal_convertor_raw( pConv, iov, &iov_count, &max_data); + for (uint32_t i=0; ibtypes[OPAL_DATATYPE_LOOP]+2) ); + pOrigStack = pStack = (dt_stack_t*)malloc( sizeof(dt_stack_t) * (pData->loops+2) ); SAVE_STACK( pStack, -1, 0, count, 0 ); pTypeDesc->length = 2 * pData->desc.used + 1 /* for the fake OPAL_DATATYPE_END_LOOP at the end */; @@ -85,7 +86,7 @@ opal_datatype_optimize_short( opal_datatype_t* pData, 
pElemDesc++; nbElems++; if( --stack_pos >= 0 ) { /* still something to do ? */ ddt_loop_desc_t* pStartLoop = &(pTypeDesc->desc[pStack->index - 1].loop); - pStartLoop->items = (pElemDesc - 1)->elem.count; + pStartLoop->items = end_loop->items; total_disp = pStack->disp; /* update the displacement position */ } pStack--; /* go down one position on the stack */ @@ -96,13 +97,13 @@ opal_datatype_optimize_short( opal_datatype_t* pData, ddt_loop_desc_t* loop = (ddt_loop_desc_t*)&(pData->desc.desc[pos_desc]); ddt_endloop_desc_t* end_loop = (ddt_endloop_desc_t*)&(pData->desc.desc[pos_desc + loop->items]); int index = GET_FIRST_NON_LOOP( &(pData->desc.desc[pos_desc]) ); - OPAL_PTRDIFF_TYPE loop_disp = pData->desc.desc[pos_desc + index].elem.disp; + ptrdiff_t loop_disp = pData->desc.desc[pos_desc + index].elem.disp; - continuity = ((last_disp + last_length * (OPAL_PTRDIFF_TYPE)opal_datatype_basicDatatypes[last_type]->size) - == (total_disp + loop_disp)); + continuity = ((last_disp + (ptrdiff_t)last_length * (ptrdiff_t)opal_datatype_basicDatatypes[last_type]->size) + == (total_disp + loop_disp)); if( loop->common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { /* the loop is contiguous or composed by contiguous elements with a gap */ - if( loop->extent == (OPAL_PTRDIFF_TYPE)end_loop->size ) { + if( loop->extent == (ptrdiff_t)end_loop->size ) { /* the whole loop is contiguous */ if( !continuity ) { if( 0 != last_length ) { @@ -119,7 +120,7 @@ opal_datatype_optimize_short( opal_datatype_t* pData, last_extent = 1; } else { int counter = loop->loops; - OPAL_PTRDIFF_TYPE merged_disp = 0; + ptrdiff_t merged_disp = 0; /* if the previous data is contiguous with this piece and it has a length not ZERO */ if( last_length != 0 ) { if( continuity ) { @@ -175,14 +176,14 @@ opal_datatype_optimize_short( opal_datatype_t* pData, } if( 2 == loop->items ) { /* small loop */ if( (1 == elem->count) - && (elem->extent == (OPAL_PTRDIFF_TYPE)opal_datatype_basicDatatypes[elem->common.type]->size) ) { + && 
(elem->extent == (ptrdiff_t)opal_datatype_basicDatatypes[elem->common.type]->size) ) { CREATE_ELEM( pElemDesc, elem->common.type, elem->common.flags & ~OPAL_DATATYPE_FLAG_CONTIGUOUS, loop->loops, elem->disp, loop->extent ); pElemDesc++; nbElems++; pos_desc += loop->items + 1; goto complete_loop; } else if( loop->loops < 3 ) { - OPAL_PTRDIFF_TYPE elem_displ = elem->disp; + ptrdiff_t elem_displ = elem->disp; for( i = 0; i < loop->loops; i++ ) { CREATE_ELEM( pElemDesc, elem->common.type, elem->common.flags, elem->count, elem_displ, elem->extent ); @@ -206,7 +207,7 @@ opal_datatype_optimize_short( opal_datatype_t* pData, while( pData->desc.desc[pos_desc].elem.common.flags & OPAL_DATATYPE_FLAG_DATA ) { /* keep doing it until we reach a non datatype element */ /* now here we have a basic datatype */ type = pData->desc.desc[pos_desc].elem.common.type; - continuity = ((last_disp + last_length * (OPAL_PTRDIFF_TYPE)opal_datatype_basicDatatypes[last_type]->size) + continuity = ((last_disp + (ptrdiff_t)last_length * (ptrdiff_t)opal_datatype_basicDatatypes[last_type]->size) == (total_disp + pData->desc.desc[pos_desc].elem.disp)); if( (pData->desc.desc[pos_desc].elem.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS) && continuity && @@ -254,7 +255,7 @@ opal_datatype_optimize_short( opal_datatype_t* pData, int32_t opal_datatype_commit( opal_datatype_t * pData ) { ddt_endloop_desc_t* pLast = &(pData->desc.desc[pData->desc.used].end_loop); - OPAL_PTRDIFF_TYPE first_elem_disp = 0; + ptrdiff_t first_elem_disp = 0; if( pData->flags & OPAL_DATATYPE_FLAG_COMMITTED ) return OPAL_SUCCESS; pData->flags |= OPAL_DATATYPE_FLAG_COMMITTED; diff --git a/opal/datatype/opal_datatype_pack.c b/opal/datatype/opal_datatype_pack.c index 08ae1ecf7ac..9af53f4dd58 100644 --- a/opal/datatype/opal_datatype_pack.c +++ b/opal/datatype/opal_datatype_pack.c @@ -11,7 +11,9 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. 
All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -66,7 +68,7 @@ opal_pack_homogeneous_contig_function( opal_convertor_t* pConv, unsigned char *source_base = NULL; uint32_t iov_count; size_t length = pConv->local_size - pConv->bConverted, initial_amount = pConv->bConverted; - OPAL_PTRDIFF_TYPE initial_displ = pConv->use_desc->desc[pConv->use_desc->used].end_loop.first_elem_disp; + ptrdiff_t initial_displ = pConv->use_desc->desc[pConv->use_desc->used].end_loop.first_elem_disp; source_base = (pConv->pBaseBuf + initial_displ + pStack[0].disp + pStack[1].disp); @@ -114,10 +116,10 @@ opal_pack_homogeneous_contig_with_gaps_function( opal_convertor_t* pConv, unsigned char *user_memory, *packed_buffer; uint32_t i, index, iov_count; size_t bConverted, remaining, length, initial_bytes_converted = pConv->bConverted; - OPAL_PTRDIFF_TYPE extent= pData->ub - pData->lb; - OPAL_PTRDIFF_TYPE initial_displ = pConv->use_desc->desc[pConv->use_desc->used].end_loop.first_elem_disp; + ptrdiff_t extent= pData->ub - pData->lb; + ptrdiff_t initial_displ = pConv->use_desc->desc[pConv->use_desc->used].end_loop.first_elem_disp; - assert( (pData->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS) && ((OPAL_PTRDIFF_TYPE)pData->size != extent) ); + assert( (pData->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS) && ((ptrdiff_t)pData->size != extent) ); DO_DEBUG( opal_output( 0, "pack_homogeneous_contig( pBaseBuf %p, iov_count %d )\n", (void*)pConv->pBaseBuf, *out_size ); ); if( stack[1].type != opal_datatype_uint1.id ) { @@ -354,7 +356,7 @@ opal_generic_simple_pack_function( opal_convertor_t* pConvertor, count_desc, (long)pStack->disp, (unsigned long)iov_len_local ); ); } if( OPAL_DATATYPE_LOOP == pElem->elem.common.type ) { - OPAL_PTRDIFF_TYPE 
local_disp = (OPAL_PTRDIFF_TYPE)conv_ptr; + ptrdiff_t local_disp = (ptrdiff_t)conv_ptr; if( pElem->loop.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { PACK_CONTIGUOUS_LOOP( pConvertor, pElem, count_desc, conv_ptr, iov_ptr, iov_len_local ); @@ -364,7 +366,7 @@ opal_generic_simple_pack_function( opal_convertor_t* pConvertor, } /* Save the stack with the correct last_count value. */ } - local_disp = (OPAL_PTRDIFF_TYPE)conv_ptr - local_disp; + local_disp = (ptrdiff_t)conv_ptr - local_disp; PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_LOOP, count_desc, pStack->disp + local_disp); pos_desc++; @@ -417,7 +419,7 @@ pack_predefined_heterogeneous( opal_convertor_t* CONVERTOR, const opal_convertor_master_t* master = (CONVERTOR)->master; const ddt_elem_desc_t* _elem = &((ELEM)->elem); unsigned char* _source = (*SOURCE) + _elem->disp; - OPAL_PTRDIFF_TYPE advance; + ptrdiff_t advance; uint32_t _count = *(COUNT); size_t _r_blength; @@ -430,8 +432,8 @@ pack_predefined_heterogeneous( opal_convertor_t* CONVERTOR, OPAL_DATATYPE_SAFEGUARD_POINTER( _source, (_count * _elem->extent), (CONVERTOR)->pBaseBuf, (CONVERTOR)->pDesc, (CONVERTOR)->count ); DO_DEBUG( opal_output( 0, "pack [l %s r %s] memcpy( %p, %p, %lu ) => space %lu\n", - ((OPAL_PTRDIFF_TYPE)(opal_datatype_basicDatatypes[_elem->common.type]->size) == _elem->extent) ? "cont" : "----", - ((OPAL_PTRDIFF_TYPE)_r_blength == _elem->extent) ? "cont" : "----", + ((ptrdiff_t)(opal_datatype_basicDatatypes[_elem->common.type]->size) == _elem->extent) ? "cont" : "----", + ((ptrdiff_t)_r_blength == _elem->extent) ? 
"cont" : "----", (void*)*(DESTINATION), (void*)_source, (unsigned long)_r_blength, (unsigned long)(*(SPACE)) ); ); master->pFunctions[_elem->common.type]( CONVERTOR, _count, @@ -542,7 +544,7 @@ opal_pack_general_function( opal_convertor_t* pConvertor, count_desc, (long)pStack->disp, (unsigned long)iov_len_local ); ); } if( OPAL_DATATYPE_LOOP == pElem->elem.common.type ) { - OPAL_PTRDIFF_TYPE local_disp = (OPAL_PTRDIFF_TYPE)conv_ptr; + ptrdiff_t local_disp = (ptrdiff_t)conv_ptr; #if 0 if( pElem->loop.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { PACK_CONTIGUOUS_LOOP( pConvertor, pElem, count_desc, @@ -554,7 +556,7 @@ opal_pack_general_function( opal_convertor_t* pConvertor, /* Save the stack with the correct last_count value. */ } #endif /* in a heterogeneous environment we can't handle the contiguous loops */ - local_disp = (OPAL_PTRDIFF_TYPE)conv_ptr - local_disp; + local_disp = (ptrdiff_t)conv_ptr - local_disp; PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_LOOP, count_desc, pStack->disp + local_disp); pos_desc++; diff --git a/opal/datatype/opal_datatype_pack.h b/opal/datatype/opal_datatype_pack.h index 541a4fbe24d..2176e53e897 100644 --- a/opal/datatype/opal_datatype_pack.h +++ b/opal/datatype/opal_datatype_pack.h @@ -5,6 +5,8 @@ * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -44,7 +46,7 @@ static inline void pack_predefined_data( opal_convertor_t* CONVERTOR, if( 0 == _copy_count ) return; /* nothing to do */ } - if( (OPAL_PTRDIFF_TYPE)_copy_blength == _elem->extent ) { + if( (ptrdiff_t)_copy_blength == _elem->extent ) { _copy_blength *= _copy_count; /* the extent and the size of the basic datatype are equal */ OPAL_DATATYPE_SAFEGUARD_POINTER( _source, _copy_blength, (CONVERTOR)->pBaseBuf, diff --git a/opal/datatype/opal_datatype_position.c b/opal/datatype/opal_datatype_position.c index c710a4ae3e2..a4a088ffbdb 100644 --- a/opal/datatype/opal_datatype_position.c +++ b/opal/datatype/opal_datatype_position.c @@ -11,8 +11,8 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -121,7 +121,7 @@ int opal_convertor_generic_simple_position( opal_convertor_t* pConvertor, dt_elem_desc_t* pElem; /* current position */ unsigned char *base_pointer = pConvertor->pBaseBuf; size_t iov_len_local; - OPAL_PTRDIFF_TYPE extent = pConvertor->pDesc->ub - pConvertor->pDesc->lb; + ptrdiff_t extent = pConvertor->pDesc->ub - pConvertor->pDesc->lb; DUMP( "opal_convertor_generic_simple_position( %p, &%ld )\n", (void*)pConvertor, (long)*position ); assert(*position > pConvertor->bConverted); @@ -207,7 +207,7 @@ int opal_convertor_generic_simple_position( opal_convertor_t* pConvertor, (unsigned long long)pStack->disp, (unsigned long)iov_len_local ); ); } if( OPAL_DATATYPE_LOOP == pElem->elem.common.type ) { - OPAL_PTRDIFF_TYPE local_disp = (OPAL_PTRDIFF_TYPE)base_pointer; + ptrdiff_t local_disp = (ptrdiff_t)base_pointer; if( pElem->loop.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { POSITION_CONTIGUOUS_LOOP( pConvertor, pElem, count_desc, base_pointer, iov_len_local ); @@ -217,7 +217,7 @@ int opal_convertor_generic_simple_position( opal_convertor_t* pConvertor, } /* Save the stack with the correct last_count value. */ } - local_disp = (OPAL_PTRDIFF_TYPE)base_pointer - local_disp; + local_disp = (ptrdiff_t)base_pointer - local_disp; PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_LOOP, count_desc, pStack->disp + local_disp ); pos_desc++; diff --git a/opal/datatype/opal_datatype_resize.c b/opal/datatype/opal_datatype_resize.c index b239c675b02..62147645fc5 100644 --- a/opal/datatype/opal_datatype_resize.c +++ b/opal/datatype/opal_datatype_resize.c @@ -4,7 +4,7 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -18,13 +18,13 @@ #include "opal/datatype/opal_datatype.h" #include "opal/datatype/opal_datatype_internal.h" -int32_t opal_datatype_resize( opal_datatype_t* type, OPAL_PTRDIFF_TYPE lb, OPAL_PTRDIFF_TYPE extent ) +int32_t opal_datatype_resize( opal_datatype_t* type, ptrdiff_t lb, ptrdiff_t extent ) { type->lb = lb; type->ub = lb + extent; type->flags &= ~OPAL_DATATYPE_FLAG_NO_GAPS; - if( (extent == (OPAL_PTRDIFF_TYPE)type->size) && + if( (extent == (ptrdiff_t)type->size) && (type->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS) ) { type->flags |= OPAL_DATATYPE_FLAG_NO_GAPS; } diff --git a/opal/datatype/opal_datatype_unpack.c b/opal/datatype/opal_datatype_unpack.c index 195bca48f1e..b43a5c8f83e 100644 --- a/opal/datatype/opal_datatype_unpack.c +++ b/opal/datatype/opal_datatype_unpack.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2014 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -12,7 +12,9 @@ * All rights reserved. * Copyright (c) 2008-2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -71,8 +73,8 @@ opal_unpack_homogeneous_contig_function( opal_convertor_t* pConv, uint32_t iov_count, i; size_t bConverted, remaining, length, initial_bytes_converted = pConv->bConverted; dt_stack_t* stack = pConv->pStack; - OPAL_PTRDIFF_TYPE extent = pData->ub - pData->lb; - OPAL_PTRDIFF_TYPE initial_displ = pConv->use_desc->desc[pConv->use_desc->used].end_loop.first_elem_disp; + ptrdiff_t extent = pData->ub - pData->lb; + ptrdiff_t initial_displ = pConv->use_desc->desc[pConv->use_desc->used].end_loop.first_elem_disp; DO_DEBUG( opal_output( 0, "unpack_homogeneous_contig( pBaseBuf %p, iov_count %d )\n", (void*)pConv->pBaseBuf, *out_size ); ); @@ -89,7 +91,7 @@ opal_unpack_homogeneous_contig_function( opal_convertor_t* pConv, bConverted = remaining; /* how much will get unpacked this time */ user_memory = pConv->pBaseBuf + initial_displ; - if( (OPAL_PTRDIFF_TYPE)pData->size == extent ) { + if( (ptrdiff_t)pData->size == extent ) { user_memory += pConv->bConverted; DO_DEBUG( opal_output( 0, "unpack_homogeneous_contig( user_memory %p, packed_buffer %p length %lu\n", (void*)user_memory, (void*)packed_buffer, (unsigned long)remaining ); ); @@ -169,7 +171,7 @@ opal_unpack_homogeneous_contig_function( opal_convertor_t* pConv, * part of the datatype is already received, we need to use a trick to handle this special * case. The trick is to fill the missing part with some well known value, unpack the data * as if it was completely received, and then move into the user memory only the bytes - * that don't match th wekk known value. This approach work as long as there is no need + * that don't match the well known value. This approach works as long as there is no need * for more than structural changes. They will not work for cases where we will have to * change the content of the data (as in all conversions that require changing the size * of the exponent or mantissa).
@@ -177,7 +179,7 @@ opal_unpack_homogeneous_contig_function( opal_convertor_t* pConv, static inline uint32_t opal_unpack_partial_datatype( opal_convertor_t* pConvertor, dt_elem_desc_t* pElem, unsigned char* partial_data, - OPAL_PTRDIFF_TYPE start_position, OPAL_PTRDIFF_TYPE length, + ptrdiff_t start_position, ptrdiff_t length, unsigned char** user_buffer ) { char unused_byte = 0x7F, saved_data[16]; @@ -377,7 +379,7 @@ opal_generic_simple_unpack_function( opal_convertor_t* pConvertor, (long)pStack->disp, (unsigned long)iov_len_local ); ); } if( OPAL_DATATYPE_LOOP == pElem->elem.common.type ) { - OPAL_PTRDIFF_TYPE local_disp = (OPAL_PTRDIFF_TYPE)conv_ptr; + ptrdiff_t local_disp = (ptrdiff_t)conv_ptr; if( pElem->loop.common.flags & OPAL_DATATYPE_FLAG_CONTIGUOUS ) { UNPACK_CONTIGUOUS_LOOP( pConvertor, pElem, count_desc, iov_ptr, conv_ptr, iov_len_local ); @@ -387,7 +389,7 @@ opal_generic_simple_unpack_function( opal_convertor_t* pConvertor, } /* Save the stack with the correct last_count value. 
*/ } - local_disp = (OPAL_PTRDIFF_TYPE)conv_ptr - local_disp; + local_disp = (ptrdiff_t)conv_ptr - local_disp; PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_LOOP, count_desc, pStack->disp + local_disp); pos_desc++; @@ -448,7 +450,7 @@ opal_unpack_general_function( opal_convertor_t* pConvertor, uint32_t iov_count; const opal_convertor_master_t* master = pConvertor->master; - OPAL_PTRDIFF_TYPE advance; /* number of bytes that we should advance the buffer */ + ptrdiff_t advance; /* number of bytes that we should advance the buffer */ int32_t rc; DO_DEBUG( opal_output( 0, "opal_convertor_general_unpack( %p, {%p, %lu}, %u )\n", @@ -500,6 +502,7 @@ opal_unpack_general_function( opal_convertor_t* pConvertor, conv_ptr = pConvertor->pBaseBuf + pStack->disp; pos_desc++; /* advance to the next data */ UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc ); + if( 0 == iov_len_local ) goto complete_loop; /* escape if we're done */ continue; } conv_ptr += rc * description[pos_desc].elem.extent; diff --git a/opal/datatype/opal_datatype_unpack.h b/opal/datatype/opal_datatype_unpack.h index bbc8d30e39f..44f7505a58c 100644 --- a/opal/datatype/opal_datatype_unpack.h +++ b/opal/datatype/opal_datatype_unpack.h @@ -5,6 +5,8 @@ * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -43,7 +45,7 @@ unpack_predefined_data( opal_convertor_t* CONVERTOR, /* the convertor */ if( 0 == _copy_count ) return; /* nothing to do */ } - if( (OPAL_PTRDIFF_TYPE)_copy_blength == _elem->extent ) { + if( (ptrdiff_t)_copy_blength == _elem->extent ) { _copy_blength *= _copy_count; /* the extent and the size of the basic datatype are equal */ OPAL_DATATYPE_SAFEGUARD_POINTER( _destination, _copy_blength, (CONVERTOR)->pBaseBuf, diff --git a/opal/dss/dss.h b/opal/dss/dss.h index 35e3589577d..a9f4deedf87 100644 --- a/opal/dss/dss.h +++ b/opal/dss/dss.h @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2012 Los Alamos National Security, Inc. All rights reserved. - * Copyright (c) 2014 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -33,6 +33,16 @@ BEGIN_C_DECLS +/* Provide a macro for determining the bool value of an opal_value_t */ +#define OPAL_CHECK_BOOL(v, p) \ + do { \ + if (OPAL_UNDEF == (v)->type) { \ + (p) = true; \ + } else { \ + (p) = (v)->data.flag; \ + } \ + } while(0) + /* A non-API function for something that happens in a number * of places throughout the code base - loading a value into * an opal_value_t structure diff --git a/opal/dss/dss_compare.c b/opal/dss/dss_compare.c index 20ae1f0fe75..734306d9371 100644 --- a/opal/dss/dss_compare.c +++ b/opal/dss/dss_compare.c @@ -12,7 +12,7 @@ * Copyright (c) 2012 Los Alamos National Security, Inc. All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -359,6 +359,8 @@ int opal_dss_compare_value(opal_value_t *value1, opal_value_t *value2, opal_data return opal_dss_compare_timeval(&value1->data.tv, &value2->data.tv, type); case OPAL_NAME: return opal_dss_compare_name(&value1->data.name, &value2->data.name, type); + case OPAL_ENVAR: + return opal_dss_compare_envar(&value1->data.envar, &value2->data.envar, type); default: opal_output(0, "COMPARE-OPAL-VALUE: UNSUPPORTED TYPE %d", (int)value1->type); return OPAL_EQUAL; @@ -458,3 +460,47 @@ int opal_dss_compare_status(int *value1, int *value2, opal_data_type_t type) return OPAL_EQUAL; } +int opal_dss_compare_envar(opal_envar_t *value1, opal_envar_t *value2, opal_data_type_t type) +{ + int rc; + + if (NULL != value1->envar) { + if (NULL == value2->envar) { + return OPAL_VALUE1_GREATER; + } + rc = strcmp(value1->envar, value2->envar); + if (rc < 0) { + return OPAL_VALUE2_GREATER; + } else if (0 < rc) { + return OPAL_VALUE1_GREATER; + } + } else if (NULL != value2->envar) { + /* we know value1->envar had to be NULL */ + return OPAL_VALUE2_GREATER; + } + + /* if both are NULL or are equal, then check value */ + if (NULL != value1->value) { + if (NULL == value2->value) { + return OPAL_VALUE1_GREATER; + } + rc = strcmp(value1->value, value2->value); + if (rc < 0) { + return OPAL_VALUE2_GREATER; + } else if (0 < rc) { + return OPAL_VALUE1_GREATER; + } + } else if (NULL != value2->value) { + /* we know value1->value had to be NULL */ + return OPAL_VALUE2_GREATER; + } + + /* finally, check separator */ + if (value1->separator < value2->separator) { + return OPAL_VALUE2_GREATER; + } + if (value2->separator < value1->separator) { + return OPAL_VALUE1_GREATER; + } + return OPAL_EQUAL; +} diff --git a/opal/dss/dss_copy.c b/opal/dss/dss_copy.c index a39798bd46a..184897d77ea 100644 --- a/opal/dss/dss_copy.c +++ b/opal/dss/dss_copy.c @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. 
* Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ @@ -338,6 +338,16 @@ int opal_dss_copy_value(opal_value_t **dest, opal_value_t *src, case OPAL_NAME: memcpy(&p->data.name, &src->data.name, sizeof(opal_process_name_t)); break; + case OPAL_ENVAR: + OBJ_CONSTRUCT(&p->data.envar, opal_envar_t); + if (NULL != src->data.envar.envar) { + p->data.envar.envar = strdup(src->data.envar.envar); + } + if (NULL != src->data.envar.value) { + p->data.envar.value = strdup(src->data.envar.value); + } + p->data.envar.separator = src->data.envar.separator; + break; default: opal_output(0, "COPY-OPAL-VALUE: UNSUPPORTED TYPE %d", (int)src->type); return OPAL_ERROR; @@ -409,3 +419,25 @@ int opal_dss_copy_vpid(opal_vpid_t **dest, opal_vpid_t *src, opal_data_type_t ty return OPAL_SUCCESS; } + +int opal_dss_copy_envar(opal_envar_t **dest, opal_envar_t *src, opal_data_type_t type) +{ + opal_envar_t *val; + + val = OBJ_NEW(opal_envar_t); + if (NULL == val) { + OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); + return OPAL_ERR_OUT_OF_RESOURCE; + } + + if (NULL != src->envar) { + val->envar = strdup(src->envar); + } + if (NULL != src->value) { + val->value = strdup(src->value); + } + val->separator = src->separator; + *dest = val; + + return OPAL_SUCCESS; +} diff --git a/opal/dss/dss_internal.h b/opal/dss/dss_internal.h index a2514379ce4..e4360b23f3f 100644 --- a/opal/dss/dss_internal.h +++ b/opal/dss/dss_internal.h @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2012 Los Alamos National Security, Inc. All rights reserved. - * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. 
All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. @@ -334,6 +334,8 @@ int opal_dss_pack_vpid(opal_buffer_t *buffer, const void *src, int opal_dss_pack_status(opal_buffer_t *buffer, const void *src, int32_t num_vals, opal_data_type_t type); +int opal_dss_pack_envar(opal_buffer_t *buffer, const void *src, + int32_t num_vals, opal_data_type_t type); /* * Internal unpack functions @@ -407,6 +409,9 @@ int opal_dss_unpack_vpid(opal_buffer_t *buffer, void *dest, int opal_dss_unpack_status(opal_buffer_t *buffer, void *dest, int32_t *num_vals, opal_data_type_t type); +int opal_dss_unpack_envar(opal_buffer_t *buffer, void *dest, + int32_t *num_vals, opal_data_type_t type); + /* * Internal copy functions */ @@ -438,6 +443,8 @@ int opal_dss_copy_jobid(opal_jobid_t **dest, opal_jobid_t *src, opal_data_type_t int opal_dss_copy_vpid(opal_vpid_t **dest, opal_vpid_t *src, opal_data_type_t type); +int opal_dss_copy_envar(opal_envar_t **dest, opal_envar_t *src, opal_data_type_t type); + /* * Internal compare functions @@ -503,6 +510,7 @@ int opal_dss_compare_jobid(opal_jobid_t *value1, opal_data_type_t type); int opal_dss_compare_status(int *value1, int *value2, opal_data_type_t type); +int opal_dss_compare_envar(opal_envar_t *value1, opal_envar_t *value2, opal_data_type_t type); /* * Internal print functions @@ -544,6 +552,8 @@ int opal_dss_print_name(char **output, char *prefix, opal_process_name_t *name, int opal_dss_print_jobid(char **output, char *prefix, opal_process_name_t *src, opal_data_type_t type); int opal_dss_print_vpid(char **output, char *prefix, opal_process_name_t *src, opal_data_type_t type); int opal_dss_print_status(char **output, char *prefix, int *src, opal_data_type_t type); +int opal_dss_print_envar(char **output, char *prefix, + opal_envar_t *src, opal_data_type_t type); /* diff --git a/opal/dss/dss_load_unload.c 
b/opal/dss/dss_load_unload.c index e84bfc4ccb3..0fa02d01c28 100644 --- a/opal/dss/dss_load_unload.c +++ b/opal/dss/dss_load_unload.c @@ -9,7 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ diff --git a/opal/dss/dss_open_close.c b/opal/dss/dss_open_close.c index baf58143efe..1b7085f8bd4 100644 --- a/opal/dss/dss_open_close.c +++ b/opal/dss/dss_open_close.c @@ -11,9 +11,10 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2012-2013 Los Alamos National Security, Inc. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -43,7 +44,7 @@ static opal_dss_buffer_type_t default_buf_type = OPAL_DSS_BUFFER_NON_DESC; /* variable group id */ static int opal_dss_group_id = -1; -mca_base_var_enum_value_t buffer_type_values[] = { +static mca_base_var_enum_value_t buffer_type_values[] = { {OPAL_DSS_BUFFER_NON_DESC, "non-described"}, {OPAL_DSS_BUFFER_FULLY_DESC, "described"}, {0, NULL} @@ -231,6 +232,26 @@ OBJ_CLASS_INSTANCE(opal_node_stats_t, opal_object_t, opal_node_stats_destruct); +static void opal_envar_construct(opal_envar_t *obj) +{ + obj->envar = NULL; + obj->value = NULL; + obj->separator = '\0'; +} +static void opal_envar_destruct(opal_envar_t *obj) +{ + if (NULL != obj->envar) { + free(obj->envar); + } + if (NULL != obj->value) { + free(obj->value); + } +} +OBJ_CLASS_INSTANCE(opal_envar_t, + opal_list_item_t, + opal_envar_construct, + opal_envar_destruct); + int opal_dss_register_vars (void) { mca_base_var_enum_t *new_enum; @@ -623,6 +644,17 @@ int opal_dss_open(void) "OPAL_STATUS", &tmp))) { return rc; } + + tmp = OPAL_ENVAR; + if (OPAL_SUCCESS != (rc = opal_dss.register_type(opal_dss_pack_envar, + opal_dss_unpack_envar, + (opal_dss_copy_fn_t)opal_dss_copy_envar, + (opal_dss_compare_fn_t)opal_dss_compare_envar, + (opal_dss_print_fn_t)opal_dss_print_envar, + OPAL_DSS_UNSTRUCTURED, + "OPAL_ENVAR", &tmp))) { + return rc; + } /* All done */ opal_dss_initialized = true; diff --git a/opal/dss/dss_pack.c b/opal/dss/dss_pack.c index 23c9d3b31bc..703886856fb 100644 --- a/opal/dss/dss_pack.c +++ b/opal/dss/dss_pack.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2013 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ @@ -836,8 +836,13 @@ int opal_dss_pack_value(opal_buffer_t *buffer, const void *src, return ret; } break; + case OPAL_ENVAR: + if (OPAL_SUCCESS != (ret = opal_dss_pack_buffer(buffer, &ptr[i]->data.envar, 1, OPAL_ENVAR))) { + return ret; + } + break; default: - opal_output(0, "PACK-OPAL-VALUE: UNSUPPORTED TYPE %d", (int)ptr[i]->type); + opal_output(0, "PACK-OPAL-VALUE: UNSUPPORTED TYPE %d FOR KEY %s", (int)ptr[i]->type, ptr[i]->key); return OPAL_ERROR; } } @@ -982,3 +987,23 @@ int opal_dss_pack_status(opal_buffer_t *buffer, const void *src, return ret; } +int opal_dss_pack_envar(opal_buffer_t *buffer, const void *src, + int32_t num_vals, opal_data_type_t type) +{ + int ret; + int32_t n; + opal_envar_t *ptr = (opal_envar_t*)src; + + for (n=0; n < num_vals; n++) { + if (OPAL_SUCCESS != (ret = opal_dss_pack_string(buffer, &ptr[n].envar, 1, OPAL_STRING))) { + return ret; + } + if (OPAL_SUCCESS != (ret = opal_dss_pack_string(buffer, &ptr[n].value, 1, OPAL_STRING))) { + return ret; + } + if (OPAL_SUCCESS != (ret = opal_dss_pack_byte(buffer, &ptr[n].separator, 1, OPAL_BYTE))) { + return ret; + } + } + return OPAL_SUCCESS; +} diff --git a/opal/dss/dss_print.c b/opal/dss/dss_print.c index 8cd620715f7..8009c3f2c14 100644 --- a/opal/dss/dss_print.c +++ b/opal/dss/dss_print.c @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2012 Los Alamos National Security, Inc. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ @@ -806,6 +806,13 @@ int opal_dss_print_value(char **output, char *prefix, opal_value_t *src, opal_da case OPAL_PTR: asprintf(output, "%sOPAL_VALUE: Data type: OPAL_PTR\tKey: %s", prefx, src->key); break; + case OPAL_ENVAR: + asprintf(output, "%sOPAL_VALUE: Data type: OPAL_ENVAR\tKey: %s\tName: %s\tValue: %s\tSeparator: %c", + prefx, src->key, + (NULL == src->data.envar.envar) ? "NULL" : src->data.envar.envar, + (NULL == src->data.envar.value) ? "NULL" : src->data.envar.value, + ('\0' == src->data.envar.separator) ? ' ' : src->data.envar.separator); + break; default: asprintf(output, "%sOPAL_VALUE: Data type: UNKNOWN\tKey: %s\tValue: UNPRINTABLE", prefx, src->key); @@ -845,77 +852,75 @@ int opal_dss_print_name(char **output, char *prefix, opal_process_name_t *name, int opal_dss_print_jobid(char **output, char *prefix, opal_process_name_t *src, opal_data_type_t type) { - char *prefx; + char *prefx = " "; /* deal with NULL prefix */ - if (NULL == prefix) asprintf(&prefx, " "); - else prefx = prefix; + if (NULL != prefix) prefx = prefix; /* if src is NULL, just print data type and return */ if (NULL == src) { asprintf(output, "%sData type: OPAL_JOBID\tValue: NULL pointer", prefx); - if (prefx != prefix) { - free(prefx); - } return OPAL_SUCCESS; } asprintf(output, "%sData type: OPAL_JOBID\tValue: %s", prefx, opal_jobid_print(src->jobid)); - if (prefx != prefix) { - free(prefx); - } - return OPAL_SUCCESS; } int opal_dss_print_vpid(char **output, char *prefix, opal_process_name_t *src, opal_data_type_t type) { - char *prefx; + char *prefx = " "; /* deal with NULL prefix */ - if (NULL == prefix) asprintf(&prefx, " "); - else prefx = prefix; + if (NULL != prefix) prefx = prefix; /* if src is NULL, just print data type and return */ if (NULL == src) { asprintf(output, "%sData type: OPAL_VPID\tValue: NULL pointer", prefx); - if (prefx != prefix) { - free(prefx); - } return OPAL_SUCCESS; } asprintf(output, "%sData type: OPAL_VPID\tValue: %s", prefx, 
opal_vpid_print(src->vpid)); - if (prefx != prefix) { - free(prefx); - } - return OPAL_SUCCESS; } int opal_dss_print_status(char **output, char *prefix, int *src, opal_data_type_t type) { - char *prefx; + char *prefx = " "; /* deal with NULL prefix */ - if (NULL == prefix) asprintf(&prefx, " "); - else prefx = prefix; + if (NULL != prefix) prefx = prefix; /* if src is NULL, just print data type and return */ if (NULL == src) { asprintf(output, "%sData type: OPAL_STATUS\tValue: NULL pointer", prefx); - if (prefx != prefix) { - free(prefx); - } return OPAL_SUCCESS; } asprintf(output, "%sData type: OPAL_STATUS\tValue: %s", prefx, opal_strerror(*src)); - if (prefx != prefix) { - free(prefx); + return OPAL_SUCCESS; +} + + +int opal_dss_print_envar(char **output, char *prefix, + opal_envar_t *src, opal_data_type_t type) +{ + char *prefx = " "; + + /* deal with NULL prefix */ + if (NULL != prefix) prefx = prefix; + + /* if src is NULL, just print data type and return */ + if (NULL == src) { + asprintf(output, "%sData type: OPAL_ENVAR\tValue: NULL pointer", prefx); + return OPAL_SUCCESS; } + asprintf(output, "%sOPAL_VALUE: Data type: OPAL_ENVAR\tName: %s\tValue: %s\tSeparator: %c", + prefx, (NULL == src->envar) ? "NULL" : src->envar, + (NULL == src->value) ? "NULL" : src->value, + ('\0' == src->separator) ? ' ' : src->separator); return OPAL_SUCCESS; } diff --git a/opal/dss/dss_types.h b/opal/dss/dss_types.h index 23d2f08dcae..47da99da6c4 100644 --- a/opal/dss/dss_types.h +++ b/opal/dss/dss_types.h @@ -15,7 +15,7 @@ * reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -107,6 +107,7 @@ typedef struct { #define OPAL_INFO_DIRECTIVES (opal_data_type_t) 36 /**< corresponds to PMIx info directives type (uint32_t) */ #define OPAL_PROC_STATE (opal_data_type_t) 37 /**< corresponds to PMIx proc state type (uint8_t) */ #define OPAL_PROC_INFO (opal_data_type_t) 38 /**< corresponds to PMIx proc_info type */ +#define OPAL_ENVAR (opal_data_type_t) 39 /**< corresponds to PMIx envar type */ /* OPAL Dynamic */ #define OPAL_DSS_ID_DYNAMIC (opal_data_type_t) 100 @@ -131,7 +132,16 @@ typedef struct { opal_status_t exit_code; opal_proc_state_t state; } opal_proc_info_t; -OBJ_CLASS_DECLARATION(opal_proc_info_t); +OPAL_DECLSPEC OBJ_CLASS_DECLARATION(opal_proc_info_t); + +/* define a struct for envar directives */ +typedef struct { + opal_list_item_t super; + char *envar; + char *value; + char separator; +} opal_envar_t; +OPAL_DECLSPEC OBJ_CLASS_DECLARATION(opal_envar_t); /* Data value object */ typedef struct { @@ -163,6 +173,7 @@ typedef struct { opal_process_name_t name; opal_proc_info_t pinfo; void *ptr; // never packed or passed anywhere + opal_envar_t envar; } data; } opal_value_t; OPAL_DECLSPEC OBJ_CLASS_DECLARATION(opal_value_t); diff --git a/opal/dss/dss_unpack.c b/opal/dss/dss_unpack.c index be9993983cd..bb28673d2f0 100644 --- a/opal/dss/dss_unpack.c +++ b/opal/dss/dss_unpack.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2012-2015 Los Alamos National Security, Inc. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved.
* $COPYRIGHT$ @@ -1086,13 +1086,26 @@ int opal_dss_unpack_value(opal_buffer_t *buffer, void *dest, return ret; } break; + case OPAL_PTR: + /* just ignore these values */ + break; case OPAL_NAME: if (OPAL_SUCCESS != (ret = opal_dss_unpack_buffer(buffer, &ptr[i]->data.name, &m, OPAL_NAME))) { return ret; } break; + case OPAL_STATUS: + if (OPAL_SUCCESS != (ret = opal_dss_unpack_buffer(buffer, &ptr[i]->data.status, &m, OPAL_INT))) { + return ret; + } + break; + case OPAL_ENVAR: + if (OPAL_SUCCESS != (ret = opal_dss_unpack_buffer(buffer, &ptr[i]->data.envar, &m, OPAL_ENVAR))) { + return ret; + } + break; default: - opal_output(0, "PACK-OPAL-VALUE: UNSUPPORTED TYPE"); + opal_output(0, "UNPACK-OPAL-VALUE: UNSUPPORTED TYPE %d FOR KEY %s", (int)ptr[i]->type, ptr[i]->key); return OPAL_ERROR; } } @@ -1253,3 +1266,35 @@ int opal_dss_unpack_status(opal_buffer_t *buffer, void *dest, return ret; } + + +int opal_dss_unpack_envar(opal_buffer_t *buffer, void *dest, + int32_t *num_vals, opal_data_type_t type) +{ + opal_envar_t *ptr; + int32_t i, n, m; + int ret; + + ptr = (opal_envar_t *) dest; + n = *num_vals; + + for (i = 0; i < n; ++i) { + m=1; + if (OPAL_SUCCESS != (ret = opal_dss_unpack_string(buffer, &ptr[i].envar, &m, OPAL_STRING))) { + OPAL_ERROR_LOG(ret); + return ret; + } + m=1; + if (OPAL_SUCCESS != (ret = opal_dss_unpack_string(buffer, &ptr[i].value, &m, OPAL_STRING))) { + OPAL_ERROR_LOG(ret); + return ret; + } + m=1; + if (OPAL_SUCCESS != (ret = opal_dss_unpack_byte(buffer, &ptr[i].separator, &m, OPAL_BYTE))) { + OPAL_ERROR_LOG(ret); + return ret; + } + } + + return OPAL_SUCCESS; +} diff --git a/opal/etc/openmpi-mca-params.conf b/opal/etc/openmpi-mca-params.conf index e4914804723..09c1ac300b4 100644 --- a/opal/etc/openmpi-mca-params.conf +++ b/opal/etc/openmpi-mca-params.conf @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2006-2017 Cisco Systems, Inc. 
All rights reserved +# Copyright (c) 2018 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -52,7 +53,7 @@ # directory. For example: # Change component loading path -# component_path = /usr/local/lib/openmpi:~/my_openmpi_components +# mca_base_component_path = /usr/local/lib/openmpi:~/my_openmpi_components # See "ompi_info --param all all --level 9" for a full listing of Open # MPI MCA parameters available and their default values. diff --git a/opal/include/opal/constants.h b/opal/include/opal/constants.h index f05e53b6cdd..246e964da02 100644 --- a/opal/include/opal/constants.h +++ b/opal/include/opal/constants.h @@ -10,7 +10,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -94,10 +94,13 @@ enum { OPAL_ERR_PROC_RESTART = (OPAL_ERR_BASE - 63), OPAL_ERR_PROC_CHECKPOINT = (OPAL_ERR_BASE - 64), OPAL_ERR_PROC_MIGRATE = (OPAL_ERR_BASE - 65), - OPAL_ERR_EVENT_REGISTRATION = (OPAL_ERR_BASE - 66) + OPAL_ERR_EVENT_REGISTRATION = (OPAL_ERR_BASE - 66), + OPAL_ERR_HEARTBEAT_ALERT = (OPAL_ERR_BASE - 67), + OPAL_ERR_FILE_ALERT = (OPAL_ERR_BASE - 68), + OPAL_ERR_MODEL_DECLARED = (OPAL_ERR_BASE - 69), + OPAL_PMIX_LAUNCH_DIRECTIVE = (OPAL_ERR_BASE - 70) }; #define OPAL_ERR_MAX (OPAL_ERR_BASE - 100) #endif /* OPAL_CONSTANTS_H */ - diff --git a/opal/include/opal/sys/Makefile.am b/opal/include/opal/sys/Makefile.am index 230abe81e79..9387ed6da17 100644 --- a/opal/include/opal/sys/Makefile.am +++ b/opal/include/opal/sys/Makefile.am @@ -35,9 +35,6 @@ include opal/sys/x86_64/Makefile.am include opal/sys/arm/Makefile.am include opal/sys/arm64/Makefile.am include opal/sys/ia32/Makefile.am -include opal/sys/ia64/Makefile.am -include opal/sys/mips/Makefile.am -include 
opal/sys/osx/Makefile.am include opal/sys/powerpc/Makefile.am include opal/sys/sparcv9/Makefile.am include opal/sys/sync_builtin/Makefile.am diff --git a/opal/include/opal/sys/architecture.h b/opal/include/opal/sys/architecture.h index 6341fc354fb..ee9aa96901d 100644 --- a/opal/include/opal/sys/architecture.h +++ b/opal/include/opal/sys/architecture.h @@ -42,8 +42,9 @@ #define OPAL_MIPS 0070 #define OPAL_ARM 0100 #define OPAL_ARM64 0101 +#define OPAL_S390 0110 +#define OPAL_S390X 0111 #define OPAL_BUILTIN_SYNC 0200 -#define OPAL_BUILTIN_OSX 0201 #define OPAL_BUILTIN_GCC 0202 #define OPAL_BUILTIN_NO 0203 diff --git a/opal/include/opal/sys/arm/atomic.h b/opal/include/opal/sys/arm/atomic.h index 49d033ba222..6d4db3ad7a4 100644 --- a/opal/include/opal/sys/arm/atomic.h +++ b/opal/include/opal/sys/arm/atomic.h @@ -1,3 +1,4 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology @@ -11,6 +12,8 @@ * All rights reserved. * Copyright (c) 2010 IBM Corporation. All rights reserved. * Copyright (c) 2010 ARM ltd. All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -104,12 +107,12 @@ void opal_atomic_isync(void) #if (OPAL_GCC_INLINE_ASSEMBLY && (OPAL_ASM_ARM_VERSION >= 6)) -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 #define OPAL_HAVE_ATOMIC_MATH_32 1 -static inline int opal_atomic_cmpset_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int32_t ret, tmp; + int32_t prev, tmp; + bool ret; __asm__ __volatile__ ( "1: ldrex %0, [%2] \n" @@ -120,11 +123,13 @@ static inline int opal_atomic_cmpset_32(volatile int32_t *addr, " bne 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } /* these two functions aren't inlined in the non-gcc case because then @@ -132,51 +137,50 @@ static inline int opal_atomic_cmpset_32(volatile int32_t *addr, atomic_?mb can be inlined). 
Instead, we "inline" them by hand in the assembly, meaning there is one function call overhead instead of two */ -static inline int opal_atomic_cmpset_acq_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int rc; + bool rc; - rc = opal_atomic_cmpset_32(addr, oldval, newval); + rc = opal_atomic_compare_exchange_strong_32 (addr, oldval, newval); opal_atomic_rmb(); return rc; } -static inline int opal_atomic_cmpset_rel_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { opal_atomic_wmb(); - return opal_atomic_cmpset_32(addr, oldval, newval); + return opal_atomic_compare_exchange_strong_32 (addr, oldval, newval); } #if (OPAL_ASM_SUPPORT_64BIT == 1) -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 -static inline int opal_atomic_cmpset_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int64_t ret; - int tmp; - - - __asm__ __volatile__ ( - "1: ldrexd %0, %H0, [%2] \n" - " cmp %0, %3 \n" - " it eq \n" - " cmpeq %H0, %H3 \n" - " bne 2f \n" - " strexd %1, %4, %H4, [%2] \n" - " cmp %1, #0 \n" - " bne 1b \n" - "2: \n" - - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) - : "cc", "memory"); - - return (ret == oldval); + int64_t prev; + int tmp; + bool ret; + + __asm__ __volatile__ ( + "1: ldrexd %0, %H0, [%2] \n" + " cmp %0, %3 \n" + " it eq \n" + " cmpeq %H0, %H3 \n" + " bne 2f \n" + " strexd %1, %4, %H4, [%2] \n" + " cmp %1, #0 \n" + " bne 1b \n" + "2: \n" + + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) + : "cc", "memory"); + + ret = (prev == *oldval); + *oldval = prev; + return ret; } /* these two 
functions aren't inlined in the non-gcc case because then @@ -184,91 +188,65 @@ static inline int opal_atomic_cmpset_64(volatile int64_t *addr, atomic_?mb can be inlined). Instead, we "inline" them by hand in the assembly, meaning there is one function call overhead instead of two */ -static inline int opal_atomic_cmpset_acq_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int rc; + bool rc; - rc = opal_atomic_cmpset_64(addr, oldval, newval); + rc = opal_atomic_compare_exchange_strong_64 (addr, oldval, newval); opal_atomic_rmb(); return rc; } -static inline int opal_atomic_cmpset_rel_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { opal_atomic_wmb(); - return opal_atomic_cmpset_64(addr, oldval, newval); + return opal_atomic_compare_exchange_strong_64 (addr, oldval, newval); } #endif #define OPAL_HAVE_ATOMIC_ADD_32 1 -static inline int32_t opal_atomic_add_32(volatile int32_t* v, int inc) +static inline int32_t opal_atomic_fetch_add_32(volatile int32_t* v, int inc) { - int32_t t; - int tmp; - - __asm__ __volatile__( - "1: ldrex %0, [%2] \n" - " add %0, %0, %3 \n" - " strex %1, %0, [%2] \n" - " cmp %1, #0 \n" + int32_t t, old; + int tmp; + + __asm__ __volatile__( + "1: ldrex %1, [%3] \n" + " add %0, %1, %4 \n" + " strex %2, %0, [%3] \n" + " cmp %2, #0 \n" " bne 1b \n" - : "=&r" (t), "=&r" (tmp) + : "=&r" (t), "=&r" (old), "=&r" (tmp) : "r" (v), "r" (inc) : "cc", "memory"); - return t; + return old; } #define OPAL_HAVE_ATOMIC_SUB_32 1 -static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int dec) +static inline int32_t opal_atomic_fetch_sub_32(volatile int32_t* v, int dec) { - int32_t t; - int tmp; - - __asm__ __volatile__( - "1: ldrex %0, [%2] \n" - " sub %0, %0, %3 \n" - " strex %1, %0, [%2] 
\n" - " cmp %1, #0 \n" + int32_t t, old; + int tmp; + + __asm__ __volatile__( + "1: ldrex %1, [%3] \n" + " sub %0, %1, %4 \n" + " strex %2, %0, [%3] \n" + " cmp %2, #0 \n" " bne 1b \n" - : "=&r" (t), "=&r" (tmp) + : "=&r" (t), "=&r" (old), "=&r" (tmp) : "r" (v), "r" (dec) : "cc", "memory"); - return t; -} - -#else /* OPAL_ASM_ARM_VERSION <=5 or no GCC inline assembly */ - -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 -#define __kuser_cmpxchg (*((int (*)(int, int, volatile int*))(0xffff0fc0))) -static inline int opal_atomic_cmpset_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - return !(__kuser_cmpxchg(oldval, newval, addr)); -} - -static inline int opal_atomic_cmpset_acq_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - /* kernel function includes all necessary memory barriers */ - return opal_atomic_cmpset_32(addr, oldval, newval); -} - -static inline int opal_atomic_cmpset_rel_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - /* kernel function includes all necessary memory barriers */ - return opal_atomic_cmpset_32(addr, oldval, newval); + return old; } #endif diff --git a/opal/include/opal/sys/arm64/atomic.h b/opal/include/opal/sys/arm64/atomic.h index 2f7f7d32aac..6b380ccc2a2 100644 --- a/opal/include/opal/sys/arm64/atomic.h +++ b/opal/include/opal/sys/arm64/atomic.h @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2010 IBM Corporation. All rights reserved. * Copyright (c) 2010 ARM ltd. All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved.
* $COPYRIGHT$ * @@ -29,15 +29,21 @@ #define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 #define OPAL_HAVE_ATOMIC_LLSC_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 #define OPAL_HAVE_ATOMIC_SWAP_32 1 #define OPAL_HAVE_ATOMIC_MATH_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 #define OPAL_HAVE_ATOMIC_SWAP_64 1 #define OPAL_HAVE_ATOMIC_LLSC_64 1 #define OPAL_HAVE_ATOMIC_ADD_32 1 +#define OPAL_HAVE_ATOMIC_AND_32 1 +#define OPAL_HAVE_ATOMIC_OR_32 1 +#define OPAL_HAVE_ATOMIC_XOR_32 1 #define OPAL_HAVE_ATOMIC_SUB_32 1 #define OPAL_HAVE_ATOMIC_ADD_64 1 +#define OPAL_HAVE_ATOMIC_AND_64 1 +#define OPAL_HAVE_ATOMIC_OR_64 1 +#define OPAL_HAVE_ATOMIC_XOR_64 1 #define OPAL_HAVE_ATOMIC_SUB_64 1 #define MB() __asm__ __volatile__ ("dmb sy" : : : "memory") @@ -76,10 +82,10 @@ static inline void opal_atomic_isync (void) * *********************************************************************/ -static inline int opal_atomic_cmpset_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int32_t ret, tmp; + int32_t prev, tmp; + bool ret; __asm__ __volatile__ ("1: ldaxr %w0, [%2] \n" " cmp %w0, %w3 \n" @@ -87,11 +93,13 @@ static inline int opal_atomic_cmpset_32(volatile int32_t *addr, " stxr %w1, %w4, [%2] \n" " cbnz %w1, 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, int32_t newval) @@ -113,10 +121,10 @@ static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, int32_t newval atomic_?mb can be inlined). 
Instead, we "inline" them by hand in the assembly, meaning there is one function call overhead instead of two */ -static inline int opal_atomic_cmpset_acq_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int32_t ret, tmp; + int32_t prev, tmp; + bool ret; __asm__ __volatile__ ("1: ldaxr %w0, [%2] \n" " cmp %w0, %w3 \n" @@ -124,18 +132,20 @@ static inline int opal_atomic_cmpset_acq_32(volatile int32_t *addr, " stxr %w1, %w4, [%2] \n" " cbnz %w1, 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } -static inline int opal_atomic_cmpset_rel_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int32_t ret, tmp; + int32_t prev, tmp; + bool ret; __asm__ __volatile__ ("1: ldxr %w0, [%2] \n" " cmp %w0, %w3 \n" @@ -143,11 +153,13 @@ static inline int opal_atomic_cmpset_rel_32(volatile int32_t *addr, " stlxr %w1, %w4, [%2] \n" " cbnz %w1, 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } static inline int32_t opal_atomic_ll_32 (volatile int32_t *addr) @@ -173,11 +185,11 @@ static inline int opal_atomic_sc_32 (volatile int32_t *addr, int32_t newval) return ret == 0; } -static inline int opal_atomic_cmpset_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - 
int64_t ret; + int64_t prev; int tmp; + bool ret; __asm__ __volatile__ ("1: ldaxr %0, [%2] \n" " cmp %0, %3 \n" @@ -185,11 +197,13 @@ static inline int opal_atomic_cmpset_64(volatile int64_t *addr, " stxr %w1, %4, [%2] \n" " cbnz %w1, 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } static inline int64_t opal_atomic_swap_64 (volatile int64_t *addr, int64_t newval) @@ -212,11 +226,11 @@ static inline int64_t opal_atomic_swap_64 (volatile int64_t *addr, int64_t newva atomic_?mb can be inlined). Instead, we "inline" them by hand in the assembly, meaning there is one function call overhead instead of two */ -static inline int opal_atomic_cmpset_acq_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int64_t ret; + int64_t prev; int tmp; + bool ret; __asm__ __volatile__ ("1: ldaxr %0, [%2] \n" " cmp %0, %3 \n" @@ -224,19 +238,21 @@ static inline int opal_atomic_cmpset_acq_64(volatile int64_t *addr, " stxr %w1, %4, [%2] \n" " cbnz %w1, 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } -static inline int opal_atomic_cmpset_rel_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int64_t ret; + int64_t prev; int tmp; + bool ret; __asm__ __volatile__ ("1: ldxr %0, [%2] \n" " cmp %0, %3 \n" @@ -244,11 +260,13 @@ static inline int opal_atomic_cmpset_rel_64(volatile int64_t *addr, " stlxr 
%w1, %4, [%2] \n" " cbnz %w1, 1b \n" "2: \n" - : "=&r" (ret), "=&r" (tmp) - : "r" (addr), "r" (oldval), "r" (newval) + : "=&r" (prev), "=&r" (tmp) + : "r" (addr), "r" (*oldval), "r" (newval) : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } static inline int64_t opal_atomic_ll_64 (volatile int64_t *addr) @@ -275,25 +293,31 @@ static inline int opal_atomic_sc_64 (volatile int64_t *addr, int64_t newval) } #define OPAL_ASM_MAKE_ATOMIC(type, bits, name, inst, reg) \ - static inline type opal_atomic_ ## name ## _ ## bits (volatile type *addr, type value) \ + static inline type opal_atomic_fetch_ ## name ## _ ## bits (volatile type *addr, type value) \ { \ - type newval; \ + type newval, old; \ int32_t tmp; \ \ - __asm__ __volatile__("1: ldxr %" reg "0, [%2] \n" \ - " " inst " %" reg "0, %" reg "0, %" reg "3 \n" \ - " stxr %w1, %" reg "0, [%2] \n" \ - " cbnz %w1, 1b \n" \ - : "=&r" (newval), "=&r" (tmp) \ + __asm__ __volatile__("1: ldxr %" reg "1, [%3] \n" \ + " " inst " %" reg "0, %" reg "1, %" reg "4 \n" \ + " stxr %w2, %" reg "0, [%3] \n" \ + " cbnz %w2, 1b \n" \ + : "=&r" (newval), "=&r" (old), "=&r" (tmp) \ : "r" (addr), "r" (value) \ : "cc", "memory"); \ \ - return newval; \ + return old; \ } OPAL_ASM_MAKE_ATOMIC(int32_t, 32, add, "add", "w") +OPAL_ASM_MAKE_ATOMIC(int32_t, 32, and, "and", "w") +OPAL_ASM_MAKE_ATOMIC(int32_t, 32, or, "orr", "w") +OPAL_ASM_MAKE_ATOMIC(int32_t, 32, xor, "eor", "w") OPAL_ASM_MAKE_ATOMIC(int32_t, 32, sub, "sub", "w") OPAL_ASM_MAKE_ATOMIC(int64_t, 64, add, "add", "") +OPAL_ASM_MAKE_ATOMIC(int64_t, 64, and, "and", "") +OPAL_ASM_MAKE_ATOMIC(int64_t, 64, or, "orr", "") +OPAL_ASM_MAKE_ATOMIC(int64_t, 64, xor, "eor", "") OPAL_ASM_MAKE_ATOMIC(int64_t, 64, sub, "sub", "") #endif /* OPAL_GCC_INLINE_ASSEMBLY */ diff --git a/opal/include/opal/sys/atomic.h b/opal/include/opal/sys/atomic.h index 1622d4f8303..3b165f00e05 100644 --- a/opal/include/opal/sys/atomic.h +++ 
b/opal/include/opal/sys/atomic.h @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2011-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. @@ -40,11 +40,11 @@ * * - \c OPAL_HAVE_ATOMIC_MEM_BARRIER atomic memory barriers * - \c OPAL_HAVE_ATOMIC_SPINLOCKS atomic spinlocks - * - \c OPAL_HAVE_ATOMIC_MATH_32 if 32 bit add/sub/cmpset can be done "atomicly" - * - \c OPAL_HAVE_ATOMIC_MATH_64 if 64 bit add/sub/cmpset can be done "atomicly" + * - \c OPAL_HAVE_ATOMIC_MATH_32 if 32 bit add/sub/compare-exchange can be done "atomically" + * - \c OPAL_HAVE_ATOMIC_MATH_64 if 64 bit add/sub/compare-exchange can be done "atomically" * * Note that for the Atomic math, atomic add/sub may be implemented as - * C code using opal_atomic_cmpset. The appearance of atomic + * C code using opal_atomic_compare_exchange. The appearance of atomic * operation will be upheld in these cases.
*/ @@ -53,6 +53,8 @@ #include "opal_config.h" +#include + #include "opal/sys/architecture.h" #include "opal_stdint.h" @@ -61,10 +63,6 @@ #ifdef OPAL_DISABLE_INLINE_ASM #undef OPAL_C_GCC_INLINE_ASSEMBLY #define OPAL_C_GCC_INLINE_ASSEMBLY 0 -#undef OPAL_C_DEC_INLINE_ASSEMBLY -#define OPAL_C_DEC_INLINE_ASSEMBLY 0 -#undef OPAL_C_XLC_INLINE_ASSEMBLY -#define OPAL_C_XLC_INLINE_ASSEMBLY 0 #endif /* define OPAL_{GCC,DEC,XLC}_INLINE_ASSEMBLY based on the @@ -73,12 +71,8 @@ #if defined(c_plusplus) || defined(__cplusplus) /* We no longer support inline assembly for C++ as OPAL is a C-only interface */ #define OPAL_GCC_INLINE_ASSEMBLY 0 -#define OPAL_DEC_INLINE_ASSEMBLY 0 -#define OPAL_XLC_INLINE_ASSEMBLY 0 #else #define OPAL_GCC_INLINE_ASSEMBLY OPAL_C_GCC_INLINE_ASSEMBLY -#define OPAL_DEC_INLINE_ASSEMBLY OPAL_C_DEC_INLINE_ASSEMBLY -#define OPAL_XLC_INLINE_ASSEMBLY OPAL_C_XLC_INLINE_ASSEMBLY #endif @@ -113,21 +107,33 @@ typedef struct opal_atomic_lock_t opal_atomic_lock_t; *********************************************************************/ #if !OPAL_GCC_INLINE_ASSEMBLY #define OPAL_HAVE_INLINE_ATOMIC_MEM_BARRIER 0 -#define OPAL_HAVE_INLINE_ATOMIC_CMPSET_32 0 -#define OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 0 +#define OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_32 0 +#define OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_64 0 #define OPAL_HAVE_INLINE_ATOMIC_ADD_32 0 +#define OPAL_HAVE_INLINE_ATOMIC_AND_32 0 +#define OPAL_HAVE_INLINE_ATOMIC_OR_32 0 +#define OPAL_HAVE_INLINE_ATOMIC_XOR_32 0 #define OPAL_HAVE_INLINE_ATOMIC_SUB_32 0 #define OPAL_HAVE_INLINE_ATOMIC_ADD_64 0 +#define OPAL_HAVE_INLINE_ATOMIC_AND_64 0 +#define OPAL_HAVE_INLINE_ATOMIC_OR_64 0 +#define OPAL_HAVE_INLINE_ATOMIC_XOR_64 0 #define OPAL_HAVE_INLINE_ATOMIC_SUB_64 0 #define OPAL_HAVE_INLINE_ATOMIC_SWAP_32 0 #define OPAL_HAVE_INLINE_ATOMIC_SWAP_64 0 #else #define OPAL_HAVE_INLINE_ATOMIC_MEM_BARRIER 1 -#define OPAL_HAVE_INLINE_ATOMIC_CMPSET_32 1 -#define OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 1 +#define 
OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_32 1 +#define OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_64 1 #define OPAL_HAVE_INLINE_ATOMIC_ADD_32 1 +#define OPAL_HAVE_INLINE_ATOMIC_AND_32 1 +#define OPAL_HAVE_INLINE_ATOMIC_OR_32 1 +#define OPAL_HAVE_INLINE_ATOMIC_XOR_32 1 #define OPAL_HAVE_INLINE_ATOMIC_SUB_32 1 #define OPAL_HAVE_INLINE_ATOMIC_ADD_64 1 +#define OPAL_HAVE_INLINE_ATOMIC_AND_64 1 +#define OPAL_HAVE_INLINE_ATOMIC_OR_64 1 +#define OPAL_HAVE_INLINE_ATOMIC_XOR_64 1 #define OPAL_HAVE_INLINE_ATOMIC_SUB_64 1 #define OPAL_HAVE_INLINE_ATOMIC_SWAP_32 1 #define OPAL_HAVE_INLINE_ATOMIC_SWAP_64 1 @@ -137,8 +143,8 @@ typedef struct opal_atomic_lock_t opal_atomic_lock_t; * Enumeration of lock states */ enum { - OPAL_ATOMIC_UNLOCKED = 0, - OPAL_ATOMIC_LOCKED = 1 + OPAL_ATOMIC_LOCK_UNLOCKED = 0, + OPAL_ATOMIC_LOCK_LOCKED = 1 }; /********************************************************************** @@ -153,8 +159,6 @@ enum { #include "opal/sys/sync_builtin/atomic.h" #elif OPAL_ASSEMBLY_BUILTIN == OPAL_BUILTIN_GCC #include "opal/sys/gcc_builtin/atomic.h" -#elif OPAL_ASSEMBLY_BUILTIN == OPAL_BUILTIN_OSX -#include "opal/sys/osx/atomic.h" #elif OPAL_ASSEMBLY_ARCH == OPAL_X86_64 #include "opal/sys/x86_64/atomic.h" #elif OPAL_ASSEMBLY_ARCH == OPAL_ARM @@ -183,14 +187,14 @@ enum { /* compare and set operations can't really be emulated from software, so if these defines aren't already set, they should be set to 0 now */ -#ifndef OPAL_HAVE_ATOMIC_CMPSET_32 -#define OPAL_HAVE_ATOMIC_CMPSET_32 0 +#ifndef OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 0 #endif -#ifndef OPAL_HAVE_ATOMIC_CMPSET_64 -#define OPAL_HAVE_ATOMIC_CMPSET_64 0 +#ifndef OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 0 #endif -#ifndef OPAL_HAVE_ATOMIC_CMPSET_128 -#define OPAL_HAVE_ATOMIC_CMPSET_128 0 +#ifndef OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 0 #endif #ifndef OPAL_HAVE_ATOMIC_LLSC_32 #define 
OPAL_HAVE_ATOMIC_LLSC_32 0 @@ -266,7 +270,7 @@ void opal_atomic_wmb(void); /********************************************************************** * - * Atomic spinlocks - always inlined, if have atomic cmpset + * Atomic spinlocks - always inlined, if have atomic compare-and-swap * *********************************************************************/ @@ -276,7 +280,7 @@ void opal_atomic_wmb(void); #define OPAL_HAVE_ATOMIC_SPINLOCKS 0 #endif -#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_SPINLOCKS || (OPAL_HAVE_ATOMIC_CMPSET_32 || OPAL_HAVE_ATOMIC_CMPSET_64) +#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_SPINLOCKS || (OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) /** * Initialize a lock to value @@ -287,7 +291,7 @@ void opal_atomic_wmb(void); #if OPAL_HAVE_ATOMIC_SPINLOCKS == 0 static inline #endif -void opal_atomic_init(opal_atomic_lock_t* lock, int32_t value); +void opal_atomic_lock_init(opal_atomic_lock_t* lock, int32_t value); /** @@ -326,7 +330,7 @@ void opal_atomic_unlock(opal_atomic_lock_t *lock); #if OPAL_HAVE_ATOMIC_SPINLOCKS == 0 #undef OPAL_HAVE_ATOMIC_SPINLOCKS -#define OPAL_HAVE_ATOMIC_SPINLOCKS (OPAL_HAVE_ATOMIC_CMPSET_32 || OPAL_HAVE_ATOMIC_CMPSET_64) +#define OPAL_HAVE_ATOMIC_SPINLOCKS (OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) #define OPAL_NEED_INLINE_ATOMIC_SPINLOCKS 1 #endif @@ -343,48 +347,48 @@ void opal_atomic_unlock(opal_atomic_lock_t *lock); #endif #if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_CMPSET_32 -#if OPAL_HAVE_INLINE_ATOMIC_CMPSET_32 +#if OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_32 static inline #endif -int opal_atomic_cmpset_32(volatile int32_t *addr, int32_t oldval, - int32_t newval); +bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, + int32_t newval); -#if OPAL_HAVE_INLINE_ATOMIC_CMPSET_32 +#if OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_32 static inline #endif -int opal_atomic_cmpset_acq_32(volatile int32_t *addr, int32_t oldval, - int32_t 
newval); +bool opal_atomic_compare_exchange_strong_acq_32 (volatile int32_t *addr, int32_t *oldval, + int32_t newval); -#if OPAL_HAVE_INLINE_ATOMIC_CMPSET_32 +#if OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_32 static inline #endif -int opal_atomic_cmpset_rel_32(volatile int32_t *addr, int32_t oldval, - int32_t newval); +bool opal_atomic_compare_exchange_strong_rel_32 (volatile int32_t *addr, int32_t *oldval, + int32_t newval); #endif -#if !defined(OPAL_HAVE_ATOMIC_CMPSET_64) && !defined(DOXYGEN) -#define OPAL_HAVE_ATOMIC_CMPSET_64 0 +#if !defined(OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) && !defined(DOXYGEN) +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 0 #endif -#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_CMPSET_64 +#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 -#if OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 +#if OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_64 static inline #endif -int opal_atomic_cmpset_64(volatile int64_t *addr, int64_t oldval, - int64_t newval); +bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, + int64_t newval); -#if OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 +#if OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_64 static inline #endif -int opal_atomic_cmpset_acq_64(volatile int64_t *addr, int64_t oldval, - int64_t newval); +bool opal_atomic_compare_exchange_strong_acq_64 (volatile int64_t *addr, int64_t *oldval, + int64_t newval); -#if OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 +#if OPAL_HAVE_INLINE_ATOMIC_COMPARE_EXCHANGE_64 static inline #endif -int opal_atomic_cmpset_rel_64(volatile int64_t *addr, int64_t oldval, - int64_t newval); +bool opal_atomic_compare_exchange_strong_rel_64 (volatile int64_t *addr, int64_t *oldval, + int64_t newval); #endif @@ -393,30 +397,29 @@ int opal_atomic_cmpset_rel_64(volatile int64_t *addr, int64_t oldval, #define OPAL_HAVE_ATOMIC_MATH_32 0 #endif -#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_MATH_32 || OPAL_HAVE_ATOMIC_CMPSET_32 +#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_MATH_32 || 
OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 -/* OPAL_HAVE_INLINE_ATOMIC_*_32 will be 1 if /atomic.h provides - a static inline version of it (in assembly). If we have to fall - back on cmpset 32, that too will be inline. */ -#if OPAL_HAVE_INLINE_ATOMIC_ADD_32 || (!defined(OPAL_HAVE_ATOMIC_ADD_32) && OPAL_HAVE_ATOMIC_CMPSET_32) -static inline -#endif -int32_t opal_atomic_add_32(volatile int32_t *addr, int delta); - -/* OPAL_HAVE_INLINE_ATOMIC_*_32 will be 1 if /atomic.h provides - a static inline version of it (in assembly). If we have to fall - back to cmpset 32, that too will be inline. */ -#if OPAL_HAVE_INLINE_ATOMIC_SUB_32 || (!defined(OPAL_HAVE_ATOMIC_ADD_32) && OPAL_HAVE_ATOMIC_CMPSET_32) -static inline -#endif -int32_t opal_atomic_sub_32(volatile int32_t *addr, int delta); +static inline int32_t opal_atomic_add_fetch_32(volatile int32_t *addr, int delta); +static inline int32_t opal_atomic_fetch_add_32(volatile int32_t *addr, int delta); +static inline int32_t opal_atomic_and_fetch_32(volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_fetch_and_32(volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_or_fetch_32(volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_fetch_or_32(volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_xor_fetch_32(volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_fetch_xor_32(volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_sub_fetch_32(volatile int32_t *addr, int delta); +static inline int32_t opal_atomic_fetch_sub_32(volatile int32_t *addr, int delta); +static inline int32_t opal_atomic_min_fetch_32 (volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_fetch_min_32 (volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_max_fetch_32 (volatile int32_t *addr, int32_t value); +static inline int32_t opal_atomic_fetch_max_32 (volatile int32_t *addr, int32_t 
value); #endif /* OPAL_HAVE_ATOMIC_MATH_32 */ #if ! OPAL_HAVE_ATOMIC_MATH_32 /* fix up the value of opal_have_atomic_math_32 to allow for C versions */ #undef OPAL_HAVE_ATOMIC_MATH_32 -#define OPAL_HAVE_ATOMIC_MATH_32 OPAL_HAVE_ATOMIC_CMPSET_32 +#define OPAL_HAVE_ATOMIC_MATH_32 OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 #endif #ifndef OPAL_HAVE_ATOMIC_MATH_64 @@ -424,30 +427,28 @@ int32_t opal_atomic_sub_32(volatile int32_t *addr, int delta); #define OPAL_HAVE_ATOMIC_MATH_64 0 #endif -#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_MATH_64 || OPAL_HAVE_ATOMIC_CMPSET_64 +#if defined(DOXYGEN) || OPAL_HAVE_ATOMIC_MATH_64 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 -/* OPAL_HAVE_INLINE_ATOMIC_*_64 will be 1 if /atomic.h provides - a static inline version of it (in assembly). If we have to fall - back to cmpset 64, that too will be inline */ -#if OPAL_HAVE_INLINE_ATOMIC_ADD_64 || (!defined(OPAL_HAVE_ATOMIC_ADD_64) && OPAL_HAVE_ATOMIC_CMPSET_64) -static inline -#endif -int64_t opal_atomic_add_64(volatile int64_t *addr, int64_t delta); - -/* OPAL_HAVE_INLINE_ATOMIC_*_64 will be 1 if /atomic.h provides - a static inline version of it (in assembly). 
If we have to fall - back to cmpset 64, that too will be inline */ -#if OPAL_HAVE_INLINE_ATOMIC_SUB_64 || (!defined(OPAL_HAVE_ATOMIC_ADD_64) && OPAL_HAVE_ATOMIC_CMPSET_64) -static inline -#endif -int64_t opal_atomic_sub_64(volatile int64_t *addr, int64_t delta); +static inline int64_t opal_atomic_add_fetch_64(volatile int64_t *addr, int64_t delta); +static inline int64_t opal_atomic_fetch_add_64(volatile int64_t *addr, int64_t delta); +static inline int64_t opal_atomic_and_fetch_64(volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_fetch_and_64(volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_or_fetch_64(volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_fetch_or_64(volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_fetch_xor_64(volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_sub_fetch_64(volatile int64_t *addr, int64_t delta); +static inline int64_t opal_atomic_fetch_sub_64(volatile int64_t *addr, int64_t delta); +static inline int64_t opal_atomic_min_fetch_64 (volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_fetch_min_64 (volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_max_fetch_64 (volatile int64_t *addr, int64_t value); +static inline int64_t opal_atomic_fetch_max_64 (volatile int64_t *addr, int64_t value); -#endif /* OPAL_HAVE_ATOMIC_MATH_32 */ +#endif /* OPAL_HAVE_ATOMIC_MATH_64 */ #if ! OPAL_HAVE_ATOMIC_MATH_64 /* fix up the value of opal_have_atomic_math_64 to allow for C versions */ #undef OPAL_HAVE_ATOMIC_MATH_64 -#define OPAL_HAVE_ATOMIC_MATH_64 OPAL_HAVE_ATOMIC_CMPSET_64 +#define OPAL_HAVE_ATOMIC_MATH_64 OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 #endif /* provide a size_t add/subtract. 
When in debug mode, make it an @@ -457,114 +458,141 @@ int64_t opal_atomic_sub_64(volatile int64_t *addr, int64_t delta); */ #if defined(DOXYGEN) || OPAL_ENABLE_DEBUG static inline size_t -opal_atomic_add_size_t(volatile size_t *addr, int delta) +opal_atomic_add_fetch_size_t(volatile size_t *addr, size_t delta) { #if SIZEOF_SIZE_T == 4 - return (size_t) opal_atomic_add_32((int32_t*) addr, delta); + return (size_t) opal_atomic_add_fetch_32((int32_t*) addr, delta); #elif SIZEOF_SIZE_T == 8 - return (size_t) opal_atomic_add_64((int64_t*) addr, delta); + return (size_t) opal_atomic_add_fetch_64((int64_t*) addr, delta); #else #error "Unknown size_t size" #endif } + static inline size_t -opal_atomic_sub_size_t(volatile size_t *addr, int delta) +opal_atomic_fetch_add_size_t(volatile size_t *addr, size_t delta) { #if SIZEOF_SIZE_T == 4 - return (size_t) opal_atomic_sub_32((int32_t*) addr, delta); + return (size_t) opal_atomic_fetch_add_32((int32_t*) addr, delta); #elif SIZEOF_SIZE_T == 8 - return (size_t) opal_atomic_sub_64((int64_t*) addr, delta); + return (size_t) opal_atomic_fetch_add_64((int64_t*) addr, delta); #else #error "Unknown size_t size" #endif } + +static inline size_t +opal_atomic_sub_fetch_size_t(volatile size_t *addr, size_t delta) +{ +#if SIZEOF_SIZE_T == 4 + return (size_t) opal_atomic_sub_fetch_32((int32_t*) addr, delta); +#elif SIZEOF_SIZE_T == 8 + return (size_t) opal_atomic_sub_fetch_64((int64_t*) addr, delta); #else +#error "Unknown size_t size" +#endif +} + +static inline size_t +opal_atomic_fetch_sub_size_t(volatile size_t *addr, size_t delta) +{ #if SIZEOF_SIZE_T == 4 -#define opal_atomic_add_size_t(addr, delta) ((size_t) opal_atomic_add_32((int32_t*) addr, delta)) -#define opal_atomic_sub_size_t(addr, delta) ((size_t) opal_atomic_sub_32((int32_t*) addr, delta)) -#elif SIZEOF_SIZE_T ==8 -#define opal_atomic_add_size_t(addr, delta) ((size_t) opal_atomic_add_64((int64_t*) addr, delta)) -#define opal_atomic_sub_size_t(addr, delta) ((size_t) 
opal_atomic_sub_64((int64_t*) addr, delta)) + return (size_t) opal_atomic_fetch_sub_32((int32_t*) addr, delta); +#elif SIZEOF_SIZE_T == 8 + return (size_t) opal_atomic_fetch_sub_64((int64_t*) addr, delta); +#else +#error "Unknown size_t size" +#endif +} + +#else +#if SIZEOF_SIZE_T == 4 +#define opal_atomic_add_fetch_size_t(addr, delta) ((size_t) opal_atomic_add_fetch_32((volatile int32_t *) addr, delta)) +#define opal_atomic_fetch_add_size_t(addr, delta) ((size_t) opal_atomic_fetch_add_32((volatile int32_t *) addr, delta)) +#define opal_atomic_sub_fetch_size_t(addr, delta) ((size_t) opal_atomic_sub_fetch_32((volatile int32_t *) addr, delta)) +#define opal_atomic_fetch_sub_size_t(addr, delta) ((size_t) opal_atomic_fetch_sub_32((volatile int32_t *) addr, delta)) +#elif SIZEOF_SIZE_T == 8 +#define opal_atomic_add_fetch_size_t(addr, delta) ((size_t) opal_atomic_add_fetch_64((volatile int64_t *) addr, delta)) +#define opal_atomic_fetch_add_size_t(addr, delta) ((size_t) opal_atomic_fetch_add_64((volatile int64_t *) addr, delta)) +#define opal_atomic_sub_fetch_size_t(addr, delta) ((size_t) opal_atomic_sub_fetch_64((volatile int64_t *) addr, delta)) +#define opal_atomic_fetch_sub_size_t(addr, delta) ((size_t) opal_atomic_fetch_sub_64((volatile int64_t *) addr, delta)) #else #error "Unknown size_t size" #endif #endif -#if defined(DOXYGEN) || (OPAL_HAVE_ATOMIC_CMPSET_32 || OPAL_HAVE_ATOMIC_CMPSET_64) +#if defined(DOXYGEN) || (OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) /* these are always done with inline functions, so always mark as static inline */ -static inline int opal_atomic_cmpset_xx(volatile void* addr, int64_t oldval, - int64_t newval, size_t length); -static inline int opal_atomic_cmpset_acq_xx(volatile void* addr, - int64_t oldval, int64_t newval, - size_t length); -static inline int opal_atomic_cmpset_rel_xx(volatile void* addr, - int64_t oldval, int64_t newval, - size_t length); - -static inline int 
opal_atomic_cmpset_ptr(volatile void* addr, - void* oldval, - void* newval); -static inline int opal_atomic_cmpset_acq_ptr(volatile void* addr, - void* oldval, - void* newval); -static inline int opal_atomic_cmpset_rel_ptr(volatile void* addr, - void* oldval, - void* newval); + +static inline bool opal_atomic_compare_exchange_strong_xx (volatile void *addr, void *oldval, + int64_t newval, size_t length); +static inline bool opal_atomic_compare_exchange_strong_acq_xx (volatile void *addr, void *oldval, + int64_t newval, size_t length); +static inline bool opal_atomic_compare_exchange_strong_rel_xx (volatile void *addr, void *oldval, + int64_t newval, size_t length); + + +static inline bool opal_atomic_compare_exchange_strong_ptr (volatile void* addr, void *oldval, + void *newval); +static inline bool opal_atomic_compare_exchange_strong_acq_ptr (volatile void* addr, void *oldval, + void *newval); +static inline bool opal_atomic_compare_exchange_strong_rel_ptr (volatile void* addr, void *oldval, + void *newval); /** - * Atomic compare and set of pointer with relaxed semantics. This + * Atomic compare and set of generic type with relaxed semantics. This * macro detect at compile time the type of the first argument and * choose the correct function to be called. * * \note This macro should only be used for integer types. * * @param addr Address of . - * @param oldval Comparison value . + * @param oldval Comparison value address of . * @param newval New value to set if comparision is true . * - * See opal_atomic_cmpset_* for pseudo-code. + * See opal_atomic_compare_exchange_* for pseudo-code. 
*/ -#define opal_atomic_cmpset( ADDR, OLDVAL, NEWVAL ) \ - opal_atomic_cmpset_xx( (volatile void*)(ADDR), (intptr_t)(OLDVAL), \ - (intptr_t)(NEWVAL), sizeof(*(ADDR)) ) +#define opal_atomic_compare_exchange_strong( ADDR, OLDVAL, NEWVAL ) \ + opal_atomic_compare_exchange_strong_xx( (volatile void*)(ADDR), (void *)(OLDVAL), \ + (intptr_t)(NEWVAL), sizeof(*(ADDR)) ) /** - * Atomic compare and set of pointer with acquire semantics. This - * macro detect at compile time the type of the first argument - * and choose the correct function to be called. + * Atomic compare and set of generic type with acquire semantics. This + * macro detect at compile time the type of the first argument and + * choose the correct function to be called. * * \note This macro should only be used for integer types. * * @param addr Address of . - * @param oldval Comparison value . + * @param oldval Comparison value address of . * @param newval New value to set if comparision is true . * - * See opal_atomic_cmpset_acq_* for pseudo-code. + * See opal_atomic_compare_exchange_acq_* for pseudo-code. */ -#define opal_atomic_cmpset_acq( ADDR, OLDVAL, NEWVAL ) \ - opal_atomic_cmpset_acq_xx( (volatile void*)(ADDR), (int64_t)(OLDVAL), \ - (int64_t)(NEWVAL), sizeof(*(ADDR)) ) - +#define opal_atomic_compare_exchange_strong_acq( ADDR, OLDVAL, NEWVAL ) \ + opal_atomic_compare_exchange_strong_acq_xx( (volatile void*)(ADDR), (void *)(OLDVAL), \ + (intptr_t)(NEWVAL), sizeof(*(ADDR)) ) /** - * Atomic compare and set of pointer with release semantics. This - * macro detect at compile time the type of the first argument - * and choose the correct function to b + * Atomic compare and set of generic type with release semantics. This + * macro detect at compile time the type of the first argument and + * choose the correct function to be called. * * \note This macro should only be used for integer types. * * @param addr Address of . - * @param oldval Comparison value . + * @param oldval Comparison value address of . 
* @param newval New value to set if comparision is true . * - * See opal_atomic_cmpsetrel_* for pseudo-code. + * See opal_atomic_compare_exchange_rel_* for pseudo-code. */ -#define opal_atomic_cmpset_rel( ADDR, OLDVAL, NEWVAL ) \ - opal_atomic_cmpset_rel_xx( (volatile void*)(ADDR), (int64_t)(OLDVAL), \ - (int64_t)(NEWVAL), sizeof(*(ADDR)) ) +#define opal_atomic_compare_exchange_strong_rel( ADDR, OLDVAL, NEWVAL ) \ + opal_atomic_compare_exchange_strong_rel_xx( (volatile void*)(ADDR), (void *)(OLDVAL), \ + (intptr_t)(NEWVAL), sizeof(*(ADDR)) ) + -#endif /* (OPAL_HAVE_ATOMIC_CMPSET_32 || OPAL_HAVE_ATOMIC_CMPSET_64) */ +#endif /* (OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) */ #if defined(DOXYGEN) || (OPAL_HAVE_ATOMIC_MATH_32 || OPAL_HAVE_ATOMIC_MATH_64) @@ -572,15 +600,11 @@ static inline void opal_atomic_add_xx(volatile void* addr, int32_t value, size_t length); static inline void opal_atomic_sub_xx(volatile void* addr, int32_t value, size_t length); -#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_CMPSET_32 -static inline int32_t opal_atomic_add_ptr( volatile void* addr, void* delta ); -static inline int32_t opal_atomic_sub_ptr( volatile void* addr, void* delta ); -#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_CMPSET_64 -static inline int64_t opal_atomic_add_ptr( volatile void* addr, void* delta ); -static inline int64_t opal_atomic_sub_ptr( volatile void* addr, void* delta ); -#else -#error Atomic arithmetic on pointers not supported -#endif + +static inline intptr_t opal_atomic_add_fetch_ptr( volatile void* addr, void* delta ); +static inline intptr_t opal_atomic_fetch_add_ptr( volatile void* addr, void* delta ); +static inline intptr_t opal_atomic_sub_fetch_ptr( volatile void* addr, void* delta ); +static inline intptr_t opal_atomic_fetch_sub_ptr( volatile void* addr, void* delta ); /** * Atomically increment the content depending on the type. 
This diff --git a/opal/include/opal/sys/atomic_impl.h b/opal/include/opal/sys/atomic_impl.h index 16b03b485f3..0eef41eb49a 100644 --- a/opal/include/opal/sys/atomic_impl.h +++ b/opal/include/opal/sys/atomic_impl.h @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2010-2014 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -34,20 +34,63 @@ * * Some architectures do not provide support for the 64 bits * atomic operations. Until we find a better solution let's just - * undefine all those functions if there is no 64 bit cmpset + * undefine all those functions if there is no 64 bit compare-exchange * *********************************************************************/ -#if OPAL_HAVE_ATOMIC_CMPSET_32 +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 + +#if !defined(OPAL_HAVE_ATOMIC_MIN_32) +static inline int32_t opal_atomic_fetch_min_32 (volatile int32_t *addr, int32_t value) +{ + int32_t old = *addr; + do { + if (old <= value) { + break; + } + } while (!opal_atomic_compare_exchange_strong_32 (addr, &old, value)); + + return old; +} + +#define OPAL_HAVE_ATOMIC_MIN_32 1 + +#endif /* OPAL_HAVE_ATOMIC_MIN_32 */ + +#if !defined(OPAL_HAVE_ATOMIC_MAX_32) +static inline int32_t opal_atomic_fetch_max_32 (volatile int32_t *addr, int32_t value) +{ + int32_t old = *addr; + do { + if (old >= value) { + break; + } + } while (!opal_atomic_compare_exchange_strong_32 (addr, &old, value)); + + return old; +} + +#define OPAL_HAVE_ATOMIC_MAX_32 1 +#endif /* OPAL_HAVE_ATOMIC_MAX_32 */ + +#define OPAL_ATOMIC_DEFINE_CMPXCG_OP(type, bits, operation, name) \ + static inline type opal_atomic_fetch_ ## name ## _ ## bits (volatile type *addr, type value) \ + { \ + type oldval; \ + do { \ + oldval = *addr; \ + } while (!opal_atomic_compare_exchange_strong_ ## 
bits (addr, &oldval, oldval operation value)); \ + \ + return oldval; \ + } #if !defined(OPAL_HAVE_ATOMIC_SWAP_32) #define OPAL_HAVE_ATOMIC_SWAP_32 1 static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, int32_t newval) { - int32_t old; + int32_t old = *addr; do { - old = *addr; - } while (0 == opal_atomic_cmpset_32(addr, old, newval)); + } while (!opal_atomic_compare_exchange_strong_32 (addr, &old, newval)); return old; } @@ -55,214 +98,201 @@ static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, #if !defined(OPAL_HAVE_ATOMIC_ADD_32) #define OPAL_HAVE_ATOMIC_ADD_32 1 -static inline int32_t -opal_atomic_add_32(volatile int32_t *addr, int delta) -{ - int32_t oldval; - do { - oldval = *addr; - } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval + delta)); - return (oldval + delta); -} +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int32_t, 32, +, add) + #endif /* OPAL_HAVE_ATOMIC_ADD_32 */ +#if !defined(OPAL_HAVE_ATOMIC_AND_32) +#define OPAL_HAVE_ATOMIC_AND_32 1 + +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int32_t, 32, &, and) + +#endif /* OPAL_HAVE_ATOMIC_AND_32 */ + +#if !defined(OPAL_HAVE_ATOMIC_OR_32) +#define OPAL_HAVE_ATOMIC_OR_32 1 + +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int32_t, 32, |, or) + +#endif /* OPAL_HAVE_ATOMIC_OR_32 */ + +#if !defined(OPAL_HAVE_ATOMIC_XOR_32) +#define OPAL_HAVE_ATOMIC_XOR_32 1 + +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int32_t, 32, ^, xor) + +#endif /* OPAL_HAVE_ATOMIC_XOR_32 */ + #if !defined(OPAL_HAVE_ATOMIC_SUB_32) #define OPAL_HAVE_ATOMIC_SUB_32 1 -static inline int32_t -opal_atomic_sub_32(volatile int32_t *addr, int delta) -{ - int32_t oldval; - do { - oldval = *addr; - } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval - delta)); - return (oldval - delta); -} +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int32_t, 32, -, sub) + #endif /* OPAL_HAVE_ATOMIC_SUB_32 */ -#endif /* OPAL_HAVE_ATOMIC_CMPSET_32 */ +#endif /* OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 */ -#if OPAL_HAVE_ATOMIC_CMPSET_64 +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 -#if 
!defined(OPAL_HAVE_ATOMIC_SWAP_64) -#define OPAL_HAVE_ATOMIC_SWAP_64 1 -static inline int64_t opal_atomic_swap_64(volatile int64_t *addr, - int64_t newval) +#if !defined(OPAL_HAVE_ATOMIC_MIN_64) +static inline int64_t opal_atomic_fetch_min_64 (volatile int64_t *addr, int64_t value) { - int64_t old; + int64_t old = *addr; do { - old = *addr; - } while (0 == opal_atomic_cmpset_64(addr, old, newval)); + if (old <= value) { + break; + } + } while (!opal_atomic_compare_exchange_strong_64 (addr, &old, value)); + return old; } -#endif /* OPAL_HAVE_ATOMIC_SWAP_32 */ -#if !defined(OPAL_HAVE_ATOMIC_ADD_64) -#define OPAL_HAVE_ATOMIC_ADD_64 1 -static inline int64_t -opal_atomic_add_64(volatile int64_t *addr, int64_t delta) +#define OPAL_HAVE_ATOMIC_MIN_64 1 + +#endif /* OPAL_HAVE_ATOMIC_MIN_64 */ + +#if !defined(OPAL_HAVE_ATOMIC_MAX_64) +static inline int64_t opal_atomic_fetch_max_64 (volatile int64_t *addr, int64_t value) { - int64_t oldval; + int64_t old = *addr; + do { + if (old >= value) { + break; + } + } while (!opal_atomic_compare_exchange_strong_64 (addr, &old, value)); - do { - oldval = *addr; - } while (0 == opal_atomic_cmpset_64(addr, oldval, oldval + delta)); - return (oldval + delta); + return old; } -#endif /* OPAL_HAVE_ATOMIC_ADD_64 */ +#define OPAL_HAVE_ATOMIC_MAX_64 1 +#endif /* OPAL_HAVE_ATOMIC_MAX_64 */ -#if !defined(OPAL_HAVE_ATOMIC_SUB_64) -#define OPAL_HAVE_ATOMIC_SUB_64 1 -static inline int64_t -opal_atomic_sub_64(volatile int64_t *addr, int64_t delta) +#if !defined(OPAL_HAVE_ATOMIC_SWAP_64) +#define OPAL_HAVE_ATOMIC_SWAP_64 1 +static inline int64_t opal_atomic_swap_64(volatile int64_t *addr, + int64_t newval) { - int64_t oldval; - + int64_t old = *addr; do { - oldval = *addr; - } while (0 == opal_atomic_cmpset_64(addr, oldval, oldval - delta)); - return (oldval - delta); -} -#endif /* OPAL_HAVE_ATOMIC_SUB_64 */ + } while (!opal_atomic_compare_exchange_strong_64 (addr, &old, newval)); -#else + return old; +} +#endif /* OPAL_HAVE_ATOMIC_SWAP_64 */ #if 
!defined(OPAL_HAVE_ATOMIC_ADD_64) -#define OPAL_HAVE_ATOMIC_ADD_64 0 -#endif +#define OPAL_HAVE_ATOMIC_ADD_64 1 -#if !defined(OPAL_HAVE_ATOMIC_SUB_64) -#define OPAL_HAVE_ATOMIC_SUB_64 0 -#endif +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int64_t, 64, +, add) -#endif /* OPAL_HAVE_ATOMIC_CMPSET_64 */ +#endif /* OPAL_HAVE_ATOMIC_ADD_64 */ +#if !defined(OPAL_HAVE_ATOMIC_AND_64) +#define OPAL_HAVE_ATOMIC_AND_64 1 -#if (OPAL_HAVE_ATOMIC_CMPSET_32 || OPAL_HAVE_ATOMIC_CMPSET_64) +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int64_t, 64, &, and) -static inline int -opal_atomic_cmpset_xx(volatile void* addr, int64_t oldval, - int64_t newval, size_t length) -{ - switch( length ) { -#if OPAL_HAVE_ATOMIC_CMPSET_32 - case 4: - return opal_atomic_cmpset_32( (volatile int32_t*)addr, - (int32_t)oldval, (int32_t)newval ); -#endif /* OPAL_HAVE_ATOMIC_CMPSET_32 */ +#endif /* OPAL_HAVE_ATOMIC_AND_64 */ -#if OPAL_HAVE_ATOMIC_CMPSET_64 - case 8: - return opal_atomic_cmpset_64( (volatile int64_t*)addr, - (int64_t)oldval, (int64_t)newval ); -#endif /* OPAL_HAVE_ATOMIC_CMPSET_64 */ - } - abort(); - /* This should never happen, so deliberately abort (hopefully - leaving a corefile for analysis) */ -} +#if !defined(OPAL_HAVE_ATOMIC_OR_64) +#define OPAL_HAVE_ATOMIC_OR_64 1 +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int64_t, 64, |, or) -static inline int -opal_atomic_cmpset_acq_xx(volatile void* addr, int64_t oldval, - int64_t newval, size_t length) -{ - switch( length ) { -#if OPAL_HAVE_ATOMIC_CMPSET_32 - case 4: - return opal_atomic_cmpset_acq_32( (volatile int32_t*)addr, - (int32_t)oldval, (int32_t)newval ); -#endif /* OPAL_HAVE_ATOMIC_CMPSET_32 */ +#endif /* OPAL_HAVE_ATOMIC_OR_64 */ -#if OPAL_HAVE_ATOMIC_CMPSET_64 - case 8: - return opal_atomic_cmpset_acq_64( (volatile int64_t*)addr, - (int64_t)oldval, (int64_t)newval ); -#endif /* OPAL_HAVE_ATOMIC_CMPSET_64 */ - } - /* This should never happen, so deliberately abort (hopefully - leaving a corefile for analysis) */ - abort(); -} +#if !defined(OPAL_HAVE_ATOMIC_XOR_64) +#define 
OPAL_HAVE_ATOMIC_XOR_64 1 +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int64_t, 64, ^, xor) -static inline int -opal_atomic_cmpset_rel_xx(volatile void* addr, int64_t oldval, - int64_t newval, size_t length) -{ - switch( length ) { -#if OPAL_HAVE_ATOMIC_CMPSET_32 - case 4: - return opal_atomic_cmpset_rel_32( (volatile int32_t*)addr, - (int32_t)oldval, (int32_t)newval ); -#endif /* OPAL_HAVE_ATOMIC_CMPSET_32 */ +#endif /* OPAL_HAVE_ATOMIC_XOR_64 */ -#if OPAL_HAVE_ATOMIC_CMPSET_64 - case 8: - return opal_atomic_cmpset_rel_64( (volatile int64_t*)addr, - (int64_t)oldval, (int64_t)newval ); -#endif /* OPAL_HAVE_ATOMIC_CMPSET_64 */ - } - /* This should never happen, so deliberately abort (hopefully - leaving a corefile for analysis) */ - abort(); -} +#if !defined(OPAL_HAVE_ATOMIC_SUB_64) +#define OPAL_HAVE_ATOMIC_SUB_64 1 +OPAL_ATOMIC_DEFINE_CMPXCG_OP(int64_t, 64, -, sub) + +#endif /* OPAL_HAVE_ATOMIC_SUB_64 */ -static inline int -opal_atomic_cmpset_ptr(volatile void* addr, - void* oldval, - void* newval) -{ -#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_CMPSET_32 - return opal_atomic_cmpset_32((int32_t*) addr, (unsigned long) oldval, - (unsigned long) newval); -#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_CMPSET_64 - return opal_atomic_cmpset_64((int64_t*) addr, (unsigned long) oldval, - (unsigned long) newval); #else - abort(); + +#if !defined(OPAL_HAVE_ATOMIC_ADD_64) +#define OPAL_HAVE_ATOMIC_ADD_64 0 #endif -} +#if !defined(OPAL_HAVE_ATOMIC_SUB_64) +#define OPAL_HAVE_ATOMIC_SUB_64 0 +#endif -static inline int -opal_atomic_cmpset_acq_ptr(volatile void* addr, - void* oldval, - void* newval) -{ -#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_CMPSET_32 - return opal_atomic_cmpset_acq_32((int32_t*) addr, (unsigned long) oldval, - (unsigned long) newval); -#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_CMPSET_64 - return opal_atomic_cmpset_acq_64((int64_t*) addr, (unsigned long) oldval, - (unsigned long) newval); +#endif /* OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 */ + +#if 
(OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) + +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 && OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 +#define OPAL_ATOMIC_DEFINE_CMPXCG_XX(semantics) \ + static inline bool \ + opal_atomic_compare_exchange_strong ## semantics ## xx (volatile void* addr, void *oldval, \ + int64_t newval, const size_t length) \ + { \ + switch (length) { \ + case 4: \ + return opal_atomic_compare_exchange_strong_32 ((volatile int32_t *) addr, \ + (int32_t *) oldval, (int32_t) newval); \ + case 8: \ + return opal_atomic_compare_exchange_strong_64 ((volatile int64_t *) addr, \ + (int64_t *) oldval, (int64_t) newval); \ + } \ + abort(); \ + } +#elif OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 +#define OPAL_ATOMIC_DEFINE_CMPXCG_XX(semantics) \ + static inline bool \ + opal_atomic_compare_exchange_strong ## semantics ## xx (volatile void* addr, void *oldval, \ + int64_t newval, const size_t length) \ + { \ + switch (length) { \ + case 4: \ + return opal_atomic_compare_exchange_strong_32 ((volatile int32_t *) addr, \ + (int32_t *) oldval, (int32_t) newval); \ + } \ + abort(); \ + } #else - abort(); +#error "Platform does not have required atomic compare-and-swap functionality" #endif -} - -static inline int opal_atomic_cmpset_rel_ptr(volatile void* addr, - void* oldval, - void* newval) -{ -#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_CMPSET_32 - return opal_atomic_cmpset_rel_32((int32_t*) addr, (unsigned long) oldval, - (unsigned long) newval); -#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_CMPSET_64 - return opal_atomic_cmpset_rel_64((int64_t*) addr, (unsigned long) oldval, - (unsigned long) newval); +OPAL_ATOMIC_DEFINE_CMPXCG_XX(_) +OPAL_ATOMIC_DEFINE_CMPXCG_XX(_acq_) +OPAL_ATOMIC_DEFINE_CMPXCG_XX(_rel_) + +#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 +#define OPAL_ATOMIC_DEFINE_CMPXCG_PTR_XX(semantics) \ + static inline bool \ + opal_atomic_compare_exchange_strong ## semantics ## ptr (volatile void* addr, void *oldval, 
void *newval) \ + { \ + return opal_atomic_compare_exchange_strong_32 ((volatile int32_t *) addr, (int32_t *) oldval, (int32_t) newval); \ + } +#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 +#define OPAL_ATOMIC_DEFINE_CMPXCG_PTR_XX(semantics) \ + static inline bool \ + opal_atomic_compare_exchange_strong ## semantics ## ptr (volatile void* addr, void *oldval, void *newval) \ + { \ + return opal_atomic_compare_exchange_strong_64 ((volatile int64_t *) addr, (int64_t *) oldval, (int64_t) newval); \ + } #else - abort(); +#error "Can not define opal_atomic_compare_exchange_strong_ptr with existing atomics" #endif -} -#endif /* (OPAL_HAVE_ATOMIC_CMPSET_32 || OPAL_HAVE_ATOMIC_CMPSET_64) */ +OPAL_ATOMIC_DEFINE_CMPXCG_PTR_XX(_) +OPAL_ATOMIC_DEFINE_CMPXCG_PTR_XX(_acq_) +OPAL_ATOMIC_DEFINE_CMPXCG_PTR_XX(_rel_) + +#endif /* (OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 || OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64) */ + #if (OPAL_HAVE_ATOMIC_SWAP_32 || OPAL_HAVE_ATOMIC_SWAP_64) @@ -300,20 +330,19 @@ static inline int opal_atomic_cmpset_rel_ptr(volatile void* addr, #if OPAL_HAVE_ATOMIC_MATH_32 || OPAL_HAVE_ATOMIC_MATH_64 - static inline void -opal_atomic_add_xx(volatile void* addr, int32_t value, size_t length) + opal_atomic_add_xx(volatile void* addr, int32_t value, size_t length) { switch( length ) { #if OPAL_HAVE_ATOMIC_ADD_32 case 4: - opal_atomic_add_32( (volatile int32_t*)addr, (int32_t)value ); + (void) opal_atomic_fetch_add_32( (volatile int32_t*)addr, (int32_t)value ); break; -#endif /* OPAL_HAVE_ATOMIC_CMPSET_32 */ +#endif /* OPAL_HAVE_ATOMIC_ADD_32 */ #if OPAL_HAVE_ATOMIC_ADD_64 case 8: - opal_atomic_add_64( (volatile int64_t*)addr, (int64_t)value ); + (void) opal_atomic_fetch_add_64( (volatile int64_t*)addr, (int64_t)value ); break; #endif /* OPAL_HAVE_ATOMIC_ADD_64 */ default: @@ -330,13 +359,13 @@ opal_atomic_sub_xx(volatile void* addr, int32_t value, size_t length) switch( length ) { #if OPAL_HAVE_ATOMIC_SUB_32 case 4: - opal_atomic_sub_32(
(volatile int32_t*)addr, (int32_t)value ); + (void) opal_atomic_fetch_sub_32( (volatile int32_t*)addr, (int32_t)value ); break; #endif /* OPAL_HAVE_ATOMIC_SUB_32 */ #if OPAL_HAVE_ATOMIC_SUB_64 case 8: - opal_atomic_sub_64( (volatile int64_t*)addr, (int64_t)value ); + (void) opal_atomic_fetch_sub_64( (volatile int64_t*)addr, (int64_t)value ); break; #endif /* OPAL_HAVE_ATOMIC_SUB_64 */ default: @@ -346,47 +375,102 @@ opal_atomic_sub_xx(volatile void* addr, int32_t value, size_t length) } } -#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_ADD_32 -static inline int32_t opal_atomic_add_ptr( volatile void* addr, - void* delta ) +#define OPAL_ATOMIC_DEFINE_OP_FETCH(op, operation, type, ptr_type, suffix) \ + static inline type opal_atomic_ ## op ## _fetch_ ## suffix (volatile ptr_type *addr, type value) \ + { \ + return opal_atomic_fetch_ ## op ## _ ## suffix (addr, value) operation value; \ + } + +OPAL_ATOMIC_DEFINE_OP_FETCH(add, +, int32_t, int32_t, 32) +OPAL_ATOMIC_DEFINE_OP_FETCH(and, &, int32_t, int32_t, 32) +OPAL_ATOMIC_DEFINE_OP_FETCH(or, |, int32_t, int32_t, 32) +OPAL_ATOMIC_DEFINE_OP_FETCH(xor, ^, int32_t, int32_t, 32) +OPAL_ATOMIC_DEFINE_OP_FETCH(sub, -, int32_t, int32_t, 32) + +static inline int32_t opal_atomic_min_fetch_32 (volatile int32_t *addr, int32_t value) { - return opal_atomic_add_32((int32_t*) addr, (unsigned long) delta); + int32_t old = opal_atomic_fetch_min_32 (addr, value); + return old <= value ? old : value; } -#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_ADD_64 -static inline int64_t opal_atomic_add_ptr( volatile void* addr, - void* delta ) + +static inline int32_t opal_atomic_max_fetch_32 (volatile int32_t *addr, int32_t value) { - return opal_atomic_add_64((int64_t*) addr, (unsigned long) delta); + int32_t old = opal_atomic_fetch_max_32 (addr, value); + return old >= value ? 
old : value; } -#else -static inline int32_t opal_atomic_add_ptr( volatile void* addr, - void* delta ) + +#if OPAL_HAVE_ATOMIC_MATH_64 +OPAL_ATOMIC_DEFINE_OP_FETCH(add, +, int64_t, int64_t, 64) +OPAL_ATOMIC_DEFINE_OP_FETCH(and, &, int64_t, int64_t, 64) +OPAL_ATOMIC_DEFINE_OP_FETCH(or, |, int64_t, int64_t, 64) +OPAL_ATOMIC_DEFINE_OP_FETCH(xor, ^, int64_t, int64_t, 64) +OPAL_ATOMIC_DEFINE_OP_FETCH(sub, -, int64_t, int64_t, 64) + +static inline int64_t opal_atomic_min_fetch_64 (volatile int64_t *addr, int64_t value) { - abort(); - return 0; + int64_t old = opal_atomic_fetch_min_64 (addr, value); + return old <= value ? old : value; } + +static inline int64_t opal_atomic_max_fetch_64 (volatile int64_t *addr, int64_t value) +{ + int64_t old = opal_atomic_fetch_max_64 (addr, value); + return old >= value ? old : value; +} + #endif -#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_SUB_32 -static inline int32_t opal_atomic_sub_ptr( volatile void* addr, +static inline intptr_t opal_atomic_fetch_add_ptr( volatile void* addr, void* delta ) { - return opal_atomic_sub_32((int32_t*) addr, (unsigned long) delta); +#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_ADD_32 + return opal_atomic_fetch_add_32((int32_t*) addr, (unsigned long) delta); +#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_ADD_64 + return opal_atomic_fetch_add_64((int64_t*) addr, (unsigned long) delta); +#else + abort (); + return 0; +#endif } -#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_SUB_32 -static inline int64_t opal_atomic_sub_ptr( volatile void* addr, + +static inline intptr_t opal_atomic_add_fetch_ptr( volatile void* addr, void* delta ) { - return opal_atomic_sub_64((int64_t*) addr, (unsigned long) delta); -} +#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_ADD_32 + return opal_atomic_add_fetch_32((int32_t*) addr, (unsigned long) delta); +#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_ADD_64 + return opal_atomic_add_fetch_64((int64_t*) addr, (unsigned long) delta); #else -static inline int32_t opal_atomic_sub_ptr( volatile 
void* addr, + abort (); + return 0; +#endif +} + +static inline intptr_t opal_atomic_fetch_sub_ptr( volatile void* addr, void* delta ) { +#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_SUB_32 + return opal_atomic_fetch_sub_32((int32_t*) addr, (unsigned long) delta); +#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_SUB_64 + return opal_atomic_fetch_sub_64((int64_t*) addr, (unsigned long) delta); +#else abort(); return 0; +#endif } + +static inline intptr_t opal_atomic_sub_fetch_ptr( volatile void* addr, + void* delta ) +{ +#if SIZEOF_VOID_P == 4 && OPAL_HAVE_ATOMIC_SUB_32 + return opal_atomic_sub_fetch_32((int32_t*) addr, (unsigned long) delta); +#elif SIZEOF_VOID_P == 8 && OPAL_HAVE_ATOMIC_SUB_64 + return opal_atomic_sub_fetch_64((int64_t*) addr, (unsigned long) delta); +#else + abort(); + return 0; #endif +} #endif /* OPAL_HAVE_ATOMIC_MATH_32 || OPAL_HAVE_ATOMIC_MATH_64 */ @@ -401,7 +485,7 @@ static inline int32_t opal_atomic_sub_ptr( volatile void* addr, * Lock initialization function. It set the lock to UNLOCKED. */ static inline void -opal_atomic_init( opal_atomic_lock_t* lock, int32_t value ) +opal_atomic_lock_init( opal_atomic_lock_t* lock, int32_t value ) { lock->u.lock = value; } @@ -410,21 +494,20 @@ opal_atomic_init( opal_atomic_lock_t* lock, int32_t value ) static inline int opal_atomic_trylock(opal_atomic_lock_t *lock) { - int ret = opal_atomic_cmpset_acq_32( &(lock->u.lock), - OPAL_ATOMIC_UNLOCKED, OPAL_ATOMIC_LOCKED); - return (ret == 0) ? 1 : 0; + int32_t unlocked = OPAL_ATOMIC_LOCK_UNLOCKED; + bool ret = opal_atomic_compare_exchange_strong_acq_32 (&lock->u.lock, &unlocked, OPAL_ATOMIC_LOCK_LOCKED); + return (ret == false) ?
1 : 0; } static inline void opal_atomic_lock(opal_atomic_lock_t *lock) { - while( !opal_atomic_cmpset_acq_32( &(lock->u.lock), - OPAL_ATOMIC_UNLOCKED, OPAL_ATOMIC_LOCKED) ) { - while (lock->u.lock == OPAL_ATOMIC_LOCKED) { - /* spin */ ; - } - } + while (opal_atomic_trylock (lock)) { + while (lock->u.lock == OPAL_ATOMIC_LOCK_LOCKED) { + /* spin */ ; + } + } } @@ -432,7 +515,7 @@ static inline void opal_atomic_unlock(opal_atomic_lock_t *lock) { opal_atomic_wmb(); - lock->u.lock=OPAL_ATOMIC_UNLOCKED; + lock->u.lock=OPAL_ATOMIC_LOCK_UNLOCKED; } #endif /* OPAL_HAVE_ATOMIC_SPINLOCKS */ diff --git a/opal/include/opal/sys/cma.h b/opal/include/opal/sys/cma.h index 6304e749505..4211013a328 100644 --- a/opal/include/opal/sys/cma.h +++ b/opal/include/opal/sys/cma.h @@ -82,6 +82,16 @@ #endif +#elif OPAL_ASSEMBLY_ARCH == OPAL_S390 + +#define __NR_process_vm_readv 340 +#define __NR_process_vm_writev 341 + +#elif OPAL_ASSEMBLY_ARCH == OPAL_S390X + +#define __NR_process_vm_readv 340 +#define __NR_process_vm_writev 341 + #else #error "Unsupported architecture for process_vm_readv and process_vm_writev syscalls" #endif diff --git a/opal/include/opal/sys/gcc_builtin/atomic.h b/opal/include/opal/sys/gcc_builtin/atomic.h index 35543119245..c6ef6eb9c30 100644 --- a/opal/include/opal/sys/gcc_builtin/atomic.h +++ b/opal/include/opal/sys/gcc_builtin/atomic.h @@ -11,9 +11,9 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2016 Research Organization for Information Science + * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -25,8 +25,6 @@ #ifndef OPAL_SYS_ARCH_ATOMIC_H #define OPAL_SYS_ARCH_ATOMIC_H 1 -#include - /********************************************************************** * * Memory Barriers @@ -35,13 +33,19 @@ #define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 #define OPAL_HAVE_ATOMIC_MATH_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 #define OPAL_HAVE_ATOMIC_ADD_32 1 +#define OPAL_HAVE_ATOMIC_AND_32 1 +#define OPAL_HAVE_ATOMIC_OR_32 1 +#define OPAL_HAVE_ATOMIC_XOR_32 1 #define OPAL_HAVE_ATOMIC_SUB_32 1 #define OPAL_HAVE_ATOMIC_SWAP_32 1 #define OPAL_HAVE_ATOMIC_MATH_64 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 #define OPAL_HAVE_ATOMIC_ADD_64 1 +#define OPAL_HAVE_ATOMIC_AND_64 1 +#define OPAL_HAVE_ATOMIC_OR_64 1 +#define OPAL_HAVE_ATOMIC_XOR_64 1 #define OPAL_HAVE_ATOMIC_SUB_64 1 #define OPAL_HAVE_ATOMIC_SWAP_64 1 @@ -77,26 +81,20 @@ static inline void opal_atomic_wmb(void) #pragma error_messages(off, E_ARG_INCOMPATIBLE_WITH_ARG_L) #endif -static inline int opal_atomic_cmpset_acq_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, - __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); } -static inline int opal_atomic_cmpset_rel_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, - __ATOMIC_RELEASE, __ATOMIC_RELAXED); + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_RELEASE, __ATOMIC_RELAXED); } -static inline int opal_atomic_cmpset_32( volatile int32_t *addr, - int32_t 
oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, - __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); } static inline int32_t opal_atomic_swap_32 (volatile int32_t *addr, int32_t newval) @@ -106,36 +104,45 @@ static inline int32_t opal_atomic_swap_32 (volatile int32_t *addr, int32_t newva return oldval; } -static inline int32_t opal_atomic_add_32(volatile int32_t *addr, int32_t delta) +static inline int32_t opal_atomic_fetch_add_32(volatile int32_t *addr, int32_t delta) { - return __atomic_add_fetch (addr, delta, __ATOMIC_RELAXED); + return __atomic_fetch_add (addr, delta, __ATOMIC_RELAXED); } -static inline int32_t opal_atomic_sub_32(volatile int32_t *addr, int32_t delta) +static inline int32_t opal_atomic_fetch_and_32(volatile int32_t *addr, int32_t value) { - return __atomic_sub_fetch (addr, delta, __ATOMIC_RELAXED); + return __atomic_fetch_and (addr, value, __ATOMIC_RELAXED); } -static inline int opal_atomic_cmpset_acq_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline int32_t opal_atomic_fetch_or_32(volatile int32_t *addr, int32_t value) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, - __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); + return __atomic_fetch_or (addr, value, __ATOMIC_RELAXED); +} + +static inline int32_t opal_atomic_fetch_xor_32(volatile int32_t *addr, int32_t value) +{ + return __atomic_fetch_xor (addr, value, __ATOMIC_RELAXED); } -static inline int opal_atomic_cmpset_rel_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline int32_t opal_atomic_fetch_sub_32(volatile int32_t *addr, int32_t delta) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, - __ATOMIC_RELEASE, __ATOMIC_RELAXED); + return __atomic_fetch_sub (addr, 
delta, __ATOMIC_RELAXED); } +static inline bool opal_atomic_compare_exchange_strong_acq_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) +{ + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); +} -static inline int opal_atomic_cmpset_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, - __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_RELEASE, __ATOMIC_RELAXED); +} + + +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) +{ + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); } static inline int64_t opal_atomic_swap_64 (volatile int64_t *addr, int64_t newval) @@ -145,37 +152,55 @@ static inline int64_t opal_atomic_swap_64 (volatile int64_t *addr, int64_t newva return oldval; } -static inline int64_t opal_atomic_add_64(volatile int64_t *addr, int64_t delta) +static inline int64_t opal_atomic_fetch_add_64(volatile int64_t *addr, int64_t delta) +{ + return __atomic_fetch_add (addr, delta, __ATOMIC_RELAXED); +} + +static inline int64_t opal_atomic_fetch_and_64(volatile int64_t *addr, int64_t value) +{ + return __atomic_fetch_and (addr, value, __ATOMIC_RELAXED); +} + +static inline int64_t opal_atomic_fetch_or_64(volatile int64_t *addr, int64_t value) +{ + return __atomic_fetch_or (addr, value, __ATOMIC_RELAXED); +} + +static inline int64_t opal_atomic_fetch_xor_64(volatile int64_t *addr, int64_t value) { - return __atomic_add_fetch (addr, delta, __ATOMIC_RELAXED); + return __atomic_fetch_xor (addr, value, __ATOMIC_RELAXED); } -static inline int64_t opal_atomic_sub_64(volatile int64_t *addr, int64_t delta) +static inline int64_t 
opal_atomic_fetch_sub_64(volatile int64_t *addr, int64_t delta) { - return __atomic_sub_fetch (addr, delta, __ATOMIC_RELAXED); + return __atomic_fetch_sub (addr, delta, __ATOMIC_RELAXED); } #if OPAL_HAVE_GCC_BUILTIN_CSWAP_INT128 -#define OPAL_HAVE_ATOMIC_CMPSET_128 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 1 -static inline int opal_atomic_cmpset_128 (volatile opal_int128_t *addr, - opal_int128_t oldval, opal_int128_t newval) +static inline bool opal_atomic_compare_exchange_strong_128 (volatile opal_int128_t *addr, + opal_int128_t *oldval, opal_int128_t newval) { - return __atomic_compare_exchange_n (addr, &oldval, newval, false, + return __atomic_compare_exchange_n (addr, oldval, newval, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); } #elif defined(OPAL_HAVE_SYNC_BUILTIN_CSWAP_INT128) && OPAL_HAVE_SYNC_BUILTIN_CSWAP_INT128 -#define OPAL_HAVE_ATOMIC_CMPSET_128 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 1 /* __atomic version is not lock-free so use legacy __sync version */ -static inline int opal_atomic_cmpset_128 (volatile opal_int128_t *addr, - opal_int128_t oldval, opal_int128_t newval) +static inline bool opal_atomic_compare_exchange_strong_128 (volatile opal_int128_t *addr, + opal_int128_t *oldval, opal_int128_t newval) { - return __sync_bool_compare_and_swap (addr, oldval, newval); + opal_int128_t prev = __sync_val_compare_and_swap (addr, *oldval, newval); + bool ret = prev == *oldval; + *oldval = prev; + return ret; } #endif @@ -186,16 +211,16 @@ static inline int opal_atomic_cmpset_128 (volatile opal_int128_t *addr, #define OPAL_HAVE_ATOMIC_SPINLOCKS 1 -static inline void opal_atomic_init (opal_atomic_lock_t* lock, int32_t value) +static inline void opal_atomic_lock_init (opal_atomic_lock_t* lock, int32_t value) { lock->u.lock = value; } static inline int opal_atomic_trylock(opal_atomic_lock_t *lock) { - int ret = __atomic_exchange_n (&lock->u.lock, OPAL_ATOMIC_LOCKED, + int ret = __atomic_exchange_n (&lock->u.lock, OPAL_ATOMIC_LOCK_LOCKED, 
__ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE); - if (OPAL_ATOMIC_LOCKED == ret) { + if (OPAL_ATOMIC_LOCK_LOCKED == ret) { /* abort the transaction */ _mm_pause (); return 1; @@ -206,7 +231,7 @@ static inline int opal_atomic_trylock(opal_atomic_lock_t *lock) static inline void opal_atomic_lock (opal_atomic_lock_t *lock) { - while (OPAL_ATOMIC_LOCKED == __atomic_exchange_n (&lock->u.lock, OPAL_ATOMIC_LOCKED, + while (OPAL_ATOMIC_LOCK_LOCKED == __atomic_exchange_n (&lock->u.lock, OPAL_ATOMIC_LOCK_LOCKED, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)) { /* abort the transaction */ _mm_pause (); @@ -215,7 +240,7 @@ static inline void opal_atomic_lock (opal_atomic_lock_t *lock) static inline void opal_atomic_unlock (opal_atomic_lock_t *lock) { - __atomic_store_n (&lock->u.lock, OPAL_ATOMIC_UNLOCKED, + __atomic_store_n (&lock->u.lock, OPAL_ATOMIC_LOCK_UNLOCKED, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE); } diff --git a/opal/include/opal/sys/ia32/atomic.h b/opal/include/opal/sys/ia32/atomic.h index 2a053849eb9..bb863dec14a 100644 --- a/opal/include/opal/sys/ia32/atomic.h +++ b/opal/include/opal/sys/ia32/atomic.h @@ -13,7 +13,7 @@ * Copyright (c) 2007-2010 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2015-2017 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -40,17 +40,12 @@ *********************************************************************/ #define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 #define OPAL_HAVE_ATOMIC_MATH_32 1 #define OPAL_HAVE_ATOMIC_ADD_32 1 #define OPAL_HAVE_ATOMIC_SUB_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 - -#undef OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 -#define OPAL_HAVE_INLINE_ATOMIC_CMPSET_64 0 - /********************************************************************** * * Memory Barriers @@ -89,73 +84,23 @@ static inline void opal_atomic_isync(void) *********************************************************************/ #if OPAL_GCC_INLINE_ASSEMBLY -static inline int opal_atomic_cmpset_32(volatile int32_t *addr, - int32_t oldval, - int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { unsigned char ret; __asm__ __volatile__ ( SMPLOCK "cmpxchgl %3,%2 \n\t" "sete %0 \n\t" - : "=qm" (ret), "+a" (oldval), "+m" (*addr) + : "=qm" (ret), "+a" (*oldval), "+m" (*addr) : "q"(newval) : "memory", "cc"); - return (int)ret; -} - -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - -#define opal_atomic_cmpset_acq_32 opal_atomic_cmpset_32 -#define opal_atomic_cmpset_rel_32 opal_atomic_cmpset_32 - -#if OPAL_GCC_INLINE_ASSEMBLY - -#if 0 - -/* some versions of GCC won't let you use ebx period (even though they - should be able to save / restore for the life of the inline - assembly). For the beta, just use the non-inline version */ - -#ifndef ll_low /* GLIBC provides these somewhere, so protect */ -#define ll_low(x) *(((unsigned int*)&(x))+0) -#define ll_high(x) *(((unsigned int*)&(x))+1) -#endif - -/* On Linux the EBX register is used by the shared libraries - * to keep the global offset. In same time this register is - * required by the cmpxchg8b instruction (as an input parameter). 
- * This conflict force us to save the EBX before the cmpxchg8b - * and to restore it afterward. - */ -static inline int opal_atomic_cmpset_64(volatile int64_t *addr, - int64_t oldval, - int64_t newval) -{ - /* - * Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into - * m64. Else, clear ZF and load m64 into EDX:EAX. - */ - unsigned char ret; - - __asm__ __volatile__( - "push %%ebx \n\t" - "movl %4, %%ebx \n\t" - SMPLOCK "cmpxchg8b (%1) \n\t" - "sete %0 \n\t" - "pop %%ebx \n\t" - : "=qm"(ret) - : "D"(addr), "a"(ll_low(oldval)), "d"(ll_high(oldval)), - "r"(ll_low(newval)), "c"(ll_high(newval)) - : "cc", "memory", "ebx"); - return (int) ret; + return (bool) ret; } -#endif /* if 0 */ #endif /* OPAL_GCC_INLINE_ASSEMBLY */ -#define opal_atomic_cmpset_acq_64 opal_atomic_cmpset_64 -#define opal_atomic_cmpset_rel_64 opal_atomic_cmpset_64 +#define opal_atomic_compare_exchange_strong_acq_32 opal_atomic_compare_exchange_strong_32 +#define opal_atomic_compare_exchange_strong_rel_32 opal_atomic_compare_exchange_strong_32 #if OPAL_GCC_INLINE_ASSEMBLY @@ -185,7 +130,7 @@ static inline int32_t opal_atomic_swap_32( volatile int32_t *addr, * * Atomically adds @i to @v. */ -static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) +static inline int32_t opal_atomic_fetch_add_32(volatile int32_t* v, int i) { int ret = i; __asm__ __volatile__( @@ -194,7 +139,7 @@ static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) : :"memory", "cc" ); - return (ret+i); + return ret; } @@ -205,7 +150,7 @@ static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) * * Atomically subtracts @i from @v. 
*/ -static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int i) +static inline int32_t opal_atomic_fetch_sub_32(volatile int32_t* v, int i) { int ret = -i; __asm__ __volatile__( @@ -214,7 +159,7 @@ static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int i) : :"memory", "cc" ); - return (ret-i); + return ret; } #endif /* OPAL_GCC_INLINE_ASSEMBLY */ diff --git a/opal/include/opal/sys/ia64/Makefile.am b/opal/include/opal/sys/ia64/Makefile.am deleted file mode 100644 index b189dc22d44..00000000000 --- a/opal/include/opal/sys/ia64/Makefile.am +++ /dev/null @@ -1,23 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# This makefile.am does not stand on its own - it is included from opal/include/Makefile.am - -headers += \ - opal/sys/ia64/atomic.h \ - opal/sys/ia64/timer.h diff --git a/opal/include/opal/sys/ia64/atomic.h b/opal/include/opal/sys/ia64/atomic.h deleted file mode 100644 index cd77c5214c3..00000000000 --- a/opal/include/opal/sys/ia64/atomic.h +++ /dev/null @@ -1,145 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. 
- * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OPAL_SYS_ARCH_ATOMIC_H -#define OPAL_SYS_ARCH_ATOMIC_H 1 - -/* - * On ia64, we use cmpxchg, which supports acquire/release semantics natively. - */ - - -#define MB() __asm__ __volatile__("mf": : :"memory") - - -/********************************************************************** - * - * Define constants for IA64 - * - *********************************************************************/ -#define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 - -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 - -/********************************************************************** - * - * Memory Barriers - * - *********************************************************************/ -#if OPAL_GCC_INLINE_ASSEMBLY - -static inline void opal_atomic_mb(void) -{ - MB(); -} - - -static inline void opal_atomic_rmb(void) -{ - MB(); -} - - -static inline void opal_atomic_wmb(void) -{ - MB(); -} - -static inline void opal_atomic_isync(void) -{ -} - -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - - -/********************************************************************** - * - * Atomic math operations - * - *********************************************************************/ -#if OPAL_GCC_INLINE_ASSEMBLY - -#define ia64_cmpxchg4_acq(ptr, new, old) \ -({ \ - __u64 ia64_intri_res; \ - ia64_intri_res; \ -}) - -static inline int opal_atomic_cmpset_acq_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - int64_t ret; - - __asm__ __volatile__ ("mov ar.ccv=%0;;" :: "rO"(oldval)); - __asm__ __volatile__ ("cmpxchg4.acq %0=[%1],%2,ar.ccv": - "=r"(ret) : "r"(addr), "r"(newval) : "memory"); - - return ((int32_t)ret == oldval); -} - - -static inline int opal_atomic_cmpset_rel_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - int64_t ret; - - __asm__ __volatile__ ("mov ar.ccv=%0;;" :: 
"rO"(oldval)); - __asm__ __volatile__ ("cmpxchg4.rel %0=[%1],%2,ar.ccv": - "=r"(ret) : "r"(addr), "r"(newval) : "memory"); - - return ((int32_t)ret == oldval); -} - -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - - -#define opal_atomic_cmpset_32 opal_atomic_cmpset_acq_32 - -#if OPAL_GCC_INLINE_ASSEMBLY - -static inline int opal_atomic_cmpset_acq_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - int64_t ret; - - __asm__ __volatile__ ("mov ar.ccv=%0;;" :: "rO"(oldval)); - __asm__ __volatile__ ("cmpxchg8.acq %0=[%1],%2,ar.ccv": - "=r"(ret) : "r"(addr), "r"(newval) : "memory"); - - return (ret == oldval); -} - - -static inline int opal_atomic_cmpset_rel_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - int64_t ret; - - __asm__ __volatile__ ("mov ar.ccv=%0;;" :: "rO"(oldval)); - __asm__ __volatile__ ("cmpxchg8.rel %0=[%1],%2,ar.ccv": - "=r"(ret) : "r"(addr), "r"(newval) : "memory"); - - return (ret == oldval); -} - -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - -#define opal_atomic_cmpset_64 opal_atomic_cmpset_acq_64 - -#endif /* ! OPAL_SYS_ARCH_ATOMIC_H */ diff --git a/opal/include/opal/sys/ia64/timer.h b/opal/include/opal/sys/ia64/timer.h deleted file mode 100644 index 36356730aec..00000000000 --- a/opal/include/opal/sys/ia64/timer.h +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OPAL_SYS_ARCH_TIMER_H -#define OPAL_SYS_ARCH_TIMER_H 1 - - -typedef uint64_t opal_timer_t; - - -#if OPAL_GCC_INLINE_ASSEMBLY - -static inline opal_timer_t -opal_sys_timer_get_cycles(void) -{ - opal_timer_t ret; - - __asm__ __volatile__ ("mov %0=ar.itc" : "=r"(ret)); - - return ret; -} - -#define OPAL_HAVE_SYS_TIMER_GET_CYCLES 1 - -#else - -opal_timer_t opal_sys_timer_get_cycles(void); - -#define OPAL_HAVE_SYS_TIMER_GET_CYCLES 1 - -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - -#endif /* ! OPAL_SYS_ARCH_TIMER_H */ diff --git a/opal/include/opal/sys/ia64/update.sh b/opal/include/opal/sys/ia64/update.sh deleted file mode 100644 index 0f2f4af1eea..00000000000 --- a/opal/include/opal/sys/ia64/update.sh +++ /dev/null @@ -1,37 +0,0 @@ -#!/bin/sh -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -CFILE=/tmp/opal_atomic_$$.c - -trap "/bin/rm -f $CFILE; exit 0" 0 1 2 15 - -echo Updating asm.s from atomic.h and timer.h using gcc - -cat > $CFILE< -#include -#define static -#define inline -#define OPAL_GCC_INLINE_ASSEMBLY 1 -#include "atomic.h" -#include "timer.h" -EOF - -gcc -O1 -I. 
-S $CFILE -o asm.s diff --git a/opal/include/opal/sys/mips/Makefile.am b/opal/include/opal/sys/mips/Makefile.am deleted file mode 100644 index cf7f925b209..00000000000 --- a/opal/include/opal/sys/mips/Makefile.am +++ /dev/null @@ -1,24 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2008 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# This makefile.am does not stand on its own - it is included from opal/include/Makefile.am - -headers += \ - opal/sys/mips/atomic.h \ - opal/sys/mips/timer.h - diff --git a/opal/include/opal/sys/mips/atomic.h b/opal/include/opal/sys/mips/atomic.h deleted file mode 100644 index 95ba1139513..00000000000 --- a/opal/include/opal/sys/mips/atomic.h +++ /dev/null @@ -1,198 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2005 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OPAL_SYS_ARCH_ATOMIC_H -#define OPAL_SYS_ARCH_ATOMIC_H 1 - - -/* BWB - FIX ME! 
*/ -#ifdef __linux__ -#define MB() __asm__ __volatile__(".set mips2; sync; .set mips0": : :"memory") -#define RMB() __asm__ __volatile__(".set mips2; sync; .set mips0": : :"memory") -#define WMB() __asm__ __volatile__(".set mips2; sync; .set mips0": : :"memory") -#define SMP_SYNC ".set mips2; sync; .set mips0" -#else -#define MB() __asm__ __volatile__("sync": : :"memory") -#define RMB() __asm__ __volatile__("sync": : :"memory") -#define WMB() __asm__ __volatile__("sync": : :"memory") -#define SMP_SYNC "sync" -#endif - - -/********************************************************************** - * - * Define constants for MIPS - * - *********************************************************************/ -#define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 - -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 - -#ifdef __mips64 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 -#endif - -/********************************************************************** - * - * Memory Barriers - * - *********************************************************************/ -#if OPAL_GCC_INLINE_ASSEMBLY - -static inline -void opal_atomic_mb(void) -{ - MB(); -} - - -static inline -void opal_atomic_rmb(void) -{ - RMB(); -} - - -static inline -void opal_atomic_wmb(void) -{ - WMB(); -} - -static inline -void opal_atomic_isync(void) -{ -} - -#endif - -/********************************************************************** - * - * Atomic math operations - * - *********************************************************************/ -#if OPAL_GCC_INLINE_ASSEMBLY - -static inline int opal_atomic_cmpset_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - int32_t ret; - - __asm__ __volatile__ (".set noreorder \n" - ".set noat \n" - "1: \n" -#ifdef __linux__ - ".set mips2 \n\t" -#endif - "ll %0, %2 \n" /* load *addr into ret */ - "bne %0, %z3, 2f \n" /* done if oldval != ret */ - "or $1, %z4, 0 \n" /* tmp = newval (delay slot) */ - "sc $1, %2 \n" /* store tmp in *addr */ -#ifdef __linux__ - ".set mips0 \n\t" -#endif - /* 
note: ret will be 0 if failed, 1 if succeeded */ - "beqz $1, 1b \n" /* if 0 jump back to 1b */ - "nop \n" /* fill delay slots */ - "2: \n" - ".set reorder \n" - : "=&r"(ret), "=m"(*addr) - : "m"(*addr), "r"(oldval), "r"(newval) - : "cc", "memory"); - return (ret == oldval); -} - - -/* these two functions aren't inlined in the non-gcc case because then - there would be two function calls (since neither cmpset_32 nor - atomic_?mb can be inlined). Instead, we "inline" them by hand in - the assembly, meaning there is one function call overhead instead - of two */ -static inline int opal_atomic_cmpset_acq_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - int rc; - - rc = opal_atomic_cmpset_32(addr, oldval, newval); - opal_atomic_rmb(); - - return rc; -} - - -static inline int opal_atomic_cmpset_rel_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - opal_atomic_wmb(); - return opal_atomic_cmpset_32(addr, oldval, newval); -} - -#ifdef OPAL_HAVE_ATOMIC_CMPSET_64 -static inline int opal_atomic_cmpset_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - int64_t ret; - - __asm__ __volatile__ (".set noreorder \n" - ".set noat \n" - "1: \n\t" - "lld %0, %2 \n\t" /* load *addr into ret */ - "bne %0, %z3, 2f \n\t" /* done if oldval != ret */ - "or $1, %4, 0 \n\t" /* tmp = newval (delay slot) */ - "scd $1, %2 \n\t" /* store tmp in *addr */ - /* note: ret will be 0 if failed, 1 if succeeded */ - "beqz $1, 1b \n\t" /* if 0 jump back to 1b */ - "nop \n\t" /* fill delay slot */ - "2: \n\t" - ".set reorder \n" - : "=&r" (ret), "=m" (*addr) - : "m" (*addr), "r" (oldval), "r" (newval) - : "cc", "memory"); - - return (ret == oldval); -} - - -/* these two functions aren't inlined in the non-gcc case because then - there would be two function calls (since neither cmpset_64 nor - atomic_?mb can be inlined). 
Instead, we "inline" them by hand in - the assembly, meaning there is one function call overhead instead - of two */ -static inline int opal_atomic_cmpset_acq_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - int rc; - - rc = opal_atomic_cmpset_64(addr, oldval, newval); - opal_atomic_rmb(); - - return rc; -} - - -static inline int opal_atomic_cmpset_rel_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - opal_atomic_wmb(); - return opal_atomic_cmpset_64(addr, oldval, newval); -} -#endif /* OPAL_HAVE_ATOMIC_CMPSET_64 */ - -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - -#endif /* ! OPAL_SYS_ARCH_ATOMIC_H */ diff --git a/opal/include/opal/sys/mips/timer.h b/opal/include/opal/sys/mips/timer.h deleted file mode 100644 index b93689c908d..00000000000 --- a/opal/include/opal/sys/mips/timer.h +++ /dev/null @@ -1,33 +0,0 @@ -/* - * Copyright (c) 2008 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OPAL_SYS_ARCH_TIMER_H -#define OPAL_SYS_ARCH_TIMER_H 1 - -#include - -typedef uint64_t opal_timer_t; - -static inline opal_timer_t -opal_sys_timer_get_cycles(void) -{ - opal_timer_t ret; - struct tms accurate_clock; - - times(&accurate_clock); - ret = accurate_clock.tms_utime + accurate_clock.tms_stime; - - return ret; -} - -#define OPAL_HAVE_SYS_TIMER_GET_CYCLES 1 - -#endif /* ! OPAL_SYS_ARCH_TIMER_H */ diff --git a/opal/include/opal/sys/mips/update.sh b/opal/include/opal/sys/mips/update.sh deleted file mode 100644 index 94d8ed2714b..00000000000 --- a/opal/include/opal/sys/mips/update.sh +++ /dev/null @@ -1,36 +0,0 @@ -#!/bin/sh -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. 
All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -CFILE=/tmp/opal_atomic_$$.c - -trap "/bin/rm -f $CFILE; exit 0" 0 1 2 15 - -echo Updating atomic.s from atomic.h using gcc - -cat > $CFILE< -#include -#define static -#define inline -#define OPAL_GCC_INLINE_ASSEMBLY 1 -#include "../architecture.h" -#include "atomic.h" -EOF - -gcc -O1 -I. -S $CFILE -o atomic.s diff --git a/opal/include/opal/sys/osx/Makefile.am b/opal/include/opal/sys/osx/Makefile.am deleted file mode 100644 index 012ada40296..00000000000 --- a/opal/include/opal/sys/osx/Makefile.am +++ /dev/null @@ -1,24 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2005 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2013 Los Alamos National Security, LLC. All rights -# reserved. 
-# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# This makefile.am does not stand on its own - it is included from opal/include/Makefile.am - -headers += \ - opal/sys/osx/atomic.h diff --git a/opal/include/opal/sys/osx/atomic.h b/opal/include/opal/sys/osx/atomic.h deleted file mode 100644 index f73efc59f07..00000000000 --- a/opal/include/opal/sys/osx/atomic.h +++ /dev/null @@ -1,169 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2010 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2007 Sun Microsystems, Inc. All rights reserverd. - * Copyright (c) 2013 Los Alamos National Security, LLC. All rights - * reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OPAL_SYS_ARCH_ATOMIC_H -#define OPAL_SYS_ARCH_ATOMIC_H 1 - -#include - - -#define MB() OSMemoryBarrier - - -/********************************************************************** - * - * Define constants for OSX/iOS - * - *********************************************************************/ -#define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 -#define OPAL_HAVE_ATOMIC_MATH_32 1 -#define OPAL_HAVE_ATOMIC_MATH_64 1 -#define OPAL_HAVE_ATOMIC_ADD_32 1 -#define OPAL_HAVE_ATOMIC_ADD_64 1 -#define OPAL_HAVE_ATOMIC_SUB_32 1 -#define OPAL_HAVE_ATOMIC_SUB_64 1 -#define OPAL_HAVE_ATOMIC_SPINLOCKS 1 - -/********************************************************************** - * - * Memory Barriers - * - *********************************************************************/ -static inline void opal_atomic_mb(void) -{ - MB(); -} - - -static inline void opal_atomic_rmb(void) -{ - MB(); -} - - -static inline void opal_atomic_wmb(void) -{ - MB(); -} - -static inline void opal_atomic_isync(void) -{ -} - -/********************************************************************** - * - * Atomic math operations - * - *********************************************************************/ -static inline int opal_atomic_cmpset_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) -{ - return OSAtomicCompareAndSwap32 (oldval, newval, addr); -} - -#define opal_atomic_cmpset_acq_32 opal_atomic_cmpset_32 -#define opal_atomic_cmpset_rel_32 opal_atomic_cmpset_32 - - -static inline int opal_atomic_cmpset_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - return OSAtomicCompareAndSwap64 (oldval, newval, addr); -} - -#define opal_atomic_cmpset_acq_64 opal_atomic_cmpset_64 -#define opal_atomic_cmpset_rel_64 opal_atomic_cmpset_64 - -/** - * atomic_add - add integer to atomic variable - * @i: integer value to add - * @v: pointer 
of type int - * - * Atomically adds @i to @v. - */ -static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) -{ - return OSAtomicAdd32 (i, v); -} - -/** - * atomic_add - add integer to atomic variable - * @i: integer value to add - * @v: pointer of type int - * - * Atomically adds @i to @v. - */ -static inline int64_t opal_atomic_add_64(volatile int64_t* v, int64_t i) -{ - return OSAtomicAdd64 (i, v); -} - -/** - * atomic_sub - subtract the atomic variable - * @i: integer value to subtract - * @v: pointer of type int - * - * Atomically subtracts @i from @v. - */ -static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int i) -{ - return OSAtomicAdd32 (-i, v); -} - -/** - * atomic_sub - subtract the atomic variable - * @i: integer value to subtract - * @v: pointer of type int - * - * Atomically subtracts @i from @v. - */ -static inline int64_t opal_atomic_sub_64(volatile int64_t* v, int64_t i) -{ - return OSAtomicAdd64 (-i, v); -} - -static inline void opal_atomic_init(opal_atomic_lock_t* lock, int32_t value) -{ - lock->u.lock = OS_SPINLOCK_INIT; - if (value) { - OSSpinLockLock (&lock->u.lock); - } -} - -static inline int opal_atomic_trylock(opal_atomic_lock_t *lock) -{ - return !OSSpinLockTry (&lock->u.lock); -} - -static inline void opal_atomic_lock(opal_atomic_lock_t *lock) -{ - OSSpinLockLock (&lock->u.lock); -} - -static inline void opal_atomic_unlock(opal_atomic_lock_t *lock) -{ - OSSpinLockUnlock (&lock->u.lock); -} - -#endif /* ! OPAL_SYS_ARCH_ATOMIC_H */ diff --git a/opal/include/opal/sys/powerpc/atomic.h b/opal/include/opal/sys/powerpc/atomic.h index 019b44edb49..bf6978aa852 100644 --- a/opal/include/opal/sys/powerpc/atomic.h +++ b/opal/include/opal/sys/powerpc/atomic.h @@ -10,8 +10,8 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2010 IBM Corporation. All rights reserved. 
- * Copyright (c) 2015-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2010-2017 IBM Corporation. All rights reserved. + * Copyright (c) 2015-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -29,10 +29,8 @@ #define MB() __asm__ __volatile__ ("sync" : : : "memory") #define RMB() __asm__ __volatile__ ("lwsync" : : : "memory") -#define WMB() __asm__ __volatile__ ("eieio" : : : "memory") +#define WMB() __asm__ __volatile__ ("lwsync" : : : "memory") #define ISYNC() __asm__ __volatile__ ("isync" : : : "memory") -#define SMP_SYNC "sync \n\t" -#define SMP_ISYNC "\n\tisync" /********************************************************************** @@ -42,21 +40,27 @@ *********************************************************************/ #define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 #define OPAL_HAVE_ATOMIC_SWAP_32 1 #define OPAL_HAVE_ATOMIC_LLSC_32 1 #define OPAL_HAVE_ATOMIC_MATH_32 1 #define OPAL_HAVE_ATOMIC_ADD_32 1 +#define OPAL_HAVE_ATOMIC_AND_32 1 +#define OPAL_HAVE_ATOMIC_OR_32 1 +#define OPAL_HAVE_ATOMIC_XOR_32 1 #define OPAL_HAVE_ATOMIC_SUB_32 1 #if (OPAL_ASSEMBLY_ARCH == OPAL_POWERPC64) || OPAL_ASM_SUPPORT_64BIT -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 #define OPAL_HAVE_ATOMIC_SWAP_64 1 #define OPAL_HAVE_ATOMIC_LLSC_64 1 #define OPAL_HAVE_ATOMIC_MATH_64 1 #define OPAL_HAVE_ATOMIC_ADD_64 1 +#define OPAL_HAVE_ATOMIC_AND_64 1 +#define OPAL_HAVE_ATOMIC_OR_64 1 +#define OPAL_HAVE_ATOMIC_XOR_64 1 #define OPAL_HAVE_ATOMIC_SUB_64 1 #endif @@ -85,7 +89,7 @@ void opal_atomic_rmb(void) static inline void opal_atomic_wmb(void) { - RMB(); + WMB(); } static inline @@ -111,7 +115,7 @@ void opal_atomic_isync(void) #pragma mc_func opal_atomic_rmb { "7c2004ac" } /* lwsync */ #pragma reg_killed_by opal_atomic_rmb /* none */ -#pragma mc_func opal_atomic_wmb { "7c0006ac" } /* eieio */ +#pragma mc_func 
opal_atomic_wmb { "7c2004ac" } /* lwsync */ #pragma reg_killed_by opal_atomic_wmb /* none */ #endif @@ -140,24 +144,25 @@ void opal_atomic_isync(void) #define OPAL_ASM_VALUE64(x) x #endif - -static inline int opal_atomic_cmpset_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int32_t ret; - - __asm__ __volatile__ ( - "1: lwarx %0, 0, %2 \n\t" - " cmpw 0, %0, %3 \n\t" - " bne- 2f \n\t" - " stwcx. %4, 0, %2 \n\t" - " bne- 1b \n\t" - "2:" - : "=&r" (ret), "=m" (*addr) - : "r" OPAL_ASM_ADDR(addr), "r" (oldval), "r" (newval), "m" (*addr) - : "cc", "memory"); + int32_t prev; + bool ret; + + __asm__ __volatile__ ( + "1: lwarx %0, 0, %2 \n\t" + " cmpw 0, %0, %3 \n\t" + " bne- 2f \n\t" + " stwcx. %4, 0, %2 \n\t" + " bne- 1b \n\t" + "2:" + : "=&r" (prev), "=m" (*addr) + : "r" OPAL_ASM_ADDR(addr), "r" (*oldval), "r" (newval), "m" (*addr) + : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } static inline int32_t opal_atomic_ll_32 (volatile int32_t *addr) @@ -191,23 +196,21 @@ static inline int opal_atomic_sc_32 (volatile int32_t *addr, int32_t newval) atomic_?mb can be inlined). 
Instead, we "inline" them by hand in the assembly, meaning there is one function call overhead instead of two */ -static inline int opal_atomic_cmpset_acq_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int rc; + bool rc; - rc = opal_atomic_cmpset_32(addr, oldval, newval); + rc = opal_atomic_compare_exchange_strong_32 (addr, oldval, newval); opal_atomic_rmb(); return rc; } -static inline int opal_atomic_cmpset_rel_32(volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { opal_atomic_wmb(); - return opal_atomic_cmpset_32(addr, oldval, newval); + return opal_atomic_compare_exchange_strong_32 (addr, oldval, newval); } static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, int32_t newval) @@ -231,55 +234,48 @@ static inline int32_t opal_atomic_swap_32(volatile int32_t *addr, int32_t newval #if OPAL_GCC_INLINE_ASSEMBLY -static inline int64_t opal_atomic_add_64 (volatile int64_t* v, int64_t inc) -{ - int64_t t; - - __asm__ __volatile__("1: ldarx %0, 0, %3 \n\t" - " add %0, %2, %0 \n\t" - " stdcx. %0, 0, %3 \n\t" - " bne- 1b \n\t" - : "=&r" (t), "=m" (*v) - : "r" (OPAL_ASM_VALUE64(inc)), "r" OPAL_ASM_ADDR(v), "m" (*v) - : "cc"); - - return t; +#define OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_64(type, instr) \ +static inline int64_t opal_atomic_fetch_ ## type ## _64(volatile int64_t* v, int64_t val) \ +{ \ + int64_t t, old; \ + \ + __asm__ __volatile__( \ + "1: ldarx %1, 0, %4 \n\t" \ + " " #instr " %0, %3, %1 \n\t" \ + " stdcx. 
%0, 0, %4 \n\t" \ + " bne- 1b \n\t" \ + : "=&r" (t), "=&r" (old), "=m" (*v) \ + : "r" (OPAL_ASM_VALUE64(val)), "r" OPAL_ASM_ADDR(v), "m" (*v) \ + : "cc"); \ + \ + return old; \ } +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_64(add, add) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_64(and, and) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_64(or, or) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_64(xor, xor) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_64(sub, subf) -static inline int64_t opal_atomic_sub_64 (volatile int64_t* v, int64_t dec) -{ - int64_t t; - - __asm__ __volatile__( - "1: ldarx %0,0,%3 \n\t" - " subf %0,%2,%0 \n\t" - " stdcx. %0,0,%3 \n\t" - " bne- 1b \n\t" - : "=&r" (t), "=m" (*v) - : "r" (OPAL_ASM_VALUE64(dec)), "r" OPAL_ASM_ADDR(v), "m" (*v) - : "cc"); - - return t; -} - -static inline int opal_atomic_cmpset_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int64_t ret; - - __asm__ __volatile__ ( - "1: ldarx %0, 0, %2 \n\t" - " cmpd 0, %0, %3 \n\t" - " bne- 2f \n\t" - " stdcx. %4, 0, %2 \n\t" - " bne- 1b \n\t" - "2:" - : "=&r" (ret), "=m" (*addr) - : "r" (addr), "r" (OPAL_ASM_VALUE64(oldval)), "r" (OPAL_ASM_VALUE64(newval)), "m" (*addr) - : "cc", "memory"); + int64_t prev; + bool ret; + + __asm__ __volatile__ ( + "1: ldarx %0, 0, %2 \n\t" + " cmpd 0, %0, %3 \n\t" + " bne- 2f \n\t" + " stdcx. 
%4, 0, %2 \n\t" + " bne- 1b \n\t" + "2:" + : "=&r" (prev), "=m" (*addr) + : "r" (addr), "r" (OPAL_ASM_VALUE64(*oldval)), "r" (OPAL_ASM_VALUE64(newval)), "m" (*addr) + : "cc", "memory"); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } static inline int64_t opal_atomic_ll_64(volatile int64_t *addr) @@ -308,29 +304,6 @@ static inline int opal_atomic_sc_64(volatile int64_t *addr, int64_t newval) return ret; } -/* these two functions aren't inlined in the non-gcc case because then - there would be two function calls (since neither cmpset_64 nor - atomic_?mb can be inlined). Instead, we "inline" them by hand in - the assembly, meaning there is one function call overhead instead - of two */ -static inline int opal_atomic_cmpset_acq_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - int rc; - - rc = opal_atomic_cmpset_64(addr, oldval, newval); - opal_atomic_rmb(); - - return rc; -} - - -static inline int opal_atomic_cmpset_rel_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) -{ - opal_atomic_wmb(); - return opal_atomic_cmpset_64(addr, oldval, newval); -} static inline int64_t opal_atomic_swap_64(volatile int64_t *addr, int64_t newval) { @@ -357,9 +330,9 @@ static inline int64_t opal_atomic_swap_64(volatile int64_t *addr, int64_t newval #if OPAL_GCC_INLINE_ASSEMBLY -static inline int opal_atomic_cmpset_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { + int64_t prev; int ret; /* @@ -374,89 +347,76 @@ static inline int opal_atomic_cmpset_64(volatile int64_t *addr, * is very similar to the pure 64 bit version. */ __asm__ __volatile__ ( - "ld r4,%2 \n\t" - "ld r5,%3 \n\t" - "1: ldarx r9, 0, %1 \n\t" - " cmpd 0, r9, r4 \n\t" + "ld r4,%3 \n\t" + "ld r5,%4 \n\t" + "1: ldarx %1, 0, %2 \n\t" + " cmpd 0, %1, r4 \n\t" " bne- 2f \n\t" - " stdcx. r5, 0, %1 \n\t" + " stdcx. 
r5, 0, %2 \n\t" " bne- 1b \n\t" "2: \n\t" - "xor r5,r4,r9 \n\t" + "xor r5,r4,%1 \n\t" "subfic r9,r5,0 \n\t" "adde %0,r9,r5 \n\t" - : "=&r" (ret) + : "=&r" (ret), "+r" (prev) : "r"OPAL_ASM_ADDR(addr), - "m"(oldval), "m"(newval) + "m"(*oldval), "m"(newval) : "r4", "r5", "r9", "cc", "memory"); - - return ret; + *oldval = prev; + return (bool) ret; } +#endif /* OPAL_GCC_INLINE_ASSEMBLY */ + +#endif /* OPAL_ASM_SUPPORT_64BIT */ + +#if OPAL_GCC_INLINE_ASSEMBLY + /* these two functions aren't inlined in the non-gcc case because then there would be two function calls (since neither cmpset_64 nor atomic_?mb can be inlined). Instead, we "inline" them by hand in the assembly, meaning there is one function call overhead instead of two */ -static inline int opal_atomic_cmpset_acq_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int rc; + bool rc; - rc = opal_atomic_cmpset_64(addr, oldval, newval); + rc = opal_atomic_compare_exchange_strong_64 (addr, oldval, newval); opal_atomic_rmb(); return rc; } -static inline int opal_atomic_cmpset_rel_64(volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { opal_atomic_wmb(); - return opal_atomic_cmpset_64(addr, oldval, newval); + return opal_atomic_compare_exchange_strong_64 (addr, oldval, newval); } -#endif /* OPAL_GCC_INLINE_ASSEMBLY */ - -#endif /* OPAL_ASM_SUPPORT_64BIT */ - -#if OPAL_GCC_INLINE_ASSEMBLY - -static inline int32_t opal_atomic_add_32(volatile int32_t* v, int inc) -{ - int32_t t; - - __asm__ __volatile__( - "1: lwarx %0, 0, %3 \n\t" - " add %0, %2, %0 \n\t" - " stwcx. 
%0, 0, %3 \n\t" - " bne- 1b \n\t" - : "=&r" (t), "=m" (*v) - : "r" (inc), "r" OPAL_ASM_ADDR(v), "m" (*v) - : "cc"); - - return t; -} - - -static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int dec) -{ - int32_t t; - - __asm__ __volatile__( - "1: lwarx %0,0,%3 \n\t" - " subf %0,%2,%0 \n\t" - " stwcx. %0,0,%3 \n\t" - " bne- 1b \n\t" - : "=&r" (t), "=m" (*v) - : "r" (dec), "r" OPAL_ASM_ADDR(v), "m" (*v) - : "cc"); - - return t; +#define OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_32(type, instr) \ +static inline int32_t opal_atomic_fetch_ ## type ## _32(volatile int32_t* v, int val) \ +{ \ + int32_t t, old; \ + \ + __asm__ __volatile__( \ + "1: lwarx %1, 0, %4 \n\t" \ + " " #instr " %0, %3, %1 \n\t" \ + " stwcx. %0, 0, %4 \n\t" \ + " bne- 1b \n\t" \ + : "=&r" (t), "=&r" (old), "=m" (*v) \ + : "r" (val), "r" OPAL_ASM_ADDR(v), "m" (*v) \ + : "cc"); \ + \ + return old; \ } +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_32(add, add) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_32(and, and) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_32(or, or) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_32(xor, xor) +OPAL_ATOMIC_POWERPC_DEFINE_ATOMIC_32(sub, subf) #endif /* OPAL_GCC_INLINE_ASSEMBLY */ diff --git a/opal/include/opal/sys/sparcv9/atomic.h b/opal/include/opal/sys/sparcv9/atomic.h index da6821f0183..c79e32b1ebb 100644 --- a/opal/include/opal/sys/sparcv9/atomic.h +++ b/opal/include/opal/sys/sparcv9/atomic.h @@ -1,3 +1,4 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology @@ -12,6 +13,8 @@ * Copyright (c) 2007 Sun Microsystems, Inc. All rights reserverd. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -38,9 +41,9 @@ *********************************************************************/ #define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 /********************************************************************** @@ -82,50 +85,49 @@ static inline void opal_atomic_isync(void) *********************************************************************/ #if OPAL_GCC_INLINE_ASSEMBLY -static inline int opal_atomic_cmpset_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - /* casa [reg(rs1)] %asi, reg(rs2), reg(rd) - * - * if (*(reg(rs1)) == reg(rs2) ) - * swap reg(rd), *(reg(rs1)) - * else - * reg(rd) = *(reg(rs1)) - */ - - int32_t ret = newval; - - __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0" - : "+r" (ret) - : "r" (addr), "r" (oldval)); - return (ret == oldval); + /* casa [reg(rs1)] %asi, reg(rs2), reg(rd) + * + * if (*(reg(rs1)) == reg(rs2) ) + * swap reg(rd), *(reg(rs1)) + * else + * reg(rd) = *(reg(rs1)) + */ + + int32_t prev = newval; + bool ret; + + __asm__ __volatile__("casa [%1] " ASI_P ", %2, %0" + : "+r" (prev) + : "r" (addr), "r" (*oldval)); + ret = (prev == *oldval); + *oldval = prev; + return ret; } -static inline int opal_atomic_cmpset_acq_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - int rc; + bool rc; - rc = opal_atomic_cmpset_32(addr, oldval, newval); - opal_atomic_rmb(); + rc = opal_atomic_compare_exchange_strong_32 (addr, oldval, newval); + opal_atomic_rmb(); - return rc; + return rc; } -static inline int opal_atomic_cmpset_rel_32( volatile int32_t *addr, - int32_t oldval, 
int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - opal_atomic_wmb(); - return opal_atomic_cmpset_32(addr, oldval, newval); + opal_atomic_wmb(); + return opal_atomic_compare_exchange_strong_32 (addr, oldval, newval); } #if OPAL_ASSEMBLY_ARCH == OPAL_SPARCV9_64 -static inline int opal_atomic_cmpset_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { /* casa [reg(rs1)] %asi, reg(rs2), reg(rd) * @@ -134,18 +136,20 @@ static inline int opal_atomic_cmpset_64( volatile int64_t *addr, * else * reg(rd) = *(reg(rs1)) */ - int64_t ret = newval; - - __asm__ __volatile__("casxa [%1] " ASI_P ", %2, %0" - : "+r" (ret) - : "r" (addr), "r" (oldval)); - return (ret == oldval); + int64_t prev = newval; + bool ret; + + __asm__ __volatile__("casxa [%1] " ASI_P ", %2, %0" + : "+r" (prev) + : "r" (addr), "r" (*oldval)); + ret = (prev == *oldval); + *oldval = prev; + return ret; } #else /* OPAL_ASSEMBLY_ARCH == OPAL_SPARCV9_64 */ -static inline int opal_atomic_cmpset_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { /* casa [reg(rs1)] %asi, reg(rs2), reg(rd) * @@ -155,40 +159,41 @@ static inline int opal_atomic_cmpset_64( volatile int64_t *addr, * reg(rd) = *(reg(rs1)) * */ - long long ret = newval; + int64_t prev = newval; + bool ret; __asm__ __volatile__( "ldx %0, %%g1 \n\t" /* g1 = ret */ "ldx %2, %%g2 \n\t" /* g2 = oldval */ "casxa [%1] " ASI_P ", %%g2, %%g1 \n\t" "stx %%g1, %0 \n" - : "+m"(ret) - : "r"(addr), "m"(oldval) + : "+m"(prev) + : "r"(addr), "m"(*oldval) : "%g1", "%g2" ); - return (ret == oldval); + ret = (prev == *oldval); + *oldval = prev; + return ret; } #endif /* OPAL_ASSEMBLY_ARCH == OPAL_SPARCV9_64 */ -static inline int 
opal_atomic_cmpset_acq_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_acq_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - int rc; + bool rc; - rc = opal_atomic_cmpset_64(addr, oldval, newval); - opal_atomic_rmb(); + rc = opal_atomic_compare_exchange_strong_64 (addr, oldval, newval); + opal_atomic_rmb(); - return rc; + return rc; } -static inline int opal_atomic_cmpset_rel_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_rel_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - opal_atomic_wmb(); - return opal_atomic_cmpset_64(addr, oldval, newval); + opal_atomic_wmb(); + return opal_atomic_compare_exchange_strong_64 (addr, oldval, newval); } #endif /* OPAL_GCC_INLINE_ASSEMBLY */ diff --git a/opal/include/opal/sys/sync_builtin/atomic.h b/opal/include/opal/sys/sync_builtin/atomic.h index 0f18039ff66..11decc2f0ad 100644 --- a/opal/include/opal/sys/sync_builtin/atomic.h +++ b/opal/include/opal/sys/sync_builtin/atomic.h @@ -11,8 +11,10 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011 Sandia National Laboratories. All rights reserved. - * Copyright (c) 2014-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -53,83 +55,110 @@ static inline void opal_atomic_wmb(void) * *********************************************************************/ -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 -static inline int opal_atomic_cmpset_acq_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 + +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { - return __sync_bool_compare_and_swap(addr, oldval, newval); + int32_t prev = __sync_val_compare_and_swap (addr, *oldval, newval); + bool ret = prev == *oldval; + *oldval = prev; + return ret; } +#define opal_atomic_compare_exchange_strong_acq_32 opal_atomic_compare_exchange_strong_32 +#define opal_atomic_compare_exchange_strong_rel_32 opal_atomic_compare_exchange_strong_32 + +#define OPAL_HAVE_ATOMIC_MATH_32 1 -static inline int opal_atomic_cmpset_rel_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +#define OPAL_HAVE_ATOMIC_ADD_32 1 +static inline int32_t opal_atomic_fetch_add_32(volatile int32_t *addr, int32_t delta) { - return __sync_bool_compare_and_swap(addr, oldval, newval);} + return __sync_fetch_and_add(addr, delta); +} -static inline int opal_atomic_cmpset_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +#define OPAL_HAVE_ATOMIC_AND_32 1 +static inline int32_t opal_atomic_fetch_and_32(volatile int32_t *addr, int32_t value) { - return __sync_bool_compare_and_swap(addr, oldval, newval); + return __sync_fetch_and_and(addr, value); } -#define OPAL_HAVE_ATOMIC_MATH_32 1 +#define OPAL_HAVE_ATOMIC_OR_32 1 +static inline int32_t opal_atomic_fetch_or_32(volatile int32_t *addr, int32_t value) +{ + return __sync_fetch_and_or(addr, value); +} -#define OPAL_HAVE_ATOMIC_ADD_32 1 -static inline int32_t opal_atomic_add_32(volatile int32_t *addr, int32_t delta) +#define OPAL_HAVE_ATOMIC_XOR_32 1 +static inline int32_t opal_atomic_fetch_xor_32(volatile 
int32_t *addr, int32_t value) { - return __sync_add_and_fetch(addr, delta); + return __sync_fetch_and_xor(addr, value); } #define OPAL_HAVE_ATOMIC_SUB_32 1 -static inline int32_t opal_atomic_sub_32(volatile int32_t *addr, int32_t delta) +static inline int32_t opal_atomic_fetch_sub_32(volatile int32_t *addr, int32_t delta) { - return __sync_sub_and_fetch(addr, delta); + return __sync_fetch_and_sub(addr, delta); } #if OPAL_ASM_SYNC_HAVE_64BIT -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 -static inline int opal_atomic_cmpset_acq_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 + +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { - return __sync_bool_compare_and_swap(addr, oldval, newval); + int64_t prev = __sync_val_compare_and_swap (addr, *oldval, newval); + bool ret = prev == *oldval; + *oldval = prev; + return ret; } -static inline int opal_atomic_cmpset_rel_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +#define opal_atomic_compare_exchange_strong_acq_64 opal_atomic_compare_exchange_strong_64 +#define opal_atomic_compare_exchange_strong_rel_64 opal_atomic_compare_exchange_strong_64 + +#define OPAL_HAVE_ATOMIC_MATH_64 1 +#define OPAL_HAVE_ATOMIC_ADD_64 1 +static inline int64_t opal_atomic_fetch_add_64(volatile int64_t *addr, int64_t delta) { - return __sync_bool_compare_and_swap(addr, oldval, newval);} + return __sync_fetch_and_add(addr, delta); +} +#define OPAL_HAVE_ATOMIC_AND_64 1 +static inline int64_t opal_atomic_fetch_and_64(volatile int64_t *addr, int64_t value) +{ + return __sync_fetch_and_and(addr, value); +} -static inline int opal_atomic_cmpset_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +#define OPAL_HAVE_ATOMIC_OR_64 1 +static inline int64_t opal_atomic_fetch_or_64(volatile int64_t *addr, int64_t value) { - return __sync_bool_compare_and_swap(addr, oldval, newval); + return __sync_fetch_and_or(addr, 
value); } -#define OPAL_HAVE_ATOMIC_MATH_64 1 -#define OPAL_HAVE_ATOMIC_ADD_64 1 -static inline int64_t opal_atomic_add_64(volatile int64_t *addr, int64_t delta) +#define OPAL_HAVE_ATOMIC_XOR_64 1 +static inline int64_t opal_atomic_fetch_xor_64(volatile int64_t *addr, int64_t value) { - return __sync_add_and_fetch(addr, delta); + return __sync_fetch_and_xor(addr, value); } #define OPAL_HAVE_ATOMIC_SUB_64 1 -static inline int64_t opal_atomic_sub_64(volatile int64_t *addr, int64_t delta) +static inline int64_t opal_atomic_fetch_sub_64(volatile int64_t *addr, int64_t delta) { - return __sync_sub_and_fetch(addr, delta); + return __sync_fetch_and_sub(addr, delta); } #endif #if OPAL_HAVE_SYNC_BUILTIN_CSWAP_INT128 -static inline int opal_atomic_cmpset_128 (volatile opal_int128_t *addr, - opal_int128_t oldval, opal_int128_t newval) +static inline bool opal_atomic_compare_exchange_strong_128 (volatile opal_int128_t *addr, + opal_int128_t *oldval, opal_int128_t newval) { - return __sync_bool_compare_and_swap(addr, oldval, newval); + opal_int128_t prev = __sync_val_compare_and_swap (addr, *oldval, newval); + bool ret = prev == *oldval; + *oldval = prev; + return ret; } -#define OPAL_HAVE_ATOMIC_CMPSET_128 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 1 #endif diff --git a/opal/include/opal/sys/timer.h b/opal/include/opal/sys/timer.h index 014903dbe01..4ce2810b7f6 100644 --- a/opal/include/opal/sys/timer.h +++ b/opal/include/opal/sys/timer.h @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2016 Broadcom Limited. All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ * @@ -42,16 +42,6 @@ #ifdef OPAL_DISABLE_INLINE_ASM #undef OPAL_C_GCC_INLINE_ASSEMBLY #define OPAL_C_GCC_INLINE_ASSEMBLY 0 -#undef OPAL_CXX_GCC_INLINE_ASSEMBLY -#define OPAL_CXX_GCC_INLINE_ASSEMBLY 0 -#undef OPAL_C_DEC_INLINE_ASSEMBLY -#define OPAL_C_DEC_INLINE_ASSEMBLY 0 -#undef OPAL_CXX_DEC_INLINE_ASSEMBLY -#define OPAL_CXX_DEC_INLINE_ASSEMBLY 0 -#undef OPAL_C_XLC_INLINE_ASSEMBLY -#define OPAL_C_XLC_INLINE_ASSEMBLY 0 -#undef OPAL_CXX_XLC_INLINE_ASSEMBLY -#define OPAL_CXX_XLC_INLINE_ASSEMBLY 0 #endif /* define OPAL_{GCC,DEC,XLC}_INLINE_ASSEMBLY based on the @@ -59,12 +49,8 @@ are in C or C++ */ #if defined(c_plusplus) || defined(__cplusplus) #define OPAL_GCC_INLINE_ASSEMBLY OPAL_CXX_GCC_INLINE_ASSEMBLY -#define OPAL_DEC_INLINE_ASSEMBLY OPAL_CXX_DEC_INLINE_ASSEMBLY -#define OPAL_XLC_INLINE_ASSEMBLY OPAL_CXX_XLC_INLINE_ASSEMBLY #else #define OPAL_GCC_INLINE_ASSEMBLY OPAL_C_GCC_INLINE_ASSEMBLY -#define OPAL_DEC_INLINE_ASSEMBLY OPAL_C_DEC_INLINE_ASSEMBLY -#define OPAL_XLC_INLINE_ASSEMBLY OPAL_C_XLC_INLINE_ASSEMBLY #endif /********************************************************************** diff --git a/opal/include/opal/sys/x86_64/atomic.h b/opal/include/opal/sys/x86_64/atomic.h index ae0bbbbc0bc..49d740de388 100644 --- a/opal/include/opal/sys/x86_64/atomic.h +++ b/opal/include/opal/sys/x86_64/atomic.h @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2007 Sun Microsystems, Inc. All rights reserverd. - * Copyright (c) 2012-2014 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -40,9 +40,9 @@ *********************************************************************/ #define OPAL_HAVE_ATOMIC_MEM_BARRIER 1 -#define OPAL_HAVE_ATOMIC_CMPSET_32 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_32 1 -#define OPAL_HAVE_ATOMIC_CMPSET_64 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_64 1 /********************************************************************** * @@ -82,51 +82,48 @@ static inline void opal_atomic_isync(void) *********************************************************************/ #if OPAL_GCC_INLINE_ASSEMBLY -static inline int opal_atomic_cmpset_32( volatile int32_t *addr, - int32_t oldval, int32_t newval) +static inline bool opal_atomic_compare_exchange_strong_32 (volatile int32_t *addr, int32_t *oldval, int32_t newval) { unsigned char ret; __asm__ __volatile__ ( SMPLOCK "cmpxchgl %3,%2 \n\t" "sete %0 \n\t" - : "=qm" (ret), "+a" (oldval), "+m" (*addr) + : "=qm" (ret), "+a" (*oldval), "+m" (*addr) : "q"(newval) : "memory", "cc"); - return (int)ret; + return (bool) ret; } #endif /* OPAL_GCC_INLINE_ASSEMBLY */ -#define opal_atomic_cmpset_acq_32 opal_atomic_cmpset_32 -#define opal_atomic_cmpset_rel_32 opal_atomic_cmpset_32 +#define opal_atomic_compare_exchange_strong_acq_32 opal_atomic_compare_exchange_strong_32 +#define opal_atomic_compare_exchange_strong_rel_32 opal_atomic_compare_exchange_strong_32 #if OPAL_GCC_INLINE_ASSEMBLY -static inline int opal_atomic_cmpset_64( volatile int64_t *addr, - int64_t oldval, int64_t newval) +static inline bool opal_atomic_compare_exchange_strong_64 (volatile int64_t *addr, int64_t *oldval, int64_t newval) { unsigned char ret; __asm__ __volatile__ ( SMPLOCK "cmpxchgq %3,%2 \n\t" "sete %0 \n\t" - : "=qm" (ret), "+a" (oldval), "+m" (*((volatile long*)addr)) + : "=qm" (ret), "+a" (*oldval), "+m" (*((volatile long*)addr)) : "q"(newval) : "memory", "cc" ); - return (int)ret; + return (bool) ret; } #endif /* OPAL_GCC_INLINE_ASSEMBLY */ -#define opal_atomic_cmpset_acq_64 opal_atomic_cmpset_64 -#define 
opal_atomic_cmpset_rel_64 opal_atomic_cmpset_64 +#define opal_atomic_compare_exchange_strong_acq_64 opal_atomic_compare_exchange_strong_64 +#define opal_atomic_compare_exchange_strong_rel_64 opal_atomic_compare_exchange_strong_64 #if OPAL_GCC_INLINE_ASSEMBLY && OPAL_HAVE_CMPXCHG16B && HAVE_OPAL_INT128_T -static inline int opal_atomic_cmpset_128 (volatile opal_int128_t *addr, opal_int128_t oldval, - opal_int128_t newval) +static inline bool opal_atomic_compare_exchange_strong_128 (volatile opal_int128_t *addr, opal_int128_t *oldval, opal_int128_t newval) { unsigned char ret; @@ -135,15 +132,14 @@ static inline int opal_atomic_cmpset_128 (volatile opal_int128_t *addr, opal_int * at the address is returned in eax:edx. */ __asm__ __volatile__ (SMPLOCK "cmpxchg16b (%%rsi) \n\t" "sete %0 \n\t" - : "=qm" (ret) - : "S" (addr), "b" (((int64_t *)&newval)[0]), "c" (((int64_t *)&newval)[1]), - "a" (((int64_t *)&oldval)[0]), "d" (((int64_t *)&oldval)[1]) + : "=qm" (ret), "+a" (((int64_t *)oldval)[0]), "+d" (((int64_t *)oldval)[1]) + : "S" (addr), "b" (((int64_t *)&newval)[0]), "c" (((int64_t *)&newval)[1]) : "memory", "cc"); - return (int) ret; + return (bool) ret; } -#define OPAL_HAVE_ATOMIC_CMPSET_128 1 +#define OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 1 #endif /* OPAL_GCC_INLINE_ASSEMBLY */ @@ -200,7 +196,7 @@ static inline int64_t opal_atomic_swap_64( volatile int64_t *addr, * * Atomically adds @i to @v. */ -static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) +static inline int32_t opal_atomic_fetch_add_32(volatile int32_t* v, int i) { int ret = i; __asm__ __volatile__( @@ -209,7 +205,7 @@ static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) : :"memory", "cc" ); - return (ret+i); + return ret; } #define OPAL_HAVE_ATOMIC_ADD_64 1 @@ -221,7 +217,7 @@ static inline int32_t opal_atomic_add_32(volatile int32_t* v, int i) * * Atomically adds @i to @v. 
*/ -static inline int64_t opal_atomic_add_64(volatile int64_t* v, int64_t i) +static inline int64_t opal_atomic_fetch_add_64(volatile int64_t* v, int64_t i) { int64_t ret = i; __asm__ __volatile__( @@ -230,7 +226,7 @@ static inline int64_t opal_atomic_add_64(volatile int64_t* v, int64_t i) : :"memory", "cc" ); - return (ret+i); + return ret; } #define OPAL_HAVE_ATOMIC_SUB_32 1 @@ -242,7 +238,7 @@ static inline int64_t opal_atomic_add_64(volatile int64_t* v, int64_t i) * * Atomically subtracts @i from @v. */ -static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int i) +static inline int32_t opal_atomic_fetch_sub_32(volatile int32_t* v, int i) { int ret = -i; __asm__ __volatile__( @@ -251,7 +247,7 @@ static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int i) : :"memory", "cc" ); - return (ret-i); + return ret; } #define OPAL_HAVE_ATOMIC_SUB_64 1 @@ -263,7 +259,7 @@ static inline int32_t opal_atomic_sub_32(volatile int32_t* v, int i) * * Atomically subtracts @i from @v. */ -static inline int64_t opal_atomic_sub_64(volatile int64_t* v, int64_t i) +static inline int64_t opal_atomic_fetch_sub_64(volatile int64_t* v, int64_t i) { int64_t ret = -i; __asm__ __volatile__( @@ -272,7 +268,7 @@ static inline int64_t opal_atomic_sub_64(volatile int64_t* v, int64_t i) : :"memory", "cc" ); - return (ret-i); + return ret; } #endif /* OPAL_GCC_INLINE_ASSEMBLY */ diff --git a/opal/include/opal_config_bottom.h b/opal/include/opal_config_bottom.h index 2fed0820ea6..58823471774 100644 --- a/opal/include/opal_config_bottom.h +++ b/opal/include/opal_config_bottom.h @@ -13,9 +13,9 @@ * Copyright (c) 2009-2013 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Intel, Inc. All rights reserved. 
+ * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -234,6 +234,18 @@ # define __opal_attribute_destructor__ #endif +#if OPAL_HAVE_ATTRIBUTE_OPTNONE +# define __opal_attribute_optnone__ __attribute__((__optnone__)) +#else +# define __opal_attribute_optnone__ +#endif + +#if OPAL_HAVE_ATTRIBUTE_EXTENSION +# define __opal_attribute_extension__ __extension__ +#else +# define __opal_attribute_extension__ +#endif + # if OPAL_C_HAVE_VISIBILITY # define OPAL_DECLSPEC __opal_attribute_visibility__("default") # define OPAL_MODULE_DECLSPEC __opal_attribute_visibility__("default") @@ -260,10 +272,6 @@ **********************************************************************/ #if OMPI_BUILDING -#ifndef HAVE_PTRDIFF_T -typedef OPAL_PTRDIFF_TYPE ptrdiff_t; -#endif - /* * Maximum size of a filename path. */ @@ -272,11 +280,11 @@ typedef OPAL_PTRDIFF_TYPE ptrdiff_t; #include #endif #if defined(PATH_MAX) -#define OPAL_PATH_MAX (PATH_MAX + 1) +#define OPAL_PATH_MAX (PATH_MAX + 1) #elif defined(_POSIX_PATH_MAX) -#define OPAL_PATH_MAX (_POSIX_PATH_MAX + 1) +#define OPAL_PATH_MAX (_POSIX_PATH_MAX + 1) #else -#define OPAL_PATH_MAX 256 +#define OPAL_PATH_MAX 256 #endif /* diff --git a/opal/mca/allocator/basic/Makefile.am b/opal/mca/allocator/basic/Makefile.am index 48d497723bc..b385131f194 100644 --- a/opal/mca/allocator/basic/Makefile.am +++ b/opal/mca/allocator/basic/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -37,6 +38,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_allocator_basic_la_SOURCES = $(sources) mca_allocator_basic_la_LDFLAGS = -module -avoid-version +mca_allocator_basic_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_allocator_basic_la_SOURCES = $(sources) diff --git a/opal/mca/allocator/bucket/Makefile.am b/opal/mca/allocator/bucket/Makefile.am index 2726a044c1c..ba50d9398de 100644 --- a/opal/mca/allocator/bucket/Makefile.am +++ b/opal/mca/allocator/bucket/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -38,6 +39,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_allocator_bucket_la_SOURCES = $(sources) mca_allocator_bucket_la_LDFLAGS = -module -avoid-version +mca_allocator_bucket_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_allocator_bucket_la_SOURCES = $(sources) diff --git a/opal/mca/base/base.h b/opal/mca/base/base.h index 1fdcbd899d7..5c29c0039b8 100644 --- a/opal/mca/base/base.h +++ b/opal/mca/base/base.h @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -68,6 +69,7 @@ OPAL_DECLSPEC OBJ_CLASS_DECLARATION(mca_base_component_priority_list_item_t); */ OPAL_DECLSPEC extern char *mca_base_component_path; OPAL_DECLSPEC extern bool mca_base_component_show_load_errors; +OPAL_DECLSPEC extern bool mca_base_component_track_load_errors; OPAL_DECLSPEC extern bool mca_base_component_disable_dlopen; OPAL_DECLSPEC extern char *mca_base_system_default_path; OPAL_DECLSPEC extern char *mca_base_user_default_path; diff --git a/opal/mca/base/mca_base_cmd_line.c b/opal/mca/base/mca_base_cmd_line.c index 2a26018c379..fd299cbf700 100644 --- a/opal/mca/base/mca_base_cmd_line.c +++ b/opal/mca/base/mca_base_cmd_line.c @@ -67,7 +67,8 @@ int mca_base_cmd_line_setup(opal_cmd_line_t *cmd) opal_cmd_line_init_t entry = {"mca_base_param_file_prefix", '\0', "am", NULL, 1, NULL, OPAL_CMD_LINE_TYPE_STRING, - "Aggregate MCA parameter set file list" + "Aggregate MCA parameter set file list", + OPAL_CMD_LINE_OTYPE_LAUNCH }; ret = opal_cmd_line_make_opt_mca(cmd, entry); if (OPAL_SUCCESS != ret) { @@ -79,7 +80,8 @@ int mca_base_cmd_line_setup(opal_cmd_line_t *cmd) opal_cmd_line_init_t entry = {"mca_base_envar_file_prefix", '\0', "tune", NULL, 1, NULL, OPAL_CMD_LINE_TYPE_STRING, - "Application profile options file list" + "Application profile options file list", + OPAL_CMD_LINE_OTYPE_DEBUG }; ret = opal_cmd_line_make_opt_mca(cmd, entry); if (OPAL_SUCCESS != ret) { diff --git a/opal/mca/base/mca_base_component_repository.c b/opal/mca/base/mca_base_component_repository.c index f1497f68360..b34f19eea03 100644 --- a/opal/mca/base/mca_base_component_repository.c +++ b/opal/mca/base/mca_base_component_repository.c @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -55,6 +56,29 @@ OBJ_CLASS_INSTANCE(mca_base_component_repository_item_t, opal_list_item_t, #endif /* OPAL_HAVE_DL_SUPPORT */ +static void clf_constructor(opal_object_t *obj); +static void clf_destructor(opal_object_t *obj); + +OBJ_CLASS_INSTANCE(mca_base_failed_component_t, opal_list_item_t, + clf_constructor, clf_destructor); + + +static void clf_constructor(opal_object_t *obj) +{ + mca_base_failed_component_t *cli = (mca_base_failed_component_t *) obj; + cli->comp = NULL; + cli->error_msg = NULL; +} + +static void clf_destructor(opal_object_t *obj) +{ + mca_base_failed_component_t *cli = (mca_base_failed_component_t *) obj; + cli->comp = NULL; + if( NULL != cli->error_msg ) { + free(cli->error_msg); + cli->error_msg = NULL; + } +} /* * Private variables @@ -408,6 +432,14 @@ int mca_base_component_repository_open (mca_base_framework_t *framework, } opal_output_verbose(vl, 0, "mca_base_component_repository_open: unable to open %s: %s (ignored)", ri->ri_base, err_msg); + + if( mca_base_component_track_load_errors ) { + mca_base_failed_component_t *f_comp = OBJ_NEW(mca_base_failed_component_t); + f_comp->comp = ri; + asprintf(&(f_comp->error_msg), "%s", err_msg); + opal_list_append(&framework->framework_failed_components, &f_comp->super); + } + return OPAL_ERR_BAD_PARAM; } diff --git a/opal/mca/base/mca_base_component_repository.h b/opal/mca/base/mca_base_component_repository.h index 290c83c83c3..08babe70511 100644 --- a/opal/mca/base/mca_base_component_repository.h +++ b/opal/mca/base/mca_base_component_repository.h @@ -13,6 +13,7 @@ * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -60,6 +61,17 @@ typedef struct mca_base_component_repository_item_t mca_base_component_repositor OBJ_CLASS_DECLARATION(mca_base_component_repository_item_t); +/* + * Structure to track information about why a component failed to load. + */ +struct mca_base_failed_component_t { + opal_list_item_t super; + mca_base_component_repository_item_t *comp; + char *error_msg; +}; +typedef struct mca_base_failed_component_t mca_base_failed_component_t; +OPAL_DECLSPEC OBJ_CLASS_DECLARATION(mca_base_failed_component_t); + /** * @brief initialize the component repository * diff --git a/opal/mca/base/mca_base_framework.c b/opal/mca/base/mca_base_framework.c index a1e49e4d5b0..9bd968319e2 100644 --- a/opal/mca/base/mca_base_framework.c +++ b/opal/mca/base/mca_base_framework.c @@ -3,6 +3,7 @@ * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -66,6 +67,7 @@ int mca_base_framework_register (struct mca_base_framework_t *framework, } OBJ_CONSTRUCT(&framework->framework_components, opal_list_t); + OBJ_CONSTRUCT(&framework->framework_failed_components, opal_list_t); if (framework->framework_flags & MCA_BASE_FRAMEWORK_FLAG_NO_DSO) { flags |= MCA_BASE_REGISTER_STATIC_ONLY; @@ -228,12 +230,16 @@ int mca_base_framework_close (struct mca_base_framework_t *framework) { framework->framework_output); OBJ_RELEASE(item); } + while (NULL != (item = opal_list_remove_first (&framework->framework_failed_components))) { + OBJ_RELEASE(item); + } ret = OPAL_SUCCESS; } framework->framework_flags &= ~(MCA_BASE_FRAMEWORK_FLAG_REGISTERED | MCA_BASE_FRAMEWORK_FLAG_OPEN); OBJ_DESTRUCT(&framework->framework_components); + OBJ_DESTRUCT(&framework->framework_failed_components); framework_close_output (framework); diff --git a/opal/mca/base/mca_base_framework.h b/opal/mca/base/mca_base_framework.h index c5009ac3823..46dfc1de223 100644 --- a/opal/mca/base/mca_base_framework.h +++ b/opal/mca/base/mca_base_framework.h @@ -2,6 +2,7 @@ /* * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -154,6 +155,8 @@ typedef struct mca_base_framework_t { /** List of selected components (filled in by mca_base_framework_register() or mca_base_framework_open() */ opal_list_t framework_components; + /** List of components that failed to load */ + opal_list_t framework_failed_components; } mca_base_framework_t; diff --git a/opal/mca/base/mca_base_open.c b/opal/mca/base/mca_base_open.c index 0e7144ac1a6..684117c932d 100644 --- a/opal/mca/base/mca_base_open.c +++ b/opal/mca/base/mca_base_open.c @@ -3,14 +3,14 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. 
- * Copyright (c) 2004-2008 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2011 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2011-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. @@ -48,7 +48,9 @@ char *mca_base_component_path = NULL; int mca_base_opened = 0; char *mca_base_system_default_path = NULL; char *mca_base_user_default_path = NULL; -bool mca_base_component_show_load_errors = true; +bool mca_base_component_show_load_errors = + (bool) OPAL_SHOW_LOAD_ERRORS_DEFAULT; +bool mca_base_component_track_load_errors = false; bool mca_base_component_disable_dlopen = false; static char *mca_base_verbose = NULL; @@ -101,7 +103,8 @@ int mca_base_open(void) MCA_BASE_VAR_SYN_FLAG_DEPRECATED); free(value); - mca_base_component_show_load_errors = true; + mca_base_component_show_load_errors = + (bool) OPAL_SHOW_LOAD_ERRORS_DEFAULT; var_id = mca_base_var_register("opal", "mca", "base", "component_show_load_errors", "Whether to show errors for components that failed to load or not", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, @@ -111,6 +114,14 @@ int mca_base_open(void) (void) mca_base_var_register_synonym(var_id, "opal", "mca", NULL, "component_show_load_errors", MCA_BASE_VAR_SYN_FLAG_DEPRECATED); + mca_base_component_track_load_errors = false; + var_id = mca_base_var_register("opal", "mca", "base", "component_track_load_errors", + "Whether to track errors for components that failed to load or not", + MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0, + OPAL_INFO_LVL_9, + 
MCA_BASE_VAR_SCOPE_READONLY, + &mca_base_component_track_load_errors); + mca_base_component_disable_dlopen = false; var_id = mca_base_var_register("opal", "mca", "base", "component_disable_dlopen", "Whether to attempt to disable opening dynamic components or not", @@ -165,8 +176,10 @@ static void set_defaults(opal_output_stream_t *lds) /* Load up defaults */ OBJ_CONSTRUCT(lds, opal_output_stream_t); +#if defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) lds->lds_syslog_priority = LOG_INFO; lds->lds_syslog_ident = "ompi"; +#endif lds->lds_want_stderr = true; } @@ -196,10 +209,15 @@ static void parse_verbose(char *e, opal_output_stream_t *lds) } if (0 == strcasecmp(ptr, "syslog")) { +#if defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) lds->lds_want_syslog = true; have_output = true; +#else + opal_output(0, "syslog support requested but not available on this system"); +#endif /* defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) */ } else if (strncasecmp(ptr, "syslogpri:", 10) == 0) { +#if defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) lds->lds_want_syslog = true; have_output = true; if (strcasecmp(ptr + 10, "notice") == 0) @@ -208,9 +226,16 @@ static void parse_verbose(char *e, opal_output_stream_t *lds) lds->lds_syslog_priority = LOG_INFO; else if (strcasecmp(ptr + 10, "DEBUG") == 0) lds->lds_syslog_priority = LOG_DEBUG; +#else + opal_output(0, "syslog support requested but not available on this system"); +#endif /* defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) */ } else if (strncasecmp(ptr, "syslogid:", 9) == 0) { +#if defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) lds->lds_want_syslog = true; lds->lds_syslog_ident = ptr + 9; +#else + opal_output(0, "syslog support requested but not available on this system"); +#endif /* defined(HAVE_SYSLOG) && defined(HAVE_SYSLOG_H) */ } else if (strcasecmp(ptr, "stdout") == 0) { @@ -221,12 +246,12 @@ static void parse_verbose(char *e, opal_output_stream_t *lds) have_output = true; } - else if (strcasecmp(ptr, "file") == 0) { + else if 
(strcasecmp(ptr, "file") == 0 || strcasecmp(ptr, "file:") == 0) { lds->lds_want_file = true; have_output = true; } else if (strncasecmp(ptr, "file:", 5) == 0) { lds->lds_want_file = true; - lds->lds_file_suffix = ptr + 5; + lds->lds_file_suffix = strdup(ptr + 5); have_output = true; } else if (strcasecmp(ptr, "fileappend") == 0) { lds->lds_want_file = true; diff --git a/opal/mca/base/mca_base_pvar.c b/opal/mca/base/mca_base_pvar.c index 7decb8ab6f2..01f95e82a14 100644 --- a/opal/mca/base/mca_base_pvar.c +++ b/opal/mca/base/mca_base_pvar.c @@ -1,12 +1,13 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2013-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2013-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Bull SAS. All rights reserved. * Copyright (c) 2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -148,7 +149,7 @@ static int mca_base_pvar_default_get_value (const mca_base_pvar_t *pvar, void *v /* not used */ (void) obj_handle; - memmove (value, pvar->ctx, var_type_sizes[pvar->type]); + memmove (value, pvar->ctx, ompi_var_type_sizes[pvar->type]); return OPAL_SUCCESS; } @@ -158,7 +159,7 @@ static int mca_base_pvar_default_set_value (mca_base_pvar_t *pvar, const void *v /* not used */ (void) obj_handle; - memmove (pvar->ctx, value, var_type_sizes[pvar->type]); + memmove (pvar->ctx, value, ompi_var_type_sizes[pvar->type]); return OPAL_SUCCESS; } @@ -188,7 +189,6 @@ int mca_base_pvar_register (const char *project, const char *framework, const ch /* assert on usage errors */ if (!get_value && !ctx) { - assert (0); return OPAL_ERR_BAD_PARAM; } @@ -202,14 +202,12 @@ int mca_base_pvar_register (const char *project, const char *framework, const ch case MCA_BASE_PVAR_CLASS_STATE: /* states MUST be integers */ if (MCA_BASE_VAR_TYPE_INT != type) { - assert (0); return OPAL_ERR_BAD_PARAM; } break; case MCA_BASE_PVAR_CLASS_COUNTER: /* counters can have the any of types in the fall-through except double */ if (MCA_BASE_VAR_TYPE_DOUBLE == type) { - assert (0); return OPAL_ERR_BAD_PARAM; } /* fall-through */ @@ -223,14 +221,12 @@ int mca_base_pvar_register (const char *project, const char *framework, const ch MCA_BASE_VAR_TYPE_UNSIGNED_LONG != type && MCA_BASE_VAR_TYPE_UNSIGNED_LONG_LONG != type && MCA_BASE_VAR_TYPE_DOUBLE != type) { - assert (0); return OPAL_ERR_BAD_PARAM; } break; case MCA_BASE_PVAR_CLASS_PERCENTAGE: /* percentages must be doubles */ if (MCA_BASE_VAR_TYPE_DOUBLE != type) { - assert (0); return OPAL_ERR_BAD_PARAM; } break; @@ -239,8 +235,7 @@ int mca_base_pvar_register (const char *project, const char *framework, const ch variables */ break; default: - assert (0); - break; + return OPAL_ERR_BAD_PARAM; } /* update this assert if more MPIT verbosity levels are added */ @@ -252,7 +247,6 @@ int 
mca_base_pvar_register (const char *project, const char *framework, const ch ret = mca_base_pvar_get_internal (ret, &pvar, true); if (OPAL_SUCCESS != ret) { /* inconsistent internal state */ - assert (0); return OPAL_ERROR; } @@ -347,9 +341,8 @@ int mca_base_component_pvar_register (const mca_base_component_t *component, con int bind, mca_base_pvar_flag_t flags, mca_base_get_value_fn_t get_value, mca_base_set_value_fn_t set_value, mca_base_notify_fn_t notify, void *ctx) { - /* XXX -- component_update -- We will stash the project name in the component */ /* invalidate this variable if the component's group is deregistered */ - return mca_base_pvar_register(NULL, component->mca_type_name, component->mca_component_name, + return mca_base_pvar_register(component->mca_project_name, component->mca_type_name, component->mca_component_name, name, description, verbosity, var_class, type, enumerator, bind, flags | MCA_BASE_PVAR_FLAG_IWG, get_value, set_value, notify, ctx); } @@ -463,7 +456,7 @@ int mca_base_pvar_handle_alloc (mca_base_pvar_session_t *session, int index, voi break; } - pvar_handle->obj_handle = obj_handle; + pvar_handle->obj_handle = (NULL == obj_handle ? NULL : *(void**)obj_handle); pvar_handle->pvar = pvar; *handle = pvar_handle; @@ -481,7 +474,7 @@ int mca_base_pvar_handle_alloc (mca_base_pvar_session_t *session, int index, voi /* get the size of this datatype since read functions will expect an array of datatype not mca_base_pvar_value_t's. */ - datatype_size = var_type_sizes[pvar->type]; + datatype_size = ompi_var_type_sizes[pvar->type]; if (0 == datatype_size) { ret = OPAL_ERROR; break; @@ -689,7 +682,7 @@ int mca_base_pvar_handle_read_value (mca_base_pvar_handle_t *handle, void *value if (mca_base_pvar_is_sum (handle->pvar) || mca_base_pvar_is_watermark (handle->pvar) || !mca_base_pvar_handle_is_running (handle)) { /* read the value cached in the handle. 
*/ - memmove (value, handle->current_value, handle->count * var_type_sizes[handle->pvar->type]); + memmove (value, handle->current_value, handle->count * ompi_var_type_sizes[handle->pvar->type]); } else { /* read the value directly from the variable. */ ret = handle->pvar->get_value (handle->pvar, value, handle->obj_handle); @@ -718,7 +711,9 @@ int mca_base_pvar_handle_write_value (mca_base_pvar_handle_t *handle, const void return ret; } - memmove (handle->current_value, value, handle->count * var_type_sizes[handle->pvar->type]); + memmove (handle->current_value, value, handle->count * ompi_var_type_sizes[handle->pvar->type]); + /* write the value directly to the variable. */ + ret = handle->pvar->set_value (handle->pvar, value, handle->obj_handle); return OPAL_SUCCESS; } @@ -797,7 +792,7 @@ int mca_base_pvar_handle_reset (mca_base_pvar_handle_t *handle) /* reset this handle to a state analagous to when it was created */ if (mca_base_pvar_is_sum (handle->pvar)) { /* reset the running sum to 0 */ - memset (handle->current_value, 0, handle->count * var_type_sizes[handle->pvar->type]); + memset (handle->current_value, 0, handle->count * ompi_var_type_sizes[handle->pvar->type]); if (mca_base_pvar_handle_is_running (handle)) { ret = handle->pvar->get_value (handle->pvar, handle->last_value, handle->obj_handle); @@ -877,7 +872,7 @@ int mca_base_pvar_dump(int index, char ***out, mca_base_var_dump_type_t output_t } } - (void)asprintf(out[0] + line++, "%stype:%s", tmp, var_type_names[pvar->type]); + (void)asprintf(out[0] + line++, "%stype:%s", tmp, ompi_var_type_names[pvar->type]); free(tmp); // release tmp storage } else { /* there will be at most three lines in the pretty print case */ @@ -887,7 +882,7 @@ int mca_base_pvar_dump(int index, char ***out, mca_base_var_dump_type_t output_t } (void)asprintf (out[0] + line++, "performance \"%s\" (type: %s, class: %s)", full_name, - var_type_names[pvar->type], pvar_class_names[pvar->var_class]); +
ompi_var_type_names[pvar->type], pvar_class_names[pvar->var_class]); if (pvar->description) { (void)asprintf(out[0] + line++, "%s", pvar->description); diff --git a/opal/mca/base/mca_base_var.c b/opal/mca/base/mca_base_var.c index 728f023eb10..9b07a9664ee 100644 --- a/opal/mca/base/mca_base_var.c +++ b/opal/mca/base/mca_base_var.c @@ -11,11 +11,12 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2015 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -78,7 +79,7 @@ static int mca_base_var_count = 0; static opal_hash_table_t mca_base_var_index_hash; -const char *var_type_names[] = { +const char *ompi_var_type_names[] = { "int", "unsigned_int", "unsigned_long", @@ -87,10 +88,15 @@ const char *var_type_names[] = { "string", "version_string", "bool", - "double" + "double", + "long", + "int32_t", + "uint32_t", + "int64_t", + "uint64_t", }; -const size_t var_type_sizes[] = { +const size_t ompi_var_type_sizes[] = { sizeof (int), sizeof (unsigned), sizeof (unsigned long), @@ -99,10 +105,15 @@ const size_t var_type_sizes[] = { sizeof (char), sizeof (char), sizeof (bool), - sizeof (double) + sizeof (double), + sizeof (long), + sizeof (int32_t), + sizeof (uint32_t), + sizeof (int64_t), + sizeof (uint64_t), }; -const char *var_source_names[] = { +static const char *var_source_names[] = { "default", "command line", "environment", @@ -682,8 +693,13 @@ static int var_set_from_string (mca_base_var_t *var, char *src) switch (var->mbv_type) { case MCA_BASE_VAR_TYPE_INT: + case 
MCA_BASE_VAR_TYPE_INT32_T: + case MCA_BASE_VAR_TYPE_UINT32_T: + case MCA_BASE_VAR_TYPE_LONG: case MCA_BASE_VAR_TYPE_UNSIGNED_INT: case MCA_BASE_VAR_TYPE_UNSIGNED_LONG: + case MCA_BASE_VAR_TYPE_INT64_T: + case MCA_BASE_VAR_TYPE_UINT64_T: case MCA_BASE_VAR_TYPE_UNSIGNED_LONG_LONG: case MCA_BASE_VAR_TYPE_BOOL: case MCA_BASE_VAR_TYPE_SIZE_T: @@ -709,6 +725,17 @@ static int var_set_from_string (mca_base_var_t *var, char *src) MCA_BASE_VAR_TYPE_UNSIGNED_INT == var->mbv_type) { int *castme = (int*) var->mbv_storage; *castme = int_value; + } else if (MCA_BASE_VAR_TYPE_INT32_T == var->mbv_type || + MCA_BASE_VAR_TYPE_UINT32_T == var->mbv_type) { + int32_t *castme = (int32_t *) var->mbv_storage; + *castme = int_value; + } else if (MCA_BASE_VAR_TYPE_INT64_T == var->mbv_type || + MCA_BASE_VAR_TYPE_UINT64_T == var->mbv_type) { + int64_t *castme = (int64_t *) var->mbv_storage; + *castme = int_value; + } else if (MCA_BASE_VAR_TYPE_LONG == var->mbv_type) { + long *castme = (long*) var->mbv_storage; + *castme = (long) int_value; } else if (MCA_BASE_VAR_TYPE_UNSIGNED_LONG == var->mbv_type) { unsigned long *castme = (unsigned long*) var->mbv_storage; *castme = (unsigned long) int_value; @@ -770,7 +797,7 @@ int mca_base_var_set_value (int vari, const void *value, size_t size, mca_base_v } if (MCA_BASE_VAR_TYPE_STRING != var->mbv_type && MCA_BASE_VAR_TYPE_VERSION_STRING != var->mbv_type) { - memmove (var->mbv_storage, value, var_type_sizes[var->mbv_type]); + memmove (var->mbv_storage, value, ompi_var_type_sizes[var->mbv_type]); } else { var_set_string (var, (char *) value); } @@ -1262,11 +1289,18 @@ static int register_variable (const char *project_name, const char *framework_na uintptr_t align = 0; switch (type) { case MCA_BASE_VAR_TYPE_INT: - align = OPAL_ALIGNMENT_INT; - break; case MCA_BASE_VAR_TYPE_UNSIGNED_INT: align = OPAL_ALIGNMENT_INT; break; + case MCA_BASE_VAR_TYPE_INT32_T: + case MCA_BASE_VAR_TYPE_UINT32_T: + align = OPAL_ALIGNMENT_INT32; + break; + case 
MCA_BASE_VAR_TYPE_INT64_T: + case MCA_BASE_VAR_TYPE_UINT64_T: + align = OPAL_ALIGNMENT_INT64; + break; + case MCA_BASE_VAR_TYPE_LONG: case MCA_BASE_VAR_TYPE_UNSIGNED_LONG: align = OPAL_ALIGNMENT_LONG; break; @@ -1895,6 +1929,14 @@ static int var_value_string (mca_base_var_t *var, char **value_string) assert (MCA_BASE_VAR_TYPE_MAX > var->mbv_type); + /** Parameters with MCA_BASE_VAR_FLAG_DEF_UNSET flag should be shown + * as "unset" by default. */ + if ((var->mbv_flags & MCA_BASE_VAR_FLAG_DEF_UNSET) && + (MCA_BASE_VAR_SOURCE_DEFAULT == var->mbv_source)){ + asprintf (value_string, "%s", "unset"); + return OPAL_SUCCESS; + } + ret = mca_base_var_get_value(var->mbv_index, &value, NULL, NULL); if (OPAL_SUCCESS != ret || NULL == value) { return ret; @@ -1905,6 +1947,21 @@ static int var_value_string (mca_base_var_t *var, char **value_string) case MCA_BASE_VAR_TYPE_INT: ret = asprintf (value_string, "%d", value->intval); break; + case MCA_BASE_VAR_TYPE_INT32_T: + ret = asprintf (value_string, "%" PRId32, value->int32tval); + break; + case MCA_BASE_VAR_TYPE_UINT32_T: + ret = asprintf (value_string, "%" PRIu32, value->uint32tval); + break; + case MCA_BASE_VAR_TYPE_INT64_T: + ret = asprintf (value_string, "%" PRId64, value->int64tval); + break; + case MCA_BASE_VAR_TYPE_UINT64_T: + ret = asprintf (value_string, "%" PRIu64, value->uint64tval); + break; + case MCA_BASE_VAR_TYPE_LONG: + ret = asprintf (value_string, "%ld", value->longval); + break; case MCA_BASE_VAR_TYPE_UNSIGNED_INT: ret = asprintf (value_string, "%u", value->uintval); break; @@ -2117,7 +2174,7 @@ int mca_base_var_dump(int vari, char ***out, mca_base_var_dump_type_t output_typ /* Is this variable deprecated? */ asprintf(out[0] + line++, "%sdeprecated:%s", tmp, VAR_IS_DEPRECATED(var[0]) ? 
"yes" : "no"); - asprintf(out[0] + line++, "%stype:%s", tmp, var_type_names[var->mbv_type]); + asprintf(out[0] + line++, "%stype:%s", tmp, ompi_var_type_names[var->mbv_type]); /* Does this parameter have any synonyms or is it a synonym? */ if (VAR_IS_SYNONYM(var[0])) { @@ -2148,7 +2205,7 @@ int mca_base_var_dump(int vari, char ***out, mca_base_var_dump_type_t output_typ asprintf (out[0], "%s \"%s\" (current value: \"%s\", data source: %s, level: %d %s, type: %s", VAR_IS_DEFAULT_ONLY(var[0]) ? "informational" : "parameter", full_name, value_string, source_string, var->mbv_info_lvl + 1, - info_lvl_strings[var->mbv_info_lvl], var_type_names[var->mbv_type]); + info_lvl_strings[var->mbv_info_lvl], ompi_var_type_names[var->mbv_type]); tmp = out[0][0]; if (VAR_IS_DEPRECATED(var[0])) { diff --git a/opal/mca/base/mca_base_var.h b/opal/mca/base/mca_base_var.h index 6f9967c0397..946e0b58529 100644 --- a/opal/mca/base/mca_base_var.h +++ b/opal/mca/base/mca_base_var.h @@ -11,9 +11,10 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2011 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -91,11 +92,22 @@ typedef enum { MCA_BASE_VAR_TYPE_BOOL, /** The variable is of type double */ MCA_BASE_VAR_TYPE_DOUBLE, + /** The variable is of type long int */ + MCA_BASE_VAR_TYPE_LONG, + /** The variable is of type int32_t */ + MCA_BASE_VAR_TYPE_INT32_T, + /** The variable is of type uint32_t */ + MCA_BASE_VAR_TYPE_UINT32_T, + /** The variable is of type int64_t */ + MCA_BASE_VAR_TYPE_INT64_T, + /** The variable is of type uint64_t */ + MCA_BASE_VAR_TYPE_UINT64_T, + /** Maximum variable type. 
*/ MCA_BASE_VAR_TYPE_MAX } mca_base_var_type_t; -extern const char *var_type_names[]; +extern const char *ompi_var_type_names[]; /** * Source of an MCA variable's value @@ -190,7 +202,10 @@ typedef enum { manually when you register a variable with mca_base_var_register(). Analogous to the MCA_BASE_PVAR_FLAG_IWG. */ - MCA_BASE_VAR_FLAG_DWG = 0x0040 + MCA_BASE_VAR_FLAG_DWG = 0x0040, + /** Variable has a default value of "unset", meaning it is only + * set when the user explicitly asks for it */ + MCA_BASE_VAR_FLAG_DEF_UNSET = 0x0080, } mca_base_var_flag_t; @@ -200,14 +215,24 @@ typedef enum { typedef union { /** integer value */ int intval; + /** int32_t value */ + int32_t int32tval; + /** long value */ + long longval; + /** int64_t value */ + int64_t int64tval; /** unsigned int value */ unsigned int uintval; + /** uint32_t value */ + uint32_t uint32tval; /** string value */ char *stringval; /** boolean value */ bool boolval; /** unsigned long value */ unsigned long ulval; + /** uint64_t value */ + uint64_t uint64tval; /** unsigned long long value */ unsigned long long ullval; /** size_t value */ diff --git a/opal/mca/base/mca_base_var_enum.c b/opal/mca/base/mca_base_var_enum.c index 0cfa4434f82..626a8db2950 100644 --- a/opal/mca/base/mca_base_var_enum.c +++ b/opal/mca/base/mca_base_var_enum.c @@ -11,10 +11,11 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2013 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved.
* $COPYRIGHT$ * * Additional copyrights may follow @@ -40,7 +41,7 @@ OBJ_CLASS_INSTANCE(mca_base_var_enum_t, opal_object_t, mca_base_var_enum_constru static void mca_base_var_enum_flag_constructor (mca_base_var_enum_flag_t *enumerator); static void mca_base_var_enum_flag_destructor (mca_base_var_enum_flag_t *enumerator); -OBJ_CLASS_INSTANCE(mca_base_var_enum_flag_t, opal_object_t, mca_base_var_enum_flag_constructor, +static OBJ_CLASS_INSTANCE(mca_base_var_enum_flag_t, opal_object_t, mca_base_var_enum_flag_constructor, mca_base_var_enum_flag_destructor); static int enum_dump (mca_base_var_enum_t *self, char **out); @@ -70,7 +71,7 @@ static int mca_base_var_enum_bool_vfs (mca_base_var_enum_t *self, const char *st int *value) { char *tmp; - int v; + long v; /* skip whitespace */ string_value += strspn (string_value, " \t\n\v\f\r"); @@ -78,10 +79,12 @@ static int mca_base_var_enum_bool_vfs (mca_base_var_enum_t *self, const char *st v = strtol (string_value, &tmp, 10); if (*tmp != '\0') { if (0 == strcmp (string_value, "true") || 0 == strcmp (string_value, "t") || - 0 == strcmp (string_value, "enabled") || 0 == strcmp (string_value, "yes")) { + 0 == strcmp (string_value, "enabled") || 0 == strcmp (string_value, "yes") || + 0 == strcmp (string_value, "y")) { v = 1; } else if (0 == strcmp (string_value, "false") || 0 == strcmp (string_value, "f") || - 0 == strcmp (string_value, "disabled") || 0 == strcmp (string_value, "no")) { + 0 == strcmp (string_value, "disabled") || 0 == strcmp (string_value, "no") || + 0 == strcmp (string_value, "n")) { v = 0; } else { return OPAL_ERR_VALUE_OUT_OF_BOUNDS; @@ -105,7 +108,7 @@ static int mca_base_var_enum_bool_sfv (mca_base_var_enum_t *self, const int valu static int mca_base_var_enum_bool_dump (mca_base_var_enum_t *self, char **out) { - *out = strdup ("0: f|false|disabled|no, 1: t|true|enabled|yes"); + *out = strdup ("0: f|false|disabled|no|n, 1: t|true|enabled|yes|y"); return *out ? 
OPAL_SUCCESS : OPAL_ERR_OUT_OF_RESOURCE; } @@ -146,7 +149,7 @@ static int mca_base_var_enum_auto_bool_vfs (mca_base_var_enum_t *self, const cha int *value) { char *tmp; - int v; + long v; /* skip whitespace */ string_value += strspn (string_value, " \t\n\v\f\r"); @@ -154,10 +157,12 @@ static int mca_base_var_enum_auto_bool_vfs (mca_base_var_enum_t *self, const cha v = strtol (string_value, &tmp, 10); if (*tmp != '\0') { if (0 == strcasecmp (string_value, "true") || 0 == strcasecmp (string_value, "t") || - 0 == strcasecmp (string_value, "enabled") || 0 == strcasecmp (string_value, "yes")) { + 0 == strcasecmp (string_value, "enabled") || 0 == strcasecmp (string_value, "yes") || + 0 == strcasecmp (string_value, "y")) { v = 1; } else if (0 == strcasecmp (string_value, "false") || 0 == strcasecmp (string_value, "f") || - 0 == strcasecmp (string_value, "disabled") || 0 == strcasecmp (string_value, "no")) { + 0 == strcasecmp (string_value, "disabled") || 0 == strcasecmp (string_value, "no") || + 0 == strcasecmp (string_value, "n")) { v = 0; } else if (0 == strcasecmp (string_value, "auto")) { v = -1; @@ -171,7 +176,7 @@ static int mca_base_var_enum_auto_bool_vfs (mca_base_var_enum_t *self, const cha } else if (v < -1) { *value = -1; } else { - *value = v; + *value = (int) v; } return OPAL_SUCCESS; @@ -195,7 +200,7 @@ static int mca_base_var_enum_auto_bool_sfv (mca_base_var_enum_t *self, const int static int mca_base_var_enum_auto_bool_dump (mca_base_var_enum_t *self, char **out) { - *out = strdup ("-1: auto, 0: f|false|disabled|no, 1: t|true|enabled|yes"); + *out = strdup ("-1: auto, 0: f|false|disabled|no|n, 1: t|true|enabled|yes|y"); return *out ? 
OPAL_SUCCESS : OPAL_ERR_OUT_OF_RESOURCE; } diff --git a/opal/mca/base/mca_base_var_group.c b/opal/mca/base/mca_base_var_group.c index 11058ef31ee..d009e19c2a2 100644 --- a/opal/mca/base/mca_base_var_group.c +++ b/opal/mca/base/mca_base_var_group.c @@ -321,8 +321,7 @@ int mca_base_var_group_register (const char *project_name, const char *framework int mca_base_var_group_component_register (const mca_base_component_t *component, const char *description) { - /* 1.7 components do not store the project */ - return group_register (NULL, component->mca_type_name, + return group_register (component->mca_project_name, component->mca_type_name, component->mca_component_name, description); } diff --git a/opal/mca/base/mca_base_vari.h b/opal/mca/base/mca_base_vari.h index f1a4722f054..51f879dfda9 100644 --- a/opal/mca/base/mca_base_vari.h +++ b/opal/mca/base/mca_base_vari.h @@ -15,6 +15,7 @@ * reserved. * Copyright (c) 2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -71,8 +72,8 @@ typedef enum { #define VAR_IS_SETTABLE(var) (!!((var).mbv_flags & MCA_BASE_VAR_FLAG_SETTABLE)) #define VAR_IS_DEPRECATED(var) (!!((var).mbv_flags & MCA_BASE_VAR_FLAG_DEPRECATED)) -extern const char *var_type_names[]; -extern const size_t var_type_sizes[]; +extern const char *ompi_var_type_names[]; +extern const size_t ompi_var_type_sizes[]; extern bool mca_base_var_initialized; /** diff --git a/opal/mca/btl/base/btl_base_frame.c b/opal/mca/btl/base/btl_base_frame.c index e1851ec6e58..857273bea15 100644 --- a/opal/mca/btl/base/btl_base_frame.c +++ b/opal/mca/btl/base/btl_base_frame.c @@ -14,7 +14,7 @@ * Copyright (c) 2008-2013 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. 
All rights + * Copyright (c) 2016-2018 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -53,6 +53,7 @@ mca_base_var_enum_value_flag_t mca_btl_base_flag_enum_flags[] = { {MCA_BTL_FLAGS_NEED_ACK, "need-ack", 0}, {MCA_BTL_FLAGS_NEED_CSUM, "need-csum", 0}, {MCA_BTL_FLAGS_HETEROGENEOUS_RDMA, "hetero-rdma", 0}, + {MCA_BTL_FLAGS_RDMA_FLUSH, "rdma-flush", 0}, {0, NULL, 0} }; diff --git a/opal/mca/btl/base/btl_base_mca.c b/opal/mca/btl/base/btl_base_mca.c index fb59f0e816c..c65c0a3b5c2 100644 --- a/opal/mca/btl/base/btl_base_mca.c +++ b/opal/mca/btl/base/btl_base_mca.c @@ -14,7 +14,7 @@ * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2013 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2016-2018 Los Alamos National Security, LLC. All rights * reserved. * * $COPYRIGHT$ @@ -182,6 +182,10 @@ int mca_btl_base_param_verify(mca_btl_base_module_t *module) module->btl_flags &= ~MCA_BTL_FLAGS_GET; } + if (NULL == module->btl_flush) { + module->btl_flags &= ~MCA_BTL_FLAGS_RDMA_FLUSH; + } + if (0 == module->btl_atomic_flags) { module->btl_flags &= ~MCA_BTL_FLAGS_ATOMIC_OPS; } diff --git a/opal/mca/btl/btl.h b/opal/mca/btl/btl.h index 48564b573ed..5f0e73b30c7 100644 --- a/opal/mca/btl/btl.h +++ b/opal/mca/btl/btl.h @@ -10,7 +10,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2006-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2006-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012-2013 NVIDIA Corporation. All rights reserved. 
@@ -244,6 +244,9 @@ typedef uint8_t mca_btl_base_tag_t; /* The BTL is using progress thread and need the protection on matching */ #define MCA_BTL_FLAGS_BTL_PROGRESS_THREAD_ENABLED 0x40000 +/* The BTL supports RDMA flush */ +#define MCA_BTL_FLAGS_RDMA_FLUSH 0x80000 + /* Default exclusivity levels */ #define MCA_BTL_EXCLUSIVITY_HIGH (64*1024) /* internal loopback */ #define MCA_BTL_EXCLUSIVITY_DEFAULT 1024 /* GM/IB/etc. */ @@ -1164,6 +1167,20 @@ typedef void (*mca_btl_base_module_dump_fn_t)( */ typedef int (*mca_btl_base_module_ft_event_fn_t)(int state); +/** + * Flush all outstanding RDMA operations on an endpoint or all endpoints. + * + * @param btl (IN) BTL module + * @param endpoint (IN) Endpoint to flush (NULL == all) + * + * This function returns when all outstanding RDMA (put, get, atomic) operations + * that were started prior to the flush call have completed. This call does + * NOT guarantee that all BTL callbacks have been completed. + * + * The BTL is allowed to ignore the endpoint parameter and flush *all* endpoints. + */ +typedef int (*mca_btl_base_module_flush_fn_t) (struct mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint); + /** * BTL module interface functions and attributes. */ @@ -1231,23 +1248,30 @@ struct mca_btl_base_module_t { #if OPAL_CUDA_SUPPORT size_t btl_cuda_max_send_size; /**< set if CUDA max send_size is different from host max send size */ #endif /* OPAL_CUDA_SUPPORT */ + + mca_btl_base_module_flush_fn_t btl_flush; /**< flush all previous operations on an endpoint */ + + unsigned char padding[256]; /**< padding to future-proof the btl module */ }; typedef struct mca_btl_base_module_t mca_btl_base_module_t; /* - * Macro for use in modules that are of type btl v3.0.0 - * NOTE: This is not the final version of 3.0.0. Consider it - * alpha until this comment is removed.
+ * Macro for use in modules that are of type btl v3.1.0 */ -#define MCA_BTL_BASE_VERSION_3_0_0 \ - OPAL_MCA_BASE_VERSION_2_1_0("btl", 3, 0, 0) +#define MCA_BTL_BASE_VERSION_3_1_0 \ + OPAL_MCA_BASE_VERSION_2_1_0("btl", 3, 1, 0) #define MCA_BTL_DEFAULT_VERSION(name) \ - MCA_BTL_BASE_VERSION_3_0_0, \ + MCA_BTL_BASE_VERSION_3_1_0, \ .mca_component_name = name, \ MCA_BASE_MAKE_VERSION(component, OPAL_MAJOR_VERSION, OPAL_MINOR_VERSION, \ OPAL_RELEASE_VERSION) +/** + * Convinience macro for detecting the BTL interface version. + */ +#define BTL_VERSION 310 + END_C_DECLS #endif /* OPAL_MCA_BTL_H */ diff --git a/opal/mca/btl/openib/Makefile.am b/opal/mca/btl/openib/Makefile.am index aeb9da07e09..c66d1619aed 100644 --- a/opal/mca/btl/openib/Makefile.am +++ b/opal/mca/btl/openib/Makefile.am @@ -17,6 +17,7 @@ # Copyright (c) 2013 Intel, Inc. All rights reserved. # Copyright (c) 2016 Research Organization for Information Science # and Technology (RIST). All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -113,7 +114,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component) mca_btl_openib_la_SOURCES = $(component_sources) mca_btl_openib_la_LDFLAGS = -module -avoid-version $(btl_openib_LDFLAGS) -mca_btl_openib_la_LIBADD = $(btl_openib_LIBS) \ +mca_btl_openib_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(btl_openib_LIBS) \ $(OPAL_TOP_BUILDDIR)/opal/mca/common/verbs/lib@OPAL_LIB_PREFIX@mca_common_verbs.la if OPAL_cuda_support mca_btl_openib_la_LIBADD += \ diff --git a/opal/mca/btl/openib/btl_openib.c b/opal/mca/btl/openib/btl_openib.c index fe8ee2e8c74..dc279df8347 100644 --- a/opal/mca/btl/openib/btl_openib.c +++ b/opal/mca/btl/openib/btl_openib.c @@ -226,7 +226,7 @@ static int adjust_cq(mca_btl_openib_device_t *device, const int cq) rc = ibv_resize_cq(device->ib_cq[cq], cq_size); /* For ConnectX the resize CQ is not implemented and verbs returns -ENOSYS * but should return ENOSYS. So it is reason for abs */ - if(rc && ENOSYS != abs(rc)) { + if(rc && ENOSYS != abs(rc) && EOPNOTSUPP != abs(rc)) { BTL_ERROR(("cannot resize completion queue, error: %d", rc)); return OPAL_ERROR; } @@ -1119,7 +1119,7 @@ int mca_btl_openib_add_procs( } if (nprocs_new) { - opal_atomic_add_32 (&openib_btl->num_peers, nprocs_new); + opal_atomic_add_fetch_32 (&openib_btl->num_peers, nprocs_new); /* adjust cq sizes given the new procs */ rc = openib_btl_size_queues (openib_btl); @@ -1229,7 +1229,7 @@ struct mca_btl_base_endpoint_t *mca_btl_openib_get_ep (struct mca_btl_base_modul /* this is a new process to this openib btl * account this procs if need */ - opal_atomic_add_32 (&openib_btl->num_peers, 1); + opal_atomic_add_fetch_32 (&openib_btl->num_peers, 1); rc = openib_btl_size_queues(openib_btl); if (OPAL_SUCCESS != rc) { BTL_ERROR(("error creating cqs")); diff --git a/opal/mca/btl/openib/btl_openib_async.c b/opal/mca/btl/openib/btl_openib_async.c index 3662624292e..5c52f9566b1 100644 --- 
a/opal/mca/btl/openib/btl_openib_async.c +++ b/opal/mca/btl/openib/btl_openib_async.c @@ -237,7 +237,7 @@ static void btl_openib_async_device (int fd, short flags, void *arg) /* Set the flag to fatal */ device->got_fatal_event = true; /* It is not critical to protect the counter */ - OPAL_THREAD_ADD32(&mca_btl_openib_component.error_counter, 1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_openib_component.error_counter, 1); /* fall through */ case IBV_EVENT_CQ_ERR: case IBV_EVENT_QP_FATAL: @@ -280,7 +280,7 @@ static void btl_openib_async_device (int fd, short flags, void *arg) openib_event_to_str((enum ibv_event_type)event_type)); /* Set the flag to indicate port error */ device->got_port_event = true; - OPAL_THREAD_ADD32(&mca_btl_openib_component.error_counter, 1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_openib_component.error_counter, 1); break; case IBV_EVENT_COMM_EST: case IBV_EVENT_PORT_ACTIVE: @@ -470,7 +470,7 @@ void mca_btl_openib_async_fini (void) void mca_btl_openib_async_add_device (mca_btl_openib_device_t *device) { if (mca_btl_openib_component.async_evbase) { - if (1 == OPAL_THREAD_ADD32 (&btl_openib_async_device_count, 1)) { + if (1 == OPAL_THREAD_ADD_FETCH32 (&btl_openib_async_device_count, 1)) { mca_btl_openib_async_init (); } opal_event_set (mca_btl_openib_component.async_evbase, &device->async_event, @@ -484,7 +484,7 @@ void mca_btl_openib_async_rem_device (mca_btl_openib_device_t *device) { if (mca_btl_openib_component.async_evbase) { opal_event_del (&device->async_event); - if (0 == OPAL_THREAD_ADD32 (&btl_openib_async_device_count, -1)) { + if (0 == OPAL_THREAD_ADD_FETCH32 (&btl_openib_async_device_count, -1)) { mca_btl_openib_async_fini (); } } diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index c7cfb834ebc..554a5eaa60c 100644 --- a/opal/mca/btl/openib/btl_openib_component.c +++ b/opal/mca/btl/openib/btl_openib_component.c @@ -18,8 +18,8 @@ * Copyright (c) 2009-2012 Oracle and/or its affiliates. 
All rights reserved. * Copyright (c) 2011-2015 NVIDIA Corporation. All rights reserved. * Copyright (c) 2012 Oak Ridge National Laboratory. All rights reserved - * Copyright (c) 2013-2016 Intel, Inc. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2014 Bull SAS. All rights reserved. * $COPYRIGHT$ @@ -2330,6 +2330,11 @@ static float get_ib_dev_distance(struct ibv_device *dev) /* If we don't have hwloc, we'll default to a distance of 0, because we have no way of measuring. */ float distance = 0; + float a, b; + int i; + hwloc_cpuset_t my_cpuset = NULL, ibv_cpuset = NULL; + hwloc_obj_t my_obj, ibv_obj, node_obj; + struct hwloc_distances_s *hwloc_distances = NULL; /* Override any distance logic so all devices are used */ if (0 != mca_btl_openib_component.ignore_locality || @@ -2337,24 +2342,29 @@ static float get_ib_dev_distance(struct ibv_device *dev) return distance; } - float a, b; - int i; - hwloc_cpuset_t my_cpuset = NULL, ibv_cpuset = NULL; - hwloc_obj_t my_obj, ibv_obj, node_obj; - - /* Note that this struct is owned by hwloc; there's no need to - free it at the end of time */ - static const struct hwloc_distances_s *hwloc_distances = NULL; +#if HWLOC_API_VERSION >= 0x20000 + unsigned int j, distances_nr = 1; + int ibvindex, myindex; +#endif if (NULL == hwloc_distances) { - hwloc_distances = - hwloc_get_whole_distance_matrix_by_type(opal_hwloc_topology, - HWLOC_OBJ_NODE); - } + #if HWLOC_API_VERSION < 0x20000 + hwloc_distances = + (struct hwloc_distances_s*)hwloc_get_whole_distance_matrix_by_type(opal_hwloc_topology, + HWLOC_OBJ_NODE); + /* If we got no info, just return 0 */ + if (NULL == hwloc_distances || NULL == hwloc_distances->latency) { + goto out; + } - /* If we got no info, just return 0 */ - if (NULL == 
hwloc_distances || NULL == hwloc_distances->latency) { - goto out; + #else + if (0 != hwloc_distances_get_by_type(opal_hwloc_topology, HWLOC_OBJ_NODE, + &distances_nr, &hwloc_distances, + HWLOC_DISTANCES_KIND_MEANS_LATENCY, 0) || 0 == distances_nr) { + hwloc_distances = NULL; + goto out; + } + #endif } /* Next, find the NUMA node where this IBV device is located */ @@ -2372,16 +2382,31 @@ static float get_ib_dev_distance(struct ibv_device *dev) opal_output_verbose(5, opal_btl_base_framework.framework_output, "hwloc_distances->nbobjs=%d", hwloc_distances->nbobjs); +#if HWLOC_API_VERSION < 0x20000 for (i = 0; i < (int)(2 * hwloc_distances->nbobjs); i++) { opal_output_verbose(5, opal_btl_base_framework.framework_output, "hwloc_distances->latency[%d]=%f", i, hwloc_distances->latency[i]); } +#else + for (i = 0; i < (int)hwloc_distances->nbobjs; i++) { + opal_output_verbose(5, opal_btl_base_framework.framework_output, + "hwloc_distances->values[%d]=%"PRIu64, i, hwloc_distances->values[i]); + } +#endif /* If ibv_obj is a NUMA node or below, we're good. 
*/ switch (ibv_obj->type) { case HWLOC_OBJ_NODE: case HWLOC_OBJ_SOCKET: +#if HWLOC_API_VERSION < 0x20000 case HWLOC_OBJ_CACHE: +#else + case HWLOC_OBJ_L1CACHE: + case HWLOC_OBJ_L2CACHE: + case HWLOC_OBJ_L3CACHE: + case HWLOC_OBJ_L4CACHE: + case HWLOC_OBJ_L5CACHE: +#endif case HWLOC_OBJ_CORE: case HWLOC_OBJ_PU: while (NULL != ibv_obj && ibv_obj->type != HWLOC_OBJ_NODE) { @@ -2401,6 +2426,22 @@ static float get_ib_dev_distance(struct ibv_device *dev) if (NULL == ibv_obj) { goto out; } + #if HWLOC_API_VERSION >= 0x20000 + /* the new matrix format isn't quite as friendly, so we have to + * do an exhaustive search to find the index of this object + * in that array */ + ibvindex = -1; + for (j=0; j < hwloc_distances->nbobjs; j++) { + if (ibv_obj == hwloc_distances->objs[j]) { + ibvindex = j; + break; + } + } + if (-1 == ibvindex) { + OPAL_ERROR_LOG(OPAL_ERR_NOT_FOUND); + goto out; + } + #endif opal_output_verbose(5, opal_btl_base_framework.framework_output, "ibv_obj->logical_index=%d", ibv_obj->logical_index); @@ -2423,7 +2464,15 @@ static float get_ib_dev_distance(struct ibv_device *dev) switch (my_obj->type) { case HWLOC_OBJ_NODE: case HWLOC_OBJ_SOCKET: - case HWLOC_OBJ_CACHE: + #if HWLOC_API_VERSION < 0x20000 + case HWLOC_OBJ_CACHE: + #else + case HWLOC_OBJ_L1CACHE: + case HWLOC_OBJ_L2CACHE: + case HWLOC_OBJ_L3CACHE: + case HWLOC_OBJ_L4CACHE: + case HWLOC_OBJ_L5CACHE: + #endif case HWLOC_OBJ_CORE: case HWLOC_OBJ_PU: while (NULL != my_obj && my_obj->type != HWLOC_OBJ_NODE) { @@ -2434,12 +2483,31 @@ static float get_ib_dev_distance(struct ibv_device *dev) "my_obj->logical_index=%d", my_obj->logical_index); /* Distance may be asymetrical, so calculate both of them and take the max */ - a = hwloc_distances->latency[my_obj->logical_index + - (ibv_obj->logical_index * - hwloc_distances->nbobjs)]; - b = hwloc_distances->latency[ibv_obj->logical_index + - (my_obj->logical_index * - hwloc_distances->nbobjs)]; + #if HWLOC_API_VERSION < 0x20000 + a =
hwloc_distances->latency[my_obj->logical_index + + (ibv_obj->logical_index * + hwloc_distances->nbobjs)]; + b = hwloc_distances->latency[ibv_obj->logical_index + + (my_obj->logical_index * + hwloc_distances->nbobjs)]; + #else + /* the new matrix format isn't quite as friendly, so we have to + * do an exhaustive search to find the index of this object + * in that array */ + myindex = -1; + for (j=0; j < hwloc_distances->nbobjs; j++) { + if (my_obj == hwloc_distances->objs[j]) { + myindex = j; + break; + } + } + if (-1 == myindex) { + OPAL_ERROR_LOG(OPAL_ERR_NOT_FOUND); + goto out; + } + a = (float)hwloc_distances->values[myindex + (ibvindex * hwloc_distances->nbobjs)]; + b = (float)hwloc_distances->values[ibvindex + (myindex * hwloc_distances->nbobjs)]; + #endif distance = (a > b) ? a : b; } break; @@ -2455,13 +2523,28 @@ static float get_ib_dev_distance(struct ibv_device *dev) node_obj = hwloc_get_obj_inside_cpuset_by_type(opal_hwloc_topology, ibv_obj->cpuset, HWLOC_OBJ_NODE, ++i)) { - - a = hwloc_distances->latency[node_obj->logical_index + - (ibv_obj->logical_index * - hwloc_distances->nbobjs)]; - b = hwloc_distances->latency[ibv_obj->logical_index + - (node_obj->logical_index * - hwloc_distances->nbobjs)]; + #if HWLOC_API_VERSION < 0x20000 + a = hwloc_distances->latency[node_obj->logical_index + + (ibv_obj->logical_index * + hwloc_distances->nbobjs)]; + b = hwloc_distances->latency[ibv_obj->logical_index + + (node_obj->logical_index * + hwloc_distances->nbobjs)]; + #else + unsigned int j; + j = node_obj->logical_index + (ibv_obj->logical_index * hwloc_distances->nbobjs); + if (j < hwloc_distances->nbobjs * hwloc_distances->nbobjs) { + a = (float)hwloc_distances->values[j]; + } else { + goto out; + } + j = ibv_obj->logical_index + (node_obj->logical_index * hwloc_distances->nbobjs); + if (j < hwloc_distances->nbobjs * hwloc_distances->nbobjs) { + b = (float)hwloc_distances->values[j]; + } else { + goto out; + } + #endif a = (a > b) ? a : b; distance = (a > distance) ?
a : distance; } @@ -2476,6 +2559,11 @@ static float get_ib_dev_distance(struct ibv_device *dev) hwloc_bitmap_free(my_cpuset); } +#if HWLOC_API_VERSION >= 0x20000 + if (NULL != hwloc_distances) { + hwloc_distances_release(opal_hwloc_topology, hwloc_distances); + } +#endif return distance; } @@ -3115,7 +3203,7 @@ static int btl_openib_handle_incoming(mca_btl_openib_module_t *openib_btl, credits = hdr->credits; if(hdr->cm_seen) - OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.cm_sent, -hdr->cm_seen); + OPAL_THREAD_ADD_FETCH32(&ep->qps[cqp].u.pp_qp.cm_sent, -hdr->cm_seen); /* Now return fragment. Don't touch hdr after this point! */ if(MCA_BTL_OPENIB_RDMA_FRAG(frag)) { @@ -3127,7 +3215,7 @@ static int btl_openib_handle_incoming(mca_btl_openib_module_t *openib_btl, tf = MCA_BTL_OPENIB_GET_LOCAL_RDMA_FRAG(ep, erl->tail); if(MCA_BTL_OPENIB_RDMA_FRAG_LOCAL(tf)) break; - OPAL_THREAD_ADD32(&erl->credits, 1); + OPAL_THREAD_ADD_FETCH32(&erl->credits, 1); MCA_BTL_OPENIB_RDMA_NEXT_INDEX(erl->tail); } OPAL_THREAD_UNLOCK(&erl->lock); @@ -3145,14 +3233,14 @@ static int btl_openib_handle_incoming(mca_btl_openib_module_t *openib_btl, MCA_BTL_IB_FRAG_RETURN(frag); if (BTL_OPENIB_QP_TYPE_PP(rqp)) { if (OPAL_UNLIKELY(is_credit_msg)) { - OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.cm_received, 1); + OPAL_THREAD_ADD_FETCH32(&ep->qps[cqp].u.pp_qp.cm_received, 1); } else { - OPAL_THREAD_ADD32(&ep->qps[rqp].u.pp_qp.rd_posted, -1); + OPAL_THREAD_ADD_FETCH32(&ep->qps[rqp].u.pp_qp.rd_posted, -1); } mca_btl_openib_endpoint_post_rr(ep, cqp); } else { mca_btl_openib_module_t *btl = ep->endpoint_btl; - OPAL_THREAD_ADD32(&btl->qps[rqp].u.srq_qp.rd_posted, -1); + OPAL_THREAD_ADD_FETCH32(&btl->qps[rqp].u.srq_qp.rd_posted, -1); mca_btl_openib_post_srr(btl, rqp); } } @@ -3163,10 +3251,10 @@ static int btl_openib_handle_incoming(mca_btl_openib_module_t *openib_btl, /* If we got any credits (RDMA or send), then try to progress all the no_credits_pending_frags lists */ if (rcredits > 0) { - 
OPAL_THREAD_ADD32(&ep->eager_rdma_remote.tokens, rcredits); + OPAL_THREAD_ADD_FETCH32(&ep->eager_rdma_remote.tokens, rcredits); } if (credits > 0) { - OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.sd_credits, credits); + OPAL_THREAD_ADD_FETCH32(&ep->qps[cqp].u.pp_qp.sd_credits, credits); } if (rcredits + credits > 0) { int rc; @@ -3215,7 +3303,7 @@ static void btl_openib_handle_incoming_completion(mca_btl_base_module_t* btl, credits = hdr->credits; if(hdr->cm_seen) - OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.cm_sent, -hdr->cm_seen); + OPAL_THREAD_ADD_FETCH32(&ep->qps[cqp].u.pp_qp.cm_sent, -hdr->cm_seen); /* We should not be here with eager, control, or credit messages */ assert(openib_frag_type(frag) != MCA_BTL_OPENIB_FRAG_EAGER_RDMA); @@ -3226,11 +3314,11 @@ static void btl_openib_handle_incoming_completion(mca_btl_base_module_t* btl, /* Otherwise, FRAG_RETURN it and repost if necessary */ MCA_BTL_IB_FRAG_RETURN(frag); if (BTL_OPENIB_QP_TYPE_PP(rqp)) { - OPAL_THREAD_ADD32(&ep->qps[rqp].u.pp_qp.rd_posted, -1); + OPAL_THREAD_ADD_FETCH32(&ep->qps[rqp].u.pp_qp.rd_posted, -1); mca_btl_openib_endpoint_post_rr(ep, cqp); } else { mca_btl_openib_module_t *btl = ep->endpoint_btl; - OPAL_THREAD_ADD32(&btl->qps[rqp].u.srq_qp.rd_posted, -1); + OPAL_THREAD_ADD_FETCH32(&btl->qps[rqp].u.srq_qp.rd_posted, -1); mca_btl_openib_post_srr(btl, rqp); } @@ -3239,10 +3327,10 @@ static void btl_openib_handle_incoming_completion(mca_btl_base_module_t* btl, /* If we got any credits (RDMA or send), then try to progress all the no_credits_pending_frags lists */ if (rcredits > 0) { - OPAL_THREAD_ADD32(&ep->eager_rdma_remote.tokens, rcredits); + OPAL_THREAD_ADD_FETCH32(&ep->eager_rdma_remote.tokens, rcredits); } if (credits > 0) { - OPAL_THREAD_ADD32(&ep->qps[cqp].u.pp_qp.sd_credits, credits); + OPAL_THREAD_ADD_FETCH32(&ep->qps[cqp].u.pp_qp.sd_credits, credits); } if (rcredits + credits > 0) { int rc; @@ -3348,7 +3436,9 @@ progress_pending_frags_wqe(mca_btl_base_endpoint_t *ep, const int qpn) frag = 
opal_list_remove_first(&ep->qps[qpn].no_wqe_pending_frags[i]); if(NULL == frag) break; +#if OPAL_ENABLE_DEBUG assert(0 == frag->opal_list_item_refcount); +#endif tmp_ep = to_com_frag(frag)->endpoint; ret = mca_btl_openib_endpoint_post_send(tmp_ep, to_send_frag(frag)); if (OPAL_SUCCESS != ret) { @@ -3435,7 +3525,7 @@ static void handle_wc(mca_btl_openib_device_t* device, const uint32_t cq, case IBV_WC_FETCH_ADD: OPAL_OUTPUT((-1, "Got WC: RDMA_READ or RDMA_WRITE")); - OPAL_THREAD_ADD32(&endpoint->get_tokens, 1); + OPAL_THREAD_ADD_FETCH32(&endpoint->get_tokens, 1); mca_btl_openib_get_frag_t *get_frag = to_get_frag(des); @@ -3487,7 +3577,7 @@ static void handle_wc(mca_btl_openib_device_t* device, const uint32_t cq, n = qp_frag_to_wqe(endpoint, qp, to_com_frag(des)); if(IBV_WC_SEND == wc->opcode && !BTL_OPENIB_QP_TYPE_PP(qp)) { - OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1+n); + OPAL_THREAD_ADD_FETCH32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1+n); /* new SRQ credit available. 
Try to progress pending frags*/ progress_pending_frags_srq(openib_btl, qp); @@ -3513,7 +3603,7 @@ static void handle_wc(mca_btl_openib_device_t* device, const uint32_t cq, wc->byte_len < mca_btl_openib_component.eager_limit && openib_btl->eager_rdma_channels < mca_btl_openib_component.max_eager_rdma && - OPAL_THREAD_ADD32(&endpoint->eager_recv_count, 1) == + OPAL_THREAD_ADD_FETCH32(&endpoint->eager_recv_count, 1) == mca_btl_openib_component.eager_rdma_threshold) { mca_btl_openib_endpoint_connect_eager_rdma(endpoint); } @@ -3846,7 +3936,7 @@ int mca_btl_openib_post_srr(mca_btl_openib_module_t* openib_btl, const int qp) if(OPAL_LIKELY(0 == rc)) { struct ibv_srq_attr srq_attr; - OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.rd_posted, num_post); + OPAL_THREAD_ADD_FETCH32(&openib_btl->qps[qp].u.srq_qp.rd_posted, num_post); if(true == openib_btl->qps[qp].u.srq_qp.srq_limit_event_flag) { srq_attr.max_wr = openib_btl->qps[qp].u.srq_qp.rd_curr_num; diff --git a/opal/mca/btl/openib/btl_openib_eager_rdma.h b/opal/mca/btl/openib/btl_openib_eager_rdma.h index 0ba5a030d4c..5acb038177f 100644 --- a/opal/mca/btl/openib/btl_openib_eager_rdma.h +++ b/opal/mca/btl/openib/btl_openib_eager_rdma.h @@ -96,7 +96,7 @@ typedef struct mca_btl_openib_eager_rdma_remote_t mca_btl_openib_eager_rdma_remo #define MCA_BTL_OPENIB_RDMA_MOVE_INDEX(HEAD, OLD_HEAD, SEQ) \ do { \ - (SEQ) = OPAL_THREAD_ADD32(&(HEAD), 1) - 1; \ + (SEQ) = OPAL_THREAD_ADD_FETCH32(&(HEAD), 1) - 1; \ (OLD_HEAD) = (SEQ) % mca_btl_openib_component.eager_rdma_num; \ } while(0) @@ -108,7 +108,7 @@ typedef struct mca_btl_openib_eager_rdma_remote_t mca_btl_openib_eager_rdma_remo #define MCA_BTL_OPENIB_RDMA_MOVE_INDEX(HEAD, OLD_HEAD) \ do { \ - (OLD_HEAD) = (OPAL_THREAD_ADD32(&(HEAD), 1) - 1) % mca_btl_openib_component.eager_rdma_num; \ + (OLD_HEAD) = (OPAL_THREAD_ADD_FETCH32(&(HEAD), 1) - 1) % mca_btl_openib_component.eager_rdma_num; \ } while(0) #endif diff --git a/opal/mca/btl/openib/btl_openib_endpoint.c 
b/opal/mca/btl/openib/btl_openib_endpoint.c index b0ae1062be0..be01664b1c3 100644 --- a/opal/mca/btl/openib/btl_openib_endpoint.c +++ b/opal/mca/btl/openib/btl_openib_endpoint.c @@ -212,7 +212,7 @@ endpoint_init_qp_xrc(mca_btl_base_endpoint_t *ep, const int qp) qp_attr.cap.max_recv_sge = 1; /* we do not use SG list */ rc = ibv_modify_qp (ep_qp->qp->lcl_qp, &qp_attr, IBV_QP_CAP); if (0 == rc) { - opal_atomic_add_32 (&ep_qp->qp->sd_wqe, incr); + opal_atomic_add_fetch_32 (&ep_qp->qp->sd_wqe, incr); } } else { ep_qp->qp->sd_wqe = ep->ib_addr->max_wqe; @@ -373,11 +373,12 @@ static void mca_btl_openib_endpoint_destruct(mca_btl_base_endpoint_t* endpoint) /* Release memory resources */ do { + void *_tmp_ptr = NULL; /* Make sure that mca_btl_openib_endpoint_connect_eager_rdma () * was not in "connect" or "bad" flow (failed to allocate memory) * and changed the pointer back to NULL */ - if(!opal_atomic_cmpset_ptr(&endpoint->eager_rdma_local.base.pval, NULL, (void*)1)) { + if(!opal_atomic_compare_exchange_strong_ptr(&endpoint->eager_rdma_local.base.pval, (void *) &_tmp_ptr, (void *) 1)) { if (NULL != endpoint->eager_rdma_local.reg) { endpoint->endpoint_btl->device->rcache->rcache_deregister (endpoint->endpoint_btl->device->rcache, &endpoint->eager_rdma_local.reg->base); @@ -766,9 +767,9 @@ void mca_btl_openib_endpoint_send_credits(mca_btl_openib_endpoint_t* endpoint, if(OPAL_SUCCESS == acquire_eager_rdma_send_credit(endpoint)) { do_rdma = true; } else { - if(OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.cm_sent, 1) > + if(OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.cm_sent, 1) > (mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv - 1)) { - OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.cm_sent, -1); + OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.cm_sent, -1); BTL_OPENIB_CREDITS_SEND_UNLOCK(endpoint, qp); return; } @@ -781,7 +782,7 @@ void mca_btl_openib_endpoint_send_credits(mca_btl_openib_endpoint_t* endpoint, if(cm_return > 255) { frag->hdr->cm_seen = 255; 
             cm_return -= 255;
-            OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.cm_return, cm_return);
+            OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.cm_return, cm_return);
         } else {
             frag->hdr->cm_seen = cm_return;
         }
@@ -802,14 +803,14 @@ void mca_btl_openib_endpoint_send_credits(mca_btl_openib_endpoint_t* endpoint,
             BTL_OPENIB_RDMA_CREDITS_HEADER_NTOH(*credits_hdr);
         }
         BTL_OPENIB_CREDITS_SEND_UNLOCK(endpoint, qp);
-        OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.rd_credits,
+        OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.rd_credits,
                 frag->hdr->credits);
-        OPAL_THREAD_ADD32(&endpoint->eager_rdma_local.credits,
+        OPAL_THREAD_ADD_FETCH32(&endpoint->eager_rdma_local.credits,
                 credits_hdr->rdma_credits);
         if(do_rdma)
-            OPAL_THREAD_ADD32(&endpoint->eager_rdma_remote.tokens, 1);
+            OPAL_THREAD_ADD_FETCH32(&endpoint->eager_rdma_remote.tokens, 1);
         else
-            OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.cm_sent, -1);
+            OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.cm_sent, -1);
         BTL_ERROR(("error posting send request errno %d says %s", rc, strerror(errno)));
@@ -823,7 +824,7 @@ static void mca_btl_openib_endpoint_eager_rdma_connect_cb(
         int status)
 {
     mca_btl_openib_device_t *device = endpoint->endpoint_btl->device;
-    OPAL_THREAD_ADD32(&device->non_eager_rdma_endpoints, -1);
+    OPAL_THREAD_ADD_FETCH32(&device->non_eager_rdma_endpoints, -1);
     assert(device->non_eager_rdma_endpoints >= 0);
     MCA_BTL_IB_FRAG_RETURN(descriptor);
 }
@@ -894,12 +895,14 @@ void mca_btl_openib_endpoint_connect_eager_rdma(
     mca_btl_openib_recv_frag_t *headers_buf;
     int i, rc;
     uint32_t flag = MCA_RCACHE_FLAGS_CACHE_BYPASS;
+    void *_tmp_ptr = NULL;
     /* Set local rdma pointer to 1 temporarily so other threads will not try
      * to enter the function */
-    if(!opal_atomic_cmpset_ptr(&endpoint->eager_rdma_local.base.pval, NULL,
-                (void*)1))
+    if(!opal_atomic_compare_exchange_strong_ptr (&endpoint->eager_rdma_local.base.pval, (void *) &_tmp_ptr,
+                                                 (void *) 1)) {
         return;
+    }
     headers_buf = (mca_btl_openib_recv_frag_t*)
         malloc(sizeof(mca_btl_openib_recv_frag_t) *
@@ -975,22 +978,23 @@ void mca_btl_openib_endpoint_connect_eager_rdma(
         endpoint->eager_rdma_local.rd_win?endpoint->eager_rdma_local.rd_win:1;
     /* set local rdma pointer to real value */
-    (void)opal_atomic_cmpset_ptr(&endpoint->eager_rdma_local.base.pval,
-            (void*)1, buf);
+    endpoint->eager_rdma_local.base.pval = buf;
     endpoint->eager_rdma_local.alloc_base = alloc_base;
     if(mca_btl_openib_endpoint_send_eager_rdma(endpoint) == OPAL_SUCCESS) {
         mca_btl_openib_device_t *device = endpoint->endpoint_btl->device;
         mca_btl_openib_endpoint_t **p;
+        void *_tmp_ptr;
         OBJ_RETAIN(endpoint);
         assert(((opal_object_t*)endpoint)->obj_reference_count == 2);
         do {
+            _tmp_ptr = NULL;
             p = &device->eager_rdma_buffers[device->eager_rdma_buffers_count];
-        } while(!opal_atomic_cmpset_ptr(p, NULL, endpoint));
+        } while(!opal_atomic_compare_exchange_strong_ptr (p, (void *) &_tmp_ptr, endpoint));
-        OPAL_THREAD_ADD32(&openib_btl->eager_rdma_channels, 1);
+        OPAL_THREAD_ADD_FETCH32(&openib_btl->eager_rdma_channels, 1);
         /* from this point progress function starts to poll new buffer */
-        OPAL_THREAD_ADD32(&device->eager_rdma_buffers_count, 1);
+        OPAL_THREAD_ADD_FETCH32(&device->eager_rdma_buffers_count, 1);
         return;
     }
@@ -1001,8 +1005,7 @@ void mca_btl_openib_endpoint_connect_eager_rdma(
     free(headers_buf);
 unlock_rdma_local:
     /* set local rdma pointer back to zero. Will retry later */
-    (void)opal_atomic_cmpset_ptr(&endpoint->eager_rdma_local.base.pval,
-            endpoint->eager_rdma_local.base.pval, NULL);
+    endpoint->eager_rdma_local.base.pval = NULL;
     endpoint->eager_rdma_local.frags = NULL;
 }
diff --git a/opal/mca/btl/openib/btl_openib_endpoint.h b/opal/mca/btl/openib/btl_openib_endpoint.h
index c4a12996432..89c42c595e5 100644
--- a/opal/mca/btl/openib/btl_openib_endpoint.h
+++ b/opal/mca/btl/openib/btl_openib_endpoint.h
@@ -277,19 +277,19 @@ OBJ_CLASS_DECLARATION(mca_btl_openib_endpoint_t);
 static inline int32_t qp_get_wqe(mca_btl_openib_endpoint_t *ep, const int qp)
 {
-    return OPAL_THREAD_ADD32(&ep->qps[qp].qp->sd_wqe, -1);
+    return OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].qp->sd_wqe, -1);
 }
 static inline int32_t qp_put_wqe(mca_btl_openib_endpoint_t *ep, const int qp)
 {
-    return OPAL_THREAD_ADD32(&ep->qps[qp].qp->sd_wqe, 1);
+    return OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].qp->sd_wqe, 1);
 }
 static inline int32_t qp_inc_inflight_wqe(mca_btl_openib_endpoint_t *ep, const int qp, mca_btl_openib_com_frag_t *frag)
 {
     frag->n_wqes_inflight = 0;
-    return OPAL_THREAD_ADD32(&ep->qps[qp].qp->sd_wqe_inflight, 1);
+    return OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].qp->sd_wqe_inflight, 1);
 }
 static inline void qp_inflight_wqe_to_frag(mca_btl_openib_endpoint_t *ep, const int qp, mca_btl_openib_com_frag_t *frag)
@@ -303,7 +303,7 @@ static inline int qp_frag_to_wqe(mca_btl_openib_endpoint_t *ep, const int qp, mc
 {
     int n;
     n = frag->n_wqes_inflight;
-    OPAL_THREAD_ADD32(&ep->qps[qp].qp->sd_wqe, n);
+    OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].qp->sd_wqe, n);
     frag->n_wqes_inflight = 0;
     return n;
@@ -420,15 +420,15 @@ static inline int mca_btl_openib_endpoint_post_rr_nolock(
         if((rc = post_recvs(ep, qp, num_post)) != OPAL_SUCCESS) {
             return rc;
         }
-        OPAL_THREAD_ADD32(&ep->qps[qp].u.pp_qp.rd_posted, num_post);
-        OPAL_THREAD_ADD32(&ep->qps[qp].u.pp_qp.rd_credits, num_post);
+        OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].u.pp_qp.rd_posted, num_post);
+        OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].u.pp_qp.rd_credits, num_post);
         /* post buffers for credit management on credit management qp */
         if((rc = post_recvs(ep, cqp, cm_received)) != OPAL_SUCCESS) {
             return rc;
         }
-        OPAL_THREAD_ADD32(&ep->qps[qp].u.pp_qp.cm_return, cm_received);
-        OPAL_THREAD_ADD32(&ep->qps[qp].u.pp_qp.cm_received, -cm_received);
+        OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].u.pp_qp.cm_return, cm_received);
+        OPAL_THREAD_ADD_FETCH32(&ep->qps[qp].u.pp_qp.cm_received, -cm_received);
         assert(ep->qps[qp].u.pp_qp.rd_credits <= rd_num && ep->qps[qp].u.pp_qp.rd_credits >= 0);
@@ -446,14 +446,16 @@ static inline int mca_btl_openib_endpoint_post_rr(
     return ret;
 }
-#define BTL_OPENIB_CREDITS_SEND_TRYLOCK(E, Q) \
-    OPAL_ATOMIC_CMPSET_32(&(E)->qps[(Q)].rd_credit_send_lock, 0, 1)
-#define BTL_OPENIB_CREDITS_SEND_UNLOCK(E, Q) \
-    OPAL_ATOMIC_CMPSET_32(&(E)->qps[(Q)].rd_credit_send_lock, 1, 0)
-#define BTL_OPENIB_GET_CREDITS(FROM, TO) \
-    do { \
-        TO = FROM; \
-    } while(0 == OPAL_ATOMIC_CMPSET_32(&FROM, TO, 0))
+static inline __opal_attribute_always_inline__ bool btl_openib_credits_send_trylock (mca_btl_openib_endpoint_t *ep, int qp)
+{
+    int32_t _tmp_value = 0;
+    return OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_32(&ep->qps[qp].rd_credit_send_lock, &_tmp_value, 1);
+}
+
+#define BTL_OPENIB_CREDITS_SEND_UNLOCK(E, Q) \
+    OPAL_ATOMIC_SWAP_32 (&(E)->qps[(Q)].rd_credit_send_lock, 0)
+#define BTL_OPENIB_GET_CREDITS(FROM, TO) \
+    TO = OPAL_ATOMIC_SWAP_32(&FROM, 0)
 static inline bool check_eager_rdma_credits(const mca_btl_openib_endpoint_t *ep)
@@ -486,7 +488,7 @@ static inline void send_credits(mca_btl_openib_endpoint_t *ep, int qp)
     return;
 try_send:
-    if(BTL_OPENIB_CREDITS_SEND_TRYLOCK(ep, qp))
+    if(btl_openib_credits_send_trylock(ep, qp))
         mca_btl_openib_endpoint_send_credits(ep, qp);
 }
@@ -530,8 +532,8 @@ ib_send_flags(uint32_t size, mca_btl_openib_endpoint_qp_t *qp, int do_signal)
 static inline int acquire_eager_rdma_send_credit(mca_btl_openib_endpoint_t *endpoint)
 {
-    if(OPAL_THREAD_ADD32(&endpoint->eager_rdma_remote.tokens, -1) < 0) {
-        OPAL_THREAD_ADD32(&endpoint->eager_rdma_remote.tokens, 1);
+    if(OPAL_THREAD_ADD_FETCH32(&endpoint->eager_rdma_remote.tokens, -1) < 0) {
+        OPAL_THREAD_ADD_FETCH32(&endpoint->eager_rdma_remote.tokens, 1);
         return OPAL_ERR_OUT_OF_RESOURCE;
     }
@@ -636,8 +638,8 @@ static inline int mca_btl_openib_endpoint_credit_acquire (struct mca_btl_base_en
         prio = !prio;
     if (BTL_OPENIB_QP_TYPE_PP(qp)) {
-        if (OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.sd_credits, -1) < 0) {
-            OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.sd_credits, 1);
+        if (OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.sd_credits, -1) < 0) {
+            OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.sd_credits, 1);
             if (queue_frag) {
                 opal_list_append(&endpoint->qps[qp].no_credits_pending_frags[prio],
                                  (opal_list_item_t *)frag);
@@ -646,8 +648,8 @@ static inline int mca_btl_openib_endpoint_credit_acquire (struct mca_btl_base_en
             return OPAL_ERR_OUT_OF_RESOURCE;
         }
     } else {
-        if(OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.sd_credits, -1) < 0) {
-            OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1);
+        if(OPAL_THREAD_ADD_FETCH32(&openib_btl->qps[qp].u.srq_qp.sd_credits, -1) < 0) {
+            OPAL_THREAD_ADD_FETCH32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1);
             if (queue_frag) {
                 OPAL_THREAD_LOCK(&openib_btl->ib_lock);
                 opal_list_append(&openib_btl->qps[qp].u.srq_qp.pending_frags[prio],
@@ -682,7 +684,7 @@ static inline int mca_btl_openib_endpoint_credit_acquire (struct mca_btl_base_en
         if(cm_return > 255) {
             hdr->cm_seen = 255;
             cm_return -= 255;
-            OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.cm_return, cm_return);
+            OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.cm_return, cm_return);
         } else {
             hdr->cm_seen = cm_return;
         }
@@ -697,18 +699,18 @@ static inline void mca_btl_openib_endpoint_credit_release (struct mca_btl_base_e
     mca_btl_openib_header_t *hdr = frag->hdr;
     if (BTL_OPENIB_IS_RDMA_CREDITS(hdr->credits)) {
-        OPAL_THREAD_ADD32(&endpoint->eager_rdma_local.credits, BTL_OPENIB_CREDITS(hdr->credits));
+        OPAL_THREAD_ADD_FETCH32(&endpoint->eager_rdma_local.credits, BTL_OPENIB_CREDITS(hdr->credits));
     }
     if (do_rdma) {
-        OPAL_THREAD_ADD32(&endpoint->eager_rdma_remote.tokens, 1);
+        OPAL_THREAD_ADD_FETCH32(&endpoint->eager_rdma_remote.tokens, 1);
     } else {
         if(BTL_OPENIB_QP_TYPE_PP(qp)) {
-            OPAL_THREAD_ADD32 (&endpoint->qps[qp].u.pp_qp.rd_credits, hdr->credits);
-            OPAL_THREAD_ADD32(&endpoint->qps[qp].u.pp_qp.sd_credits, 1);
+            OPAL_THREAD_ADD_FETCH32 (&endpoint->qps[qp].u.pp_qp.rd_credits, hdr->credits);
+            OPAL_THREAD_ADD_FETCH32(&endpoint->qps[qp].u.pp_qp.sd_credits, 1);
         } else if BTL_OPENIB_QP_TYPE_SRQ(qp){
             mca_btl_openib_module_t *openib_btl = endpoint->endpoint_btl;
-            OPAL_THREAD_ADD32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1);
+            OPAL_THREAD_ADD_FETCH32(&openib_btl->qps[qp].u.srq_qp.sd_credits, 1);
         }
     }
 }
diff --git a/opal/mca/btl/openib/btl_openib_get.c b/opal/mca/btl/openib/btl_openib_get.c
index c8bc78105db..6dc73bc6e4c 100644
--- a/opal/mca/btl/openib/btl_openib_get.c
+++ b/opal/mca/btl/openib/btl_openib_get.c
@@ -148,9 +148,9 @@ int mca_btl_openib_get_internal (mca_btl_base_module_t *btl, struct mca_btl_base
     }
     /* check for a get token */
-    if (OPAL_THREAD_ADD32(&ep->get_tokens,-1) < 0) {
+    if (OPAL_THREAD_ADD_FETCH32(&ep->get_tokens,-1) < 0) {
         qp_put_wqe(ep, qp);
-        OPAL_THREAD_ADD32(&ep->get_tokens,1);
+        OPAL_THREAD_ADD_FETCH32(&ep->get_tokens,1);
         return OPAL_ERR_OUT_OF_RESOURCE;
     }
@@ -159,7 +159,7 @@ int mca_btl_openib_get_internal (mca_btl_base_module_t *btl, struct mca_btl_base
     if (ibv_post_send(ep->qps[qp].qp->lcl_qp, &frag->sr_desc, &bad_wr)) {
         qp_put_wqe(ep, qp);
-        OPAL_THREAD_ADD32(&ep->get_tokens,1);
+        OPAL_THREAD_ADD_FETCH32(&ep->get_tokens,1);
         return OPAL_ERROR;
     }
diff --git a/opal/mca/btl/openib/btl_openib_ini.c b/opal/mca/btl/openib/btl_openib_ini.c
index e6bc6e89c66..0e1b7551531 100644
--- a/opal/mca/btl/openib/btl_openib_ini.c
+++ b/opal/mca/btl/openib/btl_openib_ini.c
@@ -12,7 +12,7 @@
  *                         All rights reserved.
  * Copyright (c) 2006-2013 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2008 Mellanox Technologies. All rights reserved.
- * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights
  *                         reserved.
  * Copyright (c) 2014 Intel, Inc. All rights reserved
  * Copyright (c) 2014-2015 Research Organization for Information Science
@@ -160,7 +160,6 @@ int opal_btl_openib_ini_query(uint32_t vendor_id, uint32_t vendor_part_id,
 {
     int ret;
     device_values_t *h;
-    opal_list_item_t *item;
     if (!initialized) {
         if (OPAL_SUCCESS != (ret = opal_btl_openib_ini_init())) {
@@ -176,10 +175,7 @@ int opal_btl_openib_ini_query(uint32_t vendor_id, uint32_t vendor_part_id,
     reset_values(values);
     /* Iterate over all the saved devices */
-    for (item = opal_list_get_first(&devices);
-         item != opal_list_get_end(&devices);
-         item = opal_list_get_next(item)) {
-        h = (device_values_t*) item;
+    OPAL_LIST_FOREACH(h, &devices, device_values_t) {
         if (vendor_id == h->vendor_id && vendor_part_id == h->vendor_part_id) {
             /* Found it! */
@@ -208,15 +204,8 @@ int opal_btl_openib_ini_query(uint32_t vendor_id, uint32_t vendor_part_id,
  */
 int opal_btl_openib_ini_finalize(void)
 {
-    opal_list_item_t *item;
-
     if (initialized) {
-        for (item = opal_list_remove_first(&devices);
-             NULL != item;
-             item = opal_list_remove_first(&devices)) {
-            OBJ_RELEASE(item);
-        }
-        OBJ_DESTRUCT(&devices);
+        OPAL_LIST_DESTRUCT(&devices);
         initialized = true;
     }
@@ -524,7 +513,6 @@ static void reset_values(opal_btl_openib_ini_values_t *v)
 static int save_section(parsed_section_values_t *s)
 {
     int i, j;
-    opal_list_item_t *item;
     device_values_t *h;
     bool found;
@@ -541,10 +529,7 @@ static int save_section(parsed_section_values_t *s)
             found = false;
             /* Iterate over all the saved devices */
-            for (item = opal_list_get_first(&devices);
-                 item != opal_list_get_end(&devices);
-                 item = opal_list_get_next(item)) {
-                h = (device_values_t*) item;
+            OPAL_LIST_FOREACH(h, &devices, device_values_t) {
                 if (s->vendor_ids[i] == h->vendor_id && s->vendor_part_ids[j] == h->vendor_part_id) {
                     /* Found a match. Update any newly-set values. */
diff --git a/opal/mca/btl/openib/btl_openib_ip.c b/opal/mca/btl/openib/btl_openib_ip.c
index 2589890153f..8a9e5992ece 100644
--- a/opal/mca/btl/openib/btl_openib_ip.c
+++ b/opal/mca/btl/openib/btl_openib_ip.c
@@ -1,8 +1,11 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
  * Copyright (c) 2008 Chelsio, Inc. All rights reserved.
  * Copyright (c) 2008-2010 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2017 Los Alamos National Security, LLC. All rights
+ *                    reserved.
  *
  * Additional copyrights may follow
  *
@@ -89,7 +92,7 @@ static char *stringify(uint32_t addr)
 uint64_t mca_btl_openib_get_ip_subnet_id(struct ibv_device *ib_dev, uint8_t port)
 {
-    opal_list_item_t *item;
+    struct rdma_addr_list *addr;
     /* In the off chance that the user forces a non-RDMACM CPC and an
      * IP-based mechanism, the list will be uninitialized. Return 0
@@ -100,10 +103,7 @@ uint64_t mca_btl_openib_get_ip_subnet_id(struct ibv_device *ib_dev,
         return 0;
     }
-    for (item = opal_list_get_first(myaddrs);
-         item != opal_list_get_end(myaddrs);
-         item = opal_list_get_next(item)) {
-        struct rdma_addr_list *addr = (struct rdma_addr_list *)item;
+    OPAL_LIST_FOREACH(addr, myaddrs, struct rdma_addr_list) {
         if (!strcmp(addr->dev_name, ib_dev->name) && port == addr->dev_port) {
             return addr->subnet;
@@ -123,7 +123,7 @@ uint64_t mca_btl_openib_get_ip_subnet_id(struct ibv_device *ib_dev,
 uint32_t mca_btl_openib_rdma_get_ipv4addr(struct ibv_context *verbs, uint8_t port)
 {
-    opal_list_item_t *item;
+    struct rdma_addr_list *addr;
     /* Sanity check */
     if (NULL == myaddrs) {
@@ -132,10 +132,7 @@ uint32_t mca_btl_openib_rdma_get_ipv4addr(struct ibv_context *verbs,
     BTL_VERBOSE(("Looking for %s:%d in IP address list", ibv_get_device_name(verbs->device), port));
-    for (item = opal_list_get_first(myaddrs);
-         item != opal_list_get_end(myaddrs);
-         item = opal_list_get_next(item)) {
-        struct rdma_addr_list *addr = (struct rdma_addr_list *)item;
+    OPAL_LIST_FOREACH(addr, myaddrs, struct rdma_addr_list) {
         if (!strcmp(addr->dev_name, verbs->device->name) && port == addr->dev_port) {
             BTL_VERBOSE(("FOUND: %s:%d is %s",
@@ -404,19 +401,9 @@ int mca_btl_openib_build_rdma_addr_list(void)
 void mca_btl_openib_free_rdma_addr_list(void)
 {
-    opal_list_item_t *item, *next;
-
-    if (NULL != myaddrs && 0 != opal_list_get_size(myaddrs)) {
-        for (item = opal_list_get_first(myaddrs);
-             item != opal_list_get_end(myaddrs);
-             item = next) {
-            struct rdma_addr_list *addr = (struct rdma_addr_list *)item;
-            next = opal_list_get_next(item);
-            opal_list_remove_item(myaddrs, item);
-            OBJ_RELEASE(addr);
-        }
-        OBJ_RELEASE(myaddrs);
-        myaddrs = NULL;
+    if (NULL != myaddrs) {
+        OPAL_LIST_RELEASE(myaddrs);
+        myaddrs = NULL;
     }
 }
diff --git a/opal/mca/btl/openib/btl_openib_proc.c b/opal/mca/btl/openib/btl_openib_proc.c
index f6e042ffc8d..a4b77fa6436 100644
--- a/opal/mca/btl/openib/btl_openib_proc.c
+++ b/opal/mca/btl/openib/btl_openib_proc.c
@@ -12,11 +12,11 @@
  *                         All rights reserved.
  * Copyright (c) 2007-2015 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2006-2007 Voltaire All rights reserved.
- * Copyright (c) 2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014-2017 Intel, Inc. All rights reserved.
  * Copyright (c) 2015-2016 Research Organization for Information Science
  *                         and Technology (RIST). All rights reserved.
  * Copyright (c) 2015 Mellanox Technologies. All rights reserved.
- * Copyright (c) 2016 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2016-2017 Los Alamos National Security, LLC. All rights
  *                         reserved.
  *
  * $COPYRIGHT$
@@ -77,8 +77,6 @@ void mca_btl_openib_proc_construct(mca_btl_openib_proc_t* ib_proc)
 void mca_btl_openib_proc_destruct(mca_btl_openib_proc_t* ib_proc)
 {
-    mca_btl_openib_proc_btlptr_t* elem;
-
     /* release resources */
     if(NULL != ib_proc->proc_endpoints) {
         free(ib_proc->proc_endpoints);
@@ -96,12 +94,7 @@ void mca_btl_openib_proc_destruct(mca_btl_openib_proc_t* ib_proc)
     }
     OBJ_DESTRUCT(&ib_proc->proc_lock);
-    elem = (mca_btl_openib_proc_btlptr_t*)opal_list_remove_first(&ib_proc->openib_btls);
-    while( NULL != elem ){
-        OBJ_RELEASE(elem);
-        elem = (mca_btl_openib_proc_btlptr_t*)opal_list_remove_first(&ib_proc->openib_btls);
-    }
-    OBJ_DESTRUCT(&ib_proc->openib_btls);
+    OPAL_LIST_DESTRUCT(&ib_proc->openib_btls);
 }
@@ -113,11 +106,7 @@ static mca_btl_openib_proc_t* ibproc_lookup_no_lock(opal_proc_t* proc)
 {
     mca_btl_openib_proc_t* ib_proc;
-    for(ib_proc = (mca_btl_openib_proc_t*)
-            opal_list_get_first(&mca_btl_openib_component.ib_procs);
-            ib_proc != (mca_btl_openib_proc_t*)
-            opal_list_get_end(&mca_btl_openib_component.ib_procs);
-            ib_proc = (mca_btl_openib_proc_t*)opal_list_get_next(ib_proc)) {
+    OPAL_LIST_FOREACH(ib_proc, &mca_btl_openib_component.ib_procs, mca_btl_openib_proc_t) {
         if(ib_proc->proc_opal == proc) {
             return ib_proc;
         }
@@ -398,10 +387,7 @@ int mca_btl_openib_proc_reg_btl(mca_btl_openib_proc_t* ib_proc,
 {
     mca_btl_openib_proc_btlptr_t* elem;
-
-    for(elem = (mca_btl_openib_proc_btlptr_t*)opal_list_get_first(&ib_proc->openib_btls);
-            elem != (mca_btl_openib_proc_btlptr_t*)opal_list_get_end(&ib_proc->openib_btls);
-            elem = (mca_btl_openib_proc_btlptr_t*)opal_list_get_next(elem)) {
+    OPAL_LIST_FOREACH(elem, &ib_proc->openib_btls, mca_btl_openib_proc_btlptr_t) {
         if(elem->openib_btl == openib_btl) {
             /* this is normal return meaning that this BTL has already touched this ib_proc */
             return OPAL_ERR_RESOURCE_BUSY;
diff --git a/opal/mca/btl/openib/connect/btl_openib_connect_base.c b/opal/mca/btl/openib/connect/btl_openib_connect_base.c
index ca67d0f3635..7f1f75c6d91 100644
--- a/opal/mca/btl/openib/connect/btl_openib_connect_base.c
+++ b/opal/mca/btl/openib/connect/btl_openib_connect_base.c
@@ -32,6 +32,9 @@
 #include "opal/util/proc.h"
 #include "opal/util/show_help.h"
+#include "opal/util/sys_limits.h"
+#include "opal/align.h"
+
 /*
  * Array of all possible connection functions
  */
@@ -421,10 +424,27 @@ int opal_btl_openib_connect_base_alloc_cts(mca_btl_base_endpoint_t *endpoint)
         sizeof(mca_btl_openib_footer_t) +
         mca_btl_openib_component.qp_infos[mca_btl_openib_component.credits_qp].size;
+    int align_it = 0;
+    int page_size;
+
+    page_size = opal_getpagesize();
+    if (length >= page_size / 2) { align_it = 1; }
+    if (align_it) {
+// I think this is only active for ~64k+ buffers anyway, but I'm not
+// positive, so I'm only increasing the buffer size and alignment if
+// it's not too small. That way we'd avoid wasting excessive memory
+// in case this code was active for tiny buffers.
+        length = OPAL_ALIGN(length, page_size, int);
+    }
+
     /* Explicitly don't use the mpool registration */
     fli = &(endpoint->endpoint_cts_frag.super.super.base.super);
     fli->registration = NULL;
-    fli->ptr = malloc(length);
+    if (!align_it) {
+        fli->ptr = malloc(length);
+    } else {
+        posix_memalign((void**)&(fli->ptr), page_size, length);
+    }
     if (NULL == fli->ptr) {
         BTL_ERROR(("malloc failed"));
         return OPAL_ERR_OUT_OF_RESOURCE;
diff --git a/opal/mca/btl/openib/connect/btl_openib_connect_rdmacm.c b/opal/mca/btl/openib/connect/btl_openib_connect_rdmacm.c
index ce26219fe3f..5fa407dd5f6 100644
--- a/opal/mca/btl/openib/connect/btl_openib_connect_rdmacm.c
+++ b/opal/mca/btl/openib/connect/btl_openib_connect_rdmacm.c
@@ -5,9 +5,9 @@
  * Copyright (c) 2008 Mellanox Technologies. All rights reserved.
  * Copyright (c) 2009 Sandia National Laboratories. All rights reserved.
  * Copyright (c) 2010 Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2012-2016 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2012-2017 Los Alamos National Security, LLC. All rights
  *                         reserved.
- * Copyright (c) 2013-2014 Intel, Inc. All rights reserved
+ * Copyright (c) 2013-2017 Intel, Inc. All rights reserved.
  * Copyright (c) 2014 The University of Tennessee and The University
  *                    of Tennessee Research Foundation. All rights
  *                    reserved.
@@ -1133,7 +1133,9 @@ static int handle_connect_request(struct rdma_cm_event *event)
 static void *call_disconnect_callback(int fd, int flags, void *v)
 {
     rdmacm_contents_t *contents = (rdmacm_contents_t *) v;
+#if OPAL_ENABLE_DEBUG
     void *tmp = NULL;
+#endif
     id_context_t *context;
     opal_list_item_t *item;
@@ -1145,7 +1147,9 @@ static void *call_disconnect_callback(int fd, int flags, void *v)
                      (void*) context->id));
         if (!context->already_disconnected) {
+#if OPAL_ENABLE_DEBUG
             tmp = context->id;
+#endif
             rdma_disconnect(context->id);
             context->already_disconnected = true;
         }
@@ -1214,8 +1218,8 @@ static int rdmacm_endpoint_finalize(struct mca_btl_base_endpoint_t *endpoint)
                              call_disconnect_callback, contents);
             opal_event_active (&event, OPAL_EV_READ, 1);
-            /* remove_item returns the item before the item removed,
-               meaning that the for list is still safe */
+            /* remove_item returns the item before the item removed,
+               meaning that the for list is still safe */
             break;
         }
     }
@@ -1260,9 +1264,12 @@ static int rdmacm_connect_endpoint(id_context_t *context,
 {
     rdmacm_contents_t *contents = context->contents;
     rdmacm_endpoint_local_cpc_data_t *data;
+    mca_btl_openib_endpoint_t *endpoint;
+#if OPAL_ENABLE_DEBUG
 #if !BTL_OPENIB_RDMACM_IB_ADDR
     modex_message_t *message;
+#endif
 #endif
     if (contents->server) {
@@ -1295,13 +1302,14 @@ static int rdmacm_connect_endpoint(id_context_t *context,
     /* Only notify the upper layers after the last QP has been connected */
     if (++data->rdmacm_counter < mca_btl_openib_component.num_qps) {
-        BTL_VERBOSE(("%s to peer %s, count == %d", contents->server?"server":"client",
+        BTL_VERBOSE(("%s to peer %s, count == %d", contents->server?"server":"client",
                      opal_get_proc_hostname(endpoint->endpoint_proc->proc_opal), data->rdmacm_counter));
         OPAL_OUTPUT((-1, "%s to peer %s, count == %d", contents->server?"server":"client",
                      opal_get_proc_hostname(endpoint->endpoint_proc->proc_opal), data->rdmacm_counter));
         return OPAL_SUCCESS;
     }
+#if OPAL_ENABLE_DEBUG
 #if !BTL_OPENIB_RDMACM_IB_ADDR
     message = (modex_message_t *) endpoint->endpoint_remote_cpc_data->cbm_modex_message;
     BTL_VERBOSE(("%s connected!!! local %x remote %x state = %d",
@@ -1309,6 +1317,7 @@ static int rdmacm_connect_endpoint(id_context_t *context,
                  contents->ipaddr, message->ipaddr, endpoint->endpoint_state));
+#endif
 #endif
     /* Ensure that all the writes back to the endpoint and associated
@@ -1343,7 +1352,7 @@ static int rdmacm_destroy_dummy_qp(id_context_t *context)
        Maybe the reject was already done. */
     if (NULL != context->id) {
-        if (NULL != context->id->qp) {
+        if (NULL != context->id->qp) {
             ibv_destroy_qp(context->id->qp);
             context->id->qp = NULL;
         }
@@ -1489,17 +1498,21 @@ static int finish_connect(id_context_t *context)
     struct rdma_conn_param conn_param;
     private_data_t msg;
     int rc;
+#if OPAL_ENABLE_DEBUG
 #if !BTL_OPENIB_RDMACM_IB_ADDR
     struct sockaddr *peeraddr;
     uint32_t remoteipaddr;
     uint16_t remoteport;
+#endif
 #endif
     modex_message_t *message;
+#if OPAL_ENABLE_DEBUG
 #if !BTL_OPENIB_RDMACM_IB_ADDR
-    remoteport = rdma_get_dst_port(context->id);
     peeraddr = rdma_get_peer_addr(context->id);
+    remoteport = rdma_get_dst_port(context->id);
     remoteipaddr = ((struct sockaddr_in *)peeraddr)->sin_addr.s_addr;
+#endif
 #endif
     message = (modex_message_t *)
@@ -1661,8 +1674,12 @@ static int event_handler(struct rdma_cm_event *event)
     id_context_t *context = (id_context_t*) event->id->context;
 #if !BTL_OPENIB_RDMACM_IB_ADDR
     rdmacm_contents_t *contents;
-    struct sockaddr *peeraddr, *localaddr;
-    uint32_t peeripaddr, localipaddr;
+    struct sockaddr *localaddr;
+    uint32_t localipaddr;
+#if OPAL_ENABLE_DEBUG
+    struct sockaddr *peeraddr;
+
uint32_t peeripaddr; +#endif #endif int rc = -1; opal_btl_openib_ini_values_t ini; @@ -1676,9 +1693,11 @@ static int event_handler(struct rdma_cm_event *event) contents = context->contents; localaddr = rdma_get_local_addr(event->id); - peeraddr = rdma_get_peer_addr(event->id); localipaddr = ((struct sockaddr_in *)localaddr)->sin_addr.s_addr; +#if OPAL_ENABLE_DEBUG + peeraddr = rdma_get_peer_addr(event->id); peeripaddr = ((struct sockaddr_in *)peeraddr)->sin_addr.s_addr; +#endif BTL_VERBOSE(("%s event_handler -- %s, status = %d to %x", contents->server?"server":"client", @@ -1769,11 +1788,11 @@ static int event_handler(struct rdma_cm_event *event) longer handle incoming requests. The rdma connection manager and lower level code doesn't handle retries, so we have to. */ - if (context->route_retry_count < rdmacm_resolve_max_retry_count) { - context->route_retry_count++; - rc = resolve_route(context); - break; - } + if (context->route_retry_count < rdmacm_resolve_max_retry_count) { + context->route_retry_count++; + rc = resolve_route(context); + break; + } show_help_rdmacm_event_error (event); rc = OPAL_ERROR; break; @@ -1879,7 +1898,7 @@ static int ipaddrcheck(id_context_t *context, rdmacm_contents_t *server = context->contents; uint32_t ipaddr; bool already_exists = false; - opal_list_item_t *item; + rdmacm_contents_t *contents; int server_tcp_port = rdma_get_src_port(context->id); char *str; @@ -1908,10 +1927,7 @@ static int ipaddrcheck(id_context_t *context, /* Ok, we found the IP address of this device/port. Have we already see this IP address/TCP port before? 
*/ - for (item = opal_list_get_first(&server_listener_list); - item != opal_list_get_end(&server_listener_list); - item = opal_list_get_next(item)) { - rdmacm_contents_t *contents = (rdmacm_contents_t *)item; + OPAL_LIST_FOREACH(contents, &server_listener_list, rdmacm_contents_t) { BTL_VERBOSE(("paddr = %x, ipaddr addr = %x", contents->ipaddr, ipaddr)); if (contents->ipaddr == ipaddr && @@ -2002,11 +2018,11 @@ static int rdmacm_component_query(mca_btl_openib_module_t *openib_btl, opal_btl_ /* RDMACM is not supported for MPI_THREAD_MULTIPLE */ if (opal_using_threads()) { - BTL_VERBOSE(("rdmacm CPC is not supported with MPI_THREAD_MULTIPLE; skipped on %s:%d", - ibv_get_device_name(openib_btl->device->ib_dev), - openib_btl->port_num)); - rc = OPAL_ERR_NOT_SUPPORTED; - goto out; + BTL_VERBOSE(("rdmacm CPC is not supported with MPI_THREAD_MULTIPLE; skipped on %s:%d", + ibv_get_device_name(openib_btl->device->ib_dev), + openib_btl->port_num)); + rc = OPAL_ERR_NOT_SUPPORTED; + goto out; } /* RDMACM is not supported if we have any XRC QPs */ diff --git a/opal/mca/btl/openib/connect/btl_openib_connect_udcm.c b/opal/mca/btl/openib/connect/btl_openib_connect_udcm.c index 29b7de35540..ee5678120a0 100644 --- a/opal/mca/btl/openib/connect/btl_openib_connect_udcm.c +++ b/opal/mca/btl/openib/connect/btl_openib_connect_udcm.c @@ -7,7 +7,7 @@ * reserved. * Copyright (c) 2014-2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2014 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * Copyright (c) 2014 Bull SAS. All rights reserved. * Copyright (c) 2016 Mellanox Technologies. All rights reserved. 
* @@ -75,6 +75,7 @@ #include "connect/connect.h" #include "opal/util/sys_limits.h" +#include "opal/align.h" #if (ENABLE_DYNAMIC_SL) #include "connect/btl_openib_connect_sl.h" @@ -100,6 +101,8 @@ typedef struct { /** The port number of this port, also used to locate the source endpoint when an UD CM request arrives */ uint8_t mm_port_num; + /** Global ID (needed when routers are in use) */ + union ibv_gid mm_gid; } modex_msg_t; /* @@ -738,9 +741,17 @@ static int udcm_module_init (udcm_module_t *m, mca_btl_openib_module_t *btl) m->modex.mm_port_num = btl->port_num; m->modex.mm_qp_num = m->listen_qp->qp_num; - BTL_VERBOSE(("my modex = LID: %d, Port: %d, QPN: %d", - m->modex.mm_lid, m->modex.mm_port_num, - m->modex.mm_qp_num)); + rc = ibv_query_gid (btl->device->ib_dev_context, btl->port_num, + mca_btl_openib_component.gid_index, &m->modex.mm_gid); + if (0 != rc) { + BTL_VERBOSE(("error querying port GID")); + return OPAL_ERROR; + } + + BTL_VERBOSE(("my modex = LID: %d, Port: %d, QPN: %d, GID: %08x %08x", + m->modex.mm_lid, m->modex.mm_port_num, m->modex.mm_qp_num, + (unsigned int)m->modex.mm_gid.global.interface_id, + (unsigned int)m->modex.mm_gid.global.subnet_prefix)); m->cpc.data.cbm_modex_message_len = sizeof(m->modex); @@ -1030,7 +1041,7 @@ static void udcm_module_destroy_listen_qp (udcm_module_t *m) static int udcm_module_allocate_buffers (udcm_module_t *m) { - size_t total_size; + size_t total_size, page_size; m->msg_length = sizeof (udcm_msg_hdr_t) + mca_btl_openib_component.num_qps * sizeof (udcm_qp_t); @@ -1038,8 +1049,11 @@ static int udcm_module_allocate_buffers (udcm_module_t *m) total_size = (udcm_recv_count + 1) * (m->msg_length + UDCM_GRH_SIZE); + page_size = opal_getpagesize(); + total_size = OPAL_ALIGN(total_size, page_size, size_t); + m->cm_buffer = NULL; - posix_memalign ((void **)&m->cm_buffer, (size_t)opal_getpagesize(), + posix_memalign ((void **)&m->cm_buffer, (size_t)page_size, total_size); if (NULL == m->cm_buffer) { BTL_ERROR(("malloc 
failed! errno = %d", errno)); @@ -1528,6 +1542,7 @@ static int udcm_endpoint_init_data (mca_btl_base_endpoint_t *lcl_ep) { modex_msg_t *remote_msg = UDCM_ENDPOINT_REM_MODEX(lcl_ep); udcm_endpoint_t *udep = UDCM_ENDPOINT_DATA(lcl_ep); + udcm_module_t *m = UDCM_ENDPOINT_MODULE(lcl_ep); struct ibv_ah_attr ah_attr; int rc = OPAL_SUCCESS; @@ -1542,6 +1557,18 @@ static int udcm_endpoint_init_data (mca_btl_base_endpoint_t *lcl_ep) ah_attr.port_num = remote_msg->mm_port_num; ah_attr.sl = mca_btl_openib_component.ib_service_level; ah_attr.src_path_bits = lcl_ep->endpoint_btl->src_path_bits; + if (0 != memcmp (&remote_msg->mm_gid, &m->modex.mm_gid, sizeof (m->modex.mm_gid))) { + ah_attr.is_global = 1; + ah_attr.grh.flow_label = 0; + ah_attr.grh.dgid = remote_msg->mm_gid; + ah_attr.grh.sgid_index = mca_btl_openib_component.gid_index; + /* NTH: probably won't need to go over more than a single router. changeme if this + * assumption is wrong. this value should never be <= 1 as it will not leave the + * the subnet. */ + ah_attr.grh.hop_limit = 2; + /* Seems reasonable to set this to 0 for connection messages. 
*/ + ah_attr.grh.traffic_class = 0; + } udep->ah = ibv_create_ah (lcl_ep->endpoint_btl->device->ib_pd, &ah_attr); if (!udep->ah) { @@ -1957,6 +1984,9 @@ static int udcm_process_messages (struct ibv_cq *event_cq, udcm_module_t *m) udcm_msg_t *message = NULL; udcm_message_recv_t *item; struct ibv_wc wc[20]; +#if OPAL_ENABLE_DEBUG + struct ibv_grh *grh; +#endif udcm_endpoint_t *udep; uint64_t dir; @@ -1969,11 +1999,6 @@ static int udcm_process_messages (struct ibv_cq *event_cq, udcm_module_t *m) for (i = 0 ; i < count ; i++) { dir = wc[i].wr_id & UDCM_WR_DIR_MASK; - BTL_VERBOSE(("WC: wr_id: 0x%016" PRIu64 ", status: %d, opcode: 0x%x, byte_len: %x, imm_data: 0x%08x, " - "qp_num: 0x%08x, src_qp: 0x%08x, wc_flags: 0x%x, slid: 0x%04x", - wc[i].wr_id, wc[i].status, wc[i].opcode, wc[i].byte_len, - wc[i].imm_data, wc[i].qp_num, wc[i].src_qp, wc[i].wc_flags, wc[i].slid)); - if (UDCM_WR_RECV_ID != dir) { opal_output (0, "unknown packet"); continue; @@ -1981,6 +2006,16 @@ static int udcm_process_messages (struct ibv_cq *event_cq, udcm_module_t *m) msg_num = (int)(wc[i].wr_id & (~UDCM_WR_DIR_MASK)); +#if OPAL_ENABLE_DEBUG + grh = (wc[i].wc_flags & IBV_WC_GRH) ? (struct ibv_grh *) udcm_module_get_recv_buffer (m, msg_num, false) : NULL; +#endif + + BTL_VERBOSE(("WC: wr_id: 0x%016" PRIu64 ", status: %d, opcode: 0x%x, byte_len: %x, imm_data: 0x%08x, " + "qp_num: 0x%08x, src_qp: 0x%08x, wc_flags: 0x%x, slid: 0x%04x grh_present: %s", + wc[i].wr_id, wc[i].status, wc[i].opcode, wc[i].byte_len, + wc[i].imm_data, wc[i].qp_num, wc[i].src_qp, wc[i].wc_flags, wc[i].slid, + grh ? "yes" : "no")); + if (IBV_WC_SUCCESS != wc[i].status) { BTL_ERROR(("recv work request for buffer %d failed, code = %d", msg_num, wc[i].status)); diff --git a/opal/mca/btl/portals4/Makefile.am b/opal/mca/btl/portals4/Makefile.am index d7cc49eca3a..f6c9351e336 100644 --- a/opal/mca/btl/portals4/Makefile.am +++ b/opal/mca/btl/portals4/Makefile.am @@ -12,6 +12,7 @@ # Copyright (c) 2010 Cisco Systems, Inc. 
All rights reserved. # Copyright (c) 2010-2012 Sandia National Laboratories. All rights reserved. # Copyright (c) 2014 Bull SAS. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -47,7 +48,7 @@ local_sources = \ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_portals4_la_SOURCES = $(local_sources) -mca_btl_portals4_la_LIBADD = \ +mca_btl_portals4_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ $(btl_portals4_LIBS) mca_btl_portals4_la_LDFLAGS = -module -avoid-version $(btl_portals4_LDFLAGS) diff --git a/opal/mca/btl/portals4/btl_portals4.c b/opal/mca/btl/portals4/btl_portals4.c index b4504d502ce..80e28ef47bc 100644 --- a/opal/mca/btl/portals4/btl_portals4.c +++ b/opal/mca/btl/portals4/btl_portals4.c @@ -389,7 +389,6 @@ mca_btl_portals4_add_procs(struct mca_btl_base_module_t* btl_base, struct mca_btl_portals4_module_t* portals4_btl = (struct mca_btl_portals4_module_t*) btl_base; int ret; size_t i; - bool need_activate = false; opal_output_verbose(50, opal_btl_base_framework.framework_output, "mca_btl_portals4_add_procs: Adding %d procs (%d) for NI %d", @@ -397,10 +396,6 @@ mca_btl_portals4_add_procs(struct mca_btl_base_module_t* btl_base, (int) portals4_btl->portals_num_procs, portals4_btl->interface_num); - if (0 == portals4_btl->portals_num_procs) { - need_activate = true; - } - /* * The PML handed us a list of procs that need Portals4 * peer info. Complete those procs here. 
@@ -423,7 +418,7 @@ mca_btl_portals4_add_procs(struct mca_btl_base_module_t* btl_base, curr_proc, &btl_peer_data[i]); - OPAL_THREAD_ADD32(&portals4_btl->portals_num_procs, 1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_num_procs, 1); /* and here we can reach */ opal_bitmap_set_bit(reachable, i); @@ -435,7 +430,7 @@ mca_btl_portals4_add_procs(struct mca_btl_base_module_t* btl_base, portals4_btl->interface_num)); } - if (need_activate && portals4_btl->portals_num_procs > 0) { + if (mca_btl_portals4_component.need_init && portals4_btl->portals_num_procs > 0) { if (mca_btl_portals4_component.use_logical) { ret = create_maptable(portals4_btl, nprocs, procs, btl_peer_data); if (OPAL_SUCCESS != ret) { @@ -453,6 +448,7 @@ mca_btl_portals4_add_procs(struct mca_btl_base_module_t* btl_base, __FILE__, __LINE__, ret); return ret; } + mca_btl_portals4_component.need_init = 0; } return OPAL_SUCCESS; @@ -476,12 +472,9 @@ mca_btl_portals4_del_procs(struct mca_btl_base_module_t *btl, portals4 entry in proc_endpoints instead of the peer_data */ for (i = 0 ; i < nprocs ; ++i) { free(btl_peer_data[i]); - OPAL_THREAD_ADD32(&portals4_btl->portals_num_procs, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_num_procs, -1); } - if (0 == portals4_btl->portals_num_procs) - mca_btl_portals4_free_module(portals4_btl); - return OPAL_SUCCESS; } @@ -537,7 +530,7 @@ mca_btl_portals4_free(struct mca_btl_base_module_t* btl_base, if (frag->me_h != PTL_INVALID_HANDLE) { frag->me_h = PTL_INVALID_HANDLE; } - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, "mca_btl_portals4_free: Decrementing portals_outstanding_ops=%d\n", portals4_btl->portals_outstanding_ops)); OPAL_BTL_PORTALS4_FRAG_RETURN_USER(portals4_btl, frag); @@ -622,7 +615,7 @@ mca_btl_portals4_register_mem(mca_btl_base_module_t *btl_base, return NULL; } - handle->key = 
OPAL_THREAD_ADD64(&(portals4_btl->portals_rdma_key), 1); + handle->key = OPAL_THREAD_ADD_FETCH64(&(portals4_btl->portals_rdma_key), 1); handle->remote_offset = 0; OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, @@ -662,7 +655,7 @@ mca_btl_portals4_register_mem(mca_btl_base_module_t *btl_base, opal_output_verbose(1, opal_btl_base_framework.framework_output, "%s:%d: PtlMEAppend failed: %d\n", __FILE__, __LINE__, ret); - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); return NULL; } OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, diff --git a/opal/mca/btl/portals4/btl_portals4_component.c b/opal/mca/btl/portals4/btl_portals4_component.c index eda9cd81f70..a56236d3e9f 100644 --- a/opal/mca/btl/portals4/btl_portals4_component.c +++ b/opal/mca/btl/portals4/btl_portals4_component.c @@ -609,7 +609,7 @@ mca_btl_portals4_component_progress(void) mca_btl_portals4_free(&portals4_btl->super, &frag->base); } if (0 != frag->size) { - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, "PTL_EVENT_SEND: Decrementing portals_outstanding_ops=%d (1)\n", portals4_btl->portals_outstanding_ops)); @@ -646,7 +646,7 @@ mca_btl_portals4_component_progress(void) } if (0 != frag->size) { - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, "PTL_EVENT_ACK: Decrementing portals_outstanding_ops=%d (2)\n", portals4_btl->portals_outstanding_ops)); } @@ -749,7 +749,7 @@ mca_btl_portals4_component_progress(void) OPAL_SUCCESS); OPAL_BTL_PORTALS4_FRAG_RETURN_USER(&portals4_btl->super, frag); - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + 
OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, "PTL_EVENT_REPLY: Decrementing portals_outstanding_ops=%d\n", portals4_btl->portals_outstanding_ops)); goto done; diff --git a/opal/mca/btl/portals4/btl_portals4_rdma.c b/opal/mca/btl/portals4/btl_portals4_rdma.c index 33fb9ab326e..9237b30fce2 100644 --- a/opal/mca/btl/portals4/btl_portals4_rdma.c +++ b/opal/mca/btl/portals4/btl_portals4_rdma.c @@ -53,16 +53,16 @@ mca_btl_portals4_get(struct mca_btl_base_module_t* btl_base, int ret; /* reserve space in the event queue for rdma operations immediately */ - while (OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, 1) > + while (OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, 1) > portals4_btl->portals_max_outstanding_ops) { - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, "Call to mca_btl_portals4_component_progress (1)\n")); mca_btl_portals4_component_progress(); } OPAL_BTL_PORTALS4_FRAG_ALLOC_USER(portals4_btl, frag); if (NULL == frag){ - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); return OPAL_ERROR; } OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, diff --git a/opal/mca/btl/portals4/btl_portals4_send.c b/opal/mca/btl/portals4/btl_portals4_send.c index 1f50fb2ef58..218ed877803 100644 --- a/opal/mca/btl/portals4/btl_portals4_send.c +++ b/opal/mca/btl/portals4/btl_portals4_send.c @@ -49,9 +49,9 @@ int mca_btl_portals4_send(struct mca_btl_base_module_t* btl_base, BTL_PORTALS4_SET_SEND_BITS(match_bits, 0, 0, tag, msglen_type); /* reserve space in the event queue for rdma operations immediately */ - while (OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, 1) > + while 
(OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, 1) > portals4_btl->portals_max_outstanding_ops) { - OPAL_THREAD_ADD32(&portals4_btl->portals_outstanding_ops, -1); + OPAL_THREAD_ADD_FETCH32(&portals4_btl->portals_outstanding_ops, -1); OPAL_OUTPUT_VERBOSE((90, opal_btl_base_framework.framework_output, "Call to mca_btl_portals4_component_progress (4)\n")); mca_btl_portals4_component_progress(); diff --git a/opal/mca/btl/scif/Makefile.am b/opal/mca/btl/scif/Makefile.am index da1c9f7f5a7..828ef2e7dfb 100644 --- a/opal/mca/btl/scif/Makefile.am +++ b/opal/mca/btl/scif/Makefile.am @@ -39,7 +39,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_scif_la_SOURCES = $(scif_SOURCES) nodist_mca_btl_scif_la_SOURCES = $(scif_nodist_SOURCES) -mca_btl_scif_la_LIBADD = $(btl_scif_LIBS) +mca_btl_scif_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(btl_scif_LIBS) mca_btl_scif_la_LDFLAGS = -module -avoid-version $(btl_scif_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/opal/mca/btl/self/Makefile.am b/opal/mca/btl/self/Makefile.am index e35fb91d803..3bafcb2e6ee 100644 --- a/opal/mca/btl/self/Makefile.am +++ b/opal/mca/btl/self/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -40,6 +41,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_self_la_SOURCES = $(libmca_btl_self_la_sources) mca_btl_self_la_LDFLAGS = -module -avoid-version +mca_btl_self_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_btl_self_la_SOURCES = $(libmca_btl_self_la_sources) diff --git a/opal/mca/btl/sm/Makefile.am b/opal/mca/btl/sm/Makefile.am index 06a064751b9..8d63851719f 100644 --- a/opal/mca/btl/sm/Makefile.am +++ b/opal/mca/btl/sm/Makefile.am @@ -9,8 +9,9 @@ # University of Stuttgart. All rights reserved. # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. -# Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved # Copyright (c) 2014 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -20,14 +21,7 @@ dist_opaldata_DATA = help-mpi-btl-sm.txt -libmca_btl_sm_la_sources = \ - btl_sm.c \ - btl_sm.h \ - btl_sm_component.c \ - btl_sm_endpoint.h \ - btl_sm_fifo.h \ - btl_sm_frag.c \ - btl_sm_frag.h +libmca_btl_sm_la_sources = btl_sm_component.c # Make the output library in this directory, and name it either # mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la @@ -41,19 +35,11 @@ component_noinst = libmca_btl_sm.la component_install = endif -# See opal/mca/common/sm/Makefile.am for an explanation of -# libmca_common_sm.la.
- mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_sm_la_SOURCES = $(libmca_btl_sm_la_sources) mca_btl_sm_la_LDFLAGS = -module -avoid-version -mca_btl_sm_la_LIBADD = \ - $(OPAL_TOP_BUILDDIR)/opal/mca/common/sm/lib@OPAL_LIB_PREFIX@mca_common_sm.la -if OPAL_cuda_support -mca_btl_sm_la_LIBADD += \ - $(OPAL_TOP_BUILDDIR)/opal/mca/common/cuda/lib@OPAL_LIB_PREFIX@mca_common_cuda.la -endif +mca_btl_sm_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la mca_btl_sm_la_CPPFLAGS = $(btl_sm_CPPFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/opal/mca/btl/sm/btl_sm.c b/opal/mca/btl/sm/btl_sm.c deleted file mode 100644 index d9078f5bc74..00000000000 --- a/opal/mca/btl/sm/btl_sm.c +++ /dev/null @@ -1,1368 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2011 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2014 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2007 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2006-2007 Voltaire. All rights reserved. - * Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2010-2015 Los Alamos National Security, LLC. - * All rights reserved. - * Copyright (c) 2010-2012 IBM Corporation. All rights reserved. - * Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2013-2016 Intel, Inc. All rights reserved. - * Copyright (c) 2014-2017 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 ARM, Inc. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "opal_config.h" - -#include -#include -#ifdef HAVE_UNISTD_H -#include -#endif -#ifdef HAVE_FCNTL_H -#include -#endif /* HAVE_FCNTL_H */ -#include -#ifdef HAVE_SYS_MMAN_H -#include -#endif /* HAVE_SYS_MMAN_H */ - -#if OPAL_BTL_SM_HAVE_CMA && OPAL_CMA_NEED_SYSCALL_DEFS -#include "opal/sys/cma.h" -#endif /* OPAL_CMA_NEED_SYSCALL_DEFS */ - -#include "opal/sys/atomic.h" -#include "opal/class/opal_bitmap.h" -#include "opal/util/output.h" -#include "opal/util/show_help.h" -#include "opal/util/printf.h" -#include "opal/mca/hwloc/base/base.h" -#include "opal/mca/pmix/base/base.h" -#include "opal/mca/shmem/base/base.h" -#include "opal/mca/shmem/shmem.h" - -#include "opal/datatype/opal_convertor.h" -#include "opal/mca/btl/btl.h" - -#include "opal/align.h" -#include "opal/util/sys_limits.h" - -#if OPAL_ENABLE_FT_CR == 1 -#include "opal/util/basename.h" -#include "opal/mca/crs/base/base.h" -#include "opal/util/basename.h" -#include "orte/mca/sstore/sstore.h" -#include "opal/runtime/opal_cr.h" -#endif - -#include "btl_sm.h" -#include "btl_sm_endpoint.h" -#include "btl_sm_frag.h" -#include "btl_sm_fifo.h" - -#include "opal/util/proc.h" - -mca_btl_sm_t mca_btl_sm = { - .super = { - .btl_component = &mca_btl_sm_component.super, - .btl_add_procs = mca_btl_sm_add_procs, - .btl_del_procs = mca_btl_sm_del_procs, - .btl_finalize = mca_btl_sm_finalize, - .btl_alloc = mca_btl_sm_alloc, - .btl_free = mca_btl_sm_free, - .btl_prepare_src = mca_btl_sm_prepare_src, - .btl_send = mca_btl_sm_send, - .btl_sendi = mca_btl_sm_sendi, - .btl_dump = mca_btl_sm_dump, - .btl_register_error = mca_btl_sm_register_error_cb, /* register error */ - .btl_ft_event = mca_btl_sm_ft_event - } -}; - -/* - * calculate offset of an address from the beginning of a shared memory segment - */ -#define ADDR2OFFSET(ADDR, BASE) ((char*)(ADDR) - (char*)(BASE)) - -/* - * calculate an absolute address in a local address space given an 
offset and - * a base address of a shared memory segment - */ -#define OFFSET2ADDR(OFFSET, BASE) ((ptrdiff_t)(OFFSET) + (char*)(BASE)) - -static void *mpool_calloc(size_t nmemb, size_t size) -{ - void *buf; - size_t bsize = nmemb * size; - mca_mpool_base_module_t *mpool = mca_btl_sm_component.sm_mpool; - - buf = mpool->mpool_alloc(mpool, bsize, opal_cache_line_size, 0); - - if (NULL == buf) - return NULL; - - memset(buf, 0, bsize); - return buf; -} - -static int -setup_mpool_base_resources(mca_btl_sm_component_t *comp_ptr, - mca_common_sm_mpool_resources_t *out_res) -{ - int rc = OPAL_SUCCESS; - int fd = -1; - ssize_t bread = 0; - - /* Wait for the file to be created */ - while (0 != access(comp_ptr->sm_rndv_file_name, R_OK)) { - opal_progress(); - } - - if (-1 == (fd = open(comp_ptr->sm_mpool_rndv_file_name, O_RDONLY))) { - int err = errno; - opal_show_help("help-mpi-btl-sm.txt", "sys call fail", true, - "open(2)", strerror(err), err); - rc = OPAL_ERR_IN_ERRNO; - goto out; - } - if ((ssize_t)sizeof(opal_shmem_ds_t) != (bread = - read(fd, &out_res->bs_meta_buf, sizeof(opal_shmem_ds_t)))) { - opal_output(0, "setup_mpool_base_resources: " - "Read inconsistency -- read: %lu, but expected: %lu!\n", - (unsigned long)bread, - (unsigned long)sizeof(opal_shmem_ds_t)); - rc = OPAL_ERROR; - goto out; - } - if ((ssize_t)sizeof(out_res->size) != (bread = - read(fd, &out_res->size, sizeof(size_t)))) { - opal_output(0, "setup_mpool_base_resources: " - "Read inconsistency -- read: %lu, but expected: %lu!\n", - (unsigned long)bread, - (unsigned long)sizeof(opal_shmem_ds_t)); - rc = OPAL_ERROR; - goto out; - } - -out: - if (-1 != fd) { - (void)close(fd); - } - return rc; -} - -static int -sm_segment_attach(mca_btl_sm_component_t *comp_ptr) -{ - int rc = OPAL_SUCCESS; - int fd = -1; - ssize_t bread = 0; - opal_shmem_ds_t *tmp_shmem_ds = calloc(1, sizeof(*tmp_shmem_ds)); - - if (NULL == tmp_shmem_ds) { - return OPAL_ERR_OUT_OF_RESOURCE; - } - if (-1 == (fd = 
open(comp_ptr->sm_rndv_file_name, O_RDONLY))) { - int err = errno; - opal_show_help("help-mpi-btl-sm.txt", "sys call fail", true, - "open(2)", strerror(err), err); - rc = OPAL_ERR_IN_ERRNO; - goto out; - } - if ((ssize_t)sizeof(opal_shmem_ds_t) != (bread = - read(fd, tmp_shmem_ds, sizeof(opal_shmem_ds_t)))) { - opal_output(0, "sm_segment_attach: " - "Read inconsistency -- read: %lu, but expected: %lu!\n", - (unsigned long)bread, - (unsigned long)sizeof(opal_shmem_ds_t)); - rc = OPAL_ERROR; - goto out; - } - if (NULL == (comp_ptr->sm_seg = - mca_common_sm_module_attach(tmp_shmem_ds, - sizeof(mca_common_sm_seg_header_t), - opal_cache_line_size))) { - /* don't have to detach here, because module_attach cleans up after - * itself on failure. */ - opal_output(0, "sm_segment_attach: " - "mca_common_sm_module_attach failure!\n"); - rc = OPAL_ERROR; - } - -out: - if (-1 != fd) { - (void)close(fd); - } - if (tmp_shmem_ds) { - free(tmp_shmem_ds); - } - return rc; -} - -static int -sm_btl_first_time_init(mca_btl_sm_t *sm_btl, - int32_t my_smp_rank, - int n) -{ - size_t length, length_payload; - sm_fifo_t *my_fifos; - int my_mem_node, num_mem_nodes, i, rc; - mca_common_sm_mpool_resources_t *res = NULL; - mca_btl_sm_component_t* m = &mca_btl_sm_component; - char *loc, *mynuma; - opal_process_name_t wildcard_rank; - - /* Assume we don't have hwloc support and fill in dummy info */ - mca_btl_sm_component.mem_node = my_mem_node = 0; - mca_btl_sm_component.num_mem_nodes = num_mem_nodes = 1; - - /* see if we were given a topology signature */ - wildcard_rank.jobid = OPAL_PROC_MY_NAME.jobid; - wildcard_rank.vpid = OPAL_VPID_WILDCARD; - OPAL_MODEX_RECV_VALUE_OPTIONAL(rc, OPAL_PMIX_TOPOLOGY_SIGNATURE, - &wildcard_rank, &loc, OPAL_STRING); - if (OPAL_SUCCESS == rc) { - /* the number of NUMA nodes is right at the front */ - num_mem_nodes = strtoul(loc, NULL, 10); - - free(loc); - } else { - /* If we have hwloc support, then get accurate information */ - if (OPAL_SUCCESS == 
opal_hwloc_base_get_topology()) { - i = opal_hwloc_base_get_nbobjs_by_type(opal_hwloc_topology, - HWLOC_OBJ_NODE, 0, - OPAL_HWLOC_AVAILABLE); - - /* JMS This tells me how many numa nodes are *available*, - but it's not how many are being used *by this job*. - Note that this is the value we've previously used (from - the previous carto-based implementation), but it really - should be improved to be how many NUMA nodes are being - used *in this job*. */ - num_mem_nodes = i; - } - } - if (0 == num_mem_nodes) { - /* the topology might not contain a NUMA object with hwloc < v2 - * if the node is not NUMA, so force it to one in this case */ - num_mem_nodes = 1; - } - mca_btl_sm_component.num_mem_nodes = num_mem_nodes; - /* see if we were given our location */ - loc = NULL; - OPAL_MODEX_RECV_VALUE_OPTIONAL(rc, OPAL_PMIX_LOCALITY_STRING, - &OPAL_PROC_MY_NAME, &loc, OPAL_STRING); - if (OPAL_SUCCESS == rc) { - if (NULL == loc) { - mca_btl_sm_component.mem_node = my_mem_node = -1; - } else { - /* get our NUMA location */ - mynuma = opal_hwloc_base_get_location(loc, HWLOC_OBJ_NODE, 0); - if (NULL == mynuma || - NULL != strchr(mynuma, ',') || - NULL != strchr(mynuma, '-')) { - /* we either have no idea what NUMA we are on, or we - * are on multiple NUMA nodes */ - mca_btl_sm_component.mem_node = my_mem_node = -1; - } else { - /* we are bound to a single NUMA node */ - my_mem_node = strtoul(mynuma, NULL, 10); - mca_btl_sm_component.mem_node = my_mem_node; - } - if (NULL != mynuma) { - free(mynuma); - } - free(loc); - } - } else { - /* If we have hwloc support, then get accurate information */ - if (OPAL_SUCCESS == opal_hwloc_base_get_topology() && num_mem_nodes > 0) { - int numa=0, w; - unsigned n_bound=0; - hwloc_cpuset_t avail; - hwloc_obj_t obj; - - /* count the number of NUMA nodes to which we are bound */ - for (w=0; w < i; w++) { - if (NULL == (obj = opal_hwloc_base_get_obj_by_type(opal_hwloc_topology, - HWLOC_OBJ_NODE, 0, w, - OPAL_HWLOC_AVAILABLE))) { - continue; - } - 
/* get that NUMA node's available cpus */ - avail = opal_hwloc_base_get_available_cpus(opal_hwloc_topology, obj); - /* see if we intersect */ - if (hwloc_bitmap_intersects(avail, opal_hwloc_my_cpuset)) { - n_bound++; - numa = w; - } - } - /* if we are located on more than one NUMA, or we didn't find - * a NUMA we are on, then not much we can do - */ - if (1 == n_bound) { - mca_btl_sm_component.mem_node = my_mem_node = numa; - } else { - mca_btl_sm_component.mem_node = my_mem_node = -1; - } - } - } - - if (NULL == (res = calloc(1, sizeof(*res)))) { - return OPAL_ERR_OUT_OF_RESOURCE; - } - - /* lookup shared memory pool */ - mca_btl_sm_component.sm_mpools = - (mca_mpool_base_module_t **)calloc(num_mem_nodes, - sizeof(mca_mpool_base_module_t *)); - - /* Disable memory binding, because each MPI process will claim pages in the - * mpool for their local NUMA node */ - res->mem_node = -1; - res->allocator = mca_btl_sm_component.allocator; - - if (OPAL_SUCCESS != (rc = setup_mpool_base_resources(m, res))) { - free(res); - return rc; - } - /* now that res is fully populated, create the thing */ - mca_btl_sm_component.sm_mpools[0] = common_sm_mpool_create (res); - /* Sanity check to ensure that we found it */ - if (NULL == mca_btl_sm_component.sm_mpools[0]) { - free(res); - return OPAL_ERR_OUT_OF_RESOURCE; - } - - mca_btl_sm_component.sm_mpool = mca_btl_sm_component.sm_mpools[0]; - - mca_btl_sm_component.sm_mpool_base = - mca_btl_sm_component.sm_mpools[0]->mpool_base(mca_btl_sm_component.sm_mpools[0]); - - /* create a list of peers */ - mca_btl_sm_component.sm_peers = (struct mca_btl_base_endpoint_t**) - calloc(n, sizeof(struct mca_btl_base_endpoint_t*)); - if (NULL == mca_btl_sm_component.sm_peers) { - free(res); - return OPAL_ERR_OUT_OF_RESOURCE; - } - - /* remember that node rank zero is already attached */ - if (0 != my_smp_rank) { - if (OPAL_SUCCESS != (rc = sm_segment_attach(m))) { - free(res); - return rc; - } - } - - /* it is now safe to free the mpool resources */ - 
free(res); - - /* check to make sure number of local procs is within the - * specified limits */ - if(mca_btl_sm_component.sm_max_procs > 0 && - mca_btl_sm_component.num_smp_procs + n > - mca_btl_sm_component.sm_max_procs) { - return OPAL_ERROR; - } - - mca_btl_sm_component.shm_fifo = (volatile sm_fifo_t **)mca_btl_sm_component.sm_seg->module_data_addr; - mca_btl_sm_component.shm_bases = (char**)(mca_btl_sm_component.shm_fifo + n); - mca_btl_sm_component.shm_mem_nodes = (uint16_t*)(mca_btl_sm_component.shm_bases + n); - - /* set the base of the shared memory segment */ - mca_btl_sm_component.shm_bases[mca_btl_sm_component.my_smp_rank] = - (char*)mca_btl_sm_component.sm_mpool_base; - mca_btl_sm_component.shm_mem_nodes[mca_btl_sm_component.my_smp_rank] = - (uint16_t)my_mem_node; - - /* initialize the array of fifo's "owned" by this process */ - if(NULL == (my_fifos = (sm_fifo_t*)mpool_calloc(FIFO_MAP_NUM(n), sizeof(sm_fifo_t)))) - return OPAL_ERR_OUT_OF_RESOURCE; - - mca_btl_sm_component.shm_fifo[mca_btl_sm_component.my_smp_rank] = my_fifos; - - /* cache the pointer to the 2d fifo array. 
These addresses - * are valid in the current process space */ - mca_btl_sm_component.fifo = (sm_fifo_t**)malloc(sizeof(sm_fifo_t*) * n); - - if(NULL == mca_btl_sm_component.fifo) - return OPAL_ERR_OUT_OF_RESOURCE; - - mca_btl_sm_component.fifo[mca_btl_sm_component.my_smp_rank] = my_fifos; - - mca_btl_sm_component.mem_nodes = (uint16_t *) malloc(sizeof(uint16_t) * n); - if(NULL == mca_btl_sm_component.mem_nodes) - return OPAL_ERR_OUT_OF_RESOURCE; - - /* initialize fragment descriptor free lists */ - - /* allocation will be for the fragment descriptor and payload buffer */ - length = sizeof(mca_btl_sm_frag1_t); - length_payload = - sizeof(mca_btl_sm_hdr_t) + mca_btl_sm_component.eager_limit; - i = opal_free_list_init (&mca_btl_sm_component.sm_frags_eager, length, - opal_cache_line_size, OBJ_CLASS(mca_btl_sm_frag1_t), - length_payload, opal_cache_line_size, - mca_btl_sm_component.sm_free_list_num, - mca_btl_sm_component.sm_free_list_max, - mca_btl_sm_component.sm_free_list_inc, - mca_btl_sm_component.sm_mpool, 0, NULL, NULL, NULL); - if ( OPAL_SUCCESS != i ) - return i; - - length = sizeof(mca_btl_sm_frag2_t); - length_payload = - sizeof(mca_btl_sm_hdr_t) + mca_btl_sm_component.max_frag_size; - i = opal_free_list_init (&mca_btl_sm_component.sm_frags_max, length, - opal_cache_line_size, OBJ_CLASS(mca_btl_sm_frag2_t), - length_payload, opal_cache_line_size, - mca_btl_sm_component.sm_free_list_num, - mca_btl_sm_component.sm_free_list_max, - mca_btl_sm_component.sm_free_list_inc, - mca_btl_sm_component.sm_mpool, 0, NULL, NULL, NULL); - if ( OPAL_SUCCESS != i ) - return i; - - i = opal_free_list_init (&mca_btl_sm_component.sm_frags_user, - sizeof(mca_btl_sm_user_t), - opal_cache_line_size, OBJ_CLASS(mca_btl_sm_user_t), - sizeof(mca_btl_sm_hdr_t), opal_cache_line_size, - mca_btl_sm_component.sm_free_list_num, - mca_btl_sm_component.sm_free_list_max, - mca_btl_sm_component.sm_free_list_inc, - mca_btl_sm_component.sm_mpool, 0, NULL, NULL, NULL); - if ( OPAL_SUCCESS != i ) - 
return i; - - mca_btl_sm_component.num_outstanding_frags = 0; - - mca_btl_sm_component.num_pending_sends = 0; - i = opal_free_list_init(&mca_btl_sm_component.pending_send_fl, - sizeof(btl_sm_pending_send_item_t), 8, - OBJ_CLASS(opal_free_list_item_t), - 0, 0, 16, -1, 32, NULL, 0, NULL, NULL, - NULL); - if ( OPAL_SUCCESS != i ) - return i; - - /* set flag indicating btl has been inited */ - sm_btl->btl_inited = true; - - return OPAL_SUCCESS; -} - -static struct mca_btl_base_endpoint_t * -create_sm_endpoint(int local_proc, struct opal_proc_t *proc) -{ - struct mca_btl_base_endpoint_t *ep; - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 - char path[PATH_MAX]; -#endif - - ep = (struct mca_btl_base_endpoint_t*) - malloc(sizeof(struct mca_btl_base_endpoint_t)); - if(NULL == ep) - return NULL; - ep->peer_smp_rank = local_proc + mca_btl_sm_component.num_smp_procs; - - OBJ_CONSTRUCT(&ep->pending_sends, opal_list_t); - OBJ_CONSTRUCT(&ep->endpoint_lock, opal_mutex_t); -#if OPAL_ENABLE_PROGRESS_THREADS == 1 - sprintf(path, "%s"OPAL_PATH_SEP"sm_fifo.%lu", - opal_process_info.job_session_dir, - (unsigned long)proc->proc_name); - ep->fifo_fd = open(path, O_WRONLY); - if(ep->fifo_fd < 0) { - opal_output(0, "mca_btl_sm_add_procs: open(%s) failed with errno=%d\n", - path, errno); - free(ep); - return NULL; - } -#endif - return ep; -} - -int mca_btl_sm_add_procs( - struct mca_btl_base_module_t* btl, - size_t nprocs, - struct opal_proc_t **procs, - struct mca_btl_base_endpoint_t **peers, - opal_bitmap_t* reachability) -{ - int return_code = OPAL_SUCCESS; - int32_t n_local_procs = 0, proc, j, my_smp_rank = -1; - const opal_proc_t* my_proc; /* pointer to caller's proc structure */ - mca_btl_sm_t *sm_btl; - bool have_connected_peer = false; - char **bases; - /* for easy access to the mpool_sm_module */ - mca_common_sm_mpool_module_t *sm_mpool_modp = NULL; - - /* initializion */ - - sm_btl = (mca_btl_sm_t *)btl; - - /* get pointer to my proc structure */ - if( NULL == (my_proc = 
opal_proc_local_get()) ) - return OPAL_ERR_OUT_OF_RESOURCE; - - /* Get unique host identifier for each process in the list, - * and idetify procs that are on this host. Add procs on this - * host to shared memory reachbility list. Also, get number - * of local procs in the procs list. */ - for (proc = 0; proc < (int32_t)nprocs; proc++) { - /* check to see if this proc can be reached via shmem (i.e., - if they're on my local host and in my job) */ - if (procs[proc]->proc_name.jobid != my_proc->proc_name.jobid || - !OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags)) { - peers[proc] = NULL; - continue; - } - - /* check to see if this is me */ - if(my_proc == procs[proc]) { - my_smp_rank = mca_btl_sm_component.my_smp_rank = n_local_procs++; - continue; - } - - /* sm doesn't support heterogeneous yet... */ - if (procs[proc]->proc_arch != my_proc->proc_arch) { - continue; - } - - /* we have someone to talk to */ - have_connected_peer = true; - - if(!(peers[proc] = create_sm_endpoint(n_local_procs, procs[proc]))) { - return_code = OPAL_ERROR; - goto CLEANUP; - } - n_local_procs++; - - /* add this proc to shared memory accessibility list */ - return_code = opal_bitmap_set_bit(reachability, proc); - if(OPAL_SUCCESS != return_code) - goto CLEANUP; - } - - /* jump out if there's not someone we can talk to */ - if (!have_connected_peer) - goto CLEANUP; - - /* make sure that my_smp_rank has been defined */ - if (-1 == my_smp_rank) { - return_code = OPAL_ERROR; - goto CLEANUP; - } - - if (!sm_btl->btl_inited) { - return_code = - sm_btl_first_time_init(sm_btl, my_smp_rank, - mca_btl_sm_component.sm_max_procs); - if (return_code != OPAL_SUCCESS) { - goto CLEANUP; - } - } - - /* set local proc's smp rank in the peers structure for - * rapid access and calculate reachability */ - for(proc = 0; proc < (int32_t)nprocs; proc++) { - if(NULL == peers[proc]) - continue; - mca_btl_sm_component.sm_peers[peers[proc]->peer_smp_rank] = peers[proc]; - peers[proc]->my_smp_rank = my_smp_rank; - } 
- - bases = mca_btl_sm_component.shm_bases; - sm_mpool_modp = (mca_common_sm_mpool_module_t *)mca_btl_sm_component.sm_mpool; - - /* initialize own FIFOs */ - /* - * The receiver initializes all its FIFOs. All components will - * be allocated near the receiver. Nothing will be local to - * "the sender" since there will be many senders. - */ - for(j = mca_btl_sm_component.num_smp_procs; - j < mca_btl_sm_component.num_smp_procs + FIFO_MAP_NUM(n_local_procs); j++) { - - return_code = sm_fifo_init( mca_btl_sm_component.fifo_size, - mca_btl_sm_component.sm_mpool, - &mca_btl_sm_component.fifo[my_smp_rank][j], - mca_btl_sm_component.fifo_lazy_free); - if(return_code != OPAL_SUCCESS) - goto CLEANUP; - } - - opal_atomic_wmb(); - - /* Sync with other local procs. Force the FIFO initialization to always - * happens before the readers access it. - */ - (void)opal_atomic_add_32(&mca_btl_sm_component.sm_seg->module_seg->seg_inited, 1); - while( n_local_procs > - mca_btl_sm_component.sm_seg->module_seg->seg_inited) { - opal_progress(); - opal_atomic_rmb(); - } - - /* it is now safe to unlink the shared memory segment. only one process - * needs to do this, so just let smp rank zero take care of it. */ - if (0 == my_smp_rank) { - if (OPAL_SUCCESS != - mca_common_sm_module_unlink(mca_btl_sm_component.sm_seg)) { - /* it is "okay" if this fails at this point. we have gone this far, - * so just warn about the failure and continue. this is probably - * only triggered by a programming error. */ - opal_output(0, "WARNING: common_sm_module_unlink failed.\n"); - } - /* SKG - another abstraction violation here, but I don't want to add - * extra code in the sm mpool for further synchronization. */ - - /* at this point, all processes have attached to the mpool segment. so - * it is safe to unlink it here. 
*/ - if (OPAL_SUCCESS != - mca_common_sm_module_unlink(sm_mpool_modp->sm_common_module)) { - opal_output(0, "WARNING: common_sm_module_unlink failed.\n"); - } - if (-1 == unlink(mca_btl_sm_component.sm_mpool_rndv_file_name)) { - opal_output(0, "WARNING: %s unlink failed.\n", - mca_btl_sm_component.sm_mpool_rndv_file_name); - } - if (-1 == unlink(mca_btl_sm_component.sm_rndv_file_name)) { - opal_output(0, "WARNING: %s unlink failed.\n", - mca_btl_sm_component.sm_rndv_file_name); - } - } - - /* free up some space used by the name buffers */ - free(mca_btl_sm_component.sm_mpool_ctl_file_name); - free(mca_btl_sm_component.sm_mpool_rndv_file_name); - free(mca_btl_sm_component.sm_ctl_file_name); - free(mca_btl_sm_component.sm_rndv_file_name); - - /* coordinate with other processes */ - for(j = mca_btl_sm_component.num_smp_procs; - j < mca_btl_sm_component.num_smp_procs + n_local_procs; j++) { - ptrdiff_t diff; - - /* spin until this element is allocated */ - /* doesn't really wait for that process... 
FIFO might be allocated, but not initialized */ - opal_atomic_rmb(); - while(NULL == mca_btl_sm_component.shm_fifo[j]) { - opal_progress(); - opal_atomic_rmb(); - } - - /* Calculate the difference as (my_base - their_base) */ - diff = ADDR2OFFSET(bases[my_smp_rank], bases[j]); - - /* store local address of remote fifos */ - mca_btl_sm_component.fifo[j] = - (sm_fifo_t*)OFFSET2ADDR(diff, mca_btl_sm_component.shm_fifo[j]); - - /* cache local copy of peer memory node number */ - mca_btl_sm_component.mem_nodes[j] = mca_btl_sm_component.shm_mem_nodes[j]; - } - - /* update the local smp process count */ - mca_btl_sm_component.num_smp_procs += n_local_procs; - - /* make sure we have enough eager fragmnents for each process */ - return_code = opal_free_list_resize_mt (&mca_btl_sm_component.sm_frags_eager, - mca_btl_sm_component.num_smp_procs * 2); - if (OPAL_SUCCESS != return_code) - goto CLEANUP; - -CLEANUP: - return return_code; -} - -int mca_btl_sm_del_procs( - struct mca_btl_base_module_t* btl, - size_t nprocs, - struct opal_proc_t **procs, - struct mca_btl_base_endpoint_t **peers) -{ - return OPAL_SUCCESS; -} - - -/** - * MCA->BTL Clean up any resources held by BTL module - * before the module is unloaded. - * - * @param btl (IN) BTL module. - * - * Prior to unloading a BTL module, the MCA framework will call - * the BTL finalize method of the module. Any resources held by - * the BTL should be released and if required the memory corresponding - * to the BTL module freed. - * - */ - -int mca_btl_sm_finalize(struct mca_btl_base_module_t* btl) -{ - return OPAL_SUCCESS; -} - - -/* - * Register callback function for error handling.. - */ -int mca_btl_sm_register_error_cb( - struct mca_btl_base_module_t* btl, - mca_btl_base_module_error_cb_fn_t cbfunc) -{ - mca_btl_sm_t *sm_btl = (mca_btl_sm_t *)btl; - sm_btl->error_cb = cbfunc; - return OPAL_SUCCESS; -} - -/** - * Allocate a segment. - * - * @param btl (IN) BTL module - * @param size (IN) Request segment size. 
- */ -extern mca_btl_base_descriptor_t* mca_btl_sm_alloc( - struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - uint8_t order, - size_t size, - uint32_t flags) -{ - mca_btl_sm_frag_t* frag = NULL; - if(size <= mca_btl_sm_component.eager_limit) { - MCA_BTL_SM_FRAG_ALLOC_EAGER(frag); - } else if (size <= mca_btl_sm_component.max_frag_size) { - MCA_BTL_SM_FRAG_ALLOC_MAX(frag); - } - - if (OPAL_LIKELY(frag != NULL)) { - frag->segment.base.seg_len = size; - frag->base.des_flags = flags; - } - return (mca_btl_base_descriptor_t*)frag; -} - -/** - * Return a segment allocated by this BTL. - * - * @param btl (IN) BTL module - * @param segment (IN) Allocated segment. - */ -extern int mca_btl_sm_free( - struct mca_btl_base_module_t* btl, - mca_btl_base_descriptor_t* des) -{ - mca_btl_sm_frag_t* frag = (mca_btl_sm_frag_t*)des; - MCA_BTL_SM_FRAG_RETURN(frag); - - return OPAL_SUCCESS; -} - - -/** - * Pack data - * - * @param btl (IN) BTL module - */ -struct mca_btl_base_descriptor_t* mca_btl_sm_prepare_src( - struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - struct opal_convertor_t* convertor, - uint8_t order, - size_t reserve, - size_t* size, - uint32_t flags) -{ - mca_btl_sm_frag_t* frag; - struct iovec iov; - uint32_t iov_count = 1; - size_t max_data = *size; - int rc; - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - mca_btl_sm_t* sm_btl = (mca_btl_sm_t*)btl; (void)sm_btl; - - if( (0 != reserve) || ( OPAL_UNLIKELY(!mca_btl_sm_component.use_knem) - && OPAL_UNLIKELY(!mca_btl_sm_component.use_cma)) ) { -#endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */ - if ( reserve + max_data <= mca_btl_sm_component.eager_limit ) { - MCA_BTL_SM_FRAG_ALLOC_EAGER(frag); - } else { - MCA_BTL_SM_FRAG_ALLOC_MAX(frag); - } - if( OPAL_UNLIKELY(NULL == frag) ) { - return NULL; - } - - if( OPAL_UNLIKELY(reserve + max_data > frag->size) ) { - max_data = frag->size - reserve; - } - iov.iov_len = max_data; - iov.iov_base = - 
(IOVBASE_TYPE*)(((unsigned char*)(frag->segment.base.seg_addr.pval)) + reserve); - - rc = opal_convertor_pack(convertor, &iov, &iov_count, &max_data ); - if( OPAL_UNLIKELY(rc < 0) ) { - MCA_BTL_SM_FRAG_RETURN(frag); - return NULL; - } - frag->segment.base.seg_len = reserve + max_data; -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - } else { -#if OPAL_BTL_SM_HAVE_KNEM - struct knem_cmd_create_region knem_cr; - struct knem_cmd_param_iovec knem_iov; -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - MCA_BTL_SM_FRAG_ALLOC_USER(frag); - if( OPAL_UNLIKELY(NULL == frag) ) { - return NULL; - } - iov.iov_len = max_data; - iov.iov_base = NULL; - rc = opal_convertor_pack(convertor, &iov, &iov_count, &max_data); - if( OPAL_UNLIKELY(rc < 0) ) { - MCA_BTL_SM_FRAG_RETURN(frag); - return NULL; - } - frag->segment.base.seg_addr.lval = (uint64_t)(uintptr_t) iov.iov_base; - frag->segment.base.seg_len = max_data; - -#if OPAL_BTL_SM_HAVE_KNEM - if (OPAL_LIKELY(mca_btl_sm_component.use_knem)) { - knem_iov.base = (uintptr_t)iov.iov_base; - knem_iov.len = max_data; - knem_cr.iovec_array = (uintptr_t)&knem_iov; - knem_cr.iovec_nr = iov_count; - knem_cr.protection = PROT_READ; - knem_cr.flags = KNEM_FLAG_SINGLEUSE; - if (OPAL_UNLIKELY(ioctl(sm_btl->knem_fd, KNEM_CMD_CREATE_REGION, &knem_cr) < 0)) { - return NULL; - } - frag->segment.key = knem_cr.cookie; - } -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -#if OPAL_BTL_SM_HAVE_CMA - if (OPAL_LIKELY(mca_btl_sm_component.use_cma)) { - /* Encode the pid as the key */ - frag->segment.key = getpid(); - } -#endif /* OPAL_BTL_SM_HAVE_CMA */ - } -#endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */ - - frag->base.des_segments = &(frag->segment.base); - frag->base.des_segment_count = 1; - frag->base.order = MCA_BTL_NO_ORDER; - frag->base.des_flags = flags; - *size = max_data; - return &frag->base; -} - -#if 0 -#define MCA_BTL_SM_TOUCH_DATA_TILL_CACHELINE_BOUNDARY(sm_frag) \ - do { \ - char* _memory = (char*)(sm_frag)->segment.base.seg_addr.pval + \ - 
(sm_frag)->segment.base.seg_len; \ - int* _intmem; \ - size_t align = (intptr_t)_memory & 0xFUL; \ - switch( align & 0x3 ) { \ - case 3: *_memory = 0; _memory++; \ - case 2: *_memory = 0; _memory++; \ - case 1: *_memory = 0; _memory++; \ - } \ - align >>= 2; \ - _intmem = (int*)_memory; \ - switch( align ) { \ - case 3: *_intmem = 0; _intmem++; \ - case 2: *_intmem = 0; _intmem++; \ - case 1: *_intmem = 0; _intmem++; \ - } \ - } while(0) -#else -#define MCA_BTL_SM_TOUCH_DATA_TILL_CACHELINE_BOUNDARY(sm_frag) -#endif - -#if 0 - if( OPAL_LIKELY(align > 0) ) { \ - align = 0xFUL - align; \ - memset( _memory, 0, align ); \ - } \ - -#endif - -/** - * Initiate an inline send to the peer. If failure then return a descriptor. - * - * @param btl (IN) BTL module - * @param peer (IN) BTL peer addressing - */ -int mca_btl_sm_sendi( struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - struct opal_convertor_t* convertor, - void* header, - size_t header_size, - size_t payload_size, - uint8_t order, - uint32_t flags, - mca_btl_base_tag_t tag, - mca_btl_base_descriptor_t** descriptor ) -{ - size_t length = (header_size + payload_size); - mca_btl_sm_frag_t* frag; - int rc; - - if ( mca_btl_sm_component.num_outstanding_frags * 2 > (int) mca_btl_sm_component.fifo_size ) { - mca_btl_sm_component_progress(); - } - - /* this check should be unnecessary... turn into an assertion? 
*/ - if( length < mca_btl_sm_component.eager_limit ) { - - /* allocate a fragment, giving up if we can't get one */ - /* note that frag==NULL is equivalent to rc returning an error code */ - MCA_BTL_SM_FRAG_ALLOC_EAGER(frag); - if( OPAL_UNLIKELY(NULL == frag) ) { - if (NULL != descriptor) { - *descriptor = NULL; - } - return OPAL_ERR_OUT_OF_RESOURCE; - } - - /* fill in fragment fields */ - frag->segment.base.seg_len = length; - frag->hdr->len = length; - assert( 0 == (flags & MCA_BTL_DES_SEND_ALWAYS_CALLBACK) ); - frag->base.des_flags = flags | MCA_BTL_DES_FLAGS_BTL_OWNERSHIP; /* why do any flags matter here other than OWNERSHIP? */ - frag->hdr->tag = tag; - frag->endpoint = endpoint; - - /* write the match header (with MPI comm/tag/etc. info) */ - memcpy( frag->segment.base.seg_addr.pval, header, header_size ); - - /* write the message data if there is any */ - /* - We can add MEMCHECKER calls before and after the packing. - */ - if( payload_size ) { - size_t max_data; - struct iovec iov; - uint32_t iov_count; - /* pack the data into the supplied buffer */ - iov.iov_base = (IOVBASE_TYPE*)((unsigned char*)frag->segment.base.seg_addr.pval + header_size); - iov.iov_len = max_data = payload_size; - iov_count = 1; - - (void)opal_convertor_pack( convertor, &iov, &iov_count, &max_data); - - assert(max_data == payload_size); - } - - MCA_BTL_SM_TOUCH_DATA_TILL_CACHELINE_BOUNDARY(frag); - - /* write the fragment pointer to the FIFO */ - /* - * Note that we don't care what the FIFO-write return code is. Even if - * the return code indicates failure, the write has still "completed" from - * our point of view: it has been posted to a "pending send" queue. 
- */ - OPAL_THREAD_ADD32(&mca_btl_sm_component.num_outstanding_frags, +1); - MCA_BTL_SM_FIFO_WRITE(endpoint, endpoint->my_smp_rank, - endpoint->peer_smp_rank, (void *) VIRTUAL2RELATIVE(frag->hdr), false, true, rc); - (void)rc; /* this is safe to ignore as the message is requeued till success */ - return OPAL_SUCCESS; - } - - if (NULL != descriptor) { - /* presumably, this code path will never get executed */ - *descriptor = mca_btl_sm_alloc( btl, endpoint, order, - payload_size + header_size, flags); - } - - return OPAL_ERR_RESOURCE_BUSY; -} - -/** - * Initiate a send to the peer. - * - * @param btl (IN) BTL module - * @param peer (IN) BTL peer addressing - */ -int mca_btl_sm_send( struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - struct mca_btl_base_descriptor_t* descriptor, - mca_btl_base_tag_t tag ) -{ - mca_btl_sm_frag_t* frag = (mca_btl_sm_frag_t*)descriptor; - int rc; - - if ( mca_btl_sm_component.num_outstanding_frags * 2 > (int) mca_btl_sm_component.fifo_size ) { - mca_btl_sm_component_progress(); - } - - /* available header space */ - frag->hdr->len = frag->segment.base.seg_len; - /* type of message, pt-2-pt, one-sided, etc */ - frag->hdr->tag = tag; - - MCA_BTL_SM_TOUCH_DATA_TILL_CACHELINE_BOUNDARY(frag); - - frag->endpoint = endpoint; - - /* - * post the descriptor in the queue - post with the relative - * address - */ - OPAL_THREAD_ADD32(&mca_btl_sm_component.num_outstanding_frags, +1); - MCA_BTL_SM_FIFO_WRITE(endpoint, endpoint->my_smp_rank, - endpoint->peer_smp_rank, (void *) VIRTUAL2RELATIVE(frag->hdr), false, true, rc); - if( OPAL_LIKELY(0 == rc) ) { - return 1; /* the data is completely gone */ - } - frag->base.des_flags |= MCA_BTL_DES_SEND_ALWAYS_CALLBACK; - /* not yet gone, but pending. Let the upper level knows that - * the callback will be triggered when the data will be sent. 
- */ - return 0; -} - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA -mca_btl_base_registration_handle_t *mca_btl_sm_register_mem (struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - void *base, size_t size, uint32_t flags) -{ - mca_btl_sm_registration_handle_t *handle; - opal_free_list_item_t *item = NULL; - - item = opal_free_list_get (&mca_btl_sm_component.registration_handles); - if (OPAL_UNLIKELY(NULL == item)) { - return NULL; - } - - handle = (mca_btl_sm_registration_handle_t *) item; - -#if OPAL_BTL_SM_HAVE_KNEM - if (OPAL_LIKELY(mca_btl_sm_component.use_knem)) { - struct knem_cmd_create_region knem_cr; - struct knem_cmd_param_iovec knem_iov; - - knem_iov.base = (uintptr_t)base & ~(opal_getpagesize() - 1); - knem_iov.len = OPAL_ALIGN(size + ((intptr_t) base - knem_iov.base), opal_getpagesize(), intptr_t); - knem_cr.iovec_array = (uintptr_t)&knem_iov; - knem_cr.iovec_nr = 1; - knem_cr.flags = 0; - knem_cr.protection = 0; - - if (flags & MCA_BTL_REG_FLAG_REMOTE_READ) { - knem_cr.protection |= PROT_READ; - } - if (flags & MCA_BTL_REG_FLAG_REMOTE_WRITE) { - knem_cr.protection |= PROT_WRITE; - } - - if (OPAL_UNLIKELY(ioctl(((mca_btl_sm_t*)btl)->knem_fd, KNEM_CMD_CREATE_REGION, &knem_cr) < 0)) { - opal_free_list_return (&mca_btl_sm_component.registration_handles, item); - return NULL; - } - - handle->btl_handle.data.knem.cookie = knem_cr.cookie; - handle->btl_handle.data.knem.base_addr = knem_iov.base; - } else -#endif - { - /* the pid could be included in a modex but this will work until btl/sm is - * deleted */ - handle->btl_handle.data.pid = getpid (); - } - - /* return the public part of the handle */ - return &handle->btl_handle; -} - -int mca_btl_sm_deregister_mem (struct mca_btl_base_module_t* btl, mca_btl_base_registration_handle_t *handle) -{ - mca_btl_sm_registration_handle_t *sm_handle = - (mca_btl_sm_registration_handle_t *)((intptr_t) handle - offsetof (mca_btl_sm_registration_handle_t, btl_handle)); - -#if 
OPAL_BTL_SM_HAVE_KNEM - if (OPAL_LIKELY(mca_btl_sm_component.use_knem)) { - (void) ioctl(((mca_btl_sm_t*)btl)->knem_fd, KNEM_CMD_DESTROY_REGION, &handle->data.knem.cookie); - } -#endif - - opal_free_list_return (&mca_btl_sm_component.registration_handles, &sm_handle->super); - - return OPAL_SUCCESS; -} -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - -/** - * Initiate an synchronous get. - */ -int mca_btl_sm_get_sync (mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, void *local_address, - uint64_t remote_address, mca_btl_base_registration_handle_t *local_handle, - mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags, - int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata) -{ -#if OPAL_BTL_SM_HAVE_KNEM - mca_btl_sm_t* sm_btl = (mca_btl_sm_t*) btl; - if (OPAL_LIKELY(mca_btl_sm_component.use_knem)) { - struct knem_cmd_inline_copy icopy; - struct knem_cmd_param_iovec recv_iovec; - - /* Fill in the ioctl data fields. There's no async completion, so - we don't need to worry about getting a slot, etc. */ - recv_iovec.base = (uintptr_t) local_address; - recv_iovec.len = size; - icopy.local_iovec_array = (uintptr_t)&recv_iovec; - icopy.local_iovec_nr = 1; - icopy.remote_cookie = remote_handle->data.knem.cookie; - icopy.remote_offset = remote_address - remote_handle->data.knem.base_addr; - icopy.write = 0; - - /* Use the DMA flag if knem supports it *and* the segment length - is greater than the cutoff. Note that if the knem_dma_min - value is 0 (i.e., the MCA param was set to 0), the segment size - will never be larger than it, so DMA will never be used. 
*/ - icopy.flags = 0; - if (mca_btl_sm_component.knem_dma_min <= size) { - icopy.flags = mca_btl_sm_component.knem_dma_flag; - } - /* synchronous flags only, no need to specify icopy.async_status_index */ - - /* When the ioctl returns, the transfer is done and we can invoke - the btl callback and return the frag */ - if (OPAL_UNLIKELY(0 != ioctl(sm_btl->knem_fd, - KNEM_CMD_INLINE_COPY, &icopy))) { - return OPAL_ERROR; - } - - /* FIXME: what if icopy.current_status == KNEM_STATUS_FAILED? */ - } -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -#if OPAL_BTL_SM_HAVE_CMA - if (OPAL_LIKELY(mca_btl_sm_component.use_cma)) { - struct iovec local, remote; - pid_t remote_pid; - ssize_t val; - - remote_pid = remote_handle->data.pid; - remote.iov_base = (void *) (intptr_t) remote_address; - remote.iov_len = size; - local.iov_base = local_address; - local.iov_len = size; - - val = process_vm_readv(remote_pid, &local, 1, &remote, 1, 0); - - if (val != (ssize_t)size) { - if (val < 0) { - opal_output(0, "mca_btl_sm_get_sync: process_vm_readv failed: %i", - errno); - } else { - /* Should never get a short read from process_vm_readv */ - opal_output(0, "mca_btl_sm_get_sync: process_vm_readv short read: %i", - (int)val); - } - return OPAL_ERROR; - } - } -#endif /* OPAL_BTL_SM_HAVE_CMA */ - - cbfunc (btl, endpoint, local_address, local_handle, cbcontext, cbdata, OPAL_SUCCESS); - - return OPAL_SUCCESS; -} - -#endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */ - -#if OPAL_BTL_SM_HAVE_KNEM -/* No support async_get for CMA yet */ - -/** - * Initiate an asynchronous get. 
- */ -int mca_btl_sm_get_async (mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, void *local_address, - uint64_t remote_address, mca_btl_base_registration_handle_t *local_handle, - mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags, - int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata) -{ - mca_btl_sm_t* sm_btl = (mca_btl_sm_t*) btl; - mca_btl_sm_frag_t* frag; - struct knem_cmd_inline_copy icopy; - struct knem_cmd_param_iovec recv_iovec; - - /* If we have no knem slots available, fall back to synchronous */ - if (sm_btl->knem_status_num_used >= - mca_btl_sm_component.knem_max_simultaneous) { - return mca_btl_sm_get_sync (btl, endpoint, local_address, remote_address, local_handle, - remote_handle, size, flags, order, cbfunc, cbcontext, cbdata); - } - - /* allocate a fragment to keep track of this transaction */ - MCA_BTL_SM_FRAG_ALLOC_USER(frag); - if (OPAL_UNLIKELY(NULL == frag)) { - return mca_btl_sm_get_sync (btl, endpoint, local_address, remote_address, local_handle, - remote_handle, size, flags, order, cbfunc, cbcontext, cbdata); - } - - /* fill in callback data */ - frag->cb.func = cbfunc; - frag->cb.context = cbcontext; - frag->cb.data = cbdata; - frag->cb.local_address = local_address; - frag->cb.local_handle = local_handle; - - /* We have a slot, so fill in the data fields. Bump the - first_avail and num_used counters. 
*/ - recv_iovec.base = (uintptr_t) local_address; - recv_iovec.len = size; - icopy.local_iovec_array = (uintptr_t)&recv_iovec; - icopy.local_iovec_nr = 1; - icopy.write = 0; - icopy.async_status_index = sm_btl->knem_status_first_avail++; - if (sm_btl->knem_status_first_avail >= - mca_btl_sm_component.knem_max_simultaneous) { - sm_btl->knem_status_first_avail = 0; - } - ++sm_btl->knem_status_num_used; - icopy.remote_cookie = remote_handle->data.knem.cookie; - icopy.remote_offset = remote_address - remote_handle->data.knem.base_addr; - - /* Use the DMA flag if knem supports it *and* the segment length - is greater than the cutoff */ - icopy.flags = KNEM_FLAG_ASYNCDMACOMPLETE; - if (mca_btl_sm_component.knem_dma_min <= size) { - icopy.flags = mca_btl_sm_component.knem_dma_flag; - } - - sm_btl->knem_frag_array[icopy.async_status_index] = frag; - if (OPAL_LIKELY(0 == ioctl(sm_btl->knem_fd, - KNEM_CMD_INLINE_COPY, &icopy))) { - if (icopy.current_status != KNEM_STATUS_PENDING) { - MCA_BTL_SM_FRAG_RETURN(frag); - /* request completed synchronously */ - - /* FIXME: what if icopy.current_status == KNEM_STATUS_FAILED? 
*/ - cbfunc (btl, endpoint, local_address, local_handle, cbcontext, cbdata, OPAL_SUCCESS); - - --sm_btl->knem_status_num_used; - ++sm_btl->knem_status_first_used; - if (sm_btl->knem_status_first_used >= - mca_btl_sm_component.knem_max_simultaneous) { - sm_btl->knem_status_first_used = 0; - } - } - return OPAL_SUCCESS; - } else { - return OPAL_ERROR; - } -} -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -/** - * - */ -void mca_btl_sm_dump(struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - int verbose) -{ - opal_list_item_t *item; - mca_btl_sm_frag_t* frag; - - if( NULL != endpoint ) { - mca_btl_base_err("BTL SM %p endpoint %p [smp_rank %d] [peer_rank %d]\n", - (void*) btl, (void*) endpoint, - endpoint->my_smp_rank, endpoint->peer_smp_rank); - for(item = opal_list_get_first(&endpoint->pending_sends); - item != opal_list_get_end(&endpoint->pending_sends); - item = opal_list_get_next(item)) { - frag = (mca_btl_sm_frag_t*)item; - mca_btl_base_err(" | frag %p size %lu (hdr frag %p len %lu rank %d tag %d)\n", - (void*) frag, frag->size, (void*) frag->hdr->frag, - frag->hdr->len, frag->hdr->my_smp_rank, - frag->hdr->tag); - } - } -} - -#if OPAL_ENABLE_FT_CR == 0 -int mca_btl_sm_ft_event(int state) { - return OPAL_SUCCESS; -} -#else -int mca_btl_sm_ft_event(int state) { - /* Notify mpool */ - if( NULL != mca_btl_sm_component.sm_mpool && - NULL != mca_btl_sm_component.sm_mpool->mpool_ft_event) { - mca_btl_sm_component.sm_mpool->mpool_ft_event(state); - } - - if(OPAL_CRS_CHECKPOINT == state) { - if( NULL != mca_btl_sm_component.sm_seg ) { - /* On restart we need the old file names to exist (not necessarily - * contain content) so the CRS component does not fail when searching - * for these old file handles. The restart procedure will make sure - * these files get cleaned up appropriately. 
- */ - /* Disabled to get FT code compiled again - * TODO: FIXIT soon - orte_sstore.set_attr(orte_sstore_handle_current, - SSTORE_METADATA_LOCAL_TOUCH, - mca_btl_sm_component.sm_seg->shmem_ds.seg_name); - */ - } - } - else if(OPAL_CRS_CONTINUE == state) { - if (opal_cr_continue_like_restart) { - if( NULL != mca_btl_sm_component.sm_seg ) { - /* Add shared memory file */ - opal_crs_base_cleanup_append(mca_btl_sm_component.sm_seg->shmem_ds.seg_name, false); - } - - /* Clear this so we force the module to re-init the sm files */ - mca_btl_sm_component.sm_mpool = NULL; - } - } - else if(OPAL_CRS_RESTART == state || - OPAL_CRS_RESTART_PRE == state) { - if( NULL != mca_btl_sm_component.sm_seg ) { - /* Add shared memory file */ - opal_crs_base_cleanup_append(mca_btl_sm_component.sm_seg->shmem_ds.seg_name, false); - } - - /* Clear this so we force the module to re-init the sm files */ - mca_btl_sm_component.sm_mpool = NULL; - } - else if(OPAL_CRS_TERM == state ) { - ; - } - else { - ; - } - - return OPAL_SUCCESS; -} -#endif /* OPAL_ENABLE_FT_CR */ diff --git a/opal/mca/btl/sm/btl_sm.h b/opal/mca/btl/sm/btl_sm.h deleted file mode 100644 index 9721bede3f4..00000000000 --- a/opal/mca/btl/sm/btl_sm.h +++ /dev/null @@ -1,587 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2012 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2006-2007 Voltaire. All rights reserved. - * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. 
- * Copyright (c) 2010-2015 Los Alamos National Security, LLC. - * All rights reserved. - * Copyright (c) 2010-2012 IBM Corporation. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -/** - * @file - */ -#ifndef MCA_BTL_SM_H -#define MCA_BTL_SM_H - -#include "opal_config.h" -#include -#include -#include -#include -#ifdef HAVE_SCHED_H -#include -#endif /* HAVE_SCHED_H */ -#if OPAL_BTL_SM_HAVE_KNEM -#include "knem_io.h" -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -#include "opal/util/bit_ops.h" -#include "opal/class/opal_free_list.h" -#include "opal/mca/btl/btl.h" -#include "opal/util/proc.h" -#include "opal/mca/common/sm/common_sm.h" - -BEGIN_C_DECLS - -/* - * Shared Memory FIFOs - * - * The FIFO is implemented as a circular queue with head and tail pointers - * (integer indices). For efficient wraparound indexing, the size of the - * queue is constrained to be a power of two and we "&" indices with a "mask". - * - * More than one process can write to the FIFO head. Therefore, there is a head - * lock. One cannot write until the head slot is empty, indicated by the special - * queue entry SM_FIFO_FREE. - * - * Only the receiver can read the FIFO tail. Therefore, the tail lock is - * required only in multithreaded applications. If a tail read returns the - * SM_FIFO_FREE value, that means the FIFO is empty. Once a non-FREE value - * has been read, the queue slot is *not* automatically reset to SM_FIFO_FREE. - * Rather, read tail slots are reset "lazily" (see "lazy_free" and "num_to_clear") - * to reduce the number of memory barriers and improve performance. - * - * Since the FIFO lives in shared memory that is mapped differently into - * each address space, the "queue" pointer is relative (each process must - * add its own offset) and the queue_recv pointer is meaningful only in the - * receiver's address space. 
- * - * Since multiple processes access different parts of the FIFO structure in - * different ways, we introduce padding to keep different parts on different - * cachelines. - */ - -#define SM_FIFO_FREE (void *) (-2) -/* We can't use opal_cache_line_size here because we need a - compile-time constant for padding the struct. We can't really have - a compile-time constant that is portable, either (e.g., compile on - one machine and run on another). So just use a big enough cache - line that should hopefully be good in most places. */ -#define SM_CACHE_LINE_PAD 128 - -struct sm_fifo_t { - /* This queue pointer is used only by the heads. */ - volatile void **queue; - char pad0[SM_CACHE_LINE_PAD - sizeof(void **)]; - /* This lock is used by the heads. */ - opal_atomic_lock_t head_lock; - char pad1[SM_CACHE_LINE_PAD - sizeof(opal_atomic_lock_t)]; - /* This index is used by the head holding the head lock. */ - volatile int head; - char pad2[SM_CACHE_LINE_PAD - sizeof(int)]; - /* This mask is used "read only" by all processes. */ - unsigned int mask; - char pad3[SM_CACHE_LINE_PAD - sizeof(int)]; - /* The following are used only by the tail. */ - volatile void **queue_recv; - opal_atomic_lock_t tail_lock; - volatile int tail; - int num_to_clear; - int lazy_free; - char pad4[SM_CACHE_LINE_PAD - sizeof(void **) - - sizeof(opal_atomic_lock_t) - - sizeof(int) * 3]; -}; -typedef struct sm_fifo_t sm_fifo_t; - -/* - * Shared Memory resource managment - */ - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 -#define DATA (char)0 -#define DONE (char)1 -#endif - -typedef struct mca_btl_sm_mem_node_t { - mca_mpool_base_module_t* sm_mpool; /**< shared memory pool */ -} mca_btl_sm_mem_node_t; - -/** - * Shared Memory (SM) BTL module. 
- */ -struct mca_btl_sm_component_t { - mca_btl_base_component_3_0_0_t super; /**< base BTL component */ - int sm_free_list_num; /**< initial size of free lists */ - int sm_free_list_max; /**< maximum size of free lists */ - int sm_free_list_inc; /**< number of elements to alloc when growing free lists */ - int sm_max_procs; /**< upper limit on the number of processes using the shared memory pool */ - int sm_extra_procs; /**< number of extra procs to allow */ - char* sm_mpool_name; /**< name of shared memory pool module */ - mca_mpool_base_module_t **sm_mpools; /**< shared memory pools (one for each memory node) */ - mca_mpool_base_module_t *sm_mpool; /**< mpool on local node */ - void* sm_mpool_base; /**< base address of shared memory pool */ - size_t eager_limit; /**< first fragment size */ - size_t max_frag_size; /**< maximum (second and beyone) fragment size */ - opal_mutex_t sm_lock; - mca_common_sm_module_t *sm_seg; /**< description of shared memory segment */ - volatile sm_fifo_t **shm_fifo; /**< pointer to fifo 2D array in shared memory */ - char **shm_bases; /**< pointer to base pointers in shared memory */ - uint16_t *shm_mem_nodes; /**< pointer to mem noded in shared memory */ - sm_fifo_t **fifo; /**< cached copy of the pointer to the 2D - fifo array. The address in the shared - memory segment sm_ctl_header is a relative, - but this one, in process private memory, is - a real virtual address */ - uint16_t *mem_nodes; /**< cached copy of mem nodes of each local rank */ - unsigned int fifo_size; /**< number of FIFO queue entries */ - unsigned int fifo_lazy_free; /**< number of reads before lazy fifo free is triggered */ - int nfifos; /**< number of FIFOs per receiver */ - int32_t num_smp_procs; /**< current number of smp procs on this host */ - int32_t my_smp_rank; /**< My SMP process rank. Used for accessing - * SMP specfic data structures. 
*/ - opal_free_list_t sm_frags_eager; /**< free list of sm first */ - opal_free_list_t sm_frags_max; /**< free list of sm second */ - opal_free_list_t sm_frags_user; - opal_free_list_t sm_first_frags_to_progress; /**< list of first - fragments that are - awaiting resources */ - struct mca_btl_base_endpoint_t **sm_peers; - - opal_free_list_t pending_send_fl; - int num_outstanding_frags; /**< number of fragments sent but not yet returned to free list */ - int num_pending_sends; /**< total number on all of my pending-send queues */ - int mem_node; - int num_mem_nodes; - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 - char sm_fifo_path[PATH_MAX]; /**< path to fifo used to signal this process */ - int sm_fifo_fd; /**< file descriptor corresponding to opened fifo */ - opal_thread_t sm_fifo_thread; -#endif - struct mca_btl_sm_t **sm_btls; - struct mca_btl_sm_frag_t **table; - size_t sm_num_btls; - size_t sm_max_btls; - -#if OPAL_BTL_SM_HAVE_KNEM - /* Knem capabilities info */ - struct knem_cmd_info knem_info; -#endif -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - /** registration handles to hold knem cookies */ - opal_free_list_t registration_handles; -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - - /** MCA: should we be using knem or not? neg=try but continue if - not available, 0=don't try, 1=try and fail if not available */ - int use_knem; - - /** MCA: minimal message size (bytes) to offload on DMA engine - when using knem */ - unsigned int knem_dma_min; - - /** MCA: how many simultaneous ongoing knem operations to - support */ - int knem_max_simultaneous; - - /** If we want DMA and DMA is supported, this will be loaded with - KNEM_FLAG_DMA. Otherwise, it'll be 0. */ - int knem_dma_flag; - - /** MCA: should we be using CMA or not? 
- 0 = no, 1 = yes */ - int use_cma; - - /* /// well-known file names for sm and sm mpool init /// */ - char *sm_mpool_ctl_file_name; - char *sm_mpool_rndv_file_name; - char *sm_ctl_file_name; - char *sm_rndv_file_name; - - /** minimum size of a btl/sm mpool */ - unsigned long mpool_min_size; - - /** allocator name to use with the mpool */ - char *allocator; -}; -typedef struct mca_btl_sm_component_t mca_btl_sm_component_t; -OPAL_MODULE_DECLSPEC extern mca_btl_sm_component_t mca_btl_sm_component; - -/** - * SM BTL Interface - */ -struct mca_btl_sm_t { - mca_btl_base_module_t super; /**< base BTL interface */ - bool btl_inited; /**< flag indicating if btl has been inited */ - mca_btl_base_module_error_cb_fn_t error_cb; - -#if OPAL_BTL_SM_HAVE_KNEM - - /* File descriptor for knem */ - int knem_fd; - - /* Array of knem status items for non-blocking knem requests */ - knem_status_t *knem_status_array; - - /* Array of fragments currently being moved by knem non-blocking - operations */ - struct mca_btl_sm_frag_t **knem_frag_array; - - /* First free/available location in knem_status_array */ - int knem_status_first_avail; - - /* First currently-being used location in the knem_status_array */ - int knem_status_first_used; - - /* Number of status items currently in use */ - int knem_status_num_used; -#endif /* OPAL_BTL_SM_HAVE_KNEM */ -}; -typedef struct mca_btl_sm_t mca_btl_sm_t; -OPAL_MODULE_DECLSPEC extern mca_btl_sm_t mca_btl_sm; - -struct btl_sm_pending_send_item_t -{ - opal_free_list_item_t super; - void *data; -}; -typedef struct btl_sm_pending_send_item_t btl_sm_pending_send_item_t; - -/*** - * FIFO support for sm BTL. - */ - -/*** - * One or more FIFO components may be a pointer that must be - * accessed by multiple processes. Since the shared region may - * be mmapped differently into each process's address space, - * these pointers will be relative to some base address. Here, - * we define macros to translate between relative addresses and - * virtual addresses. 
- */ -#define VIRTUAL2RELATIVE(VADDR ) ((long)(VADDR) - (long)mca_btl_sm_component.shm_bases[mca_btl_sm_component.my_smp_rank]) -#define RELATIVE2VIRTUAL(OFFSET) ((long)(OFFSET) + (long)mca_btl_sm_component.shm_bases[mca_btl_sm_component.my_smp_rank]) - -static inline int sm_fifo_init(int fifo_size, mca_mpool_base_module_t *mpool, - sm_fifo_t *fifo, int lazy_free) -{ - int i, qsize; - - /* figure out the queue size (a power of two that is at least 1) */ - qsize = opal_next_poweroftwo_inclusive (fifo_size); - - /* allocate the queue in the receiver's address space */ - fifo->queue_recv = (volatile void **)mpool->mpool_alloc( - mpool, sizeof(void *) * qsize, opal_cache_line_size, 0); - if(NULL == fifo->queue_recv) { - return OPAL_ERR_OUT_OF_RESOURCE; - } - - /* initialize the queue */ - for ( i = 0; i < qsize; i++ ) - fifo->queue_recv[i] = SM_FIFO_FREE; - - /* shift queue address to be relative */ - fifo->queue = (volatile void **) VIRTUAL2RELATIVE(fifo->queue_recv); - - /* initialize the locks */ - opal_atomic_init(&(fifo->head_lock), OPAL_ATOMIC_UNLOCKED); - opal_atomic_init(&(fifo->tail_lock), OPAL_ATOMIC_UNLOCKED); - opal_atomic_unlock(&(fifo->head_lock)); /* should be unnecessary */ - opal_atomic_unlock(&(fifo->tail_lock)); /* should be unnecessary */ - - /* other initializations */ - fifo->head = 0; - fifo->mask = qsize - 1; - fifo->tail = 0; - fifo->num_to_clear = 0; - fifo->lazy_free = lazy_free; - - return OPAL_SUCCESS; -} - - -static inline int sm_fifo_write(void *value, sm_fifo_t *fifo) -{ - volatile void **q = (volatile void **) RELATIVE2VIRTUAL(fifo->queue); - - /* if there is no free slot to write, report exhausted resource */ - opal_atomic_rmb(); - if ( SM_FIFO_FREE != q[fifo->head] ) - return OPAL_ERR_OUT_OF_RESOURCE; - - /* otherwise, write to the slot and advance the head index */ - q[fifo->head] = value; - opal_atomic_wmb(); - fifo->head = (fifo->head + 1) & fifo->mask; - return OPAL_SUCCESS; -} - - -static inline void *sm_fifo_read(sm_fifo_t 
*fifo) -{ - void *value; - - /* read the next queue entry */ - value = (void *) fifo->queue_recv[fifo->tail]; - - opal_atomic_rmb(); - - /* if you read a non-empty slot, advance the tail pointer */ - if ( SM_FIFO_FREE != value ) { - - fifo->tail = ( fifo->tail + 1 ) & fifo->mask; - fifo->num_to_clear += 1; - - /* check if it's time to free slots, which we do lazily */ - if ( fifo->num_to_clear >= fifo->lazy_free ) { - int i = (fifo->tail - fifo->num_to_clear ) & fifo->mask; - - while ( fifo->num_to_clear > 0 ) { - fifo->queue_recv[i] = SM_FIFO_FREE; - i = (i+1) & fifo->mask; - fifo->num_to_clear -= 1; - } - opal_atomic_wmb(); - } - } - - return value; -} - -/** - * shared memory component progress. - */ -extern int mca_btl_sm_component_progress(void); - - - -/** - * Register a callback function that is called on error.. - * - * @param btl (IN) BTL module - * @return Status indicating if cleanup was successful - */ - -int mca_btl_sm_register_error_cb( - struct mca_btl_base_module_t* btl, - mca_btl_base_module_error_cb_fn_t cbfunc -); - -/** - * Cleanup any resources held by the BTL. - * - * @param btl BTL instance. - * @return OPAL_SUCCESS or error status on failure. - */ - -extern int mca_btl_sm_finalize( - struct mca_btl_base_module_t* btl -); - - -/** - * PML->BTL notification of change in the process list. - * PML->BTL Notification that a receive fragment has been matched. - * Called for message that is send from process with the virtual - * address of the shared memory segment being different than that of - * the receiver. - * - * @param btl (IN) - * @param proc (IN) - * @param peer (OUT) - * @return OPAL_SUCCESS or error status on failure. - * - */ - -extern int mca_btl_sm_add_procs( - struct mca_btl_base_module_t* btl, - size_t nprocs, - struct opal_proc_t **procs, - struct mca_btl_base_endpoint_t** peers, - struct opal_bitmap_t* reachability -); - - -/** - * PML->BTL notification of change in the process list. 
- * - * @param btl (IN) BTL instance - * @param proc (IN) Peer process - * @param peer (IN) Peer addressing information. - * @return Status indicating if cleanup was successful - * - */ -extern int mca_btl_sm_del_procs( - struct mca_btl_base_module_t* btl, - size_t nprocs, - struct opal_proc_t **procs, - struct mca_btl_base_endpoint_t **peers -); - - -/** - * Allocate a segment. - * - * @param btl (IN) BTL module - * @param size (IN) Request segment size. - */ -extern mca_btl_base_descriptor_t* mca_btl_sm_alloc( - struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - uint8_t order, - size_t size, - uint32_t flags -); - -/** - * Return a segment allocated by this BTL. - * - * @param btl (IN) BTL module - * @param segment (IN) Allocated segment. - */ -extern int mca_btl_sm_free( - struct mca_btl_base_module_t* btl, - mca_btl_base_descriptor_t* segment -); - - -/** - * Pack data - * - * @param btl (IN) BTL module - * @param peer (IN) BTL peer addressing - */ -struct mca_btl_base_descriptor_t* mca_btl_sm_prepare_src( - struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - struct opal_convertor_t* convertor, - uint8_t order, - size_t reserve, - size_t* size, - uint32_t flags -); - - -/** - * Initiate an inlined send to the peer or return a descriptor. - * - * @param btl (IN) BTL module - * @param peer (IN) BTL peer addressing - */ -extern int mca_btl_sm_sendi( struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - struct opal_convertor_t* convertor, - void* header, - size_t header_size, - size_t payload_size, - uint8_t order, - uint32_t flags, - mca_btl_base_tag_t tag, - mca_btl_base_descriptor_t** descriptor ); - -/** - * Initiate a send to the peer. 
- * - * @param btl (IN) BTL module - * @param peer (IN) BTL peer addressing - */ -extern int mca_btl_sm_send( - struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - struct mca_btl_base_descriptor_t* descriptor, - mca_btl_base_tag_t tag -); - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA -/* - * Synchronous knem/cma get - */ -int mca_btl_sm_get_sync (mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, void *local_address, - uint64_t remote_address, mca_btl_base_registration_handle_t *local_handle, - mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags, - int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata); -#endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */ - -#if OPAL_BTL_SM_HAVE_KNEM -/* - * Asynchronous knem get - */ -int mca_btl_sm_get_async (mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, void *local_address, - uint64_t remote_address, mca_btl_base_registration_handle_t *local_handle, - mca_btl_base_registration_handle_t *remote_handle, size_t size, int flags, - int order, mca_btl_base_rdma_completion_fn_t cbfunc, void *cbcontext, void *cbdata); - -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -extern void mca_btl_sm_dump(struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - int verbose); - -/** - * Fault Tolerance Event Notification Function - * @param state Checkpoint Stae - * @return OPAL_SUCCESS or failure status - */ -int mca_btl_sm_ft_event(int state); - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 -void mca_btl_sm_component_event_thread(opal_object_t*); -#endif - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 -#define MCA_BTL_SM_SIGNAL_PEER(peer) \ -{ \ - unsigned char cmd = DATA; \ - if(write(peer->fifo_fd, &cmd, sizeof(cmd)) != sizeof(cmd)) { \ - opal_output(0, "mca_btl_sm_send: write fifo failed: errno=%d\n", errno); \ - } \ -} -#else -#define MCA_BTL_SM_SIGNAL_PEER(peer) -#endif - -#if OPAL_BTL_SM_HAVE_KNEM | 
OPAL_BTL_SM_HAVE_CMA -struct mca_btl_base_registration_handle_t { - union { - struct { - uint64_t cookie; - intptr_t base_addr; - } knem; - pid_t pid; - } data; -}; - -struct mca_btl_sm_registration_handle_t { - opal_free_list_item_t super; - mca_btl_base_registration_handle_t btl_handle; -}; -typedef struct mca_btl_sm_registration_handle_t mca_btl_sm_registration_handle_t; - -mca_btl_base_registration_handle_t *mca_btl_sm_register_mem (struct mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* endpoint, - void *base, size_t size, uint32_t flags); - -int mca_btl_sm_deregister_mem (struct mca_btl_base_module_t* btl, mca_btl_base_registration_handle_t *handle); - -#endif - -END_C_DECLS - -#endif - diff --git a/opal/mca/btl/sm/btl_sm_component.c b/opal/mca/btl/sm/btl_sm_component.c index 35796b55f17..68e8121f1ca 100644 --- a/opal/mca/btl/sm/btl_sm_component.c +++ b/opal/mca/btl/sm/btl_sm_component.c @@ -11,13 +11,13 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006-2007 Voltaire. All rights reserved. - * Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2010-2015 Los Alamos National Security, LLC. * All rights reserved. * Copyright (c) 2011-2014 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2010-2012 IBM Corporation. All rights reserved. + * Copyright (c) 2010-2017 IBM Corporation. All rights reserved. * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. - * Copyright (c) 2014-2015 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -26,1209 +26,62 @@ * $HEADER$ */ #include "opal_config.h" -#include -#ifdef HAVE_UNISTD_H -#include -#endif /* HAVE_UNISTD_H */ + #include -#ifdef HAVE_FCNTL_H -#include -#endif /* HAVE_FCNTL_H */ -#ifdef HAVE_SYS_TYPES_H -#include -#endif /* HAVE_SYS_TYPES_H */ -#ifdef HAVE_SYS_MMAN_H -#include -#endif /* HAVE_SYS_MMAN_H */ -#ifdef HAVE_SYS_STAT_H -#include /* for mkfifo */ -#endif /* HAVE_SYS_STAT_H */ -#include "opal/mca/shmem/base/base.h" -#include "opal/mca/shmem/shmem.h" -#include "opal/util/bit_ops.h" +#include "opal/mca/btl/btl.h" +#include "opal/mca/btl/base/base.h" #include "opal/util/output.h" #include "opal/util/show_help.h" +#include "opal/util/argv.h" #include "opal/constants.h" -#include "opal/mca/mpool/base/base.h" -#include "opal/mca/common/sm/common_sm.h" -#include "opal/mca/btl/base/btl_base_error.h" - -#if OPAL_ENABLE_FT_CR == 1 -#include "opal/runtime/opal_cr.h" -#endif - -#include "btl_sm.h" -#include "btl_sm_frag.h" -#include "btl_sm_fifo.h" -#if OPAL_CUDA_SUPPORT -#include "opal/mca/common/cuda/common_cuda.h" -#endif /* OPAL_CUDA_SUPPORT */ - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA -static OBJ_CLASS_INSTANCE(mca_btl_sm_registration_handle_t, opal_free_list_item_t, NULL, NULL); -#endif - -static int mca_btl_sm_component_open(void); -static int mca_btl_sm_component_close(void); -static int sm_register(void); -static mca_btl_base_module_t** mca_btl_sm_component_init( - int *num_btls, - bool enable_progress_threads, - bool enable_mpi_threads -); - -typedef enum { - MCA_BTL_SM_RNDV_MOD_SM = 0, - MCA_BTL_SM_RNDV_MOD_MPOOL -} mca_btl_sm_rndv_module_type_t; - -/* - * Shared Memory (SM) component instance. 
- */ -mca_btl_sm_component_t mca_btl_sm_component = { - .super = { - /* First, the mca_base_component_t struct containing meta information - about the component itself */ - .btl_version = { - MCA_BTL_DEFAULT_VERSION("sm"), - .mca_open_component = mca_btl_sm_component_open, - .mca_close_component = mca_btl_sm_component_close, - .mca_register_component_params = sm_register, - }, - .btl_data = { - /* The component is checkpoint ready */ - .param_field = MCA_BASE_METADATA_PARAM_CHECKPOINT - }, - - .btl_init = mca_btl_sm_component_init, - .btl_progress = mca_btl_sm_component_progress, - } /* end super */ -}; - - -/* - * utility routines for parameter registration - */ - -static inline int mca_btl_sm_param_register_int( - const char* param_name, - int default_value, - int level, - int *storage) -{ - *storage = default_value; - (void) mca_base_component_var_register (&mca_btl_sm_component.super.btl_version, - param_name, NULL, MCA_BASE_VAR_TYPE_INT, - NULL, 0, 0, level, - MCA_BASE_VAR_SCOPE_READONLY, storage); - return *storage; -} - -static inline unsigned int mca_btl_sm_param_register_uint( - const char* param_name, - unsigned int default_value, - int level, - unsigned int *storage) -{ - *storage = default_value; - (void) mca_base_component_var_register (&mca_btl_sm_component.super.btl_version, - param_name, NULL, MCA_BASE_VAR_TYPE_UNSIGNED_INT, - NULL, 0, 0, level, - MCA_BASE_VAR_SCOPE_READONLY, storage); - return *storage; -} - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA -static void mca_btl_sm_dummy_get (void) -{ - /* If a backtrace ends at this function something has gone wrong with - * the btl bootstrapping. Check that the btl_get function was set to - * something reasonable. 
*/ - abort (); -} -#endif - -static int mca_btl_sm_component_verify(void) { -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - if (mca_btl_sm_component.use_knem || mca_btl_sm_component.use_cma) { - mca_btl_sm.super.btl_flags |= MCA_BTL_FLAGS_GET; - /* set a dummy value for btl_get to prevent mca_btl_base_param_verify from - * unsetting the MCA_BTL_FLAGS_GET flags. */ - mca_btl_sm.super.btl_get = (mca_btl_base_module_get_fn_t) mca_btl_sm_dummy_get; - } - - if (mca_btl_sm_component.use_knem && mca_btl_sm_component.use_cma) { - /* Disable CMA if knem is runtime enabled */ - opal_output(0, "CMA disabled because knem is enabled"); - mca_btl_sm_component.use_cma = 0; - } - -#endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */ - - return mca_btl_base_param_verify(&mca_btl_sm.super); -} - -static int sm_register(void) -{ - static bool have_knem = (bool) OPAL_BTL_SM_HAVE_KNEM; - - /* Register an MCA param to indicate whether we have knem support - or not */ - (void) mca_base_component_var_register(&mca_btl_sm_component.super.btl_version, - "have_knem_support", - "Whether this component supports the knem Linux kernel module or not", - MCA_BASE_VAR_TYPE_BOOL, - NULL, 0, - MCA_BASE_VAR_FLAG_DEFAULT_ONLY, - OPAL_INFO_LVL_4, - MCA_BASE_VAR_SCOPE_CONSTANT, - &have_knem); - - if (have_knem) { - mca_btl_sm_component.use_knem = -1; - } else { - mca_btl_sm_component.use_knem = 0; - } - (void) mca_base_component_var_register(&mca_btl_sm_component.super.btl_version, - "use_knem", "Whether knem support is desired or not " - "(negative = try to enable knem support, but continue " - "even if it is not available, 0 = do not enable knem " - "support, positive = try to enable knem support and " - "fail if it is not available)", MCA_BASE_VAR_TYPE_INT, - NULL, 0, 0, OPAL_INFO_LVL_4, - MCA_BASE_VAR_SCOPE_READONLY, &mca_btl_sm_component.use_knem); - - /* Currently disabling DMA mode by default; it's not clear that - this is useful in all applications and architectures. 
*/ - mca_btl_sm_component.knem_dma_min = 0; - (void) mca_base_component_var_register(&mca_btl_sm_component.super.btl_version, - "knem_dma_min", - "Minimum message size (in bytes) to use the knem DMA mode; " - "ignored if knem does not support DMA mode (0 = do not use the " - "knem DMA mode)", MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, - 0, OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_READONLY, - &mca_btl_sm_component.knem_dma_min); - - mca_btl_sm_component.knem_max_simultaneous = 0; - (void) mca_base_component_var_register(&mca_btl_sm_component.super.btl_version, - "knem_max_simultaneous", - "Max number of simultaneous ongoing knem operations to support " - "(0 = do everything synchronously, which probably gives the " - "best large message latency; >0 means to do all operations " - "asynchronously, which supports better overlap for simultaneous " - "large message sends)", MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, - 0, OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_READONLY, - &mca_btl_sm_component.knem_max_simultaneous); - - mca_btl_sm_component.allocator = "bucket"; - (void) mca_base_component_var_register (&mca_btl_sm_component.super.btl_version, "allocator", - "Name of allocator component to use for btl/sm allocations", - MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_LOCAL, &mca_btl_sm_component.allocator); - - mca_btl_sm_component.mpool_min_size = 134217728; - (void) mca_base_component_var_register(&mca_btl_sm_component.super.btl_version, "min_size", - "Minimum size of the common/sm mpool shared memory file", - MCA_BASE_VAR_TYPE_UNSIGNED_LONG, NULL, 0, 0, - OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY, - &mca_btl_sm_component.mpool_min_size); - - /* CMA parameters */ - mca_btl_sm_component.use_cma = 0; - (void) mca_base_component_var_register(&mca_btl_sm_component.super.btl_version, - "use_cma", "Whether or not to enable CMA", - MCA_BASE_VAR_TYPE_UNSIGNED_INT, NULL, 0, 0, - OPAL_INFO_LVL_4, MCA_BASE_VAR_SCOPE_READONLY, - &mca_btl_sm_component.use_cma); - 
- /* register SM component parameters */ - mca_btl_sm_param_register_int("free_list_num", 8, OPAL_INFO_LVL_5, &mca_btl_sm_component.sm_free_list_num); - mca_btl_sm_param_register_int("free_list_max", -1, OPAL_INFO_LVL_5, &mca_btl_sm_component.sm_free_list_max); - mca_btl_sm_param_register_int("free_list_inc", 64, OPAL_INFO_LVL_5, &mca_btl_sm_component.sm_free_list_inc); - mca_btl_sm_param_register_int("max_procs", -1, OPAL_INFO_LVL_5, &mca_btl_sm_component.sm_max_procs); - mca_btl_sm_param_register_uint("fifo_size", 4096, OPAL_INFO_LVL_4, &mca_btl_sm_component.fifo_size); - mca_btl_sm_param_register_int("num_fifos", 1, OPAL_INFO_LVL_4, &mca_btl_sm_component.nfifos); - - mca_btl_sm_param_register_uint("fifo_lazy_free", 120, OPAL_INFO_LVL_5, &mca_btl_sm_component.fifo_lazy_free); - - /* default number of extra procs to allow for future growth */ - mca_btl_sm_param_register_int("sm_extra_procs", 0, OPAL_INFO_LVL_9, &mca_btl_sm_component.sm_extra_procs); - - mca_btl_sm.super.btl_exclusivity = MCA_BTL_EXCLUSIVITY_HIGH-1; - mca_btl_sm.super.btl_eager_limit = 4*1024; - mca_btl_sm.super.btl_rndv_eager_limit = 4*1024; - mca_btl_sm.super.btl_max_send_size = 32*1024; - mca_btl_sm.super.btl_rdma_pipeline_send_length = 64*1024; - mca_btl_sm.super.btl_rdma_pipeline_frag_size = 64*1024; - mca_btl_sm.super.btl_min_rdma_pipeline_size = 64*1024; - mca_btl_sm.super.btl_flags = MCA_BTL_FLAGS_SEND; - mca_btl_sm.super.btl_bandwidth = 9000; /* Mbs */ - mca_btl_sm.super.btl_latency = 1; /* Microsecs */ - -#if OPAL_BTL_SM_HAVE_KNEM - mca_btl_sm.super.btl_registration_handle_size = sizeof (mca_btl_base_registration_handle_t); -#endif - - /* Call the BTL based to register its MCA params */ - mca_btl_base_param_register(&mca_btl_sm_component.super.btl_version, - &mca_btl_sm.super); - - return mca_btl_sm_component_verify(); -} - -/* - * Called by MCA framework to open the component, registers - * component parameters. 
- */ - -static int mca_btl_sm_component_open(void) -{ - if (OPAL_SUCCESS != mca_btl_sm_component_verify()) { - return OPAL_ERROR; - } - - mca_btl_sm_component.sm_max_btls = 1; - - /* make sure the number of fifos is a power of 2 */ - mca_btl_sm_component.nfifos = opal_next_poweroftwo_inclusive (mca_btl_sm_component.nfifos); - - /* make sure that queue size and lazy free parameter are compatible */ - if (mca_btl_sm_component.fifo_lazy_free >= (mca_btl_sm_component.fifo_size >> 1) ) - mca_btl_sm_component.fifo_lazy_free = (mca_btl_sm_component.fifo_size >> 1); - if (mca_btl_sm_component.fifo_lazy_free <= 0) - mca_btl_sm_component.fifo_lazy_free = 1; - - mca_btl_sm_component.max_frag_size = mca_btl_sm.super.btl_max_send_size; - mca_btl_sm_component.eager_limit = mca_btl_sm.super.btl_eager_limit; - - /* initialize objects */ - OBJ_CONSTRUCT(&mca_btl_sm_component.sm_lock, opal_mutex_t); - OBJ_CONSTRUCT(&mca_btl_sm_component.sm_frags_eager, opal_free_list_t); - OBJ_CONSTRUCT(&mca_btl_sm_component.sm_frags_max, opal_free_list_t); - OBJ_CONSTRUCT(&mca_btl_sm_component.sm_frags_user, opal_free_list_t); - OBJ_CONSTRUCT(&mca_btl_sm_component.pending_send_fl, opal_free_list_t); - - mca_btl_sm_component.sm_seg = NULL; - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - OBJ_CONSTRUCT(&mca_btl_sm_component.registration_handles, opal_free_list_t); -#endif - -#if OPAL_BTL_SM_HAVE_KNEM - mca_btl_sm.knem_fd = -1; - mca_btl_sm.knem_status_array = NULL; - mca_btl_sm.knem_frag_array = NULL; - mca_btl_sm.knem_status_num_used = 0; - mca_btl_sm.knem_status_first_avail = 0; - mca_btl_sm.knem_status_first_used = 0; -#endif - - return OPAL_SUCCESS; -} - - -/* - * component cleanup - sanity checking of queue lengths - */ - -static int mca_btl_sm_component_close(void) -{ - int return_value = OPAL_SUCCESS; - -#if OPAL_BTL_SM_HAVE_KNEM - if (NULL != mca_btl_sm.knem_frag_array) { - free(mca_btl_sm.knem_frag_array); - mca_btl_sm.knem_frag_array = NULL; - } - if (NULL != 
mca_btl_sm.knem_status_array) { - munmap(mca_btl_sm.knem_status_array, - mca_btl_sm_component.knem_max_simultaneous); - mca_btl_sm.knem_status_array = NULL; - } - if (-1 != mca_btl_sm.knem_fd) { - close(mca_btl_sm.knem_fd); - mca_btl_sm.knem_fd = -1; - } -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - OBJ_DESTRUCT(&mca_btl_sm_component.registration_handles); -#endif - - OBJ_DESTRUCT(&mca_btl_sm_component.sm_lock); - /** - * We don't have to destroy the fragment lists. They are allocated - * directly into the mmapped file, they will auto-magically disappear - * when the file get unmapped. - */ - /*OBJ_DESTRUCT(&mca_btl_sm_component.sm_frags_eager);*/ - /*OBJ_DESTRUCT(&mca_btl_sm_component.sm_frags_max);*/ - - /* unmap the shared memory control structure */ - if(mca_btl_sm_component.sm_seg != NULL) { - return_value = mca_common_sm_fini( mca_btl_sm_component.sm_seg ); - if( OPAL_SUCCESS != return_value ) { - return_value = OPAL_ERROR; - goto CLEANUP; - } - - /* unlink file, so that it will be deleted when all references - * to it are gone - no error checking, since we want all procs - * to call this, so that in an abnormal termination scenario, - * this file will still get cleaned up */ -#if OPAL_ENABLE_FT_CR == 1 - /* Only unlink the file if we are *not* restarting - * If we are restarting the file will be unlinked at a later time. 
- */ - if(OPAL_CR_STATUS_RESTART_PRE != opal_cr_checkpointing_state && - OPAL_CR_STATUS_RESTART_POST != opal_cr_checkpointing_state ) { - unlink(mca_btl_sm_component.sm_seg->shmem_ds.seg_name); - } -#else - unlink(mca_btl_sm_component.sm_seg->shmem_ds.seg_name); -#endif - OBJ_RELEASE(mca_btl_sm_component.sm_seg); - } - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 - /* close/cleanup fifo create for event notification */ - if(mca_btl_sm_component.sm_fifo_fd > 0) { - /* write a done message down the pipe */ - unsigned char cmd = DONE; - if( write(mca_btl_sm_component.sm_fifo_fd,&cmd,sizeof(cmd)) != - sizeof(cmd)){ - opal_output(0, "mca_btl_sm_component_close: write fifo failed: errno=%d\n", - errno); - } - opal_thread_join(&mca_btl_sm_component.sm_fifo_thread, NULL); - close(mca_btl_sm_component.sm_fifo_fd); - unlink(mca_btl_sm_component.sm_fifo_path); - } -#endif - -CLEANUP: - -#if OPAL_CUDA_SUPPORT - mca_common_cuda_fini(); -#endif /* OPAL_CUDA_SUPPORT */ - - /* return */ - return return_value; -} - -/* - * Returns the number of processes on the node. 
- */ -static inline int -get_num_local_procs(void) -{ - /* num_local_peers does not include us in - * its calculation, so adjust for that */ - return (int)(1 + opal_process_info.num_local_peers); -} - -static void -calc_sm_max_procs(int n) -{ - /* see if need to allocate space for extra procs */ - if (0 > mca_btl_sm_component.sm_max_procs) { - /* no limit */ - if (0 <= mca_btl_sm_component.sm_extra_procs) { - /* limit */ - mca_btl_sm_component.sm_max_procs = - n + mca_btl_sm_component.sm_extra_procs; - } else { - /* no limit */ - mca_btl_sm_component.sm_max_procs = 2 * n; - } - } -} - -static int -create_and_attach(mca_btl_sm_component_t *comp_ptr, - size_t size, - char *file_name, - size_t size_ctl_structure, - size_t data_seg_alignment, - mca_common_sm_module_t **out_modp) - -{ - if (NULL == (*out_modp = - mca_common_sm_module_create_and_attach(size, file_name, - size_ctl_structure, - data_seg_alignment))) { - opal_output(0, "create_and_attach: unable to create shared memory " - "BTL coordinating structure :: size %lu \n", - (unsigned long)size); - return OPAL_ERROR; - } - return OPAL_SUCCESS; -} - -static int -get_mpool_res_size(int32_t max_procs, - size_t *out_res_size) -{ - size_t size = 0; - - *out_res_size = 0; - /* determine how much memory to create */ - /* - * This heuristic formula mostly says that we request memory for: - * - nfifos FIFOs, each comprising: - * . a sm_fifo_t structure - * . many pointers (fifo_size of them per FIFO) - * - eager fragments (2*n of them, allocated in sm_free_list_inc chunks) - * - max fragments (sm_free_list_num of them) - * - * On top of all that, we sprinkle in some number of - * "opal_cache_line_size" additions to account for some - * padding and edge effects that may lie in the allocator. 
- */ - size = FIFO_MAP_NUM(max_procs) * - (sizeof(sm_fifo_t) + sizeof(void *) * - mca_btl_sm_component.fifo_size + 4 * opal_cache_line_size) + - (2 * max_procs + mca_btl_sm_component.sm_free_list_inc) * - (mca_btl_sm_component.eager_limit + 2 * opal_cache_line_size) + - mca_btl_sm_component.sm_free_list_num * - (mca_btl_sm_component.max_frag_size + 2 * opal_cache_line_size); - - /* add something for the control structure */ - size += sizeof(mca_common_sm_module_t); - - /* before we multiply by max_procs, make sure the result won't overflow */ - /* Stick that little pad in, particularly since we'll eventually - * need a little extra space. E.g., in mca_mpool_sm_init() in - * mpool_sm_component.c when sizeof(mca_common_sm_module_t) is - * added. - */ - if (((double)size) * max_procs > LONG_MAX - 4096) { - return OPAL_ERR_VALUE_OUT_OF_BOUNDS; - } - size *= (size_t)max_procs; - *out_res_size = size; - return OPAL_SUCCESS; -} - - -/* Generates all the unique paths for the shared-memory segments that this BTL - * needs along with other file paths used to share "connection information". 
*/ -static int -set_uniq_paths_for_init_rndv(mca_btl_sm_component_t *comp_ptr) -{ - int rc = OPAL_ERR_OUT_OF_RESOURCE; - - /* NOTE: don't forget to free these after init */ - comp_ptr->sm_mpool_ctl_file_name = NULL; - comp_ptr->sm_mpool_rndv_file_name = NULL; - comp_ptr->sm_ctl_file_name = NULL; - comp_ptr->sm_rndv_file_name = NULL; - - if (asprintf(&comp_ptr->sm_mpool_ctl_file_name, - "%s"OPAL_PATH_SEP"shared_mem_pool.%s", - opal_process_info.job_session_dir, - opal_process_info.nodename) < 0) { - /* rc set */ - goto out; - } - if (asprintf(&comp_ptr->sm_mpool_rndv_file_name, - "%s"OPAL_PATH_SEP"shared_mem_pool_rndv.%s", - opal_process_info.job_session_dir, - opal_process_info.nodename) < 0) { - /* rc set */ - goto out; - } - if (asprintf(&comp_ptr->sm_ctl_file_name, - "%s"OPAL_PATH_SEP"shared_mem_btl_module.%s", - opal_process_info.job_session_dir, - opal_process_info.nodename) < 0) { - /* rc set */ - goto out; - } - if (asprintf(&comp_ptr->sm_rndv_file_name, - "%s"OPAL_PATH_SEP"shared_mem_btl_rndv.%s", - opal_process_info.job_session_dir, - opal_process_info.nodename) < 0) { - /* rc set */ - goto out; - } - /* all is well */ - rc = OPAL_SUCCESS; - -out: - if (OPAL_SUCCESS != rc) { - if (comp_ptr->sm_mpool_ctl_file_name) { - free(comp_ptr->sm_mpool_ctl_file_name); - } - if (comp_ptr->sm_mpool_rndv_file_name) { - free(comp_ptr->sm_mpool_rndv_file_name); - } - if (comp_ptr->sm_ctl_file_name) { - free(comp_ptr->sm_ctl_file_name); - } - if (comp_ptr->sm_rndv_file_name) { - free(comp_ptr->sm_rndv_file_name); - } - } - return rc; -} - -static int -create_rndv_file(mca_btl_sm_component_t *comp_ptr, - mca_btl_sm_rndv_module_type_t type) -{ - size_t size = 0; - int rc = OPAL_SUCCESS; - int fd = -1; - char *fname = NULL; - char *tmpfname = NULL; - /* used as a temporary store so we can extract shmem_ds info */ - mca_common_sm_module_t *tmp_modp = NULL; - - if (MCA_BTL_SM_RNDV_MOD_MPOOL == type) { - /* get the segment size for the sm mpool. 
*/ - if (OPAL_SUCCESS != (rc = get_mpool_res_size(comp_ptr->sm_max_procs, - &size))) { - /* rc is already set */ - goto out; - } - - /* update size if less than required minimum */ - if (size < mca_btl_sm_component.mpool_min_size) { - size = mca_btl_sm_component.mpool_min_size; - } - /* we only need the shmem_ds info at this point. initilization will be - * completed in the mpool module code. the idea is that we just need this - * info so we can populate the rndv file (or modex when we have it). */ - if (OPAL_SUCCESS != (rc = - create_and_attach(comp_ptr, size, comp_ptr->sm_mpool_ctl_file_name, - sizeof(mca_common_sm_module_t), 8, &tmp_modp))) { - /* rc is set */ - goto out; - } - fname = comp_ptr->sm_mpool_rndv_file_name; - } - else if (MCA_BTL_SM_RNDV_MOD_SM == type) { - /* calculate the segment size. */ - size = sizeof(mca_common_sm_seg_header_t) + - comp_ptr->sm_max_procs * - (sizeof(sm_fifo_t *) + - sizeof(char *) + sizeof(uint16_t)) + - opal_cache_line_size; - - if (OPAL_SUCCESS != (rc = - create_and_attach(comp_ptr, size, comp_ptr->sm_ctl_file_name, - sizeof(mca_common_sm_seg_header_t), - opal_cache_line_size, &comp_ptr->sm_seg))) { - /* rc is set */ - goto out; - } - fname = comp_ptr->sm_rndv_file_name; - tmp_modp = comp_ptr->sm_seg; - } - else { - return OPAL_ERR_BAD_PARAM; - } - - /* at this point, we have all the info we need to populate the rendezvous - * file containing all the meta info required for attach. */ - - /* now just write the contents of tmp_modp->shmem_ds to the full - * sizeof(opal_shmem_ds_t), so we know where the mpool_res_size - * starts. Note that we write into a temporary file first and - * then do a rename(2) to move the full file into its final - * destination. This avoids a race condition where a peer process - * might open/read part of the file before this processes finishes - * writing it (see - * https://github.com/open-mpi/ompi/issues/1230). 
*/ - asprintf(&tmpfname, "%s.tmp", fname); - if (NULL == tmpfname) { - rc = OPAL_ERR_OUT_OF_RESOURCE; - goto out; - } - if (-1 == (fd = open(tmpfname, O_CREAT | O_RDWR, 0600))) { - int err = errno; - opal_show_help("help-mpi-btl-sm.txt", "sys call fail", true, - "open(2)", strerror(err), err); - rc = OPAL_ERR_IN_ERRNO; - goto out; - } - if ((ssize_t)sizeof(opal_shmem_ds_t) != write(fd, &(tmp_modp->shmem_ds), - sizeof(opal_shmem_ds_t))) { - int err = errno; - opal_show_help("help-mpi-btl-sm.txt", "sys call fail", true, - "write(2)", strerror(err), err); - rc = OPAL_ERR_IN_ERRNO; - goto out; - } - if (MCA_BTL_SM_RNDV_MOD_MPOOL == type) { - if ((ssize_t)sizeof(size) != write(fd, &size, sizeof(size))) { - int err = errno; - opal_show_help("help-mpi-btl-sm.txt", "sys call fail", true, - "write(2)", strerror(err), err); - rc = OPAL_ERR_IN_ERRNO; - goto out; - } - /* only do this for the mpool case */ - OBJ_RELEASE(tmp_modp); - } - (void)close(fd); - fd = -1; - if (0 != rename(tmpfname, fname)) { - rc = OPAL_ERR_IN_ERRNO; - goto out; - } - -out: - if (-1 != fd) { - (void)close(fd); - } - if (NULL != tmpfname) { - free(tmpfname); - } - return rc; -} - -/* - * Creates information required for the sm modex and modex sends it. 
- */ -static int -backing_store_init(mca_btl_sm_component_t *comp_ptr, - uint32_t local_rank) -{ - int rc = OPAL_SUCCESS; - - if (OPAL_SUCCESS != (rc = set_uniq_paths_for_init_rndv(comp_ptr))) { - goto out; - } - /* only let the lowest rank setup the metadata */ - if (0 == local_rank) { - /* === sm mpool === */ - if (OPAL_SUCCESS != (rc = - create_rndv_file(comp_ptr, MCA_BTL_SM_RNDV_MOD_MPOOL))) { - goto out; - } - /* === sm === */ - if (OPAL_SUCCESS != (rc = - create_rndv_file(comp_ptr, MCA_BTL_SM_RNDV_MOD_SM))) { - goto out; - } - } - -out: - return rc; -} - -/* - * SM component initialization - */ -static mca_btl_base_module_t ** -mca_btl_sm_component_init(int *num_btls, - bool enable_progress_threads, - bool enable_mpi_threads) -{ - int num_local_procs = 0; - mca_btl_base_module_t **btls = NULL; - uint32_t my_local_rank = UINT32_MAX; -#if OPAL_BTL_SM_HAVE_KNEM | OPAL_BTL_SM_HAVE_CMA - int rc; -#endif /* OPAL_BTL_SM_HAVE_KNEM | OPAL_BTL_SM_HAVE_CMA */ - - *num_btls = 0; - /* lookup/create shared memory pool only when used */ - mca_btl_sm_component.sm_mpool = NULL; - mca_btl_sm_component.sm_mpool_base = NULL; - -#if OPAL_CUDA_SUPPORT - mca_common_cuda_stage_one_init(); -#endif /* OPAL_CUDA_SUPPORT */ - - /* if no session directory was created, then we cannot be used */ - if (NULL == opal_process_info.job_session_dir) { - /* SKG - this isn't true anymore. Some backing facilities don't require a - * file-backed store. Extend shmem to provide this info one day. Especially - * when we use a proper modex for init. */ - return NULL; - } - /* if we don't have locality information, then we cannot be used because we - * need to know who the respective node ranks for initialization. note the - * use of my_local_rank here. we use this instead of my_node_rank because in - * the spawn case we need to designate a metadata creator rank within the - * set of processes that are initializing the btl, and my_local_rank seems - * to provide that for us. 
*/ - if (UINT32_MAX == - (my_local_rank = opal_process_info.my_local_rank)) { - opal_show_help("help-mpi-btl-sm.txt", "no locality", true); - return NULL; - } - /* no use trying to use sm with less than two procs, so just bail. */ - if ((num_local_procs = get_num_local_procs()) < 2) { - return NULL; - } - /* calculate max procs so we can figure out how large to make the - * shared-memory segment. this routine sets component sm_max_procs. */ - calc_sm_max_procs(num_local_procs); - - /* This is where the modex will live some day. For now, just have local rank - * 0 create a rendezvous file containing the backing store info, so the - * other local procs can read from it during add_procs. The rest will just - * stash the known paths for use later in init. */ - if (OPAL_SUCCESS != backing_store_init(&mca_btl_sm_component, - my_local_rank)) { - return NULL; - } - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 - /* create a named pipe to receive events */ - sprintf( mca_btl_sm_component.sm_fifo_path, - "%s"OPAL_PATH_SEP"sm_fifo.%lu", opal_process_info.job_session_dir, - (unsigned long)OPAL_PROC_MY_NAME.vpid ); - if(mkfifo(mca_btl_sm_component.sm_fifo_path, 0660) < 0) { - opal_output(0, "mca_btl_sm_component_init: mkfifo failed with errno=%d\n",errno); - return NULL; - } - mca_btl_sm_component.sm_fifo_fd = open(mca_btl_sm_component.sm_fifo_path, - O_RDWR); - if(mca_btl_sm_component.sm_fifo_fd < 0) { - opal_output(0, "mca_btl_sm_component_init: " - "open(%s) failed with errno=%d\n", - mca_btl_sm_component.sm_fifo_path, errno); - return NULL; - } - - OBJ_CONSTRUCT(&mca_btl_sm_component.sm_fifo_thread, opal_thread_t); - mca_btl_sm_component.sm_fifo_thread.t_run = - (opal_thread_fn_t)mca_btl_sm_component_event_thread; - opal_thread_start(&mca_btl_sm_component.sm_fifo_thread); -#endif - - mca_btl_sm_component.sm_btls = - (mca_btl_sm_t **)malloc(mca_btl_sm_component.sm_max_btls * - sizeof(mca_btl_sm_t *)); - if (NULL == mca_btl_sm_component.sm_btls) { - return NULL; - } - - /* allocate 
the Shared Memory BTL */ - *num_btls = 1; - btls = (mca_btl_base_module_t**)malloc(sizeof(mca_btl_base_module_t*)); - if (NULL == btls) { - return NULL; - } - - /* get pointer to the btls */ - btls[0] = (mca_btl_base_module_t*)(&(mca_btl_sm)); - mca_btl_sm_component.sm_btls[0] = (mca_btl_sm_t*)(&(mca_btl_sm)); - - /* initialize some BTL data */ - /* start with no SM procs */ - mca_btl_sm_component.num_smp_procs = 0; - mca_btl_sm_component.my_smp_rank = -1; /* not defined */ - mca_btl_sm_component.sm_num_btls = 1; - /* set flag indicating btl not inited */ - mca_btl_sm.btl_inited = false; - -#if OPAL_BTL_SM_HAVE_KNEM - if (mca_btl_sm_component.use_knem) { - if (0 != mca_btl_sm_component.use_knem) { - /* Open the knem device. Try to print a helpful message if we - fail to open it. */ - mca_btl_sm.knem_fd = open("/dev/knem", O_RDWR); - if (mca_btl_sm.knem_fd < 0) { - if (EACCES == errno) { - struct stat sbuf; - if (0 != stat("/dev/knem", &sbuf)) { - sbuf.st_mode = 0; - } - opal_show_help("help-mpi-btl-sm.txt", "knem permission denied", - true, opal_process_info.nodename, sbuf.st_mode); - } else { - opal_show_help("help-mpi-btl-sm.txt", "knem fail open", - true, opal_process_info.nodename, errno, - strerror(errno)); - } - goto no_knem; - } - - /* Check that the ABI if the kernel module running is the same - as what we were compiled against */ - rc = ioctl(mca_btl_sm.knem_fd, KNEM_CMD_GET_INFO, - &mca_btl_sm_component.knem_info); - if (rc < 0) { - opal_show_help("help-mpi-btl-sm.txt", "knem get ABI fail", - true, opal_process_info.nodename, errno, - strerror(errno)); - goto no_knem; - } - if (KNEM_ABI_VERSION != mca_btl_sm_component.knem_info.abi) { - opal_show_help("help-mpi-btl-sm.txt", "knem ABI mismatch", - true, opal_process_info.nodename, KNEM_ABI_VERSION, - mca_btl_sm_component.knem_info.abi); - goto no_knem; - } - - /* If we want DMA mode and DMA mode is supported, then set - knem_dma_flag to KNEM_FLAG_DMA. 
*/ - mca_btl_sm_component.knem_dma_flag = 0; - if (mca_btl_sm_component.knem_dma_min > 0 && - (mca_btl_sm_component.knem_info.features & KNEM_FEATURE_DMA)) { - mca_btl_sm_component.knem_dma_flag = KNEM_FLAG_DMA; - } - - /* Get the array of statuses from knem if max_simultaneous > 0 */ - if (mca_btl_sm_component.knem_max_simultaneous > 0) { - mca_btl_sm.knem_status_array = mmap(NULL, - mca_btl_sm_component.knem_max_simultaneous, - (PROT_READ | PROT_WRITE), - MAP_SHARED, mca_btl_sm.knem_fd, - KNEM_STATUS_ARRAY_FILE_OFFSET); - if (MAP_FAILED == mca_btl_sm.knem_status_array) { - opal_show_help("help-mpi-btl-sm.txt", "knem mmap fail", - true, opal_process_info.nodename, errno, - strerror(errno)); - goto no_knem; - } - - /* The first available status index is 0. Make an empty frag - array. */ - mca_btl_sm.knem_frag_array = (mca_btl_sm_frag_t **) - malloc(sizeof(mca_btl_sm_frag_t *) * - mca_btl_sm_component.knem_max_simultaneous); - if (NULL == mca_btl_sm.knem_frag_array) { - opal_show_help("help-mpi-btl-sm.txt", "sys call fail", - true, "malloc", - strerror(errno), errno); - goto no_knem; - } - } - } - /* Set the BTL get function pointer if we're supporting KNEM; - choose between synchronous and asynchronous. 
*/ - if (mca_btl_sm_component.knem_max_simultaneous > 0) { - mca_btl_sm.super.btl_get = mca_btl_sm_get_async; - } else { - mca_btl_sm.super.btl_get = mca_btl_sm_get_sync; - } - - mca_btl_sm.super.btl_register_mem = mca_btl_sm_register_mem; - mca_btl_sm.super.btl_deregister_mem = mca_btl_sm_deregister_mem; - } -#else - /* If the user explicitly asked for knem and we can't provide it, - error */ - if (mca_btl_sm_component.use_knem > 0) { - goto no_knem; - } -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - -#if OPAL_BTL_SM_HAVE_CMA - if (mca_btl_sm_component.use_cma) { - /* Will only ever have either cma or knem enabled at runtime - so no problems with accidentally overwriting this set earlier */ - mca_btl_sm.super.btl_get = mca_btl_sm_get_sync; - mca_btl_sm.super.btl_register_mem = mca_btl_sm_register_mem; - mca_btl_sm.super.btl_deregister_mem = mca_btl_sm_deregister_mem; - } -#else - /* If the user explicitly asked for CMA and we can't provide itm - * error */ - if (mca_btl_sm_component.use_cma > 0) { - mca_btl_sm.super.btl_flags &= ~MCA_BTL_FLAGS_GET; - opal_show_help("help-mpi-btl-sm.txt", - "CMA requested but not available", - true, opal_process_info.nodename); - free(btls); - return NULL; - } -#endif /* OPAL_BTL_SM_HAVE_CMA */ - -#if OPAL_BTL_SM_HAVE_KNEM | OPAL_BTL_SM_HAVE_CMA - if (mca_btl_sm_component.use_cma || mca_btl_sm_component.use_knem) { - rc = opal_free_list_init (&mca_btl_sm_component.registration_handles, - sizeof (mca_btl_sm_registration_handle_t), - 8, OBJ_CLASS(mca_btl_sm_registration_handle_t), - 0, 0, mca_btl_sm_component.sm_free_list_num, - mca_btl_sm_component.sm_free_list_max, - mca_btl_sm_component.sm_free_list_inc, NULL, 0, - NULL, NULL, NULL); - if (OPAL_SUCCESS != rc) { - free (btls); - return NULL; - } - } -#endif - - return btls; - - no_knem: -#if OPAL_BTL_SM_HAVE_KNEM - mca_btl_sm.super.btl_flags &= ~MCA_BTL_FLAGS_GET; - - if (NULL != mca_btl_sm.knem_frag_array) { - free(mca_btl_sm.knem_frag_array); - mca_btl_sm.knem_frag_array = NULL; - } - if 
(NULL != mca_btl_sm.knem_status_array) { - munmap(mca_btl_sm.knem_status_array, - mca_btl_sm_component.knem_max_simultaneous); - mca_btl_sm.knem_status_array = NULL; - } - if (-1 != mca_btl_sm.knem_fd) { - close(mca_btl_sm.knem_fd); - mca_btl_sm.knem_fd = -1; - } -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - - /* If "use_knem" is positive, then it's an error if knem support - is not available -- deactivate the sm btl. */ - if (mca_btl_sm_component.use_knem > 0) { - opal_show_help("help-mpi-btl-sm.txt", - "knem requested but not available", - true, opal_process_info.nodename); - free(btls); - return NULL; - } else if (0 == mca_btl_sm_component.use_cma) { - /* disable get when not using knem or cma */ - mca_btl_sm.super.btl_get = NULL; - mca_btl_sm.super.btl_flags &= ~MCA_BTL_FLAGS_GET; - mca_btl_sm_component.use_knem = 0; - } - - /* Otherwise, use_knem was 0 (and we didn't get here) or use_knem - was <0, in which case the fact that knem is not available is - not an error. */ - return btls; -} +static int mca_btl_sm_component_register(void); /* - * SM component progress. + * The "sm" BTL has been completely replaced by the "vader" BTL. + * + * The only purpose for this component is to print a show_help message + * to inform the user that they should be using the vader BTL. 
*/ - -#if OPAL_ENABLE_PROGRESS_THREADS == 1 -void mca_btl_sm_component_event_thread(opal_object_t* thread) -{ - while(1) { - unsigned char cmd; - if(read(mca_btl_sm_component.sm_fifo_fd, &cmd, sizeof(cmd)) != sizeof(cmd)) { - /* error condition */ - return; - } - if( DONE == cmd ){ - /* return when done message received */ - return; - } - mca_btl_sm_component_progress(); +mca_btl_base_component_3_0_0_t mca_btl_sm_component = { + /* First, the mca_base_component_t struct containing meta information + about the component itself */ + .btl_version = { + MCA_BTL_DEFAULT_VERSION("sm"), + .mca_register_component_params = mca_btl_sm_component_register, + }, + .btl_data = { + /* The component is checkpoint ready */ + .param_field = MCA_BASE_METADATA_PARAM_CHECKPOINT } -} -#endif - -void btl_sm_process_pending_sends(struct mca_btl_base_endpoint_t *ep) -{ - btl_sm_pending_send_item_t *si; - int rc; - - while ( 0 < opal_list_get_size(&ep->pending_sends) ) { - /* Note that we access the size of ep->pending_sends unlocked - as it doesn't really matter if the result is wrong as - opal_list_remove_first is called with a lock and we handle it - not finding an item to process */ - OPAL_THREAD_LOCK(&ep->endpoint_lock); - si = (btl_sm_pending_send_item_t*)opal_list_remove_first(&ep->pending_sends); - OPAL_THREAD_UNLOCK(&ep->endpoint_lock); - - if(NULL == si) return; /* Another thread got in before us. Thats ok. 
*/ - - OPAL_THREAD_ADD32(&mca_btl_sm_component.num_pending_sends, -1); - - MCA_BTL_SM_FIFO_WRITE(ep, ep->my_smp_rank, ep->peer_smp_rank, si->data, - true, false, rc); - - opal_free_list_return (&mca_btl_sm_component.pending_send_fl, (opal_free_list_item_t *) si); - - if ( OPAL_SUCCESS != rc ) - return; - } -} - -int mca_btl_sm_component_progress(void) -{ - /* local variables */ - mca_btl_base_segment_t seg; - mca_btl_sm_frag_t *frag; - mca_btl_sm_frag_t Frag; - sm_fifo_t *fifo = NULL; - mca_btl_sm_hdr_t *hdr; - int my_smp_rank = mca_btl_sm_component.my_smp_rank; - int peer_smp_rank, j, rc = 0, nevents = 0; - - /* first, deal with any pending sends */ - /* This check should be fast since we only need to check one variable. */ - if ( 0 < mca_btl_sm_component.num_pending_sends ) { - - /* perform a loop to find the endpoints that have pending sends */ - /* This can take a while longer if there are many endpoints to check. */ - for ( peer_smp_rank = 0; peer_smp_rank < mca_btl_sm_component.num_smp_procs; peer_smp_rank++) { - struct mca_btl_base_endpoint_t* endpoint; - if ( peer_smp_rank == my_smp_rank ) - continue; - endpoint = mca_btl_sm_component.sm_peers[peer_smp_rank]; - if ( 0 < opal_list_get_size(&endpoint->pending_sends) ) - btl_sm_process_pending_sends(endpoint); - } - } - - /* poll each fifo */ - for(j = 0; j < FIFO_MAP_NUM(mca_btl_sm_component.num_smp_procs); j++) { - fifo = &(mca_btl_sm_component.fifo[my_smp_rank][j]); - recheck_peer: - /* aquire thread lock */ - if(opal_using_threads()) { - opal_atomic_lock(&(fifo->tail_lock)); - } - - hdr = (mca_btl_sm_hdr_t *)sm_fifo_read(fifo); - - /* release thread lock */ - if(opal_using_threads()) { - opal_atomic_unlock(&(fifo->tail_lock)); - } - - if(SM_FIFO_FREE == hdr) { - continue; - } - - nevents++; - /* dispatch fragment by type */ - switch(((uintptr_t)hdr) & MCA_BTL_SM_FRAG_TYPE_MASK) { - case MCA_BTL_SM_FRAG_SEND: - { - mca_btl_active_message_callback_t* reg; - /* change the address from address relative to the 
shared - * memory address, to a true virtual address */ - hdr = (mca_btl_sm_hdr_t *) RELATIVE2VIRTUAL(hdr); - peer_smp_rank = hdr->my_smp_rank; -#if OPAL_ENABLE_DEBUG - if ( FIFO_MAP(peer_smp_rank) != j ) { - opal_output(0, "mca_btl_sm_component_progress: " - "rank %d got %d on FIFO %d, but this sender should send to FIFO %d\n", - my_smp_rank, peer_smp_rank, j, FIFO_MAP(peer_smp_rank)); - } -#endif - /* recv upcall */ - reg = mca_btl_base_active_message_trigger + hdr->tag; - seg.seg_addr.pval = ((char *)hdr) + sizeof(mca_btl_sm_hdr_t); - seg.seg_len = hdr->len; - Frag.base.des_segment_count = 1; - Frag.base.des_segments = &seg; - reg->cbfunc(&mca_btl_sm.super, hdr->tag, &(Frag.base), - reg->cbdata); - /* return the fragment */ - MCA_BTL_SM_FIFO_WRITE( - mca_btl_sm_component.sm_peers[peer_smp_rank], - my_smp_rank, peer_smp_rank, hdr->frag, false, true, rc); - break; - } - case MCA_BTL_SM_FRAG_ACK: - { - int status = (uintptr_t)hdr & MCA_BTL_SM_FRAG_STATUS_MASK; - int btl_ownership; - struct mca_btl_base_endpoint_t* endpoint; +}; - frag = (mca_btl_sm_frag_t *)((char*)((uintptr_t)hdr & - (~(MCA_BTL_SM_FRAG_TYPE_MASK | - MCA_BTL_SM_FRAG_STATUS_MASK)))); - endpoint = frag->endpoint; - btl_ownership = (frag->base.des_flags & MCA_BTL_DES_FLAGS_BTL_OWNERSHIP); - if( MCA_BTL_DES_SEND_ALWAYS_CALLBACK & frag->base.des_flags ) { - /* completion callback */ - frag->base.des_cbfunc(&mca_btl_sm.super, frag->endpoint, - &frag->base, status?OPAL_ERROR:OPAL_SUCCESS); - } - if( btl_ownership ) { - MCA_BTL_SM_FRAG_RETURN(frag); +static int mca_btl_sm_component_register(void) +{ + // If the sm component was explicitly requested, print a show_help + // message and return an error (which will cause the process to + // abort). 
+ if (NULL != opal_btl_base_framework.framework_selection) { + char **names; + names = opal_argv_split(opal_btl_base_framework.framework_selection, + ','); + if (NULL != names) { + for (int i = 0; NULL != names[i]; ++i) { + if (strcmp(names[i], "sm") == 0) { + opal_show_help("help-mpi-btl-sm.txt", "btl sm is dead", + true); + opal_argv_free(names); + return OPAL_ERROR; } - OPAL_THREAD_ADD32(&mca_btl_sm_component.num_outstanding_frags, -1); - if ( 0 < opal_list_get_size(&endpoint->pending_sends) ) { - btl_sm_process_pending_sends(endpoint); - } - goto recheck_peer; } - default: - /* unknown */ - /* - * This code path should presumably never be called. - * It's unclear if it should exist or, if so, how it should be written. - * If we want to return it to the sending process, - * we have to figure out who the sender is. - * It seems we need to subtract the mask bits. - * Then, hopefully this is an sm header that has an smp_rank field. - * Presumably that means the received header was relative. - * Or, maybe this code should just be removed. - */ - opal_output(0, "mca_btl_sm_component_progress read an unknown type of header"); - hdr = (mca_btl_sm_hdr_t *) RELATIVE2VIRTUAL(hdr); - peer_smp_rank = hdr->my_smp_rank; - hdr = (mca_btl_sm_hdr_t*)((uintptr_t)hdr->frag | - MCA_BTL_SM_FRAG_STATUS_MASK); - MCA_BTL_SM_FIFO_WRITE( - mca_btl_sm_component.sm_peers[peer_smp_rank], - my_smp_rank, peer_smp_rank, hdr, false, true, rc); - break; } - } - (void)rc; /* this is safe to ignore as the message is requeued till success */ - -#if OPAL_BTL_SM_HAVE_KNEM - /* The sm btl is currently hard-wired for a single module. So - we're not breaking anything here by checking that one module - for knem specifics. 
- - Since knem completes requests in order, we can loop around the - circular status buffer until: - - we find a KNEM_STATUS_PENDING, or - - knem_status_num_used == 0 - Note that knem_status_num_used will never be >0 if - component.use_knem<0, so we'll never enter the while loop if - knem is not being used. It will also never be >0 if - max_simultaneous == 0 (because they will all complete - synchronously in _get). However, in order to save a jump - before the return we should test the use_knem here. - */ - if( 0 == mca_btl_sm_component.use_knem ) { - return nevents; + opal_argv_free(names); } - while (mca_btl_sm.knem_status_num_used > 0 && - KNEM_STATUS_PENDING != - mca_btl_sm.knem_status_array[mca_btl_sm.knem_status_first_used]) { - if (KNEM_STATUS_SUCCESS == - mca_btl_sm.knem_status_array[mca_btl_sm.knem_status_first_used]) { - /* Handle the completed fragment */ - frag = - mca_btl_sm.knem_frag_array[mca_btl_sm.knem_status_first_used]; - frag->cb.func (&mca_btl_sm.super, frag->endpoint, - frag->cb.local_address, frag->cb.local_handle, - frag->cb.context, frag->cb.data, OPAL_SUCCESS); - MCA_BTL_SM_FRAG_RETURN(frag); - - /* Bump counters, loop around the circular buffer if - necessary */ - ++nevents; - --mca_btl_sm.knem_status_num_used; - ++mca_btl_sm.knem_status_first_used; - if (mca_btl_sm.knem_status_first_used >= - mca_btl_sm_component.knem_max_simultaneous) { - mca_btl_sm.knem_status_first_used = 0; - } - } else { - /* JMS knem fail */ - break; - } - } -#endif /* OPAL_BTL_SM_HAVE_KNEM */ - return nevents; + // Tell the framework that we don't want this component to be + // considered. + return OPAL_ERR_NOT_AVAILABLE; } diff --git a/opal/mca/btl/sm/btl_sm_endpoint.h b/opal/mca/btl/sm/btl_sm_endpoint.h deleted file mode 100644 index 04708dc856d..00000000000 --- a/opal/mca/btl/sm/btl_sm_endpoint.h +++ /dev/null @@ -1,49 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. 
All rights reserved. - * Copyright (c) 2004-2012 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2006-2007 Voltaire. All rights reserved. - * Copyright (c) 2014 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -/** - * @file - */ -#ifndef MCA_BTL_SM_ENDPOINT_H -#define MCA_BTL_SM_ENDPOINT_H - -/** - * An abstraction that represents a connection to a endpoint process. - * An instance of mca_ptl_base_endpoint_t is associated w/ each process - * and BTL pair at startup. - */ - -struct mca_btl_base_endpoint_t { - int my_smp_rank; /**< My SMP process rank. Used for accessing - * SMP specfic data structures. */ - int peer_smp_rank; /**< My peer's SMP process rank. Used for accessing - * SMP specfic data structures. */ -#if OPAL_ENABLE_PROGRESS_THREADS == 1 - int fifo_fd; /**< pipe/fifo used to signal endpoint that data is queued */ -#endif - opal_list_t pending_sends; /**< pending data to send */ - - /** lock for concurrent access to endpoint state */ - opal_mutex_t endpoint_lock; - -}; - -void btl_sm_process_pending_sends(struct mca_btl_base_endpoint_t *ep); -#endif diff --git a/opal/mca/btl/sm/btl_sm_fifo.h b/opal/mca/btl/sm/btl_sm_fifo.h deleted file mode 100644 index 76ae46d2fa5..00000000000 --- a/opal/mca/btl/sm/btl_sm_fifo.h +++ /dev/null @@ -1,110 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. 
- * Copyright (c) 2004-2012 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2006-2007 Voltaire. All rights reserved. - * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2010-2015 Los Alamos National Security, LLC. - * All rights reserved. - * Copyright (c) 2010-2012 IBM Corporation. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef MCA_BTL_SM_FIFO_H -#define MCA_BTL_SM_FIFO_H - -#include "btl_sm.h" -#include "btl_sm_endpoint.h" - -static void -add_pending(struct mca_btl_base_endpoint_t *ep, void *data, bool resend) -{ - btl_sm_pending_send_item_t *si; - opal_free_list_item_t *i; - i = opal_free_list_get (&mca_btl_sm_component.pending_send_fl); - - /* don't handle error for now */ - assert(i != NULL); - - si = (btl_sm_pending_send_item_t*)i; - si->data = data; - - OPAL_THREAD_ADD32(&mca_btl_sm_component.num_pending_sends, +1); - - /* if data was on pending send list then prepend it to the list to - * minimize reordering */ - OPAL_THREAD_LOCK(&ep->endpoint_lock); - if (resend) - opal_list_prepend(&ep->pending_sends, (opal_list_item_t*)si); - else - opal_list_append(&ep->pending_sends, (opal_list_item_t*)si); - OPAL_THREAD_UNLOCK(&ep->endpoint_lock); -} - -/* - * FIFO_MAP(x) defines which FIFO on the receiver should be used - * by sender rank x. The map is some many-to-one hash. - * - * FIFO_MAP_NUM(n) defines how many FIFOs the receiver has for - * n senders. 
- * - * That is, - * - * for all 0 <= x < n: - * - * 0 <= FIFO_MAP(x) < FIFO_MAP_NUM(n) - * - * For example, using some power-of-two nfifos, we could have - * - * FIFO_MAP(x) = x & (nfifos-1) - * FIFO_MAP_NUM(n) = min(nfifos,n) - * - * Interesting limits include: - * - * nfifos very large: In this case, each sender has its - * own dedicated FIFO on each receiver and the receiver - * has one FIFO per sender. - * - * nfifos == 1: In this case, all senders use the same - * FIFO and each receiver has just one FIFO for all senders. - */ -#define FIFO_MAP(x) ((x) & (mca_btl_sm_component.nfifos - 1)) -#define FIFO_MAP_NUM(n) ( (mca_btl_sm_component.nfifos) < (n) ? (mca_btl_sm_component.nfifos) : (n) ) - - -#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer, my_smp_rank, \ - peer_smp_rank, hdr, resend, retry_pending_sends, rc) \ -do { \ - sm_fifo_t* fifo = &(mca_btl_sm_component.fifo[peer_smp_rank][FIFO_MAP(my_smp_rank)]); \ - \ - if ( retry_pending_sends ) { \ - if ( 0 < opal_list_get_size(&endpoint_peer->pending_sends) ) { \ - btl_sm_process_pending_sends(endpoint_peer); \ - } \ - } \ - \ - opal_atomic_lock(&(fifo->head_lock)); \ - /* post fragment */ \ - if(sm_fifo_write(hdr, fifo) != OPAL_SUCCESS) { \ - add_pending(endpoint_peer, hdr, resend); \ - rc = OPAL_ERR_RESOURCE_BUSY; \ - } else { \ - MCA_BTL_SM_SIGNAL_PEER(endpoint_peer); \ - rc = OPAL_SUCCESS; \ - } \ - opal_atomic_unlock(&(fifo->head_lock)); \ -} while(0) - -#endif diff --git a/opal/mca/btl/sm/btl_sm_frag.c b/opal/mca/btl/sm/btl_sm_frag.c deleted file mode 100644 index 0e846173278..00000000000 --- a/opal/mca/btl/sm/btl_sm_frag.c +++ /dev/null @@ -1,76 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2009 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. 
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -#include "opal_config.h" -#include "btl_sm_frag.h" - - -static inline void mca_btl_sm_frag_common_constructor(mca_btl_sm_frag_t* frag) -{ - frag->hdr = (mca_btl_sm_hdr_t*)frag->base.super.ptr; - if(frag->hdr != NULL) { - frag->hdr->frag = (mca_btl_sm_frag_t*)((uintptr_t)frag | - MCA_BTL_SM_FRAG_ACK); - frag->segment.base.seg_addr.pval = ((char*)frag->hdr) + - sizeof(mca_btl_sm_hdr_t); - frag->hdr->my_smp_rank = mca_btl_sm_component.my_smp_rank; - } - frag->segment.base.seg_len = frag->size; - frag->base.des_segments = &frag->segment.base; - frag->base.des_segment_count = 1; - frag->base.des_flags = 0; -} - -static void mca_btl_sm_frag1_constructor(mca_btl_sm_frag_t* frag) -{ - frag->size = mca_btl_sm_component.eager_limit; - frag->my_list = &mca_btl_sm_component.sm_frags_eager; - mca_btl_sm_frag_common_constructor(frag); -} - -static void mca_btl_sm_frag2_constructor(mca_btl_sm_frag_t* frag) -{ - frag->size = mca_btl_sm_component.max_frag_size; - frag->my_list = &mca_btl_sm_component.sm_frags_max; - mca_btl_sm_frag_common_constructor(frag); -} - -static void mca_btl_sm_user_constructor(mca_btl_sm_frag_t* frag) -{ - frag->size = 0; - frag->my_list = &mca_btl_sm_component.sm_frags_user; - mca_btl_sm_frag_common_constructor(frag); -} - -OBJ_CLASS_INSTANCE( - mca_btl_sm_frag1_t, - mca_btl_base_descriptor_t, - mca_btl_sm_frag1_constructor, - NULL); - -OBJ_CLASS_INSTANCE( - mca_btl_sm_frag2_t, - mca_btl_base_descriptor_t, - mca_btl_sm_frag2_constructor, - NULL); - -OBJ_CLASS_INSTANCE( - mca_btl_sm_user_t, - mca_btl_base_descriptor_t, - mca_btl_sm_user_constructor, - NULL); diff --git 
a/opal/mca/btl/sm/btl_sm_frag.h b/opal/mca/btl/sm/btl_sm_frag.h deleted file mode 100644 index 208f122b745..00000000000 --- a/opal/mca/btl/sm/btl_sm_frag.h +++ /dev/null @@ -1,115 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2013 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. - * Copyright (c) 2009 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -/** - * @file - */ -#ifndef MCA_BTL_SM_SEND_FRAG_H -#define MCA_BTL_SM_SEND_FRAG_H - -#include "opal_config.h" -#include "btl_sm.h" - - -#define MCA_BTL_SM_FRAG_TYPE_MASK ((uintptr_t)0x3) -#define MCA_BTL_SM_FRAG_SEND ((uintptr_t)0x0) -#define MCA_BTL_SM_FRAG_ACK ((uintptr_t)0x1) -#define MCA_BTL_SM_FRAG_PUT ((uintptr_t)0x2) -#define MCA_BTL_SM_FRAG_GET ((uintptr_t)0x3) - -#define MCA_BTL_SM_FRAG_STATUS_MASK ((uintptr_t)0x4) - -struct mca_btl_sm_frag_t; - -struct mca_btl_sm_hdr_t { - struct mca_btl_sm_frag_t *frag; - size_t len; - int my_smp_rank; - mca_btl_base_tag_t tag; -}; -typedef struct mca_btl_sm_hdr_t mca_btl_sm_hdr_t; - -struct mca_btl_sm_segment_t { - mca_btl_base_segment_t base; -#if OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA - uint64_t key; -#endif /* OPAL_BTL_SM_HAVE_KNEM || OPAL_BTL_SM_HAVE_CMA */ -}; -typedef struct mca_btl_sm_segment_t mca_btl_sm_segment_t; - -/** - * shared memory send fragment 
derived type. - */ -struct mca_btl_sm_frag_t { - mca_btl_base_descriptor_t base; - mca_btl_sm_segment_t segment; - struct mca_btl_base_endpoint_t *endpoint; - size_t size; - /* pointer written to the FIFO, this is the base of the shared memory region */ - mca_btl_sm_hdr_t *hdr; - opal_free_list_t* my_list; -#if OPAL_BTL_SM_HAVE_KNEM - /* rdma callback data. required for async get */ - struct { - mca_btl_base_rdma_completion_fn_t func; - void *local_address; - struct mca_btl_base_registration_handle_t *local_handle; - void *context; - void *data; - } cb; -#endif -}; -typedef struct mca_btl_sm_frag_t mca_btl_sm_frag_t; -typedef struct mca_btl_sm_frag_t mca_btl_sm_frag1_t; -typedef struct mca_btl_sm_frag_t mca_btl_sm_frag2_t; -typedef struct mca_btl_sm_frag_t mca_btl_sm_user_t; - - -OBJ_CLASS_DECLARATION(mca_btl_sm_frag_t); -OBJ_CLASS_DECLARATION(mca_btl_sm_frag1_t); -OBJ_CLASS_DECLARATION(mca_btl_sm_frag2_t); -OBJ_CLASS_DECLARATION(mca_btl_sm_user_t); - -#define MCA_BTL_SM_FRAG_ALLOC_EAGER(frag) \ -{ \ - frag = (mca_btl_sm_frag_t*) \ - opal_free_list_get (&mca_btl_sm_component.sm_frags_eager); \ -} - -#define MCA_BTL_SM_FRAG_ALLOC_MAX(frag) \ -{ \ - frag = (mca_btl_sm_frag_t*) \ - opal_free_list_get (&mca_btl_sm_component.sm_frags_max); \ -} - -#define MCA_BTL_SM_FRAG_ALLOC_USER(frag) \ -{ \ - frag = (mca_btl_sm_frag_t*) \ - opal_free_list_get (&mca_btl_sm_component.sm_frags_user); \ -} - - -#define MCA_BTL_SM_FRAG_RETURN(frag) \ -{ \ - opal_free_list_return (frag->my_list, (opal_free_list_item_t*)(frag)); \ -} -#endif diff --git a/opal/mca/btl/sm/configure.m4 b/opal/mca/btl/sm/configure.m4 index 6caad120441..d288497287b 100644 --- a/opal/mca/btl/sm/configure.m4 +++ b/opal/mca/btl/sm/configure.m4 @@ -3,7 +3,7 @@ # Copyright (c) 2009 The University of Tennessee and The University # of Tennessee Research Foundation. All rights # reserved. -# Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2009-2017 Cisco Systems, Inc. 
All rights reserved # Copyright (c) 2010-2012 IBM Corporation. All rights reserved. # Copyright (c) 2014 Los Alamos National Security, LLC. All rights # reserved. @@ -14,31 +14,13 @@ # $HEADER$ # +# The "sm" BTL is effectively dead; it has been wholly replaced +# by the "vader" BTL. This BTL now only exists to provide a help +# message to users advising them to use the "vader" BTL. + # MCA_btl_sm_CONFIG([action-if-can-compile], # [action-if-cant-compile]) # ------------------------------------------------ AC_DEFUN([MCA_opal_btl_sm_CONFIG],[ AC_CONFIG_FILES([opal/mca/btl/sm/Makefile]) - - OPAL_VAR_SCOPE_PUSH([btl_sm_cma_happy]) - OPAL_CHECK_CMA([btl_sm], [btl_sm_cma_happy=1], [btl_sm_cma_happy=0]) - - AC_DEFINE_UNQUOTED([OPAL_BTL_SM_HAVE_CMA], - [$btl_sm_cma_happy], - [If CMA support can be enabled]) - - OPAL_VAR_SCOPE_POP - - OPAL_VAR_SCOPE_PUSH([btl_sm_knem_happy]) - OPAL_CHECK_KNEM([btl_sm], - [btl_sm_knem_happy=1], - [btl_sm_knem_happy=0]) - - AC_DEFINE_UNQUOTED([OPAL_BTL_SM_HAVE_KNEM], - [$btl_sm_knem_happy], - [If knem support can be enabled]) - [$1] - # substitute in the things needed to build KNEM - AC_SUBST([btl_sm_CPPFLAGS]) - OPAL_VAR_SCOPE_POP ])dnl diff --git a/opal/mca/btl/sm/help-mpi-btl-sm.txt b/opal/mca/btl/sm/help-mpi-btl-sm.txt index 3cb288cd0da..8424944ccae 100644 --- a/opal/mca/btl/sm/help-mpi-btl-sm.txt +++ b/opal/mca/btl/sm/help-mpi-btl-sm.txt @@ -3,7 +3,7 @@ # Copyright (c) 2004-2009 The University of Tennessee and The University # of Tennessee Research Foundation. All rights # reserved. -# Copyright (c) 2006-2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved # Copyright (c) 2012-2013 Los Alamos National Security, LLC. # All rights reserved. # $COPYRIGHT$ @@ -12,96 +12,13 @@ # # $HEADER$ # -# This is the US/English help file for Open MPI's shared memory support. +# This is the US/English help file for the deprecated "sm" BTL. 
# -[sys call fail] -A system call failed during sm BTL initialization that should -not have. It is likely that your MPI job will now either abort or -experience performance degradation. +[btl sm is dead] +As of version 3.0.0, the "sm" BTL is no longer available in Open MPI. - System call: %s - Error: %s (errno %d) -# -[no locality] -WARNING: Missing locality information required for sm initialization. -Continuing without shared memory support. -# -[knem permission denied] -Open MPI failed to open the /dev/knem device due to a permissions -problem. Please check with your system administrator to get the -permissions fixed, or set the btl_sm_use_knem MCA parameter to 0 to -run without /dev/knem support. - - Local host: %s - /dev/knem permissions: 0%o -# -[knem fail open] -Open MPI failed to open the /dev/knem device due to a local error. -Please check with your system administrator to get the problem fixed, -or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem -support. - - Local host: %s - Errno: %d (%s) -# -[knem get ABI fail] -Open MPI failed to retrieve the ABI version from the /dev/knem device -due to a local error. This usually indicates an error in your -/dev/knem installation; please check with your system administrator, -or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem -support. - - Local host: %s - Errno: %d (%s) -# -[knem ABI mismatch] -Open MPI was compiled with support for one version of the knem kernel -module, but it discovered a different version running in /dev/knem. -Open MPI needs to be installed with support for the same version of -knem as is in the running Linux kernel. Please check with your system -administrator, or set the btl_sm_use_knem MCA parameter to 0 to run -without /dev/knem support. - - Local host: %s - Open MPI's knem version: 0x%x - /dev/knem's version: 0x%x -# -[knem mmap fail] -Open MPI failed to map support from the knem Linux kernel module; this -shouldn't happen. 
Please check with your system administrator, or set -the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem support. - - Local host: %s - System call: mmap() - Errno: %d (%s) -# -[knem init error] -Open MPI encountered an error during the knem initialization. Please -check with your system administrator, or set the btl_sm_use_knem MCA -parameter to 0 to run without /dev/knem support. - - Local host: %s - System call: %s - Errno: %d (%s) -# -[knem requested but not available] -WARNING: Linux kernel Knem support was requested via the -mca_btl_sm_use_knem MCA parameter, but Knem support was either not -compiled into this Open MPI installation, or Knem support was unable -to be activated in this process. - -The shared memory BTL will now deactivate itself, likely resulting in -lower performance for on-node communication. - - Local host: %s -# -[CMA requested but not available] -WARNING: Linux kernel CMA support was requested via the -mca_btl_sm_use_cma MCA parameter, but CMA support was either not -compiled into this Open MPI installation, or CMA support was unable -to be activated in this process. - -The shared memory BTL will now deactivate itself, likely resulting in -lower performance for on-node communication. +Efficient, high-speed same-node shared memory communication support in +Open MPI is available in the "vader" BTL. To use the vader BTL, you +can re-run your job with: - Local host: %s + mpirun --mca btl vader,self,... your_mpi_application diff --git a/opal/mca/btl/smcuda/Makefile.am b/opal/mca/btl/smcuda/Makefile.am index 077ddc792f4..733965596fd 100644 --- a/opal/mca/btl/smcuda/Makefile.am +++ b/opal/mca/btl/smcuda/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2012 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -48,7 +49,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_smcuda_la_SOURCES = $(libmca_btl_smcuda_la_sources) mca_btl_smcuda_la_LDFLAGS = -module -avoid-version -mca_btl_smcuda_la_LIBADD = \ +mca_btl_smcuda_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ $(OPAL_TOP_BUILDDIR)/opal/mca/common/sm/lib@OPAL_LIB_PREFIX@mca_common_sm.la mca_btl_smcuda_la_CPPFLAGS = $(btl_smcuda_CPPFLAGS) if OPAL_cuda_support diff --git a/opal/mca/btl/smcuda/README b/opal/mca/btl/smcuda/README index 8b9bcf91296..859015e1a45 100644 --- a/opal/mca/btl/smcuda/README +++ b/opal/mca/btl/smcuda/README @@ -109,5 +109,5 @@ remote read of the GPU data. The receiver maintains a cache of remote memory that it has handles open on. This is because a call to cuIpcOpenMemHandle() can be very expensive (90usec) so we want to avoid it when we can. The cache of remote memory is kept in a memory -pool that is associated with each endpoint. Note that we do not cache the loca +pool that is associated with each endpoint. Note that we do not cache the local memory handles because getting them is very cheap and there is no need. diff --git a/opal/mca/btl/smcuda/btl_smcuda.c b/opal/mca/btl/smcuda/btl_smcuda.c index 5f10ccd560b..561585ea4bf 100644 --- a/opal/mca/btl/smcuda/btl_smcuda.c +++ b/opal/mca/btl/smcuda/btl_smcuda.c @@ -12,11 +12,11 @@ * All rights reserved. * Copyright (c) 2006-2007 Voltaire. All rights reserved. * Copyright (c) 2009-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2010-2016 Los Alamos National Security, LLC. All rights + * Copyright (c) 2010-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2012-2015 NVIDIA Corporation. All rights reserved. * Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved. 
- * Copyright (c) 2014 Research Organization for Information Science + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015-2016 Intel, Inc. All rights reserved. * $COPYRIGHT$ @@ -296,7 +296,6 @@ smcuda_btl_first_time_init(mca_btl_smcuda_t *smcuda_btl, num_mem_nodes > 0 && NULL != opal_process_info.cpuset) { int numa=0, w; unsigned n_bound=0; - hwloc_cpuset_t avail; hwloc_obj_t obj; /* count the number of NUMA nodes to which we are bound */ @@ -306,10 +305,8 @@ smcuda_btl_first_time_init(mca_btl_smcuda_t *smcuda_btl, OPAL_HWLOC_AVAILABLE))) { continue; } - /* get that NUMA node's available cpus */ - avail = opal_hwloc_base_get_available_cpus(opal_hwloc_topology, obj); - /* see if we intersect */ - if (hwloc_bitmap_intersects(avail, opal_hwloc_my_cpuset)) { + /* see if we intersect with that NUMA node's cpus */ + if (hwloc_bitmap_intersects(obj->cpuset, opal_hwloc_my_cpuset)) { n_bound++; numa = w; } @@ -639,7 +636,7 @@ int mca_btl_smcuda_add_procs( /* Sync with other local procs. Force the FIFO initialization to always * happens before the readers access it. */ - (void)opal_atomic_add_32(&mca_btl_smcuda_component.sm_seg->module_seg->seg_inited, 1); + (void)opal_atomic_add_fetch_32(&mca_btl_smcuda_component.sm_seg->module_seg->seg_inited, 1); while( n_local_procs > mca_btl_smcuda_component.sm_seg->module_seg->seg_inited) { opal_progress(); @@ -979,7 +976,7 @@ int mca_btl_smcuda_sendi( struct mca_btl_base_module_t* btl, * the return code indicates failure, the write has still "completed" from * our point of view: it has been posted to a "pending send" queue. 
*/ - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_outstanding_frags, +1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_outstanding_frags, +1); MCA_BTL_SMCUDA_FIFO_WRITE(endpoint, endpoint->my_smp_rank, endpoint->peer_smp_rank, (void *) VIRTUAL2RELATIVE(frag->hdr), false, true, rc); (void)rc; /* this is safe to ignore as the message is requeued till success */ @@ -1029,7 +1026,7 @@ int mca_btl_smcuda_send( struct mca_btl_base_module_t* btl, * post the descriptor in the queue - post with the relative * address */ - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_outstanding_frags, +1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_outstanding_frags, +1); MCA_BTL_SMCUDA_FIFO_WRITE(endpoint, endpoint->my_smp_rank, endpoint->peer_smp_rank, (void *) VIRTUAL2RELATIVE(frag->hdr), false, true, rc); if( OPAL_LIKELY(0 == rc) ) { @@ -1244,7 +1241,7 @@ static void mca_btl_smcuda_send_cuda_ipc_request(struct mca_btl_base_module_t* b * the return code indicates failure, the write has still "completed" from * our point of view: it has been posted to a "pending send" queue. 
*/ - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_outstanding_frags, +1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_outstanding_frags, +1); opal_output_verbose(10, mca_btl_smcuda_component.cuda_ipc_output, "Sending CUDA IPC REQ (try=%d): myrank=%d, mydev=%d, peerrank=%d", endpoint->ipctries, @@ -1266,17 +1263,13 @@ void mca_btl_smcuda_dump(struct mca_btl_base_module_t* btl, struct mca_btl_base_endpoint_t* endpoint, int verbose) { - opal_list_item_t *item; mca_btl_smcuda_frag_t* frag; mca_btl_base_err("BTL SM %p endpoint %p [smp_rank %d] [peer_rank %d]\n", (void*) btl, (void*) endpoint, endpoint->my_smp_rank, endpoint->peer_smp_rank); if( NULL != endpoint ) { - for(item = opal_list_get_first(&endpoint->pending_sends); - item != opal_list_get_end(&endpoint->pending_sends); - item = opal_list_get_next(item)) { - frag = (mca_btl_smcuda_frag_t*)item; + OPAL_LIST_FOREACH(frag, &endpoint->pending_sends, mca_btl_smcuda_frag_t) { mca_btl_base_err(" | frag %p size %lu (hdr frag %p len %lu rank %d tag %d)\n", (void*) frag, frag->size, (void*) frag->hdr->frag, frag->hdr->len, frag->hdr->my_smp_rank, diff --git a/opal/mca/btl/smcuda/btl_smcuda.h b/opal/mca/btl/smcuda/btl_smcuda.h index 807d9081161..2fe7df377d4 100644 --- a/opal/mca/btl/smcuda/btl_smcuda.h +++ b/opal/mca/btl/smcuda/btl_smcuda.h @@ -269,8 +269,8 @@ static inline int sm_fifo_init(int fifo_size, mca_mpool_base_module_t *mpool, fifo->queue = (volatile void **) VIRTUAL2RELATIVE(fifo->queue_recv); /* initialize the locks */ - opal_atomic_init(&(fifo->head_lock), OPAL_ATOMIC_UNLOCKED); - opal_atomic_init(&(fifo->tail_lock), OPAL_ATOMIC_UNLOCKED); + opal_atomic_lock_init(&(fifo->head_lock), OPAL_ATOMIC_LOCK_UNLOCKED); + opal_atomic_lock_init(&(fifo->tail_lock), OPAL_ATOMIC_LOCK_UNLOCKED); opal_atomic_unlock(&(fifo->head_lock)); /* should be unnecessary */ opal_atomic_unlock(&(fifo->tail_lock)); /* should be unnecessary */ diff --git a/opal/mca/btl/smcuda/btl_smcuda_component.c 
b/opal/mca/btl/smcuda/btl_smcuda_component.c index 8aedf9f1d7a..d77398a9965 100644 --- a/opal/mca/btl/smcuda/btl_smcuda_component.c +++ b/opal/mca/btl/smcuda/btl_smcuda_component.c @@ -658,7 +658,7 @@ static void mca_btl_smcuda_send_cuda_ipc_ack(struct mca_btl_base_module_t* btl, * the return code indicates failure, the write has still "completed" from * our point of view: it has been posted to a "pending send" queue. */ - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_outstanding_frags, +1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_outstanding_frags, +1); MCA_BTL_SMCUDA_FIFO_WRITE(endpoint, endpoint->my_smp_rank, endpoint->peer_smp_rank, (void *) VIRTUAL2RELATIVE(frag->hdr), false, true, rc); @@ -980,7 +980,7 @@ void btl_smcuda_process_pending_sends(struct mca_btl_base_endpoint_t *ep) if(NULL == si) return; /* Another thread got in before us. Thats ok. */ - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_pending_sends, -1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_pending_sends, -1); MCA_BTL_SMCUDA_FIFO_WRITE(ep, ep->my_smp_rank, ep->peer_smp_rank, si->data, true, false, rc); @@ -1093,7 +1093,7 @@ int mca_btl_smcuda_component_progress(void) if( btl_ownership ) { MCA_BTL_SMCUDA_FRAG_RETURN(frag); } - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_outstanding_frags, -1); + OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_outstanding_frags, -1); if ( 0 < opal_list_get_size(&endpoint->pending_sends) ) { btl_smcuda_process_pending_sends(endpoint); } diff --git a/opal/mca/btl/smcuda/btl_smcuda_fifo.h b/opal/mca/btl/smcuda/btl_smcuda_fifo.h index 7fcf2c1c98c..c4db00d10a8 100644 --- a/opal/mca/btl/smcuda/btl_smcuda_fifo.h +++ b/opal/mca/btl/smcuda/btl_smcuda_fifo.h @@ -40,7 +40,7 @@ add_pending(struct mca_btl_base_endpoint_t *ep, void *data, bool resend) si = (btl_smcuda_pending_send_item_t*)i; si->data = data; - OPAL_THREAD_ADD32(&mca_btl_smcuda_component.num_pending_sends, +1); + 
OPAL_THREAD_ADD_FETCH32(&mca_btl_smcuda_component.num_pending_sends, +1); /* if data was on pending send list then prepend it to the list to * minimize reordering */ diff --git a/opal/mca/btl/tcp/Makefile.am b/opal/mca/btl/tcp/Makefile.am index 76f11849674..322a29507ef 100644 --- a/opal/mca/btl/tcp/Makefile.am +++ b/opal/mca/btl/tcp/Makefile.am @@ -11,6 +11,7 @@ # All rights reserved. # Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2013 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -56,7 +57,7 @@ mcacomponent_LTLIBRARIES = $(component) mca_btl_tcp_la_SOURCES = $(component_sources) mca_btl_tcp_la_LDFLAGS = -module -avoid-version if OPAL_cuda_support -mca_btl_tcp_la_LIBADD = \ +mca_btl_tcp_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ $(OPAL_TOP_BUILDDIR)/opal/mca/common/cuda/lib@OPAL_LIB_PREFIX@mca_common_cuda.la endif diff --git a/opal/mca/btl/tcp/btl_tcp.c b/opal/mca/btl/tcp/btl_tcp.c index ac6289cf1f9..f007565be38 100644 --- a/opal/mca/btl/tcp/btl_tcp.c +++ b/opal/mca/btl/tcp/btl_tcp.c @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2006-2015 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2016 Research Organization for Information Science + * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 Intel, Inc. All rights reserved. 
* @@ -31,12 +31,14 @@ #include "opal/mca/mpool/base/base.h" #include "opal/mca/mpool/mpool.h" #include "opal/mca/btl/base/btl_base_error.h" +#include "opal/opal_socket_errno.h" #include "btl_tcp.h" #include "btl_tcp_frag.h" #include "btl_tcp_proc.h" #include "btl_tcp_endpoint.h" + mca_btl_tcp_module_t mca_btl_tcp_module = { .super = { .btl_component = &mca_btl_tcp_component.super, @@ -135,11 +137,6 @@ int mca_btl_tcp_add_procs( struct mca_btl_base_module_t* btl, } peers[i] = tcp_endpoint; - - /* we increase the count of MPI users of the event library - once per peer, so that we are used until we aren't - connected to a peer */ - opal_progress_event_users_increment(); } return OPAL_SUCCESS; @@ -158,7 +155,6 @@ int mca_btl_tcp_del_procs(struct mca_btl_base_module_t* btl, mca_btl_tcp_endpoint_t* tcp_endpoint = endpoints[i]; opal_list_remove_item(&tcp_btl->tcp_endpoints, (opal_list_item_t*)tcp_endpoint); OBJ_RELEASE(tcp_endpoint); - opal_progress_event_users_decrement(); } OPAL_THREAD_UNLOCK(&tcp_btl->tcp_endpoints_mutex); return OPAL_SUCCESS; @@ -381,6 +377,7 @@ int mca_btl_tcp_put (mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t frag->segments[1].seg_addr.lval = remote_address; frag->segments[1].seg_len = size; + if (endpoint->endpoint_nbo) MCA_BTL_BASE_SEGMENT_HTON(frag->segments[1]); frag->base.des_flags = MCA_BTL_DES_FLAGS_BTL_OWNERSHIP | MCA_BTL_DES_SEND_ALWAYS_CALLBACK; frag->base.des_cbfunc = fake_rdma_complete; @@ -492,7 +489,6 @@ int mca_btl_tcp_finalize(struct mca_btl_base_module_t* btl) item = opal_list_remove_first(&tcp_btl->tcp_endpoints)) { mca_btl_tcp_endpoint_t *endpoint = (mca_btl_tcp_endpoint_t*)item; OBJ_RELEASE(endpoint); - opal_progress_event_users_decrement(); } free(tcp_btl); return OPAL_SUCCESS; @@ -530,3 +526,69 @@ void mca_btl_tcp_dump(struct mca_btl_base_module_t* base_btl, } #endif /* OPAL_ENABLE_DEBUG && WANT_PEER_DUMP */ } + + +/* + * A blocking recv for both blocking and non-blocking socket. 
+ * Used to receive the small amount of connection information + * that identifies the endpoints + * + * when the socket is blocking (the caller introduces timeout) + * which happens during initial handshake otherwise socket is + * non-blocking most of the time. + */ + +int mca_btl_tcp_recv_blocking(int sd, void* data, size_t size) +{ + unsigned char* ptr = (unsigned char*)data; + size_t cnt = 0; + while (cnt < size) { + int retval = recv(sd, ((char *)ptr) + cnt, size - cnt, 0); + /* remote closed connection */ + if (0 == retval) { + OPAL_OUTPUT_VERBOSE((100, opal_btl_base_framework.framework_output, + "remote peer unexpectedly closed connection while I was waiting for a blocking message")); + break; + } + + /* socket is non-blocking so handle errors */ + if (retval < 0) { + if (opal_socket_errno != EINTR && + opal_socket_errno != EAGAIN && + opal_socket_errno != EWOULDBLOCK) { + BTL_ERROR(("recv(%d) failed: %s (%d)", sd, strerror(opal_socket_errno), opal_socket_errno)); + break; + } + continue; + } + cnt += retval; + } + return cnt; +} + + +/* + * A blocking send on a non-blocking socket. 
Used to send the small + * amount of connection information used during the initial handshake + * (magic string plus process guid) + */ + +int mca_btl_tcp_send_blocking(int sd, const void* data, size_t size) +{ + unsigned char* ptr = (unsigned char*)data; + size_t cnt = 0; + while(cnt < size) { + int retval = send(sd, ((const char *)ptr) + cnt, size - cnt, 0); + if (retval < 0) { + if (opal_socket_errno != EINTR && + opal_socket_errno != EAGAIN && + opal_socket_errno != EWOULDBLOCK) { + BTL_ERROR(("send() failed: %s (%d)", strerror(opal_socket_errno), opal_socket_errno)); + return -1; + } + continue; + } + cnt += retval; + } + return cnt; +} diff --git a/opal/mca/btl/tcp/btl_tcp.h b/opal/mca/btl/tcp/btl_tcp.h index c78bd30174b..846ee3b7ca9 100644 --- a/opal/mca/btl/tcp/btl_tcp.h +++ b/opal/mca/btl/tcp/btl_tcp.h @@ -167,7 +167,10 @@ struct mca_btl_tcp_module_t { #if 0 int tcp_ifindex; /**< BTL interface index */ #endif - struct sockaddr_storage tcp_ifaddr; /**< BTL interface address */ + struct sockaddr_storage tcp_ifaddr; /**< First IPv4 address discovered for this interface, bound as sending address for this BTL */ +#if OPAL_ENABLE_IPV6 + struct sockaddr_storage tcp_ifaddr_6; /**< First IPv6 address discovered for this interface, bound as sending address for this BTL */ +#endif uint32_t tcp_ifmask; /**< BTL interface netmask */ opal_mutex_t tcp_endpoints_mutex; @@ -351,5 +354,23 @@ mca_btl_tcp_dump(struct mca_btl_base_module_t* btl, */ int mca_btl_tcp_ft_event(int state); +/* + * A blocking send on a non-blocking socket. Used to send the small + * amount of connection information that identifies the endpoints + * endpoint. + */ +int mca_btl_tcp_send_blocking(int sd, const void* data, size_t size); + +/* + * A blocking recv for both blocking and non-blocking socket. 
+ * Used to receive the small amount of connection information + * that identifies the endpoints + * + * when the socket is blocking (the caller introduces timeout) + * which happens during initial handshake otherwise socket is + * non-blocking most of the time. + */ +int mca_btl_tcp_recv_blocking(int sd, void* data, size_t size); + END_C_DECLS #endif diff --git a/opal/mca/btl/tcp/btl_tcp_component.c b/opal/mca/btl/tcp/btl_tcp_component.c index 4b9711531bb..e8b05880155 100644 --- a/opal/mca/btl/tcp/btl_tcp_component.c +++ b/opal/mca/btl/tcp/btl_tcp_component.c @@ -10,7 +10,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. - * Copyright (c) 2007-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2007-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Laboratory * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights @@ -54,6 +54,9 @@ #endif #include #include +#ifdef HAVE_SYS_TIME_H +#include +#endif #include "opal/mca/event/event.h" #include "opal/util/ethtool.h" @@ -250,8 +253,20 @@ static int mca_btl_tcp_component_register(void) mca_btl_tcp_param_register_int ("free_list_num", NULL, 8, OPAL_INFO_LVL_5, &mca_btl_tcp_component.tcp_free_list_num); mca_btl_tcp_param_register_int ("free_list_max", NULL, -1, OPAL_INFO_LVL_5, &mca_btl_tcp_component.tcp_free_list_max); mca_btl_tcp_param_register_int ("free_list_inc", NULL, 32, OPAL_INFO_LVL_5, &mca_btl_tcp_component.tcp_free_list_inc); - mca_btl_tcp_param_register_int ("sndbuf", NULL, 128*1024, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_sndbuf); - mca_btl_tcp_param_register_int ("rcvbuf", NULL, 128*1024, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_rcvbuf); + mca_btl_tcp_param_register_int ("sndbuf", + "The size of the send buffer socket option for each connection. 
" + "Modern TCP stacks generally are smarter than a fixed size and in some " + "situations setting a buffer size explicitly can actually lower " + "performance. 0 means the tcp btl will not try to set a send buffer " + "size.", + 0, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_sndbuf); + mca_btl_tcp_param_register_int ("rcvbuf", + "The size of the receive buffer socket option for each connection. " + "Modern TCP stacks generally are smarter than a fixed size and in some " + "situations setting a buffer size explicitly can actually lower " + "performance. 0 means the tcp btl will not try to set a receive buffer " + "size.", + 0, OPAL_INFO_LVL_4, &mca_btl_tcp_component.tcp_rcvbuf); mca_btl_tcp_param_register_int ("endpoint_cache", "The size of the internal cache for each TCP connection. This cache is" " used to reduce the number of syscalls, by replacing them with memcpy." @@ -303,7 +318,12 @@ static int mca_btl_tcp_component_register(void) mca_btl_tcp_module.super.btl_rndv_eager_limit = 64*1024; mca_btl_tcp_module.super.btl_max_send_size = 128*1024; mca_btl_tcp_module.super.btl_rdma_pipeline_send_length = 128*1024; - mca_btl_tcp_module.super.btl_rdma_pipeline_frag_size = INT_MAX; + /* Some OSes have hard coded limits on how many bytes can be manipulated + * by each writev operation. Force a reasonable limit, to prevent overflowing + * a signed 32-bit integer (limit comes from BSD and OS X). We remove 1k to + * make some room for our internal headers. 
+ */ + mca_btl_tcp_module.super.btl_rdma_pipeline_frag_size = ((1UL<<31) - 1024); mca_btl_tcp_module.super.btl_min_rdma_pipeline_size = 0; mca_btl_tcp_module.super.btl_flags = MCA_BTL_FLAGS_PUT | MCA_BTL_FLAGS_SEND_INPLACE | @@ -320,7 +340,11 @@ static int mca_btl_tcp_component_register(void) mca_btl_base_param_register(&mca_btl_tcp_component.super.btl_version, &mca_btl_tcp_module.super); - + if (mca_btl_tcp_module.super.btl_rdma_pipeline_frag_size > ((1UL<<31) - 1024) ) { + /* Assume a hard limit.
A test in configure would be a better solution, but until then + * kicking-in the pipeline RDMA for extremely large data is good enough. */ + mca_btl_tcp_module.super.btl_rdma_pipeline_frag_size = ((1UL<<31) - 1024); + } mca_btl_tcp_param_register_int ("disable_family", NULL, 0, OPAL_INFO_LVL_2, &mca_btl_tcp_component.tcp_disable_family); return mca_btl_tcp_component_verify(); @@ -487,6 +511,17 @@ static int mca_btl_tcp_create(int if_kindex, const char* if_name) btl->tcp_send_handler = 0; #endif + struct sockaddr_storage addr; + opal_ifkindextoaddr(if_kindex, (struct sockaddr*) &addr, + sizeof (struct sockaddr_storage)); +#if OPAL_ENABLE_IPV6 + if (addr.ss_family == AF_INET6) { + btl->tcp_ifaddr_6 = addr; + } +#endif + if (addr.ss_family == AF_INET) { + btl->tcp_ifaddr = addr; + } /* allow user to specify interface bandwidth */ sprintf(param, "bandwidth_%s", if_name); mca_btl_tcp_param_register_uint(param, NULL, btl->super.btl_bandwidth, OPAL_INFO_LVL_5, &btl->super.btl_bandwidth); @@ -717,7 +752,9 @@ static int mca_btl_tcp_component_create_instances(void) char* if_name = *argv; int if_index = opal_ifnametokindex(if_name); if(if_index < 0) { - BTL_ERROR(("invalid interface \"%s\"", if_name)); + opal_show_help("help-mpi-btl-tcp.txt", "invalid if_inexclude", + true, "include", opal_process_info.nodename, + if_name, "Unknown interface name"); ret = OPAL_ERR_NOT_FOUND; goto cleanup; } @@ -844,13 +881,18 @@ static int mca_btl_tcp_component_create_listen(uint16_t af_family) freeaddrinfo (res); #ifdef IPV6_V6ONLY - /* in case of AF_INET6, disable v4-mapped addresses */ + /* If this OS supports the "IPV6_V6ONLY" constant, then set it + on this socket. It specifies that *only* V6 connections + should be accepted on this socket (vs. allowing incoming + both V4 and V6 connections -- which is actually defined + behavior for V6<-->V4 interop stuff). See + https://github.com/open-mpi/ompi/commit/95d7e08a6617530d57b6700c57738b351bfccbf8 for some + more details. 
*/ if (AF_INET6 == af_family) { int flg = 1; if (setsockopt (sd, IPPROTO_IPV6, IPV6_V6ONLY, (char *) &flg, sizeof (flg)) < 0) { - opal_output(0, - "mca_btl_tcp_create_listen: unable to disable v4-mapped addresses\n"); + BTL_ERROR(("mca_btl_tcp_create_listen: unable to set IPV6_V6ONLY\n")); } } #endif /* IPV6_V6ONLY */ @@ -892,6 +934,10 @@ static int mca_btl_tcp_component_create_listen(uint16_t af_family) #else ((struct sockaddr_in*) &inaddr)->sin_port = htons(port + index); #endif /* OPAL_ENABLE_IPV6 */ + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: Attempting to bind to %s port %d", + (AF_INET == af_family) ? "AF_INET" : "AF_INET6", + port + index); if(bind(sd, (struct sockaddr*)&inaddr, addrlen) < 0) { if( (EADDRINUSE == opal_socket_errno) || (EADDRNOTAVAIL == opal_socket_errno) ) { continue; @@ -901,6 +947,10 @@ static int mca_btl_tcp_component_create_listen(uint16_t af_family) CLOSE_THE_SOCKET(sd); return OPAL_ERROR; } + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: Successfully bound to %s port %d", + (AF_INET == af_family) ? 
"AF_INET" : "AF_INET6", + port + index); goto socket_binded; } #if OPAL_ENABLE_IPV6 @@ -931,11 +981,19 @@ static int mca_btl_tcp_component_create_listen(uint16_t af_family) if (AF_INET6 == af_family) { mca_btl_tcp_component.tcp6_listen_port = ((struct sockaddr_in6*) &inaddr)->sin6_port; mca_btl_tcp_component.tcp6_listen_sd = sd; + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: my listening v6 socket port is %d", + ntohs(mca_btl_tcp_component.tcp6_listen_port)); } else #endif { + char str[16]; mca_btl_tcp_component.tcp_listen_port = ((struct sockaddr_in*) &inaddr)->sin_port; mca_btl_tcp_component.tcp_listen_sd = sd; + inet_ntop(AF_INET, &(((struct sockaddr_in*)&inaddr)->sin_addr), str, sizeof(str)); + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: my listening v4 socket is %s:%u", + str, ntohs(mca_btl_tcp_component.tcp_listen_port)); } /* setup listen backlog to maximum allowed by kernel */ @@ -948,15 +1006,20 @@ static int mca_btl_tcp_component_create_listen(uint16_t af_family) /* set socket up to be non-blocking, otherwise accept could block */ if((flags = fcntl(sd, F_GETFL, 0)) < 0) { - BTL_ERROR(("fcntl(F_GETFL) failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), "fcntl(sd, F_GETFL, 0)", + strerror(opal_socket_errno), opal_socket_errno); CLOSE_THE_SOCKET(sd); return OPAL_ERROR; } else { flags |= O_NONBLOCK; if(fcntl(sd, F_SETFL, flags) < 0) { - BTL_ERROR(("fcntl(F_SETFL) failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), + "fcntl(sd, F_SETFL, flags | O_NONBLOCK)", + strerror(opal_socket_errno), opal_socket_errno); CLOSE_THE_SOCKET(sd); return OPAL_ERROR; } @@ -1069,6 +1132,7 @@ static int mca_btl_tcp_component_exchange(void) size_t current_addr = 0;
if(mca_btl_tcp_component.tcp_num_btls != 0) { + char ifn[32]; mca_btl_tcp_addr_t *addrs = (mca_btl_tcp_addr_t *)malloc(size); memset(addrs, 0, size); @@ -1086,6 +1150,9 @@ static int mca_btl_tcp_component_exchange(void) continue; } + opal_ifindextoname(index, ifn, sizeof(ifn)); + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: examining interface %s", ifn); if (OPAL_SUCCESS != opal_ifindextoaddr(index, (struct sockaddr*) &my_ss, sizeof (my_ss))) { @@ -1109,13 +1176,15 @@ static int mca_btl_tcp_component_exchange(void) addrs[current_addr].addr_ifkindex = opal_ifindextokindex (index); current_addr++; + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: using ipv6 interface %s", ifn); } else #endif if ((AF_INET == my_ss.ss_family) && (4 != mca_btl_tcp_component.tcp_disable_family)) { memcpy(&addrs[current_addr].addr_inet, - &((struct sockaddr_in*)&my_ss)->sin_addr, - sizeof(addrs[0].addr_inet)); + &((struct sockaddr_in*)&my_ss)->sin_addr, + sizeof(struct in_addr)); addrs[current_addr].addr_port = mca_btl_tcp_component.tcp_listen_port; addrs[current_addr].addr_family = MCA_BTL_TCP_AF_INET; @@ -1124,6 +1193,8 @@ static int mca_btl_tcp_component_exchange(void) addrs[current_addr].addr_ifkindex = opal_ifindextokindex (index); current_addr++; + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: using ipv4 interface %s", ifn); } } /* end of for opal_ifbegin() */ } /* end of for tcp_num_btls */ @@ -1287,45 +1358,144 @@ static void mca_btl_tcp_component_recv_handler(int sd, short flags, void* user) struct sockaddr_storage addr; opal_socklen_t addr_len = sizeof(addr); mca_btl_tcp_proc_t* btl_proc; - int retval; + bool sockopt = true; + size_t retval, len = strlen(mca_btl_tcp_magic_id_string); + mca_btl_tcp_endpoint_hs_msg_t hs_msg; + struct timeval save, tv; + socklen_t rcvtimeo_save_len = sizeof(save); + + /* Note: the socket will be in blocking mode during the initial handshake, + * hence setting
SO_RCVTIMEO to say 2 seconds here to avoid waiting + * forever when connecting to older versions (that reply to the + * handshake with only the guid) or when the remote side isn't OMPI + */ + + /* get the current timeout value so we can reset to it */ + if (0 != getsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, (void*)&save, &rcvtimeo_save_len)) { + if (ENOPROTOOPT == errno) { + sockopt = false; + } else { + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), + "getsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, ...)", + strerror(opal_socket_errno), opal_socket_errno); + return; + } + } else { + tv.tv_sec = 2; + tv.tv_usec = 0; + if (0 != setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv))) { + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), + "setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, ...)", + strerror(opal_socket_errno), opal_socket_errno); + return; + } + } OBJ_RELEASE(event); + retval = mca_btl_tcp_recv_blocking(sd, (void *)&hs_msg, sizeof(hs_msg)); + + /* If we get a zero-length message back, it's likely that we + connected to Open MPI peer process X simultaneously, and the + peer closed its connection to us (in favor of our connection to + them). This is not an error -- just close it and move on. + + Similarly, if we get less than sizeof(hs_msg) bytes, it + probably wasn't an Open MPI peer. But we don't really care, + because the peer closed the socket. So just close it and move + on. 
*/ + if (retval < sizeof(hs_msg)) { + const char *peer = opal_fd_get_peer_name(sd); + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "Peer %s closed socket without sending BTL TCP magic ID handshake (we received %d bytes out of the expected %d) -- closing/ignoring this connection", + peer, (int) retval, (int) sizeof(hs_msg)); + free((char*) peer); + CLOSE_THE_SOCKET(sd); + return; + } - /* recv the process identifier */ - retval = recv(sd, (char *)&guid, sizeof(guid), 0); - if(retval != sizeof(guid)) { + /* Open MPI uses a "magic" string to trivially verify that the + connecting process is a fellow Open MPI process. See if we got + the correct magic string. */ + guid = hs_msg.guid; + if (0 != strncmp(hs_msg.magic_id, mca_btl_tcp_magic_id_string, len)) { + const char *peer = opal_fd_get_peer_name(sd); + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "Peer %s sent us an incorrect Open MPI magic ID string (i.e., this was not a connection from the same version of Open MPI; expected \"%s\", received \"%.*s\")", + peer, + mca_btl_tcp_magic_id_string, + (int) MCA_BTL_TCP_MAGIC_STRING_LENGTH, hs_msg.magic_id); + free((char*) peer); + + /* The other side probably isn't OMPI, so just hang up */ CLOSE_THE_SOCKET(sd); return; } + + if (sockopt) { + /* reset the SO_RCVTIMEO option to its original value */ + if (0 != setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &save, sizeof(save))) { + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), + "setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, ...)", + strerror(opal_socket_errno), opal_socket_errno); + return; + } + } + OPAL_PROCESS_NAME_NTOH(guid); /* now set socket up to be non-blocking */ if((flags = fcntl(sd, F_GETFL, 0)) < 0) { - BTL_ERROR(("fcntl(F_GETFL) failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), "fcntl(sd, F_GETFL, 0)", + strerror(opal_socket_errno),
opal_socket_errno); + CLOSE_THE_SOCKET(sd); } else { flags |= O_NONBLOCK; if(fcntl(sd, F_SETFL, flags) < 0) { - BTL_ERROR(("fcntl(F_SETFL) failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), + "fcntl(sd, F_SETFL, flags | O_NONBLOCK)", + strerror(opal_socket_errno), opal_socket_errno); + CLOSE_THE_SOCKET(sd); } } /* lookup the corresponding process */ btl_proc = mca_btl_tcp_proc_lookup(&guid); if(NULL == btl_proc) { + opal_show_help("help-mpi-btl-tcp.txt", + "server accept cannot find guid", + true, opal_process_info.nodename, + getpid()); CLOSE_THE_SOCKET(sd); return; } /* lookup peer address */ if(getpeername(sd, (struct sockaddr*)&addr, &addr_len) != 0) { - BTL_ERROR(("getpeername() failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); + opal_show_help("help-mpi-btl-tcp.txt", + "server getpeername failed", + true, opal_process_info.nodename, + getpid(), + strerror(opal_socket_errno), opal_socket_errno); CLOSE_THE_SOCKET(sd); return; } /* are there any existing peer instances willing to accept this connection */ (void)mca_btl_tcp_proc_accept(btl_proc, (struct sockaddr*)&addr, sd); + + const char *str = opal_fd_get_peer_name(sd); + opal_output_verbose(10, opal_btl_base_framework.framework_output, + "btl:tcp: now connected to %s, process %s", str, + OPAL_NAME_PRINT(btl_proc->proc_opal->proc_name)); + free((char*) str); } diff --git a/opal/mca/btl/tcp/btl_tcp_endpoint.c b/opal/mca/btl/tcp/btl_tcp_endpoint.c index 9cd97e34b21..f8df420ff8e 100644 --- a/opal/mca/btl/tcp/btl_tcp_endpoint.c +++ b/opal/mca/btl/tcp/btl_tcp_endpoint.c @@ -62,6 +62,11 @@ #include "btl_tcp_frag.h" #include "btl_tcp_addr.h" +/* + * Magic ID string sent during connect/accept handshake + */ + +const char mca_btl_tcp_magic_id_string[MCA_BTL_TCP_MAGIC_STRING_LENGTH] = "OPAL-TCP-BTL"; /* * Initialize state of the endpoint instance.
@@ -371,48 +376,42 @@ int mca_btl_tcp_endpoint_send(mca_btl_base_endpoint_t* btl_endpoint, mca_btl_tcp /* - * A blocking send on a non-blocking socket. Used to send the small amount of connection - * information that identifies the endpoints endpoint. + * A blocking send on a non-blocking socket. Used to send the small + * amount of connection information that identifies the endpoint. */ static int mca_btl_tcp_endpoint_send_blocking(mca_btl_base_endpoint_t* btl_endpoint, - void* data, size_t size) + const void* data, size_t size) { - unsigned char* ptr = (unsigned char*)data; - size_t cnt = 0; - while(cnt < size) { - int retval = send(btl_endpoint->endpoint_sd, (const char *)ptr+cnt, size-cnt, 0); - if(retval < 0) { - if(opal_socket_errno != EINTR && opal_socket_errno != EAGAIN && opal_socket_errno != EWOULDBLOCK) { - BTL_ERROR(("send(%d, %p, %lu/%lu) failed: %s (%d)", - btl_endpoint->endpoint_sd, data, cnt, size, - strerror(opal_socket_errno), opal_socket_errno)); - btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED; - mca_btl_tcp_endpoint_close(btl_endpoint); - return -1; - } - continue; - } - cnt += retval; + int ret = mca_btl_tcp_send_blocking(btl_endpoint->endpoint_sd, data, size); + if (ret < 0) { + mca_btl_tcp_endpoint_close(btl_endpoint); } - return cnt; + return ret; } - /* * Send the globally unique identifier for this process to a endpoint on * a newly connected socket.
*/ - -static int mca_btl_tcp_endpoint_send_connect_ack(mca_btl_base_endpoint_t* btl_endpoint) +static int +mca_btl_tcp_endpoint_send_connect_ack(mca_btl_base_endpoint_t* btl_endpoint) { - /* send process identifier to remote endpoint */ opal_process_name_t guid = opal_proc_local_get()->proc_name; - OPAL_PROCESS_NAME_HTON(guid); - if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, sizeof(guid)) != - sizeof(guid)) { - return OPAL_ERR_UNREACH; + + mca_btl_tcp_endpoint_hs_msg_t hs_msg; + strcpy(hs_msg.magic_id, mca_btl_tcp_magic_id_string); + hs_msg.guid = guid; + + if(sizeof(hs_msg) != + mca_btl_tcp_endpoint_send_blocking(btl_endpoint, + &hs_msg, sizeof(hs_msg))) { + opal_show_help("help-mpi-btl-tcp.txt", "client handshake fail", + true, opal_process_info.nodename, + getpid(), + "connect ACK failed to send magic-id and guid"); + return OPAL_ERR_UNREACH; } return OPAL_SUCCESS; } @@ -464,6 +463,10 @@ static void *mca_btl_tcp_endpoint_complete_accept(int fd, int flags, void *conte mca_btl_tcp_endpoint_event_init(btl_endpoint); MCA_BTL_TCP_ENDPOINT_DUMP(10, btl_endpoint, true, "event_add(recv) [endpoint_accept]"); opal_event_add(&btl_endpoint->endpoint_recv_event, 0); + if( mca_btl_tcp_event_base == opal_sync_event_base ) { + /* If no progress thread then raise the awareness of the default progress engine */ + opal_progress_event_users_increment(); + } mca_btl_tcp_endpoint_connected(btl_endpoint); MCA_BTL_TCP_ENDPOINT_DUMP(10, btl_endpoint, true, "accepted"); @@ -513,6 +516,10 @@ void mca_btl_tcp_endpoint_close(mca_btl_base_endpoint_t* btl_endpoint) btl_endpoint->endpoint_retries++; MCA_BTL_TCP_ENDPOINT_DUMP(1, btl_endpoint, false, "event_del(recv) [close]"); opal_event_del(&btl_endpoint->endpoint_recv_event); + if( mca_btl_tcp_event_base == opal_sync_event_base ) { + /* If no progress thread then lower the awareness of the default progress engine */ + opal_progress_event_users_decrement(); + } MCA_BTL_TCP_ENDPOINT_DUMP(1, btl_endpoint, false, "event_del(send)
[close]"); opal_event_del(&btl_endpoint->endpoint_send_event); @@ -567,55 +574,27 @@ static void mca_btl_tcp_endpoint_connected(mca_btl_base_endpoint_t* btl_endpoint } -/* - * A blocking recv on a non-blocking socket. Used to receive the small - * amount of connection information that identifies the remote endpoint (guid). - */ -static int mca_btl_tcp_endpoint_recv_blocking(mca_btl_base_endpoint_t* btl_endpoint, void* data, size_t size) -{ - unsigned char* ptr = (unsigned char*)data; - size_t cnt = 0; - while(cnt < size) { - int retval = recv(btl_endpoint->endpoint_sd, (char *)ptr+cnt, size-cnt, 0); - - /* remote closed connection */ - if(retval == 0) { - mca_btl_tcp_endpoint_close(btl_endpoint); - return cnt; - } - - /* socket is non-blocking so handle errors */ - if(retval < 0) { - if(opal_socket_errno != EINTR && opal_socket_errno != EAGAIN && opal_socket_errno != EWOULDBLOCK) { - BTL_ERROR(("recv(%d, %lu/%lu) failed: %s (%d)", - btl_endpoint->endpoint_sd, cnt, size, strerror(opal_socket_errno), opal_socket_errno)); - btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED; - mca_btl_tcp_endpoint_close(btl_endpoint); - return -1; - } - continue; - } - cnt += retval; - } - return cnt; -} - - /* * Receive the endpoints globally unique process identification from a newly * connected socket and verify the expected response. If so, move the * socket to a connected state. + * + * NOTE: The return codes from this function are checked in + * mca_btl_tcp_endpoint_recv_handler(). Don't change them here + * without also changing the handling in _recv_handler()! 
*/ static int mca_btl_tcp_endpoint_recv_connect_ack(mca_btl_base_endpoint_t* btl_endpoint) { - size_t s; - opal_process_name_t guid; + size_t retval, len = strlen(mca_btl_tcp_magic_id_string); mca_btl_tcp_proc_t* btl_proc = btl_endpoint->endpoint_proc; + opal_process_name_t guid; - s = mca_btl_tcp_endpoint_recv_blocking(btl_endpoint, - &guid, sizeof(opal_process_name_t)); - if (s != sizeof(opal_process_name_t)) { - if (0 == s) { + mca_btl_tcp_endpoint_hs_msg_t hs_msg; + retval = mca_btl_tcp_recv_blocking(btl_endpoint->endpoint_sd, &hs_msg, sizeof(hs_msg)); + + if (sizeof(hs_msg) != retval) { + mca_btl_tcp_endpoint_close(btl_endpoint); + if (0 == retval) { /* If we get zero bytes, the peer closed the socket. This can happen when the two peers started the connection protocol simultaneously. Just report the problem @@ -624,10 +603,19 @@ static int mca_btl_tcp_endpoint_recv_connect_ack(mca_btl_base_endpoint_t* btl_en } opal_show_help("help-mpi-btl-tcp.txt", "client handshake fail", true, opal_process_info.nodename, - getpid(), - "did not receive entire connect ACK from peer"); - return OPAL_ERR_UNREACH; + getpid(), "did not receive entire connect ACK from peer"); + + return OPAL_ERR_BAD_PARAM; + } + if (0 != strncmp(hs_msg.magic_id, mca_btl_tcp_magic_id_string, len)) { + opal_show_help("help-mpi-btl-tcp.txt", "server did not receive magic string", + true, opal_process_info.nodename, + getpid(), "client", hs_msg.magic_id, + "string value"); + return OPAL_ERR_BAD_PARAM; } + + guid = hs_msg.guid; OPAL_PROCESS_NAME_NTOH(guid); /* compare this to the expected values */ /* TODO: this deserve a little bit more thinking as we are not supposed @@ -708,30 +696,76 @@ static int mca_btl_tcp_endpoint_start_connect(mca_btl_base_endpoint_t* btl_endpo /* setup the socket as non-blocking */ if((flags = fcntl(btl_endpoint->endpoint_sd, F_GETFL, 0)) < 0) { - BTL_ERROR(("fcntl(F_GETFL) failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); +
opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), "fcntl(sd, F_GETFL, 0)", + strerror(opal_socket_errno), opal_socket_errno); + /* Upper layer will handle the error */ + return OPAL_ERR_UNREACH; } else { flags |= O_NONBLOCK; - if(fcntl(btl_endpoint->endpoint_sd, F_SETFL, flags) < 0) - BTL_ERROR(("fcntl(F_SETFL) failed: %s (%d)", - strerror(opal_socket_errno), opal_socket_errno)); + if(fcntl(btl_endpoint->endpoint_sd, F_SETFL, flags) < 0) { + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), + "fcntl(sd, F_SETFL, flags | O_NONBLOCK)", + strerror(opal_socket_errno), opal_socket_errno); + /* Upper layer will handle the error */ + return OPAL_ERR_UNREACH; + } } /* start the connect - will likely fail with EINPROGRESS */ mca_btl_tcp_proc_tosocks(btl_endpoint->endpoint_addr, &endpoint_addr); - - opal_output_verbose(20, opal_btl_base_framework.framework_output, + + /* Bind the socket to one of the addresses associated with + * this btl module. This sets the source IP to one of the
This sets the source IP to one of the + * addresses shared in modex, so that the destination rank + * can properly pair btl modules, even in cases where Linux + * might do something unexpected with routing */ + opal_socklen_t sockaddr_addrlen = sizeof(struct sockaddr_storage); + if (endpoint_addr.ss_family == AF_INET) { + assert(NULL != &btl_endpoint->endpoint_btl->tcp_ifaddr); + if (bind(btl_endpoint->endpoint_sd, (struct sockaddr*) &btl_endpoint->endpoint_btl->tcp_ifaddr, + sockaddr_addrlen) < 0) { + BTL_ERROR(("bind() failed: %s (%d)", strerror(opal_socket_errno), opal_socket_errno)); + + CLOSE_THE_SOCKET(btl_endpoint->endpoint_sd); + return OPAL_ERROR; + } + } +#if OPAL_ENABLE_IPV6 + if (endpoint_addr.ss_family == AF_INET6) { + assert(NULL != &btl_endpoint->endpoint_btl->tcp_ifaddr_6); + if (bind(btl_endpoint->endpoint_sd, (struct sockaddr*) &btl_endpoint->endpoint_btl->tcp_ifaddr_6, + sockaddr_addrlen) < 0) { + BTL_ERROR(("bind() failed: %s (%d)", strerror(opal_socket_errno), opal_socket_errno)); + + CLOSE_THE_SOCKET(btl_endpoint->endpoint_sd); + return OPAL_ERROR; + } + } +#endif + opal_output_verbose(10, opal_btl_base_framework.framework_output, "btl: tcp: attempting to connect() to %s address %s on port %d", OPAL_NAME_PRINT(btl_endpoint->endpoint_proc->proc_opal->proc_name), opal_net_get_hostname((struct sockaddr*) &endpoint_addr), ntohs(btl_endpoint->endpoint_addr->addr_port)); if(0 == connect(btl_endpoint->endpoint_sd, (struct sockaddr*)&endpoint_addr, addrlen)) { + opal_output_verbose(10, opal_btl_base_framework.framework_output, + "btl:tcp: connect() to %s:%d completed", + opal_net_get_hostname((struct sockaddr*) &endpoint_addr), + ntohs(((struct sockaddr_in*) &endpoint_addr)->sin_port)); /* send our globally unique process identifier to the endpoint */ if((rc = mca_btl_tcp_endpoint_send_connect_ack(btl_endpoint)) == OPAL_SUCCESS) { btl_endpoint->endpoint_state = MCA_BTL_TCP_CONNECT_ACK; MCA_BTL_TCP_ENDPOINT_DUMP(10, btl_endpoint, true, "event_add(recv) 
[start_connect]"); opal_event_add(&btl_endpoint->endpoint_recv_event, 0); + if( mca_btl_tcp_event_base == opal_sync_event_base ) { + /* If no progress thread then raise the awareness of the default progress engine */ + opal_progress_event_users_increment(); + } return OPAL_SUCCESS; } /* We connected to the peer, but he close the socket before we got a chance to send our guid */ @@ -742,6 +776,8 @@ static int mca_btl_tcp_endpoint_start_connect(mca_btl_base_endpoint_t* btl_endpo btl_endpoint->endpoint_state = MCA_BTL_TCP_CONNECTING; MCA_BTL_TCP_ENDPOINT_DUMP(10, btl_endpoint, true, "event_add(send) [start_connect]"); MCA_BTL_TCP_ACTIVATE_EVENT(&btl_endpoint->endpoint_send_event, 0); + opal_output_verbose(30, opal_btl_base_framework.framework_output, + "btl:tcp: would block, so allowing background progress"); return OPAL_SUCCESS; } } @@ -765,7 +801,7 @@ static int mca_btl_tcp_endpoint_start_connect(mca_btl_base_endpoint_t* btl_endpo * later. Otherwise, send this processes identifier to the endpoint on the * newly connected socket.
*/ -static void mca_btl_tcp_endpoint_complete_connect(mca_btl_base_endpoint_t* btl_endpoint) +static int mca_btl_tcp_endpoint_complete_connect(mca_btl_base_endpoint_t* btl_endpoint) { int so_error = 0; opal_socklen_t so_length = sizeof(so_error); @@ -781,32 +817,53 @@ static void mca_btl_tcp_endpoint_complete_connect(mca_btl_base_endpoint_t* btl_e /* check connect completion status */ if(getsockopt(btl_endpoint->endpoint_sd, SOL_SOCKET, SO_ERROR, (char *)&so_error, &so_length) < 0) { - BTL_ERROR(("getsockopt() to %s failed: %s (%d)", + opal_show_help("help-mpi-btl-tcp.txt", "socket flag fail", + true, opal_process_info.nodename, + getpid(), "getsockopt(sd, SOL_SOCKET, SO_ERROR, ...)", + strerror(opal_socket_errno), opal_socket_errno); + BTL_ERROR(("getsockopt() to %s:%d failed: %s (%d)", opal_net_get_hostname((struct sockaddr*) &endpoint_addr), + ntohs(((struct sockaddr_in*) &endpoint_addr)->sin_port), strerror(opal_socket_errno), opal_socket_errno)); mca_btl_tcp_endpoint_close(btl_endpoint); - return; + return OPAL_ERROR; } if(so_error == EINPROGRESS || so_error == EWOULDBLOCK) { - return; + return OPAL_SUCCESS; } if(so_error != 0) { - BTL_ERROR(("connect() to %s failed: %s (%d)", - opal_net_get_hostname((struct sockaddr*) &endpoint_addr), - strerror(so_error), so_error)); + char *msg; + asprintf(&msg, "connect() to %s:%d failed", + opal_net_get_hostname((struct sockaddr*) &endpoint_addr), + ntohs(((struct sockaddr_in*) &endpoint_addr)->sin_port)); + opal_show_help("help-mpi-btl-tcp.txt", "client connect fail", + true, opal_process_info.nodename, + getpid(), msg, + strerror(so_error), so_error); + free(msg); mca_btl_tcp_endpoint_close(btl_endpoint); - return; + return OPAL_ERROR; } + opal_output_verbose(10, opal_btl_base_framework.framework_output, + "btl:tcp: connect() to %s:%d completed (complete_connect), sending connect ACK", + opal_net_get_hostname((struct sockaddr*) &endpoint_addr), + ntohs(((struct sockaddr_in*) &endpoint_addr)->sin_port)); +
if(mca_btl_tcp_endpoint_send_connect_ack(btl_endpoint) == OPAL_SUCCESS) { btl_endpoint->endpoint_state = MCA_BTL_TCP_CONNECT_ACK; opal_event_add(&btl_endpoint->endpoint_recv_event, 0); + if( mca_btl_tcp_event_base == opal_sync_event_base ) { + /* If no progress thread then raise the awareness of the default progress engine */ + opal_progress_event_users_increment(); + } MCA_BTL_TCP_ENDPOINT_DUMP(10, btl_endpoint, false, "event_add(recv) [complete_connect]"); - return; + return OPAL_SUCCESS; } MCA_BTL_TCP_ENDPOINT_DUMP(1, btl_endpoint, false, " [complete_connect]"); btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED; mca_btl_tcp_endpoint_close(btl_endpoint); + return OPAL_ERROR; } @@ -855,6 +912,26 @@ static void mca_btl_tcp_endpoint_recv_handler(int sd, short flags, void* user) OPAL_THREAD_UNLOCK(&btl_endpoint->endpoint_send_lock); MCA_BTL_TCP_ENDPOINT_DUMP(10, btl_endpoint, true, "connected"); } + else if (OPAL_ERR_BAD_PARAM == rc) { + /* If we get a BAD_PARAM, it means that it probably wasn't + an OMPI process on the other end of the socket (e.g., + the magic string ID failed). So we can probably just + close the socket and ignore this connection. */ + CLOSE_THE_SOCKET(sd); + } + else { + /* Otherwise, it probably *was* an OMPI peer process on + the other end, and something bad has probably + happened.
*/ + mca_btl_tcp_module_t *m = btl_endpoint->endpoint_btl; + + /* Fail up to the PML */ + if (NULL != m->tcp_error_cb) { + m->tcp_error_cb((mca_btl_base_module_t*) m, MCA_BTL_ERROR_FLAGS_FATAL, + btl_endpoint->endpoint_proc->proc_opal, + "TCP ACK is neither SUCCESS nor ERR (something bad has probably happened)"); + } + } OPAL_THREAD_UNLOCK(&btl_endpoint->endpoint_recv_lock); return; } diff --git a/opal/mca/btl/tcp/btl_tcp_endpoint.h b/opal/mca/btl/tcp/btl_tcp_endpoint.h index 5e405511911..70af7617e09 100644 --- a/opal/mca/btl/tcp/btl_tcp_endpoint.h +++ b/opal/mca/btl/tcp/btl_tcp_endpoint.h @@ -26,7 +26,7 @@ BEGIN_C_DECLS #define MCA_BTL_TCP_ENDPOINT_CACHE 1 - +#define MCA_BTL_TCP_MAGIC_STRING_LENGTH 16 /** * State of TCP endpoint connection. */ @@ -75,6 +75,14 @@ typedef struct mca_btl_base_endpoint_t mca_btl_base_endpoint_t; typedef mca_btl_base_endpoint_t mca_btl_tcp_endpoint_t; OBJ_CLASS_DECLARATION(mca_btl_tcp_endpoint_t); +/* Magic socket handshake string */ +extern const char mca_btl_tcp_magic_id_string[MCA_BTL_TCP_MAGIC_STRING_LENGTH]; + +typedef struct { + opal_process_name_t guid; + char magic_id[MCA_BTL_TCP_MAGIC_STRING_LENGTH]; +} mca_btl_tcp_endpoint_hs_msg_t; + void mca_btl_tcp_set_socket_options(int sd); void mca_btl_tcp_endpoint_close(mca_btl_base_endpoint_t*); int mca_btl_tcp_endpoint_send(mca_btl_base_endpoint_t*, struct mca_btl_tcp_frag_t*); diff --git a/opal/mca/btl/tcp/btl_tcp_frag.c b/opal/mca/btl/tcp/btl_tcp_frag.c index 08bf1536db2..56775067c9d 100644 --- a/opal/mca/btl/tcp/btl_tcp_frag.c +++ b/opal/mca/btl/tcp/btl_tcp_frag.c @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2016 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015-2016 Cisco Systems, Inc. All rights reserved. 
* $COPYRIGHT$ @@ -112,11 +112,11 @@ size_t mca_btl_tcp_frag_dump(mca_btl_tcp_frag_t* frag, char* msg, char* buf, siz bool mca_btl_tcp_frag_send(mca_btl_tcp_frag_t* frag, int sd) { - ssize_t cnt = -1; + ssize_t cnt; size_t i, num_vecs; /* non-blocking write, but continue if interrupted */ - while(cnt < 0) { + do { cnt = writev(sd, frag->iov_ptr, frag->iov_cnt); if(cnt < 0) { switch(opal_socket_errno) { @@ -140,11 +140,11 @@ bool mca_btl_tcp_frag_send(mca_btl_tcp_frag_t* frag, int sd) return false; } } - } + } while(cnt < 0); /* if the write didn't complete - update the iovec state */ num_vecs = frag->iov_cnt; - for(i=0; i= (ssize_t)frag->iov_ptr->iov_len) { cnt -= frag->iov_ptr->iov_len; frag->iov_ptr++; @@ -166,8 +166,8 @@ bool mca_btl_tcp_frag_send(mca_btl_tcp_frag_t* frag, int sd) bool mca_btl_tcp_frag_recv(mca_btl_tcp_frag_t* frag, int sd) { mca_btl_base_endpoint_t* btl_endpoint = frag->endpoint; - int i, num_vecs, dont_copy_data = 0; ssize_t cnt; + int32_t i, num_vecs, dont_copy_data = 0; repeat: num_vecs = frag->iov_cnt; @@ -208,8 +208,7 @@ bool mca_btl_tcp_frag_recv(mca_btl_tcp_frag_t* frag, int sd) #endif /* MCA_BTL_TCP_ENDPOINT_CACHE */ /* non-blocking read, but continue if interrupted */ - cnt = -1; - while( cnt < 0 ) { + do { cnt = readv(sd, frag->iov_ptr, num_vecs); if( 0 < cnt ) goto advance_iov_position; if( cnt == 0 ) { @@ -247,7 +246,7 @@ bool mca_btl_tcp_frag_recv(mca_btl_tcp_frag_t* frag, int sd) mca_btl_tcp_endpoint_close(btl_endpoint); return false; } - } + } while( cnt < 0 ); advance_iov_position: /* if the read didn't complete - update the iovec state */ @@ -291,6 +290,7 @@ bool mca_btl_tcp_frag_recv(mca_btl_tcp_frag_t* frag, int sd) goto repeat; } else if (frag->iov_idx == 2) { for( i = 0; i < frag->hdr.count; i++ ) { + if (btl_endpoint->endpoint_nbo) MCA_BTL_BASE_SEGMENT_NTOH(frag->segments[i]); frag->iov[i+2].iov_base = (IOVBASE_TYPE*)frag->segments[i].seg_addr.pval; frag->iov[i+2].iov_len = frag->segments[i].seg_len; } diff --git 
a/opal/mca/btl/tcp/btl_tcp_frag.h b/opal/mca/btl/tcp/btl_tcp_frag.h index b73da8f6edb..e1b068502f9 100644 --- a/opal/mca/btl/tcp/btl_tcp_frag.h +++ b/opal/mca/btl/tcp/btl_tcp_frag.h @@ -53,8 +53,8 @@ struct mca_btl_tcp_frag_t { mca_btl_tcp_hdr_t hdr; struct iovec iov[MCA_BTL_TCP_FRAG_IOVEC_NUMBER + 1]; struct iovec *iov_ptr; - size_t iov_cnt; - size_t iov_idx; + uint32_t iov_cnt; + uint32_t iov_idx; size_t size; uint16_t next_step; int rc; diff --git a/opal/mca/btl/tcp/btl_tcp_proc.c b/opal/mca/btl/tcp/btl_tcp_proc.c index 78cff8381db..9abba8fe660 100644 --- a/opal/mca/btl/tcp/btl_tcp_proc.c +++ b/opal/mca/btl/tcp/btl_tcp_proc.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2014 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -11,12 +11,12 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2010 Oracle and/or its affiliates. All rights reserved - * Copyright (c) 2013-2015 Intel, Inc. All rights reserved + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2015-2016 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2015-2017 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2015-2018 Cisco Systems, Inc. 
All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -417,10 +417,11 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, unsigned int perm_size; int rc, *a = NULL; size_t i, j; - mca_btl_tcp_interface_t** peer_interfaces; + mca_btl_tcp_interface_t** peer_interfaces = NULL; mca_btl_tcp_proc_data_t _proc_data, *proc_data=&_proc_data; size_t max_peer_interfaces; memset(proc_data, 0, sizeof(mca_btl_tcp_proc_data_t)); + char str_local[128], str_remote[128]; if (NULL == (proc_hostname = opal_get_proc_hostname(btl_proc->proc_opal))) { return OPAL_ERR_UNREACH; @@ -450,7 +451,11 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, max_peer_interfaces = proc_data->max_local_interfaces; peer_interfaces = (mca_btl_tcp_interface_t**)calloc( max_peer_interfaces, sizeof(mca_btl_tcp_interface_t*) ); - assert(NULL != peer_interfaces); + if (NULL == peer_interfaces) { + max_peer_interfaces = 0; + rc = OPAL_ERR_OUT_OF_RESOURCE; + goto exit; + } proc_data->num_peer_interfaces = 0; memset(proc_data->peer_kindex_to_index, -1, sizeof(int)*MAX_KERNEL_INTERFACE_INDEX); @@ -476,8 +481,9 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, max_peer_interfaces <<= 1; peer_interfaces = (mca_btl_tcp_interface_t**)realloc( peer_interfaces, max_peer_interfaces * sizeof(mca_btl_tcp_interface_t*) ); - if( NULL == peer_interfaces ) + if( NULL == peer_interfaces ) { return OPAL_ERR_OUT_OF_RESOURCE; + } } peer_interfaces[index] = (mca_btl_tcp_interface_t *) malloc(sizeof(mca_btl_tcp_interface_t)); mca_btl_tcp_initialise_interface(peer_interfaces[index], @@ -485,10 +491,10 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, } /* - * in case one of the peer addresses is already in use, + * in case the peer address has created all intended connections, * mark the complete peer interface as 'not available' */ - if(endpoint_addr->addr_inuse) { + if(endpoint_addr->addr_inuse >= mca_btl_tcp_component.tcp_num_links) { peer_interfaces[index]->inuse = 
1; } @@ -508,10 +514,7 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, default: opal_output(0, "unknown address family for tcp: %d\n", endpoint_addr_ss.ss_family); - /* - * return OPAL_UNREACH or some error, as this is not - * good - */ + return OPAL_ERR_UNREACH; } } @@ -553,14 +556,26 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, if(NULL != proc_data->local_interfaces[i]->ipv4_address && NULL != peer_interfaces[j]->ipv4_address) { + /* Convert the IPv4 addresses into nicely-printable strings for verbose debugging output */ + inet_ntop(AF_INET, &(((struct sockaddr_in*) proc_data->local_interfaces[i]->ipv4_address))->sin_addr, + str_local, sizeof(str_local)); + inet_ntop(AF_INET, &(((struct sockaddr_in*) peer_interfaces[j]->ipv4_address))->sin_addr, + str_remote, sizeof(str_remote)); + if(opal_net_addr_isipv4public((struct sockaddr*) local_interface->ipv4_address) && opal_net_addr_isipv4public((struct sockaddr*) peer_interfaces[j]->ipv4_address)) { if(opal_net_samenetwork((struct sockaddr*) local_interface->ipv4_address, (struct sockaddr*) peer_interfaces[j]->ipv4_address, local_interface->ipv4_netmask)) { proc_data->weights[i][j] = CQ_PUBLIC_SAME_NETWORK; + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "btl:tcp: path from %s to %s: IPV4 PUBLIC SAME NETWORK", + str_local, str_remote); } else { proc_data->weights[i][j] = CQ_PUBLIC_DIFFERENT_NETWORK; + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "btl:tcp: path from %s to %s: IPV4 PUBLIC DIFFERENT NETWORK", + str_local, str_remote); } proc_data->best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr; continue; @@ -569,8 +584,14 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, (struct sockaddr*) peer_interfaces[j]->ipv4_address, local_interface->ipv4_netmask)) { proc_data->weights[i][j] = CQ_PRIVATE_SAME_NETWORK; + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "btl:tcp: path from %s to %s: IPV4 PRIVATE SAME 
NETWORK", + str_local, str_remote); } else { proc_data->weights[i][j] = CQ_PRIVATE_DIFFERENT_NETWORK; + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "btl:tcp: path from %s to %s: IPV4 PRIVATE DIFFERENT NETWORK", + str_local, str_remote); } proc_data->best_addr[i][j] = peer_interfaces[j]->ipv4_endpoint_addr; continue; @@ -582,12 +603,24 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, if(NULL != local_interface->ipv6_address && NULL != peer_interfaces[j]->ipv6_address) { + /* Convert the IPv6 addresses into nicely-printable strings for verbose debugging output */ + inet_ntop(AF_INET6, &(((struct sockaddr_in6*) local_interface->ipv6_address))->sin6_addr, + str_local, sizeof(str_local)); + inet_ntop(AF_INET6, &(((struct sockaddr_in6*) peer_interfaces[j]->ipv6_address))->sin6_addr, + str_remote, sizeof(str_remote)); + if(opal_net_samenetwork((struct sockaddr*) local_interface->ipv6_address, (struct sockaddr*) peer_interfaces[j]->ipv6_address, local_interface->ipv6_netmask)) { proc_data->weights[i][j] = CQ_PUBLIC_SAME_NETWORK; + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "btl:tcp: path from %s to %s: IPV6 PUBLIC SAME NETWORK", + str_local, str_remote); } else { proc_data->weights[i][j] = CQ_PUBLIC_DIFFERENT_NETWORK; + opal_output_verbose(20, opal_btl_base_framework.framework_output, + "btl:tcp: path from %s to %s: IPV6 PUBLIC DIFFERENT NETWORK", + str_local, str_remote); } proc_data->best_addr[i][j] = peer_interfaces[j]->ipv6_endpoint_addr; continue; @@ -605,7 +638,8 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, a = (int *) malloc(perm_size * sizeof(int)); if (NULL == a) { - return OPAL_ERR_OUT_OF_RESOURCE; + rc = OPAL_ERR_OUT_OF_RESOURCE; + goto exit; } /* Can only find the best set of connections when the number of @@ -660,7 +694,16 @@ int mca_btl_tcp_proc_insert( mca_btl_tcp_proc_t* btl_proc, rc = OPAL_SUCCESS; } } + if (OPAL_ERR_UNREACH == rc) { + opal_output_verbose(10, 
opal_btl_base_framework.framework_output, + "btl:tcp: host %s, process %s UNREACHABLE", + proc_hostname, + OPAL_NAME_PRINT(btl_proc->proc_opal->proc_name)); + } + exit: + // Ok to always free because proc_data was memset() to 0 before + // any possible return (and free(NULL) is fine). for(i = 0; i < perm_size; ++i) { free(proc_data->weights[i]); free(proc_data->best_addr[i]); @@ -716,7 +759,7 @@ int mca_btl_tcp_proc_remove(mca_btl_tcp_proc_t* btl_proc, mca_btl_base_endpoint_ OBJ_RELEASE(btl_proc); return OPAL_SUCCESS; } - /* The endpoint_addr may still be NULL if this enpoint is + /* The endpoint_addr may still be NULL if this endpoint is being removed early in the wireup sequence (e.g., if it is unreachable by all other procs) */ if (NULL != btl_endpoint->endpoint_addr) { @@ -776,10 +819,15 @@ mca_btl_tcp_proc_t* mca_btl_tcp_proc_lookup(const opal_process_name_t *name) void mca_btl_tcp_proc_accept(mca_btl_tcp_proc_t* btl_proc, struct sockaddr* addr, int sd) { OPAL_THREAD_LOCK(&btl_proc->proc_lock); + int found_match = 0; + mca_btl_base_endpoint_t* match_btl_endpoint; + for( size_t i = 0; i < btl_proc->proc_endpoint_count; i++ ) { mca_btl_base_endpoint_t* btl_endpoint = btl_proc->proc_endpoints[i]; - /* Check all conditions before going to try to accept the connection. */ - if( btl_endpoint->endpoint_addr->addr_family != addr->sa_family ) { + /* We are not here to decide what is a good socket + * and what is not. We simply check that this socket fits the endpoint + * and prepare for the real decision function mca_btl_tcp_endpoint_accept.
*/ + if( btl_endpoint->endpoint_addr->addr_family != addr->sa_family) { continue; } switch (addr->sa_family) { @@ -797,6 +845,10 @@ void mca_btl_tcp_proc_accept(mca_btl_tcp_proc_t* btl_proc, struct sockaddr* addr tmp[1], 16), (int)i, (int)btl_proc->proc_endpoint_count); continue; + } else if (btl_endpoint->endpoint_state != MCA_BTL_TCP_CLOSED) { + found_match = 1; + match_btl_endpoint = btl_endpoint; + continue; } break; #if OPAL_ENABLE_IPV6 @@ -821,31 +873,37 @@ void mca_btl_tcp_proc_accept(mca_btl_tcp_proc_t* btl_proc, struct sockaddr* addr ; } + /* Set state to CONNECTING to ensure that subsequent connections do not attempt to re-use the endpoint in the num_links > 1 case */ + btl_endpoint->endpoint_state = MCA_BTL_TCP_CONNECTING; (void)mca_btl_tcp_endpoint_accept(btl_endpoint, addr, sd); OPAL_THREAD_UNLOCK(&btl_proc->proc_lock); return; } + /* In this case the connection was inbound to an exported address, but the endpoint was not in a CLOSED state. + * mca_btl_tcp_endpoint_accept() has logic to deal with the race condition that has likely caused this + * scenario, so call it here. */ + if (found_match) { + (void)mca_btl_tcp_endpoint_accept(match_btl_endpoint, addr, sd); + OPAL_THREAD_UNLOCK(&btl_proc->proc_lock); + return; + } /* No further use of this socket.
Close it */ CLOSE_THE_SOCKET(sd); { - size_t len = 1024; - char* addr_str = (char*)malloc(len); - if( NULL != addr_str ) { - memset(addr_str, 0, len); - for (size_t i = 0; i < btl_proc->proc_endpoint_count; i++) { - mca_btl_base_endpoint_t* btl_endpoint = btl_proc->proc_endpoints[i]; - if (btl_endpoint->endpoint_addr->addr_family != addr->sa_family) { - continue; - } - - if (addr_str[0] != '\0') { - strncat(addr_str, ", ", len); - len -= 2; - } - strncat(addr_str, inet_ntop(AF_INET6, (void*)(struct in6_addr*)&btl_endpoint->endpoint_addr->addr_inet, - addr_str + 1024 - len, INET6_ADDRSTRLEN), len); - len = 1024 - strlen(addr_str); + char *addr_str = NULL, *tmp, *pnet; + for (size_t i = 0; i < btl_proc->proc_endpoint_count; i++) { + mca_btl_base_endpoint_t* btl_endpoint = btl_proc->proc_endpoints[i]; + if (btl_endpoint->endpoint_addr->addr_family != addr->sa_family) { + continue; } + pnet = opal_net_get_hostname((struct sockaddr*)&btl_endpoint->endpoint_addr->addr_inet); + if (NULL == addr_str) { + (void)asprintf(&tmp, "\n\t%s", pnet); + } else { + (void)asprintf(&tmp, "%s\n\t%s", addr_str, pnet); + free(addr_str); + } + addr_str = tmp; } opal_show_help("help-mpi-btl-tcp.txt", "dropped inbound connection", true, opal_process_info.nodename, @@ -853,8 +911,10 @@ void mca_btl_tcp_proc_accept(mca_btl_tcp_proc_t* btl_proc, struct sockaddr* addr btl_proc->proc_opal->proc_hostname, OPAL_NAME_PRINT(btl_proc->proc_opal->proc_name), opal_net_get_hostname((struct sockaddr*)addr), - addr_str); - free(addr_str); + (NULL == addr_str) ? "NONE" : addr_str); + if (NULL != addr_str) { + free(addr_str); + } } OPAL_THREAD_UNLOCK(&btl_proc->proc_lock); } diff --git a/opal/mca/btl/tcp/help-mpi-btl-tcp.txt b/opal/mca/btl/tcp/help-mpi-btl-tcp.txt index a5781f7ed0c..d513ef4db21 100644 --- a/opal/mca/btl/tcp/help-mpi-btl-tcp.txt +++ b/opal/mca/btl/tcp/help-mpi-btl-tcp.txt @@ -1,6 +1,6 @@ # -*- text -*- # -# Copyright (c) 2009-2017 Cisco Systems, Inc. 
All rights reserved +# Copyright (c) 2009-2018 Cisco Systems, Inc. All rights reserved # Copyright (c) 2015-2016 The University of Tennessee and The University # of Tennessee Research Foundation. All rights # reserved. @@ -35,7 +35,7 @@ values are in the range [1 .. 2^16-1]. This value will be ignored WARNING: Open MPI failed to TCP connect to a peer MPI process. This should not happen. -Your Open MPI job may now fail. +Your Open MPI job may now hang or fail. Local host: %s PID: %d @@ -46,7 +46,7 @@ Your Open MPI job may now fail. WARNING: Open MPI failed to handshake with a connecting peer MPI process over TCP. This should not happen. -Your Open MPI job may now fail. +Your Open MPI job may now hang or fail. Local host: %s PID: %d @@ -100,3 +100,70 @@ hopefully be able to continue). Peer hostname: %s (%s) Source IP of socket: %s Known IPs of peer: %s +# +[socket flag fail] +WARNING: Open MPI failed to get or set flags on a TCP socket. This +should not happen. + +This may cause unpredictable behavior, and may end up hanging or +aborting your job. + + Local host: %s + PID: %d + Flag: %s + Error: %s (%d) +# +[server did not get guid] +WARNING: Open MPI accepted a TCP connection from what appears to be +another Open MPI process but the peer process did not complete the +initial handshake properly. This should not happen. + +This attempted connection will be ignored; your MPI job may or may not +continue properly. + + Local host: %s + PID: %d +# +[server accept cannot find guid] +WARNING: Open MPI accepted a TCP connection from what appears to be +another Open MPI process but cannot find a corresponding process +entry for that peer. + +This attempted connection will be ignored; your MPI job may or may not +continue properly. + + Local host: %s + PID: %d +# +[server getpeername failed] +WARNING: Open MPI failed to look up the peer IP address information of +a TCP connection that it just accepted. This should not happen.
+ +This attempted connection will be ignored; your MPI job may or may not +continue properly. + + Local host: %s + PID: %d + Error: %s (%d) +# +[server cannot find endpoint] +WARNING: Open MPI accepted a TCP connection from what appears to be a +valid peer Open MPI process but cannot find a corresponding endpoint +entry for that peer. This should not happen. + +This attempted connection will be ignored; your MPI job may or may not +continue properly. + + Local host: %s + PID: %d +# +[client connect fail] +WARNING: Open MPI failed to TCP connect to a peer MPI process via +TCP. This should not happen. + +Your Open MPI job may now fail. + + Local host: %s + PID: %d + Message: %s + Error: %s (%d) diff --git a/opal/mca/btl/template/Makefile.am b/opal/mca/btl/template/Makefile.am index 4257b99fb98..176a5958984 100644 --- a/opal/mca/btl/template/Makefile.am +++ b/opal/mca/btl/template/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -51,6 +52,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component) mca_btl_template_la_SOURCES = $(component_sources) mca_btl_template_la_LDFLAGS = -module -avoid-version +mca_btl_template_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(lib) libmca_btl_template_la_SOURCES = $(lib_sources) diff --git a/opal/mca/btl/ugni/Makefile.am b/opal/mca/btl/ugni/Makefile.am index 2e9153641eb..958e12aec58 100644 --- a/opal/mca/btl/ugni/Makefile.am +++ b/opal/mca/btl/ugni/Makefile.am @@ -48,7 +48,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_ugni_la_SOURCES = $(ugni_SOURCES) nodist_mca_btl_ugni_la_SOURCES = $(ugni_nodist_SOURCES) -mca_btl_ugni_la_LIBADD = $(btl_ugni_LIBS) +mca_btl_ugni_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(btl_ugni_LIBS) mca_btl_ugni_la_LDFLAGS = -module -avoid-version $(btl_ugni_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/opal/mca/btl/ugni/btl_ugni_add_procs.c b/opal/mca/btl/ugni/btl_ugni_add_procs.c index 0634977f966..6a2fa9b81e2 100644 --- a/opal/mca/btl/ugni/btl_ugni_add_procs.c +++ b/opal/mca/btl/ugni/btl_ugni_add_procs.c @@ -3,7 +3,7 @@ * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2011 UT-Battelle, LLC. All rights reserved. - * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -35,6 +35,7 @@ int mca_btl_ugni_add_procs (struct mca_btl_base_module_t* btl, size_t nprocs, mca_btl_ugni_module_t *ugni_module = (mca_btl_ugni_module_t *) btl; int rc; void *mmap_start_addr; + struct timeval tv = {.tv_sec = 0, .tv_usec = MCA_BTL_UGNI_CONNECT_USEC}; if (false == ugni_module->initialized) { @@ -156,7 +157,7 @@ int mca_btl_ugni_add_procs (struct mca_btl_base_module_t* btl, size_t nprocs, mca_btl_ugni_spawn_progress_thread(btl); } - opal_event_evtimer_add (&ugni_module->connection_event, (&(struct timeval) {.tv_sec = 0, .tv_usec = MCA_BTL_UGNI_CONNECT_USEC})); + opal_event_evtimer_add (&ugni_module->connection_event, &tv); ugni_module->initialized = true; } @@ -271,7 +272,7 @@ static int ugni_reg_mem (void *reg_data, void *base, size_t size, rc = mca_btl_ugni_reg_mem (ugni_module, base, size, (mca_btl_ugni_reg_t *) reg, cq, flags); if (OPAL_LIKELY(OPAL_SUCCESS == rc)) { - opal_atomic_add_32(&ugni_module->reg_count,1); + opal_atomic_add_fetch_32(&ugni_module->reg_count,1); } return rc; @@ -285,7 +286,7 @@ ugni_dereg_mem (void *reg_data, mca_rcache_base_registration_t *reg) rc = mca_btl_ugni_dereg_mem (ugni_module, (mca_btl_ugni_reg_t *) reg); if (OPAL_LIKELY(OPAL_SUCCESS == rc)) { - opal_atomic_add_32(&ugni_module->reg_count,-1); + opal_atomic_add_fetch_32(&ugni_module->reg_count,-1); } return rc; diff --git a/opal/mca/btl/ugni/btl_ugni_component.c b/opal/mca/btl/ugni/btl_ugni_component.c index 602fb1b589a..cafcdabfc37 100644 --- a/opal/mca/btl/ugni/btl_ugni_component.c +++ b/opal/mca/btl/ugni/btl_ugni_component.c @@ -3,6 +3,7 @@ * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2011 UT-Battelle, LLC. All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -511,7 +512,7 @@ int mca_btl_ugni_progress_datagram (mca_btl_ugni_device_t *device) return rc; } - BTL_VERBOSE(("remote datagram completion on handle %p", handle)); + BTL_VERBOSE(("remote datagram completion on handle %p", (void*)handle)); /* if this is a wildcard endpoint lookup the remote peer by the proc id we received */ if (handle == ugni_module->wildcard_ep) { @@ -542,7 +543,7 @@ int mca_btl_ugni_progress_datagram (mca_btl_ugni_device_t *device) BTL_VERBOSE(("directed datagram complete for endpoint %p", (void *) ep)); ep->dg_posted = false; - (void) opal_atomic_add_32 (&ugni_module->active_datagrams, -1); + (void) opal_atomic_add_fetch_32 (&ugni_module->active_datagrams, -1); } (void) mca_btl_ugni_ep_connect_progress (ep); @@ -630,7 +631,7 @@ static inline int mca_btl_ugni_progress_rdma (mca_btl_ugni_module_t *ugni_module BTL_VERBOSE(("got %d completed rdma descriptors", rc)); for (int i = 0 ; i < rc ; ++i) { - BTL_VERBOSE(("post descriptor %p complete. GNI_CQ_STATUS_OK(): %d", post_desc[i], + BTL_VERBOSE(("post descriptor %p complete. GNI_CQ_STATUS_OK(): %d", (void*)post_desc[i], GNI_CQ_STATUS_OK(event_data[i]))); if (OPAL_UNLIKELY(!GNI_CQ_STATUS_OK(event_data[i]))) { diff --git a/opal/mca/btl/ugni/btl_ugni_device.h b/opal/mca/btl/ugni/btl_ugni_device.h index 18a3b46416f..829869ed3c8 100644 --- a/opal/mca/btl/ugni/btl_ugni_device.h +++ b/opal/mca/btl/ugni/btl_ugni_device.h @@ -5,6 +5,7 @@ * Copyright (c) 2011 UT-Battelle, LLC. All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -121,7 +122,7 @@ static inline intptr_t mca_btl_ugni_post_fma_device (mca_btl_ugni_device_t *devi } BTL_VERBOSE(("Posting FMA descriptor %p with op_type %d, amo %d, ep_handle %p, remote_addr 0x%lx, " - "length %lu", desc, desc->desc.type, desc->desc.amo_cmd, desc->ep_handle, + "length %lu", (void*)desc, desc->desc.type, desc->desc.amo_cmd, (void*)desc->ep_handle, desc->desc.remote_addr, desc->desc.length)); rc = GNI_PostFma (desc->ep_handle->gni_handle, &desc->desc); @@ -160,7 +161,7 @@ static inline intptr_t mca_btl_ugni_post_rdma_device (mca_btl_ugni_device_t *dev desc->desc.src_cq_hndl = desc->cq->gni_handle; BTL_VERBOSE(("Posting RDMA descriptor %p with op_type %d, ep_handle %p, remote_addr 0x%lx, " - "length %lu", desc, desc->desc.type, desc->ep_handle, desc->desc.remote_addr, + "length %lu", (void*)desc, desc->desc.type, (void*)desc->ep_handle, desc->desc.remote_addr, desc->desc.length)); rc = GNI_PostRdma (desc->ep_handle->gni_handle, &desc->desc); diff --git a/opal/mca/btl/ugni/btl_ugni_endpoint.c b/opal/mca/btl/ugni/btl_ugni_endpoint.c index b1369a1ac3e..2f792839982 100644 --- a/opal/mca/btl/ugni/btl_ugni_endpoint.c +++ b/opal/mca/btl/ugni/btl_ugni_endpoint.c @@ -3,6 +3,7 @@ * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2011-2013 UT-Battelle, LLC. All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -51,7 +52,7 @@ static int mca_btl_ugni_endpoint_get_modex (mca_btl_base_endpoint_t *ep) ep->ep_rem_id = modex->id; - BTL_VERBOSE(("received modex for ep %p. addr: %d, id: %d", ep, ep->ep_rem_addr, ep->ep_rem_id)); + BTL_VERBOSE(("received modex for ep %p. 
addr: %d, id: %d", (void*)ep, ep->ep_rem_addr, ep->ep_rem_id)); free (modex); @@ -180,7 +181,7 @@ int mca_btl_ugni_ep_disconnect (mca_btl_base_endpoint_t *ep, bool send_disconnec } } while (device->dev_smsg_local_cq.active_operations); - (void) opal_atomic_add_32 (&ep->smsg_ep_handle->device->smsg_connections, -1); + (void) opal_atomic_add_fetch_32 (&ep->smsg_ep_handle->device->smsg_connections, -1); } mca_btl_ugni_device_lock (device); @@ -277,7 +278,7 @@ static inline int mca_btl_ugni_ep_connect_finish (mca_btl_base_endpoint_t *ep) { ep->rmt_irq_mem_hndl = ep->remote_attr->rmt_irq_mem_hndl; ep->state = MCA_BTL_UGNI_EP_STATE_CONNECTED; - (void) opal_atomic_add_32 (&ep->smsg_ep_handle->device->smsg_connections, 1); + (void) opal_atomic_add_fetch_32 (&ep->smsg_ep_handle->device->smsg_connections, 1); /* send all pending messages */ BTL_VERBOSE(("endpoint connected. posting %u sends", (unsigned int) opal_list_get_size (&ep->frag_wait_list))); @@ -301,7 +302,6 @@ static inline int mca_btl_ugni_ep_connect_finish (mca_btl_base_endpoint_t *ep) { static int mca_btl_ugni_directed_ep_post (mca_btl_base_endpoint_t *ep) { mca_btl_ugni_module_t *ugni_module = mca_btl_ugni_ep_btl (ep); - mca_btl_ugni_device_t *device = ep->smsg_ep_handle->device; gni_return_t rc; BTL_VERBOSE(("posting directed datagram to remote id: %d for endpoint %p", ep->ep_rem_id, (void *)ep)); @@ -312,7 +312,7 @@ static int mca_btl_ugni_directed_ep_post (mca_btl_base_endpoint_t *ep) ep->remote_attr, sizeof (*ep->remote_attr), MCA_BTL_UGNI_CONNECT_DIRECTED_ID | ep->index); if (OPAL_LIKELY(GNI_RC_SUCCESS == rc)) { - (void) opal_atomic_add_32 (&ugni_module->active_datagrams, 1); + (void) opal_atomic_add_fetch_32 (&ugni_module->active_datagrams, 1); } return mca_btl_rc_ugni_to_opal (rc); @@ -351,8 +351,8 @@ int mca_btl_ugni_ep_connect_progress (mca_btl_base_endpoint_t *ep) } } - BTL_VERBOSE(("ep->remote_attr->smsg_attr = {.msg_type = %d, .msg_buffer = 0x%lx}", ep->remote_attr->smsg_attr.msg_type, - 
ep->remote_attr->smsg_attr.msg_buffer)); + BTL_VERBOSE(("ep->remote_attr->smsg_attr = {.msg_type = %d, .msg_buffer = %p}", ep->remote_attr->smsg_attr.msg_type, + (void*)ep->remote_attr->smsg_attr.msg_buffer)); if (GNI_SMSG_TYPE_INVALID == ep->remote_attr->smsg_attr.msg_type) { /* use datagram to exchange connection information with the remote peer */ diff --git a/opal/mca/btl/ugni/btl_ugni_frag.h b/opal/mca/btl/ugni/btl_ugni_frag.h index bb8a58cbc8b..ac9c8bc6ec8 100644 --- a/opal/mca/btl/ugni/btl_ugni_frag.h +++ b/opal/mca/btl/ugni/btl_ugni_frag.h @@ -192,7 +192,7 @@ static inline bool mca_btl_ugni_frag_del_ref (mca_btl_ugni_base_frag_t *frag, in opal_atomic_mb (); - ref_cnt = OPAL_THREAD_ADD32(&frag->ref_cnt, -1); + ref_cnt = OPAL_THREAD_ADD_FETCH32(&frag->ref_cnt, -1); if (ref_cnt) { assert (ref_cnt > 0); return false; diff --git a/opal/mca/btl/ugni/btl_ugni_get.c b/opal/mca/btl/ugni/btl_ugni_get.c index 1f8ab248b03..07066c1b82b 100644 --- a/opal/mca/btl/ugni/btl_ugni_get.c +++ b/opal/mca/btl/ugni/btl_ugni_get.c @@ -134,7 +134,7 @@ int mca_btl_ugni_start_eager_get (mca_btl_base_endpoint_t *endpoint, { mca_btl_ugni_module_t *ugni_module = mca_btl_ugni_ep_btl (endpoint); size_t size; - int rc; + int rc = OPAL_SUCCESS; BTL_VERBOSE(("starting eager get for remote ctx: %p", hdr.eager.ctx)); diff --git a/opal/mca/btl/ugni/btl_ugni_module.c b/opal/mca/btl/ugni/btl_ugni_module.c index 0826cc2ba41..f4ade03b98b 100644 --- a/opal/mca/btl/ugni/btl_ugni_module.c +++ b/opal/mca/btl/ugni/btl_ugni_module.c @@ -5,6 +5,7 @@ * Copyright (c) 2011 UT-Battelle, LLC. All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -61,14 +62,15 @@ mca_btl_ugni_module_t mca_btl_ugni_module = { } }; -static void *mca_btl_ugni_datagram_event (int foo, short bar, void *arg) +static void mca_btl_ugni_datagram_event (int foo, short bar, void *arg) { mca_btl_ugni_module_t *ugni_module = (mca_btl_ugni_module_t *) arg; mca_btl_ugni_device_t *device = ugni_module->devices; + struct timeval tv = {.tv_sec = 0, .tv_usec = MCA_BTL_UGNI_CONNECT_USEC}; mca_btl_ugni_progress_datagram (device); - opal_event_evtimer_add (&ugni_module->connection_event, (&(struct timeval) {.tv_sec = 0, .tv_usec = MCA_BTL_UGNI_CONNECT_USEC})); + opal_event_evtimer_add (&ugni_module->connection_event, &tv); } int diff --git a/opal/mca/btl/ugni/btl_ugni_send.c b/opal/mca/btl/ugni/btl_ugni_send.c index 0a018cbbd13..5b120b75965 100644 --- a/opal/mca/btl/ugni/btl_ugni_send.c +++ b/opal/mca/btl/ugni/btl_ugni_send.c @@ -1,10 +1,11 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* -*- Mode: C; c-basic-offset:3 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2011-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2011 UT-Battelle, LLC. All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -20,7 +21,7 @@ void mca_btl_ugni_wait_list_append (mca_btl_ugni_module_t *ugni_module, mca_btl_base_endpoint_t *endpoint, mca_btl_ugni_base_frag_t *frag) { - BTL_VERBOSE(("wait-listing fragment %p to %s. endpoint state %d\n", frag, OPAL_NAME_PRINT(endpoint->peer_proc->proc_name), endpoint->state)); + BTL_VERBOSE(("wait-listing fragment %p to %s. 
endpoint state %d\n", (void*)frag, OPAL_NAME_PRINT(endpoint->peer_proc->proc_name), endpoint->state)); frag->base.des_flags |= MCA_BTL_DES_SEND_ALWAYS_CALLBACK; @@ -150,7 +151,6 @@ int mca_btl_ugni_sendi (struct mca_btl_base_module_t *btl, rc = mca_btl_ugni_send_frag (endpoint, frag); if (OPAL_UNLIKELY(OPAL_SUCCESS != rc)) { - mca_btl_ugni_frag_return (frag); break; } diff --git a/opal/mca/btl/ugni/btl_ugni_smsg.c b/opal/mca/btl/ugni/btl_ugni_smsg.c index 0e338cc7603..b90c95a6a9e 100644 --- a/opal/mca/btl/ugni/btl_ugni_smsg.c +++ b/opal/mca/btl/ugni/btl_ugni_smsg.c @@ -59,12 +59,13 @@ int mca_btl_ugni_smsg_process (mca_btl_base_endpoint_t *ep) mca_btl_ugni_base_frag_t frag; mca_btl_base_segment_t seg; bool disconnect = false; + int32_t _tmp_value = 0; uintptr_t data_ptr; gni_return_t rc; uint32_t len; int count = 0; - if (!opal_atomic_cmpset_32 (&ep->smsg_progressing, 0, 1)) { + if (!opal_atomic_compare_exchange_strong_32 (&ep->smsg_progressing, &_tmp_value, 1)) { /* already progressing (we can't support reentry here) */ return 0; } diff --git a/opal/mca/btl/usnic/Makefile.am b/opal/mca/btl/usnic/Makefile.am index 76f49a08aef..ecd3099dc67 100644 --- a/opal/mca/btl/usnic/Makefile.am +++ b/opal/mca/btl/usnic/Makefile.am @@ -13,7 +13,9 @@ # reserved. # Copyright (c) 2010-2015 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2015 Intel, Inc. All rights reserved. -# Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2016-2017 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -21,7 +23,7 @@ # $HEADER$ # -AM_CPPFLAGS = -DBTL_IN_OPAL=1 $(opal_common_libfabric_CPPFLAGS) -DOMPI_LIBMPI_NAME=\"$(OMPI_LIBMPI_NAME)\" +AM_CPPFLAGS = -DBTL_IN_OPAL=1 $(opal_common_ofi_CPPFLAGS) -DOMPI_LIBMPI_NAME=\"$(OMPI_LIBMPI_NAME)\" EXTRA_DIST = README.txt README.test @@ -29,8 +31,7 @@ dist_opaldata_DATA = \ help-mpi-btl-usnic.txt test_sources = \ - test/btl_usnic_component_test.h \ - test/btl_usnic_graph_test.h + test/btl_usnic_component_test.h sources = \ btl_usnic_compat.h \ @@ -48,8 +49,6 @@ sources = \ btl_usnic_endpoint.h \ btl_usnic_frag.c \ btl_usnic_frag.h \ - btl_usnic_graph.h \ - btl_usnic_graph.c \ btl_usnic_hwloc.c \ btl_usnic_hwloc.h \ btl_usnic_map.c \ @@ -90,8 +89,8 @@ mca_btl_usnic_la_SOURCES = $(component_sources) mca_btl_usnic_la_LDFLAGS = \ $(opal_btl_usnic_LDFLAGS) \ -module -avoid-version -mca_btl_usnic_la_LIBADD = \ - $(OPAL_TOP_BUILDDIR)/opal/mca/common/libfabric/lib@OPAL_LIB_PREFIX@mca_common_libfabric.la +mca_btl_usnic_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(OPAL_TOP_BUILDDIR)/opal/mca/common/ofi/lib@OPAL_LIB_PREFIX@mca_common_ofi.la noinst_LTLIBRARIES = $(lib) libmca_btl_usnic_la_SOURCES = $(lib_sources) diff --git a/opal/mca/btl/usnic/btl_usnic_compat.c b/opal/mca/btl/usnic/btl_usnic_compat.c index de649cb5147..05a4e0c97bf 100644 --- a/opal/mca/btl/usnic/btl_usnic_compat.c +++ b/opal/mca/btl/usnic/btl_usnic_compat.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2014-2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2014-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2015 Intel, Inc. All rights reserved. * $COPYRIGHT$ * @@ -659,7 +659,7 @@ int opal_btl_usnic_put( /*----------------------------------------------------------------------*/ -#elif BTL_VERSION == 30 +#elif BTL_VERSION >= 30 /* * BTL 3.0 prepare_src function. 
diff --git a/opal/mca/btl/usnic/btl_usnic_compat.h b/opal/mca/btl/usnic/btl_usnic_compat.h index da99d13be26..2caf7337394 100644 --- a/opal/mca/btl/usnic/btl_usnic_compat.h +++ b/opal/mca/btl/usnic/btl_usnic_compat.h @@ -1,8 +1,9 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2013-2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013-2018 Cisco Systems, Inc. All rights reserved * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -17,6 +18,7 @@ #define BTL_USNIC_COMPAT_H #include "opal/mca/rcache/rcache.h" +#include "opal/mca/btl/btl.h" /************************************************************************/ @@ -75,7 +77,16 @@ # define USNIC_PUT_REMOTE des_segments # define USNIC_PUT_REMOTE_COUNT des_segments_count -# define BTL_VERSION 30 +// Starting after Open MPI v3.1.0, the BTL_VERSION macro was defined +// by btl.h (it'll likely get into v4.0.0 -- don't know if this change +// will migrate to the v3.x.y branches). So if BTL_VERSION is already +// defined, then we don't need to define it again. As of this writing +// (Feb 2018), this set of defines works fine with BTL v3.0.0 and +// v3.1.0. So we'll set the BTL version to the minimum acceptable +// value: 3.0.0.
+# if !defined(BTL_VERSION) +# define BTL_VERSION 300 +# endif # define USNIC_COMPAT_FREE_LIST_GET(list, item) \ (item) = opal_free_list_get((list)) @@ -149,8 +160,6 @@ usnic_compat_proc_name_compare(opal_process_name_t a, # define opal_btl_usnic_ack_segment_t ompi_btl_usnic_ack_segment_t # define opal_btl_usnic_ack_segment_t_class ompi_btl_usnic_ack_segment_t_class -# define opal_btl_usnic_graph_t ompi_btl_usnic_graph_t - # define opal_btl_usnic_run_tests ompi_btl_usnic_run_tests # define USNIC_SEND_LOCAL des_src @@ -298,8 +307,6 @@ struct mca_btl_base_endpoint_t; #if BTL_VERSION == 20 -#include "ompi/mca/btl/btl.h" - /* This function changed signature in BTL 3.0 */ mca_btl_base_descriptor_t* opal_btl_usnic_prepare_src( @@ -336,9 +343,7 @@ opal_btl_usnic_put( /* BTL 3.0 (i.e., >=v1.9, but listed separately because these are really BTL API issues) */ -#elif BTL_VERSION == 30 - -#include "opal/mca/btl/btl.h" +#elif BTL_VERSION >= 300 /* This function changed signature compared to BTL 2.0 */ struct mca_btl_base_descriptor_t * diff --git a/opal/mca/btl/usnic/btl_usnic_component.c b/opal/mca/btl/usnic/btl_usnic_component.c index 8a42c08d029..25a64a25d26 100644 --- a/opal/mca/btl/usnic/btl_usnic_component.c +++ b/opal/mca/btl/usnic/btl_usnic_component.c @@ -704,6 +704,8 @@ static mca_btl_base_module_t** usnic_component_init(int* num_btl_modules, struct fi_info hints = {0}; struct fi_ep_attr ep_attr = {0}; struct fi_fabric_attr fabric_attr = {0}; + struct fi_rx_attr rx_attr = {0}; + struct fi_tx_attr tx_attr = {0}; /* We only want providers named "usnic" that are of type EP_DGRAM */ fabric_attr.prov_name = "usnic"; @@ -714,6 +716,11 @@ static mca_btl_base_module_t** usnic_component_init(int* num_btl_modules, hints.addr_format = FI_SOCKADDR; hints.ep_attr = &ep_attr; hints.fabric_attr = &fabric_attr; + hints.tx_attr = &tx_attr; + hints.rx_attr = &rx_attr; + + tx_attr.iov_limit = 1; + rx_attr.iov_limit = 1; ret = fi_getinfo(libfabric_api, NULL, 0, 0, &hints, &info_list); 
if (0 != ret) { diff --git a/opal/mca/btl/usnic/btl_usnic_graph.c b/opal/mca/btl/usnic/btl_usnic_graph.c deleted file mode 100644 index 33e2c1c212a..00000000000 --- a/opal/mca/btl/usnic/btl_usnic_graph.c +++ /dev/null @@ -1,1048 +0,0 @@ -/* - * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "opal_config.h" - -#include - -#include "opal_stdint.h" -#include "opal/class/opal_pointer_array.h" -#include "opal/constants.h" - -/* mainly for BTL_ERROR */ -#if BTL_IN_OPAL -#include "opal/mca/btl/btl.h" -#include "opal/mca/btl/base/base.h" -#include "opal/mca/btl/base/btl_base_error.h" -#else -#include "ompi/mca/btl/btl.h" -#include "ompi/mca/btl/base/base.h" -#include "ompi/mca/btl/base/btl_base_error.h" -#endif - -#include "btl_usnic.h" -#include "btl_usnic_graph.h" -#include "btl_usnic_compat.h" - -#define GRAPH_DEBUG 0 -#if GRAPH_DEBUG -# define GRAPH_DEBUG_OUT(args) BTL_OUTPUT(args) -#else -# define GRAPH_DEBUG_OUT(args) do {} while(0) -#endif - -#define MAX_COST INT64_MAX - -struct opal_btl_usnic_edge_t { - opal_object_t super; - - opal_list_item_t outbound_li; - opal_list_item_t inbound_li; - - /** source of this edge */ - int source; - - /** v_index of target of this edge */ - int target; - - /** cost (weight) of this edge */ - int64_t cost; - - /** - * (flow-network) capacity of this edge. Zero-capacity edges essentially do - * not exist and will be ignored by most of the algorithms implemented here. 
- */ - int capacity; - - /** any other information associated with this edge */ - void *e_data; -}; - -struct opal_btl_usnic_vertex_t { - /** index in the graph's array of vertices */ - int v_index; - - /** any other information associated with the vertex */ - void *v_data; - - /** linked list of edges for which this vertex is a source */ - opal_list_t out_edges; - - /** linked list of edges for which this vertex is a target */ - opal_list_t in_edges; -}; - -struct opal_btl_usnic_graph_t { - /** number of vertices currently in this graph */ - int num_vertices; - - /** vertices in this graph (with number of set elements == num_vertices) */ - opal_pointer_array_t vertices; - - /** index of the source vertex, or -1 if not present */ - int source_idx; - - /** index of the sink vertex, or -1 if not present */ - int sink_idx; - - /** user callback to clean up the v_data */ - opal_btl_usnic_cleanup_fn_t v_data_cleanup_fn; - - /** user callback to clean up the e_data */ - opal_btl_usnic_cleanup_fn_t e_data_cleanup_fn; -}; - -#ifndef MAX -# define MAX(a,b) ((a) > (b) ? (a) : (b)) -#endif - -#ifndef MIN -# define MIN(a,b) ((a) < (b) ? 
(a) : (b)) -#endif - -#define f(i,j) flow[n*i + j] - -#define LIST_FOREACH_CONTAINED(item, list, type, member) \ - for (item = container_of( (list)->opal_list_sentinel.opal_list_next, type, member ); \ - &item->member != &(list)->opal_list_sentinel; \ - item = container_of( \ - ((opal_list_item_t *) (&item->member))->opal_list_next, type, member )) - -#define LIST_FOREACH_SAFE_CONTAINED(item, next, list, type, member) \ - for (item = container_of( (list)->opal_list_sentinel.opal_list_next, type, member ), \ - next = container_of( \ - ((opal_list_item_t *) (&item->member))->opal_list_next, type, member ); \ - &item->member != &(list)->opal_list_sentinel; \ - item = next, \ - next = container_of( \ - ((opal_list_item_t *) (&item->member))->opal_list_next, type, member )) - -#define NUM_VERTICES(g) (g->num_vertices) - -#define CHECK_VERTEX_RANGE(g,v) \ - do { \ - if ((v) < 0 || \ - (v) >= NUM_VERTICES(g)) { \ - return OPAL_ERR_BAD_PARAM; \ - } \ - } while (0) - -/* cast away any constness of &g->vertices b/c the opal_pointer_array API is - * not const-correct */ -#define V_ID_TO_PTR(g, v_id) \ - ((opal_btl_usnic_vertex_t *) \ - opal_pointer_array_get_item((opal_pointer_array_t *)&g->vertices, v_id)) - -#define FOREACH_OUT_EDGE(g,v_id,e_ptr) \ - LIST_FOREACH_CONTAINED(e_ptr, \ - &(V_ID_TO_PTR(g, v_id)->out_edges), \ - opal_btl_usnic_edge_t, \ - outbound_li) - -#define FOREACH_IN_EDGE(g,v_id,e_ptr) \ - LIST_FOREACH_CONTAINED(e_ptr, \ - &(V_ID_TO_PTR(g, v_id)->in_edges), \ - opal_btl_usnic_edge_t, \ - inbound_li) - - -/* Iterate over (u,v) edge pairs along the given path, where path is defined - * by the predecessor array "pred". Stops when a -1 predecessor is - * encountered. Note: because it is a *predecessor* array, the traversal - * starts at the sink and progresses towards the source. 
*/ -#define FOREACH_UV_ON_PATH(pred, source, sink, u, v) \ - for (u = pred[sink], v = sink; u != -1; v = u, u = pred[u]) - -/* ensure that (a+b<=max) */ -static inline void check_add64_overflow(int64_t a, int64_t b) -{ - assert(!((b > 0) && (a > (INT64_MAX - b))) && - !((b < 0) && (a < (INT64_MIN - b)))); -} - -static void edge_constructor(opal_btl_usnic_edge_t *e) -{ - OBJ_CONSTRUCT(&e->outbound_li, opal_list_item_t); - OBJ_CONSTRUCT(&e->inbound_li, opal_list_item_t); -} - -static void edge_destructor(opal_btl_usnic_edge_t *e) -{ - OBJ_DESTRUCT(&e->outbound_li); - OBJ_DESTRUCT(&e->inbound_li); -} - -OBJ_CLASS_DECLARATION(opal_btl_usnic_edge_t); -OBJ_CLASS_INSTANCE(opal_btl_usnic_edge_t, opal_object_t, - edge_constructor, edge_destructor); - -static void dump_vec(const char *name, int *vec, int n) - __opal_attribute_unused__; - -static void dump_vec(const char *name, int *vec, int n) -{ - int i; - fprintf(stderr, "%s={", name); - for (i = 0; i < n; ++i) { - fprintf(stderr, "[%d]=%2d, ", i, vec[i]); - } - fprintf(stderr, "}\n"); -} - -static void dump_vec64(const char *name, int64_t *vec, int n) - __opal_attribute_unused__; - -static void dump_vec64(const char *name, int64_t *vec, int n) -{ - int i; - fprintf(stderr, "%s={", name); - for (i = 0; i < n; ++i) { - fprintf(stderr, "[%d]=%2" PRIi64 ", ", i, vec[i]); - } - fprintf(stderr, "}\n"); -} - - -static void dump_flow(int *flow, int n) - __opal_attribute_unused__; - -static void dump_flow(int *flow, int n) -{ - int u, v; - - fprintf(stderr, "flow={\n"); - for (u = 0; u < n; ++u) { - fprintf(stderr, "u=%d| ", u); - for (v = 0; v < n; ++v) { - fprintf(stderr, "%2d,", f(u,v)); - } - fprintf(stderr, "\n"); - } - fprintf(stderr, "}\n"); -} - - -static int get_capacity(opal_btl_usnic_graph_t *g, int source, int target) -{ - opal_btl_usnic_edge_t *e; - - CHECK_VERTEX_RANGE(g, source); - CHECK_VERTEX_RANGE(g, target); - - FOREACH_OUT_EDGE(g, source, e) { - assert(e->source == source); - if (e->target == target) { - return 
e->capacity; - } - } - - return 0; -} - -static int -set_capacity(opal_btl_usnic_graph_t *g, int source, int target, int cap) -{ - opal_btl_usnic_edge_t *e; - - CHECK_VERTEX_RANGE(g, source); - CHECK_VERTEX_RANGE(g, target); - - FOREACH_OUT_EDGE(g, source, e) { - assert(e->source == source); - if (e->target == target) { - e->capacity = cap; - return OPAL_SUCCESS; - } - } - - return OPAL_ERR_NOT_FOUND; -} - -static void free_vertex(opal_btl_usnic_graph_t *g, - opal_btl_usnic_vertex_t *v) -{ - if (NULL != v) { - if (NULL != g->v_data_cleanup_fn && NULL != v->v_data) { - g->v_data_cleanup_fn(v->v_data); - } - free(v); - } -} - -int opal_btl_usnic_gr_create(opal_btl_usnic_cleanup_fn_t v_data_cleanup_fn, - opal_btl_usnic_cleanup_fn_t e_data_cleanup_fn, - opal_btl_usnic_graph_t **g_out) -{ - int err; - opal_btl_usnic_graph_t *g = NULL; - - if (NULL == g_out) { - return OPAL_ERR_BAD_PARAM; - } - *g_out = NULL; - - g = calloc(1, sizeof(*g)); - if (NULL == g) { - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - err = OPAL_ERR_OUT_OF_RESOURCE; - goto out_free_g; - } - - g->source_idx = -1; - g->sink_idx = -1; - - g->v_data_cleanup_fn = v_data_cleanup_fn; - g->e_data_cleanup_fn = e_data_cleanup_fn; - - /* now that we essentially have an empty graph, add vertices to it */ - OBJ_CONSTRUCT(&g->vertices, opal_pointer_array_t); - err = opal_pointer_array_init(&g->vertices, 0, INT_MAX, 32); - if (OPAL_SUCCESS != err) { - goto out_free_g; - } - - *g_out = g; - return OPAL_SUCCESS; - -out_free_g: - free(g); - return err; -} - -int opal_btl_usnic_gr_free(opal_btl_usnic_graph_t *g) -{ - int i; - opal_btl_usnic_edge_t *e, *next; - opal_btl_usnic_vertex_t *v; - - /* remove all edges from all out_edges lists */ - for (i = 0; i < NUM_VERTICES(g); ++i) { - v = V_ID_TO_PTR(g, i); - LIST_FOREACH_SAFE_CONTAINED(e, next, &v->out_edges, - opal_btl_usnic_edge_t, outbound_li) { - opal_list_remove_item(&v->out_edges, &e->outbound_li); - OBJ_RELEASE(e); - } - } - /* now remove from all in_edges lists and 
free the edge */ - for (i = 0; i < NUM_VERTICES(g); ++i) { - v = V_ID_TO_PTR(g, i); - LIST_FOREACH_SAFE_CONTAINED(e, next, &v->in_edges, - opal_btl_usnic_edge_t, inbound_li) { - opal_list_remove_item(&v->in_edges, &e->inbound_li); - - if (NULL != g->e_data_cleanup_fn && NULL != e->e_data) { - g->e_data_cleanup_fn(e->e_data); - } - OBJ_RELEASE(e); - } - - free_vertex(g, V_ID_TO_PTR(g, i)); - opal_pointer_array_set_item(&g->vertices, i, NULL); - } - g->num_vertices = 0; - - OBJ_DESTRUCT(&g->vertices); - free(g); - - return OPAL_SUCCESS; -} - -int opal_btl_usnic_gr_clone(const opal_btl_usnic_graph_t *g, - bool copy_user_data, - opal_btl_usnic_graph_t **g_clone_out) -{ - int err; - int i; - int index; - opal_btl_usnic_graph_t *gx; - opal_btl_usnic_edge_t *e; - - if (NULL == g_clone_out) { - return OPAL_ERR_BAD_PARAM; - } - *g_clone_out = NULL; - - if (copy_user_data) { - BTL_ERROR(("user data copy requested but not yet supported")); - abort(); - return OPAL_ERR_FATAL; - } - - gx = NULL; - err = opal_btl_usnic_gr_create(NULL, NULL, &gx); - if (OPAL_SUCCESS != err) { - return err; - } - assert(NULL != gx); - - /* reconstruct all vertices */ - for (i = 0; i < NUM_VERTICES(g); ++i) { - err = opal_btl_usnic_gr_add_vertex(gx, NULL, &index); - if (OPAL_SUCCESS != err) { - goto out_free_gx; - } - assert(index == i); - } - - /* now reconstruct all the edges (iterate by source vertex only to avoid - * double-adding) */ - for (i = 0; i < NUM_VERTICES(g); ++i) { - FOREACH_OUT_EDGE(g, i, e) { - assert(i == e->source); - err = opal_btl_usnic_gr_add_edge(gx, e->source, e->target, - e->cost, e->capacity, NULL); - if (OPAL_SUCCESS != err) { - goto out_free_gx; - } - } - } - - *g_clone_out = gx; - return OPAL_SUCCESS; - -out_free_gx: - /* we don't reach in and manipulate gx's state directly, so it should be - * safe to use the standard free function */ - opal_btl_usnic_gr_free(gx); - return err; -} - -int opal_btl_usnic_gr_indegree(const opal_btl_usnic_graph_t *g, - int vertex) -{ - 
opal_btl_usnic_vertex_t *v; - - v = V_ID_TO_PTR(g, vertex); - return opal_list_get_size(&v->in_edges); -} - -int opal_btl_usnic_gr_outdegree(const opal_btl_usnic_graph_t *g, - int vertex) -{ - opal_btl_usnic_vertex_t *v; - - v = V_ID_TO_PTR(g, vertex); - return opal_list_get_size(&v->out_edges); -} - -int opal_btl_usnic_gr_add_edge(opal_btl_usnic_graph_t *g, - int from, - int to, - int64_t cost, - int capacity, - void *e_data) -{ - opal_btl_usnic_edge_t *e; - opal_btl_usnic_vertex_t *v_from, *v_to; - - if (from < 0 || from >= NUM_VERTICES(g)) { - return OPAL_ERR_BAD_PARAM; - } - if (to < 0 || to >= NUM_VERTICES(g)) { - return OPAL_ERR_BAD_PARAM; - } - if (cost == MAX_COST) { - return OPAL_ERR_BAD_PARAM; - } - if (capacity < 0) { - /* negative cost is fine, but negative capacity is not currently - * handled appropriately */ - return OPAL_ERR_BAD_PARAM; - } - FOREACH_OUT_EDGE(g, from, e) { - assert(e->source == from); - if (e->target == to) { - return OPAL_EXISTS; - } - } - - /* this reference is owned by the out_edges list */ - e = OBJ_NEW(opal_btl_usnic_edge_t); - if (NULL == e) { - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - return OPAL_ERR_OUT_OF_RESOURCE; - } - - e->source = from; - e->target = to; - e->cost = cost; - e->capacity = capacity; - e->e_data = e_data; - - v_from = V_ID_TO_PTR(g, from); - opal_list_append(&v_from->out_edges, &e->outbound_li); - - OBJ_RETAIN(e); /* ref owned by in_edges list */ - v_to = V_ID_TO_PTR(g, to); - opal_list_append(&v_to->in_edges, &e->inbound_li); - - return OPAL_SUCCESS; -} - -int opal_btl_usnic_gr_add_vertex(opal_btl_usnic_graph_t *g, - void *v_data, - int *index_out) -{ - opal_btl_usnic_vertex_t *v; - - v = calloc(1, sizeof(*v)); - if (NULL == v) { - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - return OPAL_ERR_OUT_OF_RESOURCE; - } - - /* add to the ptr array early to simplify cleanup in the incredibly rare - * chance that adding fails */ - v->v_index = opal_pointer_array_add(&g->vertices, v); - if (-1 == v->v_index) { - 
free(v); - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - return OPAL_ERR_OUT_OF_RESOURCE; - } - assert(v->v_index == g->num_vertices); - - ++g->num_vertices; - - v->v_data = v_data; - OBJ_CONSTRUCT(&v->out_edges, opal_list_t); - OBJ_CONSTRUCT(&v->in_edges, opal_list_t); - - if (NULL != index_out) { - *index_out = v->v_index; - } - - return OPAL_SUCCESS; -} - -int opal_btl_usnic_gr_order(const opal_btl_usnic_graph_t *g) -{ - return NUM_VERTICES(g); -} - -/** - * shrink a flow matrix for old_n vertices to one works for new_n - * - * Takes a matrix stored in a one-dimensional array of size (old_n*old_n) and - * "truncates" it into a dense array of size (new_n*new_n) that only contain - * the flow values for the first new_n vertices. E.g., it turns this array - * (old_n=5, new_n=3): - * - * 1 2 3 4 5 - * 6 7 8 9 10 - * 11 12 13 14 15 - * 16 17 18 19 20 - * 21 22 23 24 25 - * - * into this array; - * - * 1 2 3 - * 6 7 8 - * 11 12 13 - */ -static void shrink_flow_matrix(int *flow, int old_n, int new_n) -{ - int u, v; - - assert(old_n > new_n); - - for (u = 0; u < new_n; ++u) { - for (v = 0; v < new_n; ++v) { - flow[new_n*u + v] = flow[old_n*u + v]; - } - } -} - -/** - * Compute the so-called "bottleneck" capacity value for a path "pred" through - * graph "gx". - */ -static int -bottleneck_path( - opal_btl_usnic_graph_t *gx, - int n, - int *pred) -{ - int u, v; - int min; - - min = INT_MAX; - FOREACH_UV_ON_PATH(pred, gx->source_idx, gx->sink_idx, u, v) { - int cap_f_uv = get_capacity(gx, u, v); - min = MIN(min, cap_f_uv); - } - - return min; -} - - -/** - * This routine implements the Bellman-Ford shortest paths algorithm, slightly - * specialized for our forumlation of flow networks: - * http://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm - * - * Specifically, it attempts to find the shortest path from "source" to - * "target". It returns true if such a path was found, false otherwise. 
Any - * found path is returned in "pred" as a predecessor chain (i.e., pred[sink] - * is the start of the path and pred[pred[sink]] is its predecessor, etc.). - * - * The contents of "pred" are only valid if this routine returns true. - */ -static bool bellman_ford(opal_btl_usnic_graph_t *gx, - int source, - int target, - int *pred) -{ - int64_t *dist; - int i; - int n; - int u, v; - bool found_target = false; - - if (NULL == gx) { - OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM); - return false; - } - if (NULL == pred) { - OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM); - return false; - } - if (source < 0 || source >= NUM_VERTICES(gx)) { - return OPAL_ERR_BAD_PARAM; - } - if (target < 0 || target >= NUM_VERTICES(gx)) { - return OPAL_ERR_BAD_PARAM; - } - - /* initialize */ - n = opal_btl_usnic_gr_order(gx); - dist = malloc(n * sizeof(*dist)); - if (NULL == dist) { - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - goto out; - } - for (i = 0; i < n; ++i) { - dist[i] = MAX_COST; - pred[i] = -1; - } - dist[source] = 0; - - /* relax repeatedly */ - for (i = 1; i < NUM_VERTICES(gx); ++i) { - bool relaxed = false; -#if GRAPH_DEBUG - dump_vec("pred", pred, NUM_VERTICES(gx)); - dump_vec64("dist", dist, NUM_VERTICES(gx)); -#endif - - for (u = 0; u < NUM_VERTICES(gx); ++u) { - opal_btl_usnic_edge_t *e_ptr; - - FOREACH_OUT_EDGE(gx, u, e_ptr) { - v = e_ptr->target; - - /* make sure to only construct paths from edges that actually have - * non-zero capacity */ - if (e_ptr->capacity > 0 && - dist[u] != MAX_COST) { /* avoid signed overflow for "infinity" */ - check_add64_overflow(dist[u], e_ptr->cost); - if ((dist[u] + e_ptr->cost) < dist[v]) { - dist[v] = dist[u] + e_ptr->cost; - pred[v] = u; - relaxed = true; - } - } - } - } - /* optimization: stop if an outer iteration did not succeed in - * changing any dist/pred values (already at optimum) */ - if (!relaxed) { - GRAPH_DEBUG_OUT(("relaxed==false, breaking out")); - break; - } - } - - /* check for negative-cost cycles */ - for (u = 0; u < 
NUM_VERTICES(gx); ++u) { - opal_btl_usnic_edge_t * e_ptr; - - FOREACH_OUT_EDGE(gx, u, e_ptr) { - v = e_ptr->target; - if (e_ptr->capacity > 0 && - dist[u] != MAX_COST && /* avoid signed overflow */ - (dist[u] + e_ptr->cost) < dist[v]) { - BTL_ERROR(("negative-weight cycle detected")); - abort(); - goto out; - } - } - } - - if (dist[target] != MAX_COST) { - found_target = true; - } - -out: -#if GRAPH_DEBUG - dump_vec("pred", pred, NUM_VERTICES(gx)); -#endif - assert(pred[source] == -1); - free(dist); - GRAPH_DEBUG_OUT(("bellman_ford: found_target=%s", found_target ? "true" : "false")); - return found_target; -} - -/** - * Transform the given connected, bipartite, acyclic digraph into a flow - * network (i.e., add a source and a sink, with the source connected to vertex - * set V1 and the sink connected to vertex set V2). This also creates - * residual edges suitable for augmenting-path algorithms. All "source" nodes - * in the original graph are considered to have an output of 1 and "sink" - * nodes can take an input of 1. The result is that "forward" edges are all - * created with capacity=1, "backward" (residual) edges are created with - * capacity=0. - * - * After this routine, all capacities are "residual capacities" ($c_f$ in the - * literature). - * - * Initial flow throughout the network is assumed to be 0 at all edges. - * - * The graph will be left in an undefined state if an error occurs (though - * freeing it should still be safe). 
- */ -static int bipartite_to_flow(opal_btl_usnic_graph_t *g) -{ - int err; - int order; - int u, v; - int num_left, num_right; - - /* grab size before adding extra vertices */ - order = opal_btl_usnic_gr_order(g); - - err = opal_btl_usnic_gr_add_vertex(g, NULL, &g->source_idx); - if (OPAL_SUCCESS != err) { - return err; - } - err = opal_btl_usnic_gr_add_vertex(g, NULL, &g->sink_idx); - if (OPAL_SUCCESS != err) { - return err; - } - - /* The networks we are interested in are bipartite and have edges only - * from one partition to the other partition (none vice versa). We - * visualize this conventionally with all of the source vertices on the - * left-hand side of an imaginary rendering of the graph and the target - * vertices on the right-hand side of the rendering. The direction - * "forward" is considered to be moving from left to right. - */ - num_left = 0; - num_right = 0; - for (u = 0; u < order; ++u) { - int inbound = opal_btl_usnic_gr_indegree(g, u); - int outbound = opal_btl_usnic_gr_outdegree(g, u); - - if (inbound > 0 && outbound > 0) { - BTL_ERROR(("graph is not (unidirectionally) bipartite")); - abort(); - } - else if (inbound > 0) { - /* "right" side of the graph, create edges to the sink */ - ++num_right; - err = opal_btl_usnic_gr_add_edge(g, u, g->sink_idx, - 0, /* no cost */ - /*capacity=*/1, - /*e_data=*/NULL); - if (OPAL_SUCCESS != err) { - GRAPH_DEBUG_OUT(("add_edge failed")); - return err; - } - } - else if (outbound > 0) { - /* "left" side of the graph, create edges to the source */ - ++num_left; - err = opal_btl_usnic_gr_add_edge(g, g->source_idx, u, - 0, /* no cost */ - /*capacity=*/1, - /*e_data=*/NULL); - if (OPAL_SUCCESS != err) { - GRAPH_DEBUG_OUT(("add_edge failed")); - return err; - } - } - } - - /* it doesn't make sense to extend this graph with a source and sink - * unless */ - if (num_right == 0 || num_left == 0) { - return OPAL_ERR_BAD_PARAM; - } - - /* now run through and create "residual" edges as well (i.e., create edges - * in 
the reverse direction with 0 initial flow and a residual capacity of - * $c_f(u,v)=c(u,v)-f(u,v)$). Residual edges can exist where no edges - * exist in the original graph. - */ - order = opal_btl_usnic_gr_order(g); /* need residuals for newly created - source/sink edges too */ - for (u = 0; u < order; ++u) { - opal_btl_usnic_edge_t * e_ptr; - FOREACH_OUT_EDGE(g, u, e_ptr) { - v = e_ptr->target; - - /* (u,v) exists, add (v,u) if not already present. Cost is - * negative for these edges because "giving back" flow pays us - * back any cost already incurred. */ - err = opal_btl_usnic_gr_add_edge(g, v, u, - -e_ptr->cost, - /*capacity=*/0, - /*e_data=*/NULL); - if (OPAL_SUCCESS != err && OPAL_EXISTS != err) { - return err; - } - } - } - - return OPAL_SUCCESS; -} - -/** - * Implements the "Successive Shortest Path" algorithm for computing the - * minimum cost flow problem. This is a generalized version of the - * Ford-Fulkerson algorithm. There are two major changes from F-F: - * 1. In addition to capacities and flows, this algorithm pays attention to - * costs for traversing an edge. This particular function leaves the - * caller's costs alone but sets its own capacities. - * 2. Shortest paths are computed using the cost metric. - * - * The algorithm's sketch looks like: - * 1 Transform network G by adding source and sink, create residual edges - * 2 Initial flow x is zero - * 3 while ( Gx contains a path from s to t ) do - * 4 Find any shortest path P from s to t - * 5 Augment current flow x along P - * 6 update Gx - * - * This function mutates the given graph (adding vertices and edges, changing - * capacties, etc.), so callers may wish to clone the graph before calling - * this routine. - * - * The result is an array of (u,v) vertex pairs, where (u,v) is an edge in the - * original graph which has non-zero flow. - * - * Returns OMPI error codes like OPAL_SUCCESS/OPAL_ERR_OUT_OF_RESOURCE. 
- * - * This version of the algorithm has a theoretical upper bound on its running - * time of O(|V|^2 * |E| * f), where f is essentially the maximum flow in the - * graph. In our case, f=min(|V1|,|V2|), where V1 and V2 are the two - * constituent sets of the bipartite graph. - * - * This algorithm's performance could probably be improved by modifying it to - * use vertex potentials and Dijkstra's Algorithm instead of Bellman-Ford. - * Normally vertex potentials are needed in order to use Dijkstra's safely, - * but our graphs are constrained enough that this may not be necessary. - * Switching to Dijkstra's implemented with a heap should yield a reduced - * upper bound of O(|V| * |E| * f * log(|V|)). Let's consider this a future - * enhancement for the time being, since it's not obvious at this point that - * the faster running time will be worth the additional implementation - * complexity. - */ -static int min_cost_flow_ssp(opal_btl_usnic_graph_t *gx, - int **flow_out) -{ - int err = OPAL_SUCCESS; - int n; - int *pred = NULL; - int *flow = NULL; - int u, v; - int c; - - GRAPH_DEBUG_OUT(("begin min_cost_flow_ssp()")); - - if (NULL == flow_out) { - return OPAL_ERR_BAD_PARAM; - } - *flow_out = NULL; - - n = opal_btl_usnic_gr_order(gx); - - pred = malloc(n*sizeof(*pred)); - if (NULL == pred) { - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - err = OPAL_ERR_OUT_OF_RESOURCE; - goto out_error; - } - - /* "flow" is a 2d matrix of current flow values, all initialized to zero */ - flow = calloc(n*n, sizeof(*flow)); - if (NULL == flow) { - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - err = OPAL_ERR_OUT_OF_RESOURCE; - goto out_error; - } - - /* loop as long as paths exist from source to sink */ - while (bellman_ford(gx, gx->source_idx, gx->sink_idx, pred)) { - int cap_f_path; - - /* find any shortest path P from s to t (already present in pred) */ - GRAPH_DEBUG_OUT(("start outer iteration of SSP algorithm")); -#if GRAPH_DEBUG - dump_vec("pred", pred, NUM_VERTICES(gx)); - 
dump_flow(flow, n); -#endif - - cap_f_path = bottleneck_path(gx, n, pred); - - /* augment current flow along P */ - FOREACH_UV_ON_PATH(pred, gx->source_idx, gx->sink_idx, u, v) { - assert(u == pred[v]); - - f(u,v) = f(u,v) + cap_f_path; /* "forward" edge */ - f(v,u) = f(v,u) - cap_f_path; /* residual network edge */ - - assert(f(u,v) == -f(v,u)); /* skew symmetry invariant */ - - /* update Gx as we go along: decrease capacity by this new - * augmenting flow */ - c = get_capacity(gx, u, v) - cap_f_path; - assert(c >= 0); - err = set_capacity(gx, u, v, c); - if (OPAL_SUCCESS != err) { - BTL_ERROR(("unable to set capacity, missing edge?")); - abort(); - } - - c = get_capacity(gx, v, u) + cap_f_path; - assert(c >= 0); - err = set_capacity(gx, v, u, c); - if (OPAL_SUCCESS != err) { - BTL_ERROR(("unable to set capacity, missing edge?")); - abort(); - } - } - } - -out: - *flow_out = flow; - free(pred); - return err; - -out_error: - free(*flow_out); - GRAPH_DEBUG_OUT(("returning error %d", err)); - goto out; -} - -int opal_btl_usnic_solve_bipartite_assignment(const opal_btl_usnic_graph_t *g, - int *num_match_edges_out, - int **match_edges_out) -{ - int err; - int i; - int u, v; - int n; - int *flow = NULL; - opal_btl_usnic_graph_t *gx = NULL; - - if (NULL == match_edges_out || NULL == num_match_edges_out) { - return OPAL_ERR_BAD_PARAM; - } - *num_match_edges_out = 0; - *match_edges_out = NULL; - - /* don't perturb the caller's data structure */ - err = opal_btl_usnic_gr_clone(g, false, &gx); - if (OPAL_SUCCESS != err) { - GRAPH_DEBUG_OUT(("opal_btl_usnic_gr_clone failed")); - goto out; - } - - /* Transform gx into a residual flow network with capacities, a source, a - * sink, and residual edges. We track the actual flow separately in the - * "flow" matrix. Initial capacity for every forward edge is 1. Initial - * capacity for every backward (residual) edge is 0. 
- * - * For the remainder of this routine (and the ssp routine) the capacities - * refer to residual capacities ($c_f$) not capacities in the original - * graph. For convenience we adjust all residual capacities as we go - * along rather than recomputing them from the flow and capacities in the - * original graph. This allows many other graph operations to have no - * direct knowledge of the flow matrix. - */ - err = bipartite_to_flow(gx); - if (OPAL_SUCCESS != err) { - GRAPH_DEBUG_OUT(("bipartite_to_flow failed")); - OPAL_ERROR_LOG(err); - return err; - } - - /* Use the SSP algorithm to compute the min-cost flow over this network. - * Edges with non-zero flow in the result should be part of the matching. - * - * Note that the flow array returned is sized for gx, not for g. Index - * accordingly later on. - */ - err = min_cost_flow_ssp(gx, &flow); - if (OPAL_SUCCESS != err) { - GRAPH_DEBUG_OUT(("min_cost_flow_ssp failed")); - return err; - } - assert(NULL != flow); - - /* don't care about new edges in gx, only old edges in g */ - n = opal_btl_usnic_gr_order(g); - -#if GRAPH_DEBUG - dump_flow(flow, NUM_VERTICES(gx)); -#endif - shrink_flow_matrix(flow, opal_btl_usnic_gr_order(gx), n); -#if GRAPH_DEBUG - dump_flow(flow, n); -#endif - - for (u = 0; u < n; ++u) { - for (v = 0; v < n; ++v) { - if (f(u,v) > 0) { - ++(*num_match_edges_out); - } - } - } - - if (0 == *num_match_edges_out) { - /* avoid attempting to allocate a zero-byte buffer */ - goto out; - } - - *match_edges_out = malloc(*num_match_edges_out * sizeof(*match_edges_out)); - if (NULL == *match_edges_out) { - *num_match_edges_out = 0; - OPAL_ERROR_LOG(OPAL_ERR_OUT_OF_RESOURCE); - err = OPAL_ERR_OUT_OF_RESOURCE; - goto out; - } - - i = 0; - for (u = 0; u < n; ++u) { - for (v = 0; v < n; ++v) { - /* flow exists on this edge so include this edge in the matching */ - if (f(u,v) > 0) { - (*match_edges_out)[i++] = u; - (*match_edges_out)[i++] = v; - } - } - } - -out: - free(flow); - opal_btl_usnic_gr_free(gx); - 
return err; -} - -#include "test/btl_usnic_graph_test.h" diff --git a/opal/mca/btl/usnic/btl_usnic_graph.h b/opal/mca/btl/usnic/btl_usnic_graph.h deleted file mode 100644 index 484a08519c1..00000000000 --- a/opal/mca/btl/usnic/btl_usnic_graph.h +++ /dev/null @@ -1,163 +0,0 @@ -/* - * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -/* Implements an adjacency-list-based weighted directed graph (digraph), - * focused on supporting bipartite digraphs and flow-network problems. - * - * Note that some operations might be more efficient if this structure were - * converted to use an adjacency matrix instead of an adjacency list. OTOH - * that complicates other pieces of the implementation (specifically, adding - * and removing edges). */ - -#ifndef BTL_USNIC_GRAPH_H -#define BTL_USNIC_GRAPH_H - -#include "opal_config.h" - -struct opal_btl_usnic_vertex_t; -struct opal_btl_usnic_edge_t; -struct opal_btl_usnic_graph_t; - -typedef struct opal_btl_usnic_vertex_t opal_btl_usnic_vertex_t; -typedef struct opal_btl_usnic_edge_t opal_btl_usnic_edge_t; -typedef struct opal_btl_usnic_graph_t opal_btl_usnic_graph_t; - -/** - * callback function pointer type for cleaning up user data associated with a - * vertex or edge */ -typedef void (*opal_btl_usnic_cleanup_fn_t)(void *user_data); - -/** - * create a new empty graph - * - * Any new vertices will have NULL user data associated. 
- * - * @param[in] v_data_cleanup_fn cleanup function to use for vertex user data - * @param[in] e_data_cleanup_fn cleanup function to use for edge user data - * @param[out] g_out the created graph - * - * @returns OPAL_SUCCESS or an OMPI error code - */ -int opal_btl_usnic_gr_create(opal_btl_usnic_cleanup_fn_t v_data_cleanup_fn, - opal_btl_usnic_cleanup_fn_t e_data_cleanup_fn, - opal_btl_usnic_graph_t **g_out); - -/** - * free the given graph - * - * Any user data associated with vertices or edges in the graph will have - * the given edge/vertex cleanup callback invoked in some arbitrary order. - * - * @returns OPAL_SUCCESS or an OMPI error code - */ -int opal_btl_usnic_gr_free(opal_btl_usnic_graph_t *g); - -/** - * clone (deep copy) the given graph - * - * Note that copy_user_data==true is not currently supported (requires the - * addition of a copy callback for user data). - * - * @param[in] g the graph to clone - * @param[in] copy_user_data if true, copy vertex/edge user data to the new - * graph - * @param[in] g_clone_out the resulting cloned graph - * @returns OPAL_SUCCESS or an OMPI error code - */ -int opal_btl_usnic_gr_clone(const opal_btl_usnic_graph_t *g, - bool copy_user_data, - opal_btl_usnic_graph_t **g_clone_out); - -/** - * return the number of edges for which this vertex is a destination - * - * @param[in] g the graph to query - * @param[in] vertex the vertex id to query - * @returns the number of edges for which this vertex is a destination - */ -int opal_btl_usnic_gr_indegree(const opal_btl_usnic_graph_t *g, - int vertex); - -/** - * return the number of edges for which this vertex is a source - * - * @param[in] g the graph to query - * @param[in] vertex the vertex id to query - * @returns the number of edges for which this vertex is a source - */ -int opal_btl_usnic_gr_outdegree(const opal_btl_usnic_graph_t *g, - int vertex); - -/** - * add an edge to the given graph - * - * @param[in] from source vertex ID - * @param[in] to target vertex ID - * 
@param[in] cost cost value for this edge (lower is better) - * @param[in] capacity maximum flow transmissible on this edge - * @param[in] e_data caller data to associate with this edge, useful for - * debugging or minimizing state shared across components - * - * @returns OPAL_SUCCESS or an OMPI error code - */ -int opal_btl_usnic_gr_add_edge(opal_btl_usnic_graph_t *g, - int from, - int to, - int64_t cost, - int capacity, - void *e_data); - -/** - * add a vertex to the given graph - * - * @param[in] g graph to manipulate - * @param[in] v_data data to associate with the new vertex - * @param[out] index_out integer index of the new vertex. May be NULL. - * - * @returns OPAL_SUCCESS or an OMPI error code - */ -int opal_btl_usnic_gr_add_vertex(opal_btl_usnic_graph_t *g, - void *v_data, - int *index_out); - -/** - * compute the order of a graph (number of vertices) - * - * @param[in] g the graph to query - */ -int opal_btl_usnic_gr_order(const opal_btl_usnic_graph_t *g); - -/** - * This function solves the "assignment problem": - * http://en.wikipedia.org/wiki/Assignment_problem - * - * The goal is to find a maximum cardinality, minimum cost matching in a - * weighted bipartite graph. Maximum cardinality takes priority over minimum - * cost. - * - * Capacities in the given graph are ignored (assumed to be 1 at the start). - * It is also assumed that the graph only contains edges from one vertex set - * to the other and that no edges exist in the reverse direction ("forward" - * edges only). - * - * The algorithm(s) used will be deterministic. That is, given the exact same - * graph, two calls to this routine will result in the same matching result. 
- * - * @param[in] g an acyclic bipartite directed graph for - * which a matching is sought - * @param[out] num_match_edges_out number edges found in the matching - * @param[out] match_edges_out an array of (u,v) vertex pairs indicating - * which edges are in the matching - * - * @returns OPAL_SUCCESS or an OMPI error code - */ -int opal_btl_usnic_solve_bipartite_assignment(const opal_btl_usnic_graph_t *g, - int *num_match_edges_out, - int **match_edges_out); -#endif /* BTL_USNIC_GRAPH_H */ diff --git a/opal/mca/btl/usnic/btl_usnic_hwloc.c b/opal/mca/btl/usnic/btl_usnic_hwloc.c index e0230f02c9c..a435a8a4043 100644 --- a/opal/mca/btl/usnic/btl_usnic_hwloc.c +++ b/opal/mca/btl/usnic/btl_usnic_hwloc.c @@ -1,6 +1,6 @@ /* * Copyright (c) 2013-2016 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -26,22 +26,34 @@ */ static hwloc_obj_t my_numa_node = NULL; static int num_numa_nodes = 0; -static const struct hwloc_distances_s *matrix = NULL; +static struct hwloc_distances_s *matrix = NULL; +#if HWLOC_API_VERSION >= 0x20000 +static unsigned int matrix_nr = 1; +#endif /* * Get the hwloc distance matrix (if we don't already have it). - * - * Note that the matrix data structure belongs to hwloc; we are not - * responsibile for freeing it. */ static int get_distance_matrix(void) { +#if HWLOC_API_VERSION < 0x20000 + /* Note that the matrix data structure belongs to hwloc; we are not + * responsible for freeing it. */ + if (NULL == matrix) { matrix = hwloc_get_whole_distance_matrix_by_type(opal_hwloc_topology, HWLOC_OBJ_NODE); } return (NULL == matrix) ? 
OPAL_ERROR : OPAL_SUCCESS; +#else + if (0 != hwloc_distances_get_by_type(opal_hwloc_topology, HWLOC_OBJ_NODE, + &matrix_nr, &matrix, + HWLOC_DISTANCES_KIND_MEANS_LATENCY, 0) || 0 == matrix_nr) { + return OPAL_ERROR; + } + return OPAL_SUCCESS; +#endif } /* @@ -219,6 +231,7 @@ int opal_btl_usnic_hwloc_distance(opal_btl_usnic_module_t *module) /* Lookup the distance between my NUMA node and the NUMA node of the device */ +#if HWLOC_API_VERSION < 0x20000 if (NULL != dev_numa) { module->numa_distance = matrix->latency[dev_numa->logical_index * num_numa_nodes + @@ -229,6 +242,40 @@ int opal_btl_usnic_hwloc_distance(opal_btl_usnic_module_t *module) module->linux_device_name, module->numa_distance); } +#else + if (NULL != dev_numa) { + int myindex, devindex; + unsigned int j; + myindex = -1; + for (j=0; j < matrix->nbobjs; j++) { + if (my_numa_node == matrix->objs[j]) { + myindex = j; + break; + } + } + if (-1 == myindex) { + return OPAL_SUCCESS; + } + devindex = -1; + for (j=0; j < matrix->nbobjs; j++) { + if (dev_numa == matrix->objs[j]) { + devindex = j; + break; + } + } + if (-1 == devindex) { + return OPAL_SUCCESS; + } + + module->numa_distance = + matrix->values[(devindex * matrix->nbobjs) + myindex]; + + opal_output_verbose(5, USNIC_OUT, + "btl:usnic:filter_numa: %s is distance %d from me", + module->linux_device_name, + module->numa_distance); + } +#endif return OPAL_SUCCESS; } diff --git a/opal/mca/btl/usnic/btl_usnic_module.c b/opal/mca/btl/usnic/btl_usnic_module.c index efad1ed2b7c..ba0442c43c4 100644 --- a/opal/mca/btl/usnic/btl_usnic_module.c +++ b/opal/mca/btl/usnic/btl_usnic_module.c @@ -1659,7 +1659,7 @@ static int create_ep(opal_btl_usnic_module_t* module, rc, fi_strerror(-rc)); return OPAL_ERR_OUT_OF_RESOURCE; } - rc = fi_ep_bind(channel->ep, &module->av->fid, FI_RECV); + rc = fi_ep_bind(channel->ep, &module->av->fid, 0); if (0 != rc) { opal_show_help("help-mpi-btl-usnic.txt", "internal error during init", diff --git a/opal/mca/btl/usnic/btl_usnic_proc.c
b/opal/mca/btl/usnic/btl_usnic_proc.c index f0fefbff964..5b96a77a7e6 100644 --- a/opal/mca/btl/usnic/btl_usnic_proc.c +++ b/opal/mca/btl/usnic/btl_usnic_proc.c @@ -27,6 +27,7 @@ #include "opal/util/arch.h" #include "opal/util/show_help.h" #include "opal/constants.h" +#include "opal/util/bipartite_graph.h" #include "btl_usnic_compat.h" #include "btl_usnic.h" @@ -34,7 +35,6 @@ #include "btl_usnic_endpoint.h" #include "btl_usnic_module.h" #include "btl_usnic_util.h" -#include "btl_usnic_graph.h" /* larger weight values are more desirable (i.e., worth, not cost) */ enum { @@ -427,13 +427,13 @@ static void edge_pairs_to_match_table( static int create_proc_module_graph( opal_btl_usnic_proc_t *proc, bool proc_is_left, - opal_btl_usnic_graph_t **g_out) + opal_bp_graph_t **g_out) { int err; int i, j; int u, v; int num_modules; - opal_btl_usnic_graph_t *g = NULL; + opal_bp_graph_t *g = NULL; if (NULL == g_out) { return OPAL_ERR_BAD_PARAM; @@ -444,7 +444,7 @@ static int create_proc_module_graph( /* Construct a bipartite graph with remote interfaces on the one side and * local interfaces (modules) on the other. 
*/ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); + err = opal_bp_graph_create(NULL, NULL, &g); if (OPAL_SUCCESS != err) { OPAL_ERROR_LOG(err); goto out; @@ -453,9 +453,9 @@ static int create_proc_module_graph( /* create vertices for each interface (local and remote) */ for (i = 0; i < num_modules; ++i) { int idx = -1; - err = opal_btl_usnic_gr_add_vertex(g, - mca_btl_usnic_component.usnic_active_modules[i], - &idx); + err = opal_bp_graph_add_vertex(g, + mca_btl_usnic_component.usnic_active_modules[i], + &idx); if (OPAL_SUCCESS != err) { OPAL_ERROR_LOG(err); goto out_free_graph; @@ -464,7 +464,7 @@ static int create_proc_module_graph( } for (i = 0; i < (int)proc->proc_modex_count; ++i) { int idx = -1; - err = opal_btl_usnic_gr_add_vertex(g, &proc->proc_modex[i], &idx); + err = opal_bp_graph_add_vertex(g, &proc->proc_modex[i], &idx); if (OPAL_SUCCESS != err) { OPAL_ERROR_LOG(err); goto out_free_graph; @@ -509,9 +509,9 @@ static int create_proc_module_graph( opal_output_verbose(20, USNIC_OUT, "btl:usnic:%s: adding edge (%d,%d) with cost=%" PRIi64 " for edge module[%d] <--> endpoint[%d]", __func__, u, v, cost, i, j); - err = opal_btl_usnic_gr_add_edge(g, u, v, cost, - /*capacity=*/1, - /*e_data=*/NULL); + err = opal_bp_graph_add_edge(g, u, v, cost, + /*capacity=*/1, + /*e_data=*/NULL); if (OPAL_SUCCESS != err) { OPAL_ERROR_LOG(err); goto out_free_graph; @@ -523,7 +523,7 @@ static int create_proc_module_graph( return OPAL_SUCCESS; out_free_graph: - opal_btl_usnic_gr_free(g); + opal_bp_graph_free(g); out: return err; } @@ -547,7 +547,7 @@ static int match_modex(opal_btl_usnic_module_t *module, int err = OPAL_SUCCESS; size_t i; uint32_t num_modules; - opal_btl_usnic_graph_t *g = NULL; + opal_bp_graph_t *g = NULL; bool proc_is_left; if (NULL == index_out) { @@ -599,7 +599,7 @@ static int match_modex(opal_btl_usnic_module_t *module, int nme = 0; int *me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, &nme, &me); + err = 
opal_bp_graph_solve_bipartite_assignment(g, &nme, &me); if (OPAL_SUCCESS != err) { OPAL_ERROR_LOG(err); goto out_free_graph; @@ -608,7 +608,7 @@ static int match_modex(opal_btl_usnic_module_t *module, edge_pairs_to_match_table(proc, proc_is_left, nme, me); free(me); - err = opal_btl_usnic_gr_free(g); + err = opal_bp_graph_free(g); if (OPAL_SUCCESS != err) { OPAL_ERROR_LOG(err); return err; @@ -655,7 +655,7 @@ static int match_modex(opal_btl_usnic_module_t *module, return (*index_out == -1 ? OPAL_ERR_NOT_FOUND : OPAL_SUCCESS); out_free_graph: - opal_btl_usnic_gr_free(g); + opal_bp_graph_free(g); out_free_table: free(proc->proc_ep_match_table); proc->proc_ep_match_table = NULL; @@ -741,6 +741,7 @@ opal_btl_usnic_create_endpoint(opal_btl_usnic_module_t *module, endpoint->endpoint_module = module; assert(modex_index >= 0 && modex_index < (int)proc->proc_modex_count); endpoint->endpoint_remote_modex = proc->proc_modex[modex_index]; + endpoint->endpoint_send_credits = module->sd_num; /* Start creating destinations; one for each channel. These progress in the background.a */ diff --git a/opal/mca/btl/usnic/btl_usnic_recv.c b/opal/mca/btl/usnic/btl_usnic_recv.c index 443e2b0e961..00e48d7a0dd 100644 --- a/opal/mca/btl/usnic/btl_usnic_recv.c +++ b/opal/mca/btl/usnic/btl_usnic_recv.c @@ -11,7 +11,7 @@ * All rights reserved. * Copyright (c) 2006 Sandia National Laboratories. All rights * reserved. - * Copyright (c) 2008-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2008-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2012 Los Alamos National Security, LLC. All rights * reserved. 
* $COPYRIGHT$ @@ -351,9 +351,9 @@ void opal_btl_usnic_recv_call(opal_btl_usnic_module_t *module, else { ++module->stats.num_unk_recvs; if (module->stats.num_unk_recvs < 10) { - opal_output(0, "unrecognized payload type %d", bseg->us_btl_header->payload_type); - opal_output(0, "base = %p, proto = %p, hdr = %p", bseg->us_list.ptr, seg->rs_protocol_header, (void*) bseg->us_btl_header); - opal_btl_usnic_dump_hex(bseg->us_list.ptr, 96+sizeof(*bseg->us_btl_header)); + opal_output_verbose(15, USNIC_OUT, "unrecognized payload type %d", bseg->us_btl_header->payload_type); + opal_output_verbose(15, USNIC_OUT, "base = %p, proto = %p, hdr = %p", bseg->us_list.ptr, seg->rs_protocol_header, (void*) bseg->us_btl_header); + opal_btl_usnic_dump_hex(15, USNIC_OUT, bseg->us_list.ptr, 96+sizeof(*bseg->us_btl_header)); } goto repost; } diff --git a/opal/mca/btl/usnic/btl_usnic_recv.h b/opal/mca/btl/usnic/btl_usnic_recv.h index 70ffa7d4db2..7e056e488db 100644 --- a/opal/mca/btl/usnic/btl_usnic_recv.h +++ b/opal/mca/btl/usnic/btl_usnic_recv.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013-2015 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013-2017 Cisco Systems, Inc. 
All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -295,7 +295,7 @@ opal_btl_usnic_recv_fast(opal_btl_usnic_module_t *module, #if 0 opal_output(0, "fast recv %d bytes:\n", bseg->us_btl_header->payload_len + sizeof(opal_btl_usnic_btl_header_t)); -opal_btl_usnic_dump_hex(bseg->us_btl_header, bseg->us_btl_header->payload_len + sizeof(opal_btl_usnic_btl_header_t)); +opal_btl_usnic_dump_hex(15, USNIC_OUT, bseg->us_btl_header, bseg->us_btl_header->payload_len + sizeof(opal_btl_usnic_btl_header_t)); #endif /* If this is a short incoming message (i.e., the message is wholly contained in this one message -- it is not chunked diff --git a/opal/mca/btl/usnic/btl_usnic_util.c b/opal/mca/btl/usnic/btl_usnic_util.c index 17eeb7650db..54a7be513a8 100644 --- a/opal/mca/btl/usnic/btl_usnic_util.c +++ b/opal/mca/btl/usnic/btl_usnic_util.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013-2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013-2017 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -75,7 +75,8 @@ void opal_btl_usnic_util_abort(const char *msg, const char *file, int line) void -opal_btl_usnic_dump_hex(void *vaddr, int len) +opal_btl_usnic_dump_hex(int verbose_level, int output_id, + void *vaddr, int len) { char buf[128]; size_t bufspace; @@ -96,7 +97,8 @@ opal_btl_usnic_dump_hex(void *vaddr, int len) sum += addr[i]; if ((i&15) == 15) { - opal_output(0, "%4x: %s\n", i&~15, buf); + opal_output_verbose(verbose_level, output_id, + "%4x: %s\n", i&~15, buf); p = buf; memset(buf, 0, sizeof(buf)); @@ -104,9 +106,10 @@ opal_btl_usnic_dump_hex(void *vaddr, int len) } } if ((i&15) != 0) { - opal_output(0, "%4x: %s\n", i&~15, buf); + opal_output_verbose(verbose_level, output_id, + "%4x: %s\n", i&~15, buf); } - /*opal_output(0, "buffer sum = %x\n", sum); */ + /*opal_output_verbose(verbose_level, output_id, "buffer sum = %x\n", sum); */ } diff --git a/opal/mca/btl/usnic/btl_usnic_util.h 
b/opal/mca/btl/usnic/btl_usnic_util.h index 389deafd652..09bec876abd 100644 --- a/opal/mca/btl/usnic/btl_usnic_util.h +++ b/opal/mca/btl/usnic/btl_usnic_util.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013-2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013-2017 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * * Additional copyrights may follow @@ -117,7 +117,8 @@ void opal_btl_usnic_snprintf_ipv4_addr(char *out, size_t maxlen, void opal_btl_usnic_snprintf_bool_array(char *s, size_t slen, bool a[], size_t alen); -void opal_btl_usnic_dump_hex(void *vaddr, int len); +void opal_btl_usnic_dump_hex(int verbose_level, int output_id, + void *vaddr, int len); size_t opal_btl_usnic_convertor_pack_peek(const opal_convertor_t *conv, size_t max_len); diff --git a/opal/mca/btl/usnic/configure.m4 b/opal/mca/btl/usnic/configure.m4 index 406a8ffa06a..33d5dacdb75 100644 --- a/opal/mca/btl/usnic/configure.m4 +++ b/opal/mca/btl/usnic/configure.m4 @@ -12,7 +12,9 @@ # All rights reserved. # Copyright (c) 2006 Sandia National Laboratories. All rights # reserved. -# Copyright (c) 2010-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2010-2017 Cisco Systems, Inc. All rights reserved +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -93,27 +95,27 @@ AC_DEFUN([_OPAL_BTL_USNIC_DO_CONFIG],[ AC_MSG_RESULT([$opal_btl_usnic_happy]) ]) - # The usnic BTL requires libfabric support. + # The usnic BTL requires OFI libfabric support. 
AS_IF([test "$opal_btl_usnic_happy" = "yes"], - [AC_MSG_CHECKING([whether libfabric support is available]) - AS_IF([test "$opal_common_libfabric_happy" = "yes"], + [AC_MSG_CHECKING([whether OFI libfabric support is available]) + AS_IF([test "$opal_common_ofi_happy" = "yes"], [opal_btl_usnic_happy=yes], [opal_btl_usnic_happy=no]) AC_MSG_RESULT([$opal_btl_usnic_happy]) ]) - # The usnic BTL requires at least libfabric v1.1 (there was a + # The usnic BTL requires at least OFI libfabric v1.1 (there was a # critical bug in libfabric v1.0). AS_IF([test "$opal_btl_usnic_happy" = "yes"], - [AC_MSG_CHECKING([whether libfabric is >= v1.1]) + [AC_MSG_CHECKING([whether OFI libfabric is >= v1.1]) opal_btl_usnic_CPPFLAGS_save=$CPPFLAGS - CPPFLAGS="$opal_common_libfabric_CPPFLAGS $CPPFLAGS" + CPPFLAGS="$opal_common_ofi_CPPFLAGS $CPPFLAGS" AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include ]], [[ #if !defined(FI_MAJOR_VERSION) -#error your version of libfabric is too old +#error your version of OFI libfabric is too old #elif FI_VERSION(FI_MAJOR_VERSION, FI_MINOR_VERSION) < FI_VERSION(1, 1) -#error your version of libfabric is too old +#error your version of OFI libfabric is too old #endif ]])], [opal_btl_usnic_happy=yes], @@ -122,10 +124,10 @@ AC_DEFUN([_OPAL_BTL_USNIC_DO_CONFIG],[ CPPFLAGS=$opal_btl_usnic_CPPFLAGS_save ]) - # Make sure we can find the libfabric usnic extensions header + # Make sure we can find the OFI libfabric usnic extensions header AS_IF([test "$opal_btl_usnic_happy" = "yes" ], [opal_btl_usnic_CPPFLAGS_save=$CPPFLAGS - CPPFLAGS="$opal_common_libfabric_CPPFLAGS $CPPFLAGS" + CPPFLAGS="$opal_common_ofi_CPPFLAGS $CPPFLAGS" AC_CHECK_HEADER([rdma/fi_ext_usnic.h], [], [opal_btl_usnic_happy=no]) @@ -141,5 +143,6 @@ AC_DEFUN([_OPAL_BTL_USNIC_DO_CONFIG],[ [$2]) ]) + OPAL_SUMMARY_ADD([[Transports]],[[Cisco usNIC]],[[btl_usnic]],[$opal_btl_usnic_happy]) OPAL_VAR_SCOPE_POP ])dnl diff --git a/opal/mca/btl/usnic/test/btl_usnic_graph_test.h 
b/opal/mca/btl/usnic/test/btl_usnic_graph_test.h deleted file mode 100644 index c5d25e6da2d..00000000000 --- a/opal/mca/btl/usnic/test/btl_usnic_graph_test.h +++ /dev/null @@ -1,1070 +0,0 @@ -/* - * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef BTL_USNIC_GRAPH_TEST_H -#define BTL_USNIC_GRAPH_TEST_H - -#if OPAL_BTL_USNIC_UNIT_TESTS - -#include -#include -#include "btl_usnic_test.h" - -#define check_graph_is_consistent(g) \ - do { \ - check(NUM_VERTICES(g) <= opal_pointer_array_get_size(&g->vertices)); \ - check(g->source_idx >= -1 || g->source_idx < NUM_VERTICES(g)); \ - check(g->sink_idx >= -1 || g->sink_idx < NUM_VERTICES(g)); \ - } while (0) - -#define check_has_in_out_degree(g, u, expected_indegree, expected_outdegree) \ - do { \ - check_int_eq(opal_btl_usnic_gr_indegree(g, (u)), expected_indegree); \ - check_int_eq(opal_btl_usnic_gr_outdegree(g, (u)), expected_outdegree); \ - } while (0) - -/* Check the given path for sanity and that it does not have a cycle. Uses - * the "racing pointers" approach for cycle checking. 
*/ -#define check_path_cycle(n, source, sink, pred) \ - do { \ - int i_, j_; \ - check_int_eq(pred[source], -1); \ - for (i_ = 0; i_ < n; ++i_) { \ - check(pred[i_] >= -1); \ - check(pred[i_] < n); \ - } \ - i_ = (sink); \ - j_ = pred[(sink)]; \ - while (i_ != -1 && j_ != -1) { \ - check_msg(i_ != j_, "CYCLE DETECTED"); \ - i_ = pred[i_]; \ - j_ = pred[j_]; \ - if (j_ != -1) { \ - j_ = pred[j_]; \ - } \ - } \ - } while (0) - -static int v_cleanup_count = 0; -static int e_cleanup_count = 0; - -static void v_cleanup(void *v_data) -{ - ++v_cleanup_count; -} - -static void e_cleanup(void *e_data) -{ - ++e_cleanup_count; -} - -/* a utility function for comparing integer pairs, useful for sorting the edge - * list returned by opal_btl_usnic_solve_bipartite_assignment */ -static int cmp_int_pair(const void *a, const void *b) -{ - int *ia = (int *)a; - int *ib = (int *)b; - - if (ia[0] < ib[0]) { - return -1; - } - else if (ia[0] > ib[0]) { - return 1; - } - else { /* ia[0] == ib[0] */ - if (ia[1] < ib[1]) { - return -1; - } - else if (ia[1] > ib[1]) { - return 1; - } - else { - return 0; - } - } -} - -/* Simple time function so that we don't have to deal with the - complexity of finding mpi.h to use MPI_Wtime */ -static double gettime(void) -{ - double wtime; - struct timeval tv; - gettimeofday(&tv, NULL); - wtime = tv.tv_sec; - wtime += (double)tv.tv_usec / 1000000.0; - - return wtime; -} - -static int test_graph_create(void *ctx) -{ - opal_btl_usnic_graph_t *g; - int i; - int err; - int user_data; - int index; - - /* TEST CASE: check zero-vertex case */ - g = NULL; - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - check(opal_btl_usnic_gr_order(g) == 0); - check_graph_is_consistent(g); - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* TEST CASE: check nonzero-vertex case with no cleanup routines */ - g = NULL; - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, 
OPAL_SUCCESS); - check(g != NULL); - check_graph_is_consistent(g); - for (i = 0; i < 4; ++i) { - index = -1; - err = opal_btl_usnic_gr_add_vertex(g, &user_data, &index); - check_err_code(err, OPAL_SUCCESS); - check(index == i); - } - check(opal_btl_usnic_gr_order(g) == 4); - check_graph_is_consistent(g); - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* TEST CASE: make sure cleanup routines are invoked properly */ - g = NULL; - v_cleanup_count = 0; - e_cleanup_count = 0; - err = opal_btl_usnic_gr_create(&v_cleanup, &e_cleanup, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - check_graph_is_consistent(g); - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, &user_data, &index); - check_err_code(err, OPAL_SUCCESS); - check(index == i); - } - check(opal_btl_usnic_gr_order(g) == 5); - check_graph_is_consistent(g); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/1, - /*capacity=*/2, &user_data); - check_graph_is_consistent(g); - check(v_cleanup_count == 0); - check(e_cleanup_count == 0); - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - check(v_cleanup_count == 5); - check(e_cleanup_count == 1); - - return TEST_PASSED; -} - -static int test_graph_clone(void *ctx) -{ - opal_btl_usnic_graph_t *g, *gx; - int i; - int err; - int user_data; - int index; - - /* TEST CASE: make sure that simple cloning works fine */ - g = NULL; - v_cleanup_count = 0; - e_cleanup_count = 0; - err = opal_btl_usnic_gr_create(&v_cleanup, &e_cleanup, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - check_graph_is_consistent(g); - - /* add 5 edges */ - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, &user_data, &index); - check_err_code(err, OPAL_SUCCESS); - } - check(opal_btl_usnic_gr_order(g) == 5); - check_graph_is_consistent(g); - - /* and two edges */ - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/1, - /*capacity=*/2, &user_data); - 
check_err_code(err, OPAL_SUCCESS); - check_graph_is_consistent(g); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/3, /*v=*/1, /*cost=*/2, - /*capacity=*/100, &user_data); - check_err_code(err, OPAL_SUCCESS); - check_graph_is_consistent(g); - - /* now clone it and ensure that we get the same kind of graph */ - gx = NULL; - err = opal_btl_usnic_gr_clone(g, /*copy_user_data=*/false, &gx); - check_err_code(err, OPAL_SUCCESS); - check(gx != NULL); - - /* double check that cleanups still happen as expected after cloning */ - err = opal_btl_usnic_gr_free(gx); - check_err_code(err, OPAL_SUCCESS); - check(v_cleanup_count == 0); - check(e_cleanup_count == 0); - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - check(v_cleanup_count == 5); - check(e_cleanup_count == 2); - - return TEST_PASSED; -} - -static int test_graph_accessors(void *ctx) -{ - opal_btl_usnic_graph_t *g; - int i; - int err; - - /* TEST CASE: check _indegree/_outdegree/_order work correctly */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 4; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - - check(opal_btl_usnic_gr_indegree(g, i) == 0); - check(opal_btl_usnic_gr_outdegree(g, i) == 0); - } - - check(opal_btl_usnic_gr_order(g) == 4); - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/2, - /*capacity=*/1, NULL); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/1, /*cost=*/2, - /*capacity=*/1, NULL); - - check(opal_btl_usnic_gr_indegree(g, 0) == 0); - check(opal_btl_usnic_gr_outdegree(g, 0) == 2); - check(opal_btl_usnic_gr_indegree(g, 1) == 1); - check(opal_btl_usnic_gr_outdegree(g, 1) == 0); - check(opal_btl_usnic_gr_indegree(g, 2) == 1); - check(opal_btl_usnic_gr_outdegree(g, 2) == 0); - check(opal_btl_usnic_gr_indegree(g, 3) == 0); - check(opal_btl_usnic_gr_outdegree(g, 3) == 0); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, 
OPAL_SUCCESS); - - return TEST_PASSED; -} - -static int test_graph_assignment_solver(void *ctx) -{ - opal_btl_usnic_graph_t *g; - int i; - int err; - int nme; - int *me; - int iter; - double start, end; - - /* TEST CASE: check that simple cases are solved correctly - * - * 0 --> 2 - * 1 --> 3 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 4; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, - /*capacity=*/1, NULL); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/2, - /*capacity=*/1, NULL); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 2); - check(me[2] == 1 && me[3] == 3); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* TEST CASE: left side has more vertices than the right side - * - * 0 --> 3 - * 1 --> 4 - * 2 --> 4 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 
2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 3); - check(me[2] == 2 && me[3] == 4); - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* test Christian's case: - * 0 --> 2 - * 0 --> 3 - * 1 --> 3 - * - * make sure that 0-->2 & 1-->3 get chosen. - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 4; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/5, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 2); - check(me[2] == 1 && me[3] == 3); - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* Also need to do this version of it to be safe: - * 0 --> 2 - * 1 --> 2 - * 1 --> 3 - * - * Should choose 0-->2 & 1-->3 here too. 
- */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 4; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/2, /*cost=*/1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/5, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 2); - check(me[2] == 1 && me[3] == 3); - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* TEST CASE: test Christian's case with negative weights: - * 0 --> 2 - * 0 --> 3 - * 1 --> 3 - * - * make sure that 0-->2 & 1-->3 get chosen. 
- */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 4; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/-5, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 2); - check(me[2] == 1 && me[3] == 3); - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* TEST CASE: add some disconnected vertices - * 0 --> 2 - * 0 --> 3 - * 1 --> 3 - * x --> 4 - * - * make sure that 0-->2 & 1-->3 get chosen. 
- */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/-5, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 2); - check(me[2] == 1 && me[3] == 3); - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* TEST CASE: sample UDP graph from bldsb005 + bldsb007 - * 0 --> 2 (cost -4294967296) - * 1 --> 2 (cost -4294967296) - * 0 --> 3 (cost -4294967296) - * 1 --> 3 (cost -4294967296) - * - * Make sure that either (0-->2 && 1-->3) or (0-->3 && 1-->2) get chosen. 
- */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 4; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-4294967296, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/2, /*cost=*/-4294967296, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-4294967296, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/-4294967296, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - if (me[1] == 2) { - check(me[0] == 0 && me[1] == 2); - check(me[2] == 1 && me[3] == 3); - } else { - check(me[0] == 0 && me[1] == 3); - check(me[2] == 1 && me[3] == 2); - } - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* TEST CASE: check that simple cases are solved correctly - * - * 0 --> 2 - * 1 --> 2 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 3; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-100, - /*capacity=*/1, NULL); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/2, /*cost=*/-100, - /*capacity=*/1, NULL); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 1); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), 
&cmp_int_pair); - check((me[0] == 0 || me[0] == 1) && me[1] == 2); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* TEST CASE: performance sanity check - * - * Construct this graph and ensure that it doesn't take too long on a large - * cluster (1000 nodes). - * 0 --> 3 - * 1 --> 4 - * 2 --> 4 - */ -#define NUM_ITER (10000) - start = gettime(); - for (iter = 0; iter < NUM_ITER; ++iter) { - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - me = NULL; - err = opal_btl_usnic_solve_bipartite_assignment(g, - &nme, - &me); - check_err_code(err, OPAL_SUCCESS); - check_int_eq(nme, 2); - check(me != NULL); - qsort(me, nme, 2*sizeof(int), &cmp_int_pair); - check(me[0] == 0 && me[1] == 3); - check(me[2] == 2 && me[3] == 4); - free(me); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - } - end = gettime(); - /* ensure that this operation on a 1000 node cluster will take less than one second */ - check(((end - start) / NUM_ITER) < 0.001); -#if 0 - fprintf(stderr, "timing for %d iterations is %f seconds (%f s/iter)\n", - NUM_ITER, end - start, (end - start) / NUM_ITER); -#endif - - return TEST_PASSED; -} - -static int test_graph_bellman_ford(void *ctx) -{ - opal_btl_usnic_graph_t *g; - int i; - int err; - bool path_found; - int *pred; - - /* TEST CASE: check that simple cases are solved correctly - * -> 0 --> 2 - * / \ - * 4 --> 5 - * \ / - * -> 1 --> 3 / 
- * - * should yield the path 5,1,3,6 (see costs in code below) - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 6; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/2, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/4, /*v=*/0, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/4, /*v=*/1, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/5, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/3, /*v=*/5, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - pred = malloc(6*sizeof(*pred)); - check(pred != NULL); - path_found = bellman_ford(g, /*source=*/4, /*target=*/5, pred); - check(path_found); - check_path_cycle(6, /*source=*/4, /*target=*/5, pred); - check_int_eq(pred[5], 3); - check_int_eq(pred[3], 1); - check_int_eq(pred[1], 4); - free(pred); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* TEST CASE: left side has more vertices than the right side, then - * convert to a flow network - * - * 0 --> 3 - * 1 --> 4 - * 2 --> 4 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = 
opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - err = bipartite_to_flow(g); - check_err_code(err, OPAL_SUCCESS); - - pred = malloc(7*sizeof(*pred)); - check(pred != NULL); - path_found = bellman_ford(g, /*source=*/5, /*target=*/6, pred); - check(path_found); - check_int_eq(g->source_idx, 5); - check_int_eq(g->sink_idx, 6); - check_path_cycle(7, /*source=*/5, /*target=*/6, pred); - check_int_eq(pred[6], 4); - check_int_eq(pred[4], 2); - check_int_eq(pred[2], 5); - free(pred); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* TEST CASE: same as previous, but with very large cost values (try to - * catch incorrect integer conversions) - * - * 0 --> 3 - * 1 --> 4 - * 2 --> 4 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/INT32_MAX+10LL, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/INT32_MAX+2LL, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/INT32_MAX+1LL, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - err = bipartite_to_flow(g); - check_err_code(err, OPAL_SUCCESS); - - pred = malloc(7*sizeof(*pred)); - check(pred != NULL); - path_found = bellman_ford(g, /*source=*/5, /*target=*/6, pred); - check(path_found); - check_int_eq(g->source_idx, 5); - check_int_eq(g->sink_idx, 6); - check_path_cycle(7, /*source=*/5, /*target=*/6, pred); - check_int_eq(pred[6], 4); - check_int_eq(pred[4], 2); - 
check_int_eq(pred[2], 5); - free(pred); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - /* TEST CASE: left side has more vertices than the right side, then - * convert to a flow network. Negative costs are used, but should not - * result in a negative cycle. - * - * 0 --> 3 - * 1 --> 4 - * 2 --> 4 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/-2, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/-10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - err = bipartite_to_flow(g); - check_err_code(err, OPAL_SUCCESS); - - pred = malloc(7*sizeof(*pred)); - check(pred != NULL); - path_found = bellman_ford(g, /*source=*/5, /*target=*/6, pred); - check(path_found); - check_int_eq(g->source_idx, 5); - check_int_eq(g->sink_idx, 6); - check_path_cycle(7, /*source=*/5, /*target=*/6, pred); - check_int_eq(pred[6], 4); - check_int_eq(pred[4], 2); - check_int_eq(pred[2], 5); - free(pred); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - return TEST_PASSED; -} - -static int test_graph_flow_conversion(void *ctx) -{ - opal_btl_usnic_graph_t *g; - int i; - int err; - - /* TEST CASE: left side has more vertices than the right side, then - * convert to a flow network - * - * 0 --> 3 - * 1 --> 4 - * 2 --> 4 - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - for (i = 0; i < 5; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - err = 
opal_btl_usnic_gr_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - check_int_eq(opal_btl_usnic_gr_order(g), 5); - check_has_in_out_degree(g, 0, /*exp_indeg=*/0, /*exp_outdeg=*/1); - check_has_in_out_degree(g, 1, /*exp_indeg=*/0, /*exp_outdeg=*/1); - check_has_in_out_degree(g, 2, /*exp_indeg=*/0, /*exp_outdeg=*/1); - check_has_in_out_degree(g, 3, /*exp_indeg=*/1, /*exp_outdeg=*/0); - check_has_in_out_degree(g, 4, /*exp_indeg=*/2, /*exp_outdeg=*/0); - - /* this should add two nodes and a bunch of edges */ - err = bipartite_to_flow(g); - check_err_code(err, OPAL_SUCCESS); - - check_int_eq(opal_btl_usnic_gr_order(g), 7); - check_has_in_out_degree(g, 0, /*exp_indeg=*/2, /*exp_outdeg=*/2); - check_has_in_out_degree(g, 1, /*exp_indeg=*/2, /*exp_outdeg=*/2); - check_has_in_out_degree(g, 2, /*exp_indeg=*/2, /*exp_outdeg=*/2); - check_has_in_out_degree(g, 3, /*exp_indeg=*/2, /*exp_outdeg=*/2); - check_has_in_out_degree(g, 4, /*exp_indeg=*/3, /*exp_outdeg=*/3); - check_has_in_out_degree(g, 5, /*exp_indeg=*/3, /*exp_outdeg=*/3); - check_has_in_out_degree(g, 6, /*exp_indeg=*/2, /*exp_outdeg=*/2); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - - /* TEST CASE: empty graph - * - * there's no reason that the code should bother to support this, it's not - * useful - */ - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - check_int_eq(opal_btl_usnic_gr_order(g), 0); - err = bipartite_to_flow(g); - check_err_code(err, OPAL_ERR_BAD_PARAM); - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - return TEST_PASSED; -} - -static int test_graph_param_checking(void *ctx) 
-{ - opal_btl_usnic_graph_t *g; - int i; - int err; - - err = opal_btl_usnic_gr_create(NULL, NULL, &g); - check_err_code(err, OPAL_SUCCESS); - check(g != NULL); - - /* try with no vertices */ - err = opal_btl_usnic_gr_add_edge(g, /*u=*/3, /*v=*/5, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_ERR_BAD_PARAM); - - for (i = 0; i < 6; ++i) { - err = opal_btl_usnic_gr_add_vertex(g, NULL, NULL); - check_err_code(err, OPAL_SUCCESS); - } - - /* try u out of range */ - err = opal_btl_usnic_gr_add_edge(g, /*u=*/9, /*v=*/5, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_ERR_BAD_PARAM); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/6, /*v=*/5, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_ERR_BAD_PARAM); - - /* try v out of range */ - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/8, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_ERR_BAD_PARAM); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/6, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_ERR_BAD_PARAM); - - /* try adding an edge that already exists */ - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/0, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_EXISTS); - - /* try an edge with an out of range cost */ - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/3, /*cost=*/INT64_MAX, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_ERR_BAD_PARAM); - err = opal_btl_usnic_gr_add_edge(g, /*u=*/2, /*v=*/3, /*cost=*/INT64_MAX-1, - /*capacity=*/1, NULL); - check_err_code(err, OPAL_SUCCESS); - - err = opal_btl_usnic_gr_free(g); - check_err_code(err, OPAL_SUCCESS); - - return TEST_PASSED; -} - -static int test_graph_helper_macros(void *ctx) -{ - int u, v; - int pred[6]; - bool visited[6][6]; - int pair1[2]; - int pair2[2]; - -#define RESET_ARRAYS(n, pred, visited) \ - do { \ - for (u = 0; u < 
6; ++u) { \ - pred[u] = -1; \ - for (v = 0; v < 6; ++v) { \ - visited[u][v] = false; \ - } \ - } \ - } while (0) - - /* TEST CASE: make sure that an empty path does not cause any edges to be - * visited */ - RESET_ARRAYS(6, pred, visited); - FOREACH_UV_ON_PATH(pred, 3, 5, u, v) { - visited[u][v] = true; - } - for (u = 0; u < 6; ++u) { - for (v = 0; v < 6; ++v) { - check(visited[u][v] == false); - } - } - - /* TEST CASE: make sure that every edge in the given path gets visited */ - RESET_ARRAYS(6, pred, visited); - pred[5] = 2; - pred[2] = 1; - pred[1] = 3; - FOREACH_UV_ON_PATH(pred, 3, 5, u, v) { - visited[u][v] = true; - } - for (u = 0; u < 6; ++u) { - for (v = 0; v < 6; ++v) { - if ((u == 2 && v == 5) || - (u == 1 && v == 2) || - (u == 3 && v == 1)) { - check(visited[u][v] == true); - } - else { - check(visited[u][v] == false); - } - } - } - -#undef RESET_ARRAYS - - /* not technically a macro, but make sure that the pair comparison function - * isn't broken (because it was in an earlier revision...) 
*/ - pair1[0] = 0; pair1[1] = 1; - pair2[0] = 0; pair2[1] = 1; - check(cmp_int_pair(&pair1[0], &pair2[0]) == 0); - - pair1[0] = 1; pair1[1] = 1; - pair2[0] = 0; pair2[1] = 1; - check(cmp_int_pair(pair1, pair2) > 0); - - pair1[0] = 0; pair1[1] = 1; - pair2[0] = 1; pair2[1] = 1; - check(cmp_int_pair(pair1, pair2) < 0); - - pair1[0] = 1; pair1[1] = 0; - pair2[0] = 1; pair2[1] = 1; - check(cmp_int_pair(pair1, pair2) < 0); - - pair1[0] = 1; pair1[1] = 1; - pair2[0] = 1; pair2[1] = 0; - check(cmp_int_pair(pair1, pair2) > 0); - - return TEST_PASSED; -} - -USNIC_REGISTER_TEST("test_graph_create", test_graph_create, NULL) -USNIC_REGISTER_TEST("test_graph_clone", test_graph_clone, NULL) -USNIC_REGISTER_TEST("test_graph_accessors", test_graph_accessors, NULL) -USNIC_REGISTER_TEST("test_graph_assignment_solver", test_graph_assignment_solver, NULL) -USNIC_REGISTER_TEST("test_graph_bellman_ford", test_graph_bellman_ford, NULL) -USNIC_REGISTER_TEST("test_graph_flow_conversion", test_graph_flow_conversion, NULL) -USNIC_REGISTER_TEST("test_graph_param_checking", test_graph_param_checking, NULL) -USNIC_REGISTER_TEST("test_graph_helper_macros", test_graph_helper_macros, NULL) - -#endif /* OPAL_BTL_USNIC_UNIT_TESTS */ - -#endif /* BTL_USNIC_GRAPH_TEST_H */ diff --git a/opal/mca/btl/vader/Makefile.am b/opal/mca/btl/vader/Makefile.am index deaf5e06cb2..2280c490e91 100644 --- a/opal/mca/btl/vader/Makefile.am +++ b/opal/mca/btl/vader/Makefile.am @@ -12,6 +12,7 @@ # Copyright (c) 2009-2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2011-2014 Los Alamos National Security, LLC. All rights # reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -57,7 +58,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_btl_vader_la_SOURCES = $(libmca_btl_vader_la_sources) mca_btl_vader_la_LDFLAGS = -module -avoid-version $(btl_vader_LDFLAGS) -mca_btl_vader_la_LIBADD = $(btl_vader_LIBS) +mca_btl_vader_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(btl_vader_LIBS) noinst_LTLIBRARIES = $(component_noinst) libmca_btl_vader_la_SOURCES = $(libmca_btl_vader_la_sources) diff --git a/opal/mca/btl/vader/btl_vader.h b/opal/mca/btl/vader/btl_vader.h index 5290a7faa78..f0e8ef678f5 100644 --- a/opal/mca/btl/vader/btl_vader.h +++ b/opal/mca/btl/vader/btl_vader.h @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2006-2007 Voltaire. All rights reserved. * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2010-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2010-2017 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2015 Mellanox Technologies. All rights reserved. * @@ -136,6 +136,8 @@ struct mca_btl_vader_component_t { opal_list_t pending_endpoints; /**< list of endpoints with pending fragments */ opal_list_t pending_fragments; /**< fragments pending remote completion */ + char *backing_directory; /**< directory to place shared memory backing files */ + /* knem stuff */ #if OPAL_BTL_VADER_HAVE_KNEM unsigned int knem_dma_min; /**< minimum size to enable DMA for knem transfers (0 disables) */ diff --git a/opal/mca/btl/vader/btl_vader_component.c b/opal/mca/btl/vader/btl_vader_component.c index 38cc5fb987a..edd49dd0bd1 100644 --- a/opal/mca/btl/vader/btl_vader_component.c +++ b/opal/mca/btl/vader/btl_vader_component.c @@ -12,11 +12,11 @@ * All rights reserved. * Copyright (c) 2006-2007 Voltaire. All rights reserved. * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. 
- * Copyright (c) 2010-2015 Los Alamos National Security, LLC. + * Copyright (c) 2010-2017 Los Alamos National Security, LLC. * All rights reserved. * Copyright (c) 2011 NVIDIA Corporation. All rights reserved. - * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * @@ -211,6 +211,19 @@ static int mca_btl_vader_component_register (void) OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_btl_vader_component.single_copy_mechanism); OBJ_RELEASE(new_enum); + if (0 == access ("/dev/shm", W_OK)) { + mca_btl_vader_component.backing_directory = "/dev/shm"; + } else { + mca_btl_vader_component.backing_directory = opal_process_info.job_session_dir; + } + (void) mca_base_component_var_register (&mca_btl_vader_component.super.btl_version, "backing_directory", + "Directory to place backing files for shared memory communication. " + "This directory should be on a local filesystem such as /tmp or " + "/dev/shm (default: (linux) /dev/shm, (others) session directory)", + MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_3, + MCA_BASE_VAR_SCOPE_READONLY, &mca_btl_vader_component.backing_directory); + + #if OPAL_BTL_VADER_HAVE_KNEM /* Currently disabling DMA mode by default; it's not clear that this is useful in all applications and architectures. 
*/ mca_btl_vader_component.knem_dma_min = 0; @@ -491,12 +504,15 @@ static mca_btl_base_module_t **mca_btl_vader_component_init (int *num_btls, if (MCA_BTL_VADER_XPMEM != mca_btl_vader_component.single_copy_mechanism) { char *sm_file; - rc = asprintf(&sm_file, "%s" OPAL_PATH_SEP "vader_segment.%s.%d", opal_process_info.proc_session_dir, - opal_process_info.nodename, MCA_BTL_VADER_LOCAL_RANK); + rc = asprintf(&sm_file, "%s" OPAL_PATH_SEP "vader_segment.%s.%x.%d", mca_btl_vader_component.backing_directory, + opal_process_info.nodename, OPAL_PROC_MY_NAME.jobid, MCA_BTL_VADER_LOCAL_RANK); if (0 > rc) { free (btls); return NULL; } + if (NULL != opal_pmix.register_cleanup) { + opal_pmix.register_cleanup (sm_file, false, false, false); + } rc = opal_shmem_segment_create (&component->seg_ds, sm_file, component->segment_size); free (sm_file); diff --git a/opal/mca/btl/vader/btl_vader_fbox.h b/opal/mca/btl/vader/btl_vader_fbox.h index 6f09cb6c513..abaf12811e4 100644 --- a/opal/mca/btl/vader/btl_vader_fbox.h +++ b/opal/mca/btl/vader/btl_vader_fbox.h @@ -22,12 +22,12 @@ typedef union mca_btl_vader_fbox_hdr_t { * in multiple instructions. To ensure that seq is never loaded before tag * and the tag is never read before seq put them in the same 32-bits of the * header. 
*/ + /** message size */ + uint32_t size; /** message tag */ uint16_t tag; /** sequence number */ uint16_t seq; - /** message size */ - uint32_t size; } data; uint64_t ival; } mca_btl_vader_fbox_hdr_t; @@ -52,20 +52,24 @@ static inline void mca_btl_vader_fbox_set_header (mca_btl_vader_fbox_hdr_t *hdr, { mca_btl_vader_fbox_hdr_t tmp = {.data = {.tag = tag, .seq = seq, .size = size}}; hdr->ival = tmp.ival; + opal_atomic_wmb (); } /* attempt to reserve a contiguous segment from the remote ep */ -static inline unsigned char *mca_btl_vader_reserve_fbox (mca_btl_base_endpoint_t *ep, size_t size) +static inline bool mca_btl_vader_fbox_sendi (mca_btl_base_endpoint_t *ep, unsigned char tag, + void * restrict header, const size_t header_size, + void * restrict payload, const size_t payload_size) { const unsigned int fbox_size = mca_btl_vader_component.fbox_size; + size_t size = header_size + payload_size; unsigned int start, end, buffer_free; size_t data_size = size; - unsigned char *dst; + unsigned char *dst, *data; bool hbs, hbm; /* don't try to use the per-peer buffer for messages that will fill up more than 25% of the buffer */ if (OPAL_UNLIKELY(NULL == ep->fbox_out.buffer || size > (fbox_size >> 2))) { - return NULL; + return false; } OPAL_THREAD_LOCK(&ep->lock); @@ -119,15 +123,23 @@ static inline unsigned char *mca_btl_vader_reserve_fbox (mca_btl_base_endpoint_t ep->fbox_out.end = (hbs << 31) | end; opal_atomic_wmb (); OPAL_THREAD_UNLOCK(&ep->lock); - return NULL; + return false; } } BTL_VERBOSE(("writing fragment of size %u to offset %u {start: 0x%x, end: 0x%x (hbs: %d)} of peer's buffer. free = %u", (unsigned int) size, end, start, end, hbs, buffer_free)); + data = dst + sizeof (mca_btl_vader_fbox_hdr_t); + + memcpy (data, header, header_size); + if (payload) { + /* inline sends are typically just pml headers (due to MCA_BTL_FLAGS_SEND_INPLACE) */ + memcpy (data + header_size, payload, payload_size); + } + /* write out part of the header now. 
the tag will be written when the data is available */ - mca_btl_vader_fbox_set_header (MCA_BTL_VADER_FBOX_HDR(dst), 0, ep->fbox_out.seq++, data_size); + mca_btl_vader_fbox_set_header (MCA_BTL_VADER_FBOX_HDR(dst), tag, ep->fbox_out.seq++, data_size); end += size; @@ -145,40 +157,6 @@ static inline unsigned char *mca_btl_vader_reserve_fbox (mca_btl_base_endpoint_t opal_atomic_wmb (); OPAL_THREAD_UNLOCK(&ep->lock); - return dst + sizeof (mca_btl_vader_fbox_hdr_t); -} - -static inline void mca_btl_vader_fbox_send (unsigned char * restrict fbox, unsigned char tag) -{ - /* ensure data writes have completed before we mark the data as available */ - opal_atomic_wmb (); - - /* the header proceeds the fbox buffer */ - MCA_BTL_VADER_FBOX_HDR ((intptr_t) fbox)[-1].data.tag = tag; -} - -static inline bool mca_btl_vader_fbox_sendi (mca_btl_base_endpoint_t *ep, unsigned char tag, - void * restrict header, const size_t header_size, - void * restrict payload, const size_t payload_size) -{ - const size_t total_size = header_size + payload_size; - unsigned char * restrict fbox; - - fbox = mca_btl_vader_reserve_fbox(ep, total_size); - if (OPAL_UNLIKELY(NULL == fbox)) { - return false; - } - - memcpy (fbox, header, header_size); - if (payload) { - /* inline sends are typically just pml headers (due to MCA_BTL_FLAGS_SEND_INPLACE) */ - memcpy (fbox + header_size, payload, payload_size); - } - - /* mark the fbox as sent */ - mca_btl_vader_fbox_send (fbox, tag); - - /* send complete */ return true; } @@ -261,14 +239,14 @@ static inline bool mca_btl_vader_check_fboxes (void) static inline void mca_btl_vader_try_fbox_setup (mca_btl_base_endpoint_t *ep, mca_btl_vader_hdr_t *hdr) { - if (OPAL_UNLIKELY(NULL == ep->fbox_out.buffer && mca_btl_vader_component.fbox_threshold == OPAL_THREAD_ADD_SIZE_T (&ep->send_count, 1))) { + if (OPAL_UNLIKELY(NULL == ep->fbox_out.buffer && mca_btl_vader_component.fbox_threshold == OPAL_THREAD_ADD_FETCH_SIZE_T (&ep->send_count, 1))) { /* protect access to 
mca_btl_vader_component.segment_offset */ OPAL_THREAD_LOCK(&mca_btl_vader_component.lock); if (mca_btl_vader_component.segment_size >= mca_btl_vader_component.segment_offset + mca_btl_vader_component.fbox_size && mca_btl_vader_component.fbox_max > mca_btl_vader_component.fbox_count) { /* verify the remote side will accept another fbox */ - if (0 <= opal_atomic_add_32 (&ep->fifo->fbox_available, -1)) { + if (0 <= opal_atomic_add_fetch_32 (&ep->fifo->fbox_available, -1)) { void *fbox_base = mca_btl_vader_component.my_segment + mca_btl_vader_component.segment_offset; mca_btl_vader_component.segment_offset += mca_btl_vader_component.fbox_size; @@ -280,7 +258,7 @@ static inline void mca_btl_vader_try_fbox_setup (mca_btl_base_endpoint_t *ep, mc hdr->fbox_base = virtual2relative((char *) ep->fbox_out.buffer); ++mca_btl_vader_component.fbox_count; } else { - opal_atomic_add_32 (&ep->fifo->fbox_available, 1); + opal_atomic_add_fetch_32 (&ep->fifo->fbox_available, 1); } opal_atomic_wmb (); diff --git a/opal/mca/btl/vader/btl_vader_fifo.h b/opal/mca/btl/vader/btl_vader_fifo.h index 5f6488b44bf..0dc70bc8a13 100644 --- a/opal/mca/btl/vader/btl_vader_fifo.h +++ b/opal/mca/btl/vader/btl_vader_fifo.h @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2006-2007 Voltaire. All rights reserved. * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2010-2014 Los Alamos National Security, LLC. + * Copyright (c) 2010-2017 Los Alamos National Security, LLC. * All rights reserved. 
* $COPYRIGHT$ * @@ -30,8 +30,9 @@ #include "btl_vader_endpoint.h" #include "btl_vader_frag.h" +#define vader_item_compare_exchange(x, y, z) opal_atomic_compare_exchange_strong_ptr ((volatile void **) (x), (void **) (y), (void *) (z)) + #if SIZEOF_VOID_P == 8 - #define vader_item_cmpset(x, y, z) opal_atomic_cmpset_64((volatile int64_t *)(x), (int64_t)(y), (int64_t)(z)) #define vader_item_swap(x, y) opal_atomic_swap_64((volatile int64_t *)(x), (int64_t)(y)) #define MCA_BTL_VADER_OFFSET_MASK 0xffffffffll @@ -40,7 +41,6 @@ typedef int64_t fifo_value_t; #else - #define vader_item_cmpset(x, y, z) opal_atomic_cmpset_32((volatile int32_t *)(x), (int32_t)(y), (int32_t)(z)) #define vader_item_swap(x, y) opal_atomic_swap_32((volatile int32_t *)(x), (int32_t)(y)) #define MCA_BTL_VADER_OFFSET_MASK 0x00ffffffl @@ -138,7 +138,7 @@ static inline mca_btl_vader_hdr_t *vader_fifo_read (vader_fifo_t *fifo, struct m if (OPAL_UNLIKELY(VADER_FIFO_FREE == hdr->next)) { opal_atomic_rmb(); - if (!vader_item_cmpset (&fifo->fifo_tail, value, VADER_FIFO_FREE)) { + if (!vader_item_compare_exchange (&fifo->fifo_tail, &value, VADER_FIFO_FREE)) { while (VADER_FIFO_FREE == hdr->next) { opal_atomic_rmb (); } diff --git a/opal/mca/btl/vader/btl_vader_frag.c b/opal/mca/btl/vader/btl_vader_frag.c index 0cd45e10292..a132ea3d725 100644 --- a/opal/mca/btl/vader/btl_vader_frag.c +++ b/opal/mca/btl/vader/btl_vader_frag.c @@ -36,7 +36,6 @@ static inline void mca_btl_vader_frag_constructor (mca_btl_vader_frag_t *frag) frag->base.des_segments = frag->segments; frag->base.des_segment_count = 1; - frag->fbox = NULL; } int mca_btl_vader_frag_init (opal_free_list_item_t *item, void *ctx) diff --git a/opal/mca/btl/vader/btl_vader_frag.h b/opal/mca/btl/vader/btl_vader_frag.h index e89e87aba8f..a7ab4811950 100644 --- a/opal/mca/btl/vader/btl_vader_frag.h +++ b/opal/mca/btl/vader/btl_vader_frag.h @@ -67,8 +67,6 @@ struct mca_btl_vader_frag_t { mca_btl_base_segment_t segments[2]; /** endpoint this fragment is active on 
*/ struct mca_btl_base_endpoint_t *endpoint; - /** fast box in use (or NULL) */ - unsigned char * restrict fbox; /** fragment header (in the shared memory region) */ mca_btl_vader_hdr_t *hdr; /** free list this fragment was allocated within */ @@ -95,7 +93,6 @@ static inline void mca_btl_vader_frag_return (mca_btl_vader_frag_t *frag) frag->segments[0].seg_addr.pval = (char *)(frag->hdr + 1); frag->base.des_segment_count = 1; - frag->fbox = NULL; opal_free_list_return (frag->my_list, (opal_free_list_item_t *)frag); } diff --git a/opal/mca/btl/vader/btl_vader_get.c b/opal/mca/btl/vader/btl_vader_get.c index f77a1df8216..add6889aa14 100644 --- a/opal/mca/btl/vader/btl_vader_get.c +++ b/opal/mca/btl/vader/btl_vader_get.c @@ -2,6 +2,8 @@ /* * Copyright (c) 2010-2014 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -23,6 +25,7 @@ #include "opal/sys/cma.h" #endif /* OPAL_CMA_NEED_SYSCALL_DEFS */ + #endif /** @@ -71,11 +74,34 @@ int mca_btl_vader_get_cma (mca_btl_base_module_t *btl, mca_btl_base_endpoint_t * struct iovec dst_iov = {.iov_base = local_address, .iov_len = size}; ssize_t ret; - ret = process_vm_readv (endpoint->segment_data.other.seg_ds->seg_cpid, &dst_iov, 1, &src_iov, 1, 0); - if (ret != (ssize_t)size) { - opal_output(0, "Read %ld, expected %lu, errno = %d\n", (long)ret, (unsigned long)size, errno); - return OPAL_ERROR; - } + /* + * According to the man page : + * "On success, process_vm_readv() returns the number of bytes read and + * process_vm_writev() returns the number of bytes written. This return + * value may be less than the total number of requested bytes, if a + * partial read/write occurred. (Partial transfers apply at the + * granularity of iovec elements. These system calls won't perform a + * partial transfer that splits a single iovec element.)". 
+ * So since we use a single iovec element, the returned size should either + * be 0 or size, and the do loop should not be needed here. + * We tried on various Linux kernels with size > 2 GB, and surprisingly, + * the returned value is always 0x7ffff000 (fwiw, it happens to be the size + * of the largest number of pages that fits in a signed 32-bit integer). + * We do not know whether this is a bug in the kernel, the libc or even + * the man page, but for the time being, we act as if process_vm_readv() could + * return any value. + */ + do { + ret = process_vm_readv (endpoint->segment_data.other.seg_ds->seg_cpid, &dst_iov, 1, &src_iov, 1, 0); + if (0 > ret) { + opal_output(0, "Read %ld, expected %lu, errno = %d\n", (long)ret, (unsigned long)size, errno); + return OPAL_ERROR; + } + src_iov.iov_base = (void *)((char *)src_iov.iov_base + ret); + src_iov.iov_len -= ret; + dst_iov.iov_base = (void *)((char *)dst_iov.iov_base + ret); + dst_iov.iov_len -= ret; + } while (0 < src_iov.iov_len); /* always call the callback function */ cbfunc (btl, endpoint, local_address, local_handle, cbcontext, cbdata, OPAL_SUCCESS); diff --git a/opal/mca/btl/vader/btl_vader_knem.c b/opal/mca/btl/vader/btl_vader_knem.c index 96a7e775272..69139cb1bfe 100644 --- a/opal/mca/btl/vader/btl_vader_knem.c +++ b/opal/mca/btl/vader/btl_vader_knem.c @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2014-2015 Los Alamos National Security, LLC. All rights + * Copyright (c) 2014-2017 Los Alamos National Security, LLC. All rights * reserved. * $COPYRIGHT$ * @@ -109,7 +109,6 @@ int mca_btl_vader_knem_init (void) struct knem_cmd_info knem_info; int rc; - signal (SIGSEGV, SIG_DFL); /* Open the knem device. Try to print a helpful message if we fail to open it.
*/ mca_btl_vader.knem_fd = open("/dev/knem", O_RDWR); diff --git a/opal/mca/btl/vader/btl_vader_module.c b/opal/mca/btl/vader/btl_vader_module.c index 5c9c0849476..c28012ffc7f 100644 --- a/opal/mca/btl/vader/btl_vader_module.c +++ b/opal/mca/btl/vader/btl_vader_module.c @@ -440,7 +440,6 @@ static struct mca_btl_base_descriptor_t *vader_prepare_src (struct mca_btl_base_ { const size_t total_size = reserve + *size; mca_btl_vader_frag_t *frag; - unsigned char *fbox; void *data_ptr; int rc; @@ -506,19 +505,6 @@ static struct mca_btl_base_descriptor_t *vader_prepare_src (struct mca_btl_base_ frag->base.des_segment_count = 2; } else { #endif - - /* inline send */ - if (OPAL_LIKELY(MCA_BTL_DES_FLAGS_BTL_OWNERSHIP & flags)) { - /* try to reserve a fast box for this transfer only if the - * fragment does not belong to the caller */ - fbox = mca_btl_vader_reserve_fbox (endpoint, total_size); - if (OPAL_LIKELY(fbox)) { - frag->segments[0].seg_addr.pval = fbox; - } - - frag->fbox = fbox; - } - /* NTH: the covertor adds some latency so we bypass it here */ memcpy ((void *)((uintptr_t)frag->segments[0].seg_addr.pval + reserve), data_ptr, *size); frag->segments[0].seg_len = total_size; diff --git a/opal/mca/btl/vader/btl_vader_put.c b/opal/mca/btl/vader/btl_vader_put.c index c3d21124126..ec2690d312e 100644 --- a/opal/mca/btl/vader/btl_vader_put.c +++ b/opal/mca/btl/vader/btl_vader_put.c @@ -2,8 +2,8 @@ /* * Copyright (c) 2010-2014 Los Alamos National Security, LLC. All rights * reserved. - * Copyright (c) 2014 Research Organization for Information Science - * and Technology (RIST). All rights reserved. + * Copyright (c) 2014-2018 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -69,11 +69,18 @@ int mca_btl_vader_put_cma (mca_btl_base_module_t *btl, mca_btl_base_endpoint_t * struct iovec dst_iov = {.iov_base = (void *)(intptr_t) remote_address, .iov_len = size}; ssize_t ret; - ret = process_vm_writev (endpoint->segment_data.other.seg_ds->seg_cpid, &src_iov, 1, &dst_iov, 1, 0); - if (ret != (ssize_t)size) { - opal_output(0, "Wrote %ld, expected %lu, errno = %d\n", (long)ret, (unsigned long)size, errno); - return OPAL_ERROR; - } + /* This should not be needed, see the rationale in mca_btl_vader_get_cma() */ + do { + ret = process_vm_writev (endpoint->segment_data.other.seg_ds->seg_cpid, &src_iov, 1, &dst_iov, 1, 0); + if (0 > ret) { + opal_output(0, "Wrote %ld, expected %lu, errno = %d\n", (long)ret, (unsigned long)size, errno); + return OPAL_ERROR; + } + src_iov.iov_base = (void *)((char *)src_iov.iov_base + ret); + src_iov.iov_len -= ret; + dst_iov.iov_base = (void *)((char *)dst_iov.iov_base + ret); + dst_iov.iov_len -= ret; + } while (0 < src_iov.iov_len); /* always call the callback function */ cbfunc (btl, endpoint, local_address, local_handle, cbcontext, cbdata, OPAL_SUCCESS); diff --git a/opal/mca/btl/vader/btl_vader_send.c b/opal/mca/btl/vader/btl_vader_send.c index 08bfa5a6238..f4e1af823ab 100644 --- a/opal/mca/btl/vader/btl_vader_send.c +++ b/opal/mca/btl/vader/btl_vader_send.c @@ -42,12 +42,9 @@ int mca_btl_vader_send (struct mca_btl_base_module_t *btl, mca_btl_vader_frag_t *frag = (mca_btl_vader_frag_t *) descriptor; const size_t total_size = frag->segments[0].seg_len; - if (OPAL_LIKELY(frag->fbox)) { - mca_btl_vader_fbox_send (frag->fbox, tag); - mca_btl_vader_frag_complete (frag); - - return 1; - } + /* in order to work around a long standing ob1 bug (see #3845) we have to always + * make the callback. once this is fixed in ob1 we can restore the code below. 
*/ + frag->base.des_flags |= MCA_BTL_DES_SEND_ALWAYS_CALLBACK; /* header (+ optional inline data) */ frag->hdr->len = total_size; @@ -69,6 +66,9 @@ int mca_btl_vader_send (struct mca_btl_base_module_t *btl, return OPAL_SUCCESS; } + return OPAL_SUCCESS; + +#if 0 if ((frag->hdr->flags & MCA_BTL_VADER_FLAG_SINGLE_COPY) || !(frag->base.des_flags & MCA_BTL_DES_FLAGS_BTL_OWNERSHIP)) { frag->base.des_flags |= MCA_BTL_DES_SEND_ALWAYS_CALLBACK; @@ -79,4 +79,5 @@ int mca_btl_vader_send (struct mca_btl_base_module_t *btl, /* data is gone (from the pml's perspective). frag callback/release will happen later */ return 1; +#endif } diff --git a/opal/mca/btl/vader/btl_vader_xpmem.c b/opal/mca/btl/vader/btl_vader_xpmem.c index f635b2c6cdf..11c4e10cee8 100644 --- a/opal/mca/btl/vader/btl_vader_xpmem.c +++ b/opal/mca/btl/vader/btl_vader_xpmem.c @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* - * Copyright (c) 2011-2014 Los Alamos National Security, LLC. All rights + * Copyright (c) 2011-2018 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2014 The University of Tennessee and The University * of Tennessee Research Foundation. 
All rights @@ -33,7 +33,6 @@ int mca_btl_vader_xpmem_init (void) } struct vader_check_reg_ctx_t { - mca_rcache_base_vma_module_t *vma_module; mca_btl_base_endpoint_t *ep; mca_rcache_base_registration_t **reg; uintptr_t base; @@ -54,17 +53,28 @@ static int vader_check_reg (mca_rcache_base_registration_t *reg, void *ctx) vader_ctx->reg[0] = reg; if (vader_ctx->bound <= (uintptr_t) reg->bound && vader_ctx->base >= (uintptr_t) reg->base) { - (void)opal_atomic_add (®->ref_count, 1); + opal_atomic_add (®->ref_count, 1); return 1; } - /* remove this pointer from the rcache and decrement its reference count - (so it is detached later) */ - mca_rcache_base_vma_delete (vader_ctx->vma_module, reg); - return 2; } +void vader_return_registration (mca_rcache_base_registration_t *reg, struct mca_btl_base_endpoint_t *ep) +{ + mca_rcache_base_vma_module_t *vma_module = mca_btl_vader_component.vma_module; + int32_t ref_count; + + ref_count = opal_atomic_add_fetch_32 (®->ref_count, -1); + if (OPAL_UNLIKELY(0 == ref_count && !(reg->flags & MCA_RCACHE_FLAGS_PERSIST))) { + mca_rcache_base_vma_delete (vma_module, reg); + + opal_memchecker_base_mem_noaccess (reg->rcache_context, (uintptr_t)(reg->bound - reg->base)); + (void)xpmem_detach (reg->rcache_context); + OBJ_RELEASE (reg); + } +} + /* look up the remote pointer in the peer rcache and attach if * necessary */ mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpoint_t *ep, void *rem_ptr, @@ -73,7 +83,7 @@ mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpo mca_rcache_base_vma_module_t *vma_module = mca_btl_vader_component.vma_module; uint64_t attach_align = 1 << mca_btl_vader_component.log_attach_align; mca_rcache_base_registration_t *reg = NULL; - vader_check_reg_ctx_t check_ctx = {.ep = ep, .reg = ®, .vma_module = vma_module}; + vader_check_reg_ctx_t check_ctx = {.ep = ep, .reg = ®}; xpmem_addr_t xpmem_addr; uintptr_t base, bound; int rc; @@ -88,16 +98,17 @@ 
mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpo check_ctx.bound = bound; /* several segments may match the base pointer */ - rc = mca_rcache_base_vma_iterate (vma_module, (void *) base, bound - base, vader_check_reg, &check_ctx); + rc = mca_rcache_base_vma_iterate (vma_module, (void *) base, bound - base, true, vader_check_reg, &check_ctx); if (2 == rc) { + /* remove this pointer from the rcache and decrement its reference count + (so it is detached later) */ + mca_rcache_base_vma_delete (vma_module, reg); + /* start the new segment from the lower of the two bases */ base = (uintptr_t) reg->base < base ? (uintptr_t) reg->base : base; - if (OPAL_LIKELY(0 == opal_atomic_add_32 (&reg->ref_count, -1))) { - /* this pointer is not in use */ - (void) xpmem_detach (reg->rcache_context); - OBJ_RELEASE(reg); - } + /* remove the last reference to this registration */ + vader_return_registration (reg, ep); reg = NULL; } @@ -127,7 +138,9 @@ mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpo opal_memchecker_base_mem_defined (reg->rcache_context, bound - base); - mca_rcache_base_vma_insert (vma_module, reg, 0); + if (!(flags & MCA_RCACHE_FLAGS_PERSIST)) { + mca_rcache_base_vma_insert (vma_module, reg, 0); + } } } @@ -138,22 +151,6 @@ mca_rcache_base_registration_t *vader_get_registation (struct mca_btl_base_endpo return reg; } -void vader_return_registration (mca_rcache_base_registration_t *reg, struct mca_btl_base_endpoint_t *ep) -{ - mca_rcache_base_vma_module_t *vma_module = mca_btl_vader_component.vma_module; - int32_t ref_count; - - ref_count = opal_atomic_add_32 (&reg->ref_count, -1); - if (OPAL_UNLIKELY(0 == ref_count && !(reg->flags & MCA_RCACHE_FLAGS_PERSIST))) { - /* protect rcache access */ - mca_rcache_base_vma_delete (vma_module, reg); - - opal_memchecker_base_mem_noaccess (reg->rcache_context, (uintptr_t)(reg->bound - reg->base)); - (void)xpmem_detach (reg->rcache_context); - OBJ_RELEASE (reg); - } -} - static
int mca_btl_vader_endpoint_xpmem_rcache_cleanup (mca_rcache_base_registration_t *reg, void *ctx) { mca_rcache_base_vma_module_t *vma_module = mca_btl_vader_component.vma_module; @@ -161,7 +158,6 @@ static int mca_btl_vader_endpoint_xpmem_rcache_cleanup (mca_rcache_base_registra if ((intptr_t) reg->alloc_base == ep->peer_smp_rank) { /* otherwise dereg will fail on assert */ reg->ref_count = 0; - (void) mca_rcache_base_vma_delete (vma_module, reg); OBJ_RELEASE(reg); } @@ -172,7 +168,7 @@ void mca_btl_vader_xpmem_cleanup_endpoint (struct mca_btl_base_endpoint_t *ep) { /* clean out the registration cache */ (void) mca_rcache_base_vma_iterate (mca_btl_vader_component.vma_module, - NULL, (size_t) -1, + NULL, (size_t) -1, true, mca_btl_vader_endpoint_xpmem_rcache_cleanup, (void *) ep); if (ep->segment_base) { diff --git a/opal/mca/common/cuda/common_cuda.c b/opal/mca/common/cuda/common_cuda.c index 138ad7e658e..b8689dbf9cd 100644 --- a/opal/mca/common/cuda/common_cuda.c +++ b/opal/mca/common/cuda/common_cuda.c @@ -429,8 +429,10 @@ int mca_common_cuda_stage_one_init(void) if (true != stage_one_init_passed) { errmsg = opal_argv_join(errmsgs, '\n'); - opal_show_help("help-mpi-common-cuda.txt", "dlopen failed", true, - errmsg); + if (opal_warn_on_missing_libcuda) { + opal_show_help("help-mpi-common-cuda.txt", "dlopen failed", true, + errmsg); + } opal_cuda_support = 0; } opal_argv_free(errmsgs); @@ -1157,10 +1159,10 @@ int cuda_closememhandle(void *reg_data, mca_rcache_base_registration_t *reg) if (ctx_ok) { result = cuFunc.cuIpcCloseMemHandle((CUdeviceptr)cuda_reg->base.alloc_base); if (OPAL_UNLIKELY(CUDA_SUCCESS != result)) { - opal_show_help("help-mpi-common-cuda.txt", "cuIpcCloseMemHandle failed", - true, result, cuda_reg->base.alloc_base); - opal_output(0, "Sleep on %d", getpid()); - sleep(20); + if (CUDA_ERROR_DEINITIALIZED != result) { + opal_show_help("help-mpi-common-cuda.txt", "cuIpcCloseMemHandle failed", + true, result, cuda_reg->base.alloc_base); + } /* We will 
just continue on and hope things continue to work. */ } else { opal_output_verbose(10, mca_common_cuda_output, diff --git a/opal/mca/common/cuda/help-mpi-common-cuda.txt b/opal/mca/common/cuda/help-mpi-common-cuda.txt index a1877c35d6e..0b306cac3ed 100644 --- a/opal/mca/common/cuda/help-mpi-common-cuda.txt +++ b/opal/mca/common/cuda/help-mpi-common-cuda.txt @@ -166,7 +166,7 @@ The library attempted to open the following supporting CUDA libraries, but each of them failed. CUDA-aware support is disabled. %s If you are not interested in CUDA-aware support, then run with ---mca mpi_cuda_support 0 to suppress this message. If you are interested +--mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location of libcuda.so.1 to get passed this issue. # diff --git a/opal/mca/common/libfabric/Makefile.am b/opal/mca/common/libfabric/Makefile.am deleted file mode 100644 index 5da6be35cd6..00000000000 --- a/opal/mca/common/libfabric/Makefile.am +++ /dev/null @@ -1,102 +0,0 @@ -# -# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana -# University Research and Technology -# Corporation. All rights reserved. -# Copyright (c) 2004-2013 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2004-2009 High Performance Computing Center Stuttgart, -# University of Stuttgart. All rights reserved. -# Copyright (c) 2004-2005 The Regents of the University of California. -# All rights reserved. -# Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved. -# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2015 Intel, Inc. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# -# A word of explanation... -# -# This library is linked against various MCA components because the -# support for libfabrics is needed in various places. 
-# -# Note that building this common component statically and linking -# against other dynamic components is *not* supported! - -AM_CPPFLAGS = $(opal_common_libfabric_CPPFLAGS) - -# Header files - -headers = \ - common_libfabric.h - -# Source files - -sources = \ - common_libfabric.c - -# As per above, we'll either have an installable or noinst result. -# The installable one should follow the same MCA prefix naming rules -# (i.e., libmca__.la). The noinst one can be named -# whatever it wants, although libmca___noinst.la is -# recommended. - -# To simplify components that link to this library, we will *always* -# have an output libtool library named libmca__.la -- even -# for case 2) described above (i.e., so there's no conditional logic -# necessary in component Makefile.am's that link to this library). -# Hence, if we're creating a noinst version of this library (i.e., -# case 2), we sym link it to the libmca__.la name -# (libtool will do the Right Things under the covers). See the -# all-local and clean-local rules, below, for how this is effected. 
- -lib_LTLIBRARIES = -noinst_LTLIBRARIES = -comp_inst = lib@OPAL_LIB_PREFIX@mca_common_libfabric.la -comp_noinst = lib@OPAL_LIB_PREFIX@mca_common_libfabric_noinst.la - -if MCA_BUILD_opal_common_libfabric_DSO -lib_LTLIBRARIES += $(comp_inst) -else -noinst_LTLIBRARIES += $(comp_noinst) -endif - -lib@OPAL_LIB_PREFIX@mca_common_libfabric_la_SOURCES = $(headers) $(sources) -lib@OPAL_LIB_PREFIX@mca_common_libfabric_la_LDFLAGS = \ - $(opal_common_libfabric_LDFLAGS) \ - -version-info $(libmca_opal_common_libfabric_so_version) -lib@OPAL_LIB_PREFIX@mca_common_libfabric_la_LIBADD = $(opal_common_libfabric_LIBS) - -lib@OPAL_LIB_PREFIX@mca_common_libfabric_noinst_la_SOURCES = $(headers) $(sources) -lib@OPAL_LIB_PREFIX@mca_common_libfabric_noinst_la_LDFLAGS = $(opal_common_libfabric_LDFLAGS) -lib@OPAL_LIB_PREFIX@mca_common_libfabric_noinst_la_LIBADD = $(opal_common_libfabric_LIBS) - -# Conditionally install the header files - -if WANT_INSTALL_HEADERS -opaldir = $(opalincludedir)/$(subdir) -opal_HEADERS = $(headers) -endif - -# These two rules will sym link the "noinst" libtool library filename -# to the installable libtool library filename in the case where we are -# compiling this component statically (case 2), described above). - -V=0 -OMPI_V_LN_SCOMP = $(ompi__v_LN_SCOMP_$V) -ompi__v_LN_SCOMP_ = $(ompi__v_LN_SCOMP_$AM_DEFAULT_VERBOSITY) -ompi__v_LN_SCOMP_0 = @echo " LN_S " `basename $(comp_inst)`; - -all-local: - $(OMPI_V_LN_SCOMP) if test -z "$(lib_LTLIBRARIES)"; then \ - rm -f "$(comp_inst)"; \ - $(LN_S) "$(comp_noinst)" "$(comp_inst)"; \ - fi - -clean-local: - if test -z "$(lib_LTLIBRARIES)"; then \ - rm -f "$(comp_inst)"; \ - fi diff --git a/opal/mca/common/libfabric/common_libfabric.c b/opal/mca/common/libfabric/common_libfabric.c deleted file mode 100644 index cb989af93c5..00000000000 --- a/opal/mca/common/libfabric/common_libfabric.c +++ /dev/null @@ -1,21 +0,0 @@ -/* - * Copyright (c) 2015 Intel, Inc. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "opal_config.h" -#include "opal/constants.h" - -#include -#include - -#include "common_libfabric.h" - -int mca_common_libfabric_register_mca_variables(void) -{ - return OPAL_SUCCESS; -} diff --git a/opal/mca/common/libfabric/common_libfabric.h b/opal/mca/common/libfabric/common_libfabric.h deleted file mode 100644 index 10bc05598f8..00000000000 --- a/opal/mca/common/libfabric/common_libfabric.h +++ /dev/null @@ -1,16 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2015 Intel, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OPAL_MCA_COMMON_LIBFABRIC_H -#define OPAL_MCA_COMMON_LIBFABRIC_H - -OPAL_DECLSPEC int mca_common_libfabric_register_mca_variables(void); - -#endif /* OPAL_MCA_COMMON_LIBFABRIC_H */ diff --git a/opal/mca/common/libfabric/configure.m4 b/opal/mca/common/libfabric/configure.m4 deleted file mode 100644 index 49e7d46c895..00000000000 --- a/opal/mca/common/libfabric/configure.m4 +++ /dev/null @@ -1,30 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved. -# Copyright (c) 2013 The University of Tennessee and The University -# of Tennessee Research Foundation. All rights -# reserved. -# Copyright (c) 2015 Intel, Inc. All rights reserved. -# Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -AC_DEFUN([MCA_opal_common_libfabric_CONFIG],[ - AC_CONFIG_FILES([opal/mca/common/libfabric/Makefile]) - - # Check for libfabric. Note that $opal_common_libfabric_happy is - # used in other configure.m4's to know if libfabric configured - # successfully. 
- OPAL_CHECK_LIBFABRIC([opal_common_libfabric], - [opal_common_libfabric_happy=yes - common_libfabric_WRAPPER_EXTRA_LDFLAGS=$opal_common_libfabric_LDFLAGS - common_libfabric_WRAPPER_EXTRA_LIBS=$opal_common_libfabric_LIBS - $1], - [opal_common_libfabric_happy=no - $2]) - -])dnl diff --git a/opal/mca/common/ofi/Makefile.am b/opal/mca/common/ofi/Makefile.am new file mode 100644 index 00000000000..658e1a703f2 --- /dev/null +++ b/opal/mca/common/ofi/Makefile.am @@ -0,0 +1,105 @@ +# +# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana +# University Research and Technology +# Corporation. All rights reserved. +# Copyright (c) 2004-2013 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. +# Copyright (c) 2004-2009 High Performance Computing Center Stuttgart, +# University of Stuttgart. All rights reserved. +# Copyright (c) 2004-2005 The Regents of the University of California. +# All rights reserved. +# Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved. +# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2015 Intel, Inc. All rights reserved. +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# +# A word of explanation... +# +# This library is linked against various MCA components because the +# support for ofis is needed in various places. +# +# Note that building this common component statically and linking +# against other dynamic components is *not* supported! + +AM_CPPFLAGS = $(opal_common_ofi_CPPFLAGS) + +# Header files + +headers = \ + common_ofi.h + +# Source files + +sources = \ + common_ofi.c + +# As per above, we'll either have an installable or noinst result. +# The installable one should follow the same MCA prefix naming rules +# (i.e., libmca__.la). The noinst one can be named +# whatever it wants, although libmca___noinst.la is +# recommended. 
+ +# To simplify components that link to this library, we will *always* +# have an output libtool library named libmca__.la -- even +# for case 2) described above (i.e., so there's no conditional logic +# necessary in component Makefile.am's that link to this library). +# Hence, if we're creating a noinst version of this library (i.e., +# case 2), we sym link it to the libmca__.la name +# (libtool will do the Right Things under the covers). See the +# all-local and clean-local rules, below, for how this is effected. + +lib_LTLIBRARIES = +noinst_LTLIBRARIES = +comp_inst = lib@OPAL_LIB_PREFIX@mca_common_ofi.la +comp_noinst = lib@OPAL_LIB_PREFIX@mca_common_ofi_noinst.la + + +if MCA_BUILD_opal_common_ofi_DSO +lib_LTLIBRARIES += $(comp_inst) +else +noinst_LTLIBRARIES += $(comp_noinst) +endif + +lib@OPAL_LIB_PREFIX@mca_common_ofi_la_SOURCES = $(headers) $(sources) +lib@OPAL_LIB_PREFIX@mca_common_ofi_la_LDFLAGS = \ + $(opal_common_ofi_LDFLAGS) \ + -version-info $(libmca_opal_common_ofi_so_version) +lib@OPAL_LIB_PREFIX@mca_common_ofi_la_LIBADD = $(opal_common_ofi_LIBS) + +lib@OPAL_LIB_PREFIX@mca_common_ofi_noinst_la_SOURCES = $(headers) $(sources) +lib@OPAL_LIB_PREFIX@mca_common_ofi_noinst_la_LDFLAGS = $(opal_common_ofi_LDFLAGS) +lib@OPAL_LIB_PREFIX@mca_common_ofi_noinst_la_LIBADD = $(opal_common_ofi_LIBS) + +# Conditionally install the header files + +if WANT_INSTALL_HEADERS +opaldir = $(opalincludedir)/$(subdir) +opal_HEADERS = $(headers) +endif + +# These two rules will sym link the "noinst" libtool library filename +# to the installable libtool library filename in the case where we are +# compiling this component statically (case 2), described above). 
+ +V=0 +OMPI_V_LN_SCOMP = $(ompi__v_LN_SCOMP_$V) +ompi__v_LN_SCOMP_ = $(ompi__v_LN_SCOMP_$AM_DEFAULT_VERBOSITY) +ompi__v_LN_SCOMP_0 = @echo " LN_S " `basename $(comp_inst)`; + +all-local: + $(OMPI_V_LN_SCOMP) if test -z "$(lib_LTLIBRARIES)"; then \ + rm -f "$(comp_inst)"; \ + $(LN_S) "$(comp_noinst)" "$(comp_inst)"; \ + fi + +clean-local: + if test -z "$(lib_LTLIBRARIES)"; then \ + rm -f "$(comp_inst)"; \ + fi diff --git a/opal/mca/common/ofi/common_ofi.c b/opal/mca/common/ofi/common_ofi.c new file mode 100644 index 00000000000..c2d02be50bb --- /dev/null +++ b/opal/mca/common/ofi/common_ofi.c @@ -0,0 +1,23 @@ +/* + * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "opal_config.h" +#include "opal/constants.h" + +#include +#include + +#include "common_ofi.h" + +int mca_common_ofi_register_mca_variables(void) +{ + return OPAL_SUCCESS; +} diff --git a/opal/mca/common/ofi/common_ofi.h b/opal/mca/common/ofi/common_ofi.h new file mode 100644 index 00000000000..bb5a04f35a8 --- /dev/null +++ b/opal/mca/common/ofi/common_ofi.h @@ -0,0 +1,18 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef OPAL_MCA_COMMON_OFI_H +#define OPAL_MCA_COMMON_OFI_H + +OPAL_DECLSPEC int mca_common_ofi_register_mca_variables(void); + +#endif /* OPAL_MCA_COMMON_OFI_H */ diff --git a/opal/mca/common/ofi/configure.m4 b/opal/mca/common/ofi/configure.m4 new file mode 100644 index 00000000000..4e47ad278dd --- /dev/null +++ b/opal/mca/common/ofi/configure.m4 @@ -0,0 +1,32 @@ +# -*- shell-script -*- +# +# Copyright (c) 2011-2013 NVIDIA Corporation. All rights reserved. 
+# Copyright (c) 2013 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. +# Copyright (c) 2015 Intel, Inc. All rights reserved. +# Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 Los Alamos National Security, LLC. All rights +# reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +AC_DEFUN([MCA_opal_common_ofi_CONFIG],[ + AC_CONFIG_FILES([opal/mca/common/ofi/Makefile]) + + # Check for ofi. Note that $opal_common_ofi_happy is + # used in other configure.m4's to know if ofi configured + # successfully. + OPAL_CHECK_OFI([opal_common_ofi], + [opal_common_ofi_happy=yes + common_ofi_WRAPPER_EXTRA_LDFLAGS=$opal_common_ofi_LDFLAGS + common_ofi_WRAPPER_EXTRA_LIBS=$opal_common_ofi_LIBS + $1], + [opal_common_ofi_happy=no + $2]) + +])dnl diff --git a/opal/mca/common/libfabric/owner.txt b/opal/mca/common/ofi/owner.txt similarity index 100% rename from opal/mca/common/libfabric/owner.txt rename to opal/mca/common/ofi/owner.txt diff --git a/opal/mca/common/sm/common_sm.c b/opal/mca/common/sm/common_sm.c index 826e56e01ab..c6e2a0fdaf8 100644 --- a/opal/mca/common/sm/common_sm.c +++ b/opal/mca/common/sm/common_sm.c @@ -122,7 +122,7 @@ attach_and_init(opal_shmem_ds_t *shmem_bufp, /* initialize some segment information */ size_t mem_offset = map->module_data_addr - (unsigned char *)map->module_seg; - opal_atomic_init(&map->module_seg->seg_lock, OPAL_ATOMIC_UNLOCKED); + opal_atomic_lock_init(&map->module_seg->seg_lock, OPAL_ATOMIC_LOCK_UNLOCKED); map->module_seg->seg_inited = 0; map->module_seg->seg_num_procs_inited = 0; map->module_seg->seg_offset = mem_offset; @@ -131,7 +131,7 @@ attach_and_init(opal_shmem_ds_t *shmem_bufp, } /* increment the number of processes that are attached to the segment. 
*/ - (void)opal_atomic_add_size_t(&map->module_seg->seg_num_procs_inited, 1); + (void)opal_atomic_add_fetch_size_t(&map->module_seg->seg_num_procs_inited, 1); /* commit the changes before we return */ opal_atomic_wmb(); diff --git a/opal/mca/common/verbs/common_verbs.h b/opal/mca/common/verbs/common_verbs.h index 36ce3d85d1f..f68bea086eb 100644 --- a/opal/mca/common/verbs/common_verbs.h +++ b/opal/mca/common/verbs/common_verbs.h @@ -172,7 +172,7 @@ OPAL_DECLSPEC int opal_common_verbs_qp_test(struct ibv_context *device_context, * Known limitations: * If ibv_fork_init is called after ibv_create_* functions - it will have no effect. * OMPI initializes verbs many times during initialization in the following verbs components: - * oob/ud, btl/openib, mtl/mxm, pml/yalla, oshmem/ikrit, oshmem/yoda, ompi/mca/coll/{fca,hcoll} + * oob/ud, btl/openib, mtl/mxm, pml/yalla, oshmem/ikrit, ompi/mca/coll/{fca,hcoll} * * So, ibv_fork_init should be called once, in the beginning of the init flow of every verb component * to proper request fork support. diff --git a/opal/mca/common/verbs/common_verbs_port.c b/opal/mca/common/verbs/common_verbs_port.c index 831ba3fbccd..973a82666ef 100644 --- a/opal/mca/common/verbs/common_verbs_port.c +++ b/opal/mca/common/verbs/common_verbs_port.c @@ -68,6 +68,10 @@ int opal_common_verbs_port_bw(struct ibv_port_attr *port_attr, /* EDR: 25.78125 Gbps * 64/66, in megabits */ *bandwidth = 25000; break; + case 64: + /* HDR: 50Gbps * 64/66, in megabits */ + *bandwidth = 50000; + break; default: /* Who knows? */ return OPAL_ERR_NOT_FOUND; diff --git a/opal/mca/compress/bzip/Makefile.am b/opal/mca/compress/bzip/Makefile.am index 41ed41f1c7d..90b9c363750 100644 --- a/opal/mca/compress/bzip/Makefile.am +++ b/opal/mca/compress/bzip/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2004-2010 The Trustees of Indiana University. # All rights reserved. # Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. 
All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -30,6 +31,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_compress_bzip_la_SOURCES = $(sources) mca_compress_bzip_la_LDFLAGS = -module -avoid-version +mca_compress_bzip_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_compress_bzip_la_SOURCES = $(sources) diff --git a/opal/mca/compress/gzip/Makefile.am b/opal/mca/compress/gzip/Makefile.am index e2107b4ca19..40ee38cf091 100644 --- a/opal/mca/compress/gzip/Makefile.am +++ b/opal/mca/compress/gzip/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2004-2010 The Trustees of Indiana University. # All rights reserved. # Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -30,6 +31,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_compress_gzip_la_SOURCES = $(sources) mca_compress_gzip_la_LDFLAGS = -module -avoid-version +mca_compress_gzip_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_compress_gzip_la_SOURCES = $(sources) diff --git a/opal/mca/crs/base/base.h b/opal/mca/crs/base/base.h index a7c30a12f78..4ea7087a867 100644 --- a/opal/mca/crs/base/base.h +++ b/opal/mca/crs/base/base.h @@ -11,6 +11,7 @@ * All rights reserved. * Copyright (c) 2007 Evergrid, Inc. All rights reserved. * + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -120,9 +121,9 @@ BEGIN_C_DECLS typedef int (*opal_crs_base_self_restart_fn_t)(void); typedef int (*opal_crs_base_self_continue_fn_t)(void); - extern opal_crs_base_self_checkpoint_fn_t crs_base_self_checkpoint_fn; - extern opal_crs_base_self_restart_fn_t crs_base_self_restart_fn; - extern opal_crs_base_self_continue_fn_t crs_base_self_continue_fn; + extern opal_crs_base_self_checkpoint_fn_t ompi_crs_base_self_checkpoint_fn; + extern opal_crs_base_self_restart_fn_t ompi_crs_base_self_restart_fn; + extern opal_crs_base_self_continue_fn_t ompi_crs_base_self_continue_fn; OPAL_DECLSPEC int opal_crs_base_self_register_checkpoint_callback (opal_crs_base_self_checkpoint_fn_t function); diff --git a/opal/mca/crs/base/crs_base_fns.c b/opal/mca/crs/base/crs_base_fns.c index 923184e017d..ef5370451dc 100644 --- a/opal/mca/crs/base/crs_base_fns.c +++ b/opal/mca/crs/base/crs_base_fns.c @@ -14,6 +14,7 @@ * and Technology (RIST). All rights reserved. * Copyright (c) 2015 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -47,9 +48,9 @@ #include "opal/mca/crs/crs.h" #include "opal/mca/crs/base/base.h" -opal_crs_base_self_checkpoint_fn_t crs_base_self_checkpoint_fn = NULL; -opal_crs_base_self_restart_fn_t crs_base_self_restart_fn = NULL; -opal_crs_base_self_continue_fn_t crs_base_self_continue_fn = NULL; +opal_crs_base_self_checkpoint_fn_t ompi_crs_base_self_checkpoint_fn = NULL; +opal_crs_base_self_restart_fn_t ompi_crs_base_self_restart_fn = NULL; +opal_crs_base_self_continue_fn_t ompi_crs_base_self_continue_fn = NULL; /****************** * Local Functions @@ -330,19 +331,19 @@ int opal_crs_base_clear_options(opal_crs_base_ckpt_options_t *target) int opal_crs_base_self_register_checkpoint_callback(opal_crs_base_self_checkpoint_fn_t function) { - crs_base_self_checkpoint_fn = function; + ompi_crs_base_self_checkpoint_fn = function; return OPAL_SUCCESS; } int opal_crs_base_self_register_restart_callback(opal_crs_base_self_restart_fn_t function) { - crs_base_self_restart_fn = function; + ompi_crs_base_self_restart_fn = function; return OPAL_SUCCESS; } int opal_crs_base_self_register_continue_callback(opal_crs_base_self_continue_fn_t function) { - crs_base_self_continue_fn = function; + ompi_crs_base_self_continue_fn = function; return OPAL_SUCCESS; } diff --git a/ompi/mca/sharedfp/addproc/.opal_ignore b/opal/mca/crs/blcr/.opal_ignore similarity index 100% rename from ompi/mca/sharedfp/addproc/.opal_ignore rename to opal/mca/crs/blcr/.opal_ignore diff --git a/opal/mca/crs/blcr/Makefile.am b/opal/mca/crs/blcr/Makefile.am index 6743c1879c9..7e0e22bc4d1 100644 --- a/opal/mca/crs/blcr/Makefile.am +++ b/opal/mca/crs/blcr/Makefile.am @@ -8,6 +8,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -41,7 +42,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_crs_blcr_la_SOURCES = $(sources) mca_crs_blcr_la_LDFLAGS = -module -avoid-version $(crs_blcr_LDFLAGS) -mca_crs_blcr_la_LIBADD = $(crs_blcr_LIBS) +mca_crs_blcr_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(crs_blcr_LIBS) noinst_LTLIBRARIES = $(component_noinst) libmca_crs_blcr_la_SOURCES = $(sources) diff --git a/opal/mca/crs/blcr/crs_blcr_module.c b/opal/mca/crs/blcr/crs_blcr_module.c index eb9d6274421..c84e79bfbe2 100644 --- a/opal/mca/crs/blcr/crs_blcr_module.c +++ b/opal/mca/crs/blcr/crs_blcr_module.c @@ -10,6 +10,7 @@ * Copyright (c) 2007 Evergrid, Inc. All rights reserved. * Copyright (c) 2011 Oak Ridge National Labs. All rights reserved. * + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -640,7 +641,7 @@ static int opal_crs_blcr_thread_callback(void *arg) { else #endif { - if(OPAL_SUCCESS != (ret = trigger_user_inc_callback(OPAL_CR_INC_CRS_PRE_CKPT, + if(OPAL_SUCCESS != (ret = ompi_trigger_user_inc_callback(OPAL_CR_INC_CRS_PRE_CKPT, OPAL_CR_INC_STATE_PREPARE)) ) { ; } @@ -665,7 +666,7 @@ static int opal_crs_blcr_thread_callback(void *arg) { blcr_current_state = OPAL_CRS_CONTINUE; } - if( OPAL_SUCCESS != (ret = trigger_user_inc_callback(OPAL_CR_INC_CRS_POST_CKPT, + if( OPAL_SUCCESS != (ret = ompi_trigger_user_inc_callback(OPAL_CR_INC_CRS_POST_CKPT, (blcr_current_state == OPAL_CRS_CONTINUE ? 
OPAL_CR_INC_STATE_CONTINUE : OPAL_CR_INC_STATE_RESTART))) ) { diff --git a/opal/mca/reachable/weighted/.opal_ignore b/opal/mca/crs/criu/.opal_ignore similarity index 100% rename from opal/mca/reachable/weighted/.opal_ignore rename to opal/mca/crs/criu/.opal_ignore diff --git a/opal/mca/crs/criu/Makefile.am b/opal/mca/crs/criu/Makefile.am index 4754afe1296..1088e7be763 100644 --- a/opal/mca/crs/criu/Makefile.am +++ b/opal/mca/crs/criu/Makefile.am @@ -10,6 +10,7 @@ # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2014 Hochschule Esslingen. All rights reserved. # +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -41,7 +42,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_crs_criu_la_SOURCES = $(sources) mca_crs_criu_la_LDFLAGS = -module -avoid-version $(crs_criu_LDFLAGS) -mca_crs_criu_la_LIBADD = $(crs_criu_LIBS) +mca_crs_criu_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(crs_criu_LIBS) noinst_LTLIBRARIES = $(component_noinst) libmca_crs_criu_la_SOURCES = $(sources) diff --git a/orte/mca/rmaps/lama/.opal_ignore b/opal/mca/crs/dmtcp/.opal_ignore similarity index 100% rename from orte/mca/rmaps/lama/.opal_ignore rename to opal/mca/crs/dmtcp/.opal_ignore diff --git a/opal/mca/crs/dmtcp/Makefile.am b/opal/mca/crs/dmtcp/Makefile.am index 34dd01be912..91bbbe91a1b 100644 --- a/opal/mca/crs/dmtcp/Makefile.am +++ b/opal/mca/crs/dmtcp/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2010 The Trustees of Indiana University. # All rights reserved. # Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -33,7 +34,8 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_crs_dmtcp_la_SOURCES = $(sources) mca_crs_dmtcp_la_LDFLAGS = -module -avoid-version $(crs_dmtcp_LDFLAGS) -mca_crs_dmtcp_la_LIBADD = $(crs_dmtcp_LIBS) +mca_crs_dmtcp_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(crs_dmtcp_LIBS) noinst_LTLIBRARIES = $(component_noinst) libmca_crs_dmtcp_la_SOURCES = $(sources) diff --git a/opal/mca/crs/none/Makefile.am b/opal/mca/crs/none/Makefile.am index 2f8237db674..bed25017b2e 100644 --- a/opal/mca/crs/none/Makefile.am +++ b/opal/mca/crs/none/Makefile.am @@ -4,6 +4,7 @@ # Copyright (c) 2009 High Performance Computing Center Stuttgart, # University of Stuttgart. All rights reserved. # Copyright (c) 2010-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -34,6 +35,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_crs_none_la_SOURCES = $(sources) mca_crs_none_la_LDFLAGS = -module -avoid-version +mca_crs_none_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_crs_none_la_SOURCES = $(sources) diff --git a/opal/mca/crs/self/Makefile.am b/opal/mca/crs/self/Makefile.am index 3e61079e619..2b08e9de3b6 100644 --- a/opal/mca/crs/self/Makefile.am +++ b/opal/mca/crs/self/Makefile.am @@ -8,6 +8,7 @@ # Copyright (c) 2004-2005 The Regents of the University of California. # All rights reserved. # Copyright (c) 2010-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -38,6 +39,7 @@ mcacomponentdir = $(opallibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_crs_self_la_SOURCES = $(sources) mca_crs_self_la_LDFLAGS = -module -avoid-version +mca_crs_self_la_LIBADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la noinst_LTLIBRARIES = $(component_noinst) libmca_crs_self_la_SOURCES = $(sources) diff --git a/opal/mca/dl/dlopen/configure.m4 b/opal/mca/dl/dlopen/configure.m4 index 74b59a25d4d..714b880edc1 100644 --- a/opal/mca/dl/dlopen/configure.m4 +++ b/opal/mca/dl/dlopen/configure.m4 @@ -1,6 +1,6 @@ # -*- shell-script -*- # -# Copyright (c) 2009-2015 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved # # $COPYRIGHT$ # @@ -47,7 +47,8 @@ AC_DEFUN([MCA_opal_dl_dlopen_CONFIG],[ ]) AS_IF([test "$opal_dl_dlopen_happy" = "yes"], - [opal_dl_dlopen_ADD_LIBS=$opal_dl_dlopen_LIBS + [dl_dlopen_ADD_LIBS=$opal_dl_dlopen_LIBS + dl_dlopen_WRAPPER_EXTRA_LIBS=$opal_dl_dlopen_LIBS $1], [$2]) diff --git a/opal/mca/event/external/configure.m4 b/opal/mca/event/external/configure.m4 index cc789e3726c..498af38b405 100644 --- a/opal/mca/event/external/configure.m4 +++ b/opal/mca/event/external/configure.m4 @@ -2,9 +2,10 @@ # # Copyright (c) 2009-2013 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2013 Los Alamos National Security, LLC. All rights reserved. -# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # +# Copyright (c) 2017 Intel, Inc. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -95,9 +96,27 @@ AC_DEFUN([MCA_opal_event_external_CONFIG],[ AS_IF([test "$with_libevent" != "external" && test "$with_libevent" != "yes"], [opal_event_dir=$with_libevent AC_MSG_RESULT([$opal_event_dir]) - OPAL_CHECK_WITHDIR([libevent], [$with_libdir], + OPAL_CHECK_WITHDIR([libevent], [$opal_event_dir], [include/event.h]) - ], + AS_IF([test -z "$with_libevent_libdir" || test "$with_libevent_libdir" = "yes"], + [AC_MSG_CHECKING([for $with_libevent/lib64]) + AS_IF([test -d "$with_libevent/lib64"], + [opal_event_libdir_found=yes + AC_MSG_RESULT([found])], + [opal_event_libdir_found=no + AC_MSG_RESULT([not found])]) + AS_IF([test "$opal_event_libdir_found" = "yes"], + [opal_event_libdir="$with_libevent/lib64"], + [AC_MSG_CHECKING([for $with_libevent/lib]) + AS_IF([test -d "$with_libevent/lib"], + [AC_MSG_RESULT([found]) + opal_event_libdir="$with_libevent/lib"], + [AC_MSG_RESULT([not found]) + AC_MSG_WARN([Library directories were not found:]) + AC_MSG_WARN([ $with_libevent/lib64]) + AC_MSG_WARN([ $with_libevent/lib]) + AC_MSG_WARN([Please use --with-libevent-libdir to identify it.]) + AC_MSG_ERROR([Cannot continue])])])])], [AC_MSG_RESULT([(default search paths)])]) AS_IF([test ! -z "$with_libevent_libdir" && test "$with_libevent_libdir" != "yes"], [opal_event_libdir="$with_libevent_libdir"]) @@ -127,6 +146,7 @@ AC_DEFUN([MCA_opal_event_external_CONFIG],[ AC_MSG_WARN([Open MPI requires libevent to be compiled with]) AC_MSG_WARN([thread support enabled]) AC_MSG_ERROR([Cannot continue])]) + AC_CHECK_LIB([event_pthreads], [evthread_use_pthreads], [], [AC_MSG_WARN([External libevent does not have thread support]) diff --git a/opal/mca/event/external/event_external_component.c b/opal/mca/event/external/event_external_component.c index 7856b7b06b8..aa0ebe0f24a 100644 --- a/opal/mca/event/external/event_external_component.c +++ b/opal/mca/event/external/event_external_component.c @@ -5,6 +5,7 @@ * reserved. 
* Copyright (c) 2017 IBM Corporation. All rights reserved. * + * Copyright (c) 2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -34,7 +35,7 @@ const char *opal_event_external_component_version_string = static int event_external_open(void); static int event_external_register (void); -char *event_module_include = NULL; +char *ompi_event_module_include = NULL; /* * Instantiate the public struct with all of our public information @@ -81,12 +82,12 @@ static int event_external_register (void) { all_available_eventops = event_get_supported_methods(); #ifdef __APPLE__ - event_module_include ="select"; + ompi_event_module_include ="select"; #else - event_module_include = "poll"; + ompi_event_module_include = "poll"; #endif - avail = opal_argv_join(all_available_eventops, ','); + avail = opal_argv_join((char**)all_available_eventops, ','); asprintf( &help_msg, "Comma-delimited list of libevent subsystems " "to use (%s -- available on your platform)", @@ -98,7 +99,7 @@ static int event_external_register (void) { MCA_BASE_VAR_FLAG_SETTABLE, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_LOCAL, - &event_module_include); + &ompi_event_module_include); free(help_msg); /* release the help message */ free(avail); avail = NULL; diff --git a/opal/mca/event/external/event_external_module.c b/opal/mca/event/external/event_external_module.c index 9eb773dc710..2ee67c7ad5c 100644 --- a/opal/mca/event/external/event_external_module.c +++ b/opal/mca/event/external/event_external_module.c @@ -17,7 +17,7 @@ #include "opal/util/argv.h" -extern char *event_module_include; +extern char *ompi_event_module_include; static struct event_config *config = NULL; opal_event_base_t* opal_event_base_create(void) @@ -45,11 +45,11 @@ int opal_event_init(void) all_available_eventops = event_get_supported_methods(); - if (NULL == event_module_include) { + if (NULL == ompi_event_module_include) { /* Shouldn't happen, but... 
*/ - event_module_include = strdup("select"); + ompi_event_module_include = strdup("select"); } - includes = opal_argv_split(event_module_include,','); + includes = opal_argv_split(ompi_event_module_include,','); /* get a configuration object */ config = event_config_new(); diff --git a/opal/mca/event/external/external.h b/opal/mca/event/external/external.h index ada10ebbaed..29b2eaaef55 100644 --- a/opal/mca/event/external/external.h +++ b/opal/mca/event/external/external.h @@ -1,7 +1,7 @@ /* * Copyright (c) 2011-2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2013 Los Alamos National Security, LLC. All rights reserved. - * Copyright (c) 2015 Intel, Inc. All rights reserved. + * Copyright (c) 2015-2017 Intel, Inc. All rights reserved. * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2017 IBM Corporation. All rights reserved. @@ -75,6 +75,8 @@ OPAL_DECLSPEC int opal_event_finalize(void); #define opal_event_set(b, x, fd, fg, cb, arg) event_assign((x), (b), (fd), (fg), (event_callback_fn) (cb), (arg)) +#define opal_event_assign(x, b, fd, fg, cb, arg) event_assign((x), (b), (fd), (fg), (event_callback_fn) (cb), (arg)) + #define opal_event_add(ev, tv) event_add((ev), (tv)) #define opal_event_del(ev) event_del((ev)) diff --git a/opal/mca/event/libevent2022/libevent/event-internal.h b/opal/mca/event/libevent2022/libevent/event-internal.h index 4163a7d7ae2..3ffe509aa97 100644 --- a/opal/mca/event/libevent2022/libevent/event-internal.h +++ b/opal/mca/event/libevent2022/libevent/event-internal.h @@ -161,8 +161,8 @@ struct event_changelist { #ifndef _EVENT_DISABLE_DEBUG_MODE /* Global internal flag: set to one if debug mode is on. 
*/ -extern int _event_debug_mode_on; -#define EVENT_DEBUG_MODE_IS_ON() (_event_debug_mode_on) +extern int ompi__event_debug_mode_on; +#define EVENT_DEBUG_MODE_IS_ON() (ompi__event_debug_mode_on) #else #define EVENT_DEBUG_MODE_IS_ON() (0) #endif diff --git a/opal/mca/event/libevent2022/libevent/event.c b/opal/mca/event/libevent2022/libevent/event.c index b9f47c42a08..cdeddce1325 100644 --- a/opal/mca/event/libevent2022/libevent/event.c +++ b/opal/mca/event/libevent2022/libevent/event.c @@ -93,7 +93,7 @@ extern const struct eventop win32ops; #endif /* Array of backends in order of preference. */ -static const struct eventop *eventops[] = { +static const struct eventop *ompi_eventops[] = { #if defined(_EVENT_HAVE_EVENT_PORTS) && _EVENT_HAVE_EVENT_PORTS &evportops, #endif @@ -120,8 +120,8 @@ static const struct eventop *eventops[] = { /**** End Open MPI Changes ****/ /* Global state; deprecated */ -struct event_base *event_global_current_base_ = NULL; -#define current_base event_global_current_base_ +struct event_base *ompi_event_global_current_base_ = NULL; +#define current_base ompi_event_global_current_base_ /* Global state */ @@ -181,7 +181,7 @@ eq_debug_entry(const struct event_debug_entry *a, return a->ptr == b->ptr; } -int _event_debug_mode_on = 0; +int ompi__event_debug_mode_on = 0; /* Set if it's too late to enable event_debug_mode. 
*/ static int event_debug_mode_too_late = 0; #ifndef _EVENT_DISABLE_THREAD_SUPPORT @@ -197,7 +197,7 @@ HT_GENERATE(event_debug_map, event_debug_entry, node, hash_debug_entry, /* Macro: record that ev is now setup (that is, ready for an add) */ #define _event_debug_note_setup(ev) do { \ - if (_event_debug_mode_on) { \ + if (ompi__event_debug_mode_on) { \ struct event_debug_entry *dent,find; \ find.ptr = (ev); \ EVLOCK_LOCK(_event_debug_map_lock, 0); \ @@ -219,7 +219,7 @@ HT_GENERATE(event_debug_map, event_debug_entry, node, hash_debug_entry, } while (0) /* Macro: record that ev is no longer setup */ #define _event_debug_note_teardown(ev) do { \ - if (_event_debug_mode_on) { \ + if (ompi__event_debug_mode_on) { \ struct event_debug_entry *dent,find; \ find.ptr = (ev); \ EVLOCK_LOCK(_event_debug_map_lock, 0); \ @@ -232,7 +232,7 @@ HT_GENERATE(event_debug_map, event_debug_entry, node, hash_debug_entry, } while (0) /* Macro: record that ev is now added */ #define _event_debug_note_add(ev) do { \ - if (_event_debug_mode_on) { \ + if (ompi__event_debug_mode_on) { \ struct event_debug_entry *dent,find; \ find.ptr = (ev); \ EVLOCK_LOCK(_event_debug_map_lock, 0); \ @@ -253,7 +253,7 @@ HT_GENERATE(event_debug_map, event_debug_entry, node, hash_debug_entry, } while (0) /* Macro: record that ev is no longer added */ #define _event_debug_note_del(ev) do { \ - if (_event_debug_mode_on) { \ + if (ompi__event_debug_mode_on) { \ struct event_debug_entry *dent,find; \ find.ptr = (ev); \ EVLOCK_LOCK(_event_debug_map_lock, 0); \ @@ -274,7 +274,7 @@ HT_GENERATE(event_debug_map, event_debug_entry, node, hash_debug_entry, } while (0) /* Macro: assert that ev is setup (i.e., okay to add or inspect) */ #define _event_debug_assert_is_setup(ev) do { \ - if (_event_debug_mode_on) { \ + if (ompi__event_debug_mode_on) { \ struct event_debug_entry *dent,find; \ find.ptr = (ev); \ EVLOCK_LOCK(_event_debug_map_lock, 0); \ @@ -293,7 +293,7 @@ HT_GENERATE(event_debug_map, event_debug_entry, node, 
hash_debug_entry, /* Macro: assert that ev is not added (i.e., okay to tear down or set * up again) */ #define _event_debug_assert_not_added(ev) do { \ - if (_event_debug_mode_on) { \ + if (ompi__event_debug_mode_on) { \ struct event_debug_entry *dent,find; \ find.ptr = (ev); \ EVLOCK_LOCK(_event_debug_map_lock, 0); \ @@ -521,13 +521,13 @@ void event_enable_debug_mode(void) { #ifndef _EVENT_DISABLE_DEBUG_MODE - if (_event_debug_mode_on) + if (ompi__event_debug_mode_on) event_errx(1, "%s was called twice!", __func__); if (event_debug_mode_too_late) event_errx(1, "%s must be called *before* creating any events " "or event_bases",__func__); - _event_debug_mode_on = 1; + ompi__event_debug_mode_on = 1; HT_INIT(event_debug_map, &global_debug_map); #endif @@ -590,23 +590,23 @@ event_base_new_with_config(const struct event_config *cfg) should_check_environment = !(cfg && (cfg->flags & EVENT_BASE_FLAG_IGNORE_ENV)); - for (i = 0; eventops[i] && !base->evbase; i++) { + for (i = 0; ompi_eventops[i] && !base->evbase; i++) { if (cfg != NULL) { /* determine if this backend should be avoided */ if (event_config_is_avoided_method(cfg, - eventops[i]->name)) + ompi_eventops[i]->name)) continue; - if ((eventops[i]->features & cfg->require_features) + if ((ompi_eventops[i]->features & cfg->require_features) != cfg->require_features) continue; } /* also obey the environment variables */ if (should_check_environment && - event_is_method_disabled(eventops[i]->name)) + event_is_method_disabled(ompi_eventops[i]->name)) continue; - base->evsel = eventops[i]; + base->evsel = ompi_eventops[i]; base->evbase = base->evsel->init(base); } @@ -898,7 +898,7 @@ event_get_supported_methods(void) int i = 0, k; /* count all methods */ - for (method = &eventops[0]; *method != NULL; ++method) { + for (method = &ompi_eventops[0]; *method != NULL; ++method) { ++i; } @@ -908,8 +908,8 @@ event_get_supported_methods(void) return (NULL); /* populate the array with the supported methods */ - for (k = 0, i = 0; 
eventops[k] != NULL; ++k) { - tmp[i++] = eventops[k]->name; + for (k = 0, i = 0; ompi_eventops[k] != NULL; ++k) { + tmp[i++] = ompi_eventops[k]->name; } tmp[i] = NULL; diff --git a/opal/mca/event/libevent2022/libevent/evmap-internal.h b/opal/mca/event/libevent2022/libevent/evmap-internal.h index 23b5a8a0cd8..00833accc5e 100644 --- a/opal/mca/event/libevent2022/libevent/evmap-internal.h +++ b/opal/mca/event/libevent2022/libevent/evmap-internal.h @@ -51,7 +51,7 @@ void evmap_signal_clear(struct event_signal_map* ctx); /** Add an IO event (some combination of EV_READ or EV_WRITE) to an event_base's list of events on a given file descriptor, and tell the - underlying eventops about the fd if its state has changed. + underlying ompi_eventops about the fd if its state has changed. Requires that ev is not already added. @@ -62,7 +62,7 @@ void evmap_signal_clear(struct event_signal_map* ctx); int evmap_io_add(struct event_base *base, evutil_socket_t fd, struct event *ev); /** Remove an IO event (some combination of EV_READ or EV_WRITE) to an event_base's list of events on a given file descriptor, and tell the - underlying eventops about the fd if its state has changed. + underlying ompi_eventops about the fd if its state has changed. @param base the event_base to operate on. @param fd the file descriptor corresponding to ev. diff --git a/opal/mca/event/libevent2022/libevent/evthread-internal.h b/opal/mca/event/libevent2022/libevent/evthread-internal.h index ccfcdde84d6..69f07414e20 100644 --- a/opal/mca/event/libevent2022/libevent/evthread-internal.h +++ b/opal/mca/event/libevent2022/libevent/evthread-internal.h @@ -47,55 +47,55 @@ struct event_base; #if ! defined(_EVENT_DISABLE_THREAD_SUPPORT) && defined(EVTHREAD_EXPOSE_STRUCTS) /* Global function pointers to lock-related functions. NULL if locking isn't enabled. 
*/ -extern struct evthread_lock_callbacks _evthread_lock_fns; -extern struct evthread_condition_callbacks _evthread_cond_fns; -extern unsigned long (*_evthread_id_fn)(void); -extern int _evthread_lock_debugging_enabled; +extern struct evthread_lock_callbacks ompi__evthread_lock_fns; +extern struct evthread_condition_callbacks ompi__evthread_cond_fns; +extern unsigned long (*ompi__evthread_id_fn)(void); +extern int ompi__evthread_lock_debugging_enabled; /** Return the ID of the current thread, or 1 if threading isn't enabled. */ #define EVTHREAD_GET_ID() \ - (_evthread_id_fn ? _evthread_id_fn() : 1) + (ompi__evthread_id_fn ? ompi__evthread_id_fn() : 1) /** Return true iff we're in the thread that is currently (or most recently) * running a given event_base's loop. Requires lock. */ #define EVBASE_IN_THREAD(base) \ - (_evthread_id_fn == NULL || \ - (base)->th_owner_id == _evthread_id_fn()) + (ompi__evthread_id_fn == NULL || \ + (base)->th_owner_id == ompi__evthread_id_fn()) /** Return true iff we need to notify the base's main thread about changes to * its state, because it's currently running the main loop in another * thread. Requires lock. */ #define EVBASE_NEED_NOTIFY(base) \ - (_evthread_id_fn != NULL && \ + (ompi__evthread_id_fn != NULL && \ (base)->running_loop && \ - (base)->th_owner_id != _evthread_id_fn()) + (base)->th_owner_id != ompi__evthread_id_fn()) /** Allocate a new lock, and store it in lockvar, a void*. Sets lockvar to NULL if locking is not enabled. */ #define EVTHREAD_ALLOC_LOCK(lockvar, locktype) \ - ((lockvar) = _evthread_lock_fns.alloc ? \ - _evthread_lock_fns.alloc(locktype) : NULL) + ((lockvar) = ompi__evthread_lock_fns.alloc ? \ + ompi__evthread_lock_fns.alloc(locktype) : NULL) /** Free a given lock, if it is present and locking is enabled. 
*/ #define EVTHREAD_FREE_LOCK(lockvar, locktype) \ do { \ void *_lock_tmp_ = (lockvar); \ - if (_lock_tmp_ && _evthread_lock_fns.free) \ - _evthread_lock_fns.free(_lock_tmp_, (locktype)); \ + if (_lock_tmp_ && ompi__evthread_lock_fns.free) \ + ompi__evthread_lock_fns.free(_lock_tmp_, (locktype)); \ } while (0) /** Acquire a lock. */ #define EVLOCK_LOCK(lockvar,mode) \ do { \ if (lockvar) \ - _evthread_lock_fns.lock(mode, lockvar); \ + ompi__evthread_lock_fns.lock(mode, lockvar); \ } while (0) /** Release a lock */ #define EVLOCK_UNLOCK(lockvar,mode) \ do { \ if (lockvar) \ - _evthread_lock_fns.unlock(mode, lockvar); \ + ompi__evthread_lock_fns.unlock(mode, lockvar); \ } while (0) /** Helper: put lockvar1 and lockvar2 into pointerwise ascending order. */ @@ -123,7 +123,7 @@ extern int _evthread_lock_debugging_enabled; * locked and held by us. */ #define EVLOCK_ASSERT_LOCKED(lock) \ do { \ - if ((lock) && _evthread_lock_debugging_enabled) { \ + if ((lock) && ompi__evthread_lock_debugging_enabled) { \ EVUTIL_ASSERT(_evthread_is_debug_lock_held(lock)); \ } \ } while (0) @@ -134,8 +134,8 @@ static inline int EVLOCK_TRY_LOCK(void *lock); static inline int EVLOCK_TRY_LOCK(void *lock) { - if (lock && _evthread_lock_fns.lock) { - int r = _evthread_lock_fns.lock(EVTHREAD_TRY, lock); + if (lock && ompi__evthread_lock_fns.lock) { + int r = ompi__evthread_lock_fns.lock(EVTHREAD_TRY, lock); return !r; } else { /* Locking is disabled either globally or for this thing; @@ -147,35 +147,35 @@ EVLOCK_TRY_LOCK(void *lock) /** Allocate a new condition variable and store it in the void *, condvar */ #define EVTHREAD_ALLOC_COND(condvar) \ do { \ - (condvar) = _evthread_cond_fns.alloc_condition ? \ - _evthread_cond_fns.alloc_condition(0) : NULL; \ + (condvar) = ompi__evthread_cond_fns.alloc_condition ? 
\ + ompi__evthread_cond_fns.alloc_condition(0) : NULL; \ } while (0) /** Deallocate and free a condition variable in condvar */ #define EVTHREAD_FREE_COND(cond) \ do { \ if (cond) \ - _evthread_cond_fns.free_condition((cond)); \ + ompi__evthread_cond_fns.free_condition((cond)); \ } while (0) /** Signal one thread waiting on cond */ #define EVTHREAD_COND_SIGNAL(cond) \ - ( (cond) ? _evthread_cond_fns.signal_condition((cond), 0) : 0 ) + ( (cond) ? ompi__evthread_cond_fns.signal_condition((cond), 0) : 0 ) /** Signal all threads waiting on cond */ #define EVTHREAD_COND_BROADCAST(cond) \ - ( (cond) ? _evthread_cond_fns.signal_condition((cond), 1) : 0 ) + ( (cond) ? ompi__evthread_cond_fns.signal_condition((cond), 1) : 0 ) /** Wait until the condition 'cond' is signalled. Must be called while * holding 'lock'. The lock will be released until the condition is * signalled, at which point it will be acquired again. Returns 0 for * success, -1 for failure. */ #define EVTHREAD_COND_WAIT(cond, lock) \ - ( (cond) ? _evthread_cond_fns.wait_condition((cond), (lock), NULL) : 0 ) + ( (cond) ? ompi__evthread_cond_fns.wait_condition((cond), (lock), NULL) : 0 ) /** As EVTHREAD_COND_WAIT, but gives up after 'tv' has elapsed. Returns 1 * on timeout. */ #define EVTHREAD_COND_WAIT_TIMED(cond, lock, tv) \ - ( (cond) ? _evthread_cond_fns.wait_condition((cond), (lock), (tv)) : 0 ) + ( (cond) ? ompi__evthread_cond_fns.wait_condition((cond), (lock), (tv)) : 0 ) /** True iff locking functions have been configured. */ #define EVTHREAD_LOCKING_ENABLED() \ - (_evthread_lock_fns.lock != NULL) + (ompi__evthread_lock_fns.lock != NULL) #elif ! 
defined(_EVENT_DISABLE_THREAD_SUPPORT) diff --git a/opal/mca/event/libevent2022/libevent/evthread.c b/opal/mca/event/libevent2022/libevent/evthread.c index 90e195d584a..5f1e7a2b869 100644 --- a/opal/mca/event/libevent2022/libevent/evthread.c +++ b/opal/mca/event/libevent2022/libevent/evthread.c @@ -45,12 +45,12 @@ #endif /* globals */ -GLOBAL int _evthread_lock_debugging_enabled = 0; -GLOBAL struct evthread_lock_callbacks _evthread_lock_fns = { +GLOBAL int ompi__evthread_lock_debugging_enabled = 0; +GLOBAL struct evthread_lock_callbacks ompi__evthread_lock_fns = { 0, 0, NULL, NULL, NULL, NULL }; -GLOBAL unsigned long (*_evthread_id_fn)(void) = NULL; -GLOBAL struct evthread_condition_callbacks _evthread_cond_fns = { +GLOBAL unsigned long (*ompi__evthread_id_fn)(void) = NULL; +GLOBAL struct evthread_condition_callbacks ompi__evthread_cond_fns = { 0, NULL, NULL, NULL, NULL }; @@ -65,21 +65,21 @@ static struct evthread_condition_callbacks _original_cond_fns = { void evthread_set_id_callback(unsigned long (*id_fn)(void)) { - _evthread_id_fn = id_fn; + ompi__evthread_id_fn = id_fn; } int evthread_set_lock_callbacks(const struct evthread_lock_callbacks *cbs) { struct evthread_lock_callbacks *target = - _evthread_lock_debugging_enabled - ? &_original_lock_fns : &_evthread_lock_fns; + ompi__evthread_lock_debugging_enabled + ? 
&_original_lock_fns : &ompi__evthread_lock_fns; if (!cbs) { if (target->alloc) event_warnx("Trying to disable lock functions after " "they have been set up will probaby not work."); - memset(target, 0, sizeof(_evthread_lock_fns)); + memset(target, 0, sizeof(ompi__evthread_lock_fns)); return 0; } if (target->alloc) { @@ -98,7 +98,7 @@ evthread_set_lock_callbacks(const struct evthread_lock_callbacks *cbs) return -1; } if (cbs->alloc && cbs->free && cbs->lock && cbs->unlock) { - memcpy(target, cbs, sizeof(_evthread_lock_fns)); + memcpy(target, cbs, sizeof(ompi__evthread_lock_fns)); return event_global_setup_locks_(1); } else { return -1; @@ -109,15 +109,15 @@ int evthread_set_condition_callbacks(const struct evthread_condition_callbacks *cbs) { struct evthread_condition_callbacks *target = - _evthread_lock_debugging_enabled - ? &_original_cond_fns : &_evthread_cond_fns; + ompi__evthread_lock_debugging_enabled + ? &_original_cond_fns : &ompi__evthread_cond_fns; if (!cbs) { if (target->alloc_condition) event_warnx("Trying to disable condition functions " "after they have been set up will probaby not " "work."); - memset(target, 0, sizeof(_evthread_cond_fns)); + memset(target, 0, sizeof(ompi__evthread_cond_fns)); return 0; } if (target->alloc_condition) { @@ -136,12 +136,12 @@ evthread_set_condition_callbacks(const struct evthread_condition_callbacks *cbs) } if (cbs->alloc_condition && cbs->free_condition && cbs->signal_condition && cbs->wait_condition) { - memcpy(target, cbs, sizeof(_evthread_cond_fns)); + memcpy(target, cbs, sizeof(ompi__evthread_cond_fns)); } - if (_evthread_lock_debugging_enabled) { - _evthread_cond_fns.alloc_condition = cbs->alloc_condition; - _evthread_cond_fns.free_condition = cbs->free_condition; - _evthread_cond_fns.signal_condition = cbs->signal_condition; + if (ompi__evthread_lock_debugging_enabled) { + ompi__evthread_cond_fns.alloc_condition = cbs->alloc_condition; + ompi__evthread_cond_fns.free_condition = cbs->free_condition; + 
ompi__evthread_cond_fns.signal_condition = cbs->signal_condition; } return 0; } @@ -197,9 +197,9 @@ evthread_debug_lock_mark_locked(unsigned mode, struct debug_lock *lock) ++lock->count; if (!(lock->locktype & EVTHREAD_LOCKTYPE_RECURSIVE)) EVUTIL_ASSERT(lock->count == 1); - if (_evthread_id_fn) { + if (ompi__evthread_id_fn) { unsigned long me; - me = _evthread_id_fn(); + me = ompi__evthread_id_fn(); if (lock->count > 1) EVUTIL_ASSERT(lock->held_by == me); lock->held_by = me; @@ -230,8 +230,8 @@ evthread_debug_lock_mark_unlocked(unsigned mode, struct debug_lock *lock) EVUTIL_ASSERT(mode & (EVTHREAD_READ|EVTHREAD_WRITE)); else EVUTIL_ASSERT((mode & (EVTHREAD_READ|EVTHREAD_WRITE)) == 0); - if (_evthread_id_fn) { - EVUTIL_ASSERT(lock->held_by == _evthread_id_fn()); + if (ompi__evthread_id_fn) { + EVUTIL_ASSERT(lock->held_by == ompi__evthread_id_fn()); if (lock->count == 1) lock->held_by = 0; } @@ -274,17 +274,17 @@ evthread_enable_lock_debuging(void) debug_lock_lock, debug_lock_unlock }; - if (_evthread_lock_debugging_enabled) + if (ompi__evthread_lock_debugging_enabled) return; - memcpy(&_original_lock_fns, &_evthread_lock_fns, + memcpy(&_original_lock_fns, &ompi__evthread_lock_fns, sizeof(struct evthread_lock_callbacks)); - memcpy(&_evthread_lock_fns, &cbs, + memcpy(&ompi__evthread_lock_fns, &cbs, sizeof(struct evthread_lock_callbacks)); - memcpy(&_original_cond_fns, &_evthread_cond_fns, + memcpy(&_original_cond_fns, &ompi__evthread_cond_fns, sizeof(struct evthread_condition_callbacks)); - _evthread_cond_fns.wait_condition = debug_cond_wait; - _evthread_lock_debugging_enabled = 1; + ompi__evthread_cond_fns.wait_condition = debug_cond_wait; + ompi__evthread_lock_debugging_enabled = 1; /* XXX return value should get checked. */ event_global_setup_locks_(0); @@ -296,8 +296,8 @@ _evthread_is_debug_lock_held(void *lock_) struct debug_lock *lock = lock_; if (! 
lock->count) return 0; - if (_evthread_id_fn) { - unsigned long me = _evthread_id_fn(); + if (ompi__evthread_id_fn) { + unsigned long me = ompi__evthread_id_fn(); if (lock->held_by != me) return 0; } @@ -344,15 +344,15 @@ evthread_setup_global_lock_(void *lock_, unsigned locktype, int enable_locks) lock->count = 0; lock->held_by = 0; return lock; - } else if (enable_locks && ! _evthread_lock_debugging_enabled) { + } else if (enable_locks && ! ompi__evthread_lock_debugging_enabled) { /* Case 3: allocate a regular lock */ EVUTIL_ASSERT(lock_ == NULL); - return _evthread_lock_fns.alloc(locktype); + return ompi__evthread_lock_fns.alloc(locktype); } else { /* Case 4: Fill in a debug lock with a real lock */ struct debug_lock *lock = lock_; EVUTIL_ASSERT(enable_locks && - _evthread_lock_debugging_enabled); + ompi__evthread_lock_debugging_enabled); EVUTIL_ASSERT(lock->locktype == locktype); EVUTIL_ASSERT(lock->lock == NULL); lock->lock = _original_lock_fns.alloc( @@ -371,74 +371,74 @@ evthread_setup_global_lock_(void *lock_, unsigned locktype, int enable_locks) unsigned long _evthreadimpl_get_id() { - return _evthread_id_fn ? _evthread_id_fn() : 1; + return ompi__evthread_id_fn ? ompi__evthread_id_fn() : 1; } void * _evthreadimpl_lock_alloc(unsigned locktype) { - return _evthread_lock_fns.alloc ? - _evthread_lock_fns.alloc(locktype) : NULL; + return ompi__evthread_lock_fns.alloc ? 
+ ompi__evthread_lock_fns.alloc(locktype) : NULL; } void _evthreadimpl_lock_free(void *lock, unsigned locktype) { - if (_evthread_lock_fns.free) - _evthread_lock_fns.free(lock, locktype); + if (ompi__evthread_lock_fns.free) + ompi__evthread_lock_fns.free(lock, locktype); } int _evthreadimpl_lock_lock(unsigned mode, void *lock) { - if (_evthread_lock_fns.lock) - return _evthread_lock_fns.lock(mode, lock); + if (ompi__evthread_lock_fns.lock) + return ompi__evthread_lock_fns.lock(mode, lock); else return 0; } int _evthreadimpl_lock_unlock(unsigned mode, void *lock) { - if (_evthread_lock_fns.unlock) - return _evthread_lock_fns.unlock(mode, lock); + if (ompi__evthread_lock_fns.unlock) + return ompi__evthread_lock_fns.unlock(mode, lock); else return 0; } void * _evthreadimpl_cond_alloc(unsigned condtype) { - return _evthread_cond_fns.alloc_condition ? - _evthread_cond_fns.alloc_condition(condtype) : NULL; + return ompi__evthread_cond_fns.alloc_condition ? + ompi__evthread_cond_fns.alloc_condition(condtype) : NULL; } void _evthreadimpl_cond_free(void *cond) { - if (_evthread_cond_fns.free_condition) - _evthread_cond_fns.free_condition(cond); + if (ompi__evthread_cond_fns.free_condition) + ompi__evthread_cond_fns.free_condition(cond); } int _evthreadimpl_cond_signal(void *cond, int broadcast) { - if (_evthread_cond_fns.signal_condition) - return _evthread_cond_fns.signal_condition(cond, broadcast); + if (ompi__evthread_cond_fns.signal_condition) + return ompi__evthread_cond_fns.signal_condition(cond, broadcast); else return 0; } int _evthreadimpl_cond_wait(void *cond, void *lock, const struct timeval *tv) { - if (_evthread_cond_fns.wait_condition) - return _evthread_cond_fns.wait_condition(cond, lock, tv); + if (ompi__evthread_cond_fns.wait_condition) + return ompi__evthread_cond_fns.wait_condition(cond, lock, tv); else return 0; } int _evthreadimpl_is_lock_debugging_enabled(void) { - return _evthread_lock_debugging_enabled; + return ompi__evthread_lock_debugging_enabled; 
} int _evthreadimpl_locking_enabled(void) { - return _evthread_lock_fns.lock != NULL; + return ompi__evthread_lock_fns.lock != NULL; } #endif diff --git a/opal/mca/event/libevent2022/libevent/evutil.c b/opal/mca/event/libevent2022/libevent/evutil.c index 33445170f64..214f9082dbc 100644 --- a/opal/mca/event/libevent2022/libevent/evutil.c +++ b/opal/mca/event/libevent2022/libevent/evutil.c @@ -2113,7 +2113,7 @@ _evutil_weakrand(void) * Volatile pointer to memset: we use this to keep the compiler from * eliminating our call to memset. */ -void * (*volatile evutil_memset_volatile_)(void *, int, size_t) = memset; +static void * (*volatile evutil_memset_volatile_)(void *, int, size_t) = memset; void evutil_memclear_(void *mem, size_t len) diff --git a/opal/mca/event/libevent2022/libevent/log-internal.h b/opal/mca/event/libevent2022/libevent/log-internal.h index 9b8e0fa2902..49a7c3359f0 100644 --- a/opal/mca/event/libevent2022/libevent/log-internal.h +++ b/opal/mca/event/libevent2022/libevent/log-internal.h @@ -57,7 +57,7 @@ void _event_debugx(const char *fmt, ...) 
EV_CHECK_FMT(1,2); #undef EV_CHECK_FMT /**** OMPI CHANGE ****/ -extern int event_enable_debug_output; +extern int ompi_event_enable_debug_output; /**** END OMPI CHANGE ****/ #endif diff --git a/opal/mca/event/libevent2022/libevent/log.c b/opal/mca/event/libevent2022/libevent/log.c index b65517f7691..43ee9b38045 100644 --- a/opal/mca/event/libevent2022/libevent/log.c +++ b/opal/mca/event/libevent2022/libevent/log.c @@ -64,7 +64,7 @@ static void event_exit(int errcode) EV_NORETURN; static event_fatal_cb fatal_fn = NULL; /**** OMPI CHANGE ****/ -int event_enable_debug_output = 0; +int ompi_event_enable_debug_output = 0; /**** END OMPI CHANGE ****/ void diff --git a/opal/mca/event/libevent2022/libevent/test/regress.c b/opal/mca/event/libevent2022/libevent/test/regress.c index 5935f9be071..aec74616bb6 100644 --- a/opal/mca/event/libevent2022/libevent/test/regress.c +++ b/opal/mca/event/libevent2022/libevent/test/regress.c @@ -814,7 +814,7 @@ test_common_timeout(void *ptr) #ifndef WIN32 static void signal_cb(evutil_socket_t fd, short event, void *arg); -#define current_base event_global_current_base_ +#define current_base ompi_event_global_current_base_ extern struct event_base *current_base; static void diff --git a/opal/mca/event/libevent2022/libevent2022.h b/opal/mca/event/libevent2022/libevent2022.h index 51a3d2f5f40..de3443539f0 100644 --- a/opal/mca/event/libevent2022/libevent2022.h +++ b/opal/mca/event/libevent2022/libevent2022.h @@ -110,6 +110,8 @@ OPAL_DECLSPEC int opal_event_finalize(void); #define opal_event_set(b, x, fd, fg, cb, arg) event_assign((x), (b), (fd), (fg), (event_callback_fn) (cb), (arg)) +#define opal_event_assign(x, b, fd, fg, cb, arg) event_assign((x), (b), (fd), (fg), (event_callback_fn) (cb), (arg)) + #define opal_event_add(ev, tv) event_add((ev), (tv)) #define opal_event_del(ev) event_del((ev)) diff --git a/opal/mca/event/libevent2022/libevent2022_component.c b/opal/mca/event/libevent2022/libevent2022_component.c index 
1151428f915..6c8171dcf8c 100644 --- a/opal/mca/event/libevent2022/libevent2022_component.c +++ b/opal/mca/event/libevent2022/libevent2022_component.c @@ -4,6 +4,7 @@ * Copyright (c) 2012-2015 Los Alamos National Security, LLC. All rights reserved. * Copyright (c) 2015 Intel, Inc. All rights reserved. * + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -35,7 +36,7 @@ const char *opal_event_libevent2022_component_version_string = /* * MCA variables */ -char *event_module_include = NULL; +char *ompi_event_module_include = NULL; /* copied from event.c */ #if defined(_EVENT_HAVE_EVENT_PORTS) && _EVENT_HAVE_EVENT_PORTS @@ -61,7 +62,7 @@ extern const struct eventop win32ops; #endif /* Array of backends in order of preference. */ -const struct eventop *eventops[] = { +const struct eventop *ompi_eventops[] = { #if defined(_EVENT_HAVE_EVENT_PORTS) && _EVENT_HAVE_EVENT_PORTS &evportops, #endif @@ -122,7 +123,7 @@ const opal_event_component_t mca_event_libevent2022_component = { static int libevent2022_register (void) { - const struct eventop** _eventop = eventops; + const struct eventop** _eventop = ompi_eventops; char available_eventops[BUFSIZ] = "none"; char *help_msg = NULL; int ret; @@ -156,18 +157,18 @@ static int libevent2022_register (void) const int len = sizeof (available_eventops); int cur_len = snprintf (available_eventops, len, "%s", (*(_eventop++))->name); - for (int i = 1 ; eventops[i] && cur_len < len ; ++i) { + for (int i = 1 ; ompi_eventops[i] && cur_len < len ; ++i) { cur_len += snprintf (available_eventops + cur_len, len - cur_len, ", %s", - eventops[i]->name); + ompi_eventops[i]->name); } /* ensure the available_eventops string is always NULL-terminated */ available_eventops[len - 1] = '\0'; } #ifdef __APPLE__ - event_module_include ="select"; + ompi_event_module_include ="select"; #else - event_module_include = "poll"; + ompi_event_module_include = "poll"; #endif asprintf( &help_msg, @@ 
-181,7 +182,7 @@ static int libevent2022_register (void) MCA_BASE_VAR_FLAG_SETTABLE, OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_LOCAL, - &event_module_include); + &ompi_event_module_include); free(help_msg); /* release the help message */ if (0 > ret) { diff --git a/opal/mca/event/libevent2022/libevent2022_module.c b/opal/mca/event/libevent2022/libevent2022_module.c index 050a898330c..b36f4d4f985 100644 --- a/opal/mca/event/libevent2022/libevent2022_module.c +++ b/opal/mca/event/libevent2022/libevent2022_module.c @@ -7,6 +7,7 @@ * Copyright (c) 2015 Intel, Inc. All rights reserved. * Copyright (c) 2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -67,8 +68,8 @@ #include "opal/mca/event/event.h" static struct event_config *config=NULL; -extern char *event_module_include; -extern const struct eventop *eventops[]; +extern char *ompi_event_module_include; +extern const struct eventop *ompi_eventops[]; opal_event_base_t* opal_event_base_create(void) { @@ -93,29 +94,29 @@ int opal_event_init(void) dumpit = true; } - if (NULL == event_module_include) { + if (NULL == ompi_event_module_include) { /* Shouldn't happen, but... 
*/ - event_module_include = strdup("select"); + ompi_event_module_include = strdup("select"); } - includes = opal_argv_split(event_module_include,','); + includes = opal_argv_split(ompi_event_module_include,','); /* get a configuration object */ config = event_config_new(); /* cycle thru the available subsystems */ - for (i = 0 ; NULL != eventops[i] ; ++i) { + for (i = 0 ; NULL != ompi_eventops[i] ; ++i) { /* if this module isn't included in the given ones, * then exclude it */ dumpit = true; for (j=0; NULL != includes[j]; j++) { if (0 == strcmp("all", includes[j]) || - 0 == strcmp(eventops[i]->name, includes[j])) { + 0 == strcmp(ompi_eventops[i]->name, includes[j])) { dumpit = false; break; } } if (dumpit) { - event_config_avoid_method(config, eventops[i]->name); + event_config_avoid_method(config, ompi_eventops[i]->name); } } opal_argv_free(includes); diff --git a/opal/mca/hwloc/Makefile.am b/opal/mca/hwloc/Makefile.am index 69d8853c131..fdda561a64f 100644 --- a/opal/mca/hwloc/Makefile.am +++ b/opal/mca/hwloc/Makefile.am @@ -9,6 +9,8 @@ # $HEADER$ # +EXTRA_DIST = autogen.options + # main library setup noinst_LTLIBRARIES = libmca_hwloc.la libmca_hwloc_la_SOURCES = diff --git a/opal/mca/hwloc/base/base.h b/opal/mca/hwloc/base/base.h index 0a9c482a743..bb4b64f598f 100644 --- a/opal/mca/hwloc/base/base.h +++ b/opal/mca/hwloc/base/base.h @@ -1,6 +1,8 @@ /* * Copyright (c) 2011-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -17,6 +19,12 @@ #include "opal/mca/hwloc/hwloc-internal.h" +#if HWLOC_API_VERSION < 0x20000 +#define HWLOC_OBJ_L3CACHE HWLOC_OBJ_CACHE +#define HWLOC_OBJ_L2CACHE HWLOC_OBJ_CACHE +#define HWLOC_OBJ_L1CACHE HWLOC_OBJ_CACHE +#endif + /* * Global functions for MCA overall hwloc open and close */ @@ -81,6 +89,20 @@ OPAL_DECLSPEC extern char *opal_hwloc_base_topo_file; hwloc_bitmap_free(bind); \ } while(0); +#if HWLOC_API_VERSION < 0x20000 +#define OPAL_HWLOC_MAKE_OBJ_CACHE(level, obj, cache_level) \ + do { \ + obj = HWLOC_OBJ_CACHE; \ + cache_level = level; \ + } while(0) +#else +#define OPAL_HWLOC_MAKE_OBJ_CACHE(level, obj, cache_level) \ + do { \ + obj = HWLOC_OBJ_L##level##CACHE; \ + cache_level = 0; \ + } while(0) +#endif + OPAL_DECLSPEC opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t topo, char *cpuset1, char *cpuset2); @@ -132,9 +154,6 @@ typedef enum { */ OPAL_DECLSPEC extern opal_hwloc_base_mbfa_t opal_hwloc_base_mbfa; -/* some critical helper functions */ -OPAL_DECLSPEC int opal_hwloc_base_filter_cpus(hwloc_topology_t topo); - /** * Discover / load the hwloc topology (i.e., call hwloc_topology_init() and * hwloc_topology_load()). @@ -146,12 +165,12 @@ OPAL_DECLSPEC int opal_hwloc_base_get_topology(void); */ OPAL_DECLSPEC int opal_hwloc_base_set_topology(char *topofile); +OPAL_DECLSPEC int opal_hwloc_base_filter_cpus(hwloc_topology_t topo); + /** * Free the hwloc topology. 
*/ OPAL_DECLSPEC void opal_hwloc_base_free_topology(hwloc_topology_t topo); -OPAL_DECLSPEC hwloc_cpuset_t opal_hwloc_base_get_available_cpus(hwloc_topology_t topo, - hwloc_obj_t obj); OPAL_DECLSPEC unsigned int opal_hwloc_base_get_nbobjs_by_type(hwloc_topology_t topo, hwloc_obj_type_t target, unsigned cache_level, @@ -285,6 +304,9 @@ OPAL_DECLSPEC char* opal_hwloc_base_get_location(char *locality, OPAL_DECLSPEC opal_hwloc_locality_t opal_hwloc_compute_relative_locality(char *loc1, char *loc2); +OPAL_DECLSPEC int opal_hwloc_base_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlpath, int *buflen); + +OPAL_DECLSPEC int opal_hwloc_base_topology_set_flags (hwloc_topology_t topology, unsigned long flags, bool io); END_C_DECLS #endif /* OPAL_HWLOC_BASE_H */ diff --git a/opal/mca/hwloc/base/hwloc_base_dt.c b/opal/mca/hwloc/base/hwloc_base_dt.c index 10ab99688ae..0840ee13f11 100644 --- a/opal/mca/hwloc/base/hwloc_base_dt.c +++ b/opal/mca/hwloc/base/hwloc_base_dt.c @@ -31,7 +31,7 @@ int opal_hwloc_pack(opal_buffer_t *buffer, const void *src, t = tarray[i]; /* extract an xml-buffer representation of the tree */ - if (0 != hwloc_topology_export_xmlbuffer(t, &xmlbuffer, &len)) { + if (0 != opal_hwloc_base_topology_export_xmlbuffer(t, &xmlbuffer, &len)) { return OPAL_ERROR; } @@ -106,9 +106,7 @@ int opal_hwloc_unpack(opal_buffer_t *buffer, void *dest, /* since we are loading this from an external source, we have to * explicitly set a flag so hwloc sets things up correctly */ - if (0 != hwloc_topology_set_flags(t, (HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM | - HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM | - HWLOC_TOPOLOGY_FLAG_IO_DEVICES))) { + if (0 != opal_hwloc_base_topology_set_flags(t, HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM, true)) { rc = OPAL_ERROR; hwloc_topology_destroy(t); goto cleanup; @@ -137,11 +135,6 @@ int opal_hwloc_unpack(opal_buffer_t *buffer, void *dest, goto cleanup; } - /* filter the cpus thru any default cpu set */ - if (OPAL_SUCCESS != (rc = 
opal_hwloc_base_filter_cpus(t))) { - goto cleanup; - } - /* pass it back */ tarray[i] = t; @@ -197,10 +190,10 @@ int opal_hwloc_compare(const hwloc_topology_t topo1, * where we really need to do a tree-wise search so we only compare * the things we care about, and ignore stuff like MAC addresses */ - if (0 != hwloc_topology_export_xmlbuffer(t1, &x1, &l1)) { + if (0 != opal_hwloc_base_topology_export_xmlbuffer(t1, &x1, &l1)) { return OPAL_EQUAL; } - if (0 != hwloc_topology_export_xmlbuffer(t2, &x2, &l2)) { + if (0 != opal_hwloc_base_topology_export_xmlbuffer(t2, &x2, &l2)) { free(x1); return OPAL_EQUAL; } @@ -269,18 +262,6 @@ static void print_hwloc_obj(char **output, char *prefix, free(tmp); tmp = tmp2; } - if (NULL != obj->online_cpuset) { - hwloc_bitmap_snprintf(string, OPAL_HWLOC_MAX_STRING, obj->online_cpuset); - asprintf(&tmp2, "%s%sOnline: %s", tmp, pfx, string); - free(tmp); - tmp = tmp2; - } - if (NULL != obj->allowed_cpuset) { - hwloc_bitmap_snprintf(string, OPAL_HWLOC_MAX_STRING, obj->allowed_cpuset); - asprintf(&tmp2, "%s%sAllowed: %s", tmp, pfx, string); - free(tmp); - tmp = tmp2; - } if (HWLOC_OBJ_MACHINE == obj->type) { /* root level object - add support values */ support = (struct hwloc_topology_support*)hwloc_topology_get_support(topo); diff --git a/opal/mca/hwloc/base/hwloc_base_frame.c b/opal/mca/hwloc/base/hwloc_base_frame.c index e27985d38eb..ea57fef56d1 100644 --- a/opal/mca/hwloc/base/hwloc_base_frame.c +++ b/opal/mca/hwloc/base/hwloc_base_frame.c @@ -1,7 +1,7 @@ /* * Copyright (c) 2011-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. - * Copyright (c) 2016 Research Organization for Information Science + * Copyright (c) 2016-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -50,9 +50,9 @@ hwloc_obj_type_t opal_hwloc_levels[] = { HWLOC_OBJ_MACHINE, HWLOC_OBJ_NODE, HWLOC_OBJ_SOCKET, - HWLOC_OBJ_CACHE, - HWLOC_OBJ_CACHE, - HWLOC_OBJ_CACHE, + HWLOC_OBJ_L3CACHE, + HWLOC_OBJ_L2CACHE, + HWLOC_OBJ_L1CACHE, HWLOC_OBJ_CORE, HWLOC_OBJ_PU }; @@ -120,7 +120,7 @@ static int opal_hwloc_base_register(mca_base_register_flag_t flags) (void) mca_base_var_register("opal", "hwloc", "base", "binding_policy", "Policy for binding processes. Allowed values: none, hwthread, core, l1cache, l2cache, " "l3cache, socket, numa, board (\"none\" is the default when oversubscribed, \"core\" is " - "the default when np<=2, and \"socket\" is the default when np>2). Allowed qualifiers: " + "the default when np<=2, and \"numa\" is the default when np>2). Allowed qualifiers: " "overload-allowed, if-supported", MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_9, MCA_BASE_VAR_SCOPE_READONLY, &opal_hwloc_base_binding_policy); diff --git a/opal/mca/hwloc/base/hwloc_base_util.c b/opal/mca/hwloc/base/hwloc_base_util.c index 8c30b316e25..fcc5f6d4ad0 100644 --- a/opal/mca/hwloc/base/hwloc_base_util.c +++ b/opal/mca/hwloc/base/hwloc_base_util.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2011-2017 Cisco Systems, Inc. All rights reserved - * Copyright (c) 2012-2015 Los Alamos National Security, LLC. + * Copyright (c) 2012-2017 Los Alamos National Security, LLC. * All rights reserved. * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. 
* Copyright (c) 2015-2017 Research Organization for Information Science @@ -32,6 +32,15 @@ #ifdef HAVE_UNISTD_H #include #endif +#ifdef HAVE_ENDIAN_H +#include +#endif +#ifdef HAVE_SYS_STAT_H +#include +#endif +#if HAVE_FCNTL_H +#include +#endif #include "opal/runtime/opal.h" #include "opal/constants.h" @@ -45,6 +54,8 @@ #include "opal/mca/hwloc/hwloc-internal.h" #include "opal/mca/hwloc/base/base.h" +static bool topo_in_shmem = false; + /* * Provide the hwloc object that corresponds to the given * processor id of the given type. Remember: "processor" here [usually] means "core" -- @@ -134,8 +145,12 @@ int opal_hwloc_base_filter_cpus(hwloc_topology_t topo) /* process any specified default cpu set against this topology */ if (NULL == opal_hwloc_base_cpu_list) { /* get the root available cpuset */ - avail = hwloc_bitmap_alloc(); - hwloc_bitmap_and(avail, root->online_cpuset, root->allowed_cpuset); + #if HWLOC_API_VERSION < 0x20000 + avail = hwloc_bitmap_alloc(); + hwloc_bitmap_and(avail, root->online_cpuset, root->allowed_cpuset); + #else + avail = hwloc_bitmap_dup(root->allowed_cpuset); + #endif OPAL_OUTPUT_VERBOSE((5, opal_hwloc_base_framework.framework_output, "hwloc:base: no cpus specified - using root available cpuset")); } else { @@ -154,7 +169,12 @@ int opal_hwloc_base_filter_cpus(hwloc_topology_t topo) /* only one cpu given - get that object */ cpu = strtoul(range[0], NULL, 10); if (NULL != (pu = opal_hwloc_base_get_pu(topo, cpu, OPAL_HWLOC_LOGICAL))) { - hwloc_bitmap_and(pucpus, pu->online_cpuset, pu->allowed_cpuset); + #if HWLOC_API_VERSION < 0x20000 + hwloc_bitmap_and(pucpus, pu->online_cpuset, pu->allowed_cpuset); + #else + hwloc_bitmap_free(pucpus); + pucpus = hwloc_bitmap_dup(pu->allowed_cpuset); + #endif hwloc_bitmap_or(res, avail, pucpus); hwloc_bitmap_copy(avail, res); data = (opal_hwloc_obj_data_t*)pu->userdata; @@ -171,7 +191,12 @@ int opal_hwloc_base_filter_cpus(hwloc_topology_t topo) end = strtoul(range[1], NULL, 10); for (cpu=start; cpu <= end; 
cpu++) { if (NULL != (pu = opal_hwloc_base_get_pu(topo, cpu, OPAL_HWLOC_LOGICAL))) { - hwloc_bitmap_and(pucpus, pu->online_cpuset, pu->allowed_cpuset); + #if HWLOC_API_VERSION < 0x20000 + hwloc_bitmap_and(pucpus, pu->online_cpuset, pu->allowed_cpuset); + #else + hwloc_bitmap_free(pucpus); + pucpus = hwloc_bitmap_dup(pu->allowed_cpuset); + #endif hwloc_bitmap_or(res, avail, pucpus); hwloc_bitmap_copy(avail, res); data = (opal_hwloc_obj_data_t*)pu->userdata; @@ -205,6 +230,7 @@ static void fill_cache_line_size(void) { int i = 0, cache_level = 2; unsigned size; + unsigned int cache_object = HWLOC_OBJ_L2CACHE; hwloc_obj_t obj; bool found = false; @@ -214,10 +240,11 @@ static void fill_cache_line_size(void) i=0; while (1) { obj = opal_hwloc_base_get_obj_by_type(opal_hwloc_topology, - HWLOC_OBJ_CACHE, cache_level, + cache_object, cache_level, i, OPAL_HWLOC_LOGICAL); if (NULL == obj) { --cache_level; + cache_object = HWLOC_OBJ_L1CACHE; break; } else { if (NULL != obj->attr && @@ -244,28 +271,91 @@ int opal_hwloc_base_get_topology(void) int rc; opal_process_name_t wildcard_rank; char *val = NULL; +#if HWLOC_API_VERSION >= 0x20000 + int rc2, rc3, fd; + uint64_t addr, *aptr, size, *sptr; + char *shmemfile; +#endif - OPAL_OUTPUT_VERBOSE((2, opal_hwloc_base_framework.framework_output, - "hwloc:base:get_topology")); + opal_output_verbose(2, opal_hwloc_base_framework.framework_output, + "hwloc:base:get_topology"); - /* see if we already got it */ + /* see if we already have it */ if (NULL != opal_hwloc_topology) { return OPAL_SUCCESS; } + wildcard_rank.jobid = OPAL_PROC_MY_NAME.jobid; + wildcard_rank.vpid = OPAL_VPID_WILDCARD; if (NULL != opal_pmix.get) { - /* try to retrieve it from the PMIx store */ +#if HWLOC_API_VERSION >= 0x20000 + opal_output_verbose(2, opal_hwloc_base_framework.framework_output, + "hwloc:base: looking for topology in shared memory"); + + /* first try to get the shmem link, if available */ + aptr = &addr; + sptr = &size; + 
OPAL_MODEX_RECV_VALUE_OPTIONAL(rc, OPAL_PMIX_HWLOC_SHMEM_FILE, + &wildcard_rank, (void**)&shmemfile, OPAL_STRING); + OPAL_MODEX_RECV_VALUE_OPTIONAL(rc2, OPAL_PMIX_HWLOC_SHMEM_ADDR, + &wildcard_rank, (void**)&aptr, OPAL_SIZE); + OPAL_MODEX_RECV_VALUE_OPTIONAL(rc3, OPAL_PMIX_HWLOC_SHMEM_SIZE, + &wildcard_rank, (void**)&sptr, OPAL_SIZE); + if (OPAL_SUCCESS == rc && OPAL_SUCCESS == rc2 && OPAL_SUCCESS == rc3) { + if (0 > (fd = open(shmemfile, O_RDONLY))) { + free(shmemfile); + OPAL_ERROR_LOG(OPAL_ERR_FILE_OPEN_FAILURE) + return OPAL_ERR_FILE_OPEN_FAILURE; + } + free(shmemfile); + if (0 != hwloc_shmem_topology_adopt(&opal_hwloc_topology, fd, + 0, (void*)addr, size, 0)) { + if (4 < opal_output_get_verbosity(opal_hwloc_base_framework.framework_output)) { + FILE *file = fopen("/proc/self/maps", "r"); + if (file) { + char line[256]; + opal_output(0, "Dumping /proc/self/maps"); + + while (fgets(line, sizeof(line), file) != NULL) { + char *end = strchr(line, '\n'); + if (end) { + *end = '\0'; + } + opal_output(0, "%s", line); + } + fclose(file); + } + } + /* failed to adopt from shmem, fallback to other ways to get the topology */ + } else { + opal_output_verbose(2, opal_hwloc_base_framework.framework_output, + "hwloc:base: topology in shared memory"); + topo_in_shmem = true; + return OPAL_SUCCESS; + } + } +#endif + /* if that isn't available, then try to retrieve + * the xml representation from the PMIx data store */ opal_output_verbose(1, opal_hwloc_base_framework.framework_output, - "hwloc:base instantiating topology"); - wildcard_rank.jobid = OPAL_PROC_MY_NAME.jobid; - wildcard_rank.vpid = OPAL_VPID_WILDCARD; - OPAL_MODEX_RECV_VALUE_OPTIONAL(rc, OPAL_PMIX_LOCAL_TOPO, - &wildcard_rank, &val, OPAL_STRING); + "hwloc:base[%s:%d] getting topology XML string", + __FILE__, __LINE__); +#if HWLOC_API_VERSION >= 0x20000 + OPAL_MODEX_RECV_VALUE_IMMEDIATE(rc, OPAL_PMIX_HWLOC_XML_V2, + &wildcard_rank, &val, OPAL_STRING); +#else + OPAL_MODEX_RECV_VALUE_IMMEDIATE(rc, 
OPAL_PMIX_HWLOC_XML_V1, + &wildcard_rank, &val, OPAL_STRING); +#endif } else { + opal_output_verbose(1, opal_hwloc_base_framework.framework_output, + "hwloc:base PMIx not available"); rc = OPAL_ERR_NOT_SUPPORTED; } if (OPAL_SUCCESS == rc && NULL != val) { + opal_output_verbose(1, opal_hwloc_base_framework.framework_output, + "hwloc:base loading topology from XML"); /* load the topology */ if (0 != hwloc_topology_init(&opal_hwloc_topology)) { free(val); @@ -279,10 +369,9 @@ int opal_hwloc_base_get_topology(void) /* since we are loading this from an external source, we have to * explicitly set a flag so hwloc sets things up correctly */ - if (0 != hwloc_topology_set_flags(opal_hwloc_topology, - (HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM | - HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM | - HWLOC_TOPOLOGY_FLAG_IO_DEVICES))) { + if (0 != opal_hwloc_base_topology_set_flags(opal_hwloc_topology, + HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM, + true)) { hwloc_topology_destroy(opal_hwloc_topology); free(val); return OPAL_ERROR; @@ -300,17 +389,23 @@ int opal_hwloc_base_get_topology(void) return rc; } } else if (NULL == opal_hwloc_base_topo_file) { + opal_output_verbose(1, opal_hwloc_base_framework.framework_output, + "hwloc:base discovering topology"); if (0 != hwloc_topology_init(&opal_hwloc_topology) || - 0 != hwloc_topology_set_flags(opal_hwloc_topology, - (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM | - HWLOC_TOPOLOGY_FLAG_IO_DEVICES)) || + 0 != opal_hwloc_base_topology_set_flags(opal_hwloc_topology, 0, true) || 0 != hwloc_topology_load(opal_hwloc_topology)) { + OPAL_ERROR_LOG(OPAL_ERR_NOT_SUPPORTED); return OPAL_ERR_NOT_SUPPORTED; } + /* filter the cpus thru any default cpu set */ if (OPAL_SUCCESS != (rc = opal_hwloc_base_filter_cpus(opal_hwloc_topology))) { + hwloc_topology_destroy(opal_hwloc_topology); return rc; } } else { + opal_output_verbose(1, opal_hwloc_base_framework.framework_output, + "hwloc:base loading topology from file %s", + opal_hwloc_base_topo_file); if (OPAL_SUCCESS != (rc = 
opal_hwloc_base_set_topology(opal_hwloc_base_topo_file))) { return rc; } @@ -331,7 +426,6 @@ int opal_hwloc_base_get_topology(void) int opal_hwloc_base_set_topology(char *topofile) { struct hwloc_topology_support *support; - int rc; OPAL_OUTPUT_VERBOSE((5, opal_hwloc_base_framework.framework_output, "hwloc:base:set_topology %s", topofile)); @@ -351,10 +445,9 @@ int opal_hwloc_base_set_topology(char *topofile) /* since we are loading this from an external source, we have to * explicitly set a flag so hwloc sets things up correctly */ - if (0 != hwloc_topology_set_flags(opal_hwloc_topology, - (HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM | - HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM | - HWLOC_TOPOLOGY_FLAG_IO_DEVICES))) { + if (0 != opal_hwloc_base_topology_set_flags(opal_hwloc_topology, + HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM, + true)) { hwloc_topology_destroy(opal_hwloc_topology); return OPAL_ERR_NOT_SUPPORTED; } @@ -374,12 +467,6 @@ int opal_hwloc_base_set_topology(char *topofile) support->cpubind->set_thisproc_cpubind = true; support->membind->set_thisproc_membind = true; - /* filter the cpus thru any default cpu set */ - rc = opal_hwloc_base_filter_cpus(opal_hwloc_topology); - if (OPAL_SUCCESS != rc) { - return rc; - } - /* fill opal_cache_line_size global with the smallest L1 cache line size */ fill_cache_line_size(); @@ -412,18 +499,20 @@ void opal_hwloc_base_free_topology(hwloc_topology_t topo) opal_hwloc_topo_data_t *rdata; unsigned k; - obj = hwloc_get_root_obj(topo); - /* release the root-level userdata */ - if (NULL != obj->userdata) { - rdata = (opal_hwloc_topo_data_t*)obj->userdata; - OBJ_RELEASE(rdata); - obj->userdata = NULL; - } - /* now recursively descend and release userdata - * in the rest of the objects - */ - for (k=0; k < obj->arity; k++) { - free_object(obj->children[k]); + if (!topo_in_shmem) { + obj = hwloc_get_root_obj(topo); + /* release the root-level userdata */ + if (NULL != obj->userdata) { + rdata = (opal_hwloc_topo_data_t*)obj->userdata; + 
OBJ_RELEASE(rdata); + obj->userdata = NULL; + } + /* now recursively descend and release userdata + * in the rest of the objects + */ + for (k=0; k < obj->arity; k++) { + free_object(obj->children[k]); + } } hwloc_topology_destroy(topo); } @@ -431,7 +520,6 @@ void opal_hwloc_base_free_topology(hwloc_topology_t topo) void opal_hwloc_base_get_local_cpuset(void) { hwloc_obj_t root; - hwloc_cpuset_t base_cpus; if (NULL != opal_hwloc_topology) { if (NULL == opal_hwloc_my_cpuset) { @@ -444,8 +532,7 @@ void opal_hwloc_base_get_local_cpuset(void) HWLOC_CPUBIND_PROCESS) < 0) { /* we are not bound - use the root's available cpuset */ root = hwloc_get_root_obj(opal_hwloc_topology); - base_cpus = opal_hwloc_base_get_available_cpus(opal_hwloc_topology, root); - hwloc_bitmap_copy(opal_hwloc_my_cpuset, base_cpus); + hwloc_bitmap_copy(opal_hwloc_my_cpuset, root->cpuset); } } } @@ -473,72 +560,6 @@ int opal_hwloc_base_report_bind_failure(const char *file, return OPAL_SUCCESS; } -hwloc_cpuset_t opal_hwloc_base_get_available_cpus(hwloc_topology_t topo, - hwloc_obj_t obj) -{ - hwloc_obj_t root; - hwloc_cpuset_t avail, specd=NULL; - opal_hwloc_topo_data_t *rdata; - opal_hwloc_obj_data_t *data; - - OPAL_OUTPUT_VERBOSE((10, opal_hwloc_base_framework.framework_output, - "hwloc:base: get available cpus")); - - /* get the node-level information */ - root = hwloc_get_root_obj(topo); - rdata = (opal_hwloc_topo_data_t*)root->userdata; - /* bozo check */ - if (NULL == rdata) { - rdata = OBJ_NEW(opal_hwloc_topo_data_t); - root->userdata = (void*)rdata; - OPAL_OUTPUT_VERBOSE((5, opal_hwloc_base_framework.framework_output, - "hwloc:base:get_available_cpus first time - filtering cpus")); - } - - /* are we asking about the root object? 
*/ - if (obj == root) { - OPAL_OUTPUT_VERBOSE((5, opal_hwloc_base_framework.framework_output, - "hwloc:base:get_available_cpus root object")); - return rdata->available; - } - - /* some hwloc object types don't have cpus */ - if (NULL == obj->online_cpuset || NULL == obj->allowed_cpuset) { - return NULL; - } - - /* see if we already have this info */ - if (NULL == (data = (opal_hwloc_obj_data_t*)obj->userdata)) { - /* nope - create the object */ - data = OBJ_NEW(opal_hwloc_obj_data_t); - obj->userdata = (void*)data; - } - - /* do we have the cpuset */ - if (NULL != data->available) { - return data->available; - } - - /* find the available processors on this object */ - avail = hwloc_bitmap_alloc(); - hwloc_bitmap_and(avail, obj->online_cpuset, obj->allowed_cpuset); - - /* filter this against the node-available processors */ - if (NULL == rdata->available) { - hwloc_bitmap_free(avail); - return NULL; - } - specd = hwloc_bitmap_alloc(); - hwloc_bitmap_and(specd, avail, rdata->available); - - /* cache the info */ - data->available = specd; - - /* cleanup */ - hwloc_bitmap_free(avail); - return specd; -} - static void df_search_cores(hwloc_obj_t obj, unsigned int *cnt) { unsigned k; @@ -551,13 +572,6 @@ static void df_search_cores(hwloc_obj_t obj, unsigned int *cnt) obj->userdata = (void*)data; } if (NULL == opal_hwloc_base_cpu_list) { - if (!hwloc_bitmap_intersects(obj->cpuset, obj->allowed_cpuset)) { - /* - * do not count not allowed cores (e.g. 
cores with zero allowed PU) - * if SMT is enabled, do count cores with at least one allowed hwthread - */ - return; - } data->npus = 1; } *cnt += data->npus; @@ -604,7 +618,6 @@ unsigned int opal_hwloc_base_get_npus(hwloc_topology_t topo, { opal_hwloc_obj_data_t *data; unsigned int cnt = 0; - hwloc_cpuset_t cpuset; data = (opal_hwloc_obj_data_t*)obj->userdata; if (NULL == data || !data->npus_calculated) { @@ -628,12 +641,13 @@ unsigned int opal_hwloc_base_get_npus(hwloc_topology_t topo, df_search_cores(obj, &cnt); } } else { + hwloc_cpuset_t cpuset; /* if we are treating cores as cpus, or the system can't detect * "cores", then get the available cpuset for this object - this will * create and store the data */ - if (NULL == (cpuset = opal_hwloc_base_get_available_cpus(topo, obj))) { + if (NULL == (cpuset = obj->cpuset)) { return 0; } /* count the number of bits that are set - there is @@ -685,10 +699,13 @@ unsigned int opal_hwloc_base_get_obj_idx(hwloc_topology_t topo, return data->idx; } +#if HWLOC_API_VERSION < 0x20000 /* determine the number of objects of this type */ if (HWLOC_OBJ_CACHE == obj->type) { cache_level = obj->attr->cache.depth; } +#endif + nobjs = opal_hwloc_base_get_nbobjs_by_type(topo, obj->type, cache_level, rtype); OPAL_OUTPUT_VERBOSE((5, opal_hwloc_base_framework.framework_output, @@ -738,9 +755,11 @@ static hwloc_obj_t df_search(hwloc_topology_t topo, opal_hwloc_obj_data_t *data; if (target == start->type) { +#if HWLOC_API_VERSION < 0x20000 if (HWLOC_OBJ_CACHE == start->type && cache_level != start->attr->cache.depth) { goto notfound; } +#endif if (OPAL_HWLOC_LOGICAL == rtype) { /* the hwloc tree is composed of LOGICAL objects, so the only * time we come here is when we are looking for logical caches @@ -794,7 +813,7 @@ static hwloc_obj_t df_search(hwloc_topology_t topo, } /* see if we already know our available cpuset */ if (NULL == data->available) { - data->available = opal_hwloc_base_get_available_cpus(topo, start); + data->available = 
hwloc_bitmap_dup(start->cpuset); } if (NULL != data->available && !hwloc_bitmap_iszero(data->available)) { if (NULL != num_objs) { @@ -812,7 +831,9 @@ static hwloc_obj_t df_search(hwloc_topology_t topo, return NULL; } - notfound: +#if HWLOC_API_VERSION < 0x20000 + notfound: +#endif for (k=0; k < start->arity; k++) { obj = df_search(topo, start->children[k], target, cache_level, nobj, rtype, idx, num_objs); if (NULL != obj) { @@ -830,7 +851,6 @@ unsigned int opal_hwloc_base_get_nbobjs_by_type(hwloc_topology_t topo, { unsigned int num_objs, idx; hwloc_obj_t obj; - opal_list_item_t *item; opal_hwloc_summary_t *sum; opal_hwloc_topo_data_t *data; int rc; @@ -846,7 +866,11 @@ unsigned int opal_hwloc_base_get_nbobjs_by_type(hwloc_topology_t topo, * use the hwloc accessor to get it, unless it is a CACHE * as these are treated as special cases */ - if (OPAL_HWLOC_LOGICAL == rtype && HWLOC_OBJ_CACHE != target) { + if (OPAL_HWLOC_LOGICAL == rtype +#if HWLOC_API_VERSION < 0x20000 + && HWLOC_OBJ_CACHE != target +#endif + ) { /* we should not get an error back, but just in case... 
*/ if (0 > (rc = hwloc_get_nbobjs_by_type(topo, target))) { opal_output(0, "UNKNOWN HWLOC ERROR"); @@ -866,10 +890,7 @@ unsigned int opal_hwloc_base_get_nbobjs_by_type(hwloc_topology_t topo, data = OBJ_NEW(opal_hwloc_topo_data_t); obj->userdata = (void*)data; } else { - for (item = opal_list_get_first(&data->summaries); - item != opal_list_get_end(&data->summaries); - item = opal_list_get_next(item)) { - sum = (opal_hwloc_summary_t*)item; + OPAL_LIST_FOREACH(sum, &data->summaries, opal_hwloc_summary_t) { if (target == sum->type && cache_level == sum->cache_level && rtype == sum->rtype) { @@ -915,9 +936,11 @@ static hwloc_obj_t df_search_min_bound(hwloc_topology_t topo, if (0 == (k = opal_hwloc_base_get_npus(topo, start))) { goto notfound; } +#if HWLOC_API_VERSION < 0x20000 if (HWLOC_OBJ_CACHE == start->type && cache_level != start->attr->cache.depth) { goto notfound; } +#endif /* see how many procs are bound to us */ data = (opal_hwloc_obj_data_t*)start->userdata; if (NULL == data) { @@ -980,14 +1003,18 @@ hwloc_obj_t opal_hwloc_base_find_min_bound_target_under_obj(hwloc_topology_t top /* again, we have to treat caches differently as * the levels distinguish them */ +#if HWLOC_API_VERSION < 0x20000 if (HWLOC_OBJ_CACHE == target && cache_level < obj->attr->cache.depth) { goto moveon; } +#endif return obj; } - moveon: +#if HWLOC_API_VERSION < 0x20000 + moveon: +#endif /* the hwloc accessors all report at the topo level, * so we have to do some work */ @@ -996,16 +1023,17 @@ hwloc_obj_t opal_hwloc_base_find_min_bound_target_under_obj(hwloc_topology_t top loc = df_search_min_bound(topo, obj, target, cache_level, &min_bound); if (NULL != loc) { +#if HWLOC_API_VERSION < 0x20000 if (HWLOC_OBJ_CACHE == target) { OPAL_OUTPUT_VERBOSE((5, opal_hwloc_base_framework.framework_output, "hwloc:base:min_bound_under_obj found min bound of %u on %s:%u:%u", min_bound, hwloc_obj_type_string(target), cache_level, loc->logical_index)); - } else { + } else +#endif OPAL_OUTPUT_VERBOSE((5, 
opal_hwloc_base_framework.framework_output, "hwloc:base:min_bound_under_obj found min bound of %u on %s:%u", min_bound, hwloc_obj_type_string(target), loc->logical_index)); - } } return loc; @@ -1032,7 +1060,11 @@ hwloc_obj_t opal_hwloc_base_get_obj_by_type(hwloc_topology_t topo, * use the hwloc accessor to get it, unless it is a CACHE * as these are treated as special cases */ - if (OPAL_HWLOC_LOGICAL == rtype && HWLOC_OBJ_CACHE != target) { + if (OPAL_HWLOC_LOGICAL == rtype +#if HWLOC_API_VERSION < 0x20000 + && HWLOC_OBJ_CACHE != target +#endif + ) { return hwloc_get_obj_by_type(topo, target, instance); } @@ -1095,7 +1127,6 @@ static int socket_to_cpu_set(char *cpus, int lower_range, upper_range; int socket_id; hwloc_obj_t obj; - hwloc_bitmap_t res; if ('*' == cpus[0]) { /* requesting cpumask for ALL sockets */ @@ -1103,8 +1134,7 @@ static int socket_to_cpu_set(char *cpus, /* set to all available processors - essentially, * this specification equates to unbound */ - res = opal_hwloc_base_get_available_cpus(topo, obj); - hwloc_bitmap_or(cpumask, cpumask, res); + hwloc_bitmap_or(cpumask, cpumask, obj->cpuset); return OPAL_SUCCESS; } @@ -1115,8 +1145,7 @@ static int socket_to_cpu_set(char *cpus, socket_id = atoi(range[0]); obj = opal_hwloc_base_get_obj_by_type(topo, HWLOC_OBJ_SOCKET, 0, socket_id, rtype); /* get the available cpus for this socket */ - res = opal_hwloc_base_get_available_cpus(topo, obj); - hwloc_bitmap_or(cpumask, cpumask, res); + hwloc_bitmap_or(cpumask, cpumask, obj->cpuset); break; case 2: /* range of sockets was given */ @@ -1125,10 +1154,8 @@ static int socket_to_cpu_set(char *cpus, /* cycle across the range of sockets */ for (socket_id=lower_range; socket_id<=upper_range; socket_id++) { obj = opal_hwloc_base_get_obj_by_type(topo, HWLOC_OBJ_SOCKET, 0, socket_id, rtype); - /* get the available cpus for this socket */ - res = opal_hwloc_base_get_available_cpus(topo, obj); - /* set the corresponding bits in the bitmask */ - hwloc_bitmap_or(cpumask, 
cpumask, res); + /* set the bits for this socket's available cpus in the bitmask */ + hwloc_bitmap_or(cpumask, cpumask, obj->cpuset); } break; default: @@ -1152,7 +1179,6 @@ static int socket_core_to_cpu_set(char *socket_core_list, int lower_range, upper_range; int socket_id, core_id; hwloc_obj_t socket, core; - hwloc_cpuset_t res; unsigned int idx; hwloc_obj_type_t obj_type = HWLOC_OBJ_CORE; @@ -1182,9 +1208,8 @@ static int socket_core_to_cpu_set(char *socket_core_list, corestr = socket_core[i]; } if ('*' == corestr[0]) { - /* set to all available cpus on this socket */ - res = opal_hwloc_base_get_available_cpus(topo, socket); - hwloc_bitmap_or(cpumask, cpumask, res); + /* set to all cpus on this socket */ + hwloc_bitmap_or(cpumask, cpumask, socket->cpuset); /* we are done - already assigned all cores! */ rc = OPAL_SUCCESS; break; @@ -1208,8 +1233,7 @@ static int socket_core_to_cpu_set(char *socket_core_list, return OPAL_ERR_NOT_FOUND; } /* get the cpus */ - res = opal_hwloc_base_get_available_cpus(topo, core); - hwloc_bitmap_or(cpumask, cpumask, res); + hwloc_bitmap_or(cpumask, cpumask, core->cpuset); } opal_argv_free(list); break; @@ -1230,10 +1254,8 @@ static int socket_core_to_cpu_set(char *socket_core_list, opal_argv_free(socket_core); return OPAL_ERR_NOT_FOUND; } - /* get the cpus */ - res = opal_hwloc_base_get_available_cpus(topo, core); - /* add them into the result */ - hwloc_bitmap_or(cpumask, cpumask, res); + /* get the cpus and add them into the result */ + hwloc_bitmap_or(cpumask, cpumask, core->cpuset); } break; @@ -1258,7 +1280,6 @@ int opal_hwloc_base_cpu_list_parse(const char *slot_str, char **item, **rngs; int rc, i, j, k; hwloc_obj_t pu; - hwloc_cpuset_t pucpus; char **range, **list; size_t range_cnt; int core_id, lower_range, upper_range; @@ -1352,10 +1373,8 @@ int opal_hwloc_base_cpu_list_parse(const char *slot_str, opal_argv_free(list); return OPAL_ERR_SILENT; } - /* get the available cpus for that object */ - pucpus = 
opal_hwloc_base_get_available_cpus(topo, pu); - /* set that in the mask */ - hwloc_bitmap_or(cpumask, cpumask, pucpus); + /* get the cpus for that object and set them in the mask */ + hwloc_bitmap_or(cpumask, cpumask, pu->cpuset); } opal_argv_free(list); break; @@ -1371,10 +1390,8 @@ int opal_hwloc_base_cpu_list_parse(const char *slot_str, opal_argv_free(rngs); return OPAL_ERR_SILENT; } - /* get the available cpus for that object */ - pucpus = opal_hwloc_base_get_available_cpus(topo, pu); - /* set that in the mask */ - hwloc_bitmap_or(cpumask, cpumask, pucpus); + /* get the cpus for that object and set them in the mask */ + hwloc_bitmap_or(cpumask, cpumask, pu->cpuset); } break; @@ -1399,7 +1416,6 @@ opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t top opal_hwloc_locality_t locality; hwloc_obj_t obj; unsigned depth, d, width, w; - hwloc_cpuset_t avail; bool shared; hwloc_obj_type_t type; int sect1, sect2; @@ -1433,7 +1449,13 @@ opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t top /* if it isn't one of interest, then ignore it */ if (HWLOC_OBJ_NODE != type && HWLOC_OBJ_SOCKET != type && +#if HWLOC_API_VERSION < 0x20000 HWLOC_OBJ_CACHE != type && +#else + HWLOC_OBJ_L3CACHE != type && + HWLOC_OBJ_L2CACHE != type && + HWLOC_OBJ_L1CACHE != type && +#endif HWLOC_OBJ_CORE != type && HWLOC_OBJ_PU != type) { continue; @@ -1447,11 +1469,9 @@ opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t top for (w=0; w < width; w++) { /* get the object at this depth/index */ obj = hwloc_get_obj_by_depth(topo, d, w); - /* get the available cpuset for this obj */ - avail = opal_hwloc_base_get_available_cpus(topo, obj); - /* see if our locations intersect with it */ - sect1 = hwloc_bitmap_intersects(avail, loc1); - sect2 = hwloc_bitmap_intersects(avail, loc2); + /* see if our locations intersect with the cpuset for this obj */ + sect1 = hwloc_bitmap_intersects(obj->cpuset, loc1); + sect2 = 
hwloc_bitmap_intersects(obj->cpuset, loc2); /* if both intersect, then we share this level */ if (sect1 && sect2) { shared = true; @@ -1462,6 +1482,7 @@ opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t top case HWLOC_OBJ_SOCKET: locality |= OPAL_PROC_ON_SOCKET; break; +#if HWLOC_API_VERSION < 0x20000 case HWLOC_OBJ_CACHE: if (3 == obj->attr->cache.depth) { locality |= OPAL_PROC_ON_L3CACHE; @@ -1471,6 +1492,17 @@ opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t top locality |= OPAL_PROC_ON_L1CACHE; } break; +#else + case HWLOC_OBJ_L3CACHE: + locality |= OPAL_PROC_ON_L3CACHE; + break; + case HWLOC_OBJ_L2CACHE: + locality |= OPAL_PROC_ON_L2CACHE; + break; + case HWLOC_OBJ_L1CACHE: + locality |= OPAL_PROC_ON_L1CACHE; + break; +#endif case HWLOC_OBJ_CORE: locality |= OPAL_PROC_ON_CORE; break; @@ -1867,9 +1899,7 @@ int opal_hwloc_base_cset2str(char *str, int len, /* if the cpuset includes all available cpus, then we are unbound */ root = hwloc_get_root_obj(topo); - if (NULL == root->userdata) { - opal_hwloc_base_filter_cpus(topo); - } else { + if (NULL != root->userdata) { sum = (opal_hwloc_topo_data_t*)root->userdata; if (NULL == sum->available) { return OPAL_ERROR; @@ -1937,9 +1967,7 @@ int opal_hwloc_base_cset2mapstr(char *str, int len, /* if the cpuset includes all available cpus, then we are unbound */ root = hwloc_get_root_obj(topo); - if (NULL == root->userdata) { - opal_hwloc_base_filter_cpus(topo); - } else { + if (NULL != root->userdata) { sum = (opal_hwloc_topo_data_t*)root->userdata; if (NULL == sum->available) { return OPAL_ERROR; @@ -2009,14 +2037,18 @@ static int dist_cmp_fn (opal_list_item_t **a, opal_list_item_t **b) static void sort_by_dist(hwloc_topology_t topo, char* device_name, opal_list_t *sorted_list) { hwloc_obj_t device_obj = NULL; - hwloc_obj_t obj = NULL, root = NULL; - const struct hwloc_distances_s* distances; + hwloc_obj_t obj = NULL; + struct hwloc_distances_s* distances; 
opal_rmaps_numa_node_t *numa_node; int close_node_index; float latency; unsigned int j; +#if HWLOC_API_VERSION < 0x20000 + hwloc_obj_t root = NULL; int depth; unsigned i; +#endif + unsigned distances_nr = 0; for (device_obj = hwloc_get_obj_by_type(topo, HWLOC_OBJ_OS_DEVICE, 0); device_obj; device_obj = hwloc_get_next_osdev(topo, device_obj)) { if (device_obj->attr->osdev.type == HWLOC_OBJ_OSDEV_OPENFABRICS @@ -2037,6 +2069,7 @@ static void sort_by_dist(hwloc_topology_t topo, char* device_name, opal_list_t * } /* find distance matrix for all numa nodes */ +#if HWLOC_API_VERSION < 0x20000 distances = hwloc_get_whole_distance_matrix_by_type(topo, HWLOC_OBJ_NODE); if (NULL == distances) { /* we can try to find distances under group object. This info can be there. */ @@ -2073,6 +2106,24 @@ static void sort_by_dist(hwloc_topology_t topo, char* device_name, opal_list_t * numa_node->dist_from_closed = latency; opal_list_append(sorted_list, &numa_node->super); } +#else + distances_nr = 1; + if (0 != hwloc_distances_get_by_type(topo, HWLOC_OBJ_NODE, &distances_nr, &distances, + HWLOC_DISTANCES_KIND_MEANS_LATENCY, 0) || 0 == distances_nr) { + opal_output_verbose(5, opal_hwloc_base_framework.framework_output, + "hwloc:base:get_sorted_numa_list: There is no information about distances on the node."); + return; + } + /* fill list of numa nodes */ + for (j = 0; j < distances->nbobjs; j++) { + latency = distances->values[close_node_index + distances->nbobjs * j]; + numa_node = OBJ_NEW(opal_rmaps_numa_node_t); + numa_node->index = j; + numa_node->dist_from_closed = latency; + opal_list_append(sorted_list, &numa_node->super); + } + hwloc_distances_release(topo, distances); +#endif /* sort numa nodes by distance from the closest one to PCI */ opal_list_sort(sorted_list, dist_cmp_fn); return; @@ -2098,7 +2149,6 @@ static int find_devices(hwloc_topology_t topo, char** device_name) int opal_hwloc_get_sorted_numa_list(hwloc_topology_t topo, char* device_name, opal_list_t *sorted_list) { 
hwloc_obj_t obj; - opal_list_item_t *item; opal_hwloc_summary_t *sum; opal_hwloc_topo_data_t *data; opal_rmaps_numa_node_t *numa, *copy_numa; @@ -2110,10 +2160,7 @@ int opal_hwloc_get_sorted_numa_list(hwloc_topology_t topo, char* device_name, op /* we call opal_hwloc_base_get_nbobjs_by_type() before it to fill summary object so it should exist*/ data = (opal_hwloc_topo_data_t*)obj->userdata; if (NULL != data) { - for (item = opal_list_get_first(&data->summaries); - item != opal_list_get_end(&data->summaries); - item = opal_list_get_next(item)) { - sum = (opal_hwloc_summary_t*)item; + OPAL_LIST_FOREACH(sum, &data->summaries, opal_hwloc_summary_t) { if (HWLOC_OBJ_NODE == sum->type) { if (opal_list_get_size(&sum->sorted_by_dist_list) > 0) { OPAL_LIST_FOREACH(numa, &(sum->sorted_by_dist_list), opal_rmaps_numa_node_t) { @@ -2163,15 +2210,15 @@ int opal_hwloc_get_sorted_numa_list(hwloc_topology_t topo, char* device_name, op char* opal_hwloc_base_get_topo_signature(hwloc_topology_t topo) { int nnuma, nsocket, nl3, nl2, nl1, ncore, nhwt; - char *sig=NULL, *arch=NULL; + char *sig=NULL, *arch = NULL, *endian; hwloc_obj_t obj; unsigned i; nnuma = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE, 0, OPAL_HWLOC_AVAILABLE); nsocket = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET, 0, OPAL_HWLOC_AVAILABLE); - nl3 = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_CACHE, 3, OPAL_HWLOC_AVAILABLE); - nl2 = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_CACHE, 2, OPAL_HWLOC_AVAILABLE); - nl1 = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_CACHE, 1, OPAL_HWLOC_AVAILABLE); + nl3 = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_L3CACHE, 3, OPAL_HWLOC_AVAILABLE); + nl2 = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_L2CACHE, 2, OPAL_HWLOC_AVAILABLE); + nl1 = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_L1CACHE, 1, OPAL_HWLOC_AVAILABLE); ncore = opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE, 0, OPAL_HWLOC_AVAILABLE); nhwt = 
opal_hwloc_base_get_nbobjs_by_type(topo, HWLOC_OBJ_PU, 0, OPAL_HWLOC_AVAILABLE); @@ -2183,14 +2230,22 @@ char* opal_hwloc_base_get_topo_signature(hwloc_topology_t topo) break; } } - if (NULL == arch) { - asprintf(&sig, "%dN:%dS:%dL3:%dL2:%dL1:%dC:%dH", - nnuma, nsocket, nl3, nl2, nl1, ncore, nhwt); - } else { - asprintf(&sig, "%dN:%dS:%dL3:%dL2:%dL1:%dC:%dH:%s", - nnuma, nsocket, nl3, nl2, nl1, ncore, nhwt, arch); + arch = "unknown"; } + +#ifdef __BYTE_ORDER +#if __BYTE_ORDER == __LITTLE_ENDIAN + endian = "le"; +#else + endian = "be"; +#endif +#else + endian = "unknown"; +#endif + + asprintf(&sig, "%dN:%dS:%dL3:%dL2:%dL1:%dC:%dH:%s:%s", + nnuma, nsocket, nl3, nl2, nl1, ncore, nhwt, arch, endian); return sig; } @@ -2200,7 +2255,7 @@ char* opal_hwloc_base_get_locality_string(hwloc_topology_t topo, hwloc_obj_t obj; char *locality=NULL, *tmp, *t2; unsigned depth, d, width, w; - hwloc_cpuset_t cpuset, avail, result; + hwloc_cpuset_t cpuset, result; hwloc_obj_type_t type; /* if this proc is not bound, then there is no locality. 
We @@ -2230,7 +2285,13 @@ char* opal_hwloc_base_get_locality_string(hwloc_topology_t topo, /* if it isn't one of interest, then ignore it */ if (HWLOC_OBJ_NODE != type && HWLOC_OBJ_SOCKET != type && +#if HWLOC_API_VERSION < 0x20000 HWLOC_OBJ_CACHE != type && +#else + HWLOC_OBJ_L1CACHE != type && + HWLOC_OBJ_L2CACHE != type && + HWLOC_OBJ_L3CACHE != type && +#endif HWLOC_OBJ_CORE != type && HWLOC_OBJ_PU != type) { continue; @@ -2248,10 +2309,8 @@ char* opal_hwloc_base_get_locality_string(hwloc_topology_t topo, for (w=0; w < width; w++) { /* get the object at this depth/index */ obj = hwloc_get_obj_by_depth(topo, d, w); - /* get the available cpuset for this obj */ - avail = opal_hwloc_base_get_available_cpus(topo, obj); /* see if the location intersects with it */ - if (hwloc_bitmap_intersects(avail, cpuset)) { + if (hwloc_bitmap_intersects(obj->cpuset, cpuset)) { hwloc_bitmap_set(result, w); } } @@ -2274,6 +2333,7 @@ char* opal_hwloc_base_get_locality_string(hwloc_topology_t topo, } locality = t2; break; +#if HWLOC_API_VERSION < 0x20000 case HWLOC_OBJ_CACHE: if (3 == obj->attr->cache.depth) { asprintf(&t2, "%sL3%s:", (NULL == locality) ? "" : locality, tmp); @@ -2298,6 +2358,29 @@ char* opal_hwloc_base_get_locality_string(hwloc_topology_t topo, break; } break; +#else + case HWLOC_OBJ_L3CACHE: + asprintf(&t2, "%sL3%s:", (NULL == locality) ? "" : locality, tmp); + if (NULL != locality) { + free(locality); + } + locality = t2; + break; + case HWLOC_OBJ_L2CACHE: + asprintf(&t2, "%sL2%s:", (NULL == locality) ? "" : locality, tmp); + if (NULL != locality) { + free(locality); + } + locality = t2; + break; + case HWLOC_OBJ_L1CACHE: + asprintf(&t2, "%sL1%s:", (NULL == locality) ? "" : locality, tmp); + if (NULL != locality) { + free(locality); + } + locality = t2; + break; +#endif case HWLOC_OBJ_CORE: asprintf(&t2, "%sCR%s:", (NULL == locality) ? 
"" : locality, tmp); if (NULL != locality) { @@ -2348,6 +2431,7 @@ char* opal_hwloc_base_get_location(char *locality, case HWLOC_OBJ_SOCKET: srch = "SK"; break; +#if HWLOC_API_VERSION < 0x20000 case HWLOC_OBJ_CACHE: if (3 == index) { srch = "L3"; @@ -2357,6 +2441,17 @@ char* opal_hwloc_base_get_location(char *locality, srch = "L0"; } break; +#else + case HWLOC_OBJ_L3CACHE: + srch = "L3"; + break; + case HWLOC_OBJ_L2CACHE: + srch = "L2"; + break; + case HWLOC_OBJ_L1CACHE: + srch = "L0"; + break; +#endif case HWLOC_OBJ_CORE: srch = "CR"; break; @@ -2442,3 +2537,23 @@ opal_hwloc_locality_t opal_hwloc_compute_relative_locality(char *loc1, char *loc hwloc_bitmap_free(bit2); return locality; } + +int opal_hwloc_base_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlpath, int *buflen) { +#if HWLOC_API_VERSION < 0x20000 + return hwloc_topology_export_xmlbuffer(topology, xmlpath, buflen); +#else + return hwloc_topology_export_xmlbuffer(topology, xmlpath, buflen, 0); +#endif +} + +int opal_hwloc_base_topology_set_flags (hwloc_topology_t topology, unsigned long flags, bool io) { + if (io) { +#if HWLOC_API_VERSION < 0x20000 + flags |= HWLOC_TOPOLOGY_FLAG_IO_DEVICES; +#else + int ret = hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_IMPORTANT); + if (0 != ret) return ret; +#endif + } + return hwloc_topology_set_flags(topology, flags); +} diff --git a/opal/mca/hwloc/external/configure.m4 b/opal/mca/hwloc/external/configure.m4 index 032eebce59a..411d8ad1c1f 100644 --- a/opal/mca/hwloc/external/configure.m4 +++ b/opal/mca/hwloc/external/configure.m4 @@ -103,7 +103,8 @@ AC_DEFUN([MCA_opal_hwloc_external_CONFIG],[ AS_IF([test "$with_hwloc" = "external"], [opal_hwloc_external_want=yes]) AS_IF([test "$with_hwloc" != "" && \ test "$with_hwloc" != "no" && \ - test "$with_hwloc" != "internal"], [opal_hwloc_external_want=yes]) + test "$with_hwloc" != "internal" && \ + test "$with_hwloc" != "future"], [opal_hwloc_external_want=yes]) AS_IF([test 
"$with_hwloc" = "no"], [opal_hwloc_external_want=no]) # If we still want external support, try it @@ -183,21 +184,7 @@ AC_DEFUN([MCA_opal_hwloc_external_CONFIG],[ [AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no]) AC_MSG_ERROR([Cannot continue])]) - AC_MSG_CHECKING([if external hwloc version is lower than 2.0]) - AS_IF([test "$opal_hwloc_dir" != ""], - [opal_hwloc_external_CFLAGS_save=$CFLAGS - CFLAGS="-I$opal_hwloc_dir/include $opal_hwloc_external_CFLAGS_save"]) - AC_COMPILE_IFELSE( - [AC_LANG_PROGRAM([[#include ]], - [[ -#if HWLOC_API_VERSION >= 0x00020000 -#error "hwloc API version is greater or equal than 0x00020000" -#endif - ]])], - [AC_MSG_RESULT([yes])], - [AC_MSG_RESULT([no]) - AC_MSG_ERROR([OMPI does not currently support hwloc v2 API -Cannot continue])]) + AS_IF([test "$opal_hwloc_dir" != ""], [CFLAGS=$opal_hwloc_external_CFLAGS_save]) diff --git a/opal/mca/hwloc/external/external.h b/opal/mca/hwloc/external/external.h index 0b04d3cf33b..6558a0bcbd1 100644 --- a/opal/mca/hwloc/external/external.h +++ b/opal/mca/hwloc/external/external.h @@ -3,7 +3,7 @@ * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * - * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -43,6 +43,11 @@ BEGIN_C_DECLS # endif #endif +#if HWLOC_API_VERSION < 0x00010b00 +#define HWLOC_OBJ_NUMANODE HWLOC_OBJ_NODE +#define HWLOC_OBJ_PACKAGE HWLOC_OBJ_SOCKET +#endif + END_C_DECLS #endif /* MCA_OPAL_HWLOC_EXTERNAL_H */ diff --git a/opal/mca/hwloc/hwloc1113/Makefile.am b/opal/mca/hwloc/hwloc1113/Makefile.am deleted file mode 100644 index 78c39895e24..00000000000 --- a/opal/mca/hwloc/hwloc1113/Makefile.am +++ /dev/null @@ -1,86 +0,0 @@ -# -# Copyright (c) 2011-2016 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2014-2015 Intel, Inc. All right reserved. -# Copyright (c) 2016 Los Alamos National Security, LLC. 
All rights -# reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# Due to what might be a bug in Automake, we need to remove stamp-h? -# files manually. See -# http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418. -DISTCLEANFILES = \ - hwloc/include/hwloc/autogen/stamp-h? \ - hwloc/include/private/autogen/stamp-h? - -# Need to include these files so that these directories are carried in -# the tarball (in case someone invokes autogen.sh on a dist tarball). -EXTRA_DIST = \ - hwloc/doc/README.txt \ - hwloc/contrib/systemd/README.txt \ - hwloc/tests/README.txt \ - hwloc/utils/README.txt - -SUBDIRS = hwloc - -# Headers and sources -headers = hwloc1113.h -sources = hwloc1113_component.c - -# We only ever build this component statically -noinst_LTLIBRARIES = libmca_hwloc_hwloc1113.la -libmca_hwloc_hwloc1113_la_SOURCES = $(headers) $(sources) -nodist_libmca_hwloc_hwloc1113_la_SOURCES = $(nodist_headers) -libmca_hwloc_hwloc1113_la_LDFLAGS = -module -avoid-version $(opal_hwloc_hwloc1113_LDFLAGS) -libmca_hwloc_hwloc1113_la_LIBADD = $(opal_hwloc_hwloc1113_LIBS) -libmca_hwloc_hwloc1113_la_DEPENDENCIES = \ - $(HWLOC_top_builddir)/src/libhwloc_embedded.la - -# Since the rest of the code base includes the underlying hwloc.h, we -# also have to install the underlying header files when -# --with-devel-headers is specified. hwloc doesn't support this; the -# least gross way to make this happen is just to list all of hwloc's -# header files here. 
:-( -headers += \ - hwloc/include/hwloc.h \ - hwloc/include/hwloc/bitmap.h \ - hwloc/include/hwloc/cuda.h \ - hwloc/include/hwloc/cudart.h \ - hwloc/include/hwloc/deprecated.h \ - hwloc/include/hwloc/diff.h \ - hwloc/include/hwloc/gl.h \ - hwloc/include/hwloc/helper.h \ - hwloc/include/hwloc/inlines.h \ - hwloc/include/hwloc/intel-mic.h \ - hwloc/include/hwloc/myriexpress.h \ - hwloc/include/hwloc/nvml.h \ - hwloc/include/hwloc/opencl.h \ - hwloc/include/hwloc/openfabrics-verbs.h \ - hwloc/include/hwloc/plugins.h \ - hwloc/include/hwloc/rename.h \ - hwloc/include/private/private.h \ - hwloc/include/private/debug.h \ - hwloc/include/private/misc.h \ - hwloc/include/private/cpuid-x86.h -nodist_headers = hwloc/include/hwloc/autogen/config.h - -if HWLOC_HAVE_LINUX -headers += \ - hwloc/include/hwloc/linux.h \ - hwloc/include/hwloc/linux-libnuma.h -endif HWLOC_HAVE_LINUX - -if HWLOC_HAVE_SCHED_SETAFFINITY -headers += hwloc/include/hwloc/glibc-sched.h -endif HWLOC_HAVE_SCHED_SETAFFINITY - -# Conditionally install the header files -if WANT_INSTALL_HEADERS -opaldir = $(opalincludedir)/$(subdir) -nobase_opal_HEADERS = $(headers) -nobase_nodist_opal_HEADERS = $(nodist_headers) -endif diff --git a/opal/mca/hwloc/hwloc1113/README-ompi.txt b/opal/mca/hwloc/hwloc1113/README-ompi.txt deleted file mode 100644 index 1679af491ef..00000000000 --- a/opal/mca/hwloc/hwloc1113/README-ompi.txt +++ /dev/null @@ -1,6 +0,0 @@ -Cherry-picked commits after 1.11.3: - -open-mpi/hwloc@9549fd59af04dca2e2340e17f0e685f8c552d818 -open-mpi/hwloc@0ab7af5e90fc2b58be30b2126cc2a73f9f7ecfe9 -open-mpi/hwloc@8b44fb1c812d01582887548c2fc28ee78255619 -open-mpi/hwloc@d4565c351e5f01e27d3e106e3a4c2f971a37c9dd diff --git a/opal/mca/hwloc/hwloc1113/configure.m4 b/opal/mca/hwloc/hwloc1113/configure.m4 deleted file mode 100644 index 95d68607ec1..00000000000 --- a/opal/mca/hwloc/hwloc1113/configure.m4 +++ /dev/null @@ -1,189 +0,0 @@ -# -*- shell-script -*- -# -# Copyright (c) 2009-2017 Cisco Systems, Inc. 
All rights reserved -# Copyright (c) 2014-2015 Intel, Inc. All rights reserved. -# Copyright (c) 2015-2017 Research Organization for Information Science -# and Technology (RIST). All rights reserved. -# Copyright (c) 2016 Los Alamos National Security, LLC. All rights -# reserved. -# -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -# -# Priority -# -AC_DEFUN([MCA_opal_hwloc_hwloc1113_PRIORITY], [90]) - -# -# Force this component to compile in static-only mode -# -AC_DEFUN([MCA_opal_hwloc_hwloc1113_COMPILE_MODE], [ - AC_MSG_CHECKING([for MCA component $2:$3 compile mode]) - $4="static" - AC_MSG_RESULT([$$4]) -]) - -# Include hwloc m4 files -m4_include(opal/mca/hwloc/hwloc1113/hwloc/config/hwloc.m4) -m4_include(opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_pkg.m4) -m4_include(opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_attributes.m4) -m4_include(opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_visibility.m4) -m4_include(opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_vendor.m4) -m4_include(opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_components.m4) - -# MCA_hwloc_hwloc1113_POST_CONFIG() -# --------------------------------- -AC_DEFUN([MCA_opal_hwloc_hwloc1113_POST_CONFIG],[ - OPAL_VAR_SCOPE_PUSH([opal_hwloc_hwloc1113_basedir]) - - # If we won, then do all the rest of the setup - AS_IF([test "$1" = "1" && test "$opal_hwloc_hwloc1113_support" = "yes"], - [ - # Set this variable so that the framework m4 knows what - # file to include in opal/mca/hwloc/hwloc-internal.h - opal_hwloc_hwloc1113_basedir=opal/mca/hwloc/hwloc1113 - opal_hwloc_base_include="$opal_hwloc_hwloc1113_basedir/hwloc1113.h" - - # Add some stuff to CPPFLAGS so that the rest of the source - # tree can be built - file=$opal_hwloc_hwloc1113_basedir/hwloc - CPPFLAGS="-I$OPAL_TOP_SRCDIR/$file/include $CPPFLAGS" - AS_IF([test "$OPAL_TOP_BUILDDIR" != "$OPAL_TOP_SRCDIR"], - [CPPFLAGS="-I$OPAL_TOP_BUILDDIR/$file/include $CPPFLAGS"]) - unset file - ]) - OPAL_VAR_SCOPE_POP - - # This 
must be run unconditionally - HWLOC_DO_AM_CONDITIONALS -])dnl - - -# MCA_hwloc_hwloc1113_CONFIG([action-if-found], [action-if-not-found]) -# -------------------------------------------------------------------- -AC_DEFUN([MCA_opal_hwloc_hwloc1113_CONFIG],[ - # Hwloc needs to know if we have Verbs support - AC_REQUIRE([OPAL_CHECK_VERBS_DIR]) - - AC_CONFIG_FILES([opal/mca/hwloc/hwloc1113/Makefile]) - - OPAL_VAR_SCOPE_PUSH([HWLOC_VERSION opal_hwloc_hwloc1113_save_CPPFLAGS opal_hwloc_hwloc1113_save_LDFLAGS opal_hwloc_hwloc1113_save_LIBS opal_hwloc_hwloc1113_save_cairo opal_hwloc_hwloc1113_save_xml opal_hwloc_hwloc1113_basedir opal_hwloc_hwloc1113_file opal_hwloc_hwloc1113_save_cflags CPPFLAGS_save LIBS_save opal_hwloc_external]) - - # default to this component not providing support - opal_hwloc_hwloc1113_basedir=opal/mca/hwloc/hwloc1113 - opal_hwloc_hwloc1113_support=no - - AS_IF([test "$with_hwloc" = "internal" || test -z "$with_hwloc" || test "$with_hwloc" = "yes"], - [opal_hwloc_external="no"], - [opal_hwloc_external="yes"]) - - opal_hwloc_hwloc1113_save_CPPFLAGS=$CPPFLAGS - opal_hwloc_hwloc1113_save_LDFLAGS=$LDFLAGS - opal_hwloc_hwloc1113_save_LIBS=$LIBS - - # Run the hwloc configuration - if no external hwloc, then set the prefixi - # to minimize the chance that someone will use the internal symbols - AS_IF([test "$opal_hwloc_external" = "no"], - [HWLOC_SET_SYMBOL_PREFIX([opal_hwloc1113_])]) - - # save XML or graphical options - opal_hwloc_hwloc1113_save_cairo=$enable_cairo - opal_hwloc_hwloc1113_save_xml=$enable_xml - opal_hwloc_hwloc1113_save_static=$enable_static - opal_hwloc_hwloc1113_save_shared=$enable_shared - opal_hwloc_hwloc1113_save_plugins=$enable_plugins - - # never enable hwloc's graphical option - enable_cairo=no - - # never enable hwloc's plugin system - enable_plugins=no - enable_static=yes - enable_shared=no - - # Override -- disable hwloc's libxml2 support, but enable the - # native hwloc XML support - enable_libxml2=no - enable_xml=yes - - # 
hwloc checks for compiler visibility, and its needs to do - # this without "picky" flags. - opal_hwloc_hwloc1113_save_cflags=$CFLAGS - CFLAGS=$OPAL_CFLAGS_BEFORE_PICKY - HWLOC_SETUP_CORE([opal/mca/hwloc/hwloc1113/hwloc], - [AC_MSG_CHECKING([whether hwloc configure succeeded]) - AC_MSG_RESULT([yes]) - HWLOC_VERSION="internal v`$srcdir/$opal_hwloc_hwloc1113_basedir/hwloc/config/hwloc_get_version.sh $srcdir/$opal_hwloc_hwloc1113_basedir/hwloc/VERSION`" - - # Build flags for our Makefile.am - opal_hwloc_hwloc1113_LDFLAGS='$(HWLOC_EMBEDDED_LDFLAGS)' - opal_hwloc_hwloc1113_LIBS='$(OPAL_TOP_BUILDDIR)/'"$opal_hwloc_hwloc1113_basedir"'/hwloc/src/libhwloc_embedded.la $(HWLOC_EMBEDDED_LIBS)' - opal_hwloc_hwloc1113_support=yes - - AC_DEFINE_UNQUOTED([HWLOC_HWLOC1113_HWLOC_VERSION], - ["$HWLOC_VERSION"], - [Version of hwloc]) - - # Do we have verbs support? - CPPFLAGS_save=$CPPFLAGS - AS_IF([test "$opal_want_verbs" = "yes"], - [CPPFLAGS="-I$opal_verbs_dir/include $CPPFLAGS"]) - AC_CHECK_HEADERS([infiniband/verbs.h]) - CPPFLAGS=$CPPFLAGS_save - ], - [AC_MSG_CHECKING([whether hwloc configure succeeded]) - AC_MSG_RESULT([no]) - opal_hwloc_hwloc1113_support=no]) - CFLAGS=$opal_hwloc_hwloc1113_save_cflags - - # Restore some env variables, if necessary - AS_IF([test -n "$opal_hwloc_hwloc1113_save_cairo"], - [enable_cairo=$opal_hwloc_hwloc1113_save_cairo]) - AS_IF([test -n "$opal_hwloc_hwloc1113_save_xml"], - [enable_xml=$opal_hwloc_hwloc1113_save_xml]) - AS_IF([test -n "$opal_hwloc_hwloc1113_save_static"], - [enable_static=$opal_hwloc_hwloc1113_save_static]) - AS_IF([test -n "$opal_hwloc_hwloc1113_save_shared"], - [enable_shared=$opal_hwloc_hwloc1113_save_shared]) - AS_IF([test -n "$opal_hwloc_hwloc1113_save_plugins"], - [enable_plugins=$opal_hwloc_hwloc1113_save_shared]) - - CPPFLAGS=$opal_hwloc_hwloc1113_save_CPPFLAGS - LDFLAGS=$opal_hwloc_hwloc1113_save_LDFLAGS - LIBS=$opal_hwloc_hwloc1113_save_LIBS - - AC_SUBST([opal_hwloc_hwloc1113_CFLAGS]) - 
AC_SUBST([opal_hwloc_hwloc1113_CPPFLAGS]) - AC_SUBST([opal_hwloc_hwloc1113_LDFLAGS]) - AC_SUBST([opal_hwloc_hwloc1113_LIBS]) - - # Finally, add some flags to the wrapper compiler so that our - # headers can be found. - hwloc_hwloc1113_WRAPPER_EXTRA_LDFLAGS="$HWLOC_EMBEDDED_LDFLAGS" - hwloc_hwloc1113_WRAPPER_EXTRA_LIBS="$HWLOC_EMBEDDED_LIBS" - hwloc_hwloc1113_WRAPPER_EXTRA_CPPFLAGS='-I${pkgincludedir}/'"$opal_hwloc_hwloc1113_basedir/hwloc/include" - - # If we are not building the internal hwloc, then indicate that - # this component should not be built. NOTE: we still did all the - # above configury so that all the proper GNU Autotools - # infrastructure is setup properly (e.g., w.r.t. SUBDIRS=hwloc in - # this directory's Makefile.am, we still need the Autotools "make - # distclean" infrastructure to work properly). - AS_IF([test "$opal_hwloc_external" = "yes"], - [AC_MSG_WARN([using an external hwloc; disqualifying this component]) - opal_hwloc_hwloc1113_support=no], - [AC_DEFINE([HAVE_DECL_HWLOC_OBJ_OSDEV_COPROC], [1]) - AC_DEFINE([HAVE_HWLOC_TOPOLOGY_DUP], [1])]) - - # Done! - AS_IF([test "$opal_hwloc_hwloc1113_support" = "yes"], - [$1], - [$2]) - - OPAL_VAR_SCOPE_POP -])dnl diff --git a/opal/mca/hwloc/hwloc1113/hwloc/AUTHORS b/opal/mca/hwloc/hwloc1113/hwloc/AUTHORS deleted file mode 100644 index 0e52215789f..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/AUTHORS +++ /dev/null @@ -1,10 +0,0 @@ -Cédric Augonnet -Guillaume Beauchamp -Jérôme Clet-Ortega -Ludovic Courtès -Nathalie Furmento -Brice Goglin -Alexey Kardashevskiy -Antoine Rougier (University of Bordeaux intern) -Jeff Squyres -Samuel Thibault diff --git a/opal/mca/hwloc/hwloc1113/hwloc/Makefile.am b/opal/mca/hwloc/hwloc1113/hwloc/Makefile.am deleted file mode 100644 index e046a07de86..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/Makefile.am +++ /dev/null @@ -1,73 +0,0 @@ -# Copyright © 2009-2016 Inria. All rights reserved. 
-# Copyright © 2009 Université Bordeaux -# Copyright © 2009-2014 Cisco Systems, Inc. All rights reserved. -# See COPYING in top-level directory. - -# Note that the -I directory must *exactly* match what was specified -# via AC_CONFIG_MACRO_DIR in configure.ac. -ACLOCAL_AMFLAGS = -I ./config - -SUBDIRS = src include -if HWLOC_BUILD_STANDALONE -SUBDIRS += tests utils contrib/systemd -# We need doc/ if HWLOC_BUILD_DOXYGEN, or during make install if HWLOC_INSTALL_DOXYGEN. -# There's no INSTALL_SUBDIRS, so always enter doc/ and check HWLOC_BUILD/INSTALL_DOXYGEN there -SUBDIRS += doc -endif - -# Do not let automake automatically add the non-standalone dirs to the -# distribution tarball if we're building in embedded mode. -DIST_SUBDIRS = $(SUBDIRS) - -# Only install the pkg file if we're building in standalone mode (and not on Windows) -if HWLOC_BUILD_STANDALONE -pkgconfigdir = $(libdir)/pkgconfig -pkgconfig_DATA = hwloc.pc -endif - -# Only install the valgrind suppressions file if we're building in standalone mode -if HWLOC_BUILD_STANDALONE -dist_pkgdata_DATA = contrib/hwloc-valgrind.supp -endif - -# -# "make distcheck" requires that tarballs are able to be able to "make -# dist", so we have to include config/distscript.sh. 
-# -EXTRA_DIST = \ - README VERSION COPYING AUTHORS \ - config/hwloc_get_version.sh \ - config/distscript.sh - -# Only install entire visual studio subdirectory if we're building in standalone mode -if HWLOC_BUILD_STANDALONE -EXTRA_DIST += contrib/windows -endif - -if HWLOC_BUILD_STANDALONE -dist-hook: - sh "$(top_srcdir)/config/distscript.sh" "$(top_srcdir)" "$(distdir)" "$(HWLOC_VERSION)" -endif HWLOC_BUILD_STANDALONE - -# -# Build the documenation and top-level README file -# -if HWLOC_BUILD_STANDALONE -.PHONY: doc readme -doc readme: - $(MAKE) -C doc -endif HWLOC_BUILD_STANDALONE - -if HWLOC_BUILD_STANDALONE -if HWLOC_HAVE_WINDOWS -# -# Winball specific rules -# -install-data-local: - sed -e 's/$$/'$$'\015'/ < $(srcdir)/README > $(DESTDIR)$(prefix)/README.txt - sed -e 's/$$/'$$'\015'/ < $(srcdir)/NEWS > $(DESTDIR)$(prefix)/NEWS.txt - sed -e 's/$$/'$$'\015'/ < $(srcdir)/COPYING > $(DESTDIR)$(prefix)/COPYING.txt -uninstall-local: - rm -f $(DESTDIR)$(prefix)/README.txt $(DESTDIR)$(prefix)/NEWS.txt $(DESTDIR)$(prefix)/COPYING.txt -endif HWLOC_HAVE_WINDOWS -endif HWLOC_BUILD_STANDALONE diff --git a/opal/mca/hwloc/hwloc1113/hwloc/NEWS b/opal/mca/hwloc/hwloc1113/hwloc/NEWS deleted file mode 100644 index ad43c293d25..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/NEWS +++ /dev/null @@ -1,1309 +0,0 @@ -Copyright © 2009 CNRS -Copyright © 2009-2016 Inria. All rights reserved. -Copyright © 2009-2013 Université Bordeaux -Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - -$COPYRIGHT$ - -Additional copyrights may follow - -$HEADER$ - -=========================================================================== - -This file contains the main features as well as overviews of specific -bug fixes (and other actions) for each version of hwloc since version -0.9 (as initially released as "libtopology", then re-branded to "hwloc" -in v0.9.1). - - -Version 1.11.4 --------------- -* Fix Linux build with -m32 with respect to libudev. 
- Thanks to Paul Hargrove for reporting the issue. - - -Version 1.11.3 --------------- -* Bug fixes - + Fix a memory leak on Linux S/390 hosts with books. - + Fix /proc/mounts parsing on Linux by using mntent.h. - Thanks to Nathan Hjelm for reporting the issue. - + Fix a x86 infinite loop on VMware due to the x2APIC feature being - advertised without actually being fully supported. - Thanks to Jianjun Wen for reporting the problem and testing the patch. - + Fix the return value of hwloc_alloc() on mmap() failure. - Thanks to Hugo Brunie for reporting the issue. - + Fix the return value of command-line tools in some error cases. - + Do not break individual thread bindings during x86 backend discovery in a - multithreaded process. Thanks to Farouk Mansouri for the report. - + Fix hwloc-bind --membind for CPU-less NUMA nodes. - + Fix some corner cases in the XML export/import of application userdata. -* API Improvements - + Add HWLOC_MEMBIND_BYNODESET flag so that membind() functions accept - either cpusets or nodesets. - + Add hwloc_get_area_memlocation() to check where pages are actually - allocated. Only implemented on Linux for now. - - There's no _nodeset() variant, but the new flag HWLOC_MEMBIND_BYNODESET - is supported. - + Make hwloc_obj_type_sscanf() parse back everything that may be outputted - by hwloc_obj_type_snprintf(). -* Detection Improvements - + Allow the x86 backend to add missing cache levels, so that it completes - what the Solaris backend lacks. - Thanks to Ryan Zezeski for reporting the issue. - + Do not filter-out FibreChannel PCI adapters by default anymore. - Thanks to Matt Muggeridge for the report. - + Add support for CUDA compute capability 6.x. -* Tools - + Add --support to hwloc-info to list supported features, just like with - hwloc_topology_get_support(). - - Also add --objects and --topology to explicitly switch between the - default modes. - + Add --tid to let hwloc-bind operate on individual threads on Linux. 
- + Add --nodeset to let hwloc-bind report memory binding as NUMA node sets. - + hwloc-annotate and lstopo don't drop application userdata from XMLs anymore. - - Add --cu to hwloc-annotate to drop these application userdata. - + Make the hwloc-dump-hwdata dump directory configurable through configure - options such as --runstatedir or --localstatedir. -* Misc Improvements - + Add systemd service template contrib/systemd/hwloc-dump-hwdata.service - for launching hwloc-dump-hwdata at boot on Linux. - Thanks to Grzegorz Andrejczuk. - + Add HWLOC_PLUGINS_BLACKLIST environment variable to prevent some plugins - from being loaded. Thanks to Alexandre Denis for the suggestion. - + Small improvements for various Windows build systems, - thanks to Jonathan L Peyton and Marco Atzeri. - - -Version 1.11.2 --------------- -* Improve support for Intel Knights Landing Xeon Phi on Linux: - + Group local NUMA nodes of normal memory (DDR) and high-bandwidth memory - (MCDRAM) together through "Cluster" groups so that the local MCDRAM is - easy to find. - - See "How do I find the local MCDRAM NUMA node on Intel Knights - Landing Xeon Phi?" in the documentation. - - For uniformity across all KNL configurations, always have a NUMA node - object even if the host is UMA. - + Fix the detection of the memory-side cache: - - Add the hwloc-dump-hwdata superuser utility to dump SMBIOS information - into /var/run/hwloc/ as root during boot, and load this dumped - information from the hwloc library at runtime. - - See "Why do I need hwloc-dump-hwdata for caches on Intel Knights - Landing Xeon Phi?" in the documentation. - Thanks to Grzegorz Andrejczuk for the patches and for the help. -* The x86 and linux backends may now be combined for discovering CPUs - through x86 CPUID and memory from the Linux kernel. - This is useful for working around buggy CPU information reported by Linux - (for instance the AMD Bulldozer/Piledriver bug below). 
- Combination is enabled by passing HWLOC_COMPONENTS=x86 in the environment. -* Fix L3 cache sharing on AMD Opteron 63xx (Piledriver) and 62xx (Bulldozer) - in the x86 backend. Thanks to many users who helped. -* Fix the overzealous L3 cache sharing fix added to the x86 backend in 1.11.1 - for AMD Opteron 61xx (Magny-Cours) processors. -* The x86 backend may now add the info attribute Inclusive=0 or 1 to caches - it discovers, or to caches discovered by other backends earlier. - Thanks to Guillaume Beauchamp for the patch. -* Fix the management on alloc_membind() allocation failures on AIX, HP-UX - and OSF/Tru64. -* Fix spurious failures to load with ENOMEM on AIX in case of Misc objects - below PUs. -* lstopo improvements in X11 and Windows graphical mode: - + Add + - f 1 shortcuts to manually zoom-in, zoom-out, reset the scale, - or fit the entire window. - + Display all keyboard shortcuts in the console. -* Debug messages may be disabled at runtime by passing HWLOC_DEBUG_VERBOSE=0 - in the environment when --enable-debug was passed to configure. -* Add a FAQ entry "What are these Group objects in my topology?". - - -Version 1.11.1 --------------- -* Detection fixes - + Hardwire the topology of Fujitsu K-computer, FX10, FX100 servers to - workaround buggy Linux kernels. - Thanks to Takahiro Kawashima and Gilles Gouaillardet. - + Fix L3 cache information on AMD Opteron 61xx Magny-Cours processors - in the x86 backend. Thanks to Guillaume Beauchamp for the patch. - + Detect block devices directly attached to PCI without a controller, - for instance NVMe disks. Thanks to Barry M. Tannenbaum. - + Add the PCISlot attribute to all PCI functions instead of only the - first one. -* Miscellaneous internal fixes - + Ignore PCI bridges that could fail assertions by reporting buggy - secondary-subordinate bus numbers - Thanks to George Bosilca for reporting the issue. 
- + Fix an overzealous assertion when inserting an intermediate Group object - while Groups are totally ignored. - + Fix a memory leak on Linux on AMD processors with dual-core compute units. - Thanks to Bob Benner. - + Fix a memory leak on failure to load a xml diff file. - + Fix some segfaults when inputting an invalid synthetic description. - + Fix a segfault when plugins fail to find core symbols. - Thanks to Guy Streeter. -* Many fixes and improvements in the Windows backend: - + Fix the discovery of more than 32 processors and multiple processor - groups. Thanks to Barry M. Tannenbaum for the help. - + Add thread binding set support in case of multiple process groups. - + Add thread binding get support. - + Add get_last_cpu_location() support for the current thread. - + Disable the unsupported process binding in case of multiple processor - groups. - + Fix/update the Visual Studio support under contrib/windows. - Thanks to Eloi Gaudry for the help. -* Tools fixes - + Fix a segfault when displaying logical indexes in the graphical lstopo. - Thanks to Guillaume Mercier for reporting the issue. - + Fix lstopo linking with X11 libraries, for instance on Mac OS X. - Thanks to Scott Atchley and Pierre Ramet for reporting the issue. - + hwloc-annotate, hwloc-diff and hwloc-patch do not drop unavailable - resources from the output anymore and those may be annotated as well. - + Command-line tools may now import XML from the standard input with -i -.xml - + Add missing documentation for the hwloc-info --no-icaches option. - - -Version 1.11.0 --------------- -* API - + Socket objects are renamed into Package to align with the terminology - used by processor vendors. The old HWLOC_OBJ_SOCKET type and "Socket" - name are still supported for backward compatibility. - + HWLOC_OBJ_NODE is replaced with HWLOC_OBJ_NUMANODE for clarification. - HWLOC_OBJ_NODE is still supported for backward compatibility. - "Node" and "NUMANode" strings are supported as in earlier releases. 
-* Detection improvements - + Add support for Intel Knights Landing Xeon Phi. - Thanks to Grzegorz Andrejczuk and Lukasz Anaczkowski. - + Add Vendor, Model, Revision, SerialNumber, Type and LinuxDeviceID - info attributes to Block OS devices on Linux. Thanks to Vineet Pedaballe - for the help. - - Add --disable-libudev to avoid dependency on the libudev library. - + Add "MemoryModule" Misc objects with information about DIMMs, on Linux - when privileged and when I/O is enabled. - Thanks to Vineet Pedaballe for the help. - + Add a PCISlot attribute to PCI devices on Linux when supported to - identify the physical PCI slot where the board is plugged. - + Add CPUStepping info attribute on x86 processors, - thanks to Thomas Röhl for the suggestion. - + Ignore the device-tree on non-Power architectures to avoid buggy - detection on ARM. Thanks to Orion Poplawski for reporting the issue. - + Work around buggy Xeon E5v3 BIOS reporting invalid PCI-NUMA affinity - for the PCI links on the second processor. - + Add support for CUDA compute capability 5.x, thanks to Benjamin Worpitz. - + Many fixes to the x86 backend - - Add L1i and fix L2/L3 type on old AMD processors without topoext support. - - Fix Intel CPU family and model numbers when basic family isn't 6 or 15. - - Fix package IDs on recent AMD processors. - - Fix misc issues due to incomplete APIC IDs on x2APIC processors. - - Avoid buggy discovery on old SGI Altix UVs with non-unique APIC IDs. - + Gather total machine memory on NetBSD. -* Tools - + lstopo - - Collapse identical PCI devices unless --no-collapse is given. - This avoids gigantic outputs when a PCI device contains dozens of - identical virtual functions. - - The ASCII art output is now called "ascii", for instance in - "lstopo -.ascii". - The former "txt" extension is retained for backward compatibility. - - Automatically scale graphical box width to the inner text in Cairo, - ASCII and Windows outputs.
- - Add --rect to lstopo to force rectangular layout even for NUMA nodes. - - Add --restrict-flags to configure the behavior of --restrict. - - Objects may have a "Type" info attribute to specify a better type name - and display it in lstopo. - - Really export all verbose information to the given output file. - + hwloc-annotate - - May now operate on all types of objects, including I/O. - - May now insert Misc objects in the topology. - - Do not drop instruction caches and I/O devices from the output anymore. - + Fix lstopo path in hwloc-gather-topology after install. -* Misc - + Fix hwloc/cudart.h for machines with multiple PCI domains, - thanks to Imre Kerr for reporting the problem. - + Fix PCI Bridge-specific depth attribute. - + Fix hwloc_bitmap_intersect() for two infinite bitmaps. - + Fix some corner cases in the building of levels on large NUMA machines - with non-uniform NUMA groups and I/Os. - + Improve the performance of object insertion by cpuset for large - topologies. - + Prefix verbose XML import errors with the source name. - + Improve pkg-config checks and error messages. - + Fix excluding after a component with an argument in the HWLOC_COMPONENTS - environment variable. -* Documentation - + Fix the recommended way in documentation and examples to allocate memory - on some node, it should use HWLOC_MEMBIND_BIND. - Thanks to Nicolas Bouzat for reporting the issue. - + Add a "Miscellaneous objects" section in the documentation. - + Add a FAQ entry "What happens to my topology if I disable symmetric - multithreading, hyper-threading, etc. ?" to the documentation. - - -Version 1.10.1 --------------- -* Actually remove disallowed NUMA nodes from nodesets when the whole-system - flag isn't enabled. -* Fix the gathering of PCI domains. Thanks to James Custer for reporting - the issue and providing a patch. -* Fix the merging of identical parent and child in presence of Misc objects. - Thanks to Dave Love for reporting the issue. 
-* Fix some misordering of children when merging with ignore_keep_structure() - in partially allowed topologies. -* Fix an overzealous assertion in the debug code when running on a single-PU - host with I/O. Thanks to Thomas Van Doren for reporting the issue. -* Don't forget to set up NUMA node object nodesets in x86 backend (for BSDs) - and OSF/Tru64 backend. -* Fix cpuid-x86 build error with gcc -O3 on x86-32. Thanks to Thomas Van Doren - for reporting the issue. -* Fix support for future very large caches in the x86 backend. -* Fix vendor/device names for SR-IOV PCI devices on Linux. -* Fix an unlikely crash in case of buggy hierarchical distance matrix. -* Fix PU os_index on some AIX releases. Thanks to Hendryk Bockelmann and - Erik Schnetter for helping debug. -* Fix hwloc_bitmap_isincluded() in case of infinite sets. -* Change hwloc-ls.desktop into lstopo.desktop and only install it if - lstopo is built with Cairo/X11 support. It cannot work with a non-graphical - lstopo or hwloc-ls. -* Add support for the renaming of Socket into Package in future releases. -* Add support for the replacement of HWLOC_OBJ_NODE with HWLOC_OBJ_NUMANODE - in future releases. -* Clarify the documentation of distance matrices in hwloc.h and in the manpage - of hwloc-distances. Thanks to Dave Love for the suggestion. -* Improve some error messages by displaying more information about the - hwloc library in use. -* Document how to deal with the ABI break when upgrading to the upcoming 2.0. - See "How do I handle ABI breaks and API upgrades ?" in the FAQ. - - -Version 1.10.0 --------------- -* API - + Add hwloc_topology_export_synthetic() to export a topology to a - synthetic string without using lstopo. See the Synthetic topologies - section in the documentation. - + Add hwloc_topology_set/get_userdata() to let the application save - a private pointer in the topology whenever it needs a way to find - its own object corresponding to a topology.
- + Add hwloc_get_numanode_obj_by_os_index() and document that this function - as well as hwloc_get_pu_obj_by_os_index() are good at converting - nodesets and cpusets into objects. - + hwloc_distrib() does not ignore any objects anymore when there are - too many of them. They get merged with others instead. - Thanks to Tim Creech for reporting the issue. -* Tools - + hwloc-bind --get now executes the command after displaying - the binding instead of ignoring the command entirely. - Thanks to John Donners for the suggestion. - + Clarify that memory sizes shown in lstopo are local by default - unless specified (total memory added in the root object). -* Synthetic topologies - + Synthetic topology descriptions may now specify attributes such as - memory sizes and OS indexes. See the Synthetic topologies section - in the documentation. - + lstopo now exports in this fully-detailed format by default. - The new option --export-synthetic-flags may be used to revert - to the old format. -* Documentation - + Add the doc/examples/ subdirectory with several real-life examples, - including the already existing hwloc-hello.C for basics. - Thanks to Rob Aulwes for the suggestion. - + Improve the documentation of CPU and memory binding in the API. - + Add a FAQ entry about operating system errors, especially on AMD - platforms with buggy cache information. - + Add a FAQ entry about loading many topologies in a single program. -* Misc - + Work around buggy Linux kernels reporting 2 sockets instead of - 1 socket with 2 NUMA nodes for each Xeon E5 v3 (Haswell) processor. - + pciutils/libpci support is now removed since libpciaccess works - well and there's also a Linux-specific PCI backend. For the record, - pciutils was GPL and therefore disabled by default since v1.6.2. - + Add --disable-cpuid configure flag to work around buggy processor - simulators reporting invalid CPUID information. - Thanks to Andrew Friedley for reporting the issue.
- + Fix a racy use of libltdl when manipulating multiple topologies in - different threads. - Thanks to Andra Hugo for reporting the issue and testing patches. - + Fix some build failures in private/misc.h. - Thanks to Pavan Balaji and Ralph Castain for the reports. - + Fix failures to detect X11/Xutil.h on some Solaris platforms. - Thanks to Siegmar Gross for reporting the failure. - + The plugin ABI has changed, this release will not load plugins - built against previous hwloc releases. - - -Version 1.9.1 -------------- -* Fix a crash when the PCI locality is invalid. Attach to the root object - instead. Thanks to Nicolas Denoyelle for reporting the issue. -* Fix -f in lstopo manpage. Thanks to Jirka Hladky for reporting the issue. -* Fix hwloc_obj_type_sscanf() and others when strncasecmp() is not properly - available. Thanks to Nick Papior Andersen for reporting the problem. -* Mark Linux file descriptors as close-on-exec to avoid leaks on exec. -* Fix some minor memory leaks. - - -Version 1.9.0 -------------- -* API - + Add hwloc_obj_type_sscanf() to extend hwloc_obj_type_of_string() with - type-specific attributes such as Cache/Group depth and Cache type. - hwloc_obj_type_of_string() is moved to hwloc/deprecated.h. - + Add hwloc_linux_get_tid_last_cpu_location() for retrieving the - last CPU where a Linux thread given by TID ran. - + Add hwloc_distrib() to extend the old hwloc_distribute[v]() functions. - hwloc_distribute[v]() is moved to hwloc/deprecated.h. - + Don't mix total and local memory when displaying verbose object attributes - with hwloc_obj_attr_snprintf() or in lstopo. -* Backends - + Add CPUVendor, CPUModelNumber and CPUFamilyNumber info attributes for - x86, ia64 and Xeon Phi sockets on Linux, to extend the x86-specific - support added in v1.8.1. Requested by Ralph Castain. - + Add many CPU- and Platform-related info attributes on ARM and POWER - platforms, in the Machine and Socket objects. 
- + Add CUDA info attributes describing the number of multiprocessors and - cores and the size of the global, shared and L2 cache memories in CUDA - OS devices. - + Add OpenCL info attributes describing the number of compute units and - the global memory size in OpenCL OS devices. - + The synthetic backend now accepts extended types such as L2Cache, L1i or - Group3. lstopo also exports synthetic strings using these extended types. -* Tools - + lstopo - - Do not overwrite output files by default anymore. - Pass -f or --force to enforce it. - - Display OpenCL, CUDA and Xeon Phi numbers of cores and memory sizes - in the graphical output. - - Fix export to stdout when specifying a Cairo-based output type - with --of. - + hwloc-ps - - Add -e or --get-last-cpu-location to report where processes/threads - run instead of where they are bound. - - Report locations as likely-more-useful objects such as Cores or Sockets - instead of Caches when possible. - + hwloc-bind - - Fix failure on Windows when not using --pid. - - Add -e as a synonym to --get-last-cpu-location. - + hwloc-distrib - - Add --reverse to distribute using last objects first and singlify - into last bits first. Thanks to Jirka Hladky for the suggestion. - + hwloc-info - - Report unified caches when looking for data or instruction cache - ancestor objects. -* Misc - + Add experimental Visual Studio support under contrib/windows. - Thanks to Eloi Gaudry for his help and for providing the first draft. - + Fix some overzealous assertions and warnings about the ordering of - objects on a level with respect to cpusets. The ordering is only - guaranteed for complete cpusets (based on the first bit in sets). - + Fix some memory leaks when importing xml diffs and when exporting a - "too complex" entry. - - -Version 1.8.1 -------------- -* Fix the cpuid code on Windows 64bits so that the x86 backend gets - enabled as expected and can populate CPU information. - Thanks to Robin Scher for reporting the problem. 
-* Add CPUVendor/CPUModelNumber/CPUFamilyNumber attributes when running - on x86 architecture. Thanks to Ralph Castain for the suggestion. -* Work around buggy BIOS reporting duplicate NUMA nodes on Linux. - Thanks to Jeff Becker for reporting the problem and testing the patch. -* Add a name to the lstopo graphical window. Thanks to Michael Prokop - for reporting the issue. - - -Version 1.8.0 -------------- -* New components - + Add the "linuxpci" component that always works on Linux even when - libpciaccess and libpci aren't available (and even with a modified - file-system root). By default the old "pci" component runs first - because "linuxpci" lacks device names (obj->name is always NULL). -* API - + Add the topology difference API in hwloc/diff.h for manipulating - many similar topologies. - + Add hwloc_topology_dup() for duplicating an entire topology. - + hwloc.h and hwloc/helper.h have been reorganized to clarify the - documentation sections. The actual inline code has moved out of hwloc.h - into the new hwloc/inlines.h. - + Deprecated functions are now in hwloc/deprecated.h, and not in the - official documentation anymore. -* Tools - + Add hwloc-diff and hwloc-patch tools together with the new diff API. - + Add hwloc-compress-dir to (de)compress an entire directory of XML files - using hwloc-diff and hwloc-patch. - + Object colors in the graphical output of lstopo may be changed by adding - a "lstopoStyle" info attribute. See CUSTOM COLORS in the lstopo(1) manpage - for details. Thanks to Jirka Hladky for discussing the idea. - + hwloc-gather-topology may now gather I/O-related files on Linux when - --io is given. Only the linuxpci component supports discovering I/O - objects from these extended tarballs. - + hwloc-annotate now supports --ri to remove/replace info attributes with - a given name. - + hwloc-info supports "root" and "all" special locations for dumping - information about the root object. 
- + lstopo now supports --append-legend to append custom lines of text - to the legend in the graphical output. Thanks to Jirka Hladky for - discussing the idea. - + hwloc-calc and friends have a more robust parsing of locations given - on the command-line and they report useful error messages about it. - + Add --whole-system to hwloc-bind, hwloc-calc, hwloc-distances and - hwloc-distrib, and add --restrict to hwloc-bind for uniformity among - tools. -* Misc - + Calling hwloc_topology_load() or hwloc_topology_set_*() on an already - loaded topology now returns an error (deprecated since release 1.6.1). - + Fix the initialisation of cpusets and nodesets in Group objects added - when inserting PCI hostbridges. - + Never merge Group objects that were added explicitly by the user with - hwloc_custom_insert_group_object_by_parent(). - + Add a sanity check during dynamic plugin loading to prevent some - crashes when hwloc is dynamically loaded by another plugin mechanism. - + Add --with-hwloc-plugins-path to specify the install/load directories - of plugins. - + Add the MICSerialNumber info attribute to the root object when running - hwloc inside a Xeon Phi to match the same attribute in the MIC OS device - when running in the host. - - -Version 1.7.2 -------------- -* Do not create invalid block OS devices on very old Linux kernels such - as RHEL4 2.6.9. -* Fix PCI subvendor/device IDs. -* Fix the management of Misc objects inserted by parent. - Thanks to Jirka Hladky for reporting the problem. -* Add a PortState info attribute to OpenFabrics OS devices. -* Add a MICSerialNumber info attribute to Xeon Phi/MIC OS devices. -* Improve verbose error messages when failing to load from XML. - - -Version 1.7.1 -------------- -* Fix a failed assertion in the distance grouping code when loading an XML - file that already contains some groups. - Thanks to Laercio Lima Pilla for reporting the problem.
-* Remove unexpected Group objects when loading XML topologies with I/O - objects and NUMA distances. - Thanks to Elena Elkina for reporting the problem and testing patches. -* Fix PCI link speed discovery when using libpciaccess. -* Fix invalid libpciaccess virtual function device/vendor IDs when using - SR-IOV PCI devices on Linux. -* Fix GL component build with old NVCtrl releases. - Thanks to Jirka Hladky for reporting the problem. -* Fix embedding breakage caused by libltdl. - Thanks to Pavan Balaji for reporting the problem. -* Always use the system-wide libltdl instead of shipping one inside hwloc. -* Document issues when enabling plugins while embedding hwloc in another - project, in the documentation section Embedding hwloc in Other Software. -* Add a FAQ entry "How to get useful topology information on NetBSD?" - in the documentation. -* Some fixes in the renaming code for embedding. -* Miscellaneous minor build fixes. - - -Version 1.7.0 -------------- -* New operating system backends - + Add BlueGene/Q compute node kernel (CNK) support. See the FAQ in the - documentation for details. Thanks to Jeff Hammond, Christopher Samuel - and Erik Schnetter for their help. - + Add NetBSD support, thanks to Aleksej Saushev. -* New I/O device discovery - + Add co-processor OS devices such as "mic0" for Intel Xeon Phi (MIC) - on Linux. Thanks to Jerome Vienne for helping. - + Add co-processor OS devices such as "cuda0" for NVIDIA CUDA-capable GPUs. - + Add co-processor OS devices such as "opencl0d0" for OpenCL GPU devices - on the AMD OpenCL implementation. - + Add GPU OS devices such as ":0.0" for NVIDIA X11 displays. - + Add GPU OS devices such as "nvml0" for NVIDIA GPUs. - Thanks to Marwan Abdellah and Stefan Eilemann for helping. - These new OS devices have some string info attributes such as CoProcType, - GPUModel, etc. to better identify them. - See the I/O Devices and Attributes documentation sections for details.
-* New components - + Add the "opencl", "cuda", "nvml" and "gl" components for I/O device - discovery. - + "nvml" also improves the discovery of NVIDIA GPU PCIe link speed. - All of these new components may be built as plugins. They may also be - disabled entirely by passing --disable-opencl/cuda/nvml/gl to configure. - See the I/O Devices, Components and Plugins, and FAQ documentation - sections for details. -* API - + Add hwloc_topology_get_flags(). - + Add hwloc/plugins.h for building external plugins. - See the Adding new discovery components and plugins section. -* Interoperability - + Add hwloc/opencl.h, hwloc/nvml.h, hwloc/gl.h and hwloc/intel-mic.h - to retrieve the locality of OS devices that correspond to AMD OpenCL - GPU devices or indexes, to NVML devices or indexes, to NVIDIA X11 - displays, or to Intel Xeon Phi (MIC) device indexes. - + Add new helpers in hwloc/cuda.h and hwloc/cudart.h to convert - between CUDA devices or indexes and hwloc OS devices. - + Add hwloc_ibv_get_device_osdev() and clarify the requirements - of the OpenFabrics Verbs helpers in hwloc/openfabrics-verbs.h. -* Tools - + hwloc-info is not only a synonym of lstopo -s anymore, it also - dumps information about objects given on the command-line. -* Documentation - + Add a section "Existing components and plugins". - + Add a list of common OS devices in section "Software devices". - + Add a new FAQ entry "Why is lstopo slow?" about lstopo slowness - issues because of GPUs. - + Clarify the documentation of inline helpers in hwloc/myriexpress.h - and hwloc/openfabrics-verbs.h. -* Misc - + Improve cache detection on AIX. - + The HWLOC_COMPONENTS variable now excludes the components whose - names are prefixed with '-'. - + lstopo --ignore PU now works when displaying the topology in - graphical and textual mode (not when exporting to XML). - + Make sure I/O options always appear in lstopo usage, not only when - using pciutils/libpci. 
- + Remove some unneeded Linux specific includes from some interoperability - headers. - + Fix some inconsistencies in hwloc-distrib and hwloc-assembler-remote - manpages. Thanks to Guy Streeter for the report. - + Fix a memory leak on AIX when getting memory binding. - + Fix many small memory leaks on Linux. - + The `libpci' component is now called `pci' but the old name is still - accepted in the HWLOC_COMPONENTS variable for backward compatibility. - - -Version 1.6.2 -------------- -* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. - pciutils/libpci is only used if --enable-libpci is given to configure - because its GPL license may taint hwloc. See the Installation section - in the documentation for details. -* Fix get_cpubind on Solaris when bound to a single PU with - processor_bind(). Thanks to Eugene Loh for reporting the problem - and providing a patch. - - -Version 1.6.1 -------------- -* Fix some crash or buggy detection in the x86 backend when Linux - cgroups/cpusets restrict the available CPUs. -* Fix the pkg-config output with --libs --static. - Thanks to Erik Schnetter for reporting one of the problems. -* Fix the output of hwloc-calc -H --hierarchical when using logical - indexes in the output. -* Calling hwloc_topology_load() multiple times on the same topology - is officially deprecated. hwloc will warn in such cases. -* Add some documentation about existing plugins/components, package - dependencies, and I/O devices specification on the command-line. - - -Version 1.6.0 -------------- -* Major changes - + Reorganize the backend infrastructure to support dynamic selection - of components and dynamic loading of plugins. For details, see the - new documentation section Components and plugins. - - The HWLOC_COMPONENTS variable lets one replace the default discovery - components. - - Dynamic loading of plugins may be enabled with --enable-plugins - (except on AIX and Windows). 
It will build libxml2 and libpci - support as separate modules. This helps reduce the dependencies - of the core hwloc library when distributed as a binary package. -* Backends - + Add CPUModel detection on Darwin and x86/FreeBSD. - Thanks to Robin Scher for providing ways to implement this. - + The x86 backend now adds CPUModel info attributes to socket objects - created by other backends that do not natively support this attribute. - + Fix detection on FreeBSD in case of cpuset restriction. Thanks to - Sebastian Kuzminsky for reporting the problem. -* XML - + Add hwloc_topology_set_userdata_import/export_callback(), - hwloc_export_obj_userdata() and _userdata_base64() to let - applications specify how to save/restore the custom data they placed - in the userdata private pointer field of hwloc objects. -* Tools - + Add hwloc-annotate program to add string info attributes to XML - topologies. - + Add --pid-cmd to hwloc-ps to append the output of a command to each - PID line. May be used for showing Open MPI process ranks, see the - hwloc-ps(1) manpage for details. - + hwloc-bind now exits with an error if binding fails; the executable - is not launched unless binding succeeded or --force was given. - + Add --quiet to hwloc-calc and hwloc-bind to hide non-fatal error - messages. - + Fix command-line pid support in Windows tools. - + All programs accept --verbose as a synonym to -v. -* Misc - + Fix some DIR descriptor leaks on Linux. - + Fix I/O device lists when some were filtered out after a XML import. - + Fix the removal of I/O objects when importing a I/O-enabled XML topology - without any I/O topology flag. - + When merging objects with HWLOC_IGNORE_TYPE_KEEP_STRUCTURE or - lstopo --merge, compare object types before deciding which one of two - identical objects to remove (e.g. keep sockets in favor of caches). - + Add some GUID- and LID-related info attributes to OpenFabrics - OS devices. - + Only add CPUType socket attributes on Solaris/Sparc.
Other cases - don't report reliable information (Solaris/x86), and a replacement - is available as the Architecture string info in the Machine object. - + Add missing Backend string info on Solaris in most cases. - + Document object attributes and string infos in a new Attributes - section in the documentation. - + Add a section about Synthetic topologies in the documentation. - - -Version 1.5.2 (some of these changes are in v1.6.2 but not in v1.6) -------------- -* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. - pciutils/libpci is only used if --enable-libpci is given to configure - because its GPL license may taint hwloc. See the Installation section - in the documentation for details. -* Fix get_cpubind on Solaris when bound to a single PU with - processor_bind(). Thanks to Eugene Loh for reporting the problem - and providing a patch. -* Fix some DIR descriptor leaks on Linux. -* Fix I/O device lists when some were filtered out after a XML import. -* Add missing Backend string info on Solaris in most cases. -* Fix the removal of I/O objects when importing a I/O-enabled XML topology - without any I/O topology flag. -* Fix the output of hwloc-calc -H --hierarchical when using logical - indexes in the output. -* Fix the pkg-config output with --libs --static. - Thanks to Erik Schnetter for reporting one of the problems. - - -Version 1.5.1 -------------- -* Fix block OS device detection on Linux kernel 3.3 and later. - Thanks to Guy Streeter for reporting the problem and testing the fix. -* Fix the cpuid code in the x86 backend (for FreeBSD). Thanks to - Sebastian Kuzminsky for reporting problems and testing patches. -* Fix 64bit detection on FreeBSD. -* Fix some corner cases in the management of the thissystem flag with - respect to topology flags and environment variables. -* Fix some corner cases in command-line parsing checks in hwloc-distrib - and hwloc-distances. 
-* Make sure we do not miss some block OS devices on old Linux kernels - when a single PCI device has multiple IDE hosts/devices behind it. -* Do not disable I/O devices or instruction caches in hwloc-assembler output. - - -Version 1.5.0 -------------- -* Backends - + Do not limit the number of processors to 1024 on Solaris anymore. - + Gather total machine memory on FreeBSD. Thanks to Cyril Roelandt. - + XML topology files do not depend on the locale anymore. Float numbers - such as NUMA distances or PCI link speeds now always use a dot as a - decimal separator. - + Add instruction cache detection on Linux, AIX, Windows and Darwin. - + Add get_last_cpu_location() support for the current thread on AIX. - + Support binding on AIX when threads or processes were bound with - bindprocessor(). Thanks to Hendryk Bockelmann for reporting the issue - and testing patches, and to Farid Parpia for explaining the binding - interfaces. - + Improve AMD topology detection in the x86 backend (for FreeBSD) using - the topoext feature. -* API - + Increase HWLOC_API_VERSION to 0x00010500 so that API changes may be - detected at build-time. - + Add a cache type attribute describing Data, Instruction and Unified - caches. Caches with different types but the same depth (for instance L1d - and L1i) are placed on different levels. - + Add hwloc_get_cache_type_depth() to retrieve the hwloc level depth - of the given cache depth and type, for instance L1i or L2. - It helps disambiguate the case where hwloc_get_type_depth() returns - HWLOC_TYPE_DEPTH_MULTIPLE. - + Instruction caches are ignored unless HWLOC_TOPOLOGY_FLAG_ICACHES is - passed to hwloc_topology_set_flags() before load. - + Add hwloc_ibv_get_device_osdev_by_name() OpenFabrics helper in - openfabrics-verbs.h to find the hwloc OS device object corresponding to - an OpenFabrics device. -* Tools - + Add lstopo-no-graphics, a lstopo built without graphical support to - avoid dependencies on external libraries such as Cairo and X11.
When - supported, graphical outputs are only available in the original lstopo - program. - - Packagers splitting lstopo and lstopo-no-graphics into different - packages are advised to use the alternatives system so that lstopo - points to the best available binary. - + Instruction caches are enabled in lstopo by default. Use --no-icaches - to disable them. - + Add -t/--threads to show threads in hwloc-ps. -* Removal of obsolete components - + Remove the old cpuset interface (hwloc/cpuset.h) which is deprecated and - superseded by the bitmap API (hwloc/bitmap.h) since v1.1. - hwloc_cpuset and nodeset types are still defined, but all hwloc_cpuset_* - compatibility wrappers are now gone. - + Remove Linux libnuma conversion helpers for the deprecated and - broken nodemask_t interface. - + Remove support for "Proc" type name, it was superseded by "PU" in v1.0. - + Remove hwloc-mask symlinks, it was replaced by hwloc-calc in v1.0. -* Misc - + Fix PCIe 3.0 link speed computation. - + Non-printable characters are dropped from strings during XML export. - + Fix importing of escaped characters with the minimalistic XML backend. - + Assert hwloc_is_thissystem() in several I/O related helpers. - + Fix some memory leaks in the x86 backend for FreeBSD. - + Minor fixes to ease native builds on Windows. - + Limit the number of retries when operating on all threads within a - process on Linux if the list of threads is heavily getting modified. - - -Version 1.4.3 -------------- -* This release is only meant to fix the pciutils license issue when upgrading - to hwloc v1.5 or later is not possible. It contains several other minor - fixes but ignores many of them that are only in v1.5 or later. -* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. - pciutils/libpci is only used if --enable-libpci is given to configure - because its GPL license may taint hwloc. See the Installation section - in the documentation for details. -* Fix PCIe 3.0 link speed computation. 
-* Fix importing of escaped characters with the minimalistic XML backend. -* Fix a memory leak in the x86 backend. - - -Version 1.4.2 -------------- -* Fix build on Solaris 9 and earlier when fabsf() is not a compiler - built-in. Thanks to Igor Galić for reporting the problem. -* Fix support for more than 32 processors on Windows. Thanks to Hartmut - Kaiser for reporting the problem. -* Fix process-wide binding and cpulocation routines on Linux when some - threads disappear in the meantime. Thanks to Vlad Roubtsov for reporting - the issue. -* Make installed scripts executable. Thanks to Jirka Hladky for reporting - the problem. -* Fix libtool revision management when building for Windows. This fix was - also released as hwloc v1.4.1.1 Windows builds. Thanks to Hartmut Kaiser - for reporting the problem. -* Fix the __hwloc_inline keyword in public headers when compiling with a - C++ compiler. -* Add Port info attribute to network OS devices inside OpenFabrics PCI - devices so as to identify which interface corresponds to which port. -* Document requirements for interoperability helpers: I/O devices discovery - is required for some of them; the topology must match the current host - for most of them. - - -Version 1.4.1 -------------- -* This release contains all changes from v1.3.2. -* Fix hwloc_alloc_membind, thanks Karl Napf for reporting the issue. -* Fix memory leaks in some get_membind() functions. -* Fix helpers converting from Linux libnuma to hwloc (hwloc/linux-libnuma.h) - in case of out-of-order NUMA node ids. -* Fix some overzealous assertions in the distance grouping code. -* Workaround BIOS reporting empty I/O locality in CUDA and OpenFabrics - helpers on Linux. Thanks to Albert Solernou for reporting the problem. -* Install a valgrind suppressions file hwloc-valgrind.supp (see the FAQ). -* Fix memory binding documentation. Thanks to Karl Napf for reporting the - issues. 
- - -Version 1.4.0 (does not contain all v1.3.2 changes) -------------- -* Major features - + Add "custom" interface and "assembler" tools to build multi-node - topology. See the Multi-node Topologies section in the documentation - for details. -* Interface improvements - + Add symmetric_subtree object attribute to ease assumptions when consulting - regular symmetric topologies. - + Add a CPUModel and CPUType info attribute to Socket objects on Linux - and Solaris. - + Add hwloc_get_obj_index_inside_cpuset() to retrieve the "logical" index - of an object within a subtree of the topology. - + Add more NVIDIA CUDA helpers in cuda.h and cudart.h to find hwloc objects - corresponding to CUDA devices. -* Discovery improvements - + Add a group object above partial distance matrices to make sure - the matrices are available in the final topology, except when this - new object would contradict the existing hierarchy. - + Grouping by distances now also works when loading from XML. - + Fix some corner cases in object insertion, for instance when dealing - with NUMA nodes without any CPU. -* Backends - + Implement hwloc_get_area_membind() on Linux. - + Honor I/O topology flags when importing from XML. - + Further improve XML-related error checking and reporting. - + Hide synthetic topology error messages unless HWLOC_SYNTHETIC_VERBOSE=1. -* Tools - + Add synthetic exporting of symmetric topologies to lstopo. - + lstopo --horiz and --vert can now be applied to some specific object types. - + lstopo -v -p now displays distance matrices with physical indexes. - + Add hwloc-distances utility to list distances. -* Documentation - + Fix and/or document the behavior of most inline functions in hwloc/helper.h - when the topology contains some I/O or Misc objects. - + Backend documentation enhancements. -* Bug fixes - + Fix missing last bit in hwloc_linux_get_thread_cpubind(). - Thanks to Carolina Gómez-Tostón Gutiérrez for reporting the issue. 
- + Fix FreeBSD build without cpuid support. - + Fix several Windows build issues. - + Fix inline keyword definition in public headers. - + Fix dependencies in the embedded library. - + Improve visibility support detection. Thanks to Dave Love for providing - the patch. - + Remove references to internal symbols in the tools. - - -Version 1.3.3 -------------- -* This release is only meant to fix the pciutils license issue when upgrading - to hwloc v1.4 or later is not possible. It contains several other minor - fixes but ignores many of them that are only in v1.4 or later. -* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. - pciutils/libpci is only used if --enable-libpci is given to configure - because its GPL license may taint hwloc. See the Installation section - in the documentation for details. - - -Version 1.3.2 -------------- -* Fix missing last bit in hwloc_linux_get_thread_cpubind(). - Thanks to Carolina Gómez-Tostón Gutiérrez for reporting the issue. -* Fix build with -mcmodel=medium. Thanks to Devendar Bureddy for reporting - the issue. -* Fix build with Solaris Studio 12 compiler when XML is disabled. - Thanks to Paul H. Hargrove for reporting the problem. -* Fix installation with old GNU sed, for instance on Red Hat 8. - Thanks to Paul H. Hargrove for reporting the problem. -* Fix PCI locality when Linux cgroups restrict the available CPUs. -* Fix floating point issue when grouping by distance on mips64 architecture. - Thanks to Paul H. Hargrove for reporting the problem. -* Fix conversion from/to Linux libnuma when some NUMA nodes have no memory. -* Fix support for gccfss compilers with broken ffs() support. Thanks to - Paul H. Hargrove for reporting the problem and providing a patch. -* Fix FreeBSD build without cpuid support. -* Fix several Windows build issues. -* Fix inline keyword definition in public headers. -* Fix dependencies in the embedded library. 
-* Detect when a compiler such as xlc may not report compile errors - properly, causing some configure checks to be wrong. Thanks to - Paul H. Hargrove for reporting the problem and providing a patch. -* Improve visibility support detection. Thanks to Dave Love for providing - the patch. -* Remove references to internal symbols in the tools. -* Fix installation on systems with limited command-line size. - Thanks to Paul H. Hargrove for reporting the problem. -* Further improve XML-related error checking and reporting. - - -Version 1.3.1 -------------- -* Fix pciutils detection with pkg-config when not installed in standard - directories. -* Fix visibility options detection with the Solaris Studio compiler. - Thanks to Igor Galić and Terry Dontje for reporting the problems. -* Fix support for old Linux sched.h headers such as those found - on Red Hat 8. Thanks to Paul H. Hargrove for reporting the problems. -* Fix inline and attribute support for Solaris compilers. Thanks to - Dave Love for reporting the problems. -* Print a short summary at the end of the configure output. Thanks to - Stefan Eilemann for the suggestion. -* Add --disable-libnuma configure option to disable libnuma-based - memory binding support on Linux. Thanks to Rayson Ho for the - suggestion. -* Make hwloc's configure script properly obey $PKG_CONFIG. Thanks to - Nathan Phillip Brink for raising the issue. -* Silence some harmless pciutils warnings, thanks to Paul H. Hargrove - for reporting the problem. -* Fix the documentation with respect to hwloc_pid_t and hwloc_thread_t - being either pid_t and pthread_t on Unix, or HANDLE on Windows. - - -Version 1.3.0 -------------- -* Major features - + Add I/O devices and bridges to the topology using the pciutils - library. Only enabled after setting the relevant flag with - hwloc_topology_set_flags() before hwloc_topology_load(). See the - I/O Devices section in the documentation for details. 
-* Discovery improvements - + Add associativity to the cache attributes. - + Add support for s390/z11 "books" on Linux. - + Add the HWLOC_GROUPING_ACCURACY environment variable to relax - distance-based grouping constraints. See the Environment Variables - section in the documentation for details about grouping behavior - and configuration. - + Allow user-given distance matrices to remove or replace those - discovered by the OS backend. -* XML improvements - + XML is now always supported: a minimalistic custom import/export - code is used when libxml2 is not available. It is only guaranteed - to read XML files generated by hwloc. - + hwloc_topology_export_xml() and export_xmlbuffer() now return an - integer. - + Add hwloc_free_xmlbuffer() to free the buffer allocated by - hwloc_topology_export_xmlbuffer(). - + Hide XML topology error messages unless HWLOC_XML_VERBOSE=1. -* Minor API updates - + Add hwloc_obj_add_info to customize object info attributes. -* Tools - + lstopo now displays I/O devices by default. Several options are - added to configure the I/O discovery. - + hwloc-calc and hwloc-bind now accept I/O devices as input. - + Add --restrict option to hwloc-calc and hwloc-distribute. - + Add --sep option to change the output field separator in hwloc-calc. - + Add --whole-system option to hwloc-ps. - - -Version 1.2.2 -------------- -* Fix build on AIX 5.2, thanks Utpal Kumar Ray for the report. -* Fix XML import of very large page sizes or counts on 32bits platform, - thanks to Karsten Hopp for the RedHat ticket. -* Fix crash when administrator limitations such as Linux cgroup require - to restrict distance matrices. Thanks to Ake Sandgren for reporting the - problem. -* Fix the removal of objects such as AMD Magny-Cours dual-node sockets - in case of administrator restrictions. -* Improve error reporting and messages in case of wrong synthetic topology - description. -* Several other minor internal fixes and documentation improvements. 
- - -Version 1.2.1 -------------- -* Improve support of AMD Bulldozer "Compute-Unit" modules by detecting - logical processors with different core IDs on Linux. -* Fix hwloc-ps crash when listing processes from another Linux cpuset. - Thanks to Carl Smith for reporting the problem. -* Fix build on AIX and Solaris. Thanks to Carl Smith and Andreas Kupries - for reporting the problems. -* Fix cache size detection on Darwin. Thanks to Erkcan Özcan for reporting - the problem. -* Make configure fail if --enable-xml or --enable-cairo is given and - proper support cannot be found. Thanks to Andreas Kupries for reporting - the XML problem. -* Fix spurious L1 cache detection on AIX. Thanks to Hendryk Bockelmann - for reporting the problem. -* Fix hwloc_get_last_cpu_location(THREAD) on Linux. Thanks to Gabriele - Fatigati for reporting the problem. -* Fix object distance detection on Solaris. -* Add pthread_self weak symbol to ease static linking. -* Minor documentation fixes. - - -Version 1.2.0 -------------- -* Major features - + Expose latency matrices in the API as an array of distance structures - within objects. Add several helpers to find distances. - + Add hwloc_topology_set_distance_matrix() and environment variables - to provide a matrix of distances between a given set of objects. - + Add hwloc_get_last_cpu_location() and hwloc_get_proc_last_cpu_location() - to retrieve the processors where a process or thread recently ran. - - Add the corresponding --get-last-cpu-location option to hwloc-bind. - + Add hwloc_topology_restrict() to restrict an existing topology to a - given cpuset. - - Add the corresponding --restrict option to lstopo. -* Minor API updates - + Add hwloc_bitmap_list_sscanf/snprintf/asprintf to convert between bitmaps - and strings such as 4-5,7-9,12,15- - + hwloc_bitmap_set/clr_range() now support infinite ranges. - + Clarify the difference between inserting Misc objects by cpuset or by - parent. 
- + hwloc_insert_misc_object_by_cpuset() now returns NULL in case of error. -* Discovery improvements - + x86 backend (for freebsd): add x2APIC support - + Support standard device-tree phandle, to get better support on e.g. ARM - systems providing it. - + Detect cache size on AIX. Thanks Christopher and IBM. - + Improve grouping to support asymmetric topologies. -* Tools - + Command-line tools now support "all" and "root" special locations - consisting in the entire topology, as well as type names with depth - attributes such as L2 or Group4. - + hwloc-calc improvements: - - Add --number-of/-N option to report the number of objects of a given - type or depth. - - -I is now equivalent to --intersect for listing the indexes of - objects of a given type or depth that intersects the input. - - Add -H to report the output as a hierarchical combination of types - and depths. - + Add --thissystem to lstopo. - + Add lstopo-win, a console-less lstopo variant on Windows. -* Miscellaneous - + Remove C99 usage from code base. - + Rename hwloc-gather-topology.sh into hwloc-gather-topology - + Fix AMD cache discovery on freebsd when there is no L3 cache, thanks - Andriy Gapon for the fix. - - -Version 1.1.2 -------------- -* Fix a segfault in the distance-based grouping code when some objects - are not placed in any group. Thanks to Bernd Kallies for reporting - the problem and providing a patch. -* Fix the command-line parsing of hwloc-bind --mempolicy interleave. - Thanks to Guy Streeter for reporting the problem. -* Stop truncating the output in hwloc_obj_attr_snprintf() and in the - corresponding lstopo output. Thanks to Guy Streeter for reporting the - problem. -* Fix object levels ordering in synthetic topologies. -* Fix potential incoherency between device tree and kernel information, - when SMT is disabled on Power machines. -* Fix and document the behavior of hwloc_topology_set_synthetic() in case - of invalid argument. Thanks to Guy Streeter for reporting the problem. 
-* Add some verbose error message reporting when it looks like the OS - gives erroneous information. -* Do not include unistd.h and stdint.h in public headers on Windows. -* Move config.h files into their own subdirectories to avoid name - conflicts when AC_CONFIG_HEADERS adds -I's for them. -* Remove the use of declaring variables inside "for" loops. -* Some other minor fixes. -* Many minor documentation fixes. - - -Version 1.1.1 -------------- -* Add hwloc_get_api_version() which returns the version of hwloc used - at runtime. Thanks to Guy Streeter for the suggestion. -* Fix the number of hugepages reported for NUMA nodes on Linux. -* Fix hwloc_bitmap_to_ulong() right after allocating the bitmap. - Thanks to Bernd Kallies for reporting the problem. -* Fix hwloc_bitmap_from_ith_ulong() to properly zero the first ulong. - Thanks to Guy Streeter for reporting the problem. -* Fix hwloc_get_membind_nodeset() on Linux. - Thanks to Bernd Kallies for reporting the problem and providing a patch. -* Fix some file descriptor leaks in the Linux discovery. -* Fix the minimum width of NUMA nodes, caches and the legend in the graphical - lstopo output. Thanks to Jirka Hladky for reporting the problem. -* Various fixes to bitmap conversion from/to taskset-strings. -* Fix and document snprintf functions behavior when the buffer size is too - small or zero. Thanks to Guy Streeter for reporting the problem. -* Fix configure to avoid spurious enabling of the cpuid backend. - Thanks to Tim Anderson for reporting the problem. -* Cleanup error management in hwloc-gather-topology.sh. - Thanks to Jirka Hladky for reporting the problem and providing a patch. -* Add a manpage and usage for hwloc-gather-topology.sh on Linux. - Thanks to Jirka Hladky for providing a patch. -* Memory binding documentation enhancements. - - -Version 1.1.0 -------------- - -* API - + Increase HWLOC_API_VERSION to 0x00010100 so that API changes may be - detected at build-time. 
- + Add a memory binding interface. - + The cpuset API (hwloc/cpuset.h) is now deprecated. It is replaced by - the bitmap API (hwloc/bitmap.h) which offers the same features with more - generic names since it applies to CPU sets, node sets and more. - Backward compatibility with the cpuset API and ABI is still provided but - it will be removed in a future release. - Old types (hwloc_cpuset_t, ...) are still available as a way to clarify - what kind of hwloc_bitmap_t each API function manipulates. - Upgrading to the new API only requires to replace hwloc_cpuset_ function - calls with the corresponding hwloc_bitmap_ calls, with the following - renaming exceptions: - - hwloc_cpuset_cpu -> hwloc_bitmap_only - - hwloc_cpuset_all_but_cpu -> hwloc_bitmap_allbut - - hwloc_cpuset_from_string -> hwloc_bitmap_sscanf - + Add an `infos' array in each object to store couples of info names and - values. It enables generic storage of things like the old dmi board infos - that were previously stored in machine specific attributes. - + Add linesize cache attribute. -* Features - + Bitmaps (and thus CPU sets and node sets) are dynamically (re-)allocated, - the maximal number of CPUs (HWLOC_NBMAXCPUS) has been removed. - + Improve the distance-based grouping code to better support irregular - distance matrices. - + Add support for device-tree to get cache information (useful on Power - architectures). -* Helpers - + Add NVIDIA CUDA helpers in cuda.h and cudart.h to ease interoperability - with CUDA Runtime and Driver APIs. - + Add Myrinet Express helper in myriexpress.h to ease interoperability. -* Tools - + lstopo now displays physical/OS indexes by default in graphical mode - (use -l to switch back to logical indexes). The textual output still uses - logical by default (use -p to switch to physical indexes). - + lstopo prefixes logical indexes with `L#' and physical indexes with `P#'. 
- Physical indexes are also printed as `P#N' instead of `phys=N' within - object attributes (in parentheses). - + Add a legend at the bottom of the lstopo graphical output, use --no-legend - to remove it. - + Add hwloc-ps to list process' bindings. - + Add --membind and --mempolicy options to hwloc-bind. - + Improve tools command-line options by adding a generic --input option - (and more) which replaces the old --xml, --synthetic and --fsys-root. - + Cleanup lstopo output configuration by adding --output-format. - + Add --intersect in hwloc-calc, and replace --objects with --largest. - + Add the ability to work on standard input in hwloc-calc. - + Add --from, --to and --at in hwloc-distrib. - + Add taskset-specific functions and command-line tools options to - manipulate CPU set strings in the format of the taskset program. - + Install hwloc-gather-topology.sh on Linux. - - -Version 1.0.3 -------------- - -* Fix support for Linux cpuset when emulated by a cgroup mount point. -* Remove unneeded runtime dependency on libibverbs.so in the library and - all utils programs. -* Fix hwloc_cpuset_to_linux_libnuma_ulongs in case of non-linear OS-indexes - for NUMA nodes. -* lstopo now displays physical/OS indexes by default in graphical mode - (use -l to switch back to logical indexes). The textual output still uses - logical by default (use -p to switch to physical indexes). - - -Version 1.0.2 -------------- - -* Public headers can now be included directly from C++ programs. -* Solaris fix for non-contiguous cpu numbers. Thanks to Rolf vandeVaart for - reporting the issue. -* Darwin 10.4 fix. Thanks to Olivier Cessenat for reporting the issue. -* Revert 1.0.1 patch that ignored sockets with unknown ID values since it - only slightly helped POWER7 machines with old Linux kernels while it - prevents recent kernels from getting the complete POWER7 topology. -* Fix hwloc_get_common_ancestor_obj(). -* Remove arch-specific bits in public headers. 
-* Some fixes in the lstopo graphical output. -* Various man page clarifications and minor updates. - - -Version 1.0.1 -------------- - -* Various Solaris fixes. Thanks to Yannick Martin for reporting the issue. -* Fix "non-native" builds on x86 platforms (e.g., when building 32 - bit executables with compilers that natively build 64 bit). -* Ignore sockets with unknown ID values (which fixes issues on POWER7 - machines). Thanks to Greg Bauer for reporting the issue. -* Various man page clarifications and minor updates. -* Fixed memory leaks in hwloc_setup_group_from_min_distance_clique(). -* Fix cache type filtering on MS Windows 7. Thanks to Αλέξανδρος - Παπαδογιαννάκ for reporting the issue. -* Fixed warnings when compiling with -DNDEBUG. - - -Version 1.0.0 -------------- - -* The ABI of the library has changed. -* Backend updates - + Add FreeBSD support. - + Add x86 cpuid based backend. - + Add Linux cgroup support to the Linux cpuset code. - + Support binding of entire multithreaded process on Linux. - + Fix and enable Group support in Windows. - + Cleanup XML export/import. -* Objects - + HWLOC_OBJ_PROC is renamed into HWLOC_OBJ_PU for "Processing Unit", - its stringified type name is now "PU". - + Use new HWLOC_OBJ_GROUP objects instead of MISC when grouping - objects according to NUMA distances or arbitrary OS aggregation. - + Rework memory attributes. - + Add different cpusets in each object to specify processors that - are offline, unavailable, ... - + Cleanup the storage of object names and DMI infos. -* Features - + Add support for looking up specific PID topology information. - + Add hwloc_topology_export_xml() to export the topology in a XML file. - + Add hwloc_topology_get_support() to retrieve the supported features - for the current topology context. - + Support non-SYSTEM object as the root of the tree, use MACHINE in - most common cases. - + Add hwloc_get_*cpubind() routines to retrieve the current binding - of processes and threads. 
-* API - + Add HWLOC_API_VERSION to help detect the currently used API version. - + Add missing ending "e" to *compare* functions. - + Add several routines to emulate PLPA functions. - + Rename and rework the cpuset and/or/xor/not/clear operators to output - their result in a dedicated argument instead of modifying one input. - + Deprecate hwloc_obj_snprintf() in favor of hwloc_obj_type/attr_snprintf(). - + Clarify the use of parent and ancestor in the API, do not use father. - + Replace hwloc_get_system_obj() with hwloc_get_root_obj(). - + Return -1 instead of HWLOC_OBJ_TYPE_MAX in the API since the latter - isn't public. - + Relax constraints in hwloc_obj_type_of_string(). - + Improve displaying of memory sizes. - + Add 0x prefix to cpuset strings. -* Tools - + lstopo now displays logical indexes by default, use --physical to - revert back to OS/physical indexes. - + Add colors in the lstopo graphical outputs to distinguish between online, - offline, reserved, ... objects. - + Extend lstopo to show cpusets, filter objects by type, ... - + Renamed hwloc-mask into hwloc-calc which supports many new options. -* Documentation - + Add a hwloc(7) manpage containing general information. - + Add documentation about how to switch from PLPA to hwloc. - + Cleanup the distributed documentation files. -* Miscellaneous - + Many compilers warning fixes. - + Cleanup the ABI by using the visibility attribute. - + Add project embedding support. - - -Version 0.9.4 (unreleased) --------------------------- - -* Fix reseting colors to normal in lstopo -.txt output. -* Fix Linux pthread_t binding error report. - - -Version 0.9.3 -------------- - -* Fix autogen.sh to work with Autoconf 2.63. 
-* Fix various crashes in particular conditions: - - xml files with root attributes - - offline CPUs - - partial sysfs support - - unparseable /proc/cpuinfo - - ignoring NUMA level while Misc level have been generated -* Tweak documentation a bit -* Do not require the pthread library for binding the current thread on Linux -* Do not erroneously consider the sched_setaffinity prototype is the old version - when there is actually none. -* Fix _syscall3 compilation on archs for which we do not have the - sched_setaffinity system call number. -* Fix AIX binding. -* Fix libraries dependencies: now only lstopo depends on libtermcap, fix - binutils-gold link -* Have make check always build and run hwloc-hello.c -* Do not limit size of a cpuset. - - -Version 0.9.2 -------------- - -* Trivial documentation changes. - - -Version 0.9.1 -------------- - -* Re-branded to "hwloc" and moved to the Open MPI project, relicensed under the - BSD license. -* The prefix of all functions and tools is now hwloc, and some public - functions were also renamed for real. -* Group NUMA nodes into Misc objects according to their physical distance - that may be reported by the OS/BIOS. - May be ignored by setting HWLOC_IGNORE_DISTANCES=1 in the environment. -* Ignore offline CPUs on Solaris. -* Improved binding support on AIX. -* Add HP-UX support. -* CPU sets are now allocated/freed dynamically. -* Add command line options to tune the lstopo graphical output, add - semi-graphical textual output -* Extend topobind to support multiple cpusets or objects on the command - line as topomask does. -* Add an Infiniband-specific helper hwloc/openfabrics-verbs.h to retrieve - the physical location of IB devices. - - -Version 0.9 (libtopology) -------------------------- - -* First release. 
diff --git a/opal/mca/hwloc/hwloc1113/hwloc/README b/opal/mca/hwloc/hwloc1113/hwloc/README deleted file mode 100644 index 07abc25a14a..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/README +++ /dev/null @@ -1,83 +0,0 @@ -Introduction - -hwloc provides command line tools and a C API to obtain the hierarchical map of -key computing elements, such as: NUMA memory nodes, shared caches, processor -packages, processor cores, processing units (logical processors or "threads") -and even I/O devices. hwloc also gathers various attributes such as cache and -memory information, and is portable across a variety of different operating -systems and platforms. Additionally it may assemble the topologies of multiple -machines into a single one so as to let applications consult the topology of an -entire fabric or cluster at once. - -hwloc primarily aims at helping high-performance computing (HPC) applications, -but is also applicable to any project seeking to exploit code and/or data -locality on modern computing platforms. - -Note that the hwloc project represents the merger of the libtopology project -from inria and the Portable Linux Processor Affinity (PLPA) sub-project from -Open MPI. Both of these prior projects are now deprecated. The first hwloc -release was essentially a "re-branding" of the libtopology code base, but with -both a few genuinely new features and a few PLPA-like features added in. Prior -releases of hwloc included documentation about switching from PLPA to hwloc; -this documentation has been dropped on the assumption that everyone who was -using PLPA has already switched to hwloc. - -hwloc supports the following operating systems: - - * Linux (including old kernels not having sysfs topology information, with - knowledge of cpusets, offline CPUs, ScaleMP vSMP and Kerrighed support) on - all supported hardware, including Intel Xeon Phi (KNL and KNC, either - standalone or as a coprocessor) and NumaScale NumaConnect. 
- * Solaris - * AIX - * Darwin / OS X - * FreeBSD and its variants (such as kFreeBSD/GNU) - * NetBSD - * OSF/1 (a.k.a., Tru64) - * HP-UX - * Microsoft Windows - * IBM BlueGene/Q Compute Node Kernel (CNK) - -Since it uses standard Operating System information, hwloc's support is mostly -independant from the processor type (x86, powerpc, ...) and just relies on the -Operating System support. The only exception to this is kFreeBSD, which does -not support topology information, and hwloc thus uses an x86-only CPUID-based -backend (which can be used for other OSes too, see the Components and plugins -section). - -To check whether hwloc works on a particular machine, just try to build it and -run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus -or missing cache information), see Questions and Bugs below. - -hwloc only reports the number of processors on unsupported operating systems; -no topology information is available. - -For development and debugging purposes, hwloc also offers the ability to work -on "fake" topologies: - - * Symmetrical tree of resources generated from a list of level arities - * Remote machine simulation through the gathering of Linux sysfs topology - files - -hwloc can display the topology in a human-readable format, either in graphical -mode (X11), or by exporting in one of several different formats, including: -plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the -export formats require additional support libraries. - -hwloc offers a programming interface for manipulating topologies and objects. -It also brings a powerful CPU bitmap API that is used to describe topology -objects location on physical/logical processors. See the Programming Interface -below. It may also be used to binding applications onto certain cores or memory -nodes. Several utility programs are also provided to ease command-line -manipulation of topology objects, binding of processes, and so on. 
- -Perl bindings are available from Bernd Kallies on CPAN. - -Python bindings are available from Guy Streeter: - - * Fedora RPM and tarball. - * git tree (html). - - - -See https://www.open-mpi.org/projects/hwloc/doc/ for more hwloc documentation. diff --git a/opal/mca/hwloc/hwloc1113/hwloc/VERSION b/opal/mca/hwloc/hwloc1113/hwloc/VERSION deleted file mode 100644 index d840fbcc0ad..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/VERSION +++ /dev/null @@ -1,46 +0,0 @@ -# This is the VERSION file for hwloc, describing the precise version -# of hwloc in this distribution. The various components of the version -# number below are combined to form a single version number string. - -# major, minor, and release are generally combined in the form -# ... If release is zero, then it is omitted. - -# Please update HWLOC_VERSION in contrib/windows/private_config.h too. - -major=1 -minor=11 -release=3 - -# greek is used for alpha or beta release tags. If it is non-empty, -# it will be appended to the version number. It does not have to be -# numeric. Common examples include a1 (alpha release 1), b1 (beta -# release 1), sc2005 (Super Computing 2005 release). The only -# requirement is that it must be entirely printable ASCII characters -# and have no white space. - -greek= - -# The date when this release was created - -date="Apr 26, 2016" - -# If snapshot=1, then use the value from snapshot_version as the -# entire hwloc version (i.e., ignore major, minor, release, and -# greek). This is only set to 1 when making snapshot tarballs. -snapshot=0 -snapshot_version=${major}.${minor}.${release}${greek}-git - -# The shared library version of hwloc's public library. This version -# is maintained in accordance with the "Library Interface Versions" -# chapter from the GNU Libtool documentation. Notes: - -# 1. Since version numbers are associated with *releases*, the version -# number maintained on the hwloc git master (and developer branches) -# is always 0:0:0. - -# 2. 
Version numbers are described in the Libtool current:revision:age -# format. - -libhwloc_so_version=12:0:7 - -# Please also update the lines in contrib/windows/libhwloc.vcxproj diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_internal.m4 b/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_internal.m4 deleted file mode 100644 index 20fb77bca43..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_internal.m4 +++ /dev/null @@ -1,433 +0,0 @@ -dnl -*- Autoconf -*- -dnl -dnl Copyright © 2009-2016 Inria. All rights reserved. -dnl Copyright © 2009, 2011 Université Bordeaux -dnl Copyright © 2004-2005 The Trustees of Indiana University and Indiana -dnl University Research and Technology -dnl Corporation. All rights reserved. -dnl Copyright © 2004-2005 The Regents of the University of California. -dnl All rights reserved. -dnl Copyright © 2004-2008 High Performance Computing Center Stuttgart, -dnl University of Stuttgart. All rights reserved. -dnl Copyright © 2006-2014 Cisco Systems, Inc. All rights reserved. -dnl -dnl See COPYING in top-level directory. - -#----------------------------------------------------------------------- - -# Probably only ever invoked by hwloc's configure.ac -AC_DEFUN([HWLOC_BUILD_STANDALONE],[ - hwloc_mode=standalone -])dnl - -#----------------------------------------------------------------------- - -# Probably only ever invoked by hwloc's configure.ac -AC_DEFUN([HWLOC_DEFINE_ARGS],[ - # Embedded mode, or standalone? - AC_ARG_ENABLE([embedded-mode], - AC_HELP_STRING([--enable-embedded-mode], - [Using --enable-embedded-mode puts the HWLOC into "embedded" mode. The default is --disable-embedded-mode, meaning that the HWLOC is in "standalone" mode.])) - - # Change the symbol prefix? - AC_ARG_WITH([hwloc-symbol-prefix], - AC_HELP_STRING([--with-hwloc-symbol-prefix=STRING], - [STRING can be any valid C symbol name. It will be prefixed to all public HWLOC symbols. Default: "hwloc_"])) - - # Debug mode? 
- AC_ARG_ENABLE([debug], - AC_HELP_STRING([--enable-debug], - [Using --enable-debug enables various hwloc maintainer-level debugging controls. This option is not recomended for end users.])) - - # Doxygen? - AC_ARG_ENABLE([doxygen], - [AC_HELP_STRING([--enable-doxygen], - [enable support for building Doxygen documentation (note that this option is ONLY relevant in developer builds; Doxygen documentation is pre-built for tarball builds and this option is therefore ignored)])]) - - # Picky? - AC_ARG_ENABLE(picky, - AC_HELP_STRING([--disable-picky], - [When in developer checkouts of hwloc and compiling with gcc, the default is to enable maximum compiler pickyness. Using --disable-picky or --enable-picky overrides any default setting])) - - # Cairo? - AC_ARG_ENABLE([cairo], - AS_HELP_STRING([--disable-cairo], - [Disable the Cairo back-end of hwloc's lstopo command])) - - # CPUID - AC_ARG_ENABLE([cpuid], - AS_HELP_STRING([--disable-cpuid], - [Disable the cpuid-based architecture specific support (x86 component)])) - - # XML using libxml2? - AC_ARG_ENABLE([libxml2], - AS_HELP_STRING([--disable-libxml2], - [Do not use libxml2 for XML support, use a custom minimalistic support])) - - # PCI? - AC_ARG_ENABLE([pci], - AS_HELP_STRING([--disable-pci], - [Disable the PCI device discovery])) - - # OpenCL? - AC_ARG_ENABLE([opencl], - AS_HELP_STRING([--disable-opencl], - [Disable the OpenCL device discovery])) - - # CUDA? - AC_ARG_ENABLE([cuda], - AS_HELP_STRING([--disable-cuda], - [Disable the CUDA device discovery using libcudart])) - - # NVML? 
- AC_ARG_ENABLE([nvml], - AS_HELP_STRING([--disable-nvml], - [Disable the NVML device discovery])) - - # GL/Display - AC_ARG_ENABLE([gl], - AS_HELP_STRING([--disable-gl], - [Disable the GL display device discovery])) - - # Linux libnuma - AC_ARG_ENABLE([libnuma], - AS_HELP_STRING([--disable-libnuma], - [Disable the Linux libnuma])) - - # LibUdev - AC_ARG_ENABLE([libudev], - AS_HELP_STRING([--disable-libudev], - [Disable the Linux libudev])) - - # Plugins - AC_ARG_ENABLE([plugins], - AS_HELP_STRING([--enable-plugins=name,...], - [Build the given components as dynamically-loaded plugins])) - -])dnl - -#----------------------------------------------------------------------- - -dnl We only build documentation if this is a developer checkout. -dnl Distribution tarballs just install pre-built docuemntation that was -dnl included in the tarball. - -# Probably only ever invoked by hwloc's configure.ac -AC_DEFUN([HWLOC_SETUP_DOCS],[ - cat < /dev/null` - - AC_ARG_VAR([PDFLATEX], [Location of the pdflatex program (required for building the hwloc doxygen documentation)]) - AC_PATH_TOOL([PDFLATEX], [pdflatex]) - - AC_ARG_VAR([MAKEINDEX], [Location of the makeindex program (required for building the hwloc doxygen documentation)]) - AC_PATH_TOOL([MAKEINDEX], [makeindex]) - - AC_ARG_VAR([FIG2DEV], [Location of the fig2dev program (required for building the hwloc doxygen documentation)]) - AC_PATH_TOOL([FIG2DEV], [fig2dev]) - - AC_ARG_VAR([GS], [Location of the gs program (required for building the hwloc doxygen documentation)]) - AC_PATH_TOOL([GS], [gs]) - - AC_ARG_VAR([EPSTOPDF], [Location of the epstopdf program (required for building the hwloc doxygen documentation)]) - AC_PATH_TOOL([EPSTOPDF], [epstopdf]) - - AC_MSG_CHECKING([if can build doxygen docs]) - AS_IF([test "x$DOXYGEN" != "x" -a "x$PDFLATEX" != "x" -a "x$MAKEINDEX" != "x" -a "x$FIG2DEV" != "x" -a "x$GS" != "x" -a "x$EPSTOPDF" != "x"], - [hwloc_generate_doxs=yes], [hwloc_generate_doxs=no]) - 
AC_MSG_RESULT([$hwloc_generate_doxs]) - AS_IF([test "x$hwloc_generate_doxs" = xyes -a "x$HWLOC_DOXYGEN_VERSION" = x1.6.2], - [hwloc_generate_doxs="no"; AC_MSG_WARN([doxygen 1.6.2 has broken short name support, disabling])]) - - AC_REQUIRE([AC_PROG_SED]) - - # Making the top-level README requires w3m or lynx. - AC_ARG_VAR([W3M], [Location of the w3m program (required to building the top-level hwloc README file)]) - AC_PATH_TOOL([W3M], [w3m]) - AC_ARG_VAR([LYNX], [Location of the lynx program (required to building the top-level hwloc README file)]) - AC_PATH_TOOL([LYNX], [lynx]) - - AC_MSG_CHECKING([if can build top-level README]) - AS_IF([test "x$W3M" != "x"], - [hwloc_generate_readme=yes - HWLOC_W3_GENERATOR=$W3M], - [AS_IF([test "x$LYNX" != "x"], - [hwloc_generate_readme=yes - HWLOC_W3_GENERATOR="$LYNX -dump -nolist"], - [hwloc_generate_readme=no])]) - AC_SUBST(HWLOC_W3_GENERATOR) - AC_MSG_RESULT([$hwloc_generate_readme]) - - # If any one of the above tools is missing, we will refuse to make dist. - AC_MSG_CHECKING([if will build doxygen docs]) - AS_IF([test "x$hwloc_generate_doxs" = "xyes" -a "x$enable_doxygen" != "xno"], - [], [hwloc_generate_doxs=no]) - AC_MSG_RESULT([$hwloc_generate_doxs]) - - # See if we want to install the doxygen docs - AC_MSG_CHECKING([if will install doxygen docs]) - AS_IF([test "x$hwloc_generate_doxs" = "xyes" -o \ - -f "$srcdir/doc/doxygen-doc/man/man3/hwloc_distrib.3" -a \ - -f "$srcdir/doc/doxygen-doc/hwloc-a4.pdf" -a \ - -f "$srcdir/doc/doxygen-doc/hwloc-letter.pdf"], - [hwloc_install_doxs=yes], - [hwloc_install_doxs=no]) - AC_MSG_RESULT([$hwloc_install_doxs]) - - # For the common developer case, if we're in a developer checkout and - # using the GNU compilers, turn on maximum warnings unless - # specifically disabled by the user. 
- AC_MSG_CHECKING([whether to enable "picky" compiler mode]) - hwloc_want_picky=0 - AS_IF([test "$hwloc_c_vendor" = "gnu"], - [AS_IF([test -d "$srcdir/.hg" -o -d "$srcdir/.git"], - [hwloc_want_picky=1])]) - if test "$enable_picky" = "yes"; then - if test "$GCC" = "yes"; then - AC_MSG_RESULT([yes]) - hwloc_want_picky=1 - else - AC_MSG_RESULT([no]) - AC_MSG_WARN([Warning: --enable-picky used, but is currently only defined for the GCC compiler set -- automatically disabled]) - hwloc_want_picky=0 - fi - elif test "$enable_picky" = "no"; then - AC_MSG_RESULT([no]) - hwloc_want_picky=0 - else - if test "$hwloc_want_picky" = 1; then - AC_MSG_RESULT([yes (default)]) - else - AC_MSG_RESULT([no (default)]) - fi - fi - if test "$hwloc_want_picky" = 1; then - add="-Wall -Wunused-parameter -Wundef -Wno-long-long -Wsign-compare" - add="$add -Wmissing-prototypes -Wstrict-prototypes" - add="$add -Wcomment -pedantic" - - HWLOC_CFLAGS="$HWLOC_CFLAGS $add" - fi - - # Generate some files for the docs - AC_CONFIG_FILES( - hwloc_config_prefix[doc/Makefile] - hwloc_config_prefix[doc/examples/Makefile] - hwloc_config_prefix[doc/doxygen-config.cfg]) -]) - -#----------------------------------------------------------------------- - -# Probably only ever invoked by hwloc's configure.ac -AC_DEFUN([HWLOC_SETUP_UTILS],[ - cat <= 2.70 and in some backports - if test "x${runstatedir}" != "x"; then - HWLOC_runstatedir=${runstatedir} - else - HWLOC_runstatedir='${localstatedir}/run' - fi - AC_SUBST([HWLOC_runstatedir]) - - # Cairo support - hwloc_cairo_happy=no - if test "x$enable_cairo" != "xno"; then - HWLOC_PKG_CHECK_MODULES([CAIRO], [cairo], [cairo_fill], [cairo.h], - [hwloc_cairo_happy=yes], - [hwloc_cairo_happy=no]) - fi - - if test "x$hwloc_cairo_happy" = "xyes"; then - AC_DEFINE([HWLOC_HAVE_CAIRO], [1], [Define to 1 if you have the `cairo' library.]) - else - AS_IF([test "$enable_cairo" = "yes"], - [AC_MSG_WARN([--enable-cairo requested, but Cairo/X11 support was not found]) - 
AC_MSG_ERROR([Cannot continue])]) - fi - - AC_CHECK_TYPES([wchar_t], [ - AC_CHECK_FUNCS([putwc]) - ], [], [[#include ]]) - - HWLOC_XML_LOCALIZED=1 - AC_CHECK_HEADERS([locale.h xlocale.h], [ - AC_CHECK_FUNCS([setlocale]) - AC_CHECK_FUNCS([uselocale], [HWLOC_XML_LOCALIZED=0]) - ]) - AC_SUBST([HWLOC_XML_LOCALIZED]) - AC_CHECK_HEADERS([langinfo.h], [ - AC_CHECK_FUNCS([nl_langinfo]) - ]) - hwloc_old_LIBS="$LIBS" - chosen_curses="" - for curses in ncurses curses - do - for lib in "" -ltermcap -l${curses}w -l$curses - do - AC_MSG_CHECKING(termcap support using $curses and $lib) - LIBS="$hwloc_old_LIBS $lib" - AC_LINK_IFELSE([AC_LANG_PROGRAM([[ -#include <$curses.h> -#include -]], [[tparm(NULL, 0, 0, 0, 0, 0, 0, 0, 0, 0)]])], [ - AC_MSG_RESULT(yes) - AC_SUBST([HWLOC_TERMCAP_LIBS], ["$LIBS"]) - AC_DEFINE([HWLOC_HAVE_LIBTERMCAP], [1], - [Define to 1 if you have a library providing the termcap interface]) - chosen_curses=$curses - ], [ - AC_MSG_RESULT(no) - ]) - test "x$chosen_curses" != "x" && break - done - test "x$chosen_curses" != "x" && break - done - if test "$chosen_curses" = ncurses - then - AC_DEFINE([HWLOC_USE_NCURSES], [1], [Define to 1 if ncurses works, preferred over curses]) - fi - LIBS="$hwloc_old_LIBS" - unset hwloc_old_LIBS - - AC_PATH_TOOL(RMPATH, rm) - - _HWLOC_CHECK_DIFF_U - _HWLOC_CHECK_DIFF_W - - # Only generate this if we're building the utilities - AC_CONFIG_FILES( - hwloc_config_prefix[utils/Makefile] - hwloc_config_prefix[utils/hwloc/Makefile] - hwloc_config_prefix[utils/lstopo/Makefile] - hwloc_config_prefix[hwloc.pc]) -])dnl - -#----------------------------------------------------------------------- - -# Probably only ever invoked by hwloc's configure.ac -AC_DEFUN([HWLOC_SETUP_TESTS],[ - cat <]) - - AC_CHECK_HEADERS([infiniband/verbs.h], [ - AC_CHECK_LIB([ibverbs], [ibv_open_device], - [AC_DEFINE([HAVE_LIBIBVERBS], 1, [Define to 1 if we have -libverbs]) - hwloc_have_libibverbs=yes]) - ]) - - AC_CHECK_HEADERS([myriexpress.h], [ - AC_MSG_CHECKING(if 
MX_NUMA_NODE exists) - AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include ]], - [[int a = MX_NUMA_NODE;]])], - [AC_MSG_RESULT(yes) - AC_CHECK_LIB([myriexpress], [mx_get_info], - [AC_DEFINE([HAVE_MYRIEXPRESS], 1, [Define to 1 if we have -lmyriexpress]) - hwloc_have_myriexpress=yes])], - [AC_MSG_RESULT(no)])]) - - AC_CHECK_PROGS(XMLLINT, [xmllint]) - - AC_CHECK_PROGS(BUNZIPP, bunzip2, false) - - AC_MSG_CHECKING(if CXX works) - AC_LANG_PUSH([C++]) - AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ -#include -using namespace std; -int foo(void) { - cout << "test" << endl; - return 0; -} - ]])], [hwloc_have_cxx=yes], [hwloc_have_cxx=no]) - AC_LANG_POP([C++]) - AC_MSG_RESULT([$hwloc_have_cxx]) - - _HWLOC_CHECK_DIFF_U - - # Only generate these files if we're making the tests - AC_CONFIG_FILES( - hwloc_config_prefix[tests/Makefile] - hwloc_config_prefix[tests/linux/Makefile] - hwloc_config_prefix[tests/linux/gather/Makefile] - hwloc_config_prefix[tests/xml/Makefile] - hwloc_config_prefix[tests/ports/Makefile] - hwloc_config_prefix[tests/rename/Makefile] - hwloc_config_prefix[tests/linux/gather/test-gather-topology.sh] - hwloc_config_prefix[tests/linux/test-topology.sh] - hwloc_config_prefix[tests/xml/test-topology.sh] - hwloc_config_prefix[tests/wrapper.sh] - hwloc_config_prefix[utils/hwloc/hwloc-assembler-remote] - hwloc_config_prefix[utils/hwloc/hwloc-compress-dir] - hwloc_config_prefix[utils/hwloc/hwloc-gather-topology] - hwloc_config_prefix[utils/hwloc/test-hwloc-annotate.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-assembler.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-calc.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-compress-dir.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-diffpatch.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-distances.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-distrib.sh] - hwloc_config_prefix[utils/hwloc/test-hwloc-info.sh] - hwloc_config_prefix[utils/hwloc/test-fake-plugin.sh] - hwloc_config_prefix[utils/lstopo/test-hwloc-ls.sh] - 
hwloc_config_prefix[contrib/systemd/Makefile]) - - AC_CONFIG_COMMANDS([chmoding-scripts], [chmod +x ]hwloc_config_prefix[tests/linux/test-topology.sh ]hwloc_config_prefix[tests/xml/test-topology.sh ]hwloc_config_prefix[tests/linux/gather/test-gather-topology.sh ]hwloc_config_prefix[tests/wrapper.sh ]hwloc_config_prefix[utils/hwloc/hwloc-assembler-remote ]hwloc_config_prefix[utils/hwloc/hwloc-compress-dir ]hwloc_config_prefix[utils/hwloc/hwloc-gather-topology ]hwloc_config_prefix[utils/hwloc/test-hwloc-annotate.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-assembler.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-calc.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-compress-dir.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-diffpatch.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-distances.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-distrib.sh ]hwloc_config_prefix[utils/hwloc/test-hwloc-info.sh ]hwloc_config_prefix[utils/hwloc/test-fake-plugin.sh ]hwloc_config_prefix[utils/lstopo/test-hwloc-ls.sh]) - - # These links are only needed in standalone mode. It would - # be nice to m4 foreach this somehow, but whenever I tried - # it, I got obscure "invalid tag" errors from - # AC_CONFIG_LINKS. :-\ Since these tests are only run when - # built in standalone mode, only generate them in - # standalone mode. 
- AC_CONFIG_LINKS( - hwloc_config_prefix[tests/ports/topology-solaris.c]:hwloc_config_prefix[src/topology-solaris.c] - hwloc_config_prefix[tests/ports/topology-solaris-chiptype.c]:hwloc_config_prefix[src/topology-solaris-chiptype.c] - hwloc_config_prefix[tests/ports/topology-aix.c]:hwloc_config_prefix[src/topology-aix.c] - hwloc_config_prefix[tests/ports/topology-osf.c]:hwloc_config_prefix[src/topology-osf.c] - hwloc_config_prefix[tests/ports/topology-windows.c]:hwloc_config_prefix[src/topology-windows.c] - hwloc_config_prefix[tests/ports/topology-darwin.c]:hwloc_config_prefix[src/topology-darwin.c] - hwloc_config_prefix[tests/ports/topology-freebsd.c]:hwloc_config_prefix[src/topology-freebsd.c] - hwloc_config_prefix[tests/ports/topology-netbsd.c]:hwloc_config_prefix[src/topology-netbsd.c] - hwloc_config_prefix[tests/ports/topology-hpux.c]:hwloc_config_prefix[src/topology-hpux.c] - hwloc_config_prefix[tests/ports/topology-bgq.c]:hwloc_config_prefix[src/topology-bgq.c] - hwloc_config_prefix[tests/ports/topology-opencl.c]:hwloc_config_prefix[src/topology-opencl.c] - hwloc_config_prefix[tests/ports/topology-cuda.c]:hwloc_config_prefix[src/topology-cuda.c] - hwloc_config_prefix[tests/ports/topology-nvml.c]:hwloc_config_prefix[src/topology-nvml.c] - hwloc_config_prefix[tests/ports/topology-gl.c]:hwloc_config_prefix[src/topology-gl.c]) - ]) -])dnl diff --git a/opal/mca/hwloc/hwloc1113/hwloc/configure.ac b/opal/mca/hwloc/hwloc1113/hwloc/configure.ac deleted file mode 100644 index d4b51b3b05e..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/configure.ac +++ /dev/null @@ -1,242 +0,0 @@ -# -*- shell-script -*- -# -# Copyright © 2009 CNRS -# Copyright © 2009-2015 Inria. All rights reserved. -# Copyright © 2009, 2011-2012 Université Bordeaux -# Copyright © 2009-2014 Cisco Systems, Inc. All rights reserved. -# -# See COPYING in top-level directory. 
-# -# Additional copyrights may follow -# -# $HEADER$ -# - -#################################################################### -# Autoconf, Automake, and Libtool bootstrapping -#################################################################### - -AC_INIT([hwloc], - [m4_normalize(esyscmd([config/hwloc_get_version.sh VERSION --version]))], - [http://www.open-mpi.org/projects/hwloc/], [hwloc]) -AC_PREREQ(2.63) -AC_CONFIG_AUX_DIR(./config) -# Note that this directory must *exactly* match what was specified via -# -I in ACLOCAL_AMFLAGS in the top-level Makefile.am. -AC_CONFIG_MACRO_DIR(./config) - -cat <0 and <3), -# but it is necessary in AM 1.12.x. -m4_ifdef([AM_PROG_AR], [AM_PROG_AR]) - -AC_ARG_VAR(CC_FOR_BUILD,[build system C compiler]) -AS_IF([test -z "$CC_FOR_BUILD"],[ - AC_SUBST([CC_FOR_BUILD], [$CC]) -]) - -#################################################################### -# CLI arguments -#################################################################### - -# Define hwloc's configure arguments -HWLOC_DEFINE_ARGS - -# If debug mode, add -g -AS_IF([test "$hwloc_debug" = "1"], - [CFLAGS="$CFLAGS -g"]) - -# If the user didn't specifically ask for embedding mode, default to -# standalone mode -AS_IF([test "$enable_embedded_mode" != "yes"], - [AS_IF([test ! -d "$srcdir/doc"], - [AC_MSG_WARN([The hwloc source tree looks incomplete for a standalone]) - AC_MSG_WARN([build. Perhaps this hwloc tree is intended for an embedded]) - AC_MSG_WARN([build? 
Try using the --enable-embedded-mode switch.]) - AC_MSG_ERROR([Cannot build standalone hwloc])], - [HWLOC_BUILD_STANDALONE])]) - -#################################################################### -# Setup for the hwloc API -#################################################################### - -# Setup the hwloc core -HWLOC_SETUP_CORE([], [], [AC_MSG_ERROR([Cannot build hwloc core])], [1]) - -# Setup hwloc's docs, utils, and tests -AS_IF([test "$hwloc_mode" = "standalone"], - [HWLOC_SETUP_DOCS - HWLOC_SETUP_UTILS - HWLOC_SETUP_TESTS]) - -cat < -#include -#include -#include -#include - -/* - * Symbol transforms - */ -#include - -/* - * Bitmap definitions - */ - -#include - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_api_version API version - * @{ - */ - -/** \brief Indicate at build time which hwloc API version is being used. */ -#define HWLOC_API_VERSION 0x00010b00 - -/** \brief Indicate at runtime which hwloc API version was used at build time. - * - * Should be ::HWLOC_API_VERSION if running on the same version. - */ -HWLOC_DECLSPEC unsigned hwloc_get_api_version(void); - -/** \brief Current component and plugin ABI version (see hwloc/plugins.h) */ -#define HWLOC_COMPONENT_ABI 4 - -/** @} */ - - - -/** \defgroup hwlocality_object_sets Object Sets (hwloc_cpuset_t and hwloc_nodeset_t) - * - * Hwloc uses bitmaps to represent two distinct kinds of object sets: - * CPU sets (::hwloc_cpuset_t) and NUMA node sets (::hwloc_nodeset_t). - * These types are both typedefs to a common back end type - * (::hwloc_bitmap_t), and therefore all the hwloc bitmap functions - * are applicable to both ::hwloc_cpuset_t and ::hwloc_nodeset_t (see - * \ref hwlocality_bitmap). 
- * - * The rationale for having two different types is that even though - * the actions one wants to perform on these types are the same (e.g., - * enable and disable individual items in the set/mask), they're used - * in very different contexts: one for specifying which processors to - * use and one for specifying which NUMA nodes to use. Hence, the - * name difference is really just to reflect the intent of where the - * type is used. - * - * @{ - */ - -/** \brief A CPU set is a bitmap whose bits are set according to CPU - * physical OS indexes. - * - * It may be consulted and modified with the bitmap API as any - * ::hwloc_bitmap_t (see hwloc/bitmap.h). - * - * Each bit may be converted into a PU object using - * hwloc_get_pu_obj_by_os_index(). - */ -typedef hwloc_bitmap_t hwloc_cpuset_t; -/** \brief A non-modifiable ::hwloc_cpuset_t. */ -typedef hwloc_const_bitmap_t hwloc_const_cpuset_t; - -/** \brief A node set is a bitmap whose bits are set according to NUMA - * memory node physical OS indexes. - * - * It may be consulted and modified with the bitmap API as any - * ::hwloc_bitmap_t (see hwloc/bitmap.h). - * Each bit may be converted into a NUMA node object using - * hwloc_get_numanode_obj_by_os_index(). - * - * When binding memory on a system without any NUMA node - * (when the whole memory is considered as a single memory bank), - * the nodeset may be either empty (no memory selected) - * or full (whole system memory selected). - * - * See also \ref hwlocality_helper_nodeset_convert. - */ -typedef hwloc_bitmap_t hwloc_nodeset_t; -/** \brief A non-modifiable ::hwloc_nodeset_t. - */ -typedef hwloc_const_bitmap_t hwloc_const_nodeset_t; - -/** @} */ - - - -/** \defgroup hwlocality_object_types Object Types - * @{ - */ - -/** \brief Type of topology object. - * - * \note Do not rely on the ordering or completeness of the values as new ones - * may be defined in the future! If you need to compare types, use - * hwloc_compare_types() instead. 
- */ -typedef enum { - /* *************************************************************** - WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING - - If new enum values are added here, you MUST also go update the - obj_type_order[] and obj_order_type[] arrays in src/topology.c. - - WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING - *************************************************************** */ - - HWLOC_OBJ_SYSTEM, /**< \brief Whole system (may be a cluster of machines). - * The whole system that is accessible to hwloc. - * That may comprise several machines in SSI systems - * like Kerrighed. - */ - HWLOC_OBJ_MACHINE, /**< \brief Machine. - * The typical root object type. - * A set of processors and memory with cache - * coherency. - */ - HWLOC_OBJ_NUMANODE, /**< \brief NUMA node. - * A set of processors around memory which the - * processors can directly access. - */ - HWLOC_OBJ_PACKAGE, /**< \brief Physical package, what goes into a socket. - * In the physical meaning, i.e. that you can add - * or remove physically. - */ - HWLOC_OBJ_CACHE, /**< \brief Cache. - * Can be L1i, L1d, L2, L3, ... - */ - HWLOC_OBJ_CORE, /**< \brief Core. - * A computation unit (may be shared by several - * logical processors). - */ - HWLOC_OBJ_PU, /**< \brief Processing Unit, or (Logical) Processor. - * An execution unit (may share a core with some - * other logical processors, e.g. in the case of - * an SMT core). - * - * Objects of this kind are always reported and can - * thus be used as fallback when others are not. - */ - - HWLOC_OBJ_GROUP, /**< \brief Group objects. - * Objects which do not fit in the above but are - * detected by hwloc and are useful to take into - * account for affinity. For instance, some operating systems - * expose their arbitrary processors aggregation this - * way. And hwloc may insert such objects to group - * NUMA nodes according to their distances. - * See also \ref faq_groups. 
- * - * These objects are ignored when they do not bring - * any structure. - */ - - HWLOC_OBJ_MISC, /**< \brief Miscellaneous objects. - * Objects without particular meaning, that can e.g. be - * added by the application for its own use, or by hwloc - * for miscellaneous objects such as MemoryModule (DIMMs). - */ - - HWLOC_OBJ_BRIDGE, /**< \brief Bridge. - * Any bridge that connects the host or an I/O bus, - * to another I/O bus. - * Bridge objects have neither CPU sets nor node sets. - * They are not added to the topology unless I/O discovery - * is enabled with hwloc_topology_set_flags(). - */ - HWLOC_OBJ_PCI_DEVICE, /**< \brief PCI device. - * These objects have neither CPU sets nor node sets. - * They are not added to the topology unless I/O discovery - * is enabled with hwloc_topology_set_flags(). - */ - HWLOC_OBJ_OS_DEVICE, /**< \brief Operating system device. - * These objects have neither CPU sets nor node sets. - * They are not added to the topology unless I/O discovery - * is enabled with hwloc_topology_set_flags(). - */ - - HWLOC_OBJ_TYPE_MAX /**< \private Sentinel value */ - - /* *************************************************************** - WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING - - If new enum values are added here, you MUST also go update the - obj_type_order[] and obj_order_type[] arrays in src/topology.c. - - WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING - *************************************************************** */ -} hwloc_obj_type_t; - -/** \brief Cache type. */ -typedef enum hwloc_obj_cache_type_e { - HWLOC_OBJ_CACHE_UNIFIED, /**< \brief Unified cache. */ - HWLOC_OBJ_CACHE_DATA, /**< \brief Data cache. */ - HWLOC_OBJ_CACHE_INSTRUCTION /**< \brief Instruction cache. - * Only used when the ::HWLOC_TOPOLOGY_FLAG_ICACHES topology flag is set. */ -} hwloc_obj_cache_type_t; - -/** \brief Type of one side (upstream or downstream) of an I/O bridge. 
*/ -typedef enum hwloc_obj_bridge_type_e { - HWLOC_OBJ_BRIDGE_HOST, /**< \brief Host-side of a bridge, only possible upstream. */ - HWLOC_OBJ_BRIDGE_PCI /**< \brief PCI-side of a bridge. */ -} hwloc_obj_bridge_type_t; - -/** \brief Type of a OS device. */ -typedef enum hwloc_obj_osdev_type_e { - HWLOC_OBJ_OSDEV_BLOCK, /**< \brief Operating system block device. - * For instance "sda" on Linux. */ - HWLOC_OBJ_OSDEV_GPU, /**< \brief Operating system GPU device. - * For instance ":0.0" for a GL display, - * "card0" for a Linux DRM device. */ - HWLOC_OBJ_OSDEV_NETWORK, /**< \brief Operating system network device. - * For instance the "eth0" interface on Linux. */ - HWLOC_OBJ_OSDEV_OPENFABRICS, /**< \brief Operating system openfabrics device. - * For instance the "mlx4_0" InfiniBand HCA, - * or "hfi1_0" Omni-Path interface on Linux. */ - HWLOC_OBJ_OSDEV_DMA, /**< \brief Operating system dma engine device. - * For instance the "dma0chan0" DMA channel on Linux. */ - HWLOC_OBJ_OSDEV_COPROC /**< \brief Operating system co-processor device. - * For instance "mic0" for a Xeon Phi (MIC) on Linux, - * "opencl0d0" for a OpenCL device, - * "cuda0" for a CUDA device. */ -} hwloc_obj_osdev_type_t; - -/** \brief Compare the depth of two object types - * - * Types shouldn't be compared as they are, since newer ones may be added in - * the future. This function returns less than, equal to, or greater than zero - * respectively if \p type1 objects usually include \p type2 objects, are the - * same as \p type2 objects, or are included in \p type2 objects. If the types - * can not be compared (because neither is usually contained in the other), - * ::HWLOC_TYPE_UNORDERED is returned. Object types containing CPUs can always - * be compared (usually, a system contains machines which contain nodes which - * contain packages which contain caches, which contain cores, which contain - * processors). - * - * \note ::HWLOC_OBJ_PU will always be the deepest. 
- * \note This does not mean that the actual topology will respect that order: - * e.g. as of today cores may also contain caches, and packages may also contain - * nodes. This is thus just to be seen as a fallback comparison method. - */ -HWLOC_DECLSPEC int hwloc_compare_types (hwloc_obj_type_t type1, hwloc_obj_type_t type2) __hwloc_attribute_const; - -enum hwloc_compare_types_e { - HWLOC_TYPE_UNORDERED = INT_MAX /**< \brief Value returned by hwloc_compare_types() when types can not be compared. \hideinitializer */ -}; - -/** @} */ - - - -/** \defgroup hwlocality_objects Object Structure and Attributes - * @{ - */ - -union hwloc_obj_attr_u; - -/** \brief Object memory */ -struct hwloc_obj_memory_s { - hwloc_uint64_t total_memory; /**< \brief Total memory (in bytes) in this object and its children */ - hwloc_uint64_t local_memory; /**< \brief Local memory (in bytes) */ - - /** \brief Size of array \p page_types */ - unsigned page_types_len; - /** \brief Array of local memory page types, \c NULL if no local memory and \p page_types is 0. - * - * The array is sorted by increasing \p size fields. - * It contains \p page_types_len slots. - */ - struct hwloc_obj_memory_page_type_s { - hwloc_uint64_t size; /**< \brief Size of pages */ - hwloc_uint64_t count; /**< \brief Number of pages of this size */ - } * page_types; -}; - -/** \brief Structure of a topology object - * - * Applications must not modify any field except hwloc_obj.userdata. - */ -struct hwloc_obj { - /* physical information */ - hwloc_obj_type_t type; /**< \brief Type of object */ - - unsigned os_index; /**< \brief OS-provided physical index number. - * It is not guaranteed unique across the entire machine, - * except for PUs and NUMA nodes. - */ - char *name; /**< \brief Object-specific name if any. - * Mostly used for identifying OS devices and Misc objects where - * a name string is more useful than numerical indexes. 
- */ - - struct hwloc_obj_memory_s memory; /**< \brief Memory attributes */ - - union hwloc_obj_attr_u *attr; /**< \brief Object type-specific Attributes, - * may be \c NULL if no attribute value was found */ - - /* global position */ - unsigned depth; /**< \brief Vertical index in the hierarchy. - * If the topology is symmetric, this is equal to the - * parent depth plus one, and also equal to the number - * of parent/child links from the root object to here. - */ - unsigned logical_index; /**< \brief Horizontal index in the whole list of similar objects, - * hence guaranteed unique across the entire machine. - * Could be a "cousin_rank" since it's the rank within the "cousin" list below - */ - signed os_level; /**< \brief OS-provided physical level, -1 if unknown or meaningless */ - - /* cousins are all objects of the same type (and depth) across the entire topology */ - struct hwloc_obj *next_cousin; /**< \brief Next object of same type and depth */ - struct hwloc_obj *prev_cousin; /**< \brief Previous object of same type and depth */ - - /* children of the same parent are siblings, even if they may have different type and depth */ - struct hwloc_obj *parent; /**< \brief Parent, \c NULL if root (system object) */ - unsigned sibling_rank; /**< \brief Index in parent's \c children[] array */ - struct hwloc_obj *next_sibling; /**< \brief Next object below the same parent */ - struct hwloc_obj *prev_sibling; /**< \brief Previous object below the same parent */ - - /* children array below this object */ - unsigned arity; /**< \brief Number of children */ - struct hwloc_obj **children; /**< \brief Children, \c children[0 .. arity -1] */ - struct hwloc_obj *first_child; /**< \brief First child */ - struct hwloc_obj *last_child; /**< \brief Last child */ - - /* misc */ - void *userdata; /**< \brief Application-given private data pointer, - * initialized to \c NULL, use it as you wish. 
- * See hwloc_topology_set_userdata_export_callback() - * if you wish to export this field to XML. */ - - /* cpusets and nodesets */ - hwloc_cpuset_t cpuset; /**< \brief CPUs covered by this object - * - * This is the set of CPUs for which there are PU objects in the topology - * under this object, i.e. which are known to be physically contained in this - * object and known how (the children path between this object and the PU - * objects). - * - * If the ::HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM configuration flag is set, some of - * these CPUs may be offline, or not allowed for binding, see online_cpuset - * and allowed_cpuset. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. - */ - hwloc_cpuset_t complete_cpuset; /**< \brief The complete CPU set of logical processors of this object, - * - * This includes not only the same as the cpuset field, but also the CPUs for - * which topology information is unknown or incomplete, and the CPUs that are - * ignored when the ::HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag is not set. - * Thus no corresponding PU object may be found in the topology, because the - * precise position is undefined. It is however known that it would be somewhere - * under this object. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. - */ - hwloc_cpuset_t online_cpuset; /**< \brief The CPU set of online logical processors - * - * This includes the CPUs contained in this object that are online, i.e. draw - * power and can execute threads. It may however not be allowed to bind to - * them due to administration rules, see allowed_cpuset. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. - */ - hwloc_cpuset_t allowed_cpuset; /**< \brief The CPU set of allowed logical processors - * - * This includes the CPUs contained in this object which are allowed for - * binding, i.e. passing them to the hwloc binding functions should not return - * permission errors. 
This is usually restricted by administration rules. - * Some of them may however be offline so binding to them may still not be - * possible, see online_cpuset. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. - */ - - hwloc_nodeset_t nodeset; /**< \brief NUMA nodes covered by this object or containing this object - * - * This is the set of NUMA nodes for which there are NUMA node objects in the - * topology under or above this object, i.e. which are known to be physically - * contained in this object or containing it and known how (the children path - * between this object and the NUMA node objects). - * - * In the end, these nodes are those that are close to the current object. - * - * If the ::HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM configuration flag is set, some of - * these nodes may not be allowed for allocation, see allowed_nodeset. - * - * If there are no NUMA nodes in the machine, all the memory is close to this - * object, so \p nodeset is full. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. - */ - hwloc_nodeset_t complete_nodeset; /**< \brief The complete NUMA node set of this object, - * - * This includes not only the same as the nodeset field, but also the NUMA - * nodes for which topology information is unknown or incomplete, and the nodes - * that are ignored when the ::HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM flag is not set. - * Thus no corresponding NUMA node object may be found in the topology, because the - * precise position is undefined. It is however known that it would be - * somewhere under this object. - * - * If there are no NUMA nodes in the machine, all the memory is close to this - * object, so \p complete_nodeset is full. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. 
- */ - hwloc_nodeset_t allowed_nodeset; /**< \brief The set of allowed NUMA memory nodes - * - * This includes the NUMA memory nodes contained in this object which are - * allowed for memory allocation, i.e. passing them to NUMA node-directed - * memory allocation should not return permission errors. This is usually - * restricted by administration rules. - * - * If there are no NUMA nodes in the machine, all the memory is close to this - * object, so \p allowed_nodeset is full. - * - * \note Its value must not be changed, hwloc_bitmap_dup() must be used instead. - */ - - struct hwloc_distances_s **distances; /**< \brief Distances between all objects at same depth below this object */ - unsigned distances_count; - - struct hwloc_obj_info_s *infos; /**< \brief Array of stringified info type=name. */ - unsigned infos_count; /**< \brief Size of infos array. */ - - int symmetric_subtree; /**< \brief Set if the subtree of objects below this object is symmetric, - * which means all children and their children have identical subtrees. - * If set in the topology root object, lstopo may export the topology - * as a synthetic string. - */ -}; -/** - * \brief Convenience typedef; a pointer to a struct hwloc_obj. - */ -typedef struct hwloc_obj * hwloc_obj_t; - -/** \brief Object type-specific Attributes */ -union hwloc_obj_attr_u { - /** \brief Cache-specific Object Attributes */ - struct hwloc_cache_attr_s { - hwloc_uint64_t size; /**< \brief Size of cache in bytes */ - unsigned depth; /**< \brief Depth of cache (e.g., L1, L2, ...etc.) */ - unsigned linesize; /**< \brief Cache-line size in bytes. 
0 if unknown */ - int associativity; /**< \brief Ways of associativity, - * -1 if fully associative, 0 if unknown */ - hwloc_obj_cache_type_t type; /**< \brief Cache type */ - } cache; - /** \brief Group-specific Object Attributes */ - struct hwloc_group_attr_s { - unsigned depth; /**< \brief Depth of group object */ - } group; - /** \brief PCI Device specific Object Attributes */ - struct hwloc_pcidev_attr_s { - unsigned short domain; - unsigned char bus, dev, func; - unsigned short class_id; - unsigned short vendor_id, device_id, subvendor_id, subdevice_id; - unsigned char revision; - float linkspeed; /* in GB/s */ - } pcidev; - /** \brief Bridge specific Object Attribues */ - struct hwloc_bridge_attr_s { - union { - struct hwloc_pcidev_attr_s pci; - } upstream; - hwloc_obj_bridge_type_t upstream_type; - union { - struct { - unsigned short domain; - unsigned char secondary_bus, subordinate_bus; - } pci; - } downstream; - hwloc_obj_bridge_type_t downstream_type; - unsigned depth; - } bridge; - /** \brief OS Device specific Object Attributes */ - struct hwloc_osdev_attr_s { - hwloc_obj_osdev_type_t type; - } osdev; -}; - -/** \brief Distances between objects - * - * One object may contain a distance structure describing distances - * between all its descendants at a given relative depth. If the - * containing object is the root object of the topology, then the - * distances are available for all objects in the machine. - * - * If the \p latency pointer is not \c NULL, the pointed array contains - * memory latencies (non-zero values), see below. - * - * In the future, some other types of distances may be considered. - * In these cases, \p latency may be \c NULL. - */ -struct hwloc_distances_s { - unsigned relative_depth; /**< \brief Relative depth of the considered objects - * below the object containing this distance information. */ - unsigned nbobjs; /**< \brief Number of objects considered in the matrix. 
- * It is the number of descendant objects at \p relative_depth - * below the containing object. - * It corresponds to the result of hwloc_get_nbobjs_inside_cpuset_by_depth(). */ - - float *latency; /**< \brief Matrix of latencies between objects, stored as a one-dimensional array. - * May be \c NULL if the distances considered here are not latencies. - * - * Unless defined by the user, this currently contains latencies - * between NUMA nodes (as reported in the System Locality Distance Information Table - * (SLIT) in the ACPI specification), which may or may not be accurate. - * It corresponds to the latency for accessing the memory of one node - * from a core in another node. - * - * Values are normalized to get 1.0 as the minimal value in the matrix. - * Latency from i-th to j-th object is stored in slot i*nbobjs+j. - */ - float latency_max; /**< \brief The maximal value in the latency matrix. */ - float latency_base; /**< \brief The multiplier that should be applied to the latency matrix - * to retrieve the original OS-provided latencies. - * Usually 10 on Linux since ACPI SLIT uses 10 for local latency. - */ -}; - -/** \brief Object info - * - * \sa hwlocality_info_attr - */ -struct hwloc_obj_info_s { - char *name; /**< \brief Info name */ - char *value; /**< \brief Info value */ -}; - -/** @} */ - - - -/** \defgroup hwlocality_creation Topology Creation and Destruction - * @{ - */ - -struct hwloc_topology; -/** \brief Topology context - * - * To be initialized with hwloc_topology_init() and built with hwloc_topology_load(). - */ -typedef struct hwloc_topology * hwloc_topology_t; - -/** \brief Allocate a topology context. - * - * \param[out] topologyp is assigned a pointer to the newly allocated context. - * - * \return 0 on success, -1 on error.
- */ -HWLOC_DECLSPEC int hwloc_topology_init (hwloc_topology_t *topologyp); - -/** \brief Build the actual topology - * - * Build the actual topology once initialized with hwloc_topology_init() and - * tuned with \ref hwlocality_configuration routines. - * No other routine may be called earlier using this topology context. - * - * \param topology is the topology to be loaded with objects. - * - * \return 0 on success, -1 on error. - * - * \note On failure, the topology is reinitialized. It should be either - * destroyed with hwloc_topology_destroy() or configured and loaded again. - * - * \note This function may be called only once per topology. - * - * \sa hwlocality_configuration - */ -HWLOC_DECLSPEC int hwloc_topology_load(hwloc_topology_t topology); - -/** \brief Terminate and free a topology context - * - * \param topology is the topology to be freed - */ -HWLOC_DECLSPEC void hwloc_topology_destroy (hwloc_topology_t topology); - -/** \brief Duplicate a topology. - * - * The entire topology structure as well as its objects - * are duplicated into a new one. - * - * This is useful for keeping a backup while modifying a topology. - * - * \note Object userdata is not duplicated since hwloc does not know what it points to. - * The objects of both old and new topologies will point to the same userdata. - */ -HWLOC_DECLSPEC int hwloc_topology_dup(hwloc_topology_t *newtopology, hwloc_topology_t oldtopology); - -/** \brief Run internal checks on a topology structure - * - * The program aborts if an inconsistency is detected in the given topology. - * - * \param topology is the topology to be checked - * - * \note This routine is only useful to developers. - * - * \note The input topology should have been previously loaded with - * hwloc_topology_load().
- */ -HWLOC_DECLSPEC void hwloc_topology_check(hwloc_topology_t topology); - -/** @} */ - - - -/** \defgroup hwlocality_configuration Topology Detection Configuration and Query - * - * Several functions can optionally be called between hwloc_topology_init() and - * hwloc_topology_load() to configure how the detection should be performed, - * e.g. to ignore some object types, define a synthetic topology, etc. - * - * If none of them is called, the default is to detect all the objects of the - * machine that the caller is allowed to access. - * - * This default behavior may also be modified through environment variables - * if the application did not modify it already. - * Setting HWLOC_XMLFILE in the environment enforces the discovery from an XML - * file as if hwloc_topology_set_xml() had been called. - * HWLOC_FSROOT switches to reading the topology from the specified Linux - * filesystem root as if hwloc_topology_set_fsroot() had been called. - * Finally, HWLOC_THISSYSTEM enforces the return value of - * hwloc_topology_is_thissystem(). - * - * @{ - */ - -/** \brief Ignore an object type. - * - * Ignore all objects from the given type. - * The bottom-level type ::HWLOC_OBJ_PU may not be ignored. - * The top-level object of the hierarchy will never be ignored, even if this function - * succeeds. - * Group objects are always ignored if they do not bring any structure - * since they are designed to add structure to the topology. - * I/O objects may not be ignored; topology flags should be used to configure - * their discovery instead. - */ -HWLOC_DECLSPEC int hwloc_topology_ignore_type(hwloc_topology_t topology, hwloc_obj_type_t type); - -/** \brief Ignore an object type if it does not bring any structure. - * - * Ignore all objects from the given type as long as they do not bring any structure: - * Each ignored object should have a single child or be the only child of its parent. - * The bottom-level type ::HWLOC_OBJ_PU may not be ignored.
- * I/O objects may not be ignored; topology flags should be used to configure - * their discovery instead. - */ -HWLOC_DECLSPEC int hwloc_topology_ignore_type_keep_structure(hwloc_topology_t topology, hwloc_obj_type_t type); - -/** \brief Ignore all objects that do not bring any structure. - * - * Ignore all objects that do not bring any structure: - * This is equivalent to calling hwloc_topology_ignore_type_keep_structure() - * for all object types. - */ -HWLOC_DECLSPEC int hwloc_topology_ignore_all_keep_structure(hwloc_topology_t topology); - -/** \brief Flags to be set onto a topology context before load. - * - * Flags should be given to hwloc_topology_set_flags(). - * They may also be returned by hwloc_topology_get_flags(). - */ -enum hwloc_topology_flags_e { - /** \brief Detect the whole system, ignore reservations and offline settings. - * - * Gather all resources, even if some were disabled by the administrator. - * For instance, ignore Linux Cgroup/Cpusets and gather all processors and memory nodes, - * and ignore the fact that some resources may be offline. - * - * When this flag is not set, PUs that are disallowed are not added to the topology. - * Parent objects (package, core, cache, etc.) are added only if some of their children are allowed. - * NUMA nodes are always added but their available memory is set to 0 when disallowed. - * \hideinitializer - */ - HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM = (1UL<<0), - - /** \brief Assume that the selected backend provides the topology for the - * system on which we are running. - * - * This forces hwloc_topology_is_thissystem() to return 1, i.e. makes hwloc assume that - * the selected backend provides the topology for the system on which we are running, - * even if it is not the OS-specific backend but the XML backend for instance.
- * This means making the binding functions actually call the OS-specific - * system calls and really do binding, while the XML backend would otherwise - * provide empty hooks just returning success. - * - * Setting the environment variable HWLOC_THISSYSTEM may also result in the - * same behavior. - * - * This can be used for efficiency reasons to first detect the topology once, - * save it to an XML file, and quickly reload it later through the XML - * backend, but still having binding functions actually do bind. - * \hideinitializer - */ - HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM = (1UL<<1), - - /** \brief Detect PCI devices. - * - * By default, I/O devices are ignored. This flag enables I/O device - * detection using the pci backend. Only the common PCI devices (GPUs, - * NICs, block devices, ...) and host bridges (objects that connect the host - * objects to an I/O subsystem) will be added to the topology. - * Additionally it also enables MemoryModule misc objects. - * Uncommon devices and other bridges (such as PCI-to-PCI bridges) will be - * ignored. - * \hideinitializer - */ - HWLOC_TOPOLOGY_FLAG_IO_DEVICES = (1UL<<2), - - /** \brief Detect PCI bridges. - * - * This flag should be combined with ::HWLOC_TOPOLOGY_FLAG_IO_DEVICES to enable - * the detection of both common devices and of all useful bridges (bridges that - * have at least one device behind them). - * \hideinitializer - */ - HWLOC_TOPOLOGY_FLAG_IO_BRIDGES = (1UL<<3), - - /** \brief Detect the whole PCI hierarchy. - * - * This flag enables detection of all I/O devices (even the uncommon ones - * such as DMA channels) and bridges (even those that have no device behind - * them) using the pci backend. - * This implies ::HWLOC_TOPOLOGY_FLAG_IO_DEVICES. - * \hideinitializer - */ - HWLOC_TOPOLOGY_FLAG_WHOLE_IO = (1UL<<4), - - /** \brief Detect instruction caches. - * - * This flag enables detection of Instruction caches, - * instead of only Data and Unified caches. 
- * \hideinitializer - */ - HWLOC_TOPOLOGY_FLAG_ICACHES = (1UL<<5) -}; - -/** \brief Set OR'ed flags on a not-yet-loaded topology. - * - * Set an OR'ed set of ::hwloc_topology_flags_e onto a topology that was not yet loaded. - * - * If this function is called multiple times, the last invocation will erase - * and replace the set of flags that was previously set. - * - * The flags set in a topology may be retrieved with hwloc_topology_get_flags(). - */ -HWLOC_DECLSPEC int hwloc_topology_set_flags (hwloc_topology_t topology, unsigned long flags); - -/** \brief Get OR'ed flags of a topology. - * - * Get the OR'ed set of ::hwloc_topology_flags_e of a topology. - * - * \return the flags previously set with hwloc_topology_set_flags(). - */ -HWLOC_DECLSPEC unsigned long hwloc_topology_get_flags (hwloc_topology_t topology); - -/** \brief Change which process the topology is viewed from - * - * On some systems, processes may have different views of the machine, for - * instance the set of allowed CPUs. By default, hwloc exposes the view from - * the current process. Calling hwloc_topology_set_pid() makes it - * expose the topology of the machine from the point of view of another - * process. - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - * - * \note -1 is returned and errno is set to ENOSYS on platforms that do not - * support this feature. - */ -HWLOC_DECLSPEC int hwloc_topology_set_pid(hwloc_topology_t __hwloc_restrict topology, hwloc_pid_t pid); - -/** \brief Change the file-system root path when building the topology from sysfs/procfs. - * - * On Linux systems, use sysfs and procfs files as if they were mounted on the given - * \p fsroot_path instead of the main file-system root. Setting the environment - * variable HWLOC_FSROOT may also result in this behavior. - * Not using the main file-system root causes hwloc_topology_is_thissystem() - * to return 0.
- * - * Note that this function does not actually load topology - * information; it just tells hwloc where to load it from. You'll - * still need to invoke hwloc_topology_load() to actually load the - * topology information. - * - * \return -1 with errno set to ENOSYS on non-Linux and on Linux systems that - * do not support it. - * \return -1 with the appropriate errno if \p fsroot_path cannot be used. - * - * \note For convenience, this backend provides empty binding hooks which just - * return success. To have hwloc still actually call OS-specific hooks, the - * ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded - * file is really the underlying system. - * - * \note On success, the Linux component replaces the previously enabled - * component (if any), but the topology is not actually modified until - * hwloc_topology_load(). - */ -HWLOC_DECLSPEC int hwloc_topology_set_fsroot(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict fsroot_path); - -/** \brief Enable synthetic topology. - * - * Gather topology information from the given \p description, - * a space-separated string of numbers describing - * the arity of each level. - * Each number may be prefixed with a type and a colon to enforce the type - * of a level. If only some level types are enforced, hwloc will try to - * choose the other types according to usual topologies, but it may fail - * and you may have to specify more level types manually. - * See also the \ref synthetic. - * - * If \p description was properly parsed and describes a valid topology - * configuration, this function returns 0. - * Otherwise -1 is returned and errno is set to EINVAL. - * - * Note that this function does not actually load topology - * information; it just tells hwloc where to load it from. You'll - * still need to invoke hwloc_topology_load() to actually load the - * topology information. 
- * - * \note For convenience, this backend provides empty binding hooks which just - * return success. - * - * \note On success, the synthetic component replaces the previously enabled - * component (if any), but the topology is not actually modified until - * hwloc_topology_load(). - */ -HWLOC_DECLSPEC int hwloc_topology_set_synthetic(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict description); - -/** \brief Enable XML-file based topology. - * - * Gather topology information from the XML file given at \p xmlpath. - * Setting the environment variable HWLOC_XMLFILE may also result in this behavior. - * This file may have been generated earlier with hwloc_topology_export_xml() - * or lstopo file.xml. - * - * Note that this function does not actually load topology - * information; it just tells hwloc where to load it from. You'll - * still need to invoke hwloc_topology_load() to actually load the - * topology information. - * - * \return -1 with errno set to EINVAL on failure to read the XML file. - * - * \note See also hwloc_topology_set_userdata_import_callback() - * for importing application-specific object userdata. - * - * \note For convenience, this backend provides empty binding hooks which just - * return success. To have hwloc still actually call OS-specific hooks, the - * ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded - * file is really the underlying system. - * - * \note On success, the XML component replaces the previously enabled - * component (if any), but the topology is not actually modified until - * hwloc_topology_load(). - */ -HWLOC_DECLSPEC int hwloc_topology_set_xml(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict xmlpath); - -/** \brief Enable XML based topology using a memory buffer (instead of - * a file, as with hwloc_topology_set_xml()). - * - * Gather topology information from the XML memory buffer given at \p - * buffer and of length \p size. 
This buffer may have been filled - * earlier with hwloc_topology_export_xmlbuffer(). - * - * Note that this function does not actually load topology - * information; it just tells hwloc where to load it from. You'll - * still need to invoke hwloc_topology_load() to actually load the - * topology information. - * - * \return -1 with errno set to EINVAL on failure to read the XML buffer. - * - * \note See also hwloc_topology_set_userdata_import_callback() - * for importing application-specific object userdata. - * - * \note For convenience, this backend provides empty binding hooks which just - * return success. To have hwloc still actually call OS-specific hooks, the - * ::HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded - * file is really the underlying system. - * - * \note On success, the XML component replaces the previously enabled - * component (if any), but the topology is not actually modified until - * hwloc_topology_load(). - */ -HWLOC_DECLSPEC int hwloc_topology_set_xmlbuffer(hwloc_topology_t __hwloc_restrict topology, const char * __hwloc_restrict buffer, int size); - -/** \brief Prepare the topology for custom assembly. - * - * The topology then contains a single root object. - * It must then be built by inserting other topologies with - * hwloc_custom_insert_topology() or single objects with - * hwloc_custom_insert_group_object_by_parent(). - * hwloc_topology_load() must be called to finalize the new - * topology as usual. - * - * \note If nothing is inserted in the topology, - * hwloc_topology_load() will fail with errno set to EINVAL. - * - * \note The cpuset and nodeset of the root object are NULL because - * these sets are meaningless when assembling multiple topologies. - * - * \note On success, the custom component replaces the previously enabled - * component (if any), but the topology is not actually modified until - * hwloc_topology_load(). 
- */ -HWLOC_DECLSPEC int hwloc_topology_set_custom(hwloc_topology_t topology); - -/** \brief Provide a distance matrix. - * - * Provide the matrix of distances between a set of objects of the given type. - * \p nbobjs must be at least 2. - * The set may or may not contain all the existing objects of this type. - * The objects are specified by their OS/physical index in the \p os_index - * array. The \p distances matrix follows the same order. - * The distance from object i to object j is stored in slot i*nbobjs+j. - * - * A single latency matrix may be defined for each type. - * If another distance matrix already exists for the given type, - * either because the user specified it or because the OS offers it, - * it will be replaced by the given one. - * If \p nbobjs is \c 0, \p os_index is \c NULL and \p distances is \c NULL, - * the existing distance matrix for the given type is removed. - * - * \note Distance matrices are ignored in multi-node topologies. - */ -HWLOC_DECLSPEC int hwloc_topology_set_distance_matrix(hwloc_topology_t __hwloc_restrict topology, - hwloc_obj_type_t type, unsigned nbobjs, - unsigned *os_index, float *distances); - -/** \brief Does the topology context come from this system? - * - * \return 1 if this topology context was built using the system - * running this program. - * \return 0 otherwise (for instance if using another file-system root, - * an XML topology file, or a synthetic topology). - */ -HWLOC_DECLSPEC int hwloc_topology_is_thissystem(hwloc_topology_t __hwloc_restrict topology) __hwloc_attribute_pure; - -/** \brief Flags describing actual discovery support for this topology. */ -struct hwloc_topology_discovery_support { - /** \brief Detecting the number of PU objects is supported. */ - unsigned char pu; -}; - -/** \brief Flags describing actual PU binding support for this topology. - * - * A flag may be set even if the feature isn't supported in all cases - * (e.g. binding to random sets of non-contiguous objects).
- */ -struct hwloc_topology_cpubind_support { - /** Binding the whole current process is supported. */ - unsigned char set_thisproc_cpubind; - /** Getting the binding of the whole current process is supported. */ - unsigned char get_thisproc_cpubind; - /** Binding a whole given process is supported. */ - unsigned char set_proc_cpubind; - /** Getting the binding of a whole given process is supported. */ - unsigned char get_proc_cpubind; - /** Binding the current thread only is supported. */ - unsigned char set_thisthread_cpubind; - /** Getting the binding of the current thread only is supported. */ - unsigned char get_thisthread_cpubind; - /** Binding a given thread only is supported. */ - unsigned char set_thread_cpubind; - /** Getting the binding of a given thread only is supported. */ - unsigned char get_thread_cpubind; - /** Getting the last processors where the whole current process ran is supported */ - unsigned char get_thisproc_last_cpu_location; - /** Getting the last processors where a whole process ran is supported */ - unsigned char get_proc_last_cpu_location; - /** Getting the last processors where the current thread ran is supported */ - unsigned char get_thisthread_last_cpu_location; -}; - -/** \brief Flags describing actual memory binding support for this topology. - * - * A flag may be set even if the feature isn't supported in all cases - * (e.g. binding to random sets of non-contiguous objects). - */ -struct hwloc_topology_membind_support { - /** Binding the whole current process is supported. */ - unsigned char set_thisproc_membind; - /** Getting the binding of the whole current process is supported. */ - unsigned char get_thisproc_membind; - /** Binding a whole given process is supported. */ - unsigned char set_proc_membind; - /** Getting the binding of a whole given process is supported. */ - unsigned char get_proc_membind; - /** Binding the current thread only is supported. 
*/ - unsigned char set_thisthread_membind; - /** Getting the binding of the current thread only is supported. */ - unsigned char get_thisthread_membind; - /** Binding a given memory area is supported. */ - unsigned char set_area_membind; - /** Getting the binding of a given memory area is supported. */ - unsigned char get_area_membind; - /** Allocating a bound memory area is supported. */ - unsigned char alloc_membind; - /** First-touch policy is supported. */ - unsigned char firsttouch_membind; - /** Bind policy is supported. */ - unsigned char bind_membind; - /** Interleave policy is supported. */ - unsigned char interleave_membind; - /** Replication policy is supported. */ - unsigned char replicate_membind; - /** Next-touch migration policy is supported. */ - unsigned char nexttouch_membind; - /** Migration flags are supported. */ - unsigned char migrate_membind; - /** Getting the last NUMA nodes where a memory area was allocated is supported. */ - unsigned char get_area_memlocation; -}; - -/** \brief Set of flags describing actual support for this topology. - * - * This is retrieved with hwloc_topology_get_support() and will be valid until - * the topology object is destroyed. Note: the values are correct only after - * discovery. - */ -struct hwloc_topology_support { - struct hwloc_topology_discovery_support *discovery; - struct hwloc_topology_cpubind_support *cpubind; - struct hwloc_topology_membind_support *membind; -}; - -/** \brief Retrieve the topology support. - * - * Each flag indicates whether a feature is supported. - * If set to 0, the feature is not supported. - * If set to 1, the feature is supported, but the corresponding - * call may still fail in some corner cases. - * - * These features are also listed by hwloc-info \--support - */ -HWLOC_DECLSPEC const struct hwloc_topology_support *hwloc_topology_get_support(hwloc_topology_t __hwloc_restrict topology); - -/** \brief Set the topology-specific userdata pointer.
- * - * Each topology may store one application-given private data pointer. - * It is initialized to \c NULL. - * hwloc will never modify it. - * - * Use it as you wish, after hwloc_topology_init() and until hwloc_topology_destroy(). - * - * This pointer is not exported to XML. - */ -HWLOC_DECLSPEC void hwloc_topology_set_userdata(hwloc_topology_t topology, const void *userdata); - -/** \brief Retrieve the topology-specific userdata pointer. - * - * Retrieve the application-given private data pointer that was - * previously set with hwloc_topology_set_userdata(). - */ -HWLOC_DECLSPEC void * hwloc_topology_get_userdata(hwloc_topology_t topology); - -/** @} */ - - - -/** \defgroup hwlocality_levels Object levels, depths and types - * @{ - * - * Be sure to see the figure in \ref termsanddefs that shows a - * complete topology tree, including depths, child/sibling/cousin - * relationships, and an example of an asymmetric topology where one - * package has fewer caches than its peers. - */ - -/** \brief Get the depth of the hierarchical tree of objects. - * - * This is the depth of ::HWLOC_OBJ_PU objects plus one. - */ -HWLOC_DECLSPEC unsigned hwloc_topology_get_depth(hwloc_topology_t __hwloc_restrict topology) __hwloc_attribute_pure; - -/** \brief Returns the depth of objects of type \p type. - * - * If no object of this type is present on the underlying architecture, or if - * the OS doesn't provide this kind of information, the function returns - * ::HWLOC_TYPE_DEPTH_UNKNOWN. - * - * If type is absent but a similar type is acceptable, see also - * hwloc_get_type_or_below_depth() and hwloc_get_type_or_above_depth(). - * - * If some objects of the given type exist in different levels, - * for instance L1 and L2 caches, or L1i and L1d caches, - * the function returns ::HWLOC_TYPE_DEPTH_MULTIPLE. - * See hwloc_get_cache_type_depth() in hwloc/helper.h to better handle this - * case.
- * - * If an I/O object type is given, the function returns a virtual value - * because I/O objects are stored in special levels that are not CPU-related. - * This virtual depth may be passed to other hwloc functions such as - * hwloc_get_obj_by_depth() but it should not be considered as an actual - * depth by the application. In particular, it should not be compared with - * any other object depth or with the entire topology depth. - */ -HWLOC_DECLSPEC int hwloc_get_type_depth (hwloc_topology_t topology, hwloc_obj_type_t type); - -enum hwloc_get_type_depth_e { - HWLOC_TYPE_DEPTH_UNKNOWN = -1, /**< \brief No object of given type exists in the topology. \hideinitializer */ - HWLOC_TYPE_DEPTH_MULTIPLE = -2, /**< \brief Objects of given type exist at different depth in the topology. \hideinitializer */ - HWLOC_TYPE_DEPTH_BRIDGE = -3, /**< \brief Virtual depth for bridge object level. \hideinitializer */ - HWLOC_TYPE_DEPTH_PCI_DEVICE = -4, /**< \brief Virtual depth for PCI device object level. \hideinitializer */ - HWLOC_TYPE_DEPTH_OS_DEVICE = -5 /**< \brief Virtual depth for software device object level. \hideinitializer */ -}; - -/** \brief Returns the depth of objects of type \p type or below - * - * If no object of this type is present on the underlying architecture, the - * function returns the depth of the first "present" object typically found - * inside \p type. - * - * If some objects of the given type exist in different levels, for instance - * L1 and L2 caches, the function returns ::HWLOC_TYPE_DEPTH_MULTIPLE. - */ -static __hwloc_inline int -hwloc_get_type_or_below_depth (hwloc_topology_t topology, hwloc_obj_type_t type) __hwloc_attribute_pure; - -/** \brief Returns the depth of objects of type \p type or above - * - * If no object of this type is present on the underlying architecture, the - * function returns the depth of the first "present" object typically - * containing \p type. 
- * - * If some objects of the given type exist in different levels, for instance - * L1 and L2 caches, the function returns ::HWLOC_TYPE_DEPTH_MULTIPLE. - */ -static __hwloc_inline int -hwloc_get_type_or_above_depth (hwloc_topology_t topology, hwloc_obj_type_t type) __hwloc_attribute_pure; - -/** \brief Returns the type of objects at depth \p depth. - * - * \p depth should be between 0 and hwloc_topology_get_depth()-1. - * - * \return -1 if depth \p depth does not exist. - */ -HWLOC_DECLSPEC hwloc_obj_type_t hwloc_get_depth_type (hwloc_topology_t topology, unsigned depth) __hwloc_attribute_pure; - -/** \brief Returns the width of level at depth \p depth. - */ -HWLOC_DECLSPEC unsigned hwloc_get_nbobjs_by_depth (hwloc_topology_t topology, unsigned depth) __hwloc_attribute_pure; - -/** \brief Returns the width of level type \p type - * - * If no object for that type exists, 0 is returned. - * If there are several levels with objects of that type, -1 is returned. - */ -static __hwloc_inline int -hwloc_get_nbobjs_by_type (hwloc_topology_t topology, hwloc_obj_type_t type) __hwloc_attribute_pure; - -/** \brief Returns the top-object of the topology-tree. - * - * Its type is typically ::HWLOC_OBJ_MACHINE but it could be different - * for complex topologies. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_root_obj (hwloc_topology_t topology) __hwloc_attribute_pure; - -/** \brief Returns the topology object at logical index \p idx from depth \p depth */ -HWLOC_DECLSPEC hwloc_obj_t hwloc_get_obj_by_depth (hwloc_topology_t topology, unsigned depth, unsigned idx) __hwloc_attribute_pure; - -/** \brief Returns the topology object at logical index \p idx with type \p type - * - * If no object for that type exists, \c NULL is returned. - * If there are several levels with objects of that type, \c NULL is returned - * and the caller may fall back to hwloc_get_obj_by_depth().
- */ -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_by_type (hwloc_topology_t topology, hwloc_obj_type_t type, unsigned idx) __hwloc_attribute_pure; - -/** \brief Returns the next object at depth \p depth. - * - * If \p prev is \c NULL, return the first object at depth \p depth. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_obj_by_depth (hwloc_topology_t topology, unsigned depth, hwloc_obj_t prev); - -/** \brief Returns the next object of type \p type. - * - * If \p prev is \c NULL, return the first object of type \p type. If - * there are multiple depths or no depth for the given type, return \c NULL and - * let the caller fall back to hwloc_get_next_obj_by_depth(). - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_obj_by_type (hwloc_topology_t topology, hwloc_obj_type_t type, - hwloc_obj_t prev); - -/** @} */ - - - -/** \defgroup hwlocality_object_strings Converting between Object Types, Sets and Attributes, and Strings - * @{ - */ - -/** \brief Return a constant stringified object type. - * - * This function is the basic way to convert a generic type into a string. - * - * hwloc_obj_type_snprintf() may return a more precise output for a specific - * object, but it requires the caller to provide the output buffer. - */ -HWLOC_DECLSPEC const char * hwloc_obj_type_string (hwloc_obj_type_t type) __hwloc_attribute_const; - -/** \brief Stringify the type of a given topology object into a human-readable form. - * - * Contrary to hwloc_obj_type_string(), this function includes object-specific - * attributes (such as the Group depth, the Bridge type, or OS device type) - * in the output, and it requires the caller to provide the output buffer. - * - * The output is guaranteed to be the same for all objects of the same topology level. - * - * If \p size is 0, \p string may safely be \c NULL. - * - * \return the number of characters that were actually written if not truncating, - * or that would have been written (not including the ending \\0).
- */ -HWLOC_DECLSPEC int hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, - int verbose); - -/** \brief Stringify the attributes of a given topology object into a human-readable form. - * - * Attribute values are separated by \p separator. - * - * Only the major attributes are printed in non-verbose mode. - * - * If \p size is 0, \p string may safely be \c NULL. - * - * \return the number of characters that were actually written if not truncating, - * or that would have been written (not including the ending \\0). - */ -HWLOC_DECLSPEC int hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, const char * __hwloc_restrict separator, - int verbose); - -/** \brief Stringify the cpuset containing a set of objects. - * - * If \p size is 0, \p str may safely be \c NULL. - * - * \return the number of characters that were actually written if not truncating, - * or that would have been written (not including the ending \\0). - */ -HWLOC_DECLSPEC int hwloc_obj_cpuset_snprintf(char * __hwloc_restrict str, size_t size, size_t nobj, const hwloc_obj_t * __hwloc_restrict objs); - -/** \brief Return an object type and attributes from a type string. - * - * Convert strings such as "Package" or "Cache" into the corresponding types. - * Matching is case-insensitive, and only the first letters are actually - * required to match. - * - * This function is guaranteed to match any string returned by hwloc_obj_type_string() - * or hwloc_obj_type_snprintf(). - * - * Types that have specific attributes, for instance caches and groups, - * may be returned in \p depthattrp and \p typeattrp. They are ignored - * when these pointers are \c NULL. - * - * For instance "L2i" or "L2iCache" would return - * type HWLOC_OBJ_CACHE in \p typep, 2 in \p depthattrp, - * and HWLOC_OBJ_CACHE_TYPE_INSTRUCTION in \p typeattrp - * (this last pointer should point to a hwloc_obj_cache_type_t).
- * "Group3" would return type HWLOC_OBJ_GROUP type and 3 in \p depthattrp. - * Attributes that are not specified in the string (for instance "Group" - * without a depth, or "L2Cache" without a cache type) are set to -1. - * - * \p typeattrp is only filled if the size specified in \p typeattrsize - * is large enough. It is currently only used for caches, and the required - * size is at least the size of hwloc_obj_cache_type_t. - * - * \return 0 if a type was correctly identified, otherwise -1. - * - * \note This is an extended version of the now deprecated hwloc_obj_type_of_string() - */ -HWLOC_DECLSPEC int hwloc_obj_type_sscanf(const char *string, - hwloc_obj_type_t *typep, - int *depthattrp, - void *typeattrp, size_t typeattrsize); - -/** @} */ - - - -/** \defgroup hwlocality_info_attr Consulting and Adding Key-Value Info Attributes - * - * @{ - */ - -/** \brief Search the given key name in object infos and return the corresponding value. - * - * If multiple keys match the given name, only the first one is returned. - * - * \return \c NULL if no such key exists. - */ -static __hwloc_inline const char * -hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name) __hwloc_attribute_pure; - -/** \brief Add the given info name and value pair to the given object. - * - * The info is appended to the existing info array even if another key - * with the same name already exists. - * - * The input strings are copied before being added in the object infos. - * - * \note This function may be used to enforce object colors in the lstopo - * graphical output by using "lstopoStyle" as a name and "Background=#rrggbb" - * as a value. See CUSTOM COLORS in the lstopo(1) manpage for details. - * - * \note If \p value contains some non-printable characters, they will - * be dropped when exporting to XML, see hwloc_topology_export_xml(). 
- */ -HWLOC_DECLSPEC void hwloc_obj_add_info(hwloc_obj_t obj, const char *name, const char *value); - -/** @} */ - - - -/** \defgroup hwlocality_cpubinding CPU binding - * - * It is often useful to call hwloc_bitmap_singlify() first so that a single CPU - * remains in the set. This way, the process will not even migrate between - * different CPUs inside the given set. - * Some operating systems also only support that kind of binding. - * - * Some operating systems do not provide all hwloc-supported - * mechanisms to bind processes, threads, etc. - * hwloc_topology_get_support() may be used to query about the actual CPU - * binding support in the currently used operating system. - * - * When the requested binding operation is not available and the - * ::HWLOC_CPUBIND_STRICT flag was passed, the function returns -1. - * \p errno is set to \c ENOSYS when it is not possible to bind the requested kind of object - * processes/threads. errno is set to \c EXDEV when the requested cpuset - * can not be enforced (e.g. some systems only allow one CPU, and some - * other systems only allow one NUMA node). - * - * If ::HWLOC_CPUBIND_STRICT was not passed, the function may fail as well, - * or the operating system may use a slightly different operation - * (with side-effects, smaller binding set, etc.) - * when the requested operation is not exactly supported. - * - * The most portable version that should be preferred over the others, - * whenever possible, is the following one which just binds the current program, - * assuming it is single-threaded: - * - * \code - * hwloc_set_cpubind(topology, set, 0), - * \endcode - * - * If the program may be multithreaded, the following one should be preferred - * to only bind the current thread: - * - * \code - * hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD), - * \endcode - * - * \sa Some example codes are available under doc/examples/ in the source tree. 
- * - * \note To unbind, just call the binding function with either a full cpuset or - * a cpuset equal to the system cpuset. - * - * \note On some operating systems, CPU binding may have effects on memory binding, see - * ::HWLOC_CPUBIND_NOMEMBIND - * - * \note Running lstopo \--top or hwloc-ps can be a very convenient tool to check - * how binding actually happened. - * @{ - */ - -/** \brief Process/Thread binding flags. - * - * These bit flags can be used to refine the binding policy. - * - * The default (0) is to bind the current process, assumed to be - * single-threaded, in a non-strict way. This is the most portable - * way to bind as all operating systems usually provide it. - * - * \note Not all systems support all kinds of binding. See the - * "Detailed Description" section of \ref hwlocality_cpubinding for a - * description of errors that can occur. - */ -typedef enum { - /** \brief Bind all threads of the current (possibly) multithreaded process. - * \hideinitializer */ - HWLOC_CPUBIND_PROCESS = (1<<0), - - /** \brief Bind current thread of current process. - * \hideinitializer */ - HWLOC_CPUBIND_THREAD = (1<<1), - - /** \brief Request for strict binding from the OS. - * - * By default, when the designated CPUs are all busy while other - * CPUs are idle, operating systems may execute the thread/process - * on those other CPUs instead of the designated CPUs, to let them - * progress anyway. Strict binding means that the thread/process - * will _never_ execute on other cpus than the designated CPUs, even - * when those are busy with other tasks and other CPUs are idle. - * - * \note Depending on the operating system, strict binding may not - * be possible (e.g., the OS does not implement it) or not allowed - * (e.g., for an administrative reasons), and the function will fail - * in that case. - * - * When retrieving the binding of a process, this flag checks - * whether all its threads actually have the same binding. 
If the - * flag is not given, the binding of each thread will be - * accumulated. - * - * \note This flag is meaningless when retrieving the binding of a - * thread. - * \hideinitializer - */ - HWLOC_CPUBIND_STRICT = (1<<2), - - /** \brief Avoid any effect on memory binding - * - * On some operating systems, some CPU binding function would also - * bind the memory on the corresponding NUMA node. It is often not - * a problem for the application, but if it is, setting this flag - * will make hwloc avoid using OS functions that would also bind - * memory. This will however reduce the support of CPU bindings, - * i.e. potentially return -1 with errno set to ENOSYS in some - * cases. - * - * This flag is only meaningful when used with functions that set - * the CPU binding. It is ignored when used with functions that get - * CPU binding information. - * \hideinitializer - */ - HWLOC_CPUBIND_NOMEMBIND = (1<<3) -} hwloc_cpubind_flags_t; - -/** \brief Bind current process or thread on cpus given in physical bitmap \p set. - * - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - */ -HWLOC_DECLSPEC int hwloc_set_cpubind(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags); - -/** \brief Get current process or thread binding. - * - * Writes into \p set the physical cpuset which the process or thread (according to \e - * flags) was last bound to. - */ -HWLOC_DECLSPEC int hwloc_get_cpubind(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - -/** \brief Bind a process \p pid on cpus given in physical bitmap \p set. - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - * - * \note As a special case on Linux, if a tid (thread ID) is supplied - * instead of a pid (process ID) and ::HWLOC_CPUBIND_THREAD is passed in flags, - * the binding is applied to that specific thread. 
- * - * \note On non-Linux systems, ::HWLOC_CPUBIND_THREAD can not be used in \p flags. - */ -HWLOC_DECLSPEC int hwloc_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, int flags); - -/** \brief Get the current physical binding of process \p pid. - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - * - * \note As a special case on Linux, if a tid (thread ID) is supplied - * instead of a pid (process ID) and ::HWLOC_CPUBIND_THREAD is passed in flags, - * the binding for that specific thread is returned. - * - * \note On non-Linux systems, ::HWLOC_CPUBIND_THREAD can not be used in \p flags. - */ -HWLOC_DECLSPEC int hwloc_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags); - -#ifdef hwloc_thread_t -/** \brief Bind a thread \p thread on cpus given in physical bitmap \p set. - * - * \note \p hwloc_thread_t is \p pthread_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - * - * \note ::HWLOC_CPUBIND_PROCESS can not be used in \p flags. - */ -HWLOC_DECLSPEC int hwloc_set_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t thread, hwloc_const_cpuset_t set, int flags); -#endif - -#ifdef hwloc_thread_t -/** \brief Get the current physical binding of thread \p tid. - * - * \note \p hwloc_thread_t is \p pthread_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - * - * \note ::HWLOC_CPUBIND_PROCESS can not be used in \p flags. - */ -HWLOC_DECLSPEC int hwloc_get_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t thread, hwloc_cpuset_t set, int flags); -#endif - -/** \brief Get the last physical CPU where the current process or thread ran. - * - * The operating system may move some tasks from one processor - * to another at any time according to their binding, - * so this function may return something that is already - * outdated. 
- * - * \p flags can include either ::HWLOC_CPUBIND_PROCESS or ::HWLOC_CPUBIND_THREAD to - * specify whether the query should be for the whole process (union of all CPUs - * on which all threads are running), or only the current thread. If the - * process is single-threaded, flags can be set to zero to let hwloc use - * whichever method is available on the underlying OS. - */ -HWLOC_DECLSPEC int hwloc_get_last_cpu_location(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - -/** \brief Get the last physical CPU where a process ran. - * - * The operating system may move some tasks from one processor - * to another at any time according to their binding, - * so this function may return something that is already - * outdated. - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - * - * \note As a special case on Linux, if a tid (thread ID) is supplied - * instead of a pid (process ID) and ::HWLOC_CPUBIND_THREAD is passed in flags, - * the last CPU location of that specific thread is returned. - * - * \note On non-Linux systems, ::HWLOC_CPUBIND_THREAD can not be used in \p flags. - */ -HWLOC_DECLSPEC int hwloc_get_proc_last_cpu_location(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags); - -/** @} */ - - - -/** \defgroup hwlocality_membinding Memory binding - * - * Memory binding can be done three ways: - * - * - explicit memory allocation thanks to hwloc_alloc_membind() and friends: - * the binding will have effect on the memory allocated by these functions. - * - implicit memory binding through binding policy: hwloc_set_membind() and - * friends only define the current policy of the process, which will be - * applied to the subsequent calls to malloc() and friends. - * - migration of existing memory ranges, thanks to hwloc_set_area_membind() - * and friends, which move already-allocated data. - * - * Not all operating systems support all three ways. 
- * hwloc_topology_get_support() may be used to query about the actual memory - * binding support in the currently used operating system. - * - * When the requested binding operation is not available and the - * ::HWLOC_MEMBIND_STRICT flag was passed, the function returns -1. - * \p errno will be set to \c ENOSYS when the system does support - * the specified action or policy - * (e.g., some systems only allow binding memory on a per-thread - * basis, whereas other systems only allow binding memory for all - * threads in a process). - * \p errno will be set to EXDEV when the requested set can not be enforced - * (e.g., some systems only allow binding memory to a single NUMA node). - * - * If ::HWLOC_MEMBIND_STRICT was not passed, the function may fail as well, - * or the operating system may use a slightly different operation - * (with side-effects, smaller binding set, etc.) - * when the requested operation is not exactly supported. - * - * The most portable form that should be preferred over the others - * whenever possible is as follows. - * It allocates some memory hopefully bound to the specified set. - * To do so, hwloc will possibly have to change the current memory - * binding policy in order to actually get the memory bound, if the OS - * does not provide any other way to simply allocate bound memory - * without changing the policy for all allocations. That is the - * difference with hwloc_alloc_membind(), which will never change the - * current memory binding policy. - * - * \code - * hwloc_alloc_membind_policy(topology, size, set, - * HWLOC_MEMBIND_BIND, 0); - * \endcode - * - * Each hwloc memory binding function is available in two forms: one - * that takes a bitmap argument (a CPU set by default, or a NUMA memory - * node set if the flag ::HWLOC_MEMBIND_BYNODESET is specified), - * and another one (whose name ends with _nodeset) that always takes - * a NUMA memory node set. 
- * See \ref hwlocality_object_sets and \ref hwlocality_bitmap for a - * discussion of CPU sets and NUMA memory node sets. - * It is also possible to convert between CPU set and node set using - * hwloc_cpuset_to_nodeset() or hwloc_cpuset_from_nodeset(). - * - * Memory binding by CPU set cannot work for CPU-less NUMA memory nodes. - * Binding by nodeset should therefore be preferred whenever possible. - * - * \sa Some example codes are available under doc/examples/ in the source tree. - * - * \note On some operating systems, memory binding affects the CPU - * binding; see ::HWLOC_MEMBIND_NOCPUBIND - * @{ - */ - -/** \brief Memory binding policy. - * - * These constants can be used to choose the binding policy. Only one policy can - * be used at a time (i.e., the values cannot be OR'ed together). - * - * Not all systems support all kinds of binding. - * hwloc_topology_get_support() may be used to query about the actual memory - * binding policy support in the currently used operating system. - * See the "Detailed Description" section of \ref hwlocality_membinding - * for a description of errors that can occur. - */ -typedef enum { - /** \brief Reset the memory allocation policy to the system default. - * Depending on the operating system, this may correspond to - * ::HWLOC_MEMBIND_FIRSTTOUCH (Linux), - * or ::HWLOC_MEMBIND_BIND (AIX, HP-UX, OSF, Solaris, Windows). - * This policy is never returned by get membind functions when running - * on normal machines. - * It is only returned when binding hooks are empty because the topology - * was loaded from XML, or HWLOC_THISSYSTEM=0, etc. - * \hideinitializer */ - HWLOC_MEMBIND_DEFAULT = 0, - - /** \brief Allocate memory - * but do not immediately bind it to a specific locality. Instead, - * each page in the allocation is bound only when it is first - * touched. Pages are individually bound to the local NUMA node of - * the first thread that touches it. 
If there is not enough memory - * on the node, allocation may be done in the specified nodes - * before allocating on other nodes. - * \hideinitializer */ - HWLOC_MEMBIND_FIRSTTOUCH = 1, - - /** \brief Allocate memory on the specified nodes. - * \hideinitializer */ - HWLOC_MEMBIND_BIND = 2, - - /** \brief Allocate memory on the given nodes in an interleaved - * / round-robin manner. The precise layout of the memory across - * multiple NUMA nodes is OS/system specific. Interleaving can be - * useful when threads distributed across the specified NUMA nodes - * will all be accessing the whole memory range concurrently, since - * the interleave will then balance the memory references. - * \hideinitializer */ - HWLOC_MEMBIND_INTERLEAVE = 3, - - /** \brief Replicate memory on the given nodes; reads from this - * memory will attempt to be serviced from the NUMA node local to - * the reading thread. Replicating can be useful when multiple - * threads from the specified NUMA nodes will be sharing the same - * read-only data. - * - * This policy can only be used with existing memory allocations - * (i.e., the hwloc_set_*membind*() functions); it cannot be used - * with functions that allocate new memory (i.e., the hwloc_alloc*() - * functions). - * \hideinitializer */ - HWLOC_MEMBIND_REPLICATE = 4, - - /** \brief For each page bound with this policy, by next time - * it is touched (and next time only), it is moved from its current - * location to the local NUMA node of the thread where the memory - * reference occurred (if it needs to be moved at all). - * \hideinitializer */ - HWLOC_MEMBIND_NEXTTOUCH = 5, - - /** \brief Returned by get_membind() functions when multiple - * threads or parts of a memory area have differing memory binding - * policies. - * \hideinitializer */ - HWLOC_MEMBIND_MIXED = -1 -} hwloc_membind_policy_t; - -/** \brief Memory binding flags. - * - * These flags can be used to refine the binding policy. 
- * All flags can be logically OR'ed together with the exception of - * ::HWLOC_MEMBIND_PROCESS and ::HWLOC_MEMBIND_THREAD; - * these two flags are mutually exclusive. - * - * Not all systems support all kinds of binding. - * hwloc_topology_get_support() may be used to query about the actual memory - * binding support in the currently used operating system. - * See the "Detailed Description" section of \ref hwlocality_membinding - * for a description of errors that can occur. - */ -typedef enum { - /** \brief Set policy for all threads of the specified (possibly - * multithreaded) process. This flag is mutually exclusive with - * ::HWLOC_MEMBIND_THREAD. - * \hideinitializer */ - HWLOC_MEMBIND_PROCESS = (1<<0), - - /** \brief Set policy for a specific thread of the current process. - * This flag is mutually exclusive with ::HWLOC_MEMBIND_PROCESS. - * \hideinitializer */ - HWLOC_MEMBIND_THREAD = (1<<1), - - /** Request strict binding from the OS. The function will fail if - * the binding can not be guaranteed / completely enforced. - * - * This flag has slightly different meanings depending on which - * function it is used with. - * \hideinitializer */ - HWLOC_MEMBIND_STRICT = (1<<2), - - /** \brief Migrate existing allocated memory. If the memory cannot - * be migrated and the ::HWLOC_MEMBIND_STRICT flag is passed, an error - * will be returned. - * \hideinitializer */ - HWLOC_MEMBIND_MIGRATE = (1<<3), - - /** \brief Avoid any effect on CPU binding. - * - * On some operating systems, some underlying memory binding - * functions also bind the application to the corresponding CPU(s). - * Using this flag will cause hwloc to avoid using OS functions that - * could potentially affect CPU bindings. Note, however, that using - * NOCPUBIND may reduce hwloc's overall memory binding - * support. Specifically: some of hwloc's memory binding functions - * may fail with errno set to ENOSYS when used with NOCPUBIND. 
- * \hideinitializer - */ - HWLOC_MEMBIND_NOCPUBIND = (1<<4), - - /** \brief Consider the bitmap argument as a nodeset. - * - * Functions whose name ends with _nodeset() take a nodeset argument. - * Other functions take a bitmap argument that is considered a nodeset - * if this flag is given, or a cpuset otherwise. - * - * Memory binding by CPU set cannot work for CPU-less NUMA memory nodes. - * Binding by nodeset should therefore be preferred whenever possible. - */ - HWLOC_MEMBIND_BYNODESET = (1<<5) -} hwloc_membind_flags_t; - -/** \brief Set the default memory binding policy of the current - * process or thread to prefer the NUMA node(s) specified by \p nodeset - * - * If neither ::HWLOC_MEMBIND_PROCESS nor ::HWLOC_MEMBIND_THREAD is - * specified, the current process is assumed to be single-threaded. - * This is the most portable form as it permits hwloc to use either - * process-based OS functions or thread-based OS functions, depending - * on which are available. - * - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - */ -HWLOC_DECLSPEC int hwloc_set_membind_nodeset(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - -/** \brief Set the default memory binding policy of the current - * process or thread to prefer the NUMA node(s) specified by \p set - * - * If neither ::HWLOC_MEMBIND_PROCESS nor ::HWLOC_MEMBIND_THREAD is - * specified, the current process is assumed to be single-threaded. - * This is the most portable form as it permits hwloc to use either - * process-based OS functions or thread-based OS functions, depending - * on which are available. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. 
- * - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - */ -HWLOC_DECLSPEC int hwloc_set_membind(hwloc_topology_t topology, hwloc_const_bitmap_t set, hwloc_membind_policy_t policy, int flags); - -/** \brief Query the default memory binding policy and physical locality of the - * current process or thread. - * - * This function has two output parameters: \p nodeset and \p policy. - * The values returned in these parameters depend on both the \p flags - * passed in and the current memory binding policies and nodesets in - * the queried target. - * - * Passing the ::HWLOC_MEMBIND_PROCESS flag specifies that the query - * target is the current policies and nodesets for all the threads in - * the current process. Passing ::HWLOC_MEMBIND_THREAD specifies that - * the query target is the current policy and nodeset for only the - * thread invoking this function. - * - * If neither of these flags are passed (which is the most portable - * method), the process is assumed to be single threaded. This allows - * hwloc to use either process-based OS functions or thread-based OS - * functions, depending on which are available. - * - * ::HWLOC_MEMBIND_STRICT is only meaningful when ::HWLOC_MEMBIND_PROCESS - * is also specified. In this case, hwloc will check the default - * memory policies and nodesets for all threads in the process. If - * they are not identical, -1 is returned and errno is set to EXDEV. - * If they are identical, the values are returned in \p nodeset and \p - * policy. - * - * Otherwise, if ::HWLOC_MEMBIND_PROCESS is specified (and - * ::HWLOC_MEMBIND_STRICT is \em not specified), \p nodeset is set to - * the logical OR of all threads' default nodeset. - * If all threads' default policies are the same, \p policy is set to - * that policy. If they are different, \p policy is set to - * ::HWLOC_MEMBIND_MIXED. 
- * - * In the ::HWLOC_MEMBIND_THREAD case (or when neither - * ::HWLOC_MEMBIND_PROCESS or ::HWLOC_MEMBIND_THREAD is specified), there - * is only one nodeset and policy; they are returned in \p nodeset and - * \p policy, respectively. - * - * If any other flags are specified, -1 is returned and errno is set - * to EINVAL. - */ -HWLOC_DECLSPEC int hwloc_get_membind_nodeset(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - -/** \brief Query the default memory binding policy and physical locality of the - * current process or thread. - * - * This function has two output parameters: \p set and \p policy. - * The values returned in these parameters depend on both the \p flags - * passed in and the current memory binding policies and nodesets in - * the queried target. - * - * Passing the ::HWLOC_MEMBIND_PROCESS flag specifies that the query - * target is the current policies and nodesets for all the threads in - * the current process. Passing ::HWLOC_MEMBIND_THREAD specifies that - * the query target is the current policy and nodeset for only the - * thread invoking this function. - * - * If neither of these flags are passed (which is the most portable - * method), the process is assumed to be single threaded. This allows - * hwloc to use either process-based OS functions or thread-based OS - * functions, depending on which are available. - * - * ::HWLOC_MEMBIND_STRICT is only meaningful when ::HWLOC_MEMBIND_PROCESS - * is also specified. In this case, hwloc will check the default - * memory policies and nodesets for all threads in the process. If - * they are not identical, -1 is returned and errno is set to EXDEV. - * If they are identical, the values are returned in \p set and \p - * policy. - * - * Otherwise, if ::HWLOC_MEMBIND_PROCESS is specified (and - * ::HWLOC_MEMBIND_STRICT is \em not specified), the default set - * from each thread is logically OR'ed together. 
- * If all threads' default policies are the same, \p policy is set to - * that policy. If they are different, \p policy is set to - * ::HWLOC_MEMBIND_MIXED. - * - * In the ::HWLOC_MEMBIND_THREAD case (or when neither - * ::HWLOC_MEMBIND_PROCESS or ::HWLOC_MEMBIND_THREAD is specified), there - * is only one set and policy; they are returned in \p set and - * \p policy, respectively. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * If any other flags are specified, -1 is returned and errno is set - * to EINVAL. - */ -HWLOC_DECLSPEC int hwloc_get_membind(hwloc_topology_t topology, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags); - -/** \brief Set the default memory binding policy of the specified - * process to prefer the NUMA node(s) specified by \p nodeset - * - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. - */ -HWLOC_DECLSPEC int hwloc_set_proc_membind_nodeset(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - -/** \brief Set the default memory binding policy of the specified - * process to prefer the NUMA node(s) specified by \p set - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. 
- */ -HWLOC_DECLSPEC int hwloc_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_bitmap_t set, hwloc_membind_policy_t policy, int flags); - -/** \brief Query the default memory binding policy and physical locality of the - * specified process. - * - * This function has two output parameters: \p nodeset and \p policy. - * The values returned in these parameters depend on both the \p flags - * passed in and the current memory binding policies and nodesets in - * the queried target. - * - * Passing the ::HWLOC_MEMBIND_PROCESS flag specifies that the query - * target is the current policies and nodesets for all the threads in - * the specified process. If ::HWLOC_MEMBIND_PROCESS is not specified - * (which is the most portable method), the process is assumed to be - * single threaded. This allows hwloc to use either process-based OS - * functions or thread-based OS functions, depending on which are - * available. - * - * Note that it does not make sense to pass ::HWLOC_MEMBIND_THREAD to - * this function. - * - * If ::HWLOC_MEMBIND_STRICT is specified, hwloc will check the default - * memory policies and nodesets for all threads in the specified - * process. If they are not identical, -1 is returned and errno is - * set to EXDEV. If they are identical, the values are returned in \p - * nodeset and \p policy. - * - * Otherwise, \p nodeset is set to the logical OR of all threads' - * default nodeset. If all threads' default policies are the same, \p - * policy is set to that policy. If they are different, \p policy is - * set to ::HWLOC_MEMBIND_MIXED. - * - * If any other flags are specified, -1 is returned and errno is set - * to EINVAL. - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. 
- */ -HWLOC_DECLSPEC int hwloc_get_proc_membind_nodeset(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - -/** \brief Query the default memory binding policy and physical locality of the - * specified process. - * - * This function has two output parameters: \p set and \p policy. - * The values returned in these parameters depend on both the \p flags - * passed in and the current memory binding policies and nodesets in - * the queried target. - * - * Passing the ::HWLOC_MEMBIND_PROCESS flag specifies that the query - * target is the current policies and nodesets for all the threads in - * the specified process. If ::HWLOC_MEMBIND_PROCESS is not specified - * (which is the most portable method), the process is assumed to be - * single threaded. This allows hwloc to use either process-based OS - * functions or thread-based OS functions, depending on which are - * available. - * - * Note that it does not make sense to pass ::HWLOC_MEMBIND_THREAD to - * this function. - * - * If ::HWLOC_MEMBIND_STRICT is specified, hwloc will check the default - * memory policies and nodesets for all threads in the specified - * process. If they are not identical, -1 is returned and errno is - * set to EXDEV. If they are identical, the values are returned in \p - * set and \p policy. - * - * Otherwise, \p set is set to the logical OR of all threads' - * default set. If all threads' default policies - * are the same, \p policy is set to that policy. If they are - * different, \p policy is set to ::HWLOC_MEMBIND_MIXED. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * If any other flags are specified, -1 is returned and errno is set - * to EINVAL. - * - * \note \p hwloc_pid_t is \p pid_t on Unix platforms, - * and \p HANDLE on native Windows platforms. 
- */ -HWLOC_DECLSPEC int hwloc_get_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags); - -/** \brief Bind the already-allocated memory identified by (addr, len) - * to the NUMA node(s) specified by \p nodeset. - * - * \return 0 if \p len is 0. - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - */ -HWLOC_DECLSPEC int hwloc_set_area_membind_nodeset(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - -/** \brief Bind the already-allocated memory identified by (addr, len) - * to the NUMA node(s) specified by \p set. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * \return 0 if \p len is 0. - * \return -1 with errno set to ENOSYS if the action is not supported - * \return -1 with errno set to EXDEV if the binding cannot be enforced - */ -HWLOC_DECLSPEC int hwloc_set_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_bitmap_t set, hwloc_membind_policy_t policy, int flags); - -/** \brief Query the physical NUMA node(s) and binding policy of the memory - * identified by (\p addr, \p len ). - * - * This function has two output parameters: \p nodeset and \p policy. - * The values returned in these parameters depend on both the \p flags - * passed in and the memory binding policies and nodesets of the pages - * in the address range. - * - * If ::HWLOC_MEMBIND_STRICT is specified, the target pages are first - * checked to see if they all have the same memory binding policy and - * nodeset. If they do not, -1 is returned and errno is set to EXDEV. - * If they are identical across all pages, the nodeset and policy are - * returned in \p nodeset and \p policy, respectively. 
- * - * If ::HWLOC_MEMBIND_STRICT is not specified, \p nodeset is set to the - * union of all NUMA node(s) containing pages in the address range. - * If all pages in the target have the same policy, it is returned in - * \p policy. Otherwise, \p policy is set to ::HWLOC_MEMBIND_MIXED. - * - * If \p len is 0, -1 is returned and errno is set to EINVAL. - * - * If any other flags are specified, -1 is returned and errno is set - * to EINVAL. - */ -HWLOC_DECLSPEC int hwloc_get_area_membind_nodeset(hwloc_topology_t topology, const void *addr, size_t len, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - -/** \brief Query the CPUs near the physical NUMA node(s) and binding policy of - * the memory identified by (\p addr, \p len ). - * - * This function has two output parameters: \p set and \p policy. - * The values returned in these parameters depend on both the \p flags - * passed in and the memory binding policies and nodesets of the pages - * in the address range. - * - * If ::HWLOC_MEMBIND_STRICT is specified, the target pages are first - * checked to see if they all have the same memory binding policy and - * nodeset. If they do not, -1 is returned and errno is set to EXDEV. - * If they are identical across all pages, the set and policy are - * returned in \p set and \p policy, respectively. - * - * If ::HWLOC_MEMBIND_STRICT is not specified, the union of all NUMA - * node(s) containing pages in the address range is calculated. - * If all pages in the target have the same policy, it is returned in - * \p policy. Otherwise, \p policy is set to ::HWLOC_MEMBIND_MIXED. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * If \p len is 0, -1 is returned and errno is set to EINVAL. - * - * If any other flags are specified, -1 is returned and errno is set - * to EINVAL. 
- */ -HWLOC_DECLSPEC int hwloc_get_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_bitmap_t set, hwloc_membind_policy_t * policy, int flags); - -/** \brief Get the NUMA nodes where memory identified by (\p addr, \p len ) is physically allocated. - * - * Fills \p set according to the NUMA nodes where the memory area pages - * are physically allocated. If no page is actually allocated yet, - * \p set may be empty. - * - * If pages spread to multiple nodes, it is not specified whether they spread - * equitably, or whether most of them are on a single node, etc. - * - * The operating system may move memory pages from one processor - * to another at any time according to their binding, - * so this function may return something that is already - * outdated. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * If \p len is 0, \p set is emptied. - * - * Flags are currently unused. - */ -HWLOC_DECLSPEC int hwloc_get_area_memlocation(hwloc_topology_t topology, const void *addr, size_t len, hwloc_bitmap_t set, int flags); - -/** \brief Allocate some memory - * - * This is equivalent to malloc(), except that it tries to allocate - * page-aligned memory from the OS. - * - * \note The allocated memory should be freed with hwloc_free(). - */ -HWLOC_DECLSPEC void *hwloc_alloc(hwloc_topology_t topology, size_t len); - -/** \brief Allocate some memory on NUMA memory nodes specified by \p nodeset - * - * \return NULL with errno set to ENOSYS if the action is not supported - * and ::HWLOC_MEMBIND_STRICT is given - * \return NULL with errno set to EXDEV if the binding cannot be enforced - * and ::HWLOC_MEMBIND_STRICT is given - * \return NULL with errno set to ENOMEM if the memory allocation failed - * even before trying to bind. - * - * \note The allocated memory should be freed with hwloc_free(). 
- */ -HWLOC_DECLSPEC void *hwloc_alloc_membind_nodeset(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc; - -/** \brief Allocate some memory on NUMA memory nodes specified by \p set - * - * \return NULL with errno set to ENOSYS if the action is not supported - * and ::HWLOC_MEMBIND_STRICT is given - * \return NULL with errno set to EXDEV if the binding cannot be enforced - * and ::HWLOC_MEMBIND_STRICT is given - * \return NULL with errno set to ENOMEM if the memory allocation failed - * even before trying to bind. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. - * - * \note The allocated memory should be freed with hwloc_free(). - */ -HWLOC_DECLSPEC void *hwloc_alloc_membind(hwloc_topology_t topology, size_t len, hwloc_const_bitmap_t set, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc; - -/** \brief Allocate some memory on NUMA memory nodes specified by \p nodeset - * - * This is similar to hwloc_alloc_membind() except that it is allowed to change - * the current memory binding policy, thus providing more binding support, at - * the expense of changing the current state. - */ -static __hwloc_inline void * -hwloc_alloc_membind_policy_nodeset(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc; - -/** \brief Allocate some memory on NUMA memory nodes specified by \p set - * - * This is similar to hwloc_alloc_membind_nodeset() except that it is allowed to change - * the current memory binding policy, thus providing more binding support, at - * the expense of changing the current state. - * - * If ::HWLOC_MEMBIND_BYNODESET is specified, set is considered a nodeset. - * Otherwise it's a cpuset. 
- */ -static __hwloc_inline void * -hwloc_alloc_membind_policy(hwloc_topology_t topology, size_t len, hwloc_const_bitmap_t set, hwloc_membind_policy_t policy, int flags) __hwloc_attribute_malloc; - -/** \brief Free memory that was previously allocated by hwloc_alloc() - * or hwloc_alloc_membind(). - */ -HWLOC_DECLSPEC int hwloc_free(hwloc_topology_t topology, void *addr, size_t len); - -/** @} */ - - - -/** \defgroup hwlocality_tinker Modifying a loaded Topology - * @{ - */ - -/** \brief Add a MISC object to the topology - * - * A new MISC object will be created and inserted into the topology at the - * position given by bitmap \p cpuset. This offers a way to add new - * intermediate levels to the topology hierarchy. - * - * \p cpuset and \p name will be copied to set up the new object attributes. - * - * \return the newly-created object. - * \return \c NULL if the insertion conflicts with the existing topology tree. - * - * \note If \p name contains some non-printable characters, they will - * be dropped when exporting to XML, see hwloc_topology_export_xml(). - */ -HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_insert_misc_object_by_cpuset(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset, const char *name); - -/** \brief Add a MISC object as a leaf of the topology - * - * A new MISC object will be created and inserted into the topology at the - * position given by parent. It is appended to the list of existing children, - * without ever adding any intermediate hierarchy level. This is useful for - * annotating the topology without actually changing the hierarchy. - * - * \p name will be copied to set up the new object attributes. - * However, the new leaf object will not have any \p cpuset. - * - * \return the newly-created object - * - * \note If \p name contains some non-printable characters, they will - * be dropped when exporting to XML, see hwloc_topology_export_xml().
- */ -HWLOC_DECLSPEC hwloc_obj_t hwloc_topology_insert_misc_object_by_parent(hwloc_topology_t topology, hwloc_obj_t parent, const char *name); - -/** \brief Flags to be given to hwloc_topology_restrict(). */ -enum hwloc_restrict_flags_e { - /** \brief Adapt distance matrices according to objects being removed during restriction. - * If this flag is not set, distance matrices are removed. - * \hideinitializer - */ - HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES = (1<<0), - - /** \brief Move Misc objects to ancestors if their parents are removed during restriction. - * If this flag is not set, Misc objects are removed when their parents are removed. - * \hideinitializer - */ - HWLOC_RESTRICT_FLAG_ADAPT_MISC = (1<<1), - - /** \brief Move I/O objects to ancestors if their parents are removed during restriction. - * If this flag is not set, I/O devices and bridges are removed when their parents are removed. - * \hideinitializer - */ - HWLOC_RESTRICT_FLAG_ADAPT_IO = (1<<2) -}; - -/** \brief Restrict the topology to the given CPU set. - * - * Topology \p topology is modified so as to remove all objects that - * are not included (or partially included) in the CPU set \p cpuset. - * The CPU and node sets of all objects are restricted accordingly. - * - * \p flags is a OR'ed set of ::hwloc_restrict_flags_e. - * - * \note This call may not be reverted by restricting back to a larger - * cpuset. Once dropped during restriction, objects may not be brought - * back, except by loading another topology with hwloc_topology_load(). - * - * \return 0 on success. - * - * \return -1 with errno set to EINVAL if the input cpuset is invalid. - * The topology is not modified in this case. - * - * \return -1 with errno set to ENOMEM on failure to allocate internal data. - * The topology is reinitialized in this case. It should be either - * destroyed with hwloc_topology_destroy() or configured and loaded again.
- */ -HWLOC_DECLSPEC int hwloc_topology_restrict(hwloc_topology_t __hwloc_restrict topology, hwloc_const_cpuset_t cpuset, unsigned long flags); - -/** @} */ - - - -/** \defgroup hwlocality_custom Building Custom Topologies - * - * A custom topology may be initialized by calling hwloc_topology_set_custom() - * after hwloc_topology_init(). It may then be modified by inserting objects - * or entire topologies. Once done assembling, hwloc_topology_load() should - * be invoked as usual to finalize the topology. - * @{ - */ - -/** \brief Insert an existing topology inside a custom topology - * - * Duplicate the existing topology \p oldtopology inside a new - * custom topology \p newtopology as a leaf of object \p newparent. - * - * If \p oldroot is not \c NULL, duplicate \p oldroot and all its - * children instead of the entire \p oldtopology. Passing the root - * object of \p oldtopology in \p oldroot is equivalent to passing - * \c NULL. - * - * The custom topology \p newtopology must have been prepared with - * hwloc_topology_set_custom() and not loaded with hwloc_topology_load() - * yet. - * - * \p newparent may be either the root of \p newtopology or an object - * that was added through hwloc_custom_insert_group_object_by_parent(). - * - * \note The cpuset and nodeset of the \p newparent object are not - * modified based on the contents of \p oldtopology. - */ -HWLOC_DECLSPEC int hwloc_custom_insert_topology(hwloc_topology_t newtopology, hwloc_obj_t newparent, hwloc_topology_t oldtopology, hwloc_obj_t oldroot); - -/** \brief Insert a new group object inside a custom topology - * - * An object with type ::HWLOC_OBJ_GROUP is inserted as a new child - * of object \p parent. - * - * \p groupdepth is the depth attribute to be given to the new object. - * It may for instance be 0 for top-level groups, 1 for their children, - * and so on. 
- * - * The custom topology \p topology must have been prepared with - * hwloc_topology_set_custom() and not loaded with hwloc_topology_load() - * yet. - * - * \p parent may be either the root of \p topology or an object that - * was added earlier through hwloc_custom_insert_group_object_by_parent(). - * - * \note The cpuset and nodeset of the new group object are NULL because - * these sets are meaningless when assembling multiple topologies. - * - * \note The cpuset and nodeset of the \p parent object are not modified. - */ -HWLOC_DECLSPEC hwloc_obj_t hwloc_custom_insert_group_object_by_parent(hwloc_topology_t topology, hwloc_obj_t parent, int groupdepth); - -/** @} */ - - - -/** \defgroup hwlocality_xmlexport Exporting Topologies to XML - * @{ - */ - -/** \brief Export the topology into an XML file. - * - * This file may be loaded later through hwloc_topology_set_xml(). - * - * \return -1 if a failure occurred. - * - * \note See also hwloc_topology_set_userdata_export_callback() - * for exporting application-specific object userdata. - * - * \note The topology-specific userdata pointer is ignored when exporting to XML. - * - * \note Only printable characters may be exported to XML string attributes. - * Any other character, especially any non-ASCII character, will be silently - * dropped. - * - * \note If \p xmlpath is "-", the XML output is sent to the standard output. - */ -HWLOC_DECLSPEC int hwloc_topology_export_xml(hwloc_topology_t topology, const char *xmlpath); - -/** \brief Export the topology into a newly-allocated XML memory buffer. - * - * \p xmlbuffer is allocated by the callee and should be freed with - * hwloc_free_xmlbuffer() later in the caller. - * - * This memory buffer may be loaded later through hwloc_topology_set_xmlbuffer(). - * - * \return -1 if a failure occurred. - * - * \note See also hwloc_topology_set_userdata_export_callback() - * for exporting application-specific object userdata.
- * - * \note The topology-specific userdata pointer is ignored when exporting to XML. - * - * \note Only printable characters may be exported to XML string attributes. - * Any other character, especially any non-ASCII character, will be silently - * dropped. - */ -HWLOC_DECLSPEC int hwloc_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlbuffer, int *buflen); - -/** \brief Free a buffer allocated by hwloc_topology_export_xmlbuffer() */ -HWLOC_DECLSPEC void hwloc_free_xmlbuffer(hwloc_topology_t topology, char *xmlbuffer); - -/** \brief Set the application-specific callback for exporting object userdata - * - * The object userdata pointer is not exported to XML by default because hwloc - * does not know what it contains. - * - * This function lets applications set \p export_cb to a callback function - * that converts this opaque userdata into an exportable string. - * - * \p export_cb is invoked during XML export for each object whose - * \p userdata pointer is not \c NULL. - * The callback should use hwloc_export_obj_userdata() or - * hwloc_export_obj_userdata_base64() to actually export - * something to XML (possibly multiple times per object). - * - * \p export_cb may be set to \c NULL if userdata should not be exported to XML. - * - * \note The topology-specific userdata pointer is ignored when exporting to XML. - */ -HWLOC_DECLSPEC void hwloc_topology_set_userdata_export_callback(hwloc_topology_t topology, - void (*export_cb)(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj)); - -/** \brief Export some object userdata to XML - * - * This function may only be called from within the export() callback passed - * to hwloc_topology_set_userdata_export_callback(). - * It may be invoked one or more times to export some userdata to XML. - * The \p buffer content of length \p length is stored with optional name - * \p name.
- * - * When importing this XML file, the import() callback (if set) will be - * called exactly as many times as hwloc_export_obj_userdata() was called - * during export(). It will receive the corresponding \p name, \p buffer - * and \p length arguments. - * - * \p reserved, \p topology and \p obj must be the first three parameters - * that were given to the export callback. - * - * Only printable characters may be exported to XML string attributes. - * If a non-printable character is passed in \p name or \p buffer, - * the function returns -1 with errno set to EINVAL. - * - * If exporting binary data, the application should first encode into - * printable characters only (or use hwloc_export_obj_userdata_base64()). - * It should also take care of portability issues if the export may - * be reimported on a different architecture. - */ -HWLOC_DECLSPEC int hwloc_export_obj_userdata(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length); - -/** \brief Encode and export some object userdata to XML - * - * This function is similar to hwloc_export_obj_userdata() but it encodes - * the input buffer into printable characters before exporting. - * On import, decoding is automatically performed before the data is given - * to the import() callback if any. - * - * This function may only be called from within the export() callback passed - * to hwloc_topology_set_userdata_export_callback(). - * - * The function does not take care of portability issues if the export - * may be reimported on a different architecture. - */ -HWLOC_DECLSPEC int hwloc_export_obj_userdata_base64(void *reserved, hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length); - -/** \brief Set the application-specific callback for importing userdata - * - * On XML import, userdata is ignored by default because hwloc does not know - * how to store it in memory. 
- * - * This function lets applications set \p import_cb to a callback function - * that will get the XML-stored userdata and store it in the object as expected - * by the application. - * - * \p import_cb is called during hwloc_topology_load() as many times as - * hwloc_export_obj_userdata() was called during export. The topology - * is not entirely set up yet. Object attributes are ready to consult, - * but links between objects are not. - * - * \p import_cb may be \c NULL if userdata should be ignored during import. - * - * \note \p buffer contains \p length characters followed by a null byte ('\0'). - * - * \note This function should be called before hwloc_topology_load(). - * - * \note The topology-specific userdata pointer is ignored when importing from XML. - */ -HWLOC_DECLSPEC void hwloc_topology_set_userdata_import_callback(hwloc_topology_t topology, - void (*import_cb)(hwloc_topology_t topology, hwloc_obj_t obj, const char *name, const void *buffer, size_t length)); - -/** @} */ - - -/** \defgroup hwlocality_syntheticexport Exporting Topologies to Synthetic - * @{ - */ - -/** \brief Flags for exporting synthetic topologies. - * - * Flags to be given as a OR'ed set to hwloc_topology_export_synthetic(). - */ -enum hwloc_topology_export_synthetic_flags_e { - /** \brief Export extended types such as L2dcache as basic types such as Cache. - * - * This is required if loading the synthetic description with hwloc < 1.9. - * \hideinitializer - */ - HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_EXTENDED_TYPES = (1UL<<0), - - /** \brief Do not export level attributes. - * - * Ignore level attributes such as memory/cache sizes or PU indexes. - * This is required if loading the synthetic description with hwloc < 1.10. - * \hideinitializer - */ - HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_ATTRS = (1UL<<1) -}; - -/** \brief Export the topology as a synthetic string. - * - * At most \p buflen characters will be written in \p buffer, - * including the terminating \0.
- * - * This exported string may be given back to hwloc_topology_set_synthetic(). - * - * \p flags is a OR'ed set of hwloc_topology_export_synthetic_flags_e. - * - * \return The number of characters that were written, - * not including the terminating \0. - * - * \return -1 if the topology could not be exported, - * for instance if it is not symmetric. - * - * \note A 1024-byte buffer should be large enough for exporting - * topologies in the vast majority of cases. - */ - HWLOC_DECLSPEC int hwloc_topology_export_synthetic(hwloc_topology_t topology, char *buffer, size_t buflen, unsigned long flags); - -/** @} */ - - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -/* high-level helpers */ -#include - -/* inline code of some functions above */ -#include - -/* topology diffs */ -#include - -/* deprecated headers */ -#include - -#endif /* HWLOC_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/autogen/config.h.in b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/autogen/config.h.in deleted file mode 100644 index e101b0a479b..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/autogen/config.h.in +++ /dev/null @@ -1,201 +0,0 @@ -/* -*- c -*- - * Copyright © 2009 CNRS - * Copyright © 2009-2014 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. 
- */ - -/* The configuration file */ - -#ifndef HWLOC_CONFIG_H -#define HWLOC_CONFIG_H - -#if (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 95)) -# define __hwloc_restrict __restrict -#else -# if __STDC_VERSION__ >= 199901L -# define __hwloc_restrict restrict -# else -# define __hwloc_restrict -# endif -#endif - -/* Note that if we're compiling C++, then just use the "inline" - keyword, since it's part of C++ */ -#if defined(c_plusplus) || defined(__cplusplus) -# define __hwloc_inline inline -#elif defined(_MSC_VER) || defined(__HP_cc) -# define __hwloc_inline __inline -#else -# define __hwloc_inline __inline__ -#endif - -/* - * Note: this is public. We cannot assume anything from the compiler used - * by the application and thus the HWLOC_HAVE_* macros below are not - * fetched from the autoconf result here. We only automatically use a few - * well-known easy cases. - */ - -/* Some handy constants to make the logic below a little more readable */ -#if defined(__cplusplus) && \ - (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) -#define GXX_ABOVE_3_4 1 -#else -#define GXX_ABOVE_3_4 0 -#endif - -#if !defined(__cplusplus) && \ - (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 95)) -#define GCC_ABOVE_2_95 1 -#else -#define GCC_ABOVE_2_95 0 -#endif - -#if !defined(__cplusplus) && \ - (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96)) -#define GCC_ABOVE_2_96 1 -#else -#define GCC_ABOVE_2_96 0 -#endif - -#if !defined(__cplusplus) && \ - (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3)) -#define GCC_ABOVE_3_3 1 -#else -#define GCC_ABOVE_3_3 0 -#endif - -/* Maybe before gcc 2.95 too */ -#ifdef HWLOC_HAVE_ATTRIBUTE_UNUSED -#define __HWLOC_HAVE_ATTRIBUTE_UNUSED HWLOC_HAVE_ATTRIBUTE_UNUSED -#elif defined(__GNUC__) -# define __HWLOC_HAVE_ATTRIBUTE_UNUSED (GXX_ABOVE_3_4 || GCC_ABOVE_2_95) -#else -# define __HWLOC_HAVE_ATTRIBUTE_UNUSED 0 -#endif -#if __HWLOC_HAVE_ATTRIBUTE_UNUSED -# define __hwloc_attribute_unused __attribute__((__unused__))
-#else -# define __hwloc_attribute_unused -#endif - -#ifdef HWLOC_HAVE_ATTRIBUTE_MALLOC -#define __HWLOC_HAVE_ATTRIBUTE_MALLOC HWLOC_HAVE_ATTRIBUTE_MALLOC -#elif defined(__GNUC__) -# define __HWLOC_HAVE_ATTRIBUTE_MALLOC (GXX_ABOVE_3_4 || GCC_ABOVE_2_96) -#else -# define __HWLOC_HAVE_ATTRIBUTE_MALLOC 0 -#endif -#if __HWLOC_HAVE_ATTRIBUTE_MALLOC -# define __hwloc_attribute_malloc __attribute__((__malloc__)) -#else -# define __hwloc_attribute_malloc -#endif - -#ifdef HWLOC_HAVE_ATTRIBUTE_CONST -#define __HWLOC_HAVE_ATTRIBUTE_CONST HWLOC_HAVE_ATTRIBUTE_CONST -#elif defined(__GNUC__) -# define __HWLOC_HAVE_ATTRIBUTE_CONST (GXX_ABOVE_3_4 || GCC_ABOVE_2_95) -#else -# define __HWLOC_HAVE_ATTRIBUTE_CONST 0 -#endif -#if __HWLOC_HAVE_ATTRIBUTE_CONST -# define __hwloc_attribute_const __attribute__((__const__)) -#else -# define __hwloc_attribute_const -#endif - -#ifdef HWLOC_HAVE_ATTRIBUTE_PURE -#define __HWLOC_HAVE_ATTRIBUTE_PURE HWLOC_HAVE_ATTRIBUTE_PURE -#elif defined(__GNUC__) -# define __HWLOC_HAVE_ATTRIBUTE_PURE (GXX_ABOVE_3_4 || GCC_ABOVE_2_96) -#else -# define __HWLOC_HAVE_ATTRIBUTE_PURE 0 -#endif -#if __HWLOC_HAVE_ATTRIBUTE_PURE -# define __hwloc_attribute_pure __attribute__((__pure__)) -#else -# define __hwloc_attribute_pure -#endif - -#ifdef HWLOC_HAVE_ATTRIBUTE_DEPRECATED -#define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED HWLOC_HAVE_ATTRIBUTE_DEPRECATED -#elif defined(__GNUC__) -# define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED (GXX_ABOVE_3_4 || GCC_ABOVE_3_3) -#else -# define __HWLOC_HAVE_ATTRIBUTE_DEPRECATED 0 -#endif -#if __HWLOC_HAVE_ATTRIBUTE_DEPRECATED -# define __hwloc_attribute_deprecated __attribute__((__deprecated__)) -#else -# define __hwloc_attribute_deprecated -#endif - -#ifdef HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS -#define __HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS -#elif defined(__GNUC__) -# define __HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS (GXX_ABOVE_3_4 || GCC_ABOVE_3_3) -#else -# define __HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS 0 -#endif -#if 
__HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS -# define __hwloc_attribute_may_alias __attribute__((__may_alias__)) -#else -# define __hwloc_attribute_may_alias -#endif - -#ifdef HWLOC_C_HAVE_VISIBILITY -# if HWLOC_C_HAVE_VISIBILITY -# define HWLOC_DECLSPEC __attribute__((__visibility__("default"))) -# else -# define HWLOC_DECLSPEC -# endif -#else -# define HWLOC_DECLSPEC -#endif - -/* Defined to 1 on Linux */ -#undef HWLOC_LINUX_SYS - -/* Defined to 1 if the CPU_SET macro works */ -#undef HWLOC_HAVE_CPU_SET - -/* Defined to 1 if you have the `windows.h' header. */ -#undef HWLOC_HAVE_WINDOWS_H -#undef hwloc_pid_t -#undef hwloc_thread_t - -#ifdef HWLOC_HAVE_WINDOWS_H - -# include -typedef DWORDLONG hwloc_uint64_t; - -#else /* HWLOC_HAVE_WINDOWS_H */ - -# ifdef hwloc_thread_t -# include -# endif /* hwloc_thread_t */ - -/* Defined to 1 if you have the header file. */ -# undef HWLOC_HAVE_STDINT_H - -# include -# ifdef HWLOC_HAVE_STDINT_H -# include -# endif -typedef uint64_t hwloc_uint64_t; - -#endif /* HWLOC_HAVE_WINDOWS_H */ - -/* Whether we need to re-define all the hwloc public symbols or not */ -#undef HWLOC_SYM_TRANSFORM - -/* The hwloc symbol prefix */ -#undef HWLOC_SYM_PREFIX - -/* The hwloc symbol prefix in all caps */ -#undef HWLOC_SYM_PREFIX_CAPS - -#endif /* HWLOC_CONFIG_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/cuda.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/cuda.h deleted file mode 100644 index a02d677699b..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/cuda.h +++ /dev/null @@ -1,224 +0,0 @@ -/* - * Copyright © 2010-2015 Inria. All rights reserved. - * Copyright © 2010-2011 Université Bordeaux - * Copyright © 2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -/** \file - * \brief Macros to help interaction between hwloc and the CUDA Driver API. 
- * - * Applications that use both hwloc and the CUDA Driver API may want to - * include this file so as to get topology information for CUDA devices. - * - */ - -#ifndef HWLOC_CUDA_H -#define HWLOC_CUDA_H - -#include -#include -#include -#ifdef HWLOC_LINUX_SYS -#include -#endif - -#include - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_cuda Interoperability with the CUDA Driver API - * - * This interface offers ways to retrieve topology information about - * CUDA devices when using the CUDA Driver API. - * - * @{ - */ - -/** \brief Return the domain, bus and device IDs of the CUDA device \p cudevice. - * - * Device \p cudevice must match the local machine. - */ -static __hwloc_inline int -hwloc_cuda_get_device_pci_ids(hwloc_topology_t topology __hwloc_attribute_unused, - CUdevice cudevice, int *domain, int *bus, int *dev) -{ - CUresult cres; - -#if CUDA_VERSION >= 4000 - cres = cuDeviceGetAttribute(domain, CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID, cudevice); - if (cres != CUDA_SUCCESS) { - errno = ENOSYS; - return -1; - } -#else - *domain = 0; -#endif - cres = cuDeviceGetAttribute(bus, CU_DEVICE_ATTRIBUTE_PCI_BUS_ID, cudevice); - if (cres != CUDA_SUCCESS) { - errno = ENOSYS; - return -1; - } - cres = cuDeviceGetAttribute(dev, CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID, cudevice); - if (cres != CUDA_SUCCESS) { - errno = ENOSYS; - return -1; - } - - return 0; -} - -/** \brief Get the CPU set of logical processors that are physically - * close to device \p cudevice. - * - * Return the CPU set describing the locality of the CUDA device \p cudevice. - * - * Topology \p topology and device \p cudevice must match the local machine. - * I/O devices detection and the CUDA component are not needed in the topology. - * - * The function only returns the locality of the device. - * If more information about the device is needed, OS objects should - * be used instead, see hwloc_cuda_get_device_osdev() - * and hwloc_cuda_get_device_osdev_by_index(). 
- * - * This function is currently only implemented in a meaningful way for - * Linux; other systems will simply get a full cpuset. - */ -static __hwloc_inline int -hwloc_cuda_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused, - CUdevice cudevice, hwloc_cpuset_t set) -{ -#ifdef HWLOC_LINUX_SYS - /* If we're on Linux, use the sysfs mechanism to get the local cpus */ -#define HWLOC_CUDA_DEVICE_SYSFS_PATH_MAX 128 - char path[HWLOC_CUDA_DEVICE_SYSFS_PATH_MAX]; - FILE *sysfile = NULL; - int domainid, busid, deviceid; - - if (hwloc_cuda_get_device_pci_ids(topology, cudevice, &domainid, &busid, &deviceid)) - return -1; - - if (!hwloc_topology_is_thissystem(topology)) { - errno = EINVAL; - return -1; - } - - sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.0/local_cpus", domainid, busid, deviceid); - sysfile = fopen(path, "r"); - if (!sysfile) - return -1; - - hwloc_linux_parse_cpumap_file(sysfile, set); - if (hwloc_bitmap_iszero(set)) - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); - - fclose(sysfile); -#else - /* Non-Linux systems simply get a full cpuset */ - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); -#endif - return 0; -} - -/** \brief Get the hwloc PCI device object corresponding to the - * CUDA device \p cudevice. - * - * Return the PCI device object describing the CUDA device \p cudevice. - * Return NULL if there is none. - * - * Topology \p topology and device \p cudevice must match the local machine. - * I/O devices detection must be enabled in topology \p topology. - * The CUDA component is not needed in the topology. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_cuda_get_device_pcidev(hwloc_topology_t topology, CUdevice cudevice) -{ - int domain, bus, dev; - - if (hwloc_cuda_get_device_pci_ids(topology, cudevice, &domain, &bus, &dev)) - return NULL; - - return hwloc_get_pcidev_by_busid(topology, domain, bus, dev, 0); -} - -/** \brief Get the hwloc OS device object corresponding to CUDA device \p cudevice. - * - * Return the hwloc OS device object that describes the given - * CUDA device \p cudevice. Return NULL if there is none. - * - * Topology \p topology and device \p cudevice must match the local machine. - * I/O devices detection and the NVML component must be enabled in the topology. - * If not, the locality of the object may still be found using - * hwloc_cuda_get_device_cpuset(). - * - * \note The corresponding hwloc PCI device may be found by looking - * at the result parent pointer. - */ -static __hwloc_inline hwloc_obj_t -hwloc_cuda_get_device_osdev(hwloc_topology_t topology, CUdevice cudevice) -{ - hwloc_obj_t osdev = NULL; - int domain, bus, dev; - - if (hwloc_cuda_get_device_pci_ids(topology, cudevice, &domain, &bus, &dev)) - return NULL; - - osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - hwloc_obj_t pcidev = osdev->parent; - if (strncmp(osdev->name, "cuda", 4)) - continue; - if (pcidev - && pcidev->type == HWLOC_OBJ_PCI_DEVICE - && (int) pcidev->attr->pcidev.domain == domain - && (int) pcidev->attr->pcidev.bus == bus - && (int) pcidev->attr->pcidev.dev == dev - && pcidev->attr->pcidev.func == 0) - return osdev; - } - - return NULL; -} - -/** \brief Get the hwloc OS device object corresponding to the - * CUDA device whose index is \p idx. - * - * Return the OS device object describing the CUDA device whose - * index is \p idx. Return NULL if there is none. - * - * The topology \p topology does not necessarily have to match the current - * machine. For instance the topology may be an XML import of a remote host. 
- * I/O devices detection and the CUDA component must be enabled in the topology. - * - * \note The corresponding PCI device object can be obtained by looking - * at the OS device parent object. - * - * \note This function is identical to hwloc_cudart_get_device_osdev_by_index(). - */ -static __hwloc_inline hwloc_obj_t -hwloc_cuda_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx) -{ - hwloc_obj_t osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type - && osdev->name - && !strncmp("cuda", osdev->name, 4) - && atoi(osdev->name + 4) == (int) idx) - return osdev; - } - return NULL; -} - -/** @} */ - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_CUDA_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/deprecated.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/deprecated.h deleted file mode 100644 index ac42c13de02..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/deprecated.h +++ /dev/null @@ -1,102 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2014 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2009-2010 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -/** - * This file contains the inline code of functions declared in hwloc.h - */ - -#ifndef HWLOC_DEPRECATED_H -#define HWLOC_DEPRECATED_H - -#ifndef HWLOC_H -#error Please include the main hwloc.h instead -#endif - -#ifdef __cplusplus -extern "C" { -#endif - -/* backward compat with v1.10 before Socket->Package renaming */ -#define HWLOC_OBJ_SOCKET HWLOC_OBJ_PACKAGE -/* backward compat with v1.10 before Node->NUMANode clarification */ -#define HWLOC_OBJ_NODE HWLOC_OBJ_NUMANODE - -/** \brief Return an object type from the string - * - * \return -1 if unrecognized. 
- */ -HWLOC_DECLSPEC hwloc_obj_type_t hwloc_obj_type_of_string (const char * string) __hwloc_attribute_pure __hwloc_attribute_deprecated; - -/** \brief Stringify a given topology object into a human-readable form. - * - * \note This function is deprecated in favor of hwloc_obj_type_snprintf() - * and hwloc_obj_attr_snprintf() since it is not very flexible and - * only prints physical/OS indexes. - * - * Fill string \p string up to \p size characters with the description - * of topology object \p obj in topology \p topology. - * - * If \p verbose is set, a longer description is used. Otherwise a - * short description is used. - * - * \p indexprefix is used to prefix the \p os_index attribute number of - * the object in the description. If \c NULL, the \c # character is used. - * - * If \p size is 0, \p string may safely be \c NULL. - * - * \return the number of character that were actually written if not truncating, - * or that would have been written (not including the ending \\0). - */ -HWLOC_DECLSPEC int hwloc_obj_snprintf(char * __hwloc_restrict string, size_t size, - hwloc_topology_t topology, hwloc_obj_t obj, - const char * __hwloc_restrict indexprefix, int verbose) __hwloc_attribute_deprecated; - -/** \brief Distribute \p n items over the topology under \p root - * - * Array \p cpuset will be filled with \p n cpusets recursively distributed - * linearly over the topology under \p root, down to depth \p until (which can - * be INT_MAX to distribute down to the finest level). - * - * This is typically useful when an application wants to distribute \p n - * threads over a machine, giving each of them as much private cache as - * possible and keeping them locally in number order. - * - * The caller may typically want to also call hwloc_bitmap_singlify() - * before binding a thread so that it does not move at all. - * - * \note This function requires the \p root object to have a CPU set. 
- */ -static __hwloc_inline void -hwloc_distribute(hwloc_topology_t topology, hwloc_obj_t root, hwloc_cpuset_t *set, unsigned n, unsigned until) __hwloc_attribute_deprecated; -static __hwloc_inline void -hwloc_distribute(hwloc_topology_t topology, hwloc_obj_t root, hwloc_cpuset_t *set, unsigned n, unsigned until) -{ - hwloc_distrib(topology, &root, 1, set, n, until, 0); -} - -/** \brief Distribute \p n items over the topology under \p roots - * - * This is the same as hwloc_distribute(), but takes an array of roots instead of - * just one root. - * - * \note This function requires the \p roots objects to have a CPU set. - */ -static __hwloc_inline void -hwloc_distributev(hwloc_topology_t topology, hwloc_obj_t *roots, unsigned n_roots, hwloc_cpuset_t *set, unsigned n, unsigned until) __hwloc_attribute_deprecated; -static __hwloc_inline void -hwloc_distributev(hwloc_topology_t topology, hwloc_obj_t *roots, unsigned n_roots, hwloc_cpuset_t *set, unsigned n, unsigned until) -{ - hwloc_distrib(topology, roots, n_roots, set, n, until, 0); -} - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_DEPRECATED_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/helper.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/helper.h deleted file mode 100644 index 029f2a37efc..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/helper.h +++ /dev/null @@ -1,1290 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2014 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2009-2010 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -/** \file - * \brief High-level hwloc traversal helpers. 
- */ - -#ifndef HWLOC_HELPER_H -#define HWLOC_HELPER_H - -#ifndef HWLOC_H -#error Please include the main hwloc.h instead -#endif - -#include <stdlib.h> -#include <errno.h> - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_helper_find_inside Finding Objects inside a CPU set - * @{ - */ - -/** \brief Get the first largest object included in the given cpuset \p set. - * - * \return the first object that is included in \p set and whose parent is not. - * - * This is convenient for iterating over all largest objects within a CPU set - * by doing a loop getting the first largest object and clearing its CPU set - * from the remaining CPU set. - * - * \note This function cannot work if the root object does not have a CPU set, - * e.g. if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_first_largest_obj_inside_cpuset(hwloc_topology_t topology, hwloc_const_cpuset_t set) -{ - hwloc_obj_t obj = hwloc_get_root_obj(topology); - if (!obj->cpuset || !hwloc_bitmap_intersects(obj->cpuset, set)) - return NULL; - while (!hwloc_bitmap_isincluded(obj->cpuset, set)) { - /* while the object intersects without being included, look at its children */ - hwloc_obj_t child = obj->first_child; - while (child) { - if (child->cpuset && hwloc_bitmap_intersects(child->cpuset, set)) - break; - child = child->next_sibling; - } - if (!child) - /* no child intersects, return their father */ - return obj; - /* found one intersecting child, look at its children */ - obj = child; - } - /* obj is included, return it */ - return obj; -} - -/** \brief Get the set of largest objects covering exactly a given cpuset \p set - * - * \return the number of objects returned in \p objs. - * - * \note This function cannot work if the root object does not have a CPU set, - * e.g. if the topology is made of different machines.
- */ -HWLOC_DECLSPEC int hwloc_get_largest_objs_inside_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_t * __hwloc_restrict objs, int max); - -/** \brief Return the next object at depth \p depth included in CPU set \p set. - * - * If \p prev is \c NULL, return the first object at depth \p depth - * included in \p set. The next invokation should pass the previous - * return value in \p prev so as to obtain the next object in \p set. - * - * \note This function cannot work if objects at the given depth do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_obj_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, - unsigned depth, hwloc_obj_t prev) -{ - hwloc_obj_t next = hwloc_get_next_obj_by_depth(topology, depth, prev); - if (!next || !next->cpuset) - return NULL; - while (next && !hwloc_bitmap_isincluded(next->cpuset, set)) - next = next->next_cousin; - return next; -} - -/** \brief Return the next object of type \p type included in CPU set \p set. - * - * If there are multiple or no depth for given type, return \c NULL - * and let the caller fallback to - * hwloc_get_next_obj_inside_cpuset_by_depth(). - * - * \note This function cannot work if objects of the given type do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_obj_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_type_t type, hwloc_obj_t prev) -{ - int depth = hwloc_get_type_depth(topology, type); - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) - return NULL; - return hwloc_get_next_obj_inside_cpuset_by_depth(topology, set, depth, prev); -} - -/** \brief Return the (logically) \p idx -th object at depth \p depth included in CPU set \p set. 
- * - * \note This function cannot work if objects at the given depth do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, - unsigned depth, unsigned idx) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, - unsigned depth, unsigned idx) -{ - hwloc_obj_t obj = hwloc_get_obj_by_depth (topology, depth, 0); - unsigned count = 0; - if (!obj || !obj->cpuset) - return NULL; - while (obj) { - if (hwloc_bitmap_isincluded(obj->cpuset, set)) { - if (count == idx) - return obj; - count++; - } - obj = obj->next_cousin; - } - return NULL; -} - -/** \brief Return the \p idx -th object of type \p type included in CPU set \p set. - * - * If there are multiple or no depth for given type, return \c NULL - * and let the caller fallback to - * hwloc_get_obj_inside_cpuset_by_depth(). - * - * \note This function cannot work if objects of the given type do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_type_t type, unsigned idx) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_type_t type, unsigned idx) -{ - int depth = hwloc_get_type_depth(topology, type); - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) - return NULL; - return hwloc_get_obj_inside_cpuset_by_depth(topology, set, depth, idx); -} - -/** \brief Return the number of objects at depth \p depth included in CPU set \p set. - * - * \note This function cannot work if objects at the given depth do - * not have CPU sets or if the topology is made of different machines. 
- */ -static __hwloc_inline unsigned -hwloc_get_nbobjs_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, - unsigned depth) __hwloc_attribute_pure; -static __hwloc_inline unsigned -hwloc_get_nbobjs_inside_cpuset_by_depth (hwloc_topology_t topology, hwloc_const_cpuset_t set, - unsigned depth) -{ - hwloc_obj_t obj = hwloc_get_obj_by_depth (topology, depth, 0); - unsigned count = 0; - if (!obj || !obj->cpuset) - return 0; - while (obj) { - if (hwloc_bitmap_isincluded(obj->cpuset, set)) - count++; - obj = obj->next_cousin; - } - return count; -} - -/** \brief Return the number of objects of type \p type included in CPU set \p set. - * - * If no object for that type exists inside CPU set \p set, 0 is - * returned. If there are several levels with objects of that type - * inside CPU set \p set, -1 is returned. - * - * \note This function cannot work if objects of the given type do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline int -hwloc_get_nbobjs_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_type_t type) __hwloc_attribute_pure; -static __hwloc_inline int -hwloc_get_nbobjs_inside_cpuset_by_type (hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_type_t type) -{ - int depth = hwloc_get_type_depth(topology, type); - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) - return 0; - if (depth == HWLOC_TYPE_DEPTH_MULTIPLE) - return -1; /* FIXME: agregate nbobjs from different levels? */ - return hwloc_get_nbobjs_inside_cpuset_by_depth(topology, set, depth); -} - -/** \brief Return the logical index among the objects included in CPU set \p set. - * - * Consult all objects in the same level as \p obj and inside CPU set \p set - * in the logical order, and return the index of \p obj within them. - * If \p set covers the entire topology, this is the logical index of \p obj. 
- * Otherwise, this is similar to a logical index within the part of the topology - * defined by CPU set \p set. - */ -static __hwloc_inline int -hwloc_get_obj_index_inside_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, - hwloc_obj_t obj) __hwloc_attribute_pure; -static __hwloc_inline int -hwloc_get_obj_index_inside_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, - hwloc_obj_t obj) -{ - int idx = 0; - if (!hwloc_bitmap_isincluded(obj->cpuset, set)) - return -1; - /* count how many objects are inside the cpuset on the way from us to the beginning of the level */ - while ((obj = obj->prev_cousin) != NULL) - if (hwloc_bitmap_isincluded(obj->cpuset, set)) - idx++; - return idx; -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_find_covering Finding Objects covering at least CPU set - * @{ - */ - -/** \brief Get the child covering at least CPU set \p set. - * - * \return \c NULL if no child matches or if \p set is empty. - * - * \note This function cannot work if parent does not have a CPU set. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_child_covering_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, - hwloc_obj_t parent) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_child_covering_cpuset (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_const_cpuset_t set, - hwloc_obj_t parent) -{ - hwloc_obj_t child; - if (!parent->cpuset || hwloc_bitmap_iszero(set)) - return NULL; - child = parent->first_child; - while (child) { - if (child->cpuset && hwloc_bitmap_isincluded(set, child->cpuset)) - return child; - child = child->next_sibling; - } - return NULL; -} - -/** \brief Get the lowest object covering at least CPU set \p set - * - * \return \c NULL if no object matches or if \p set is empty. - * - * \note This function cannot work if the root object does not have a CPU set, - * e.g. 
if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) -{ - struct hwloc_obj *current = hwloc_get_root_obj(topology); - if (hwloc_bitmap_iszero(set) || !current->cpuset || !hwloc_bitmap_isincluded(set, current->cpuset)) - return NULL; - while (1) { - hwloc_obj_t child = hwloc_get_child_covering_cpuset(topology, set, current); - if (!child) - return current; - current = child; - } -} - -/** \brief Iterate through same-depth objects covering at least CPU set \p set - * - * If object \p prev is \c NULL, return the first object at depth \p - * depth covering at least part of CPU set \p set. The next - * invokation should pass the previous return value in \p prev so as - * to obtain the next object covering at least another part of \p set. - * - * \note This function cannot work if objects at the given depth do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_obj_covering_cpuset_by_depth(hwloc_topology_t topology, hwloc_const_cpuset_t set, - unsigned depth, hwloc_obj_t prev) -{ - hwloc_obj_t next = hwloc_get_next_obj_by_depth(topology, depth, prev); - if (!next || !next->cpuset) - return NULL; - while (next && !hwloc_bitmap_intersects(set, next->cpuset)) - next = next->next_cousin; - return next; -} - -/** \brief Iterate through same-type objects covering at least CPU set \p set - * - * If object \p prev is \c NULL, return the first object of type \p - * type covering at least part of CPU set \p set. The next invokation - * should pass the previous return value in \p prev so as to obtain - * the next object of type \p type covering at least another part of - * \p set. 
- * - * If there are no or multiple depths for type \p type, \c NULL is returned. - * The caller may fallback to hwloc_get_next_obj_covering_cpuset_by_depth() - * for each depth. - * - * \note This function cannot work if objects of the given type do - * not have CPU sets or if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_obj_covering_cpuset_by_type(hwloc_topology_t topology, hwloc_const_cpuset_t set, - hwloc_obj_type_t type, hwloc_obj_t prev) -{ - int depth = hwloc_get_type_depth(topology, type); - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) - return NULL; - return hwloc_get_next_obj_covering_cpuset_by_depth(topology, set, depth, prev); -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_ancestors Looking at Ancestor and Child Objects - * @{ - * - * Be sure to see the figure in \ref termsanddefs that shows a - * complete topology tree, including depths, child/sibling/cousin - * relationships, and an example of an asymmetric topology where one - * package has fewer caches than its peers. - */ - -/** \brief Returns the ancestor object of \p obj at depth \p depth. */ -static __hwloc_inline hwloc_obj_t -hwloc_get_ancestor_obj_by_depth (hwloc_topology_t topology __hwloc_attribute_unused, unsigned depth, hwloc_obj_t obj) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_ancestor_obj_by_depth (hwloc_topology_t topology __hwloc_attribute_unused, unsigned depth, hwloc_obj_t obj) -{ - hwloc_obj_t ancestor = obj; - if (obj->depth < depth) - return NULL; - while (ancestor && ancestor->depth > depth) - ancestor = ancestor->parent; - return ancestor; -} - -/** \brief Returns the ancestor object of \p obj with type \p type. 
*/ -static __hwloc_inline hwloc_obj_t -hwloc_get_ancestor_obj_by_type (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_type_t type, hwloc_obj_t obj) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_ancestor_obj_by_type (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_type_t type, hwloc_obj_t obj) -{ - hwloc_obj_t ancestor = obj->parent; - while (ancestor && ancestor->type != type) - ancestor = ancestor->parent; - return ancestor; -} - -/** \brief Returns the common parent object to objects \p obj1 and \p obj2 */ -static __hwloc_inline hwloc_obj_t -hwloc_get_common_ancestor_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj1, hwloc_obj_t obj2) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_common_ancestor_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj1, hwloc_obj_t obj2) -{ - /* the loop isn't so easy since intermediate ancestors may have - * different depth, causing us to alternate between using obj1->parent - * and obj2->parent. Also, even if at some point we find ancestors of - * of the same depth, their ancestors may have different depth again. - */ - while (obj1 != obj2) { - while (obj1->depth > obj2->depth) - obj1 = obj1->parent; - while (obj2->depth > obj1->depth) - obj2 = obj2->parent; - if (obj1 != obj2 && obj1->depth == obj2->depth) { - obj1 = obj1->parent; - obj2 = obj2->parent; - } - } - return obj1; -} - -/** \brief Returns true if \p obj is inside the subtree beginning with ancestor object \p subtree_root. - * - * \note This function assumes that both \p obj and \p subtree_root have a \p cpuset. 
- */ -static __hwloc_inline int -hwloc_obj_is_in_subtree (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, hwloc_obj_t subtree_root) __hwloc_attribute_pure; -static __hwloc_inline int -hwloc_obj_is_in_subtree (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, hwloc_obj_t subtree_root) -{ - return hwloc_bitmap_isincluded(obj->cpuset, subtree_root->cpuset); -} - -/** \brief Return the next child. - * - * If \p prev is \c NULL, return the first child. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_child (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t parent, hwloc_obj_t prev) -{ - if (!prev) - return parent->first_child; - if (prev->parent != parent) - return NULL; - return prev->next_sibling; -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_find_cache Looking at Cache Objects - * @{ - */ - -/** \brief Find the depth of cache objects matching cache depth and type. - * - * Return the depth of the topology level that contains cache objects - * whose attributes match \p cachedepth and \p cachetype. This function - * intends to disambiguate the case where hwloc_get_type_depth() returns - * ::HWLOC_TYPE_DEPTH_MULTIPLE. - * - * If no cache level matches, ::HWLOC_TYPE_DEPTH_UNKNOWN is returned. - * - * If \p cachetype is ::HWLOC_OBJ_CACHE_UNIFIED, the depth of the - * unique matching unified cache level is returned. - * - * If \p cachetype is ::HWLOC_OBJ_CACHE_DATA or ::HWLOC_OBJ_CACHE_INSTRUCTION, - * either a matching cache, or a unified cache is returned. - * - * If \p cachetype is \c -1, it is ignored and multiple levels may - * match. The function returns either the depth of a uniquely matching - * level or ::HWLOC_TYPE_DEPTH_MULTIPLE. 
- */ -static __hwloc_inline int -hwloc_get_cache_type_depth (hwloc_topology_t topology, - unsigned cachelevel, hwloc_obj_cache_type_t cachetype) -{ - int depth; - int found = HWLOC_TYPE_DEPTH_UNKNOWN; - for (depth=0; ; depth++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, 0); - if (!obj) - break; - if (obj->type != HWLOC_OBJ_CACHE || obj->attr->cache.depth != cachelevel) - /* doesn't match, try next depth */ - continue; - if (cachetype == (hwloc_obj_cache_type_t) -1) { - if (found != HWLOC_TYPE_DEPTH_UNKNOWN) { - /* second match, return MULTIPLE */ - return HWLOC_TYPE_DEPTH_MULTIPLE; - } - /* first match, mark it as found */ - found = depth; - continue; - } - if (obj->attr->cache.type == cachetype || obj->attr->cache.type == HWLOC_OBJ_CACHE_UNIFIED) - /* exact match (either unified is alone, or we match instruction or data), return immediately */ - return depth; - } - /* went to the bottom, return what we found */ - return found; -} - -/** \brief Get the first cache covering a cpuset \p set - * - * \return \c NULL if no cache matches. - * - * \note This function cannot work if the root object does not have a CPU set, - * e.g. if the topology is made of different machines. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_cache_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_cache_covering_cpuset (hwloc_topology_t topology, hwloc_const_cpuset_t set) -{ - hwloc_obj_t current = hwloc_get_obj_covering_cpuset(topology, set); - while (current) { - if (current->type == HWLOC_OBJ_CACHE) - return current; - current = current->parent; - } - return NULL; -} - -/** \brief Get the first cache shared between an object and somebody else. - * - * \return \c NULL if no cache matches or if an invalid object is given. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_get_shared_cache_covering_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_shared_cache_covering_obj (hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj) -{ - hwloc_obj_t current = obj->parent; - if (!obj->cpuset) - return NULL; - while (current && current->cpuset) { - if (!hwloc_bitmap_isequal(current->cpuset, obj->cpuset) - && current->type == HWLOC_OBJ_CACHE) - return current; - current = current->parent; - } - return NULL; -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_find_misc Finding objects, miscellaneous helpers - * @{ - * - * Be sure to see the figure in \ref termsanddefs that shows a - * complete topology tree, including depths, child/sibling/cousin - * relationships, and an example of an asymmetric topology where one - * package has fewer caches than its peers. - */ - -/** \brief Returns the object of type ::HWLOC_OBJ_PU with \p os_index. - * - * This function is useful for converting a CPU set into the PU - * objects it contains. - * When retrieving the current binding (e.g. with hwloc_get_cpubind()), - * one may iterate over the bits of the resulting CPU set with - * hwloc_bitmap_foreach_begin(), and find the corresponding PUs - * with this function. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_pu_obj_by_os_index(hwloc_topology_t topology, unsigned os_index) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_pu_obj_by_os_index(hwloc_topology_t topology, unsigned os_index) -{ - hwloc_obj_t obj = NULL; - while ((obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_PU, obj)) != NULL) - if (obj->os_index == os_index) - return obj; - return NULL; -} - -/** \brief Returns the object of type ::HWLOC_OBJ_NUMANODE with \p os_index. - * - * This function is useful for converting a nodeset into the NUMA node - * objects it contains. 
- * When retrieving the current binding (e.g. with hwloc_get_membind_nodeset()), - * one may iterate over the bits of the resulting nodeset with - * hwloc_bitmap_foreach_begin(), and find the corresponding NUMA nodes - * with this function. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_numanode_obj_by_os_index(hwloc_topology_t topology, unsigned os_index) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_numanode_obj_by_os_index(hwloc_topology_t topology, unsigned os_index) -{ - hwloc_obj_t obj = NULL; - while ((obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NUMANODE, obj)) != NULL) - if (obj->os_index == os_index) - return obj; - return NULL; -} - -/** \brief Do a depth-first traversal of the topology to find and sort - * - * all objects that are at the same depth than \p src. - * Report in \p objs up to \p max physically closest ones to \p src. - * - * \return the number of objects returned in \p objs. - * - * \return 0 if \p src is an I/O object. - * - * \note This function requires the \p src object to have a CPU set. - */ -/* TODO: rather provide an iterator? Provide a way to know how much should be allocated? By returning the total number of objects instead? */ -HWLOC_DECLSPEC unsigned hwloc_get_closest_objs (hwloc_topology_t topology, hwloc_obj_t src, hwloc_obj_t * __hwloc_restrict objs, unsigned max); - -/** \brief Find an object below another object, both specified by types and indexes. - * - * Start from the top system object and find object of type \p type1 - * and logical index \p idx1. Then look below this object and find another - * object of type \p type2 and logical index \p idx2. Indexes are specified - * within the parent, not withing the entire system. - * - * For instance, if type1 is PACKAGE, idx1 is 2, type2 is CORE and idx2 - * is 3, return the fourth core object below the third package. - * - * \note This function requires these objects to have a CPU set. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_below_by_type (hwloc_topology_t topology, - hwloc_obj_type_t type1, unsigned idx1, - hwloc_obj_type_t type2, unsigned idx2) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_below_by_type (hwloc_topology_t topology, - hwloc_obj_type_t type1, unsigned idx1, - hwloc_obj_type_t type2, unsigned idx2) -{ - hwloc_obj_t obj; - obj = hwloc_get_obj_by_type (topology, type1, idx1); - if (!obj || !obj->cpuset) - return NULL; - return hwloc_get_obj_inside_cpuset_by_type(topology, obj->cpuset, type2, idx2); -} - -/** \brief Find an object below a chain of objects specified by types and indexes. - * - * This is a generalized version of hwloc_get_obj_below_by_type(). - * - * Arrays \p typev and \p idxv must contain \p nr types and indexes. - * - * Start from the top system object and walk the arrays \p typev and \p idxv. - * For each type and logical index couple in the arrays, look under the previously found - * object to find the index-th object of the given type. - * Indexes are specified within the parent, not withing the entire system. - * - * For instance, if nr is 3, typev contains NODE, PACKAGE and CORE, - * and idxv contains 0, 1 and 2, return the third core object below - * the second package below the first NUMA node. - * - * \note This function requires all these objects and the root object - * to have a CPU set. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_below_array_by_type (hwloc_topology_t topology, int nr, hwloc_obj_type_t *typev, unsigned *idxv) __hwloc_attribute_pure; -static __hwloc_inline hwloc_obj_t -hwloc_get_obj_below_array_by_type (hwloc_topology_t topology, int nr, hwloc_obj_type_t *typev, unsigned *idxv) -{ - hwloc_obj_t obj = hwloc_get_root_obj(topology); - int i; - for(i=0; i<nr; i++) { - if (!obj || !obj->cpuset) - return NULL; - obj = hwloc_get_obj_inside_cpuset_by_type(topology, obj->cpuset, typev[i], idxv[i]); - } - return obj; -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_distribute Distributing items over a topology - * @{ - */ - -/** \brief Flags to be given to hwloc_distrib(). - */ -enum hwloc_distrib_flags_e { - /** \brief Distrib in reverse order, starting from the last objects. - * \hideinitializer - */ - HWLOC_DISTRIB_FLAG_REVERSE = (1UL<<0) -}; - -/** \brief Distribute \p n items over the topology under \p roots - * - * Array \p set will be filled with \p n cpusets recursively distributed - * linearly over the topology under objects \p roots, down to depth \p until - * (which can be INT_MAX to distribute down to the finest level). - * - * \p n_roots is usually 1 and \p roots only contains the topology root object - * so as to distribute over the entire topology. - * - * This is typically useful when an application wants to distribute \p n - * threads over a machine, giving each of them as much private cache as - * possible and keeping them locally in number order. - * - * The caller may typically want to also call hwloc_bitmap_singlify() - * before binding a thread so that it does not move at all. - * - * \p flags should be 0 or a OR'ed set of ::hwloc_distrib_flags_e. - * - * \note This function requires the \p roots objects to have a CPU set. - * - * \note This function replaces the now deprecated hwloc_distribute() - * and hwloc_distributev() functions.
- */ -static __hwloc_inline int -hwloc_distrib(hwloc_topology_t topology, - hwloc_obj_t *roots, unsigned n_roots, - hwloc_cpuset_t *set, - unsigned n, - unsigned until, unsigned long flags) -{ - unsigned i; - unsigned tot_weight; - unsigned given, givenweight; - hwloc_cpuset_t *cpusetp = set; - - if (flags & ~HWLOC_DISTRIB_FLAG_REVERSE) { - errno = EINVAL; - return -1; - } - - tot_weight = 0; - for (i = 0; i < n_roots; i++) - if (roots[i]->cpuset) - tot_weight += hwloc_bitmap_weight(roots[i]->cpuset); - - for (i = 0, given = 0, givenweight = 0; i < n_roots; i++) { - unsigned chunk, weight; - hwloc_obj_t root = roots[flags & HWLOC_DISTRIB_FLAG_REVERSE ? n_roots-1-i : i]; - hwloc_cpuset_t cpuset = root->cpuset; - if (!cpuset) - continue; - weight = hwloc_bitmap_weight(cpuset); - if (!weight) - continue; - /* Give to root a chunk proportional to its weight. - * If previous chunks got rounded-up, we may get a bit less. */ - chunk = (( (givenweight+weight) * n + tot_weight-1) / tot_weight) - - (( givenweight * n + tot_weight-1) / tot_weight); - if (!root->arity || chunk <= 1 || root->depth >= until) { - /* We can't split any more, put everything there. */ - if (chunk) { - /* Fill cpusets with ours */ - unsigned j; - for (j=0; j < chunk; j++) - cpusetp[j] = hwloc_bitmap_dup(cpuset); - } else { - /* We got no chunk, just merge our cpuset to a previous one - * (the first chunk cannot be empty) - * so that this root doesn't get ignored. - */ - assert(given); - hwloc_bitmap_or(cpusetp[-1], cpusetp[-1], cpuset); - } - } else { - /* Still more to distribute, recurse into children */ - hwloc_distrib(topology, root->children, root->arity, cpusetp, chunk, until, flags); - } - cpusetp += chunk; - given += chunk; - givenweight += weight; - } - - return 0; -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_topology_sets CPU and node sets of entire topologies - * @{ - */ -/** \brief Get complete CPU set - * - * \return the complete CPU set of logical processors of the system. 
If the - * topology is the result of a combination of several systems, NULL is - * returned. - * - * \note The returned cpuset is not newly allocated and should thus not be - * changed or freed; hwloc_bitmap_dup() must be used to obtain a local copy. - */ -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_complete_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_complete_cpuset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->complete_cpuset; -} - -/** \brief Get topology CPU set - * - * \return the CPU set of logical processors of the system for which hwloc - * provides topology information. This is equivalent to the cpuset of the - * system object. If the topology is the result of a combination of several - * systems, NULL is returned. - * - * \note The returned cpuset is not newly allocated and should thus not be - * changed or freed; hwloc_bitmap_dup() must be used to obtain a local copy. - */ -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_topology_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_topology_cpuset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->cpuset; -} - -/** \brief Get online CPU set - * - * \return the CPU set of online logical processors of the system. If the - * topology is the result of a combination of several systems, NULL is - * returned. - * - * \note The returned cpuset is not newly allocated and should thus not be - * changed or freed; hwloc_bitmap_dup() must be used to obtain a local copy. 
- */ -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_online_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_online_cpuset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->online_cpuset; -} - -/** \brief Get allowed CPU set - * - * \return the CPU set of allowed logical processors of the system. If the - * topology is the result of a combination of several systems, NULL is - * returned. - * - * \note The returned cpuset is not newly allocated and should thus not be - * changed or freed, hwloc_bitmap_dup() must be used to obtain a local copy. - */ -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_allowed_cpuset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_cpuset_t -hwloc_topology_get_allowed_cpuset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->allowed_cpuset; -} - -/** \brief Get complete node set - * - * \return the complete node set of memory of the system. If the - * topology is the result of a combination of several systems, NULL is - * returned. - * - * \note The returned nodeset is not newly allocated and should thus not be - * changed or freed; hwloc_bitmap_dup() must be used to obtain a local copy. - */ -static __hwloc_inline hwloc_const_nodeset_t -hwloc_topology_get_complete_nodeset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_nodeset_t -hwloc_topology_get_complete_nodeset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->complete_nodeset; -} - -/** \brief Get topology node set - * - * \return the node set of memory of the system for which hwloc - * provides topology information. This is equivalent to the nodeset of the - * system object. If the topology is the result of a combination of several - * systems, NULL is returned. 
- * - * \note The returned nodeset is not newly allocated and should thus not be - * changed or freed; hwloc_bitmap_dup() must be used to obtain a local copy. - */ -static __hwloc_inline hwloc_const_nodeset_t -hwloc_topology_get_topology_nodeset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_nodeset_t -hwloc_topology_get_topology_nodeset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->nodeset; -} - -/** \brief Get allowed node set - * - * \return the node set of allowed memory of the system. If the - * topology is the result of a combination of several systems, NULL is - * returned. - * - * \note The returned nodeset is not newly allocated and should thus not be - * changed or freed, hwloc_bitmap_dup() must be used to obtain a local copy. - */ -static __hwloc_inline hwloc_const_nodeset_t -hwloc_topology_get_allowed_nodeset(hwloc_topology_t topology) __hwloc_attribute_pure; -static __hwloc_inline hwloc_const_nodeset_t -hwloc_topology_get_allowed_nodeset(hwloc_topology_t topology) -{ - return hwloc_get_root_obj(topology)->allowed_nodeset; -} - -/** @} */ - - - -/** \defgroup hwlocality_helper_nodeset_convert Converting between CPU sets and node sets - * - * There are two semantics for converting cpusets to nodesets depending on how - * non-NUMA machines are handled. - * - * When manipulating nodesets for memory binding, non-NUMA machines should be - * considered as having a single NUMA node. The standard conversion routines - * below should be used so that marking the first bit of the nodeset means - * that memory should be bound to a non-NUMA whole machine. - * - * When manipulating nodesets as an actual list of NUMA nodes without any - * need to handle memory binding on non-NUMA machines, the strict conversion - * routines may be used instead. 
- * @{ - */ - -/** \brief Convert a CPU set into a NUMA node set and handle non-NUMA cases - * - * If some NUMA nodes have no CPUs at all, this function never sets their - * indexes in the output node set, even if a full CPU set is given in input. - * - * If the topology contains no NUMA nodes, the machine is considered - * as a single memory node, and the following behavior is used: - * If \p cpuset is empty, \p nodeset will be emptied as well. - * Otherwise \p nodeset will be entirely filled. - */ -static __hwloc_inline void -hwloc_cpuset_to_nodeset(hwloc_topology_t topology, hwloc_const_cpuset_t _cpuset, hwloc_nodeset_t nodeset) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - hwloc_obj_t obj; - - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) { - if (hwloc_bitmap_iszero(_cpuset)) - hwloc_bitmap_zero(nodeset); - else - /* Assume the whole system */ - hwloc_bitmap_fill(nodeset); - return; - } - - hwloc_bitmap_zero(nodeset); - obj = NULL; - while ((obj = hwloc_get_next_obj_covering_cpuset_by_depth(topology, _cpuset, depth, obj)) != NULL) - hwloc_bitmap_set(nodeset, obj->os_index); -} - -/** \brief Convert a CPU set into a NUMA node set without handling non-NUMA cases - * - * This is the strict variant of hwloc_cpuset_to_nodeset(). It does not fix - * non-NUMA cases. If the topology contains some NUMA nodes, behave exactly - * the same. However, if the topology contains no NUMA nodes, return an empty - * nodeset. 
- */ -static __hwloc_inline void -hwloc_cpuset_to_nodeset_strict(struct hwloc_topology *topology, hwloc_const_cpuset_t _cpuset, hwloc_nodeset_t nodeset) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - hwloc_obj_t obj; - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN ) - return; - hwloc_bitmap_zero(nodeset); - obj = NULL; - while ((obj = hwloc_get_next_obj_covering_cpuset_by_depth(topology, _cpuset, depth, obj)) != NULL) - hwloc_bitmap_set(nodeset, obj->os_index); -} - -/** \brief Convert a NUMA node set into a CPU set and handle non-NUMA cases - * - * If the topology contains no NUMA nodes, the machine is considered - * as a single memory node, and the following behavior is used: - * If \p nodeset is empty, \p cpuset will be emptied as well. - * Otherwise \p cpuset will be entirely filled. - * This is useful for manipulating memory binding sets. - */ -static __hwloc_inline void -hwloc_cpuset_from_nodeset(hwloc_topology_t topology, hwloc_cpuset_t _cpuset, hwloc_const_nodeset_t nodeset) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - hwloc_obj_t obj; - - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN ) { - if (hwloc_bitmap_iszero(nodeset)) - hwloc_bitmap_zero(_cpuset); - else - /* Assume the whole system */ - hwloc_bitmap_fill(_cpuset); - return; - } - - hwloc_bitmap_zero(_cpuset); - obj = NULL; - while ((obj = hwloc_get_next_obj_by_depth(topology, depth, obj)) != NULL) { - if (hwloc_bitmap_isset(nodeset, obj->os_index)) - /* no need to check obj->cpuset because objects in levels always have a cpuset */ - hwloc_bitmap_or(_cpuset, _cpuset, obj->cpuset); - } -} - -/** \brief Convert a NUMA node set into a CPU set without handling non-NUMA cases - * - * This is the strict variant of hwloc_cpuset_from_nodeset(). It does not fix - * non-NUMA cases. If the topology contains some NUMA nodes, behave exactly - * the same. However, if the topology contains no NUMA nodes, return an empty - * cpuset. 
- */ -static __hwloc_inline void -hwloc_cpuset_from_nodeset_strict(struct hwloc_topology *topology, hwloc_cpuset_t _cpuset, hwloc_const_nodeset_t nodeset) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - hwloc_obj_t obj; - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN ) - return; - hwloc_bitmap_zero(_cpuset); - obj = NULL; - while ((obj = hwloc_get_next_obj_by_depth(topology, depth, obj)) != NULL) - if (hwloc_bitmap_isset(nodeset, obj->os_index)) - /* no need to check obj->cpuset because objects in levels always have a cpuset */ - hwloc_bitmap_or(_cpuset, _cpuset, obj->cpuset); -} - -/** @} */ - - - -/** \defgroup hwlocality_distances Manipulating Distances - * @{ - */ - -/** \brief Get the distances between all objects at the given depth. - * - * \return a distances structure containing a matrix with all distances - * between all objects at the given depth. - * - * Slot i+nbobjs*j contains the distance from the object of logical index i - * the object of logical index j. - * - * \note This function only returns matrices covering the whole topology, - * without any unknown distance value. Those matrices are available in - * top-level object of the hierarchy. Matrices of lower objects are not - * reported here since they cover only part of the machine. - * - * The returned structure belongs to the hwloc library. The caller should - * not modify or free it. - * - * \return \c NULL if no such distance matrix exists. - */ - -static __hwloc_inline const struct hwloc_distances_s * -hwloc_get_whole_distance_matrix_by_depth(hwloc_topology_t topology, unsigned depth) -{ - hwloc_obj_t root = hwloc_get_root_obj(topology); - unsigned i; - for(i=0; i<root->distances_count; i++) - if (root->distances[i]->relative_depth == depth) - return root->distances[i]; - return NULL; -} - -/** \brief Get the distances between all objects of a given type. 
- * - * \return a distances structure containing a matrix with all distances - * between all objects of the given type.
- * - * Slot i+nbobjs*j contains the distance from the object of logical index i - * the object of logical index j. - * - * \note This function only returns matrices covering the whole topology, - * without any unknown distance value. Those matrices are available in - * top-level object of the hierarchy. Matrices of lower objects are not - * reported here since they cover only part of the machine. - * - * The returned structure belongs to the hwloc library. The caller should - * not modify or free it. - * - * \return \c NULL if no such distance matrix exists. - */ - -static __hwloc_inline const struct hwloc_distances_s * -hwloc_get_whole_distance_matrix_by_type(hwloc_topology_t topology, hwloc_obj_type_t type) -{ - int depth = hwloc_get_type_depth(topology, type); - if (depth < 0) - return NULL; - return hwloc_get_whole_distance_matrix_by_depth(topology, depth); -} - -/** \brief Get distances for the given depth and covering some objects - * - * Return a distance matrix that describes depth \p depth and covers at - * least object \p obj and all its children. - * - * When looking for the distance between some objects, a common ancestor should - * be passed in \p obj. - * - * \p firstp is set to logical index of the first object described by the matrix. - * - * The returned structure belongs to the hwloc library. The caller should - * not modify or free it. 
- */ -static __hwloc_inline const struct hwloc_distances_s * -hwloc_get_distance_matrix_covering_obj_by_depth(hwloc_topology_t topology, - hwloc_obj_t obj, unsigned depth, - unsigned *firstp) -{ - while (obj && obj->cpuset) { - unsigned i; - for(i=0; i<obj->distances_count; i++) - if (obj->distances[i]->relative_depth == depth - obj->depth) { - if (!obj->distances[i]->nbobjs) - continue; - *firstp = hwloc_get_next_obj_inside_cpuset_by_depth(topology, obj->cpuset, depth, NULL)->logical_index; - return obj->distances[i]; - } - obj = obj->parent; - } - return NULL; -} - -/** \brief Get the latency in both directions between two objects. - * - * Look at ancestor objects from the bottom to the top until one of them - * contains a distance matrix that matches the objects exactly. - * - * \p latency gets the value from object \p obj1 to \p obj2, while - * \p reverse_latency gets the reverse-direction value, which - * may be different on some architectures. - * - * \return -1 if no ancestor contains a matching latency matrix.
- */ -static __hwloc_inline int -hwloc_get_latency(hwloc_topology_t topology, - hwloc_obj_t obj1, hwloc_obj_t obj2, - float *latency, float *reverse_latency) -{ - hwloc_obj_t ancestor; - const struct hwloc_distances_s * distances; - unsigned first_logical ; - - if (obj1->depth != obj2->depth) { - errno = EINVAL; - return -1; - } - - ancestor = hwloc_get_common_ancestor_obj(topology, obj1, obj2); - distances = hwloc_get_distance_matrix_covering_obj_by_depth(topology, ancestor, obj1->depth, &first_logical); - if (distances && distances->latency) { - const float * latency_matrix = distances->latency; - unsigned nbobjs = distances->nbobjs; - unsigned l1 = obj1->logical_index - first_logical; - unsigned l2 = obj2->logical_index - first_logical; - *latency = latency_matrix[l1*nbobjs+l2]; - *reverse_latency = latency_matrix[l2*nbobjs+l1]; - return 0; - } - - errno = ENOSYS; - return -1; -} - -/** @} */ - - - -/** \defgroup hwlocality_advanced_io Finding I/O objects - * @{ - */ - -/** \brief Get the first non-I/O ancestor object. - * - * Given the I/O object \p ioobj, find the smallest non-I/O ancestor - * object. This regular object may then be used for binding because - * its locality is the same as \p ioobj. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_non_io_ancestor_obj(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_obj_t ioobj) -{ - hwloc_obj_t obj = ioobj; - while (obj && !obj->cpuset) { - obj = obj->parent; - } - return obj; -} - -/** \brief Get the next PCI device in the system. - * - * \return the first PCI device if \p prev is \c NULL. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_pcidev(hwloc_topology_t topology, hwloc_obj_t prev) -{ - return hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_PCI_DEVICE, prev); -} - -/** \brief Find the PCI device object matching the PCI bus id - * given domain, bus device and function PCI bus id. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_get_pcidev_by_busid(hwloc_topology_t topology, - unsigned domain, unsigned bus, unsigned dev, unsigned func) -{ - hwloc_obj_t obj = NULL; - while ((obj = hwloc_get_next_pcidev(topology, obj)) != NULL) { - if (obj->attr->pcidev.domain == domain - && obj->attr->pcidev.bus == bus - && obj->attr->pcidev.dev == dev - && obj->attr->pcidev.func == func) - return obj; - } - return NULL; -} - -/** \brief Find the PCI device object matching the PCI bus id - * given as a string xxxx:yy:zz.t or yy:zz.t. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_pcidev_by_busidstring(hwloc_topology_t topology, const char *busid) -{ - unsigned domain = 0; /* default */ - unsigned bus, dev, func; - - if (sscanf(busid, "%x:%x.%x", &bus, &dev, &func) != 3 - && sscanf(busid, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { - errno = EINVAL; - return NULL; - } - - return hwloc_get_pcidev_by_busid(topology, domain, bus, dev, func); -} - -/** \brief Get the next OS device in the system. - * - * \return the first OS device if \p prev is \c NULL. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_osdev(hwloc_topology_t topology, hwloc_obj_t prev) -{ - return hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_OS_DEVICE, prev); -} - -/** \brief Get the next bridge in the system. - * - * \return the first bridge if \p prev is \c NULL. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_next_bridge(hwloc_topology_t topology, hwloc_obj_t prev) -{ - return hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_BRIDGE, prev); -} - -/* \brief Checks whether a given bridge covers a given PCI bus. 
- */ -static __hwloc_inline int -hwloc_bridge_covers_pcibus(hwloc_obj_t bridge, - unsigned domain, unsigned bus) -{ - return bridge->type == HWLOC_OBJ_BRIDGE - && bridge->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI - && bridge->attr->bridge.downstream.pci.domain == domain - && bridge->attr->bridge.downstream.pci.secondary_bus <= bus - && bridge->attr->bridge.downstream.pci.subordinate_bus >= bus; -} - -/** \brief Find the hostbridge that covers the given PCI bus. - * - * This is useful for finding the locality of a bus because - * it is the hostbridge parent cpuset. - */ -static __hwloc_inline hwloc_obj_t -hwloc_get_hostbridge_by_pcibus(hwloc_topology_t topology, - unsigned domain, unsigned bus) -{ - hwloc_obj_t obj = NULL; - while ((obj = hwloc_get_next_bridge(topology, obj)) != NULL) { - if (hwloc_bridge_covers_pcibus(obj, domain, bus)) { - /* found bridge covering this pcibus, make sure it's a hostbridge */ - assert(obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_HOST); - assert(obj->parent->type != HWLOC_OBJ_BRIDGE); - assert(obj->parent->cpuset); - return obj; - } - } - return NULL; -} - -/** @} */ - - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_HELPER_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/intel-mic.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/intel-mic.h deleted file mode 100644 index d58237b3d4b..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/intel-mic.h +++ /dev/null @@ -1,143 +0,0 @@ -/* - * Copyright © 2013 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -/** \file - * \brief Macros to help interaction between hwloc and Intel Xeon Phi (MIC). - * - * Applications that use both hwloc and Intel Xeon Phi (MIC) may want to - * include this file so as to get topology information for MIC devices. 
- */ - -#ifndef HWLOC_INTEL_MIC_H -#define HWLOC_INTEL_MIC_H - -#include -#include -#include -#ifdef HWLOC_LINUX_SYS -#include -#include -#include -#endif - -#include -#include - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_intel_mic Interoperability with Intel Xeon Phi (MIC) - * - * This interface offers ways to retrieve topology information about - * Intel Xeon Phi (MIC) devices. - * - * @{ - */ - -/** \brief Get the CPU set of logical processors that are physically - * close to MIC device whose index is \p idx. - * - * Return the CPU set describing the locality of the MIC device whose index is \p idx. - * - * Topology \p topology and device index \p idx must match the local machine. - * I/O devices detection is not needed in the topology. - * - * The function only returns the locality of the device. - * If more information about the device is needed, OS objects should - * be used instead, see hwloc_intel_mic_get_device_osdev_by_index(). - * - * This function is currently only implemented in a meaningful way for - * Linux; other systems will simply get a full cpuset. 
- */ -static __hwloc_inline int -hwloc_intel_mic_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused, - int idx __hwloc_attribute_unused, - hwloc_cpuset_t set) -{ -#ifdef HWLOC_LINUX_SYS - /* If we're on Linux, use the sysfs mechanism to get the local cpus */ -#define HWLOC_INTEL_MIC_DEVICE_SYSFS_PATH_MAX 128 - char path[HWLOC_INTEL_MIC_DEVICE_SYSFS_PATH_MAX]; - DIR *sysdir = NULL; - FILE *sysfile = NULL; - struct dirent *dirent; - unsigned pcibus, pcidev, pcifunc; - - if (!hwloc_topology_is_thissystem(topology)) { - errno = EINVAL; - return -1; - } - - sprintf(path, "/sys/class/mic/mic%d", idx); - sysdir = opendir(path); - if (!sysdir) - return -1; - - while ((dirent = readdir(sysdir)) != NULL) { - if (sscanf(dirent->d_name, "pci_%02x:%02x.%02x", &pcibus, &pcidev, &pcifunc) == 3) { - sprintf(path, "/sys/class/mic/mic%d/pci_%02x:%02x.%02x/local_cpus", idx, pcibus, pcidev, pcifunc); - sysfile = fopen(path, "r"); - if (!sysfile) { - closedir(sysdir); - return -1; - } - - hwloc_linux_parse_cpumap_file(sysfile, set); - if (hwloc_bitmap_iszero(set)) - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); - - fclose(sysfile); - break; - } - } - - closedir(sysdir); -#else - /* Non-Linux systems simply get a full cpuset */ - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); -#endif - return 0; -} - -/** \brief Get the hwloc OS device object corresponding to the - * MIC device for the given index. - * - * Return the OS device object describing the MIC device whose index is \p idx. - * Return NULL if there is none. - * - * The topology \p topology does not necessarily have to match the current - * machine. For instance the topology may be an XML import of a remote host. - * I/O devices detection must be enabled in the topology. - * - * \note The corresponding PCI device object can be obtained by looking - * at the OS device parent object. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_intel_mic_get_device_osdev_by_index(hwloc_topology_t topology, - unsigned idx) -{ - hwloc_obj_t osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type - && osdev->name - && !strncmp("mic", osdev->name, 3) - && atoi(osdev->name + 3) == (int) idx) - return osdev; - } - return NULL; -} - -/** @} */ - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_INTEL_MIC_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/linux-libnuma.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/linux-libnuma.h deleted file mode 100644 index 74dbfd0a969..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/linux-libnuma.h +++ /dev/null @@ -1,355 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2014 Inria. All rights reserved. - * Copyright © 2009-2010, 2012 Université Bordeaux - * See COPYING in top-level directory. - */ - -/** \file - * \brief Macros to help interaction between hwloc and Linux libnuma. - * - * Applications that use both Linux libnuma and hwloc may want to - * include this file so as to ease conversion between their respective types. -*/ - -#ifndef HWLOC_LINUX_LIBNUMA_H -#define HWLOC_LINUX_LIBNUMA_H - -#include -#include - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_linux_libnuma_ulongs Interoperability with Linux libnuma unsigned long masks - * - * This interface helps converting between Linux libnuma unsigned long masks - * and hwloc cpusets and nodesets. - * - * It also offers a consistent behavior on non-NUMA machines - * or non-NUMA-aware kernels by assuming that the machines have a single - * NUMA node. - * - * \note Topology \p topology must match the current machine. - * - * \note The behavior of libnuma is undefined if the kernel is not NUMA-aware. - * (when CONFIG_NUMA is not set in the kernel configuration). 
- * This helper and libnuma may thus not be strictly compatible in this case, - * which may be detected by checking whether numa_available() returns -1. - * - * @{ - */ - - -/** \brief Convert hwloc CPU set \p cpuset into the array of unsigned long \p mask - * - * \p mask is the array of unsigned long that will be filled. - * \p maxnode contains the maximal node number that may be stored in \p mask. - * \p maxnode will be set to the maximal node number that was found, plus one. - * - * This function may be used before calling set_mempolicy, mbind, migrate_pages - * or any other function that takes an array of unsigned long and a maximal - * node number as input parameter. - */ -static __hwloc_inline int -hwloc_cpuset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset, - unsigned long *mask, unsigned long *maxnode) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - unsigned long outmaxnode = -1; - - /* round-up to the next ulong and clear all bytes */ - *maxnode = (*maxnode + 8*sizeof(*mask) - 1) & ~(8*sizeof(*mask) - 1); - memset(mask, 0, *maxnode/8); - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - while ((node = hwloc_get_next_obj_covering_cpuset_by_depth(topology, cpuset, depth, node)) != NULL) { - if (node->os_index >= *maxnode) - continue; - mask[node->os_index/sizeof(*mask)/8] |= 1UL << (node->os_index % (sizeof(*mask)*8)); - if (outmaxnode == (unsigned long) -1 || outmaxnode < node->os_index) - outmaxnode = node->os_index; - } - - } else { - /* if no numa, libnuma assumes we have a single node */ - if (!hwloc_bitmap_iszero(cpuset)) { - mask[0] = 1; - outmaxnode = 0; - } - } - - *maxnode = outmaxnode+1; - return 0; -} - -/** \brief Convert hwloc NUMA node set \p nodeset into the array of unsigned long \p mask - * - * \p mask is the array of unsigned long that will be filled. - * \p maxnode contains the maximal node number that may be stored in \p mask. 
- * \p maxnode will be set to the maximal node number that was found, plus one. - * - * This function may be used before calling set_mempolicy, mbind, migrate_pages - * or any other function that takes an array of unsigned long and a maximal - * node number as input parameter. - */ -static __hwloc_inline int -hwloc_nodeset_to_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, - unsigned long *mask, unsigned long *maxnode) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - unsigned long outmaxnode = -1; - - /* round-up to the next ulong and clear all bytes */ - *maxnode = (*maxnode + 8*sizeof(*mask) - 1) & ~(8*sizeof(*mask) - 1); - memset(mask, 0, *maxnode/8); - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) { - if (node->os_index >= *maxnode) - continue; - if (!hwloc_bitmap_isset(nodeset, node->os_index)) - continue; - mask[node->os_index/sizeof(*mask)/8] |= 1UL << (node->os_index % (sizeof(*mask)*8)); - if (outmaxnode == (unsigned long) -1 || outmaxnode < node->os_index) - outmaxnode = node->os_index; - } - - } else { - /* if no numa, libnuma assumes we have a single node */ - if (!hwloc_bitmap_iszero(nodeset)) { - mask[0] = 1; - outmaxnode = 0; - } - } - - *maxnode = outmaxnode+1; - return 0; -} - -/** \brief Convert the array of unsigned long \p mask into hwloc CPU set - * - * \p mask is a array of unsigned long that will be read. - * \p maxnode contains the maximal node number that may be read in \p mask. - * - * This function may be used after calling get_mempolicy or any other function - * that takes an array of unsigned long as output parameter (and possibly - * a maximal node number as input parameter). 
- */ -static __hwloc_inline int -hwloc_cpuset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_cpuset_t cpuset, - const unsigned long *mask, unsigned long maxnode) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - hwloc_bitmap_zero(cpuset); - while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) - if (node->os_index < maxnode - && (mask[node->os_index/sizeof(*mask)/8] & (1UL << (node->os_index % (sizeof(*mask)*8))))) - hwloc_bitmap_or(cpuset, cpuset, node->cpuset); - } else { - /* if no numa, libnuma assumes we have a single node */ - if (mask[0] & 1) - hwloc_bitmap_copy(cpuset, hwloc_topology_get_complete_cpuset(topology)); - else - hwloc_bitmap_zero(cpuset); - } - - return 0; -} - -/** \brief Convert the array of unsigned long \p mask into hwloc NUMA node set - * - * \p mask is a array of unsigned long that will be read. - * \p maxnode contains the maximal node number that may be read in \p mask. - * - * This function may be used after calling get_mempolicy or any other function - * that takes an array of unsigned long as output parameter (and possibly - * a maximal node number as input parameter). 
- */ -static __hwloc_inline int -hwloc_nodeset_from_linux_libnuma_ulongs(hwloc_topology_t topology, hwloc_nodeset_t nodeset, - const unsigned long *mask, unsigned long maxnode) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - hwloc_bitmap_zero(nodeset); - while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) - if (node->os_index < maxnode - && (mask[node->os_index/sizeof(*mask)/8] & (1UL << (node->os_index % (sizeof(*mask)*8))))) - hwloc_bitmap_set(nodeset, node->os_index); - } else { - /* if no numa, libnuma assumes we have a single node */ - if (mask[0] & 1) - hwloc_bitmap_fill(nodeset); - else - hwloc_bitmap_zero(nodeset); - } - - return 0; -} - -/** @} */ - - - -/** \defgroup hwlocality_linux_libnuma_bitmask Interoperability with Linux libnuma bitmask - * - * This interface helps converting between Linux libnuma bitmasks - * and hwloc cpusets and nodesets. - * - * It also offers a consistent behavior on non-NUMA machines - * or non-NUMA-aware kernels by assuming that the machines have a single - * NUMA node. - * - * \note Topology \p topology must match the current machine. - * - * \note The behavior of libnuma is undefined if the kernel is not NUMA-aware. - * (when CONFIG_NUMA is not set in the kernel configuration). - * This helper and libnuma may thus not be strictly compatible in this case, - * which may be detected by checking whether numa_available() returns -1. - * - * @{ - */ - - -/** \brief Convert hwloc CPU set \p cpuset into the returned libnuma bitmask - * - * The returned bitmask should later be freed with numa_bitmask_free. - * - * This function may be used before calling many numa_ functions - * that use a struct bitmask as an input parameter. - * - * \return newly allocated struct bitmask. 
- */ -static __hwloc_inline struct bitmask * -hwloc_cpuset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset) __hwloc_attribute_malloc; -static __hwloc_inline struct bitmask * -hwloc_cpuset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_cpuset_t cpuset) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - struct bitmask *bitmask = numa_allocate_cpumask(); - if (!bitmask) - return NULL; - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - while ((node = hwloc_get_next_obj_covering_cpuset_by_depth(topology, cpuset, depth, node)) != NULL) - if (node->memory.local_memory) - numa_bitmask_setbit(bitmask, node->os_index); - } else { - /* if no numa, libnuma assumes we have a single node */ - if (!hwloc_bitmap_iszero(cpuset)) - numa_bitmask_setbit(bitmask, 0); - } - - return bitmask; -} - -/** \brief Convert hwloc NUMA node set \p nodeset into the returned libnuma bitmask - * - * The returned bitmask should later be freed with numa_bitmask_free. - * - * This function may be used before calling many numa_ functions - * that use a struct bitmask as an input parameter. - * - * \return newly allocated struct bitmask. 
- */ -static __hwloc_inline struct bitmask * -hwloc_nodeset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset) __hwloc_attribute_malloc; -static __hwloc_inline struct bitmask * -hwloc_nodeset_to_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - struct bitmask *bitmask = numa_allocate_cpumask(); - if (!bitmask) - return NULL; - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) - if (hwloc_bitmap_isset(nodeset, node->os_index) && node->memory.local_memory) - numa_bitmask_setbit(bitmask, node->os_index); - } else { - /* if no numa, libnuma assumes we have a single node */ - if (!hwloc_bitmap_iszero(nodeset)) - numa_bitmask_setbit(bitmask, 0); - } - - return bitmask; -} - -/** \brief Convert libnuma bitmask \p bitmask into hwloc CPU set \p cpuset - * - * This function may be used after calling many numa_ functions - * that use a struct bitmask as an output parameter. 
- */ -static __hwloc_inline int -hwloc_cpuset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_cpuset_t cpuset, - const struct bitmask *bitmask) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - hwloc_bitmap_zero(cpuset); - while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) - if (numa_bitmask_isbitset(bitmask, node->os_index)) - hwloc_bitmap_or(cpuset, cpuset, node->cpuset); - } else { - /* if no numa, libnuma assumes we have a single node */ - if (numa_bitmask_isbitset(bitmask, 0)) - hwloc_bitmap_copy(cpuset, hwloc_topology_get_complete_cpuset(topology)); - else - hwloc_bitmap_zero(cpuset); - } - - return 0; -} - -/** \brief Convert libnuma bitmask \p bitmask into hwloc NUMA node set \p nodeset - * - * This function may be used after calling many numa_ functions - * that use a struct bitmask as an output parameter. - */ -static __hwloc_inline int -hwloc_nodeset_from_linux_libnuma_bitmask(hwloc_topology_t topology, hwloc_nodeset_t nodeset, - const struct bitmask *bitmask) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) { - hwloc_obj_t node = NULL; - hwloc_bitmap_zero(nodeset); - while ((node = hwloc_get_next_obj_by_depth(topology, depth, node)) != NULL) - if (numa_bitmask_isbitset(bitmask, node->os_index)) - hwloc_bitmap_set(nodeset, node->os_index); - } else { - /* if no numa, libnuma assumes we have a single node */ - if (numa_bitmask_isbitset(bitmask, 0)) - hwloc_bitmap_fill(nodeset); - else - hwloc_bitmap_zero(nodeset); - } - - return 0; -} - -/** @} */ - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_LINUX_NUMA_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/nvml.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/nvml.h deleted file mode 100644 index 462b3326661..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/nvml.h 
+++ /dev/null @@ -1,176 +0,0 @@ -/* - * Copyright © 2012-2013 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -/** \file - * \brief Macros to help interaction between hwloc and the NVIDIA Management Library. - * - * Applications that use both hwloc and the NVIDIA Management Library may want to - * include this file so as to get topology information for NVML devices. - */ - -#ifndef HWLOC_NVML_H -#define HWLOC_NVML_H - -#include -#include -#include -#ifdef HWLOC_LINUX_SYS -#include -#endif - -#include - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_nvml Interoperability with the NVIDIA Management Library - * - * This interface offers ways to retrieve topology information about - * devices managed by the NVIDIA Management Library (NVML). - * - * @{ - */ - -/** \brief Get the CPU set of logical processors that are physically - * close to NVML device \p device. - * - * Return the CPU set describing the locality of the NVML device \p device. - * - * Topology \p topology and device \p device must match the local machine. - * I/O devices detection and the NVML component are not needed in the topology. - * - * The function only returns the locality of the device. - * If more information about the device is needed, OS objects should - * be used instead, see hwloc_nvml_get_device_osdev() - * and hwloc_nvml_get_device_osdev_by_index(). - * - * This function is currently only implemented in a meaningful way for - * Linux; other systems will simply get a full cpuset. 
- */ -static __hwloc_inline int -hwloc_nvml_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused, - nvmlDevice_t device, hwloc_cpuset_t set) -{ -#ifdef HWLOC_LINUX_SYS - /* If we're on Linux, use the sysfs mechanism to get the local cpus */ -#define HWLOC_NVML_DEVICE_SYSFS_PATH_MAX 128 - char path[HWLOC_NVML_DEVICE_SYSFS_PATH_MAX]; - FILE *sysfile = NULL; - nvmlReturn_t nvres; - nvmlPciInfo_t pci; - - if (!hwloc_topology_is_thissystem(topology)) { - errno = EINVAL; - return -1; - } - - nvres = nvmlDeviceGetPciInfo(device, &pci); - if (NVML_SUCCESS != nvres) { - errno = EINVAL; - return -1; - } - - sprintf(path, "/sys/bus/pci/devices/%04x:%02x:%02x.0/local_cpus", pci.domain, pci.bus, pci.device); - sysfile = fopen(path, "r"); - if (!sysfile) - return -1; - - hwloc_linux_parse_cpumap_file(sysfile, set); - if (hwloc_bitmap_iszero(set)) - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); - - fclose(sysfile); -#else - /* Non-Linux systems simply get a full cpuset */ - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); -#endif - return 0; -} - -/** \brief Get the hwloc OS device object corresponding to the - * NVML device whose index is \p idx. - * - * Return the OS device object describing the NVML device whose - * index is \p idx. Returns NULL if there is none. - * - * The topology \p topology does not necessarily have to match the current - * machine. For instance the topology may be an XML import of a remote host. - * I/O devices detection and the NVML component must be enabled in the topology. - * - * \note The corresponding PCI device object can be obtained by looking - * at the OS device parent object. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_nvml_get_device_osdev_by_index(hwloc_topology_t topology, unsigned idx) -{ - hwloc_obj_t osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type - && osdev->name - && !strncmp("nvml", osdev->name, 4) - && atoi(osdev->name + 4) == (int) idx) - return osdev; - } - return NULL; -} - -/** \brief Get the hwloc OS device object corresponding to NVML device \p device. - * - * Return the hwloc OS device object that describes the given - * NVML device \p device. Return NULL if there is none. - * - * Topology \p topology and device \p device must match the local machine. - * I/O devices detection and the NVML component must be enabled in the topology. - * If not, the locality of the object may still be found using - * hwloc_nvml_get_device_cpuset(). - * - * \note The corresponding hwloc PCI device may be found by looking - * at the result parent pointer. - */ -static __hwloc_inline hwloc_obj_t -hwloc_nvml_get_device_osdev(hwloc_topology_t topology, nvmlDevice_t device) -{ - hwloc_obj_t osdev; - nvmlReturn_t nvres; - nvmlPciInfo_t pci; - - if (!hwloc_topology_is_thissystem(topology)) { - errno = EINVAL; - return NULL; - } - - nvres = nvmlDeviceGetPciInfo(device, &pci); - if (NVML_SUCCESS != nvres) - return NULL; - - osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - hwloc_obj_t pcidev = osdev->parent; - if (strncmp(osdev->name, "nvml", 4)) - continue; - if (pcidev - && pcidev->type == HWLOC_OBJ_PCI_DEVICE - && pcidev->attr->pcidev.domain == pci.domain - && pcidev->attr->pcidev.bus == pci.bus - && pcidev->attr->pcidev.dev == pci.device - && pcidev->attr->pcidev.func == 0) - return osdev; - } - - return NULL; -} - -/** @} */ - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_NVML_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/opencl.h 
b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/opencl.h deleted file mode 100644 index 0301ad988bf..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/opencl.h +++ /dev/null @@ -1,199 +0,0 @@ -/* - * Copyright © 2012-2013 Inria. All rights reserved. - * Copyright © 2013 Université Bordeaux. All right reserved. - * See COPYING in top-level directory. - */ - -/** \file - * \brief Macros to help interaction between hwloc and the OpenCL interface. - * - * Applications that use both hwloc and OpenCL may want to - * include this file so as to get topology information for OpenCL devices. - */ - -#ifndef HWLOC_OPENCL_H -#define HWLOC_OPENCL_H - -#include -#include -#include -#ifdef HWLOC_LINUX_SYS -#include -#endif - -#include -#include - -#include - - -#ifdef __cplusplus -extern "C" { -#endif - - -/** \defgroup hwlocality_opencl Interoperability with OpenCL - * - * This interface offers ways to retrieve topology information about - * OpenCL devices. - * - * Only the AMD OpenCL interface currently offers useful locality information - * about its devices. - * - * @{ - */ - -/** \brief Get the CPU set of logical processors that are physically - * close to OpenCL device \p device. - * - * Return the CPU set describing the locality of the OpenCL device \p device. - * - * Topology \p topology and device \p device must match the local machine. - * I/O devices detection and the OpenCL component are not needed in the topology. - * - * The function only returns the locality of the device. - * If more information about the device is needed, OS objects should - * be used instead, see hwloc_opencl_get_device_osdev() - * and hwloc_opencl_get_device_osdev_by_index(). - * - * This function is currently only implemented in a meaningful way for - * Linux with the AMD OpenCL implementation; other systems will simply - * get a full cpuset. 
- */ -static __hwloc_inline int -hwloc_opencl_get_device_cpuset(hwloc_topology_t topology __hwloc_attribute_unused, - cl_device_id device __hwloc_attribute_unused, - hwloc_cpuset_t set) -{ -#if (defined HWLOC_LINUX_SYS) && (defined CL_DEVICE_TOPOLOGY_AMD) - /* If we're on Linux + AMD OpenCL, use the AMD extension + the sysfs mechanism to get the local cpus */ -#define HWLOC_OPENCL_DEVICE_SYSFS_PATH_MAX 128 - char path[HWLOC_OPENCL_DEVICE_SYSFS_PATH_MAX]; - FILE *sysfile = NULL; - cl_device_topology_amd amdtopo; - cl_int clret; - - if (!hwloc_topology_is_thissystem(topology)) { - errno = EINVAL; - return -1; - } - - clret = clGetDeviceInfo(device, CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL); - if (CL_SUCCESS != clret) { - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); - return 0; - } - if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) { - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); - return 0; - } - - sprintf(path, "/sys/bus/pci/devices/0000:%02x:%02x.%01x/local_cpus", amdtopo.pcie.bus, amdtopo.pcie.device, amdtopo.pcie.function); - sysfile = fopen(path, "r"); - if (!sysfile) - return -1; - - hwloc_linux_parse_cpumap_file(sysfile, set); - if (hwloc_bitmap_iszero(set)) - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); - - fclose(sysfile); -#else - /* Non-Linux + AMD OpenCL systems simply get a full cpuset */ - hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology)); -#endif - return 0; -} - -/** \brief Get the hwloc OS device object corresponding to the - * OpenCL device for the given indexes. - * - * Return the OS device object describing the OpenCL device - * whose platform index is \p platform_index, - * and whose device index within this platform if \p device_index. - * Return NULL if there is none. - * - * The topology \p topology does not necessarily have to match the current - * machine. For instance the topology may be an XML import of a remote host. 
- * I/O devices detection and the OpenCL component must be enabled in the topology. - * - * \note The corresponding PCI device object can be obtained by looking - * at the OS device parent object. - */ -static __hwloc_inline hwloc_obj_t -hwloc_opencl_get_device_osdev_by_index(hwloc_topology_t topology, - unsigned platform_index, unsigned device_index) -{ - unsigned x = (unsigned) -1, y = (unsigned) -1; - hwloc_obj_t osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - if (HWLOC_OBJ_OSDEV_COPROC == osdev->attr->osdev.type - && osdev->name - && sscanf(osdev->name, "opencl%ud%u", &x, &y) == 2 - && platform_index == x && device_index == y) - return osdev; - } - return NULL; -} - -/** \brief Get the hwloc OS device object corresponding to OpenCL device \p device. - * - * Return the hwloc OS device object that describes the given - * OpenCL device \p device. Return NULL if there is none. - * - * Topology \p topology and device \p device must match the local machine. - * I/O devices detection and the OpenCL component must be enabled in the topology. - * If not, the locality of the object may still be found using - * hwloc_opencl_get_device_cpuset(). - * - * \note The corresponding hwloc PCI device may be found by looking - * at the result parent pointer. 
- */ -static __hwloc_inline hwloc_obj_t -hwloc_opencl_get_device_osdev(hwloc_topology_t topology __hwloc_attribute_unused, - cl_device_id device __hwloc_attribute_unused) -{ -#ifdef CL_DEVICE_TOPOLOGY_AMD - hwloc_obj_t osdev; - cl_device_topology_amd amdtopo; - cl_int clret; - - clret = clGetDeviceInfo(device, CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL); - if (CL_SUCCESS != clret) { - errno = EINVAL; - return NULL; - } - if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) { - errno = EINVAL; - return NULL; - } - - osdev = NULL; - while ((osdev = hwloc_get_next_osdev(topology, osdev)) != NULL) { - hwloc_obj_t pcidev = osdev->parent; - if (strncmp(osdev->name, "opencl", 6)) - continue; - if (pcidev - && pcidev->type == HWLOC_OBJ_PCI_DEVICE - && pcidev->attr->pcidev.domain == 0 - && pcidev->attr->pcidev.bus == amdtopo.pcie.bus - && pcidev->attr->pcidev.dev == amdtopo.pcie.device - && pcidev->attr->pcidev.func == amdtopo.pcie.function) - return osdev; - } - - return NULL; -#else - return NULL; -#endif -} - -/** @} */ - - -#ifdef __cplusplus -} /* extern "C" */ -#endif - - -#endif /* HWLOC_OPENCL_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/plugins.h b/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/plugins.h deleted file mode 100644 index 510157bcf51..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/plugins.h +++ /dev/null @@ -1,439 +0,0 @@ -/* - * Copyright © 2013-2015 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -#ifndef HWLOC_PLUGINS_H -#define HWLOC_PLUGINS_H - -/** \file - * \brief Public interface for building hwloc plugins. 
- */ - -struct hwloc_backend; - -#include -#ifdef HWLOC_INSIDE_PLUGIN -/* needed for hwloc_plugin_check_namespace() */ -#include -#endif - - - -/** \defgroup hwlocality_disc_components Components and Plugins: Discovery components - * @{ - */ - -/** \brief Discovery component type */ -typedef enum hwloc_disc_component_type_e { - /** \brief CPU-only discovery through the OS, or generic no-OS support. - * \hideinitializer */ - HWLOC_DISC_COMPONENT_TYPE_CPU = (1<<0), - - /** \brief xml, synthetic or custom, - * platform-specific components such as bgq. - * Anything the discovers CPU and everything else. - * No misc backend is expected to complement a global component. - * \hideinitializer */ - HWLOC_DISC_COMPONENT_TYPE_GLOBAL = (1<<1), - - /** \brief OpenCL, Cuda, etc. - * \hideinitializer */ - HWLOC_DISC_COMPONENT_TYPE_MISC = (1<<2) -} hwloc_disc_component_type_t; - -/** \brief Discovery component structure - * - * This is the major kind of components, taking care of the discovery. - * They are registered by generic components, either statically-built or as plugins. - */ -struct hwloc_disc_component { - /** \brief Discovery component type */ - hwloc_disc_component_type_t type; - - /** \brief Name. - * If this component is built as a plugin, this name does not have to match the plugin filename. - */ - const char *name; - - /** \brief Component types to exclude, as an OR'ed set of ::hwloc_disc_component_type_e. - * - * For a GLOBAL component, this usually includes all other types (~0). - * - * Other components only exclude types that may bring conflicting - * topology information. MISC components should likely not be excluded - * since they usually bring non-primary additional information. - */ - unsigned excludes; - - /** \brief Instantiate callback to create a backend from the component. - * Parameters data1, data2, data3 are NULL except for components - * that have special enabling routines such as hwloc_topology_set_xml(). 
*/ - struct hwloc_backend * (*instantiate)(struct hwloc_disc_component *component, const void *data1, const void *data2, const void *data3); - - /** \brief Component priority. - * Used to sort topology->components, higher priority first. - * Also used to decide between two components with the same name. - * - * Usual values are - * 50 for native OS (or platform) components, - * 45 for x86, - * 40 for no-OS fallback, - * 30 for global components (xml/synthetic/custom), - * 20 for pci, - * 10 for other misc components (opencl etc.). - */ - unsigned priority; - - /** \private Used internally to list components by priority on topology->components - * (the component structure is usually read-only, - * the core copies it before using this field for queueing) - */ - struct hwloc_disc_component * next; -}; - -/** @} */ - - - - -/** \defgroup hwlocality_disc_backends Components and Plugins: Discovery backends - * @{ - */ - -/** \brief Discovery backend structure - * - * A backend is the instantiation of a discovery component. - * When a component gets enabled for a topology, - * its instantiate() callback creates a backend. - * - * hwloc_backend_alloc() initializes all fields to default values - * that the component may change (except "component" and "next") - * before enabling the backend with hwloc_backend_enable(). - */ -struct hwloc_backend { - /** \private Reserved for the core, set by hwloc_backend_alloc() */ - struct hwloc_disc_component * component; - /** \private Reserved for the core, set by hwloc_backend_enable() */ - struct hwloc_topology * topology; - /** \private Reserved for the core. Set to 1 if forced through envvar, 0 otherwise. */ - int envvar_forced; - /** \private Reserved for the core. Used internally to list backends topology->backends. */ - struct hwloc_backend * next; - - /** \brief Backend flags, as an OR'ed set of ::hwloc_backend_flag_e */ - unsigned long flags; - - /** \brief Backend-specific 'is_custom' property. 
- * Shortcut on !strcmp(..->component->name, "custom"). - * Only the custom component should touch this. */ - int is_custom; - - /** \brief Backend-specific 'is_thissystem' property. - * Set to 0 or 1 if the backend should enforce the thissystem flag when it gets enabled. - * Set to -1 if the backend doesn't care (default). */ - int is_thissystem; - - /** \brief Backend private data, or NULL if none. */ - void * private_data; - /** \brief Callback for freeing the private_data. - * May be NULL. - */ - void (*disable)(struct hwloc_backend *backend); - - /** \brief Main discovery callback. - * returns > 0 if it modified the topology tree, -1 on error, 0 otherwise. - * May be NULL if type is ::HWLOC_DISC_COMPONENT_TYPE_MISC. */ - int (*discover)(struct hwloc_backend *backend); - - /** \brief Callback used by the PCI backend to retrieve the locality of a PCI object from the OS/cpu backend. - * May be NULL. */ - int (*get_obj_cpuset)(struct hwloc_backend *backend, struct hwloc_backend *caller, struct hwloc_obj *obj, hwloc_bitmap_t cpuset); - - /** \brief Callback called by backends to notify this backend that a new object was added. - * returns > 0 if it modified the topology tree, 0 otherwise. - * May be NULL. */ - int (*notify_new_object)(struct hwloc_backend *backend, struct hwloc_backend *caller, struct hwloc_obj *obj); -}; - -/** \brief Backend flags */ -enum hwloc_backend_flag_e { - /** \brief Levels should be reconnected before this backend discover() is used. - * \hideinitializer */ - HWLOC_BACKEND_FLAG_NEED_LEVELS = (1UL<<0) -}; - -/** \brief Allocate a backend structure, set good default values, initialize backend->component and topology, etc. - * The caller will then modify whatever needed, and call hwloc_backend_enable(). - */ -HWLOC_DECLSPEC struct hwloc_backend * hwloc_backend_alloc(struct hwloc_disc_component *component); - -/** \brief Enable a previously allocated and setup backend. 
*/ -HWLOC_DECLSPEC int hwloc_backend_enable(struct hwloc_topology *topology, struct hwloc_backend *backend); - -/** \brief Used by backends discovery callbacks to request locality information from others. - * - * Traverse the list of enabled backends until one has a - * get_obj_cpuset() method, and call it. - */ -HWLOC_DECLSPEC int hwloc_backends_get_obj_cpuset(struct hwloc_backend *caller, struct hwloc_obj *obj, hwloc_bitmap_t cpuset); - -/** \brief Used by backends discovery callbacks to notify other - * backends of new objects. - * - * Traverse the list of enabled backends (all but caller) and invoke - * their notify_new_object() method to notify them that a new object - * just got added to the topology. - * - * Currently only used for notifying of new PCI device objects. - */ -HWLOC_DECLSPEC int hwloc_backends_notify_new_object(struct hwloc_backend *caller, struct hwloc_obj *obj); - -/** @} */ - - - - -/** \defgroup hwlocality_generic_components Components and Plugins: Generic components - * @{ - */ - -/** \brief Generic component type */ -typedef enum hwloc_component_type_e { - /** \brief The data field must point to a struct hwloc_disc_component. */ - HWLOC_COMPONENT_TYPE_DISC, - - /** \brief The data field must point to a struct hwloc_xml_component. */ - HWLOC_COMPONENT_TYPE_XML -} hwloc_component_type_t; - -/** \brief Generic component structure - * - * Generic components structure, either statically listed by configure in static-components.h - * or dynamically loaded as a plugin. - */ -struct hwloc_component { - /** \brief Component ABI version, set to ::HWLOC_COMPONENT_ABI */ - unsigned abi; - - /** \brief Process-wide component initialization callback. - * - * This optional callback is called when the component is registered - * to the hwloc core (after loading the plugin). - * - * When the component is built as a plugin, this callback - * should call hwloc_check_plugin_namespace() - * and return an negative error code on error. 
- * - * \p flags is always 0 for now. - * - * \return 0 on success, or a negative code on error. - * - * \note If the component uses ltdl for loading its own plugins, - * it should load/unload them only in init() and finalize(), - * to avoid race conditions with hwloc's use of ltdl. - */ - int (*init)(unsigned long flags); - - /** \brief Process-wide component termination callback. - * - * This optional callback is called after unregistering the component - * from the hwloc core (before unloading the plugin). - * - * \p flags is always 0 for now. - * - * \note If the component uses ltdl for loading its own plugins, - * it should load/unload them only in init() and finalize(), - * to avoid race conditions with hwloc's use of ltdl. - */ - void (*finalize)(unsigned long flags); - - /** \brief Component type */ - hwloc_component_type_t type; - - /** \brief Component flags, unused for now */ - unsigned long flags; - - /** \brief Component data, pointing to a struct hwloc_disc_component or struct hwloc_xml_component. */ - void * data; -}; - -/** @} */ - - - - -/** \defgroup hwlocality_components_core_funcs Components and Plugins: Core functions to be used by components - * @{ - */ - -/** \brief Add an object to the topology. - * - * It is sorted along the tree of other objects according to the inclusion of - * cpusets, to eventually be added as a child of the smallest object including - * this object. - * - * If the cpuset is empty, the type of the object (and maybe some attributes) - * must be enough to find where to insert the object. This is especially true - * for NUMA nodes with memory and no CPUs. - * - * The given object should not have children. - * - * This shall only be called before levels are built. - * - * In case of error, hwloc_report_os_error() is called. - * - * Returns the object on success. - * Returns NULL and frees obj on error. - * Returns another object and frees obj if it was merged with an identical pre-existing object. 
- */ -HWLOC_DECLSPEC struct hwloc_obj *hwloc_insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj); - -/** \brief Type of error callbacks during object insertion */ -typedef void (*hwloc_report_error_t)(const char * msg, int line); -/** \brief Report an insertion error from a backend */ -HWLOC_DECLSPEC void hwloc_report_os_error(const char * msg, int line); -/** \brief Check whether insertion errors are hidden */ -HWLOC_DECLSPEC int hwloc_hide_errors(void); - -/** \brief Add an object to the topology and specify which error callback to use. - * - * Aside from the error callback selection, this function is identical to hwloc_insert_object_by_cpuset() - */ -HWLOC_DECLSPEC struct hwloc_obj *hwloc__insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj, hwloc_report_error_t report_error); - -/** \brief Insert an object somewhere in the topology. - * - * It is added as the last child of the given parent. - * The cpuset is completely ignored, so strange objects such as I/O devices should - * preferably be inserted with this. - * - * When used for "normal" children with cpusets (when importing from XML - * when duplicating a topology), the caller should make sure children are inserted - * in order. - * - * The given object may have children. - * - * Remember to call topology_connect() afterwards to fix handy pointers. 
- */ -HWLOC_DECLSPEC void hwloc_insert_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, hwloc_obj_t obj); - -/** \brief Allocate and initialize an object of the given type and physical index */ -static __hwloc_inline struct hwloc_obj * -hwloc_alloc_setup_object(hwloc_obj_type_t type, signed os_index) -{ - struct hwloc_obj *obj = malloc(sizeof(*obj)); - memset(obj, 0, sizeof(*obj)); - obj->type = type; - obj->os_index = os_index; - obj->os_level = -1; - obj->attr = malloc(sizeof(*obj->attr)); - memset(obj->attr, 0, sizeof(*obj->attr)); - /* do not allocate the cpuset here, let the caller do it */ - return obj; -} - -/** \brief Setup object cpusets/nodesets by OR'ing its children. - * - * Used when adding an object late in the topology, after propagating sets up and down. - * The caller should use this after inserting by cpuset (which means the cpusets is already OK). - * Typical case: PCI backend adding a hostbridge parent. - */ -HWLOC_DECLSPEC int hwloc_fill_object_sets(hwloc_obj_t obj); - -/** \brief Make sure that plugins can lookup core symbols. - * - * This is a sanity check to avoid lazy-lookup failures when libhwloc - * is loaded within a plugin, and later tries to load its own plugins. - * This may fail (and abort the program) if libhwloc symbols are in a - * private namespace. - * - * \return 0 on success. - * \return -1 if the plugin cannot be successfully loaded. The caller - * plugin init() callback should return a negative error code as well. - * - * Plugins should call this function in their init() callback to avoid - * later crashes if lazy symbol resolution is used by the upper layer that - * loaded hwloc (e.g. OpenCL implementations using dlopen with RTLD_LAZY). - * - * \note The build system must define HWLOC_INSIDE_PLUGIN if and only if - * building the caller as a plugin. - * - * \note This function should remain inline so plugins can call it even - * when they cannot find libhwloc symbols. 
- */ -static __hwloc_inline int -hwloc_plugin_check_namespace(const char *pluginname __hwloc_attribute_unused, const char *symbol __hwloc_attribute_unused) -{ -#ifdef HWLOC_INSIDE_PLUGIN - lt_dlhandle handle; - void *sym; - handle = lt_dlopen(NULL); - if (!handle) - /* cannot check, assume things will work */ - return 0; - sym = lt_dlsym(handle, symbol); - lt_dlclose(handle); - if (!sym) { - static int verboseenv_checked = 0; - static int verboseenv_value = 0; - if (!verboseenv_checked) { - const char *verboseenv = getenv("HWLOC_PLUGINS_VERBOSE"); - verboseenv_value = verboseenv ? atoi(verboseenv) : 0; - verboseenv_checked = 1; - } - if (verboseenv_value) - fprintf(stderr, "Plugin `%s' disabling itself because it cannot find the `%s' core symbol.\n", - pluginname, symbol); - return -1; - } -#endif /* HWLOC_INSIDE_PLUGIN */ - return 0; -} - -/** @} */ - - - - -/** \defgroup hwlocality_components_pci_funcs Components and Plugins: PCI functions to be used by components - * @{ - */ - -/** \brief Insert a list of PCI devices and bridges in the backend topology. - * - * Insert a list of objects (either PCI device or bridges) starting at first_obj - * (linked by next_sibling in the topology, and ending with NULL). - * Objects are placed under the right bridges, and the remaining upstream bridges - * are then inserted in the topology by calling the get_obj_cpuset() callback to - * find their locality. - */ -HWLOC_DECLSPEC int hwloc_insert_pci_device_list(struct hwloc_backend *backend, struct hwloc_obj *first_obj); - -/** \brief Return the offset of the given capability in the PCI config space buffer - * - * This function requires a 256-bytes config space. Unknown/unavailable bytes should be set to 0xff. - */ -HWLOC_DECLSPEC unsigned hwloc_pci_find_cap(const unsigned char *config, unsigned cap); - -/** \brief Fill linkspeed by reading the PCI config space where PCI_CAP_ID_EXP is at position offset. 
- * - * Needs 20 bytes of EXP capability block starting at offset in the config space - * for registers up to link status. - */ -HWLOC_DECLSPEC int hwloc_pci_find_linkspeed(const unsigned char *config, unsigned offset, float *linkspeed); - -/** \brief Modify the PCI device object into a bridge and fill its attribute if a bridge is found in the PCI config space. - * - * This function requires 64 bytes of common configuration header at the beginning of config. - * - * Returns -1 and destroys /p obj if bridge fields are invalid. - */ -HWLOC_DECLSPEC int hwloc_pci_prepare_bridge(hwloc_obj_t obj, const unsigned char *config); - -/** @} */ - - - - -#endif /* HWLOC_PLUGINS_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/private/autogen/config.h.in b/opal/mca/hwloc/hwloc1113/hwloc/include/private/autogen/config.h.in deleted file mode 100644 index 1d8b4fcc5c2..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/private/autogen/config.h.in +++ /dev/null @@ -1,720 +0,0 @@ -/* include/private/autogen/config.h.in. Generated from configure.ac by autoheader. */ - -/* -*- c -*- - * - * Copyright © 2009, 2011, 2012 CNRS, inria., Université Bordeaux All rights reserved. - * Copyright © 2009 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * This file is automatically generated by configure. Edits will be lost - * the next time you run configure! - */ - -#ifndef HWLOC_CONFIGURE_H -#define HWLOC_CONFIGURE_H - - -/* Define to 1 if the system has the type `CACHE_DESCRIPTOR'. */ -#undef HAVE_CACHE_DESCRIPTOR - -/* Define to 1 if the system has the type `CACHE_RELATIONSHIP'. */ -#undef HAVE_CACHE_RELATIONSHIP - -/* Define to 1 if you have the `clz' function. */ -#undef HAVE_CLZ - -/* Define to 1 if you have the `clzl' function. */ -#undef HAVE_CLZL - -/* Define to 1 if you have the header file. */ -#undef HAVE_CL_CL_EXT_H - -/* Define to 1 if you have the `cpuset_setaffinity' function. 
*/ -#undef HAVE_CPUSET_SETAFFINITY - -/* Define to 1 if you have the `cpuset_setid' function. */ -#undef HAVE_CPUSET_SETID - -/* Define to 1 if you have the header file. */ -#undef HAVE_CTYPE_H - -/* Define to 1 if we have -lcuda */ -#undef HAVE_CUDA - -/* Define to 1 if you have the header file. */ -#undef HAVE_CUDA_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_CUDA_RUNTIME_API_H - -/* Define to 1 if you have the declaration of `CL_DEVICE_TOPOLOGY_AMD', and to - 0 if you don't. */ -#undef HAVE_DECL_CL_DEVICE_TOPOLOGY_AMD - -/* Define to 1 if you have the declaration of `CTL_HW', and to 0 if you don't. - */ -#undef HAVE_DECL_CTL_HW - -/* Define to 1 if you have the declaration of `fabsf', and to 0 if you don't. - */ -#undef HAVE_DECL_FABSF - -/* Define to 1 if you have the declaration of `getexecname', and to 0 if you - don't. */ -#undef HAVE_DECL_GETEXECNAME - -/* Define to 1 if you have the declaration of `GetModuleFileName', and to 0 if - you don't. */ -#undef HAVE_DECL_GETMODULEFILENAME - -/* Define to 1 if you have the declaration of `getprogname', and to 0 if you - don't. */ -#undef HAVE_DECL_GETPROGNAME - -/* Define to 1 if you have the declaration of `HW_NCPU', and to 0 if you - don't. */ -#undef HAVE_DECL_HW_NCPU - -/* Define to 1 if you have the declaration of - `nvmlDeviceGetMaxPcieLinkGeneration', and to 0 if you don't. */ -#undef HAVE_DECL_NVMLDEVICEGETMAXPCIELINKGENERATION - -/* Define to 1 if you have the declaration of `pthread_getaffinity_np', and to - 0 if you don't. */ -#undef HAVE_DECL_PTHREAD_GETAFFINITY_NP - -/* Define to 1 if you have the declaration of `pthread_setaffinity_np', and to - 0 if you don't. */ -#undef HAVE_DECL_PTHREAD_SETAFFINITY_NP - -/* Define to 1 if you have the declaration of `RUNNING_ON_VALGRIND', and to 0 - if you don't. */ -#undef HAVE_DECL_RUNNING_ON_VALGRIND - -/* Define to 1 if you have the declaration of `snprintf', and to 0 if you - don't. 
*/ -#undef HAVE_DECL_SNPRINTF - -/* Define to 1 if you have the declaration of `strcasecmp', and to 0 if you - don't. */ -#undef HAVE_DECL_STRCASECMP - -/* Define to 1 if you have the declaration of `strtoull', and to 0 if you - don't. */ -#undef HAVE_DECL_STRTOULL - -/* Define to 1 if you have the declaration of `_putenv', and to 0 if you - don't. */ -#undef HAVE_DECL__PUTENV - -/* Define to 1 if you have the declaration of `_SC_LARGE_PAGESIZE', and to 0 - if you don't. */ -#undef HAVE_DECL__SC_LARGE_PAGESIZE - -/* Define to 1 if you have the declaration of `_SC_NPROCESSORS_CONF', and to 0 - if you don't. */ -#undef HAVE_DECL__SC_NPROCESSORS_CONF - -/* Define to 1 if you have the declaration of `_SC_NPROCESSORS_ONLN', and to 0 - if you don't. */ -#undef HAVE_DECL__SC_NPROCESSORS_ONLN - -/* Define to 1 if you have the declaration of `_SC_NPROC_CONF', and to 0 if - you don't. */ -#undef HAVE_DECL__SC_NPROC_CONF - -/* Define to 1 if you have the declaration of `_SC_NPROC_ONLN', and to 0 if - you don't. */ -#undef HAVE_DECL__SC_NPROC_ONLN - -/* Define to 1 if you have the declaration of `_SC_PAGESIZE', and to 0 if you - don't. */ -#undef HAVE_DECL__SC_PAGESIZE - -/* Define to 1 if you have the declaration of `_SC_PAGE_SIZE', and to 0 if you - don't. */ -#undef HAVE_DECL__SC_PAGE_SIZE - -/* Define to 1 if you have the declaration of `_strdup', and to 0 if you - don't. */ -#undef HAVE_DECL__STRDUP - -/* Define to 1 if you have the header file. */ -#undef HAVE_DIRENT_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_DLFCN_H - -/* Define to 1 if you have the `ffs' function. */ -#undef HAVE_FFS - -/* Define to 1 if you have the `ffsl' function. */ -#undef HAVE_FFSL - -/* Define to 1 if you have the `fls' function. */ -#undef HAVE_FLS - -/* Define to 1 if you have the `flsl' function. */ -#undef HAVE_FLSL - -/* Define to 1 if you have the `getpagesize' function. */ -#undef HAVE_GETPAGESIZE - -/* Define to 1 if the system has the type `GROUP_AFFINITY'. 
*/ -#undef HAVE_GROUP_AFFINITY - -/* Define to 1 if the system has the type `GROUP_RELATIONSHIP'. */ -#undef HAVE_GROUP_RELATIONSHIP - -/* Define to 1 if you have the `host_info' function. */ -#undef HAVE_HOST_INFO - -/* Define to 1 if you have the header file. */ -#undef HAVE_INFINIBAND_VERBS_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_INTTYPES_H - -/* Define to 1 if the system has the type `KAFFINITY'. */ -#undef HAVE_KAFFINITY - -/* Define to 1 if you have the header file. */ -#undef HAVE_KSTAT_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_LANGINFO_H - -/* Define to 1 if we have -lgdi32 */ -#undef HAVE_LIBGDI32 - -/* Define to 1 if we have -libverbs */ -#undef HAVE_LIBIBVERBS - -/* Define to 1 if we have -lkstat */ -#undef HAVE_LIBKSTAT - -/* Define to 1 if we have -llgrp */ -#undef HAVE_LIBLGRP - -/* Define to 1 if you have the header file. */ -#undef HAVE_LIBUDEV_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_LOCALE_H - -/* Define to 1 if the system has the type `LOGICAL_PROCESSOR_RELATIONSHIP'. */ -#undef HAVE_LOGICAL_PROCESSOR_RELATIONSHIP - -/* Define to 1 if you have the header file. */ -#undef HAVE_MACH_MACH_HOST_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_MACH_MACH_INIT_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_MALLOC_H - -/* Define to 1 if you have the `memalign' function. */ -#undef HAVE_MEMALIGN - -/* Define to 1 if you have the header file. */ -#undef HAVE_MEMORY_H - -/* Define to 1 if we have -lmyriexpress */ -#undef HAVE_MYRIEXPRESS - -/* Define to 1 if you have the header file. */ -#undef HAVE_MYRIEXPRESS_H - -/* Define to 1 if you have the `nl_langinfo' function. */ -#undef HAVE_NL_LANGINFO - -/* Define to 1 if you have the header file. */ -#undef HAVE_NUMAIF_H - -/* Define to 1 if the system has the type `NUMA_NODE_RELATIONSHIP'. */ -#undef HAVE_NUMA_NODE_RELATIONSHIP - -/* Define to 1 if you have the header file. 
*/ -#undef HAVE_NVCTRL_NVCTRL_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_NVML_H - -/* Define to 1 if you have the `openat' function. */ -#undef HAVE_OPENAT - -/* Define to 1 if you have the header file. */ -#undef HAVE_PICL_H - -/* Define to 1 if you have the `posix_memalign' function. */ -#undef HAVE_POSIX_MEMALIGN - -/* Define to 1 if the system has the type `PROCESSOR_CACHE_TYPE'. */ -#undef HAVE_PROCESSOR_CACHE_TYPE - -/* Define to 1 if the system has the type `PROCESSOR_GROUP_INFO'. */ -#undef HAVE_PROCESSOR_GROUP_INFO - -/* Define to 1 if the system has the type `PROCESSOR_NUMBER'. */ -#undef HAVE_PROCESSOR_NUMBER - -/* Define to 1 if the system has the type `PROCESSOR_RELATIONSHIP'. */ -#undef HAVE_PROCESSOR_RELATIONSHIP - -/* Define to '1' if program_invocation_name is present and usable */ -#undef HAVE_PROGRAM_INVOCATION_NAME - -/* Define to 1 if the system has the type `PSAPI_WORKING_SET_EX_BLOCK'. */ -#undef HAVE_PSAPI_WORKING_SET_EX_BLOCK - -/* Define to 1 if the system has the type `PSAPI_WORKING_SET_EX_INFORMATION'. - */ -#undef HAVE_PSAPI_WORKING_SET_EX_INFORMATION - -/* Define to 1 if you have the header file. */ -#undef HAVE_PTHREAD_NP_H - -/* Define to 1 if the system has the type `pthread_t'. */ -#undef HAVE_PTHREAD_T - -/* Define to 1 if you have the `putwc' function. */ -#undef HAVE_PUTWC - -/* Define to 1 if the system has the type `RelationProcessorPackage'. */ -#undef HAVE_RELATIONPROCESSORPACKAGE - -/* Define to 1 if you have the `setlocale' function. */ -#undef HAVE_SETLOCALE - -/* Define to 1 if the system has the type `ssize_t'. */ -#undef HAVE_SSIZE_T - -/* Define to 1 if you have the header file. */ -#undef HAVE_STDINT_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_STDLIB_H - -/* Define to 1 if you have the `strftime' function. */ -#undef HAVE_STRFTIME - -/* Define to 1 if you have the header file. */ -#undef HAVE_STRINGS_H - -/* Define to 1 if you have the header file. 
*/ -#undef HAVE_STRING_H - -/* Define to 1 if you have the `strncasecmp' function. */ -#undef HAVE_STRNCASECMP - -/* Define to 1 if you have the `strtoull' function. */ -#undef HAVE_STRTOULL - -/* Define to '1' if sysctl is present and usable */ -#undef HAVE_SYSCTL - -/* Define to '1' if sysctlbyname is present and usable */ -#undef HAVE_SYSCTLBYNAME - -/* Define to 1 if the system has the type - `SYSTEM_LOGICAL_PROCESSOR_INFORMATION'. */ -#undef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION - -/* Define to 1 if the system has the type - `SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX'. */ -#undef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_CPUSET_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_LGRP_USER_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_MMAN_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_PARAM_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_STAT_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_SYSCTL_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_TYPES_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_SYS_UTSNAME_H - -/* Define to 1 if you have the `uname' function. */ -#undef HAVE_UNAME - -/* Define to 1 if you have the header file. */ -#undef HAVE_UNISTD_H - -/* Define to 1 if you have the `uselocale' function. */ -#undef HAVE_USELOCALE - -/* Define to 1 if you have the header file. */ -#undef HAVE_VALGRIND_VALGRIND_H - -/* Define to 1 if the system has the type `wchar_t'. */ -#undef HAVE_WCHAR_T - -/* Define to 1 if you have the header file. */ -#undef HAVE_X11_KEYSYM_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_X11_XLIB_H - -/* Define to 1 if you have the header file. */ -#undef HAVE_X11_XUTIL_H - -/* Define to 1 if you have the header file. 
*/ -#undef HAVE_XLOCALE_H - -/* Define to '1' if __progname is present and usable */ -#undef HAVE___PROGNAME - -/* Define to 1 on AIX */ -#undef HWLOC_AIX_SYS - -/* Define to 1 on BlueGene/Q */ -#undef HWLOC_BGQ_SYS - -/* Whether C compiler supports symbol visibility or not */ -#undef HWLOC_C_HAVE_VISIBILITY - -/* Define to 1 on Darwin */ -#undef HWLOC_DARWIN_SYS - -/* Whether we are in debugging mode or not */ -#undef HWLOC_DEBUG - -/* Define to 1 on *FREEBSD */ -#undef HWLOC_FREEBSD_SYS - -/* Whether your compiler has __attribute__ or not */ -#undef HWLOC_HAVE_ATTRIBUTE - -/* Whether your compiler has __attribute__ aligned or not */ -#undef HWLOC_HAVE_ATTRIBUTE_ALIGNED - -/* Whether your compiler has __attribute__ always_inline or not */ -#undef HWLOC_HAVE_ATTRIBUTE_ALWAYS_INLINE - -/* Whether your compiler has __attribute__ cold or not */ -#undef HWLOC_HAVE_ATTRIBUTE_COLD - -/* Whether your compiler has __attribute__ const or not */ -#undef HWLOC_HAVE_ATTRIBUTE_CONST - -/* Whether your compiler has __attribute__ deprecated or not */ -#undef HWLOC_HAVE_ATTRIBUTE_DEPRECATED - -/* Whether your compiler has __attribute__ format or not */ -#undef HWLOC_HAVE_ATTRIBUTE_FORMAT - -/* Whether your compiler has __attribute__ hot or not */ -#undef HWLOC_HAVE_ATTRIBUTE_HOT - -/* Whether your compiler has __attribute__ malloc or not */ -#undef HWLOC_HAVE_ATTRIBUTE_MALLOC - -/* Whether your compiler has __attribute__ may_alias or not */ -#undef HWLOC_HAVE_ATTRIBUTE_MAY_ALIAS - -/* Whether your compiler has __attribute__ nonnull or not */ -#undef HWLOC_HAVE_ATTRIBUTE_NONNULL - -/* Whether your compiler has __attribute__ noreturn or not */ -#undef HWLOC_HAVE_ATTRIBUTE_NORETURN - -/* Whether your compiler has __attribute__ no_instrument_function or not */ -#undef HWLOC_HAVE_ATTRIBUTE_NO_INSTRUMENT_FUNCTION - -/* Whether your compiler has __attribute__ packed or not */ -#undef HWLOC_HAVE_ATTRIBUTE_PACKED - -/* Whether your compiler has __attribute__ pure or not */ -#undef 
HWLOC_HAVE_ATTRIBUTE_PURE - -/* Whether your compiler has __attribute__ sentinel or not */ -#undef HWLOC_HAVE_ATTRIBUTE_SENTINEL - -/* Whether your compiler has __attribute__ unused or not */ -#undef HWLOC_HAVE_ATTRIBUTE_UNUSED - -/* Whether your compiler has __attribute__ warn unused result or not */ -#undef HWLOC_HAVE_ATTRIBUTE_WARN_UNUSED_RESULT - -/* Whether your compiler has __attribute__ weak alias or not */ -#undef HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS - -/* Define to 1 if your `ffs' function is known to be broken. */ -#undef HWLOC_HAVE_BROKEN_FFS - -/* Define to 1 if you have the `cairo' library. */ -#undef HWLOC_HAVE_CAIRO - -/* Define to 1 if you have the `clz' function. */ -#undef HWLOC_HAVE_CLZ - -/* Define to 1 if you have the `clzl' function. */ -#undef HWLOC_HAVE_CLZL - -/* Define to 1 if the CPU_SET macro works */ -#undef HWLOC_HAVE_CPU_SET - -/* Define to 1 if the CPU_SET_S macro works */ -#undef HWLOC_HAVE_CPU_SET_S - -/* Define to 1 if you have the `cudart' SDK. */ -#undef HWLOC_HAVE_CUDART - -/* Define to 1 if function `clz' is declared by system headers */ -#undef HWLOC_HAVE_DECL_CLZ - -/* Define to 1 if function `clzl' is declared by system headers */ -#undef HWLOC_HAVE_DECL_CLZL - -/* Define to 1 if function `ffs' is declared by system headers */ -#undef HWLOC_HAVE_DECL_FFS - -/* Define to 1 if function `ffsl' is declared by system headers */ -#undef HWLOC_HAVE_DECL_FFSL - -/* Define to 1 if function `fls' is declared by system headers */ -#undef HWLOC_HAVE_DECL_FLS - -/* Define to 1 if function `flsl' is declared by system headers */ -#undef HWLOC_HAVE_DECL_FLSL - -/* Define to 1 if function `strncasecmp' is declared by system headers */ -#undef HWLOC_HAVE_DECL_STRNCASECMP - -/* Define to 1 if you have the `ffs' function. */ -#undef HWLOC_HAVE_FFS - -/* Define to 1 if you have the `ffsl' function. */ -#undef HWLOC_HAVE_FFSL - -/* Define to 1 if you have the `fls' function. 
*/ -#undef HWLOC_HAVE_FLS - -/* Define to 1 if you have the `flsl' function. */ -#undef HWLOC_HAVE_FLSL - -/* Define to 1 if you have the GL module components. */ -#undef HWLOC_HAVE_GL - -/* Define to 1 if you have a library providing the termcap interface */ -#undef HWLOC_HAVE_LIBTERMCAP - -/* Define to 1 if you have the `libxml2' library. */ -#undef HWLOC_HAVE_LIBXML2 - -/* Define to 1 if building the Linux PCI component */ -#undef HWLOC_HAVE_LINUXPCI - -/* Define to 1 if mbind is available. */ -#undef HWLOC_HAVE_MBIND - -/* Define to 1 if migrate_pages is available. */ -#undef HWLOC_HAVE_MIGRATE_PAGES - -/* Define to 1 if move_pages is available. */ -#undef HWLOC_HAVE_MOVE_PAGES - -/* Define to 1 if you have the `NVML' library. */ -#undef HWLOC_HAVE_NVML - -/* Define to 1 if glibc provides the old prototype (without length) of - sched_setaffinity() */ -#undef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - -/* Define to 1 if you have the `OpenCL' library. */ -#undef HWLOC_HAVE_OPENCL - -/* Define to 1 if the hwloc library should support dynamically-loaded plugins - */ -#undef HWLOC_HAVE_PLUGINS - -/* `Define to 1 if you have pthread_getthrds_np' */ -#undef HWLOC_HAVE_PTHREAD_GETTHRDS_NP - -/* Define to 1 if pthread mutexes are available */ -#undef HWLOC_HAVE_PTHREAD_MUTEX - -/* Define to 1 if glibc provides a prototype of sched_setaffinity() */ -#undef HWLOC_HAVE_SCHED_SETAFFINITY - -/* Define to 1 if set_mempolicy is available. */ -#undef HWLOC_HAVE_SET_MEMPOLICY - -/* Define to 1 if you have the header file. */ -#undef HWLOC_HAVE_STDINT_H - -/* Define to 1 if function `syscall' is available */ -#undef HWLOC_HAVE_SYSCALL - -/* Define to 1 if you have the `windows.h' header. */ -#undef HWLOC_HAVE_WINDOWS_H - -/* Define to 1 if X11 headers including Xutil.h and keysym.h are available. 
*/ -#undef HWLOC_HAVE_X11_KEYSYM - -/* Define to 1 if you have x86 cpuid */ -#undef HWLOC_HAVE_X86_CPUID - -/* Define to 1 on HP-UX */ -#undef HWLOC_HPUX_SYS - -/* Define to 1 on Irix */ -#undef HWLOC_IRIX_SYS - -/* Define to 1 on Linux */ -#undef HWLOC_LINUX_SYS - -/* Define to 1 on *NETBSD */ -#undef HWLOC_NETBSD_SYS - -/* Define to 1 on OSF */ -#undef HWLOC_OSF_SYS - -/* The size of `unsigned int', as computed by sizeof */ -#undef HWLOC_SIZEOF_UNSIGNED_INT - -/* The size of `unsigned long', as computed by sizeof */ -#undef HWLOC_SIZEOF_UNSIGNED_LONG - -/* Define to 1 on Solaris */ -#undef HWLOC_SOLARIS_SYS - -/* The hwloc symbol prefix */ -#undef HWLOC_SYM_PREFIX - -/* The hwloc symbol prefix in all caps */ -#undef HWLOC_SYM_PREFIX_CAPS - -/* Whether we need to re-define all the hwloc public symbols or not */ -#undef HWLOC_SYM_TRANSFORM - -/* Define to 1 on unsupported systems */ -#undef HWLOC_UNSUPPORTED_SYS - -/* Define to 1 if ncurses works, preferred over curses */ -#undef HWLOC_USE_NCURSES - -/* The library version, always available, even in embedded mode, contrary to - VERSION */ -#undef HWLOC_VERSION - -/* Define to 1 on WINDOWS */ -#undef HWLOC_WIN_SYS - -/* Define to 1 on x86_32 */ -#undef HWLOC_X86_32_ARCH - -/* Define to 1 on x86_64 */ -#undef HWLOC_X86_64_ARCH - -/* Define to the sub-directory where libtool stores uninstalled libraries. */ -#undef LT_OBJDIR - -/* Name of package */ -#undef PACKAGE - -/* Define to the address where bug reports for this package should be sent. */ -#undef PACKAGE_BUGREPORT - -/* Define to the full name of this package. */ -#undef PACKAGE_NAME - -/* Define to the full name and version of this package. */ -#undef PACKAGE_STRING - -/* Define to the one symbol short name of this package. */ -#undef PACKAGE_TARNAME - -/* Define to the home page for this package. */ -#undef PACKAGE_URL - -/* Define to the version of this package. */ -#undef PACKAGE_VERSION - -/* The size of `unsigned int', as computed by sizeof. 
*/ -#undef SIZEOF_UNSIGNED_INT - -/* The size of `unsigned long', as computed by sizeof. */ -#undef SIZEOF_UNSIGNED_LONG - -/* The size of `void *', as computed by sizeof. */ -#undef SIZEOF_VOID_P - -/* Define to 1 if you have the ANSI C header files. */ -#undef STDC_HEADERS - -/* Enable extensions on HP-UX. */ -#ifndef _HPUX_SOURCE -# undef _HPUX_SOURCE -#endif - - -/* Enable extensions on AIX 3, Interix. */ -#ifndef _ALL_SOURCE -# undef _ALL_SOURCE -#endif -/* Enable GNU extensions on systems that have them. */ -#ifndef _GNU_SOURCE -# undef _GNU_SOURCE -#endif -/* Enable threading extensions on Solaris. */ -#ifndef _POSIX_PTHREAD_SEMANTICS -# undef _POSIX_PTHREAD_SEMANTICS -#endif -/* Enable extensions on HP NonStop. */ -#ifndef _TANDEM_SOURCE -# undef _TANDEM_SOURCE -#endif -/* Enable general extensions on Solaris. */ -#ifndef __EXTENSIONS__ -# undef __EXTENSIONS__ -#endif - - -/* Version number of package */ -#undef VERSION - -/* Define to 1 if the X Window System is missing or not being used. */ -#undef X_DISPLAY_MISSING - -/* Are we building for HP-UX? */ -#undef _HPUX_SOURCE - -/* Define to 1 if on MINIX. */ -#undef _MINIX - -/* Define to 2 if the system does not provide POSIX.1 features except with - this defined. */ -#undef _POSIX_1_SOURCE - -/* Define to 1 if you need to in order for `stat' and other things to work. */ -#undef _POSIX_SOURCE - -/* Define this to the process ID type */ -#undef hwloc_pid_t - -/* Define this to the thread ID type */ -#undef hwloc_thread_t - - -#endif /* HWLOC_CONFIGURE_H */ - diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/private/components.h b/opal/mca/hwloc/hwloc1113/hwloc/include/private/components.h deleted file mode 100644 index b36634535f9..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/private/components.h +++ /dev/null @@ -1,40 +0,0 @@ -/* - * Copyright © 2012 Inria. All rights reserved. - * See COPYING in top-level directory. 
- */ - - -#ifdef HWLOC_INSIDE_PLUGIN -/* - * these declarations are internal only, they are not available to plugins - * (many functions below are internal static symbols). - */ -#error This file should not be used in plugins -#endif - - -#ifndef PRIVATE_COMPONENTS_H -#define PRIVATE_COMPONENTS_H 1 - -#include - -struct hwloc_topology; - -extern int hwloc_disc_component_force_enable(struct hwloc_topology *topology, - int envvar_forced, /* 1 if forced through envvar, 0 if forced through API */ - int type, const char *name, - const void *data1, const void *data2, const void *data3); -extern void hwloc_disc_components_enable_others(struct hwloc_topology *topology); - -/* Compute the topology is_thissystem flag based on enabled backends */ -extern void hwloc_backends_is_thissystem(struct hwloc_topology *topology); - -/* Disable and destroy all backends used by a topology */ -extern void hwloc_backends_disable_all(struct hwloc_topology *topology); - -/* Used by the core to setup/destroy the list of components */ -extern void hwloc_components_init(struct hwloc_topology *topology); /* increases components refcount, should be called exactly once per topology (during init) */ -extern void hwloc_components_destroy_all(struct hwloc_topology *topology); /* decreases components refcount, should be called exactly once per topology (during destroy) */ - -#endif /* PRIVATE_COMPONENTS_H */ - diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/private/misc.h b/opal/mca/hwloc/hwloc1113/hwloc/include/private/misc.h deleted file mode 100644 index dbfc2d8e8a1..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/private/misc.h +++ /dev/null @@ -1,409 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -/* Misc macros and inlines. 
*/ - -#ifndef HWLOC_PRIVATE_MISC_H -#define HWLOC_PRIVATE_MISC_H - -#include -#include - -#ifdef HWLOC_HAVE_DECL_STRNCASECMP -#ifdef HAVE_STRINGS_H -#include -#endif -#else -#ifdef HAVE_CTYPE_H -#include -#endif -#endif - -/* Compile-time assertion */ -#define HWLOC_BUILD_ASSERT(condition) ((void)sizeof(char[1 - 2*!(condition)])) - -#define HWLOC_BITS_PER_LONG (HWLOC_SIZEOF_UNSIGNED_LONG * 8) -#define HWLOC_BITS_PER_INT (HWLOC_SIZEOF_UNSIGNED_INT * 8) - -#if (HWLOC_BITS_PER_LONG != 32) && (HWLOC_BITS_PER_LONG != 64) -#error "unknown size for unsigned long." -#endif - -#if (HWLOC_BITS_PER_INT != 16) && (HWLOC_BITS_PER_INT != 32) && (HWLOC_BITS_PER_INT != 64) -#error "unknown size for unsigned int." -#endif - - -/** - * ffsl helpers. - */ - -#if defined(HWLOC_HAVE_BROKEN_FFS) - -/* System has a broken ffs(). - * We must check the before __GNUC__ or HWLOC_HAVE_FFSL - */ -# define HWLOC_NO_FFS - -#elif defined(__GNUC__) - -# if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)) - /* Starting from 3.4, gcc has a long variant. 
*/ -# define hwloc_ffsl(x) __builtin_ffsl(x) -# else -# define hwloc_ffs(x) __builtin_ffs(x) -# define HWLOC_NEED_FFSL -# endif - -#elif defined(HWLOC_HAVE_FFSL) - -# ifndef HWLOC_HAVE_DECL_FFSL -extern int ffsl(long) __hwloc_attribute_const; -# endif - -# define hwloc_ffsl(x) ffsl(x) - -#elif defined(HWLOC_HAVE_FFS) - -# ifndef HWLOC_HAVE_DECL_FFS -extern int ffs(int) __hwloc_attribute_const; -# endif - -# define hwloc_ffs(x) ffs(x) -# define HWLOC_NEED_FFSL - -#else /* no ffs implementation */ - -# define HWLOC_NO_FFS - -#endif - -#ifdef HWLOC_NO_FFS - -/* no ffs or it is known to be broken */ -static __hwloc_inline int -hwloc_ffsl_manual(unsigned long x) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_ffsl_manual(unsigned long x) -{ - int i; - - if (!x) - return 0; - - i = 1; -#if HWLOC_BITS_PER_LONG >= 64 - if (!(x & 0xfffffffful)) { - x >>= 32; - i += 32; - } -#endif - if (!(x & 0xffffu)) { - x >>= 16; - i += 16; - } - if (!(x & 0xff)) { - x >>= 8; - i += 8; - } - if (!(x & 0xf)) { - x >>= 4; - i += 4; - } - if (!(x & 0x3)) { - x >>= 2; - i += 2; - } - if (!(x & 0x1)) { - x >>= 1; - i += 1; - } - - return i; -} -/* always define hwloc_ffsl as a macro, to avoid renaming breakage */ -#define hwloc_ffsl hwloc_ffsl_manual - -#elif defined(HWLOC_NEED_FFSL) - -/* We only have an int ffs(int) implementation, build a long one. */ - -/* First make it 32 bits if it was only 16. */ -static __hwloc_inline int -hwloc_ffs32(unsigned long x) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_ffs32(unsigned long x) -{ -#if HWLOC_BITS_PER_INT == 16 - int low_ffs, hi_ffs; - - low_ffs = hwloc_ffs(x & 0xfffful); - if (low_ffs) - return low_ffs; - - hi_ffs = hwloc_ffs(x >> 16); - if (hi_ffs) - return hi_ffs + 16; - - return 0; -#else - return hwloc_ffs(x); -#endif -} - -/* Then make it 64 bit if longs are. 
*/ -static __hwloc_inline int -hwloc_ffsl_from_ffs32(unsigned long x) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_ffsl_from_ffs32(unsigned long x) -{ -#if HWLOC_BITS_PER_LONG == 64 - int low_ffs, hi_ffs; - - low_ffs = hwloc_ffs32(x & 0xfffffffful); - if (low_ffs) - return low_ffs; - - hi_ffs = hwloc_ffs32(x >> 32); - if (hi_ffs) - return hi_ffs + 32; - - return 0; -#else - return hwloc_ffs32(x); -#endif -} -/* always define hwloc_ffsl as a macro, to avoid renaming breakage */ -#define hwloc_ffsl hwloc_ffsl_from_ffs32 - -#endif - -/** - * flsl helpers. - */ -#ifdef __GNUC_____ - -# if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4)) -# define hwloc_flsl(x) (x ? 8*sizeof(long) - __builtin_clzl(x) : 0) -# else -# define hwloc_fls(x) (x ? 8*sizeof(int) - __builtin_clz(x) : 0) -# define HWLOC_NEED_FLSL -# endif - -#elif defined(HWLOC_HAVE_FLSL) - -# ifndef HWLOC_HAVE_DECL_FLSL -extern int flsl(long) __hwloc_attribute_const; -# endif - -# define hwloc_flsl(x) flsl(x) - -#elif defined(HWLOC_HAVE_CLZL) - -# ifndef HWLOC_HAVE_DECL_CLZL -extern int clzl(long) __hwloc_attribute_const; -# endif - -# define hwloc_flsl(x) (x ? 8*sizeof(long) - clzl(x) : 0) - -#elif defined(HWLOC_HAVE_FLS) - -# ifndef HWLOC_HAVE_DECL_FLS -extern int fls(int) __hwloc_attribute_const; -# endif - -# define hwloc_fls(x) fls(x) -# define HWLOC_NEED_FLSL - -#elif defined(HWLOC_HAVE_CLZ) - -# ifndef HWLOC_HAVE_DECL_CLZ -extern int clz(int) __hwloc_attribute_const; -# endif - -# define hwloc_fls(x) (x ? 
8*sizeof(int) - clz(x) : 0) -# define HWLOC_NEED_FLSL - -#else /* no fls implementation */ - -static __hwloc_inline int -hwloc_flsl_manual(unsigned long x) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_flsl_manual(unsigned long x) -{ - int i = 0; - - if (!x) - return 0; - - i = 1; -#if HWLOC_BITS_PER_LONG >= 64 - if ((x & 0xffffffff00000000ul)) { - x >>= 32; - i += 32; - } -#endif - if ((x & 0xffff0000u)) { - x >>= 16; - i += 16; - } - if ((x & 0xff00)) { - x >>= 8; - i += 8; - } - if ((x & 0xf0)) { - x >>= 4; - i += 4; - } - if ((x & 0xc)) { - x >>= 2; - i += 2; - } - if ((x & 0x2)) { - x >>= 1; - i += 1; - } - - return i; -} -/* always define hwloc_flsl as a macro, to avoid renaming breakage */ -#define hwloc_flsl hwloc_flsl_manual - -#endif - -#ifdef HWLOC_NEED_FLSL - -/* We only have an int fls(int) implementation, build a long one. */ - -/* First make it 32 bits if it was only 16. */ -static __hwloc_inline int -hwloc_fls32(unsigned long x) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_fls32(unsigned long x) -{ -#if HWLOC_BITS_PER_INT == 16 - int low_fls, hi_fls; - - hi_fls = hwloc_fls(x >> 16); - if (hi_fls) - return hi_fls + 16; - - low_fls = hwloc_fls(x & 0xfffful); - if (low_fls) - return low_fls; - - return 0; -#else - return hwloc_fls(x); -#endif -} - -/* Then make it 64 bit if longs are. 
*/ -static __hwloc_inline int -hwloc_flsl_from_fls32(unsigned long x) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_flsl_from_fls32(unsigned long x) -{ -#if HWLOC_BITS_PER_LONG == 64 - int low_fls, hi_fls; - - hi_fls = hwloc_fls32(x >> 32); - if (hi_fls) - return hi_fls + 32; - - low_fls = hwloc_fls32(x & 0xfffffffful); - if (low_fls) - return low_fls; - - return 0; -#else - return hwloc_fls32(x); -#endif -} -/* always define hwloc_flsl as a macro, to avoid renaming breakage */ -#define hwloc_flsl hwloc_flsl_from_fls32 - -#endif - -static __hwloc_inline int -hwloc_weight_long(unsigned long w) __hwloc_attribute_const; -static __hwloc_inline int -hwloc_weight_long(unsigned long w) -{ -#if HWLOC_BITS_PER_LONG == 32 -#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__) >= 4) - return __builtin_popcount(w); -#else - unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555); - res = (res & 0x33333333) + ((res >> 2) & 0x33333333); - res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F); - res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF); - return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF); -#endif -#else /* HWLOC_BITS_PER_LONG == 32 */ -#if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__) >= 4) - return __builtin_popcountll(w); -#else - unsigned long res; - res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul); - res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul); - res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful); - res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul); - res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul); - return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul); -#endif -#endif /* HWLOC_BITS_PER_LONG == 64 */ -} - -#if !HAVE_DECL_STRTOULL && defined(HAVE_STRTOULL) -unsigned long long int strtoull(const char *nptr, char **endptr, int base); -#endif - -static __hwloc_inline int 
hwloc_strncasecmp(const char *s1, const char *s2, size_t n) -{ -#ifdef HWLOC_HAVE_DECL_STRNCASECMP - return strncasecmp(s1, s2, n); -#else - while (n) { - char c1 = tolower(*s1), c2 = tolower(*s2); - if (!c1 || !c2 || c1 != c2) - return c1-c2; - n--; s1++; s2++; - } - return 0; -#endif -} - -#ifdef HWLOC_WIN_SYS -# ifndef HAVE_SSIZE_T -typedef SSIZE_T ssize_t; -# endif -# if !HAVE_DECL_STRTOULL && !defined(HAVE_STRTOULL) -# define strtoull _strtoui64 -# endif -# ifndef S_ISREG -# define S_ISREG(m) ((m) & S_IFREG) -# endif -# ifndef S_ISDIR -# define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR) -# endif -# if !HAVE_DECL_STRCASECMP -# define strcasecmp _stricmp -# endif -# if !HAVE_DECL_SNPRINTF -# define snprintf _snprintf -# endif -# if HAVE_DECL__STRDUP -# define strdup _strdup -# endif -# if HAVE_DECL__PUTENV -# define putenv _putenv -# endif -#endif - -#endif /* HWLOC_PRIVATE_MISC_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/include/private/private.h b/opal/mca/hwloc/hwloc1113/hwloc/include/private/private.h deleted file mode 100644 index 24ded2893a8..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/include/private/private.h +++ /dev/null @@ -1,341 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * - * See COPYING in top-level directory. - */ - -/* Internal types and helpers. */ - - -#ifdef HWLOC_INSIDE_PLUGIN -/* - * these declarations are internal only, they are not available to plugins - * (many functions below are internal static symbols). 
- */ -#error This file should not be used in plugins -#endif - - -#ifndef HWLOC_PRIVATE_H -#define HWLOC_PRIVATE_H - -#include -#include -#include -#include -#include -#include -#ifdef HAVE_UNISTD_H -#include -#endif -#ifdef HAVE_STDINT_H -#include -#endif -#ifdef HAVE_SYS_UTSNAME_H -#include -#endif -#include - -enum hwloc_ignore_type_e { - HWLOC_IGNORE_TYPE_NEVER = 0, - HWLOC_IGNORE_TYPE_KEEP_STRUCTURE, - HWLOC_IGNORE_TYPE_ALWAYS -}; - -#define HWLOC_DEPTH_MAX 128 - -struct hwloc_topology { - unsigned nb_levels; /* Number of horizontal levels */ - unsigned next_group_depth; /* Depth of the next Group object that we may create */ - unsigned level_nbobjects[HWLOC_DEPTH_MAX]; /* Number of objects on each horizontal level */ - struct hwloc_obj **levels[HWLOC_DEPTH_MAX]; /* Direct access to levels, levels[l = 0 .. nblevels-1][0..level_nbobjects[l]] */ - unsigned long flags; - int type_depth[HWLOC_OBJ_TYPE_MAX]; - enum hwloc_ignore_type_e ignored_types[HWLOC_OBJ_TYPE_MAX]; - int is_thissystem; - int is_loaded; - hwloc_pid_t pid; /* Process ID the topology is view from, 0 for self */ - void *userdata; - - unsigned bridge_nbobjects; - struct hwloc_obj **bridge_level; - struct hwloc_obj *first_bridge, *last_bridge; - unsigned pcidev_nbobjects; - struct hwloc_obj **pcidev_level; - struct hwloc_obj *first_pcidev, *last_pcidev; - unsigned osdev_nbobjects; - struct hwloc_obj **osdev_level; - struct hwloc_obj *first_osdev, *last_osdev; - - struct hwloc_binding_hooks { - int (*set_thisproc_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags); - int (*get_thisproc_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - int (*set_thisthread_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags); - int (*get_thisthread_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - int (*set_proc_cpubind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_cpuset_t set, int flags); - int 
(*get_proc_cpubind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags); -#ifdef hwloc_thread_t - int (*set_thread_cpubind)(hwloc_topology_t topology, hwloc_thread_t tid, hwloc_const_cpuset_t set, int flags); - int (*get_thread_cpubind)(hwloc_topology_t topology, hwloc_thread_t tid, hwloc_cpuset_t set, int flags); -#endif - - int (*get_thisproc_last_cpu_location)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - int (*get_thisthread_last_cpu_location)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - int (*get_proc_last_cpu_location)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_cpuset_t set, int flags); - - int (*set_thisproc_membind)(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - int (*get_thisproc_membind)(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - int (*set_thisthread_membind)(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - int (*get_thisthread_membind)(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - int (*set_proc_membind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - int (*get_proc_membind)(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - int (*set_area_membind)(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - int (*get_area_membind)(hwloc_topology_t topology, const void *addr, size_t len, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags); - int (*get_area_memlocation)(hwloc_topology_t topology, const void *addr, size_t len, hwloc_nodeset_t nodeset, int flags); - /* This has to return the same kind of pointer as alloc_membind, so that free_membind can be 
used on it */ - void *(*alloc)(hwloc_topology_t topology, size_t len); - /* alloc_membind has to always succeed if !(flags & HWLOC_MEMBIND_STRICT). - * see hwloc_alloc_or_fail which is convenient for that. */ - void *(*alloc_membind)(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags); - int (*free_membind)(hwloc_topology_t topology, void *addr, size_t len); - } binding_hooks; - - struct hwloc_topology_support support; - - void (*userdata_export_cb)(void *reserved, struct hwloc_topology *topology, struct hwloc_obj *obj); - void (*userdata_import_cb)(struct hwloc_topology *topology, struct hwloc_obj *obj, const char *name, const void *buffer, size_t length); - int userdata_not_decoded; - - struct hwloc_os_distances_s { - hwloc_obj_type_t type; - int nbobjs; - unsigned *indexes; /* array of OS indexes before we can convert them into objs. always available. - */ - struct hwloc_obj **objs; /* array of objects, in the same order as above. - * either given (by a backend) together with the indexes array above. - * or build from the above indexes array when not given (by the user). - */ - float *distances; /* distance matrices, ordered according to the above indexes/objs array. - * distance from i to j is stored in slot i*nbnodes+j. - * will be copied into the main logical-index-ordered distance at the end of the discovery. - */ - int forced; /* set if the user forced a matrix to ignore the OS one */ - - struct hwloc_os_distances_s *prev, *next; - } *first_osdist, *last_osdist; - - /* list of enabled backends. 
*/ - struct hwloc_backend * backends; -}; - -extern void hwloc_alloc_obj_cpusets(hwloc_obj_t obj); -extern void hwloc_setup_pu_level(struct hwloc_topology *topology, unsigned nb_pus); -extern int hwloc_get_sysctlbyname(const char *name, int64_t *n); -extern int hwloc_get_sysctl(int name[], unsigned namelen, int *n); -extern unsigned hwloc_fallback_nbprocessors(struct hwloc_topology *topology); -extern void hwloc_connect_children(hwloc_obj_t obj); -extern int hwloc_connect_levels(hwloc_topology_t topology); - -extern int hwloc__object_cpusets_compare_first(hwloc_obj_t obj1, hwloc_obj_t obj2); - -extern void hwloc_topology_setup_defaults(struct hwloc_topology *topology); -extern void hwloc_topology_clear(struct hwloc_topology *topology); - -extern void hwloc__add_info(struct hwloc_obj_info_s **infosp, unsigned *countp, const char *name, const char *value); -extern char ** hwloc__find_info_slot(struct hwloc_obj_info_s **infosp, unsigned *countp, const char *name); -extern void hwloc__move_infos(struct hwloc_obj_info_s **dst_infosp, unsigned *dst_countp, struct hwloc_obj_info_s **src_infosp, unsigned *src_countp); -extern void hwloc__free_infos(struct hwloc_obj_info_s *infos, unsigned count); - -/* set native OS binding hooks */ -extern void hwloc_set_native_binding_hooks(struct hwloc_binding_hooks *hooks, struct hwloc_topology_support *support); -/* set either native OS binding hooks (if thissystem), or dummy ones */ -extern void hwloc_set_binding_hooks(struct hwloc_topology *topology); - -#if defined(HWLOC_LINUX_SYS) -extern void hwloc_set_linuxfs_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_LINUX_SYS */ - -#if defined(HWLOC_BGQ_SYS) -extern void hwloc_set_bgq_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_BGQ_SYS */ - -#ifdef HWLOC_SOLARIS_SYS -extern void hwloc_set_solaris_hooks(struct hwloc_binding_hooks *binding_hooks, struct 
hwloc_topology_support *support); -#endif /* HWLOC_SOLARIS_SYS */ - -#ifdef HWLOC_AIX_SYS -extern void hwloc_set_aix_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_AIX_SYS */ - -#ifdef HWLOC_OSF_SYS -extern void hwloc_set_osf_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_OSF_SYS */ - -#ifdef HWLOC_WIN_SYS -extern void hwloc_set_windows_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_WIN_SYS */ - -#ifdef HWLOC_DARWIN_SYS -extern void hwloc_set_darwin_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_DARWIN_SYS */ - -#ifdef HWLOC_FREEBSD_SYS -extern void hwloc_set_freebsd_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_FREEBSD_SYS */ - -#ifdef HWLOC_NETBSD_SYS -extern void hwloc_set_netbsd_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_NETBSD_SYS */ - -#ifdef HWLOC_HPUX_SYS -extern void hwloc_set_hpux_hooks(struct hwloc_binding_hooks *binding_hooks, struct hwloc_topology_support *support); -#endif /* HWLOC_HPUX_SYS */ - -extern int hwloc_look_hardwired_fujitsu_k(struct hwloc_topology *topology); -extern int hwloc_look_hardwired_fujitsu_fx10(struct hwloc_topology *topology); -extern int hwloc_look_hardwired_fujitsu_fx100(struct hwloc_topology *topology); - -/* Insert uname-specific names/values in the object infos array. - * If cached_uname isn't NULL, it is used as a struct utsname instead of recalling uname. - * Any field that starts with \0 is ignored. 
- */ -extern void hwloc_add_uname_info(struct hwloc_topology *topology, void *cached_uname); - -/* Free obj and its attributes assuming it doesn't have any children/parent anymore */ -extern void hwloc_free_unlinked_object(hwloc_obj_t obj); - -/* Duplicate src and its children under newparent in newtopology */ -extern void hwloc__duplicate_objects(struct hwloc_topology *newtopology, struct hwloc_obj *newparent, struct hwloc_obj *src); - -/* This can be used for the alloc field to get allocated data that can be freed by free() */ -void *hwloc_alloc_heap(hwloc_topology_t topology, size_t len); - -/* This can be used for the alloc field to get allocated data that can be freed by munmap() */ -void *hwloc_alloc_mmap(hwloc_topology_t topology, size_t len); - -/* This can be used for the free_membind field to free data using free() */ -int hwloc_free_heap(hwloc_topology_t topology, void *addr, size_t len); - -/* This can be used for the free_membind field to free data using munmap() */ -int hwloc_free_mmap(hwloc_topology_t topology, void *addr, size_t len); - -/* Allocates unbound memory or fail, depending on whether STRICT is requested - * or not */ -static __hwloc_inline void * -hwloc_alloc_or_fail(hwloc_topology_t topology, size_t len, int flags) -{ - if (flags & HWLOC_MEMBIND_STRICT) - return NULL; - return hwloc_alloc(topology, len); -} - -extern void hwloc_distances_init(struct hwloc_topology *topology); -extern void hwloc_distances_destroy(struct hwloc_topology *topology); -extern void hwloc_distances_set(struct hwloc_topology *topology, hwloc_obj_type_t type, unsigned nbobjs, unsigned *indexes, hwloc_obj_t *objs, float *distances, int force); -extern void hwloc_distances_set_from_env(struct hwloc_topology *topology); -extern void hwloc_distances_restrict_os(struct hwloc_topology *topology); -extern void hwloc_distances_restrict(struct hwloc_topology *topology, unsigned long flags); -extern void hwloc_distances_finalize_os(struct hwloc_topology *topology); -extern 
void hwloc_distances_finalize_logical(struct hwloc_topology *topology); -extern void hwloc_clear_object_distances(struct hwloc_obj *obj); -extern void hwloc_clear_object_distances_one(struct hwloc_distances_s *distances); -extern void hwloc_group_by_distances(struct hwloc_topology *topology); - -#ifdef HAVE_USELOCALE -#include "locale.h" -#ifdef HAVE_XLOCALE_H -#include "xlocale.h" -#endif -#define hwloc_localeswitch_declare locale_t __old_locale = (locale_t)0, __new_locale -#define hwloc_localeswitch_init() do { \ - __new_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0); \ - if (__new_locale != (locale_t)0) \ - __old_locale = uselocale(__new_locale); \ -} while (0) -#define hwloc_localeswitch_fini() do { \ - if (__new_locale != (locale_t)0) { \ - uselocale(__old_locale); \ - freelocale(__new_locale); \ - } \ -} while(0) -#else /* HAVE_USELOCALE */ -#if __HWLOC_HAVE_ATTRIBUTE_UNUSED -#define hwloc_localeswitch_declare int __dummy_nolocale __hwloc_attribute_unused -#define hwloc_localeswitch_init() -#else -#define hwloc_localeswitch_declare int __dummy_nolocale -#define hwloc_localeswitch_init() (void)__dummy_nolocale -#endif -#define hwloc_localeswitch_fini() -#endif /* HAVE_USELOCALE */ - -#if !HAVE_DECL_FABSF -#define fabsf(f) fabs((double)(f)) -#endif - -#if HAVE_DECL__SC_PAGE_SIZE -#define hwloc_getpagesize() sysconf(_SC_PAGE_SIZE) -#elif HAVE_DECL__SC_PAGESIZE -#define hwloc_getpagesize() sysconf(_SC_PAGESIZE) -#elif defined HAVE_GETPAGESIZE -#define hwloc_getpagesize() getpagesize() -#else -#undef hwloc_getpagesize -#endif - -/* encode src buffer into target buffer. - * targsize must be at least 4*((srclength+2)/3)+1. - * target will be 0-terminated. - */ -extern int hwloc_encode_to_base64(const char *src, size_t srclength, char *target, size_t targsize); -/* decode src buffer into target buffer. - * src is 0-terminated. 
- * targsize must be at least srclength*3/4+1 (srclength not including \0) - * but only srclength*3/4 characters will be meaningful - * (the next one may be partially written during decoding, but it should be ignored). - */ -extern int hwloc_decode_from_base64(char const *src, char *target, size_t targsize); - -/* Check whether needle matches the beginning of haystack, at least n, and up - * to a colon or \0 */ -extern int hwloc_namecoloncmp(const char *haystack, const char *needle, size_t n); - -#ifdef HWLOC_HAVE_ATTRIBUTE_FORMAT -# if HWLOC_HAVE_ATTRIBUTE_FORMAT -# define __hwloc_attribute_format(type, str, arg) __attribute__((__format__(type, str, arg))) -# else -# define __hwloc_attribute_format(type, str, arg) -# endif -#else -# define __hwloc_attribute_format(type, str, arg) -#endif - -#define hwloc_memory_size_printf_value(_size, _verbose) \ - ((_size) < (10ULL<<20) || _verbose ? (((_size)>>9)+1)>>1 : (_size) < (10ULL<<30) ? (((_size)>>19)+1)>>1 : (_size) < (10ULL<<40) ? (((_size)>>29)+1)>>1 : (((_size)>>39)+1)>>1) -#define hwloc_memory_size_printf_unit(_size, _verbose) \ - ((_size) < (10ULL<<20) || _verbose ? "KB" : (_size) < (10ULL<<30) ? "MB" : (_size) < (10ULL<<40) ? "GB" : "TB") - -/* On some systems, snprintf returns the size of written data, not the actually - * required size. hwloc_snprintf always report the actually required size. */ -extern int hwloc_snprintf(char *str, size_t size, const char *format, ...) __hwloc_attribute_format(printf, 3, 4); - -extern void hwloc_obj_add_info_nodup(hwloc_obj_t obj, const char *name, const char *value, int nodup); - -/* Return the name of the currently running program, if supported. - * If not NULL, must be freed by the caller. 
- */ -extern char * hwloc_progname(struct hwloc_topology *topology); - -#define HWLOC_BITMAP_EQUAL 0 /* Bitmaps are equal */ -#define HWLOC_BITMAP_INCLUDED 1 /* First bitmap included in second */ -#define HWLOC_BITMAP_CONTAINS 2 /* First bitmap contains second */ -#define HWLOC_BITMAP_INTERSECTS 3 /* Bitmaps intersect without any inclusion */ -#define HWLOC_BITMAP_DIFFERENT 4 /* Bitmaps do not intersect */ - -/** \brief Compare bitmaps \p bitmap1 and \p bitmap2 from an inclusion point of view. - */ -HWLOC_DECLSPEC int hwloc_bitmap_compare_inclusion(hwloc_const_bitmap_t bitmap1, hwloc_const_bitmap_t bitmap2) __hwloc_attribute_pure; -#endif /* HWLOC_PRIVATE_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/Makefile.am b/opal/mca/hwloc/hwloc1113/hwloc/src/Makefile.am deleted file mode 100644 index fa7dd891741..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/Makefile.am +++ /dev/null @@ -1,236 +0,0 @@ -# Copyright © 2009-2016 Inria. All rights reserved. -# Copyright © 2009-2012 Université Bordeaux -# Copyright © 2009-2014 Cisco Systems, Inc. All rights reserved. -# Copyright © 2011-2012 Oracle and/or its affiliates. All rights reserved. -# See COPYING in top-level directory. - -AM_CFLAGS = $(HWLOC_CFLAGS) -AM_CPPFLAGS = $(HWLOC_CPPFLAGS) -DHWLOC_INSIDE_LIBHWLOC -AM_LDFLAGS = $(HWLOC_LDFLAGS) - -EXTRA_DIST = dolib.c - -# If we're in standalone mode, build the installable library. -# Otherwise, build the embedded library. - -if HWLOC_BUILD_STANDALONE -lib_LTLIBRARIES = libhwloc.la -else -noinst_LTLIBRARIES = libhwloc_embedded.la -endif - -pluginsdir = @HWLOC_PLUGINS_DIR@ -plugins_LTLIBRARIES = -plugins_ldflags = -module -avoid-version -lltdl -# Beware that files are not rebuilt automatically when reconfiguring with different paths in these flags. 
-AM_CPPFLAGS += -DHWLOC_PLUGINS_PATH=\"$(HWLOC_PLUGINS_PATH)\" -DRUNSTATEDIR=\"$(HWLOC_runstatedir)\" - -# Sources and ldflags - -sources = \ - topology.c \ - traversal.c \ - distances.c \ - components.c \ - bind.c \ - bitmap.c \ - pci-common.c \ - diff.c \ - misc.c \ - base64.c \ - topology-noos.c \ - topology-synthetic.c \ - topology-custom.c \ - topology-xml.c \ - topology-xml-nolibxml.c -ldflags = - -# Conditionally add to the sources and ldflags - -if HWLOC_HAVE_LIBXML2 -if HWLOC_XML_LIBXML_BUILD_STATIC -sources += topology-xml-libxml.c -else -plugins_LTLIBRARIES += hwloc_xml_libxml.la -hwloc_xml_libxml_la_SOURCES = topology-xml-libxml.c -hwloc_xml_libxml_la_CFLAGS = $(AM_CFLAGS) $(HWLOC_LIBXML2_CFLAGS) -DHWLOC_INSIDE_PLUGIN -hwloc_xml_libxml_la_LDFLAGS = $(plugins_ldflags) $(HWLOC_LIBXML2_LIBS) -endif -endif HWLOC_HAVE_LIBXML2 - -if HWLOC_HAVE_PCI -if HWLOC_PCI_BUILD_STATIC -sources += topology-pci.c -else -plugins_LTLIBRARIES += hwloc_pci.la -hwloc_pci_la_SOURCES = topology-pci.c -hwloc_pci_la_CFLAGS = $(AM_CFLAGS) $(HWLOC_PCIACCESS_CFLAGS) -DHWLOC_INSIDE_PLUGIN -hwloc_pci_la_LDFLAGS = $(plugins_ldflags) $(HWLOC_PCIACCESS_LIBS) -endif -endif HWLOC_HAVE_PCI - -if HWLOC_HAVE_OPENCL -if HWLOC_OPENCL_BUILD_STATIC -sources += topology-opencl.c -else -plugins_LTLIBRARIES += hwloc_opencl.la -hwloc_opencl_la_SOURCES = topology-opencl.c -hwloc_opencl_la_CFLAGS = $(AM_CFLAGS) $(HWLOC_OPENCL_CFLAGS) -DHWLOC_INSIDE_PLUGIN -hwloc_opencl_la_LDFLAGS = $(plugins_ldflags) $(HWLOC_OPENCL_LIBS) -endif -endif HWLOC_HAVE_OPENCL - -if HWLOC_HAVE_CUDART -if HWLOC_CUDA_BUILD_STATIC -sources += topology-cuda.c -else -plugins_LTLIBRARIES += hwloc_cuda.la -hwloc_cuda_la_SOURCES = topology-cuda.c -hwloc_cuda_la_CFLAGS = $(AM_CFLAGS) $(HWLOC_CUDA_CFLAGS) -DHWLOC_INSIDE_PLUGIN -hwloc_cuda_la_LDFLAGS = $(plugins_ldflags) $(HWLOC_CUDA_LIBS) -endif -endif HWLOC_HAVE_CUDART - -if HWLOC_HAVE_NVML -if HWLOC_NVML_BUILD_STATIC -sources += topology-nvml.c -else -plugins_LTLIBRARIES += 
hwloc_nvml.la -hwloc_nvml_la_SOURCES = topology-nvml.c -hwloc_nvml_la_CFLAGS = $(AM_CFLAGS) $(HWLOC_NVML_CFLAGS) -DHWLOC_INSIDE_PLUGIN -hwloc_nvml_la_LDFLAGS = $(plugins_ldflags) $(HWLOC_NVML_LIBS) -endif -endif HWLOC_HAVE_NVML - -if HWLOC_HAVE_GL -if HWLOC_GL_BUILD_STATIC -sources += topology-gl.c -else -plugins_LTLIBRARIES += hwloc_gl.la -hwloc_gl_la_SOURCES = topology-gl.c -hwloc_gl_la_CFLAGS = $(AM_CFLAGS) $(HWLOC_GL_CFLAGS) -DHWLOC_INSIDE_PLUGIN -hwloc_gl_la_LDFLAGS = $(plugins_ldflags) $(HWLOC_GL_LIBS) -endif -endif HWLOC_HAVE_GL - -if HWLOC_HAVE_SOLARIS -sources += topology-solaris.c -sources += topology-solaris-chiptype.c -endif HWLOC_HAVE_SOLARIS - -if HWLOC_HAVE_LINUX -sources += topology-linux.c topology-hardwired.c -endif HWLOC_HAVE_LINUX - -if HWLOC_HAVE_BGQ -sources += topology-bgq.c -endif HWLOC_HAVE_BGQ - -if HWLOC_HAVE_AIX -sources += topology-aix.c -ldflags += -lpthread -endif HWLOC_HAVE_AIX - -if HWLOC_HAVE_OSF -sources += topology-osf.c -ldflags += -lnuma -lpthread -endif HWLOC_HAVE_OSF - -if HWLOC_HAVE_HPUX -sources += topology-hpux.c -ldflags += -lpthread -endif HWLOC_HAVE_HPUX - -if HWLOC_HAVE_WINDOWS -sources += topology-windows.c -endif HWLOC_HAVE_WINDOWS - -if HWLOC_HAVE_DARWIN -sources += topology-darwin.c -endif HWLOC_HAVE_DARWIN - -if HWLOC_HAVE_FREEBSD -sources += topology-freebsd.c -endif HWLOC_HAVE_FREEBSD - -if HWLOC_HAVE_NETBSD -sources += topology-netbsd.c -ldflags += -lpthread -endif HWLOC_HAVE_NETBSD - -if HWLOC_HAVE_X86_CPUID -sources += topology-x86.c -endif HWLOC_HAVE_X86_CPUID - -if HWLOC_HAVE_GCC -ldflags += -no-undefined -endif HWLOC_HAVE_GCC - - -if HWLOC_HAVE_WINDOWS -# Windows specific rules - -LC_MESSAGES=C -export LC_MESSAGES -ldflags += -Xlinker --output-def -Xlinker .libs/libhwloc.def - -if HWLOC_HAVE_MS_LIB -dolib$(EXEEXT): dolib.c - $(CC_FOR_BUILD) $< -o $@ -.libs/libhwloc.lib: libhwloc.la dolib$(EXEEXT) - [ ! 
-r .libs/libhwloc.def ] || ./dolib$(EXEEXT) "$(HWLOC_MS_LIB)" $(HWLOC_MS_LIB_ARCH) .libs/libhwloc.def $(libhwloc_so_version) .libs/libhwloc.lib -all-local: .libs/libhwloc.lib -clean-local: - $(RM) dolib$(EXEEXT) -endif HWLOC_HAVE_MS_LIB - -install-exec-hook: - [ ! -r .libs/libhwloc.def ] || $(INSTALL) .libs/libhwloc.def $(DESTDIR)$(libdir) -if HWLOC_HAVE_MS_LIB - [ ! -r .libs/libhwloc.def ] || $(INSTALL) .libs/libhwloc.lib $(DESTDIR)$(libdir) - [ ! -r .libs/libhwloc.def ] || $(INSTALL) .libs/libhwloc.exp $(DESTDIR)$(libdir) -endif HWLOC_HAVE_MS_LIB - -uninstall-local: - rm -f $(DESTDIR)$(libdir)/libhwloc.def -if HWLOC_HAVE_MS_LIB - rm -f $(DESTDIR)$(libdir)/libhwloc.lib $(DESTDIR)$(libdir)/libhwloc.exp -endif HWLOC_HAVE_MS_LIB - -# End of Windows specific rules -endif HWLOC_HAVE_WINDOWS - - -# Installable library - -libhwloc_la_SOURCES = $(sources) -libhwloc_la_LDFLAGS = $(ldflags) -version-info $(libhwloc_so_version) $(HWLOC_LIBS) - -if HWLOC_HAVE_PLUGINS -AM_CPPFLAGS += $(LTDLINCL) -libhwloc_la_LDFLAGS += -export-dynamic -libhwloc_la_LIBADD = $(LIBLTDL) -endif - -# Embedded library (note the lack of a .so version number -- that -# intentionally only appears in the installable library). Also note -# the lack of _LDFLAGS -- all libs are added by the upper layer (via -# HWLOC_EMBEDDED_LIBS). 
- -libhwloc_embedded_la_SOURCES = $(sources) - -# XML data (only install if we're building in standalone mode) - -if HWLOC_BUILD_STANDALONE -xml_DATA = $(srcdir)/hwloc.dtd -xmldir = $(pkgdatadir) -EXTRA_DIST += hwloc.dtd -endif - -DISTCLEANFILES = static-components.h - -if HWLOC_HAVE_PLUGINS -check_LTLIBRARIES = hwloc_fake.la -hwloc_fake_la_SOURCES = topology-fake.c -hwloc_fake_la_LDFLAGS = $(plugins_ldflags) -rpath /nowhere # force libtool to build a shared-library even it's check-only -endif diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/base64.c b/opal/mca/hwloc/hwloc1113/hwloc/src/base64.c deleted file mode 100644 index 4e1976fde4b..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/base64.c +++ /dev/null @@ -1,306 +0,0 @@ -/* - * Copyright © 2012 Inria. All rights reserved. - * See COPYING in top-level directory. - * - * Modifications after import: - * - removed all #if - * - updated prototypes - * - updated #include - */ - -/* $OpenBSD: base64.c,v 1.5 2006/10/21 09:55:03 otto Exp $ */ - -/* - * Copyright (c) 1996 by Internet Software Consortium. - * - * Permission to use, copy, modify, and distribute this software for any - * purpose with or without fee is hereby granted, provided that the above - * copyright notice and this permission notice appear in all copies. - * - * THE SOFTWARE IS PROVIDED "AS IS" AND INTERNET SOFTWARE CONSORTIUM DISCLAIMS - * ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES - * OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INTERNET SOFTWARE - * CONSORTIUM BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL - * DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR - * PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS - * ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS - * SOFTWARE. - */ - -/* - * Portions Copyright (c) 1995 by International Business Machines, Inc. - * - * International Business Machines, Inc. 
(hereinafter called IBM) grants - * permission under its copyrights to use, copy, modify, and distribute this - * Software with or without fee, provided that the above copyright notice and - * all paragraphs of this notice appear in all copies, and that the name of IBM - * not be used in connection with the marketing of any product incorporating - * the Software or modifications thereof, without specific, written prior - * permission. - * - * To the extent it has a right to do so, IBM grants an immunity from suit - * under its patents, if any, for the use, sale or manufacture of products to - * the extent that such products are used for performing Domain Name System - * dynamic updates in TCP/IP networks by means of the Software. No immunity is - * granted for any product per se or for any other function of any product. - * - * THE SOFTWARE IS PROVIDED "AS IS", AND IBM DISCLAIMS ALL WARRANTIES, - * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A - * PARTICULAR PURPOSE. IN NO EVENT SHALL IBM BE LIABLE FOR ANY SPECIAL, - * DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING - * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE, EVEN - * IF IBM IS APPRISED OF THE POSSIBILITY OF SUCH DAMAGES. - */ - -/* OPENBSD ORIGINAL: lib/libc/net/base64.c */ - -static const char Base64[] = - "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; -static const char Pad64 = '='; - -/* (From RFC1521 and draft-ietf-dnssec-secext-03.txt) - The following encoding technique is taken from RFC 1521 by Borenstein - and Freed. It is reproduced here in a slightly edited form for - convenience. - - A 65-character subset of US-ASCII is used, enabling 6 bits to be - represented per printable character. (The extra 65th character, "=", - is used to signify a special processing function.) - - The encoding process represents 24-bit groups of input bits as output - strings of 4 encoded characters. 
Proceeding from left to right, a - 24-bit input group is formed by concatenating 3 8-bit input groups. - These 24 bits are then treated as 4 concatenated 6-bit groups, each - of which is translated into a single digit in the base64 alphabet. - - Each 6-bit group is used as an index into an array of 64 printable - characters. The character referenced by the index is placed in the - output string. - - Table 1: The Base64 Alphabet - - Value Encoding Value Encoding Value Encoding Value Encoding - 0 A 17 R 34 i 51 z - 1 B 18 S 35 j 52 0 - 2 C 19 T 36 k 53 1 - 3 D 20 U 37 l 54 2 - 4 E 21 V 38 m 55 3 - 5 F 22 W 39 n 56 4 - 6 G 23 X 40 o 57 5 - 7 H 24 Y 41 p 58 6 - 8 I 25 Z 42 q 59 7 - 9 J 26 a 43 r 60 8 - 10 K 27 b 44 s 61 9 - 11 L 28 c 45 t 62 + - 12 M 29 d 46 u 63 / - 13 N 30 e 47 v - 14 O 31 f 48 w (pad) = - 15 P 32 g 49 x - 16 Q 33 h 50 y - - Special processing is performed if fewer than 24 bits are available - at the end of the data being encoded. A full encoding quantum is - always completed at the end of a quantity. When fewer than 24 input - bits are available in an input group, zero bits are added (on the - right) to form an integral number of 6-bit groups. Padding at the - end of the data is performed using the '=' character. - - Since all base64 input is an integral number of octets, only the - ------------------------------------------------- - following cases can arise: - - (1) the final quantum of encoding input is an integral - multiple of 24 bits; here, the final unit of encoded - output will be an integral multiple of 4 characters - with no "=" padding, - (2) the final quantum of encoding input is exactly 8 bits; - here, the final unit of encoded output will be two - characters followed by two "=" padding characters, or - (3) the final quantum of encoding input is exactly 16 bits; - here, the final unit of encoded output will be three - characters followed by one "=" padding character. 
- */ - -#include -#include -#include - -#include - -int -hwloc_encode_to_base64(const char *src, size_t srclength, char *target, size_t targsize) -{ - size_t datalength = 0; - unsigned char input[3]; - unsigned char output[4]; - unsigned int i; - - while (2 < srclength) { - input[0] = *src++; - input[1] = *src++; - input[2] = *src++; - srclength -= 3; - - output[0] = input[0] >> 2; - output[1] = ((input[0] & 0x03) << 4) + (input[1] >> 4); - output[2] = ((input[1] & 0x0f) << 2) + (input[2] >> 6); - output[3] = input[2] & 0x3f; - - if (datalength + 4 > targsize) - return (-1); - target[datalength++] = Base64[output[0]]; - target[datalength++] = Base64[output[1]]; - target[datalength++] = Base64[output[2]]; - target[datalength++] = Base64[output[3]]; - } - - /* Now we worry about padding. */ - if (0 != srclength) { - /* Get what's left. */ - input[0] = input[1] = input[2] = '\0'; - for (i = 0; i < srclength; i++) - input[i] = *src++; - - output[0] = input[0] >> 2; - output[1] = ((input[0] & 0x03) << 4) + (input[1] >> 4); - output[2] = ((input[1] & 0x0f) << 2) + (input[2] >> 6); - - if (datalength + 4 > targsize) - return (-1); - target[datalength++] = Base64[output[0]]; - target[datalength++] = Base64[output[1]]; - if (srclength == 1) - target[datalength++] = Pad64; - else - target[datalength++] = Base64[output[2]]; - target[datalength++] = Pad64; - } - if (datalength >= targsize) - return (-1); - target[datalength] = '\0'; /* Returned value doesn't count \0. */ - return (int)(datalength); -} - -/* skips all whitespace anywhere. - converts characters, four at a time, starting at (or after) - src from base - 64 numbers into three 8 bit bytes in the target area. - it returns the number of data bytes stored at the target, or -1 on error. 
- */ - -int -hwloc_decode_from_base64(char const *src, char *target, size_t targsize) -{ - unsigned int tarindex, state; - int ch; - char *pos; - - state = 0; - tarindex = 0; - - while ((ch = *src++) != '\0') { - if (isspace(ch)) /* Skip whitespace anywhere. */ - continue; - - if (ch == Pad64) - break; - - pos = strchr(Base64, ch); - if (pos == 0) /* A non-base64 character. */ - return (-1); - - switch (state) { - case 0: - if (target) { - if (tarindex >= targsize) - return (-1); - target[tarindex] = (char)(pos - Base64) << 2; - } - state = 1; - break; - case 1: - if (target) { - if (tarindex + 1 >= targsize) - return (-1); - target[tarindex] |= (pos - Base64) >> 4; - target[tarindex+1] = ((pos - Base64) & 0x0f) - << 4 ; - } - tarindex++; - state = 2; - break; - case 2: - if (target) { - if (tarindex + 1 >= targsize) - return (-1); - target[tarindex] |= (pos - Base64) >> 2; - target[tarindex+1] = ((pos - Base64) & 0x03) - << 6; - } - tarindex++; - state = 3; - break; - case 3: - if (target) { - if (tarindex >= targsize) - return (-1); - target[tarindex] |= (pos - Base64); - } - tarindex++; - state = 0; - break; - } - } - - /* - * We are done decoding Base-64 chars. Let's see if we ended - * on a byte boundary, and/or with erroneous trailing characters. - */ - - if (ch == Pad64) { /* We got a pad char. */ - ch = *src++; /* Skip it, get next. */ - switch (state) { - case 0: /* Invalid = in first position */ - case 1: /* Invalid = in second position */ - return (-1); - - case 2: /* Valid, means one byte of info */ - /* Skip any number of spaces. */ - for (; ch != '\0'; ch = *src++) - if (!isspace(ch)) - break; - /* Make sure there is another trailing = sign. */ - if (ch != Pad64) - return (-1); - ch = *src++; /* Skip the = */ - /* Fall through to "single trailing =" case. */ - /* FALLTHROUGH */ - - case 3: /* Valid, means two bytes of info */ - /* - * We know this char is an =. Is there anything but - * whitespace after it? 
- */ - for (; ch != '\0'; ch = *src++) - if (!isspace(ch)) - return (-1); - - /* - * Now make sure for cases 2 and 3 that the "extra" - * bits that slopped past the last full byte were - * zeros. If we don't check them, they become a - * subliminal channel. - */ - if (target && target[tarindex] != 0) - return (-1); - } - } else { - /* - * We ended by seeing the end of the string. Make sure we - * have no partial bytes lying around. - */ - if (state != 0) - return (-1); - } - - return (tarindex); -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/bitmap.c b/opal/mca/hwloc/hwloc1113/hwloc/src/bitmap.c deleted file mode 100644 index d6b5c5ec5f2..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/bitmap.c +++ /dev/null @@ -1,1485 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2015 Inria. All rights reserved. - * Copyright © 2009-2011 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include - -/* TODO - * - have a way to change the initial allocation size - * - preallocate inside the bitmap structure (so that the whole structure is a cacheline for instance) - * and allocate a dedicated array only later when reallocating larger - */ - -/* magic number */ -#define HWLOC_BITMAP_MAGIC 0x20091007 - -/* actual opaque type internals */ -struct hwloc_bitmap_s { - unsigned ulongs_count; /* how many ulong bitmasks are valid, >= 1 */ - unsigned ulongs_allocated; /* how many ulong bitmasks are allocated, >= ulongs_count */ - unsigned long *ulongs; - int infinite; /* set to 1 if all bits beyond ulongs are set */ -#ifdef HWLOC_DEBUG - int magic; -#endif -}; - -/* overzealous check in debug-mode, not as powerful as valgrind but still useful */ -#ifdef HWLOC_DEBUG -#define HWLOC__BITMAP_CHECK(set) do { \ - assert((set)->magic == HWLOC_BITMAP_MAGIC); \ - assert((set)->ulongs_count >= 1); 
\ - assert((set)->ulongs_allocated >= (set)->ulongs_count); \ -} while (0) -#else -#define HWLOC__BITMAP_CHECK(set) -#endif - -/* extract a subset from a set using an index or a cpu */ -#define HWLOC_SUBBITMAP_INDEX(cpu) ((cpu)/(HWLOC_BITS_PER_LONG)) -#define HWLOC_SUBBITMAP_CPU_ULBIT(cpu) ((cpu)%(HWLOC_BITS_PER_LONG)) -/* Read from a bitmap ulong without knowing whether x is valid. - * Writers should make sure that x is valid and modify set->ulongs[x] directly. - */ -#define HWLOC_SUBBITMAP_READULONG(set,x) ((x) < (set)->ulongs_count ? (set)->ulongs[x] : (set)->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO) - -/* predefined subset values */ -#define HWLOC_SUBBITMAP_ZERO 0UL -#define HWLOC_SUBBITMAP_FULL (~0UL) -#define HWLOC_SUBBITMAP_ULBIT(bit) (1UL<<(bit)) -#define HWLOC_SUBBITMAP_CPU(cpu) HWLOC_SUBBITMAP_ULBIT(HWLOC_SUBBITMAP_CPU_ULBIT(cpu)) -#define HWLOC_SUBBITMAP_ULBIT_TO(bit) (HWLOC_SUBBITMAP_FULL>>(HWLOC_BITS_PER_LONG-1-(bit))) -#define HWLOC_SUBBITMAP_ULBIT_FROM(bit) (HWLOC_SUBBITMAP_FULL<<(bit)) -#define HWLOC_SUBBITMAP_ULBIT_FROMTO(begin,end) (HWLOC_SUBBITMAP_ULBIT_TO(end) & HWLOC_SUBBITMAP_ULBIT_FROM(begin)) - -struct hwloc_bitmap_s * hwloc_bitmap_alloc(void) -{ - struct hwloc_bitmap_s * set; - - set = malloc(sizeof(struct hwloc_bitmap_s)); - if (!set) - return NULL; - - set->ulongs_count = 1; - set->ulongs_allocated = 64/sizeof(unsigned long); - set->ulongs = malloc(64); - if (!set->ulongs) { - free(set); - return NULL; - } - - set->ulongs[0] = HWLOC_SUBBITMAP_ZERO; - set->infinite = 0; -#ifdef HWLOC_DEBUG - set->magic = HWLOC_BITMAP_MAGIC; -#endif - return set; -} - -struct hwloc_bitmap_s * hwloc_bitmap_alloc_full(void) -{ - struct hwloc_bitmap_s * set = hwloc_bitmap_alloc(); - if (set) { - set->infinite = 1; - set->ulongs[0] = HWLOC_SUBBITMAP_FULL; - } - return set; -} - -void hwloc_bitmap_free(struct hwloc_bitmap_s * set) -{ - if (!set) - return; - - HWLOC__BITMAP_CHECK(set); -#ifdef HWLOC_DEBUG - set->magic = 0; -#endif - - 
free(set->ulongs); - free(set); -} - -/* enlarge until it contains at least needed_count ulongs. - */ -static void -hwloc_bitmap_enlarge_by_ulongs(struct hwloc_bitmap_s * set, unsigned needed_count) -{ - unsigned tmp = 1 << hwloc_flsl((unsigned long) needed_count - 1); - if (tmp > set->ulongs_allocated) { - set->ulongs = realloc(set->ulongs, tmp * sizeof(unsigned long)); - assert(set->ulongs); - set->ulongs_allocated = tmp; - } -} - -/* enlarge until it contains at least needed_count ulongs, - * and update new ulongs according to the infinite field. - */ -static void -hwloc_bitmap_realloc_by_ulongs(struct hwloc_bitmap_s * set, unsigned needed_count) -{ - unsigned i; - - HWLOC__BITMAP_CHECK(set); - - if (needed_count <= set->ulongs_count) - return; - - /* realloc larger if needed */ - hwloc_bitmap_enlarge_by_ulongs(set, needed_count); - - /* fill the newly allocated subset depending on the infinite flag */ - for(i=set->ulongs_count; i<needed_count; i++) - set->ulongs[i] = set->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - set->ulongs_count = needed_count; -} - -/* realloc until it contains at least cpu+1 bits */ -#define hwloc_bitmap_realloc_by_cpu_index(set, cpu) hwloc_bitmap_realloc_by_ulongs(set, ((cpu)/HWLOC_BITS_PER_LONG)+1) - -/* reset a bitmap to exactely the needed size. - * the caller must reinitialize all ulongs and the infinite flag later. - */ -static void -hwloc_bitmap_reset_by_ulongs(struct hwloc_bitmap_s * set, unsigned needed_count) -{ - hwloc_bitmap_enlarge_by_ulongs(set, needed_count); - set->ulongs_count = needed_count; -} - -/* reset until it contains exactly cpu+1 bits (roundup to a ulong). - * the caller must reinitialize all ulongs and the infinite flag later.
- */ -#define hwloc_bitmap_reset_by_cpu_index(set, cpu) hwloc_bitmap_reset_by_ulongs(set, ((cpu)/HWLOC_BITS_PER_LONG)+1) - -struct hwloc_bitmap_s * hwloc_bitmap_dup(const struct hwloc_bitmap_s * old) -{ - struct hwloc_bitmap_s * new; - - if (!old) - return NULL; - - HWLOC__BITMAP_CHECK(old); - - new = malloc(sizeof(struct hwloc_bitmap_s)); - if (!new) - return NULL; - - new->ulongs = malloc(old->ulongs_allocated * sizeof(unsigned long)); - if (!new->ulongs) { - free(new); - return NULL; - } - new->ulongs_allocated = old->ulongs_allocated; - new->ulongs_count = old->ulongs_count; - memcpy(new->ulongs, old->ulongs, new->ulongs_count * sizeof(unsigned long)); - new->infinite = old->infinite; -#ifdef HWLOC_DEBUG - new->magic = HWLOC_BITMAP_MAGIC; -#endif - return new; -} - -void hwloc_bitmap_copy(struct hwloc_bitmap_s * dst, const struct hwloc_bitmap_s * src) -{ - HWLOC__BITMAP_CHECK(dst); - HWLOC__BITMAP_CHECK(src); - - hwloc_bitmap_reset_by_ulongs(dst, src->ulongs_count); - - memcpy(dst->ulongs, src->ulongs, src->ulongs_count * sizeof(unsigned long)); - dst->infinite = src->infinite; -} - -/* Strings always use 32bit groups */ -#define HWLOC_PRIxSUBBITMAP "%08lx" -#define HWLOC_BITMAP_SUBSTRING_SIZE 32 -#define HWLOC_BITMAP_SUBSTRING_LENGTH (HWLOC_BITMAP_SUBSTRING_SIZE/4) -#define HWLOC_BITMAP_STRING_PER_LONG (HWLOC_BITS_PER_LONG/HWLOC_BITMAP_SUBSTRING_SIZE) - -int hwloc_bitmap_snprintf(char * __hwloc_restrict buf, size_t buflen, const struct hwloc_bitmap_s * __hwloc_restrict set) -{ - ssize_t size = buflen; - char *tmp = buf; - int res, ret = 0; - int needcomma = 0; - int i; - unsigned long accum = 0; - int accumed = 0; -#if HWLOC_BITS_PER_LONG == HWLOC_BITMAP_SUBSTRING_SIZE - const unsigned long accum_mask = ~0UL; -#else /* HWLOC_BITS_PER_LONG != HWLOC_BITMAP_SUBSTRING_SIZE */ - const unsigned long accum_mask = ((1UL << HWLOC_BITMAP_SUBSTRING_SIZE) - 1) << (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE); -#endif /* HWLOC_BITS_PER_LONG != 
HWLOC_BITMAP_SUBSTRING_SIZE */ - - HWLOC__BITMAP_CHECK(set); - - /* mark the end in case we do nothing later */ - if (buflen > 0) - tmp[0] = '\0'; - - if (set->infinite) { - res = hwloc_snprintf(tmp, size, "0xf...f"); - needcomma = 1; - if (res < 0) - return -1; - ret += res; - if (res >= size) - res = size>0 ? (int)size - 1 : 0; - tmp += res; - size -= res; - } - - i=set->ulongs_count-1; - - if (set->infinite) { - /* ignore starting FULL since we have 0xf...f already */ - while (i>=0 && set->ulongs[i] == HWLOC_SUBBITMAP_FULL) - i--; - } else { - /* ignore starting ZERO except the last one */ - while (i>=0 && set->ulongs[i] == HWLOC_SUBBITMAP_ZERO) - i--; - } - - while (i>=0 || accumed) { - /* Refill accumulator */ - if (!accumed) { - accum = set->ulongs[i--]; - accumed = HWLOC_BITS_PER_LONG; - } - - if (accum & accum_mask) { - /* print the whole subset if not empty */ - res = hwloc_snprintf(tmp, size, needcomma ? ",0x" HWLOC_PRIxSUBBITMAP : "0x" HWLOC_PRIxSUBBITMAP, - (accum & accum_mask) >> (HWLOC_BITS_PER_LONG - HWLOC_BITMAP_SUBSTRING_SIZE)); - needcomma = 1; - } else if (i == -1 && accumed == HWLOC_BITMAP_SUBSTRING_SIZE) { - /* print a single 0 to mark the last subset */ - res = hwloc_snprintf(tmp, size, needcomma ? ",0x0" : "0x0"); - } else if (needcomma) { - res = hwloc_snprintf(tmp, size, ","); - } else { - res = 0; - } - if (res < 0) - return -1; - ret += res; - -#if HWLOC_BITS_PER_LONG == HWLOC_BITMAP_SUBSTRING_SIZE - accum = 0; - accumed = 0; -#else - accum <<= HWLOC_BITMAP_SUBSTRING_SIZE; - accumed -= HWLOC_BITMAP_SUBSTRING_SIZE; -#endif - - if (res >= size) - res = size>0 ? 
(int)size - 1 : 0; - - tmp += res; - size -= res; - } - - /* if didn't display anything, display 0x0 */ - if (!ret) { - res = hwloc_snprintf(tmp, size, "0x0"); - if (res < 0) - return -1; - ret += res; - } - - return ret; -} - -int hwloc_bitmap_asprintf(char ** strp, const struct hwloc_bitmap_s * __hwloc_restrict set) -{ - int len; - char *buf; - - HWLOC__BITMAP_CHECK(set); - - len = hwloc_bitmap_snprintf(NULL, 0, set); - buf = malloc(len+1); - *strp = buf; - return hwloc_bitmap_snprintf(buf, len+1, set); -} - -int hwloc_bitmap_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restrict string) -{ - const char * current = string; - unsigned long accum = 0; - int count=0; - int infinite = 0; - - /* count how many substrings there are */ - count++; - while ((current = strchr(current+1, ',')) != NULL) - count++; - - current = string; - if (!strncmp("0xf...f", current, 7)) { - current += 7; - if (*current != ',') { - /* special case for infinite/full bitmap */ - hwloc_bitmap_fill(set); - return 0; - } - current++; - infinite = 1; - count--; - } - - hwloc_bitmap_reset_by_ulongs(set, (count + HWLOC_BITMAP_STRING_PER_LONG - 1) / HWLOC_BITMAP_STRING_PER_LONG); - set->infinite = 0; - - while (*current != '\0') { - unsigned long val; - char *next; - val = strtoul(current, &next, 16); - - assert(count > 0); - count--; - - accum |= (val << ((count * HWLOC_BITMAP_SUBSTRING_SIZE) % HWLOC_BITS_PER_LONG)); - if (!(count % HWLOC_BITMAP_STRING_PER_LONG)) { - set->ulongs[count / HWLOC_BITMAP_STRING_PER_LONG] = accum; - accum = 0; - } - - if (*next != ',') { - if (*next || count > 0) - goto failed; - else - break; - } - current = (const char*) next+1; - } - - set->infinite = infinite; /* set at the end, to avoid spurious realloc with filled new ulongs */ - - return 0; - - failed: - /* failure to parse */ - hwloc_bitmap_zero(set); - return -1; -} - -int hwloc_bitmap_list_snprintf(char * __hwloc_restrict buf, size_t buflen, const struct hwloc_bitmap_s * __hwloc_restrict set) -{ - 
int prev = -1; - hwloc_bitmap_t reverse; - ssize_t size = buflen; - char *tmp = buf; - int res, ret = 0; - int needcomma = 0; - - HWLOC__BITMAP_CHECK(set); - - reverse = hwloc_bitmap_alloc(); /* FIXME: add hwloc_bitmap_alloc_size() + hwloc_bitmap_init_allocated() to avoid malloc? */ - hwloc_bitmap_not(reverse, set); - - /* mark the end in case we do nothing later */ - if (buflen > 0) - tmp[0] = '\0'; - - while (1) { - int begin, end; - - begin = hwloc_bitmap_next(set, prev); - if (begin == -1) - break; - end = hwloc_bitmap_next(reverse, begin); - - if (end == begin+1) { - res = hwloc_snprintf(tmp, size, needcomma ? ",%d" : "%d", begin); - } else if (end == -1) { - res = hwloc_snprintf(tmp, size, needcomma ? ",%d-" : "%d-", begin); - } else { - res = hwloc_snprintf(tmp, size, needcomma ? ",%d-%d" : "%d-%d", begin, end-1); - } - if (res < 0) { - hwloc_bitmap_free(reverse); - return -1; - } - ret += res; - - if (res >= size) - res = size>0 ? (int)size - 1 : 0; - - tmp += res; - size -= res; - needcomma = 1; - - if (end == -1) - break; - else - prev = end - 1; - } - - hwloc_bitmap_free(reverse); - - return ret; -} - -int hwloc_bitmap_list_asprintf(char ** strp, const struct hwloc_bitmap_s * __hwloc_restrict set) -{ - int len; - char *buf; - - HWLOC__BITMAP_CHECK(set); - - len = hwloc_bitmap_list_snprintf(NULL, 0, set); - buf = malloc(len+1); - *strp = buf; - return hwloc_bitmap_list_snprintf(buf, len+1, set); -} - -int hwloc_bitmap_list_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restrict string) -{ - const char * current = string; - char *next; - long begin = -1, val; - - hwloc_bitmap_zero(set); - - while (*current != '\0') { - - /* ignore empty ranges */ - while (*current == ',') - current++; - - val = strtoul(current, &next, 0); - /* make sure we got at least one digit */ - if (next == current) - goto failed; - - if (begin != -1) { - /* finishing a range */ - hwloc_bitmap_set_range(set, begin, val); - begin = -1; - - } else if (*next == '-') { - /* 
starting a new range */ - if (*(next+1) == '\0') { - /* infinite range */ - hwloc_bitmap_set_range(set, val, -1); - break; - } else { - /* normal range */ - begin = val; - } - - } else if (*next == ',' || *next == '\0') { - /* single digit */ - hwloc_bitmap_set(set, val); - } - - if (*next == '\0') - break; - current = next+1; - } - - return 0; - - failed: - /* failure to parse */ - hwloc_bitmap_zero(set); - return -1; -} - -int hwloc_bitmap_taskset_snprintf(char * __hwloc_restrict buf, size_t buflen, const struct hwloc_bitmap_s * __hwloc_restrict set) -{ - ssize_t size = buflen; - char *tmp = buf; - int res, ret = 0; - int started = 0; - int i; - - HWLOC__BITMAP_CHECK(set); - - /* mark the end in case we do nothing later */ - if (buflen > 0) - tmp[0] = '\0'; - - if (set->infinite) { - res = hwloc_snprintf(tmp, size, "0xf...f"); - started = 1; - if (res < 0) - return -1; - ret += res; - if (res >= size) - res = size>0 ? (int)size - 1 : 0; - tmp += res; - size -= res; - } - - i=set->ulongs_count-1; - - if (set->infinite) { - /* ignore starting FULL since we have 0xf...f already */ - while (i>=0 && set->ulongs[i] == HWLOC_SUBBITMAP_FULL) - i--; - } else { - /* ignore starting ZERO except the last one */ - while (i>=1 && set->ulongs[i] == HWLOC_SUBBITMAP_ZERO) - i--; - } - - while (i>=0) { - unsigned long val = set->ulongs[i--]; - if (started) { - /* print the whole subset */ -#if HWLOC_BITS_PER_LONG == 64 - res = hwloc_snprintf(tmp, size, "%016lx", val); -#else - res = hwloc_snprintf(tmp, size, "%08lx", val); -#endif - } else if (val || i == -1) { - res = hwloc_snprintf(tmp, size, "0x%lx", val); - started = 1; - } else { - res = 0; - } - if (res < 0) - return -1; - ret += res; - if (res >= size) - res = size>0 ? 
(int)size - 1 : 0; - tmp += res; - size -= res; - } - - /* if didn't display anything, display 0x0 */ - if (!ret) { - res = hwloc_snprintf(tmp, size, "0x0"); - if (res < 0) - return -1; - ret += res; - } - - return ret; -} - -int hwloc_bitmap_taskset_asprintf(char ** strp, const struct hwloc_bitmap_s * __hwloc_restrict set) -{ - int len; - char *buf; - - HWLOC__BITMAP_CHECK(set); - - len = hwloc_bitmap_taskset_snprintf(NULL, 0, set); - buf = malloc(len+1); - *strp = buf; - return hwloc_bitmap_taskset_snprintf(buf, len+1, set); -} - -int hwloc_bitmap_taskset_sscanf(struct hwloc_bitmap_s *set, const char * __hwloc_restrict string) -{ - const char * current = string; - int chars; - int count; - int infinite = 0; - - current = string; - if (!strncmp("0xf...f", current, 7)) { - /* infinite bitmap */ - infinite = 1; - current += 7; - if (*current == '\0') { - /* special case for infinite/full bitmap */ - hwloc_bitmap_fill(set); - return 0; - } - } else { - /* finite bitmap */ - if (!strncmp("0x", current, 2)) - current += 2; - if (*current == '\0') { - /* special case for empty bitmap */ - hwloc_bitmap_zero(set); - return 0; - } - } - /* we know there are other characters now */ - - chars = (int)strlen(current); - count = (chars * 4 + HWLOC_BITS_PER_LONG - 1) / HWLOC_BITS_PER_LONG; - - hwloc_bitmap_reset_by_ulongs(set, count); - set->infinite = 0; - - while (*current != '\0') { - int tmpchars; - char ustr[17]; - unsigned long val; - char *next; - - tmpchars = chars % (HWLOC_BITS_PER_LONG/4); - if (!tmpchars) - tmpchars = (HWLOC_BITS_PER_LONG/4); - - memcpy(ustr, current, tmpchars); - ustr[tmpchars] = '\0'; - val = strtoul(ustr, &next, 16); - if (*next != '\0') - goto failed; - - set->ulongs[count-1] = val; - - current += tmpchars; - chars -= tmpchars; - count--; - } - - set->infinite = infinite; /* set at the end, to avoid spurious realloc with filled new ulongs */ - - return 0; - - failed: - /* failure to parse */ - hwloc_bitmap_zero(set); - return -1; -} - -static void 
hwloc_bitmap__zero(struct hwloc_bitmap_s *set) -{ - unsigned i; - for(i=0; i<set->ulongs_count; i++) - set->ulongs[i] = HWLOC_SUBBITMAP_ZERO; - set->infinite = 0; -} - -void hwloc_bitmap_zero(struct hwloc_bitmap_s * set) -{ - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_reset_by_ulongs(set, 1); - hwloc_bitmap__zero(set); -} - -static void hwloc_bitmap__fill(struct hwloc_bitmap_s * set) -{ - unsigned i; - for(i=0; i<set->ulongs_count; i++) - set->ulongs[i] = HWLOC_SUBBITMAP_FULL; - set->infinite = 1; -} - -void hwloc_bitmap_fill(struct hwloc_bitmap_s * set) -{ - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_reset_by_ulongs(set, 1); - hwloc_bitmap__fill(set); -} - -void hwloc_bitmap_from_ulong(struct hwloc_bitmap_s *set, unsigned long mask) -{ - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_reset_by_ulongs(set, 1); - set->ulongs[0] = mask; /* there's always at least one ulong allocated */ - set->infinite = 0; -} - -void hwloc_bitmap_from_ith_ulong(struct hwloc_bitmap_s *set, unsigned i, unsigned long mask) -{ - unsigned j; - - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_reset_by_ulongs(set, i+1); - set->ulongs[i] = mask; - for(j=0; j<i; j++) - set->ulongs[j] = HWLOC_SUBBITMAP_ZERO; - set->infinite = 0; -} - -unsigned long hwloc_bitmap_to_ulong(const struct hwloc_bitmap_s *set) -{ - HWLOC__BITMAP_CHECK(set); - - return set->ulongs[0]; /* there's always at least one ulong allocated */ -} - -unsigned long hwloc_bitmap_to_ith_ulong(const struct hwloc_bitmap_s *set, unsigned i) -{ - HWLOC__BITMAP_CHECK(set); - - return HWLOC_SUBBITMAP_READULONG(set, i); -} - -void hwloc_bitmap_only(struct hwloc_bitmap_s * set, unsigned cpu) -{ - unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu); - - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_reset_by_cpu_index(set, cpu); - hwloc_bitmap__zero(set); - set->ulongs[index_] |= HWLOC_SUBBITMAP_CPU(cpu); -} - -void hwloc_bitmap_allbut(struct hwloc_bitmap_s * set, unsigned cpu) -{ - unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu); - - HWLOC__BITMAP_CHECK(set); -
hwloc_bitmap_reset_by_cpu_index(set, cpu); - hwloc_bitmap__fill(set); - set->ulongs[index_] &= ~HWLOC_SUBBITMAP_CPU(cpu); -} - -void hwloc_bitmap_set(struct hwloc_bitmap_s * set, unsigned cpu) -{ - unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu); - - HWLOC__BITMAP_CHECK(set); - - /* nothing to do if setting inside the infinite part of the bitmap */ - if (set->infinite && cpu >= set->ulongs_count * HWLOC_BITS_PER_LONG) - return; - - hwloc_bitmap_realloc_by_cpu_index(set, cpu); - set->ulongs[index_] |= HWLOC_SUBBITMAP_CPU(cpu); -} - -void hwloc_bitmap_set_range(struct hwloc_bitmap_s * set, unsigned begincpu, int _endcpu) -{ - unsigned i; - unsigned beginset,endset; - unsigned endcpu = (unsigned) _endcpu; - - HWLOC__BITMAP_CHECK(set); - - if (_endcpu == -1) { - set->infinite = 1; - /* keep endcpu == -1 since this unsigned is actually larger than anything else */ - } - - if (set->infinite) { - /* truncate the range according to the infinite part of the bitmap */ - if (endcpu >= set->ulongs_count * HWLOC_BITS_PER_LONG) - endcpu = set->ulongs_count * HWLOC_BITS_PER_LONG - 1; - if (begincpu >= set->ulongs_count * HWLOC_BITS_PER_LONG) - return; - } - if (endcpu < begincpu) - return; - hwloc_bitmap_realloc_by_cpu_index(set, endcpu); - - beginset = HWLOC_SUBBITMAP_INDEX(begincpu); - endset = HWLOC_SUBBITMAP_INDEX(endcpu); - for(i=beginset+1; i<endset; i++) - set->ulongs[i] = HWLOC_SUBBITMAP_FULL; - if (beginset == endset) { - set->ulongs[beginset] |= HWLOC_SUBBITMAP_ULBIT_FROMTO(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu), HWLOC_SUBBITMAP_CPU_ULBIT(endcpu)); - } else { - set->ulongs[beginset] |= HWLOC_SUBBITMAP_ULBIT_FROM(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu)); - set->ulongs[endset] |= HWLOC_SUBBITMAP_ULBIT_TO(HWLOC_SUBBITMAP_CPU_ULBIT(endcpu)); - } -} - -void hwloc_bitmap_set_ith_ulong(struct hwloc_bitmap_s *set, unsigned i, unsigned long mask) -{ - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_realloc_by_ulongs(set, i+1); - set->ulongs[i] = mask; -} - -void hwloc_bitmap_clr(struct hwloc_bitmap_s * set,
unsigned cpu) -{ - unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu); - - HWLOC__BITMAP_CHECK(set); - - /* nothing to do if clearing inside the infinitely-unset part of the bitmap */ - if (!set->infinite && cpu >= set->ulongs_count * HWLOC_BITS_PER_LONG) - return; - - hwloc_bitmap_realloc_by_cpu_index(set, cpu); - set->ulongs[index_] &= ~HWLOC_SUBBITMAP_CPU(cpu); -} - -void hwloc_bitmap_clr_range(struct hwloc_bitmap_s * set, unsigned begincpu, int _endcpu) -{ - unsigned i; - unsigned beginset,endset; - unsigned endcpu = (unsigned) _endcpu; - - HWLOC__BITMAP_CHECK(set); - - if (_endcpu == -1) { - set->infinite = 0; - /* keep endcpu == -1 since this unsigned is actually larger than anything else */ - } - - if (!set->infinite) { - /* truncate the range according to the infinitely-unset part of the bitmap */ - if (endcpu >= set->ulongs_count * HWLOC_BITS_PER_LONG) - endcpu = set->ulongs_count * HWLOC_BITS_PER_LONG - 1; - if (begincpu >= set->ulongs_count * HWLOC_BITS_PER_LONG) - return; - } - if (endcpu < begincpu) - return; - hwloc_bitmap_realloc_by_cpu_index(set, endcpu); - - beginset = HWLOC_SUBBITMAP_INDEX(begincpu); - endset = HWLOC_SUBBITMAP_INDEX(endcpu); - for(i=beginset+1; i<endset; i++) - set->ulongs[i] = HWLOC_SUBBITMAP_ZERO; - if (beginset == endset) { - set->ulongs[beginset] &= ~HWLOC_SUBBITMAP_ULBIT_FROMTO(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu), HWLOC_SUBBITMAP_CPU_ULBIT(endcpu)); - } else { - set->ulongs[beginset] &= ~HWLOC_SUBBITMAP_ULBIT_FROM(HWLOC_SUBBITMAP_CPU_ULBIT(begincpu)); - set->ulongs[endset] &= ~HWLOC_SUBBITMAP_ULBIT_TO(HWLOC_SUBBITMAP_CPU_ULBIT(endcpu)); - } -} - -int hwloc_bitmap_isset(const struct hwloc_bitmap_s * set, unsigned cpu) -{ - unsigned index_ = HWLOC_SUBBITMAP_INDEX(cpu); - - HWLOC__BITMAP_CHECK(set); - - return (HWLOC_SUBBITMAP_READULONG(set, index_) & HWLOC_SUBBITMAP_CPU(cpu)) != 0; -} - -int hwloc_bitmap_iszero(const struct hwloc_bitmap_s *set) -{ - unsigned i; - - HWLOC__BITMAP_CHECK(set); - - if (set->infinite) - return 0; - for(i=0; i<set->ulongs_count;
i++) - if (set->ulongs[i] != HWLOC_SUBBITMAP_ZERO) - return 0; - return 1; -} - -int hwloc_bitmap_isfull(const struct hwloc_bitmap_s *set) -{ - unsigned i; - - HWLOC__BITMAP_CHECK(set); - - if (!set->infinite) - return 0; - for(i=0; i<set->ulongs_count; i++) - if (set->ulongs[i] != HWLOC_SUBBITMAP_FULL) - return 0; - return 1; -} - -int hwloc_bitmap_isequal (const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2) -{ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned min_count = count1 < count2 ? count1 : count2; - unsigned i; - - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - for(i=0; i<min_count; i++) - if (set1->ulongs[i] != set2->ulongs[i]) - return 0; - - if (count1 != count2) { - unsigned long w1 = set1->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - unsigned long w2 = set2->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - for(i=min_count; i<count1; i++) { - if (set1->ulongs[i] != w2) - return 0; - } - for(i=min_count; i<count2; i++) { - if (set2->ulongs[i] != w1) - return 0; - } - } - - if (set1->infinite != set2->infinite) - return 0; - - return 1; -} - -int hwloc_bitmap_intersects (const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2) -{ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned min_count = count1 < count2 ?
count1 : count2; - unsigned i; - - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - for(i=0; i<min_count; i++) - if (set1->ulongs[i] & set2->ulongs[i]) - return 1; - - if (count1 != count2) { - if (set2->infinite) { - for(i=min_count; i<set1->ulongs_count; i++) - if (set1->ulongs[i]) - return 1; - } - if (set1->infinite) { - for(i=min_count; i<set2->ulongs_count; i++) - if (set2->ulongs[i]) - return 1; - } - } - - if (set1->infinite && set2->infinite) - return 1; - - return 0; -} - -int hwloc_bitmap_isincluded (const struct hwloc_bitmap_s *sub_set, const struct hwloc_bitmap_s *super_set) -{ - unsigned super_count = super_set->ulongs_count; - unsigned sub_count = sub_set->ulongs_count; - unsigned min_count = super_count < sub_count ? super_count : sub_count; - unsigned i; - - HWLOC__BITMAP_CHECK(sub_set); - HWLOC__BITMAP_CHECK(super_set); - - for(i=0; i<min_count; i++) - if (super_set->ulongs[i] != (super_set->ulongs[i] | sub_set->ulongs[i])) - return 0; - - if (super_count != sub_count) { - if (!super_set->infinite) - for(i=min_count; i<sub_count; i++) - if (sub_set->ulongs[i]) - return 0; - if (sub_set->infinite) - for(i=min_count; i<super_count; i++) - if (super_set->ulongs[i] != HWLOC_SUBBITMAP_FULL) - return 0; - } - - if (sub_set->infinite && !super_set->infinite) - return 0; - - return 1; -} - -void hwloc_bitmap_or (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2) -{ - /* cache counts so that we can reset res even if it's also set1 or set2 */ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned max_count = count1 > count2 ?
count1 : count2; - unsigned min_count = count1 + count2 - max_count; - unsigned i; - - HWLOC__BITMAP_CHECK(res); - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - hwloc_bitmap_reset_by_ulongs(res, max_count); - - for(i=0; i<min_count; i++) - res->ulongs[i] = set1->ulongs[i] | set2->ulongs[i]; - - if (count1 != count2) { - if (min_count < count1) { - if (set2->infinite) { - res->ulongs_count = min_count; - } else { - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set1->ulongs[i]; - } - } else { - if (set1->infinite) { - res->ulongs_count = min_count; - } else { - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set2->ulongs[i]; - } - } - } - - res->infinite = set1->infinite || set2->infinite; -} - -void hwloc_bitmap_and (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2) -{ - /* cache counts so that we can reset res even if it's also set1 or set2 */ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned max_count = count1 > count2 ? count1 : count2; - unsigned min_count = count1 + count2 - max_count; - unsigned i; - - HWLOC__BITMAP_CHECK(res); - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - hwloc_bitmap_reset_by_ulongs(res, max_count); - - for(i=0; i<min_count; i++) - res->ulongs[i] = set1->ulongs[i] & set2->ulongs[i]; - - if (count1 != count2) { - if (min_count < count1) { - if (set2->infinite) { - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set1->ulongs[i]; - } else { - res->ulongs_count = min_count; - } - } else { - if (set1->infinite) { - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set2->ulongs[i]; - } else { - res->ulongs_count = min_count; - } - } - } - - res->infinite = set1->infinite && set2->infinite; -} - -void hwloc_bitmap_andnot (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2) -{ - /* cache counts so that we can reset res even if it's also set1 or set2 */ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned max_count = count1 > count2 ?
count1 : count2; - unsigned min_count = count1 + count2 - max_count; - unsigned i; - - HWLOC__BITMAP_CHECK(res); - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - hwloc_bitmap_reset_by_ulongs(res, max_count); - - for(i=0; i<min_count; i++) - res->ulongs[i] = set1->ulongs[i] & ~set2->ulongs[i]; - - if (count1 != count2) { - if (min_count < count1) { - if (!set2->infinite) { - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set1->ulongs[i]; - } else { - res->ulongs_count = min_count; - } - } else { - if (set1->infinite) { - for(i=min_count; i<max_count; i++) - res->ulongs[i] = ~set2->ulongs[i]; - } else { - res->ulongs_count = min_count; - } - } - } - - res->infinite = set1->infinite && !set2->infinite; -} - -void hwloc_bitmap_xor (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set1, const struct hwloc_bitmap_s *set2) -{ - /* cache counts so that we can reset res even if it's also set1 or set2 */ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned max_count = count1 > count2 ? count1 : count2; - unsigned min_count = count1 + count2 - max_count; - unsigned i; - - HWLOC__BITMAP_CHECK(res); - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - hwloc_bitmap_reset_by_ulongs(res, max_count); - - for(i=0; i<min_count; i++) - res->ulongs[i] = set1->ulongs[i] ^ set2->ulongs[i]; - - if (count1 != count2) { - if (min_count < count1) { - unsigned long w2 = set2->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set1->ulongs[i] ^ w2; - } else { - unsigned long w1 = set1->infinite ?
HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - for(i=min_count; i<max_count; i++) - res->ulongs[i] = set2->ulongs[i] ^ w1; - } - } - - res->infinite = (!set1->infinite) != (!set2->infinite); -} - -void hwloc_bitmap_not (struct hwloc_bitmap_s *res, const struct hwloc_bitmap_s *set) -{ - unsigned count = set->ulongs_count; - unsigned i; - - HWLOC__BITMAP_CHECK(res); - HWLOC__BITMAP_CHECK(set); - - hwloc_bitmap_reset_by_ulongs(res, count); - - for(i=0; i<count; i++) - res->ulongs[i] = ~set->ulongs[i]; - - res->infinite = !set->infinite; -} - -int hwloc_bitmap_first(const struct hwloc_bitmap_s * set) -{ - unsigned i; - - HWLOC__BITMAP_CHECK(set); - - for(i=0; i<set->ulongs_count; i++) { - /* subsets are unsigned longs, use ffsl */ - unsigned long w = set->ulongs[i]; - if (w) - return hwloc_ffsl(w) - 1 + HWLOC_BITS_PER_LONG*i; - } - - if (set->infinite) - return set->ulongs_count * HWLOC_BITS_PER_LONG; - - return -1; -} - -int hwloc_bitmap_last(const struct hwloc_bitmap_s * set) -{ - int i; - - HWLOC__BITMAP_CHECK(set); - - if (set->infinite) - return -1; - - for(i=set->ulongs_count-1; i>=0; i--) { - /* subsets are unsigned longs, use flsl */ - unsigned long w = set->ulongs[i]; - if (w) - return hwloc_flsl(w) - 1 + HWLOC_BITS_PER_LONG*i; - } - - return -1; -} - -int hwloc_bitmap_next(const struct hwloc_bitmap_s * set, int prev_cpu) -{ - unsigned i = HWLOC_SUBBITMAP_INDEX(prev_cpu + 1); - - HWLOC__BITMAP_CHECK(set); - - if (i >= set->ulongs_count) { - if (set->infinite) - return prev_cpu + 1; - else - return -1; - } - - for(; i<set->ulongs_count; i++) { - /* subsets are unsigned longs, use ffsl */ - unsigned long w = set->ulongs[i]; - - /* if the prev cpu is in the same word as the possible next one, - we need to mask out previous cpus */ - if (prev_cpu >= 0 && HWLOC_SUBBITMAP_INDEX((unsigned) prev_cpu) == i) - w &= ~HWLOC_SUBBITMAP_ULBIT_TO(HWLOC_SUBBITMAP_CPU_ULBIT(prev_cpu)); - - if (w) - return hwloc_ffsl(w) - 1 + HWLOC_BITS_PER_LONG*i; - } - - if (set->infinite) - return set->ulongs_count * HWLOC_BITS_PER_LONG; -
return -1; -} - -void hwloc_bitmap_singlify(struct hwloc_bitmap_s * set) -{ - unsigned i; - int found = 0; - - HWLOC__BITMAP_CHECK(set); - - for(i=0; i<set->ulongs_count; i++) { - if (found) { - set->ulongs[i] = HWLOC_SUBBITMAP_ZERO; - continue; - } else { - /* subsets are unsigned longs, use ffsl */ - unsigned long w = set->ulongs[i]; - if (w) { - int _ffs = hwloc_ffsl(w); - set->ulongs[i] = HWLOC_SUBBITMAP_CPU(_ffs-1); - found = 1; - } - } - } - - if (set->infinite) { - if (found) { - set->infinite = 0; - } else { - /* set the first non allocated bit */ - unsigned first = set->ulongs_count * HWLOC_BITS_PER_LONG; - set->infinite = 0; /* do not let realloc fill the newly allocated sets */ - hwloc_bitmap_set(set, first); - } - } -} - -int hwloc_bitmap_compare_first(const struct hwloc_bitmap_s * set1, const struct hwloc_bitmap_s * set2) -{ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned max_count = count1 > count2 ? count1 : count2; - unsigned min_count = count1 + count2 - max_count; - unsigned i; - - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - for(i=0; i<min_count; i++) { - unsigned long w1 = set1->ulongs[i]; - unsigned long w2 = set2->ulongs[i]; - if (w1 || w2) { - int _ffs1 = hwloc_ffsl(w1); - int _ffs2 = hwloc_ffsl(w2); - /* if both have a bit set, compare for real */ - if (_ffs1 && _ffs2) - return _ffs1-_ffs2; - /* one is empty, and it is considered higher, so reverse-compare them */ - return _ffs2-_ffs1; - } - } - - if (count1 != count2) { - if (min_count < count2) { - for(i=min_count; i<count2; i++) { - unsigned long w2 = set2->ulongs[i]; - if (set1->infinite) - return -!(w2 & 1); - else if (w2) - return 1; - } - } else { - for(i=min_count; i<count1; i++) { - unsigned long w1 = set1->ulongs[i]; - if (set2->infinite) - return !(w1 & 1); - else if (w1) - return -1; - } - } - } - - return !!set1->infinite - !!set2->infinite; -} - -int hwloc_bitmap_compare(const struct hwloc_bitmap_s * set1, const struct hwloc_bitmap_s * set2) -{ - unsigned count1 = set1->ulongs_count; - unsigned count2 = set2->ulongs_count; - unsigned max_count = count1 >
count2 ? count1 : count2; - unsigned min_count = count1 + count2 - max_count; - int i; - - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - if ((!set1->infinite) != (!set2->infinite)) - return !!set1->infinite - !!set2->infinite; - - if (count1 != count2) { - if (min_count < count2) { - unsigned long val1 = set1->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - for(i=max_count-1; i>=(signed) min_count; i--) { - unsigned long val2 = set2->ulongs[i]; - if (val1 == val2) - continue; - return val1 < val2 ? -1 : 1; - } - } else { - unsigned long val2 = set2->infinite ? HWLOC_SUBBITMAP_FULL : HWLOC_SUBBITMAP_ZERO; - for(i=max_count-1; i>=(signed) min_count; i--) { - unsigned long val1 = set1->ulongs[i]; - if (val1 == val2) - continue; - return val1 < val2 ? -1 : 1; - } - } - } - - for(i=min_count-1; i>=0; i--) { - unsigned long val1 = set1->ulongs[i]; - unsigned long val2 = set2->ulongs[i]; - if (val1 == val2) - continue; - return val1 < val2 ? -1 : 1; - } - - return 0; -} - -int hwloc_bitmap_weight(const struct hwloc_bitmap_s * set) -{ - int weight = 0; - unsigned i; - - HWLOC__BITMAP_CHECK(set); - - if (set->infinite) - return -1; - - for(i=0; i<set->ulongs_count; i++) - weight += hwloc_weight_long(set->ulongs[i]); - return weight; -} - -int hwloc_bitmap_compare_inclusion(const struct hwloc_bitmap_s * set1, const struct hwloc_bitmap_s * set2) -{ - unsigned max_count = set1->ulongs_count > set2->ulongs_count ?
set1->ulongs_count : set2->ulongs_count; - int result = HWLOC_BITMAP_EQUAL; /* means empty sets return equal */ - int empty1 = 1; - int empty2 = 1; - unsigned i; - - HWLOC__BITMAP_CHECK(set1); - HWLOC__BITMAP_CHECK(set2); - - for(i=0; iinfinite) { - if (set2->infinite) { - /* set2 infinite only */ - if (result == HWLOC_BITMAP_CONTAINS) { - if (!empty2) - return HWLOC_BITMAP_INTERSECTS; - result = HWLOC_BITMAP_DIFFERENT; - } else if (result == HWLOC_BITMAP_EQUAL) { - result = HWLOC_BITMAP_INCLUDED; - } - /* no change otherwise */ - } - } else if (!set2->infinite) { - /* set1 infinite only */ - if (result == HWLOC_BITMAP_INCLUDED) { - if (!empty1) - return HWLOC_BITMAP_INTERSECTS; - result = HWLOC_BITMAP_DIFFERENT; - } else if (result == HWLOC_BITMAP_EQUAL) { - result = HWLOC_BITMAP_CONTAINS; - } - /* no change otherwise */ - } else { - /* both infinite */ - if (result == HWLOC_BITMAP_DIFFERENT) - return HWLOC_BITMAP_INTERSECTS; - /* equal/contains/included unchanged */ - } - - return result; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/diff.c b/opal/mca/hwloc/hwloc1113/hwloc/src/diff.c deleted file mode 100644 index 060aa93f556..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/diff.c +++ /dev/null @@ -1,408 +0,0 @@ -/* - * Copyright © 2013-2014 Inria. All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include -#include -#include - -int hwloc_topology_diff_destroy(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_topology_diff_t diff) -{ - hwloc_topology_diff_t next; - while (diff) { - next = diff->generic.next; - switch (diff->generic.type) { - default: - break; - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR: - switch (diff->obj_attr.diff.generic.type) { - default: - break; - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME: - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO: - free(diff->obj_attr.diff.string.name); - free(diff->obj_attr.diff.string.oldvalue); - free(diff->obj_attr.diff.string.newvalue); - break; - } - break; - } - free(diff); - diff = next; - } - return 0; -} - -/************************ - * Computing diffs - */ - -static void hwloc_append_diff(hwloc_topology_diff_t newdiff, - hwloc_topology_diff_t *firstdiffp, - hwloc_topology_diff_t *lastdiffp) -{ - if (*firstdiffp) - (*lastdiffp)->generic.next = newdiff; - else - *firstdiffp = newdiff; - *lastdiffp = newdiff; - newdiff->generic.next = NULL; -} - -static int hwloc_append_diff_too_complex(hwloc_obj_t obj1, - hwloc_topology_diff_t *firstdiffp, - hwloc_topology_diff_t *lastdiffp) -{ - hwloc_topology_diff_t newdiff; - newdiff = malloc(sizeof(*newdiff)); - if (!newdiff) - return -1; - - newdiff->too_complex.type = HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX; - newdiff->too_complex.obj_depth = obj1->depth; - newdiff->too_complex.obj_index = obj1->logical_index; - hwloc_append_diff(newdiff, firstdiffp, lastdiffp); - return 0; -} - -static int hwloc_append_diff_obj_attr_string(hwloc_obj_t obj, - hwloc_topology_diff_obj_attr_type_t type, - const char *name, - const char *oldvalue, - const char *newvalue, - hwloc_topology_diff_t *firstdiffp, - hwloc_topology_diff_t *lastdiffp) -{ - hwloc_topology_diff_t newdiff; - - if (obj->type == HWLOC_OBJ_MISC) - /* TODO: add a custom level/depth for Misc */ - return hwloc_append_diff_too_complex(obj, firstdiffp, lastdiffp); - - newdiff = malloc(sizeof(*newdiff)); - if (!newdiff) - return 
-1; - - newdiff->obj_attr.type = HWLOC_TOPOLOGY_DIFF_OBJ_ATTR; - newdiff->obj_attr.obj_depth = obj->depth; - newdiff->obj_attr.obj_index = obj->logical_index; - newdiff->obj_attr.diff.string.type = type; - newdiff->obj_attr.diff.string.name = name ? strdup(name) : NULL; - newdiff->obj_attr.diff.string.oldvalue = oldvalue ? strdup(oldvalue) : NULL; - newdiff->obj_attr.diff.string.newvalue = newvalue ? strdup(newvalue) : NULL; - hwloc_append_diff(newdiff, firstdiffp, lastdiffp); - return 0; -} - -static int hwloc_append_diff_obj_attr_uint64(hwloc_obj_t obj, - hwloc_topology_diff_obj_attr_type_t type, - hwloc_uint64_t idx, - hwloc_uint64_t oldvalue, - hwloc_uint64_t newvalue, - hwloc_topology_diff_t *firstdiffp, - hwloc_topology_diff_t *lastdiffp) -{ - hwloc_topology_diff_t newdiff; - - if (obj->type == HWLOC_OBJ_MISC) - /* TODO: add a custom level/depth for Misc */ - return hwloc_append_diff_too_complex(obj, firstdiffp, lastdiffp); - - newdiff = malloc(sizeof(*newdiff)); - if (!newdiff) - return -1; - - newdiff->obj_attr.type = HWLOC_TOPOLOGY_DIFF_OBJ_ATTR; - newdiff->obj_attr.obj_depth = obj->depth; - newdiff->obj_attr.obj_index = obj->logical_index; - newdiff->obj_attr.diff.uint64.type = type; - newdiff->obj_attr.diff.uint64.index = idx; - newdiff->obj_attr.diff.uint64.oldvalue = oldvalue; - newdiff->obj_attr.diff.uint64.newvalue = newvalue; - hwloc_append_diff(newdiff, firstdiffp, lastdiffp); - return 0; -} - -static int -hwloc_diff_trees(hwloc_topology_t topo1, hwloc_obj_t obj1, - hwloc_topology_t topo2, hwloc_obj_t obj2, - unsigned flags, - hwloc_topology_diff_t *firstdiffp, hwloc_topology_diff_t *lastdiffp) -{ - unsigned i; - int err; - - if (obj1->depth != obj2->depth) - goto out_too_complex; - if (obj1->type != obj2->type) - goto out_too_complex; - - if (obj1->os_index != obj2->os_index) - /* we could allow different os_index for non-PU non-NUMAnode objects - * but it's likely useless anyway */ - goto out_too_complex; - -#define _SETS_DIFFERENT(_set1, _set2) 
\ - ( ( !(_set1) != !(_set2) ) \ - || ( (_set1) && !hwloc_bitmap_isequal(_set1, _set2) ) ) -#define SETS_DIFFERENT(_set, _obj1, _obj2) _SETS_DIFFERENT((_obj1)->_set, (_obj2)->_set) - if (SETS_DIFFERENT(cpuset, obj1, obj2) - || SETS_DIFFERENT(complete_cpuset, obj1, obj2) - || SETS_DIFFERENT(online_cpuset, obj1, obj2) - || SETS_DIFFERENT(allowed_cpuset, obj1, obj2) - || SETS_DIFFERENT(nodeset, obj1, obj2) - || SETS_DIFFERENT(complete_nodeset, obj1, obj2) - || SETS_DIFFERENT(allowed_nodeset, obj1, obj2)) - goto out_too_complex; - - /* no need to check logical_index, sibling_rank, symmetric_subtree, - * the parents did it */ - - if ((!obj1->name) != (!obj2->name) - || (obj1->name && strcmp(obj1->name, obj2->name))) { - err = hwloc_append_diff_obj_attr_string(obj1, - HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME, - NULL, - obj1->name, - obj2->name, - firstdiffp, lastdiffp); - if (err < 0) - return err; - } - - /* memory */ - if (obj1->memory.local_memory != obj2->memory.local_memory) { - err = hwloc_append_diff_obj_attr_uint64(obj1, - HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE, - 0, - obj1->memory.local_memory, - obj2->memory.local_memory, - firstdiffp, lastdiffp); - if (err < 0) - return err; - } - /* ignore memory page_types */ - - /* ignore os_level */ - - /* type-specific attrs */ - switch (obj1->type) { - default: - break; - case HWLOC_OBJ_CACHE: - if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->cache))) - goto out_too_complex; - break; - case HWLOC_OBJ_GROUP: - if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->group))) - goto out_too_complex; - break; - case HWLOC_OBJ_PCI_DEVICE: - if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->pcidev))) - goto out_too_complex; - break; - case HWLOC_OBJ_BRIDGE: - if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->bridge))) - goto out_too_complex; - break; - case HWLOC_OBJ_OS_DEVICE: - if (memcmp(obj1->attr, obj2->attr, sizeof(obj1->attr->osdev))) - goto out_too_complex; - break; - } - - /* distances */ - if (obj1->distances_count 
!= obj2->distances_count) - goto out_too_complex; - for(i=0; i<obj1->distances_count; i++) { - struct hwloc_distances_s *d1 = obj1->distances[i], *d2 = obj2->distances[i]; - if (d1->relative_depth != d2->relative_depth - || d1->nbobjs != d2->nbobjs - || d1->latency_max != d2->latency_max - || d1->latency_base != d2->latency_base - || memcmp(d1->latency, d2->latency, d1->nbobjs * d1->nbobjs * sizeof(*d1->latency))) - goto out_too_complex; - } - - /* infos */ - if (obj1->infos_count != obj2->infos_count) - goto out_too_complex; - for(i=0; i<obj1->infos_count; i++) { - if (strcmp(obj1->infos[i].name, obj2->infos[i].name)) - goto out_too_complex; - if (strcmp(obj1->infos[i].value, obj2->infos[i].value)) { - err = hwloc_append_diff_obj_attr_string(obj1, - HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO, - obj1->infos[i].name, - obj1->infos[i].value, - obj2->infos[i].value, - firstdiffp, lastdiffp); - if (err < 0) - return err; - } - } - - /* ignore userdata */ - - /* children */ - if (obj1->arity != obj2->arity) - goto out_too_complex; - for(i=0; i<obj1->arity; i++) { - err = hwloc_diff_trees(topo1, obj1->children[i], - topo2, obj2->children[i], - flags, - firstdiffp, lastdiffp); - if (err < 0) - return err; - } - - return 0; - -out_too_complex: - hwloc_append_diff_too_complex(obj1, firstdiffp, lastdiffp); - return 0; -} - -int hwloc_topology_diff_build(hwloc_topology_t topo1, - hwloc_topology_t topo2, - unsigned long flags, - hwloc_topology_diff_t *diffp) -{ - hwloc_topology_diff_t lastdiff, tmpdiff; - int err; - - if (flags != 0) { - errno = EINVAL; - return -1; - } - - *diffp = NULL; - err = hwloc_diff_trees(topo1, hwloc_get_root_obj(topo1), - topo2, hwloc_get_root_obj(topo2), - flags, - diffp, &lastdiff); - - if (!err) { - tmpdiff = *diffp; - while (tmpdiff) { - if (tmpdiff->generic.type == HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX) { - err = 1; - break; - } - tmpdiff = tmpdiff->generic.next; - } - } - - return err; -} - -/******************** - * Applying 
diffs - */ - -static int 
-hwloc_apply_diff_one(hwloc_topology_t topology, - hwloc_topology_diff_t diff, - unsigned long flags) -{ - int reverse = !!(flags & HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE); - - switch (diff->generic.type) { - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR: { - struct hwloc_topology_diff_obj_attr_s *obj_attr = &diff->obj_attr; - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, obj_attr->obj_depth, obj_attr->obj_index); - if (!obj) - return -1; - - switch (obj_attr->diff.generic.type) { - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE: { - hwloc_obj_t tmpobj; - hwloc_uint64_t oldvalue = reverse ? obj_attr->diff.uint64.newvalue : obj_attr->diff.uint64.oldvalue; - hwloc_uint64_t newvalue = reverse ? obj_attr->diff.uint64.oldvalue : obj_attr->diff.uint64.newvalue; - hwloc_uint64_t valuediff = newvalue - oldvalue; - if (obj->memory.local_memory != oldvalue) - return -1; - obj->memory.local_memory = newvalue; - tmpobj = obj; - while (tmpobj) { - tmpobj->memory.total_memory += valuediff; - tmpobj = tmpobj->parent; - } - break; - } - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME: { - const char *oldvalue = reverse ? obj_attr->diff.string.newvalue : obj_attr->diff.string.oldvalue; - const char *newvalue = reverse ? obj_attr->diff.string.oldvalue : obj_attr->diff.string.newvalue; - if (!obj->name || strcmp(obj->name, oldvalue)) - return -1; - free(obj->name); - obj->name = strdup(newvalue); - break; - } - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO: { - const char *name = obj_attr->diff.string.name; - const char *oldvalue = reverse ? obj_attr->diff.string.newvalue : obj_attr->diff.string.oldvalue; - const char *newvalue = reverse ? 
obj_attr->diff.string.oldvalue : obj_attr->diff.string.newvalue; - unsigned i; - int found = 0; - for(i=0; i<obj->infos_count; i++) { - if (!strcmp(obj->infos[i].name, name) - && !strcmp(obj->infos[i].value, oldvalue)) { - free(obj->infos[i].value); - obj->infos[i].value = strdup(newvalue); - found = 1; - break; - } - } - if (!found) - return -1; - break; - } - default: - return -1; - } - - break; - } - default: - return -1; - } - - return 0; -} - -int hwloc_topology_diff_apply(hwloc_topology_t topology, - hwloc_topology_diff_t diff, - unsigned long flags) -{ - hwloc_topology_diff_t tmpdiff, tmpdiff2; - int err, nr; - - if (flags & ~HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE) { - errno = EINVAL; - return -1; - } - - tmpdiff = diff; - nr = 0; - while (tmpdiff) { - nr++; - err = hwloc_apply_diff_one(topology, tmpdiff, flags); - if (err < 0) - goto cancel; - tmpdiff = tmpdiff->generic.next; - } - return 0; - -cancel: - tmpdiff2 = tmpdiff; - tmpdiff = diff; - while (tmpdiff != tmpdiff2) { - hwloc_apply_diff_one(topology, tmpdiff, flags ^ HWLOC_TOPOLOGY_DIFF_APPLY_REVERSE); - tmpdiff = tmpdiff->generic.next; - } - errno = EINVAL; - return -nr; /* return the index (starting at 1) of the first element that couldn't be applied */ -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/distances.c b/opal/mca/hwloc/hwloc1113/hwloc/src/distances.c deleted file mode 100644 index b2bfbdd8bbf..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/distances.c +++ /dev/null @@ -1,1063 +0,0 @@ -/* - * Copyright © 2010-2016 Inria. All rights reserved. - * Copyright © 2011-2012 Université Bordeaux - * Copyright © 2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include -#include -#include -#include -#include - -#include -#include - -/************************** - * Main Init/Clear/Destroy - */ - -/* called during topology init */ -void hwloc_distances_init(struct hwloc_topology *topology) -{ - topology->first_osdist = topology->last_osdist = NULL; -} - -/* called during topology destroy */ -void hwloc_distances_destroy(struct hwloc_topology * topology) -{ - struct hwloc_os_distances_s *osdist, *next = topology->first_osdist; - while ((osdist = next) != NULL) { - next = osdist->next; - /* remove final distance matrics AND physically-ordered ones */ - free(osdist->indexes); - free(osdist->objs); - free(osdist->distances); - free(osdist); - } - topology->first_osdist = topology->last_osdist = NULL; -} - -/****************************************************** - * Inserting distances in the topology - * from a backend, from the environment or by the user - */ - -/* insert a distance matrix in the topology. - * the caller gives us those pointers, we take care of freeing them later and so on. 
- */ -void hwloc_distances_set(hwloc_topology_t __hwloc_restrict topology, hwloc_obj_type_t type, - unsigned nbobjs, unsigned *indexes, hwloc_obj_t *objs, float *distances, - int force) -{ - struct hwloc_os_distances_s *osdist, *next = topology->first_osdist; - /* look for existing distances for the same type */ - while ((osdist = next) != NULL) { - next = osdist->next; - if (osdist->type == type) { - if (osdist->forced && !force) { - /* there is a forced distance element, ignore the new non-forced one */ - free(indexes); - free(objs); - free(distances); - return; - } else if (force) { - /* we're forcing a new distance, remove the old ones */ - free(osdist->indexes); - free(osdist->objs); - free(osdist->distances); - /* remove current object */ - if (osdist->prev) - osdist->prev->next = next; - else - topology->first_osdist = next; - if (next) - next->prev = osdist->prev; - else - topology->last_osdist = osdist->prev; - /* free current object */ - free(osdist); - } - } - } - - if (!nbobjs) - /* we're just clearing, return now */ - return; - assert(nbobjs >= 2); - - /* create the new element */ - osdist = malloc(sizeof(struct hwloc_os_distances_s)); - osdist->nbobjs = nbobjs; - osdist->indexes = indexes; - osdist->objs = objs; - osdist->distances = distances; - osdist->forced = force; - osdist->type = type; - /* insert it */ - osdist->next = NULL; - osdist->prev = topology->last_osdist; - if (topology->last_osdist) - topology->last_osdist->next = osdist; - else - topology->first_osdist = osdist; - topology->last_osdist = osdist; -} - -/* make sure a user-given distance matrix is sane */ -static int hwloc_distances__check_matrix(hwloc_topology_t __hwloc_restrict topology __hwloc_attribute_unused, hwloc_obj_type_t type __hwloc_attribute_unused, - unsigned nbobjs, unsigned *indexes, hwloc_obj_t *objs __hwloc_attribute_unused, float *distances __hwloc_attribute_unused) -{ - unsigned i,j; - /* make sure we don't have the same index twice */ - for(i=0; i= 2) { - /* 
generate the matrix to create x groups of y elements */ - if (x*y*z != nbobjs) { - fprintf(stderr, "Ignoring %s distances from environment variable, invalid grouping (%u*%u*%u=%u instead of %u)\n", - hwloc_obj_type_string(type), x, y, z, x*y*z, nbobjs); - free(indexes); - free(distances); - return; - } - for(i=0; ifirst_osdist; osdist; osdist = osdist->next) { - /* remove the objs array, we'll rebuild it from the indexes - * depending on remaining objects */ - free(osdist->objs); - osdist->objs = NULL; - } -} - - -/* cleanup everything we created from distances so that we may rebuild them - * at the end of restrict() - */ -void hwloc_distances_restrict(struct hwloc_topology *topology, unsigned long flags) -{ - if (flags & HWLOC_RESTRICT_FLAG_ADAPT_DISTANCES) { - /* some objects may have been removed, clear objects arrays so that finalize_os rebuilds them properly */ - hwloc_distances_restrict_os(topology); - } else { - /* if not adapting distances, drop everything */ - hwloc_distances_destroy(topology); - } -} - -/************************************************************** - * Convert user/env given array of indexes into actual objects - */ - -static hwloc_obj_t hwloc_find_obj_by_type_and_os_index(hwloc_obj_t root, hwloc_obj_type_t type, unsigned os_index) -{ - hwloc_obj_t child; - if (root->type == type && root->os_index == os_index) - return root; - child = root->first_child; - while (child) { - hwloc_obj_t found = hwloc_find_obj_by_type_and_os_index(child, type, os_index); - if (found) - return found; - child = child->next_sibling; - } - return NULL; -} - -/* convert distance indexes that were previously stored in the topology - * into actual objects if not done already. - * it's already done when distances come from backends (this function should not be called then). - * it's not done when distances come from the user. 
- * - * returns -1 if the matrix was invalid - */ -static int -hwloc_distances__finalize_os(struct hwloc_topology *topology, struct hwloc_os_distances_s *osdist) -{ - unsigned nbobjs = osdist->nbobjs; - unsigned *indexes = osdist->indexes; - float *distances = osdist->distances; - unsigned i, j; - hwloc_obj_type_t type = osdist->type; - hwloc_obj_t *objs = calloc(nbobjs, sizeof(hwloc_obj_t)); - - assert(!osdist->objs); - - /* traverse the topology and look for the relevant objects */ - for(i=0; ilevels[0][0], type, indexes[i]); - if (!obj) { - - /* shift the matrix */ -#define OLDPOS(i,j) (distances+(i)*nbobjs+(j)) -#define NEWPOS(i,j) (distances+(i)*(nbobjs-1)+(j)) - if (i>0) { - /** no need to move beginning of 0th line */ - for(j=0; jnbobjs = nbobjs; - if (!nbobjs) { - /* the whole matrix was invalid, let the caller remove this distances */ - free(objs); - return -1; - } - - /* setup the objs array */ - osdist->objs = objs; - return 0; -} - - -void hwloc_distances_finalize_os(struct hwloc_topology *topology) -{ - int dropall = !topology->levels[0][0]->cpuset; /* we don't support distances on multinode systems */ - - struct hwloc_os_distances_s *osdist, *next = topology->first_osdist; - while ((osdist = next) != NULL) { - int err; - next = osdist->next; - - if (dropall) - goto drop; - - /* remove final distance matrics AND physically-ordered ones */ - - if (osdist->objs) - /* nothing to do, switch to the next element */ - continue; - - err = hwloc_distances__finalize_os(topology, osdist); - if (!err) - /* convert ok, switch to the next element */ - continue; - - drop: - /* remove this element */ - free(osdist->indexes); - free(osdist->distances); - /* remove current object */ - if (osdist->prev) - osdist->prev->next = next; - else - topology->first_osdist = next; - if (next) - next->prev = osdist->prev; - else - topology->last_osdist = osdist->prev; - /* free current object */ - free(osdist); - } -} - -/*********************************************************** 
- * Convert internal distances given by the backend/env/user - * into exported logical distances attached to objects - */ - -static void -hwloc_distances__finalize_logical(struct hwloc_topology *topology, - unsigned nbobjs, - hwloc_obj_t *objs, float *osmatrix) -{ - unsigned i, j, li, lj, minl; - float min = FLT_MAX, max = FLT_MIN; - hwloc_obj_t root; - float *matrix; - hwloc_cpuset_t cpuset, complete_cpuset; - hwloc_nodeset_t nodeset, complete_nodeset; - unsigned relative_depth; - int idx; - - /* find the root */ - cpuset = hwloc_bitmap_alloc(); - complete_cpuset = hwloc_bitmap_alloc(); - nodeset = hwloc_bitmap_alloc(); - complete_nodeset = hwloc_bitmap_alloc(); - for(i=0; icpuset); - if (objs[i]->complete_cpuset) - hwloc_bitmap_or(complete_cpuset, complete_cpuset, objs[i]->complete_cpuset); - if (objs[i]->nodeset) - hwloc_bitmap_or(nodeset, nodeset, objs[i]->nodeset); - if (objs[i]->complete_nodeset) - hwloc_bitmap_or(complete_nodeset, complete_nodeset, objs[i]->complete_nodeset); - } - /* find the object covering cpuset, we'll take care of the nodeset later */ - root = hwloc_get_obj_covering_cpuset(topology, cpuset); - /* walk up to find a parent that also covers the nodeset */ - while (root && - (!hwloc_bitmap_isincluded(nodeset, root->nodeset) - || !hwloc_bitmap_isincluded(complete_nodeset, root->complete_nodeset) - || !hwloc_bitmap_isincluded(complete_cpuset, root->complete_cpuset))) - root = root->parent; - if (!root) { - /* should not happen, ignore the distance matrix and report an error. 
*/ - if (!hwloc_hide_errors()) { - char *a, *b; - hwloc_bitmap_asprintf(&a, cpuset); - hwloc_bitmap_asprintf(&b, nodeset); - fprintf(stderr, "****************************************************************************\n"); - fprintf(stderr, "* hwloc %s has encountered an error when adding a distance matrix to the topology.\n", HWLOC_VERSION); - fprintf(stderr, "*\n"); - fprintf(stderr, "* hwloc_distances__finalize_logical() could not find any object covering\n"); - fprintf(stderr, "* cpuset %s and nodeset %s\n", a, b); - fprintf(stderr, "*\n"); - fprintf(stderr, "* Please report this error message to the hwloc user's mailing list,\n"); -#ifdef HWLOC_LINUX_SYS - fprintf(stderr, "* along with the output from the hwloc-gather-topology script.\n"); -#else - fprintf(stderr, "* along with any relevant topology information from your platform.\n"); -#endif - fprintf(stderr, "****************************************************************************\n"); - free(a); - free(b); - } - hwloc_bitmap_free(cpuset); - hwloc_bitmap_free(complete_cpuset); - hwloc_bitmap_free(nodeset); - hwloc_bitmap_free(complete_nodeset); - return; - } - /* don't attach to Misc objects */ - while (root->type == HWLOC_OBJ_MISC) - root = root->parent; - /* ideally, root has the exact cpuset and nodeset. 
- * but ignoring or other things that remove objects may cause the object array to reduce */ - assert(hwloc_bitmap_isincluded(cpuset, root->cpuset)); - assert(hwloc_bitmap_isincluded(complete_cpuset, root->complete_cpuset)); - assert(hwloc_bitmap_isincluded(nodeset, root->nodeset)); - assert(hwloc_bitmap_isincluded(complete_nodeset, root->complete_nodeset)); - hwloc_bitmap_free(cpuset); - hwloc_bitmap_free(complete_cpuset); - hwloc_bitmap_free(nodeset); - hwloc_bitmap_free(complete_nodeset); - if (root->depth >= objs[0]->depth) { - /* strange topology led us to find invalid relative depth, ignore */ - return; - } - relative_depth = objs[0]->depth - root->depth; /* this assume that we have distances between objects of the same level */ - - if (nbobjs != hwloc_get_nbobjs_inside_cpuset_by_depth(topology, root->cpuset, root->depth + relative_depth)) - /* the root does not cover the right number of objects, maybe we failed to insert a root (bad intersect or so). */ - return; - - /* get the logical index offset, it's the min of all logical indexes */ - minl = UINT_MAX; - for(i=0; i objs[i]->logical_index) - minl = objs[i]->logical_index; - - /* compute/check min/max values */ - for(i=0; i max) - max = val; - } - if (!min) { - /* Linux up to 2.6.36 reports ACPI SLIT distances, which should be memory latencies. - * Except of SGI IP27 (SGI Origin 200/2000 with MIPS processors) where the distances - * are the number of hops between routers. 
- */ - hwloc_debug("%s", "minimal distance is 0, matrix does not seem to contain latencies, ignoring\n"); - return; - } - - /* store the normalized latency matrix in the root object */ - idx = root->distances_count++; - root->distances = realloc(root->distances, root->distances_count * sizeof(struct hwloc_distances_s *)); - root->distances[idx] = malloc(sizeof(struct hwloc_distances_s)); - root->distances[idx]->relative_depth = relative_depth; - root->distances[idx]->nbobjs = nbobjs; - root->distances[idx]->latency = matrix = malloc(nbobjs*nbobjs*sizeof(float)); - root->distances[idx]->latency_base = (float) min; -#define NORMALIZE_LATENCY(d) ((d)/(min)) - root->distances[idx]->latency_max = NORMALIZE_LATENCY(max); - for(i=0; ilogical_index - minl; - matrix[li*nbobjs+li] = NORMALIZE_LATENCY(osmatrix[i*nbobjs+i]); - for(j=i+1; jlogical_index - minl; - matrix[li*nbobjs+lj] = NORMALIZE_LATENCY(osmatrix[i*nbobjs+j]); - matrix[lj*nbobjs+li] = NORMALIZE_LATENCY(osmatrix[j*nbobjs+i]); - } - } -} - -/* convert internal distances into logically-ordered distances - * that can be exposed in the API - */ -void -hwloc_distances_finalize_logical(struct hwloc_topology *topology) -{ - unsigned nbobjs; - int depth; - struct hwloc_os_distances_s * osdist; - for(osdist = topology->first_osdist; osdist; osdist = osdist->next) { - - nbobjs = osdist->nbobjs; - if (!nbobjs) - continue; - - depth = hwloc_get_type_depth(topology, osdist->type); - if (depth == HWLOC_TYPE_DEPTH_UNKNOWN || depth == HWLOC_TYPE_DEPTH_MULTIPLE) - continue; - - if (osdist->objs) { - assert(osdist->distances); - hwloc_distances__finalize_logical(topology, nbobjs, - osdist->objs, - osdist->distances); - } - } -} - -/*************************************************** - * Destroying logical distances attached to objects - */ - -/* destroy an object distances structure */ -void -hwloc_clear_object_distances_one(struct hwloc_distances_s * distances) -{ - free(distances->latency); - free(distances); - -} - -void 
-hwloc_clear_object_distances(hwloc_obj_t obj) -{ - unsigned i; - for (i=0; idistances_count; i++) - hwloc_clear_object_distances_one(obj->distances[i]); - free(obj->distances); - obj->distances = NULL; - obj->distances_count = 0; -} - -/****************************************** - * Grouping objects according to distances - */ - -static void hwloc_report_user_distance_error(const char *msg, int line) -{ - static int reported = 0; - - if (!reported && !hwloc_hide_errors()) { - fprintf(stderr, "****************************************************************************\n"); - fprintf(stderr, "* hwloc %s has encountered what looks like an error from user-given distances.\n", HWLOC_VERSION); - fprintf(stderr, "*\n"); - fprintf(stderr, "* %s\n", msg); - fprintf(stderr, "* Error occurred in topology.c line %d\n", line); - fprintf(stderr, "*\n"); - fprintf(stderr, "* Please make sure that distances given through the interface or environment\n"); - fprintf(stderr, "* variables do not contradict any other topology information.\n"); - fprintf(stderr, "****************************************************************************\n"); - reported = 1; - } -} - -static int hwloc_compare_distances(float a, float b, float accuracy) -{ - if (accuracy != 0.0 && fabsf(a-b) < a * accuracy) - return 0; - return a < b ? -1 : a == b ? 0 : 1; -} - -/* - * Place objects in groups if they are in a transitive graph of minimal distances. - * Return how many groups were created, or 0 if some incomplete distance graphs were found. 
- */ -static unsigned -hwloc__find_groups_by_min_distance(unsigned nbobjs, - float *_distances, - float accuracy, - unsigned *groupids, - int verbose) -{ - float min_distance = FLT_MAX; - unsigned groupid = 1; - unsigned i,j,k; - unsigned skipped = 0; - -#define DISTANCE(i, j) _distances[(i) * nbobjs + (j)] - - memset(groupids, 0, nbobjs*sizeof(*groupids)); - - /* find the minimal distance */ - for(i=0; itype), accuracies[i]); - if (needcheck && hwloc__check_grouping_matrix(nbobjs, _distances, accuracies[i], verbose) < 0) - continue; - nbgroups = hwloc__find_groups_by_min_distance(nbobjs, _distances, accuracies[i], groupids, verbose); - if (nbgroups) - break; - } - if (!nbgroups) - goto outter_free; - - /* For convenience, put these declarations inside a block. It's a - crying shame we can't use C99 syntax here, and have to do a bunch - of mallocs. :-( */ - { - hwloc_obj_t *groupobjs = NULL; - unsigned *groupsizes = NULL; - float *groupdistances = NULL; - unsigned failed = 0; - - groupobjs = malloc(sizeof(hwloc_obj_t) * nbgroups); - groupsizes = malloc(sizeof(unsigned) * nbgroups); - groupdistances = malloc(sizeof(float) * nbgroups * nbgroups); - if (NULL == groupobjs || NULL == groupsizes || NULL == groupdistances) { - goto inner_free; - } - /* create new Group objects and record their size */ - memset(&(groupsizes[0]), 0, sizeof(groupsizes[0]) * nbgroups); - for(i=0; icpuset = hwloc_bitmap_alloc(); - group_obj->attr->group.depth = topology->next_group_depth; - for (j=0; jcpuset, group_obj->cpuset, objs[j]->cpuset); - if (objs[i]->complete_cpuset) { - if (!group_obj->complete_cpuset) - group_obj->complete_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(group_obj->complete_cpuset, group_obj->complete_cpuset, objs[j]->complete_cpuset); - } - /* if one obj has a nodeset, assemble a group nodeset */ - if (objs[j]->nodeset) { - if (!group_obj->nodeset) - group_obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(group_obj->nodeset, group_obj->nodeset, 
objs[j]->nodeset); - } - if (objs[i]->complete_nodeset) { - if (!group_obj->complete_nodeset) - group_obj->complete_nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(group_obj->complete_nodeset, group_obj->complete_nodeset, objs[j]->complete_nodeset); - } - groupsizes[i]++; - } - hwloc_debug_1arg_bitmap("adding Group object with %u objects and cpuset %s\n", - groupsizes[i], group_obj->cpuset); - res_obj = hwloc__insert_object_by_cpuset(topology, group_obj, - fromuser ? hwloc_report_user_distance_error : hwloc_report_os_error); - /* res_obj may be NULL on failure to insert. */ - if (!res_obj) - failed++; - /* or it may be different from groupobjs if we got groups from XML import before grouping */ - groupobjs[i] = res_obj; - } - - if (failed) - /* don't try to group above if we got a NULL group here, just keep this incomplete level */ - goto inner_free; - - /* factorize distances */ - memset(&(groupdistances[0]), 0, sizeof(groupdistances[0]) * nbgroups * nbgroups); -#undef DISTANCE -#define DISTANCE(i, j) _distances[(i) * nbobjs + (j)] -#define GROUP_DISTANCE(i, j) groupdistances[(i) * nbgroups + (j)] - for(i=0; inext_group_depth++; - hwloc__groups_by_distances(topology, nbgroups, groupobjs, (float*) groupdistances, nbaccuracies, accuracies, fromuser, 0 /* no need to check generated matrix */, verbose); - - inner_free: - /* Safely free everything */ - if (NULL != groupobjs) { - free(groupobjs); - } - if (NULL != groupsizes) { - free(groupsizes); - } - if (NULL != groupdistances) { - free(groupdistances); - } - } - - outter_free: - if (NULL != groupids) { - free(groupids); - } -} - -void -hwloc_group_by_distances(struct hwloc_topology *topology) -{ - unsigned nbobjs; - struct hwloc_os_distances_s * osdist; - const char *env; - float accuracies[5] = { 0.0f, 0.01f, 0.02f, 0.05f, 0.1f }; - unsigned nbaccuracies = 5; - hwloc_obj_t group_obj; - int verbose = 0; - unsigned i; - hwloc_localeswitch_declare; -#ifdef HWLOC_DEBUG - unsigned j; -#endif - - env = 
getenv("HWLOC_GROUPING"); - if (env && !atoi(env)) - return; - /* backward compat with v1.2 */ - if (getenv("HWLOC_IGNORE_DISTANCES")) - return; - - hwloc_localeswitch_init(); - env = getenv("HWLOC_GROUPING_ACCURACY"); - if (!env) { - /* only use 0.0 */ - nbaccuracies = 1; - } else if (strcmp(env, "try")) { - /* use the given value */ - nbaccuracies = 1; - accuracies[0] = (float) atof(env); - } /* otherwise try all values */ - hwloc_localeswitch_fini(); - -#ifdef HWLOC_DEBUG - verbose = 1; -#else - env = getenv("HWLOC_GROUPING_VERBOSE"); - if (env) - verbose = atoi(env); -#endif - - for(osdist = topology->first_osdist; osdist; osdist = osdist->next) { - - nbobjs = osdist->nbobjs; - if (!nbobjs) - continue; - - if (osdist->objs) { - /* if we have objs, we must have distances as well, - * thanks to hwloc_convert_distances_indexes_into_objects() - */ - assert(osdist->distances); - -#ifdef HWLOC_DEBUG - hwloc_debug("%s", "trying to group objects using distance matrix:\n"); - hwloc_debug("%s", " index"); - for(j=0; jobjs[j]->os_index); - hwloc_debug("%s", "\n"); - for(i=0; iobjs[i]->os_index); - for(j=0; jdistances[i*nbobjs + j]); - hwloc_debug("%s", "\n"); - } -#endif - - hwloc__groups_by_distances(topology, nbobjs, - osdist->objs, - osdist->distances, - nbaccuracies, accuracies, - osdist->indexes != NULL, - 1 /* check the first matrice */, - verbose); - - /* add a final group object covering everybody so that the distance matrix can be stored somewhere. 
- * this group will be merged into a regular object if the matrix isn't strangely incomplete - */ - group_obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1); - group_obj->attr->group.depth = (unsigned) -1; - group_obj->cpuset = hwloc_bitmap_alloc(); - for(i=0; icpuset, group_obj->cpuset, osdist->objs[i]->cpuset); - if (osdist->objs[i]->complete_cpuset) { - if (!group_obj->complete_cpuset) - group_obj->complete_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(group_obj->complete_cpuset, group_obj->complete_cpuset, osdist->objs[i]->complete_cpuset); - } - /* if one obj has a nodeset, assemble a group nodeset */ - if (osdist->objs[i]->nodeset) { - if (!group_obj->nodeset) - group_obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(group_obj->nodeset, group_obj->nodeset, osdist->objs[i]->nodeset); - } - if (osdist->objs[i]->complete_nodeset) { - if (!group_obj->complete_nodeset) - group_obj->complete_nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(group_obj->complete_nodeset, group_obj->complete_nodeset, osdist->objs[i]->complete_nodeset); - } - } - hwloc_debug_1arg_bitmap("adding Group object (as root of distance matrix with %u objects) with cpuset %s\n", - nbobjs, group_obj->cpuset); - hwloc__insert_object_by_cpuset(topology, group_obj, - osdist->indexes != NULL ? 
hwloc_report_user_distance_error : hwloc_report_os_error); - } - } -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/hwloc.dtd b/opal/mca/hwloc/hwloc1113/hwloc/src/hwloc.dtd deleted file mode 100644 index 5e494f80a8d..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/hwloc.dtd +++ /dev/null @@ -1,71 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/pci-common.c b/opal/mca/hwloc/hwloc1113/hwloc/src/pci-common.c deleted file mode 100644 index 39df9dc4212..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/pci-common.c +++ /dev/null @@ -1,537 +0,0 @@ -/* - * Copyright © 2009-2016 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include -#include -#include - -#ifdef HWLOC_DEBUG -static void -hwloc_pci_traverse_print_cb(void * cbdata __hwloc_attribute_unused, - struct hwloc_obj *pcidev) -{ - char busid[14]; - hwloc_obj_t parent; - - /* indent */ - parent = pcidev->parent; - while (parent) { - hwloc_debug("%s", " "); - parent = parent->parent; - } - - snprintf(busid, sizeof(busid), "%04x:%02x:%02x.%01x", - pcidev->attr->pcidev.domain, pcidev->attr->pcidev.bus, pcidev->attr->pcidev.dev, pcidev->attr->pcidev.func); - - if (pcidev->type == HWLOC_OBJ_BRIDGE) { - if (pcidev->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_HOST) - hwloc_debug("HostBridge"); - else - hwloc_debug("Bridge [%04x:%04x]", busid, - pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id); - hwloc_debug(" to %04x:[%02x:%02x]\n", - pcidev->attr->bridge.downstream.pci.domain, pcidev->attr->bridge.downstream.pci.secondary_bus, pcidev->attr->bridge.downstream.pci.subordinate_bus); - } else - hwloc_debug("%s Device [%04x:%04x (%04x:%04x) rev=%02x class=%04x]\n", busid, - pcidev->attr->pcidev.vendor_id, pcidev->attr->pcidev.device_id, - pcidev->attr->pcidev.subvendor_id, 
pcidev->attr->pcidev.subdevice_id, - pcidev->attr->pcidev.revision, pcidev->attr->pcidev.class_id); -} -#endif /* HWLOC_DEBUG */ - -static void -hwloc_pci_traverse_lookuposdevices_cb(void * cbdata, - struct hwloc_obj *pcidev) -{ - struct hwloc_backend *backend = cbdata; - - if (pcidev->type == HWLOC_OBJ_BRIDGE) - return; - - hwloc_backends_notify_new_object(backend, pcidev); -} - -static void -hwloc_pci__traverse(void * cbdata, struct hwloc_obj *root, - void (*cb)(void * cbdata, struct hwloc_obj *)) -{ - struct hwloc_obj *child = root->first_child; - while (child) { - cb(cbdata, child); - if (child->type == HWLOC_OBJ_BRIDGE) - hwloc_pci__traverse(cbdata, child, cb); - child = child->next_sibling; - } -} - -static void -hwloc_pci_traverse(void * cbdata, struct hwloc_obj *root, - void (*cb)(void * cbdata, struct hwloc_obj *)) -{ - hwloc_pci__traverse(cbdata, root, cb); -} - -enum hwloc_pci_busid_comparison_e { - HWLOC_PCI_BUSID_LOWER, - HWLOC_PCI_BUSID_HIGHER, - HWLOC_PCI_BUSID_INCLUDED, - HWLOC_PCI_BUSID_SUPERSET -}; - -static enum hwloc_pci_busid_comparison_e -hwloc_pci_compare_busids(struct hwloc_obj *a, struct hwloc_obj *b) -{ - if (a->type == HWLOC_OBJ_BRIDGE) - assert(a->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI); - if (b->type == HWLOC_OBJ_BRIDGE) - assert(b->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI); - - if (a->attr->pcidev.domain < b->attr->pcidev.domain) - return HWLOC_PCI_BUSID_LOWER; - if (a->attr->pcidev.domain > b->attr->pcidev.domain) - return HWLOC_PCI_BUSID_HIGHER; - - if (a->type == HWLOC_OBJ_BRIDGE - && b->attr->pcidev.bus >= a->attr->bridge.downstream.pci.secondary_bus - && b->attr->pcidev.bus <= a->attr->bridge.downstream.pci.subordinate_bus) - return HWLOC_PCI_BUSID_SUPERSET; - if (b->type == HWLOC_OBJ_BRIDGE - && a->attr->pcidev.bus >= b->attr->bridge.downstream.pci.secondary_bus - && a->attr->pcidev.bus <= b->attr->bridge.downstream.pci.subordinate_bus) - return HWLOC_PCI_BUSID_INCLUDED; - - if (a->attr->pcidev.bus < 
b->attr->pcidev.bus) - return HWLOC_PCI_BUSID_LOWER; - if (a->attr->pcidev.bus > b->attr->pcidev.bus) - return HWLOC_PCI_BUSID_HIGHER; - - if (a->attr->pcidev.dev < b->attr->pcidev.dev) - return HWLOC_PCI_BUSID_LOWER; - if (a->attr->pcidev.dev > b->attr->pcidev.dev) - return HWLOC_PCI_BUSID_HIGHER; - - if (a->attr->pcidev.func < b->attr->pcidev.func) - return HWLOC_PCI_BUSID_LOWER; - if (a->attr->pcidev.func > b->attr->pcidev.func) - return HWLOC_PCI_BUSID_HIGHER; - - /* Should never reach here. Abort on both debug builds and - non-debug builds */ - assert(0); - fprintf(stderr, "Bad assertion in hwloc %s:%d (aborting)\n", __FILE__, __LINE__); - exit(1); -} - -static void -hwloc_pci_add_child_before(struct hwloc_obj *root, struct hwloc_obj *child, struct hwloc_obj *new) -{ - if (child) { - new->prev_sibling = child->prev_sibling; - child->prev_sibling = new; - } else { - new->prev_sibling = root->last_child; - root->last_child = new; - } - - if (new->prev_sibling) - new->prev_sibling->next_sibling = new; - else - root->first_child = new; - new->next_sibling = child; - - new->parent = root; /* so that hwloc_pci_traverse_print_cb() can indent by depth */ -} - -static void -hwloc_pci_remove_child(struct hwloc_obj *root, struct hwloc_obj *child) -{ - if (child->next_sibling) - child->next_sibling->prev_sibling = child->prev_sibling; - else - root->last_child = child->prev_sibling; - if (child->prev_sibling) - child->prev_sibling->next_sibling = child->next_sibling; - else - root->first_child = child->next_sibling; - child->prev_sibling = NULL; - child->next_sibling = NULL; -} - -static void hwloc_pci_add_object(struct hwloc_obj *root, struct hwloc_obj *new); - -static void -hwloc_pci_try_insert_siblings_below_new_bridge(struct hwloc_obj *root, struct hwloc_obj *new) -{ - enum hwloc_pci_busid_comparison_e comp; - struct hwloc_obj *current, *next; - - next = new->next_sibling; - while (next) { - current = next; - next = current->next_sibling; - - comp = 
hwloc_pci_compare_busids(current, new); - assert(comp != HWLOC_PCI_BUSID_SUPERSET); - if (comp == HWLOC_PCI_BUSID_HIGHER) - continue; - assert(comp == HWLOC_PCI_BUSID_INCLUDED); - - /* move this object below the new bridge */ - hwloc_pci_remove_child(root, current); - hwloc_pci_add_object(new, current); - } -} - -static void -hwloc_pci_add_object(struct hwloc_obj *root, struct hwloc_obj *new) -{ - struct hwloc_obj *current; - - current = root->first_child; - while (current) { - enum hwloc_pci_busid_comparison_e comp = hwloc_pci_compare_busids(new, current); - switch (comp) { - case HWLOC_PCI_BUSID_HIGHER: - /* go further */ - current = current->next_sibling; - continue; - case HWLOC_PCI_BUSID_INCLUDED: - /* insert below current bridge */ - hwloc_pci_add_object(current, new); - return; - case HWLOC_PCI_BUSID_LOWER: - case HWLOC_PCI_BUSID_SUPERSET: - /* insert before current object */ - hwloc_pci_add_child_before(root, current, new); - /* walk next siblings and move them below new bridge if needed */ - hwloc_pci_try_insert_siblings_below_new_bridge(root, new); - return; - } - } - /* add to the end of the list if higher than everybody */ - hwloc_pci_add_child_before(root, NULL, new); -} - -static struct hwloc_obj * -hwloc_pci_fixup_hostbridge_parent(struct hwloc_topology *topology __hwloc_attribute_unused, - struct hwloc_obj *hostbridge, - struct hwloc_obj *parent) -{ - /* Xeon E5v3 in cluster-on-die mode only have PCI on the first NUMA node of each package. - * but many dual-processor host report the second PCI hierarchy on 2nd NUMA of first package. 
- */ - if (parent->depth >= 2 - && parent->type == HWLOC_OBJ_NUMANODE - && parent->sibling_rank == 1 && parent->parent->arity == 2 - && parent->parent->type == HWLOC_OBJ_PACKAGE - && parent->parent->sibling_rank == 0 && parent->parent->parent->arity == 2) { - const char *cpumodel = hwloc_obj_get_info_by_name(parent->parent, "CPUModel"); - if (cpumodel && strstr(cpumodel, "Xeon")) { - if (!hwloc_hide_errors()) { - fprintf(stderr, "****************************************************************************\n"); - fprintf(stderr, "* hwloc %s has encountered an incorrect PCI locality information.\n", HWLOC_VERSION); - fprintf(stderr, "* PCI bus %04x:%02x is supposedly close to 2nd NUMA node of 1st package,\n", - hostbridge->first_child->attr->pcidev.domain, hostbridge->first_child->attr->pcidev.bus); - fprintf(stderr, "* however hwloc believes this is impossible on this architecture.\n"); - fprintf(stderr, "* Therefore the PCI bus will be moved to 1st NUMA node of 2nd package.\n"); - fprintf(stderr, "*\n"); - fprintf(stderr, "* If you feel this fixup is wrong, disable it by setting in your environment\n"); - fprintf(stderr, "* HWLOC_PCI_%04x_%02x_LOCALCPUS= (empty value), and report the problem\n", - hostbridge->first_child->attr->pcidev.domain, hostbridge->first_child->attr->pcidev.bus); - fprintf(stderr, "* to the hwloc's user mailing list together with the XML output of lstopo.\n"); - fprintf(stderr, "*\n"); - fprintf(stderr, "* You may silence this message by setting HWLOC_HIDE_ERRORS=1 in your environment.\n"); - fprintf(stderr, "****************************************************************************\n"); - } - return parent->parent->next_sibling->first_child; - } - } - - return parent; -} - -static struct hwloc_obj * -hwloc_pci_find_hostbridge_parent(struct hwloc_topology *topology, struct hwloc_backend *backend, - struct hwloc_obj *hostbridge) -{ - hwloc_bitmap_t cpuset = hwloc_bitmap_alloc(); - struct hwloc_obj *parent; - const char *env; - int err; - - 
/* override the cpuset with the environment if given */ - int forced = 0; - char envname[256]; - snprintf(envname, sizeof(envname), "HWLOC_PCI_%04x_%02x_LOCALCPUS", - hostbridge->first_child->attr->pcidev.domain, hostbridge->first_child->attr->pcidev.bus); - env = getenv(envname); - if (env) - /* if env exists but is empty, don't let quirks change what the OS reports */ - forced = 1; - if (env && *env) { - /* force the hostbridge cpuset */ - hwloc_debug("Overriding localcpus using %s in the environment\n", envname); - hwloc_bitmap_sscanf(cpuset, env); - } else { - /* get the hostbridge cpuset by acking the OS backend. - * it's not a PCI device, so we use its first child locality info. - */ - err = hwloc_backends_get_obj_cpuset(backend, hostbridge->first_child, cpuset); - if (err < 0) - /* if we got nothing, assume the hostbridge is attached to the top of hierarchy */ - hwloc_bitmap_copy(cpuset, hwloc_topology_get_topology_cpuset(topology)); - } - - hwloc_debug_bitmap("Attaching hostbridge to cpuset %s\n", cpuset); - - /* restrict to the existing topology cpuset to avoid errors later */ - hwloc_bitmap_and(cpuset, cpuset, hwloc_topology_get_topology_cpuset(topology)); - - /* if the remaining cpuset is empty, take the root */ - if (hwloc_bitmap_iszero(cpuset)) - hwloc_bitmap_copy(cpuset, hwloc_topology_get_topology_cpuset(topology)); - - /* attach the hostbridge now that it contains the right objects */ - parent = hwloc_get_obj_covering_cpuset(topology, cpuset); - /* in the worst case, we got the root object */ - - if (hwloc_bitmap_isequal(cpuset, parent->cpuset)) { - /* this object has the right cpuset, but it could be a cache or so, - * go up as long as the cpuset is the same - */ - while (parent->parent && hwloc_bitmap_isequal(parent->cpuset, parent->parent->cpuset)) - parent = parent->parent; - - if (!forced) - parent = hwloc_pci_fixup_hostbridge_parent(topology, hostbridge, parent); - - } else { - /* the object we found is too large, insert an intermediate group 
*/ - hwloc_obj_t group_obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1); - if (group_obj) { - group_obj->cpuset = hwloc_bitmap_dup(cpuset); - group_obj->complete_cpuset = hwloc_bitmap_dup(cpuset); - group_obj->attr->group.depth = (unsigned) -1; - parent = hwloc__insert_object_by_cpuset(topology, group_obj, hwloc_report_os_error); - if (parent == group_obj) - /* if didn't get merged, setup its sets */ - hwloc_fill_object_sets(group_obj); - if (!parent) - /* Failed to insert the parent, maybe a conflicting cpuset, attach to the root object instead */ - parent = hwloc_get_root_obj(topology); - } - } - - hwloc_bitmap_free(cpuset); - - return parent; -} - -int -hwloc_insert_pci_device_list(struct hwloc_backend *backend, - struct hwloc_obj *first_obj) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_obj fakeparent; - struct hwloc_obj *obj; - unsigned current_hostbridge; - - if (!first_obj) - /* found nothing, exit */ - return 0; - - /* first, organise object as tree under a fake parent object */ - fakeparent.parent = NULL; - fakeparent.first_child = NULL; - fakeparent.last_child = NULL; - while (first_obj) { - obj = first_obj; - first_obj = obj->next_sibling; - hwloc_pci_add_object(&fakeparent, obj); - } - -#ifdef HWLOC_DEBUG - hwloc_debug("%s", "\nPCI hierarchy under fake parent:\n"); - hwloc_pci_traverse(NULL, &fakeparent, hwloc_pci_traverse_print_cb); - hwloc_debug("%s", "\n"); -#endif - - /* walk the hierarchy, and lookup OS devices */ - hwloc_pci_traverse(backend, &fakeparent, hwloc_pci_traverse_lookuposdevices_cb); - - /* - * fakeparent lists all objects connected to any upstream bus in the machine. - * We now create one real hostbridge object per upstream bus. - * It's not actually a PCI device so we have to create it. 
- */ - current_hostbridge = 0; - while (fakeparent.first_child) { - /* start a new host bridge */ - struct hwloc_obj *hostbridge = hwloc_alloc_setup_object(HWLOC_OBJ_BRIDGE, current_hostbridge++); - struct hwloc_obj *child = fakeparent.first_child; - struct hwloc_obj *next_child; - struct hwloc_obj *parent; - unsigned short current_domain = child->attr->pcidev.domain; - unsigned char current_bus = child->attr->pcidev.bus; - unsigned char current_subordinate = current_bus; - - hwloc_debug("Starting new PCI hostbridge %04x:%02x\n", current_domain, current_bus); - - /* - * attach all objects from the same upstream domain/bus - */ - next_child: - next_child = child->next_sibling; - hwloc_pci_remove_child(&fakeparent, child); - hwloc_pci_add_child_before(hostbridge, NULL, child); - - /* compute hostbridge secondary/subordinate buses */ - if (child->type == HWLOC_OBJ_BRIDGE - && child->attr->bridge.downstream.pci.subordinate_bus > current_subordinate) - current_subordinate = child->attr->bridge.downstream.pci.subordinate_bus; - - /* use next child if it has the same domains/bus */ - child = next_child; - if (child - && child->attr->pcidev.domain == current_domain - && child->attr->pcidev.bus == current_bus) - goto next_child; - - /* finish setting up this hostbridge */ - hostbridge->attr->bridge.upstream_type = HWLOC_OBJ_BRIDGE_HOST; - hostbridge->attr->bridge.downstream_type = HWLOC_OBJ_BRIDGE_PCI; - hostbridge->attr->bridge.downstream.pci.domain = current_domain; - hostbridge->attr->bridge.downstream.pci.secondary_bus = current_bus; - hostbridge->attr->bridge.downstream.pci.subordinate_bus = current_subordinate; - hwloc_debug("New PCI hostbridge %04x:[%02x-%02x]\n", - current_domain, current_bus, current_subordinate); - - /* attach the hostbridge where it belongs */ - parent = hwloc_pci_find_hostbridge_parent(topology, backend, hostbridge); - hwloc_insert_object_by_parent(topology, parent, hostbridge); - } - - return 1; -} - -#define HWLOC_PCI_STATUS 0x06 -#define 
HWLOC_PCI_STATUS_CAP_LIST 0x10 -#define HWLOC_PCI_CAPABILITY_LIST 0x34 -#define HWLOC_PCI_CAP_LIST_ID 0 -#define HWLOC_PCI_CAP_LIST_NEXT 1 - -unsigned -hwloc_pci_find_cap(const unsigned char *config, unsigned cap) -{ - unsigned char seen[256] = { 0 }; - unsigned char ptr; /* unsigned char to make sure we stay within the 256-byte config space */ - - if (!(config[HWLOC_PCI_STATUS] & HWLOC_PCI_STATUS_CAP_LIST)) - return 0; - - for (ptr = config[HWLOC_PCI_CAPABILITY_LIST] & ~3; - ptr; /* exit if next is 0 */ - ptr = config[ptr + HWLOC_PCI_CAP_LIST_NEXT] & ~3) { - unsigned char id; - - /* Looped around! */ - if (seen[ptr]) - break; - seen[ptr] = 1; - - id = config[ptr + HWLOC_PCI_CAP_LIST_ID]; - if (id == cap) - return ptr; - if (id == 0xff) /* exit if id is 0 or 0xff */ - break; - } - return 0; -} - -#define HWLOC_PCI_EXP_LNKSTA 0x12 -#define HWLOC_PCI_EXP_LNKSTA_SPEED 0x000f -#define HWLOC_PCI_EXP_LNKSTA_WIDTH 0x03f0 - -int -hwloc_pci_find_linkspeed(const unsigned char *config, - unsigned offset, float *linkspeed) -{ - unsigned linksta, speed, width; - float lanespeed; - - memcpy(&linksta, &config[offset + HWLOC_PCI_EXP_LNKSTA], 4); - speed = linksta & HWLOC_PCI_EXP_LNKSTA_SPEED; /* PCIe generation */ - width = (linksta & HWLOC_PCI_EXP_LNKSTA_WIDTH) >> 4; /* how many lanes */ - /* PCIe Gen1 = 2.5GT/s signal-rate per lane with 8/10 encoding = 0.25GB/s data-rate per lane - * PCIe Gen2 = 5 GT/s signal-rate per lane with 8/10 encoding = 0.5 GB/s data-rate per lane - * PCIe Gen3 = 8 GT/s signal-rate per lane with 128/130 encoding = 1 GB/s data-rate per lane - */ - lanespeed = speed <= 2 ? 
2.5f * speed * 0.8f : 8.0f * 128/130; /* Gbit/s per lane */ - *linkspeed = lanespeed * width / 8; /* GB/s */ - return 0; -} - -#define HWLOC_PCI_HEADER_TYPE 0x0e -#define HWLOC_PCI_HEADER_TYPE_BRIDGE 1 -#define HWLOC_PCI_CLASS_BRIDGE_PCI 0x0604 -#define HWLOC_PCI_PRIMARY_BUS 0x18 -#define HWLOC_PCI_SECONDARY_BUS 0x19 -#define HWLOC_PCI_SUBORDINATE_BUS 0x1a - -int -hwloc_pci_prepare_bridge(hwloc_obj_t obj, - const unsigned char *config) -{ - unsigned char headertype; - unsigned isbridge; - struct hwloc_pcidev_attr_s *pattr = &obj->attr->pcidev; - struct hwloc_bridge_attr_s *battr; - - headertype = config[HWLOC_PCI_HEADER_TYPE] & 0x7f; - isbridge = (pattr->class_id == HWLOC_PCI_CLASS_BRIDGE_PCI - && headertype == HWLOC_PCI_HEADER_TYPE_BRIDGE); - - if (!isbridge) - return 0; - - battr = &obj->attr->bridge; - - if (config[HWLOC_PCI_PRIMARY_BUS] != pattr->bus) { - /* Sometimes the config space contains 00 instead of the actual primary bus number. - * Always trust the bus ID because it was built by the system which has more information - * to workaround such problems (e.g. ACPI information about PCI parent/children). - */ - hwloc_debug(" %04x:%02x:%02x.%01x bridge with (ignored) invalid PCI_PRIMARY_BUS %02x\n", - pattr->domain, pattr->bus, pattr->dev, pattr->func, config[HWLOC_PCI_PRIMARY_BUS]); - } - - obj->type = HWLOC_OBJ_BRIDGE; - battr->upstream_type = HWLOC_OBJ_BRIDGE_PCI; - battr->downstream_type = HWLOC_OBJ_BRIDGE_PCI; - battr->downstream.pci.domain = pattr->domain; - battr->downstream.pci.secondary_bus = config[HWLOC_PCI_SECONDARY_BUS]; - battr->downstream.pci.subordinate_bus = config[HWLOC_PCI_SUBORDINATE_BUS]; - - if (battr->downstream.pci.secondary_bus <= pattr->bus - || battr->downstream.pci.subordinate_bus <= pattr->bus - || battr->downstream.pci.secondary_bus > battr->downstream.pci.subordinate_bus) { - /* This should catch most cases of invalid bridge information - * (e.g. 00 for secondary and subordinate). 
- * Ideally we would also check that [secondary-subordinate] is included - * in the parent bridge [secondary+1:subordinate]. But that's hard to do - * because objects may be discovered out of order (especially in the fsroot case). - */ - hwloc_debug(" %04x:%02x:%02x.%01x bridge has invalid secondary-subordinate buses [%02x-%02x]\n", - pattr->domain, pattr->bus, pattr->dev, pattr->func, - battr->downstream.pci.secondary_bus, battr->downstream.pci.subordinate_bus); - hwloc_free_unlinked_object(obj); - return -1; - } - - return 0; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-bgq.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-bgq.c deleted file mode 100644 index f3aec626074..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-bgq.c +++ /dev/null @@ -1,245 +0,0 @@ -/* - * Copyright © 2013-2015 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -#include - -#include -#include -#include - -#include -#include -#include -#include -#include - -#ifndef HWLOC_DISABLE_BGQ_PORT_TEST - -static int -hwloc_look_bgq(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - unsigned i; - const char *env; - - if (!topology->levels[0][0]->cpuset) { - /* Nobody created objects yet, setup everything */ - hwloc_bitmap_t set; - hwloc_obj_t obj; - -#define HWLOC_BGQ_CORES 17 /* spare core ignored for now */ - - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - /* mark the 17th core (OS-reserved) as disallowed */ - hwloc_bitmap_clr_range(topology->levels[0][0]->allowed_cpuset, (HWLOC_BGQ_CORES-1)*4, HWLOC_BGQ_CORES*4-1); - - env = getenv("BG_THREADMODEL"); - if (!env || atoi(env) != 2) { - /* process cannot use cores/threads outside of its Kernel_ThreadMask() */ - uint64_t bgmask = Kernel_ThreadMask(Kernel_MyTcoord()); - /* the mask is reversed, manually reverse it */ - for(i=0; i<64; i++) - if (((bgmask >> i) & 1) == 0) - hwloc_bitmap_clr(topology->levels[0][0]->allowed_cpuset, 63-i); - } - - /* a single 
memory bank */ - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set(set, 0); - topology->levels[0][0]->nodeset = set; - topology->levels[0][0]->memory.local_memory = 16ULL*1024*1024*1024ULL; - - /* package */ - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, 0); - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(set, 0, HWLOC_BGQ_CORES*4-1); - obj->cpuset = set; - hwloc_obj_add_info(obj, "CPUModel", "IBM PowerPC A2"); - hwloc_insert_object_by_cpuset(topology, obj); - - /* shared L2 */ - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - obj->attr->cache.depth = 2; - obj->attr->cache.size = 32*1024*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 16; - hwloc_insert_object_by_cpuset(topology, obj); - - /* Cores */ - for(i=0; icpuset = set; - hwloc_insert_object_by_cpuset(topology, obj); - /* L1d */ - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 16*1024; - obj->attr->cache.linesize = 64; - obj->attr->cache.associativity = 8; - hwloc_insert_object_by_cpuset(topology, obj); - /* L1i */ - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 16*1024; - obj->attr->cache.linesize = 64; - obj->attr->cache.associativity = 4; - hwloc_insert_object_by_cpuset(topology, obj); - /* there's also a L1p "prefetch cache" of 4kB with 128B lines */ - } - - /* PUs */ - hwloc_setup_pu_level(topology, HWLOC_BGQ_CORES*4); - } - - /* Add BGQ specific information */ - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "BGQ"); - if (topology->is_thissystem) - hwloc_add_uname_info(topology, NULL); - return 1; -} - -static int -hwloc_bgq_get_thread_cpubind(hwloc_topology_t 
topology, pthread_t thread, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - unsigned pu; - cpu_set_t bg_set; - int err; - - if (topology->pid) { - errno = ENOSYS; - return -1; - } - err = pthread_getaffinity_np(thread, sizeof(bg_set), &bg_set); - if (err) { - errno = err; - return -1; - } - for(pu=0; pu<64; pu++) - if (CPU_ISSET(pu, &bg_set)) { - /* the binding cannot contain multiple PUs */ - hwloc_bitmap_only(hwloc_set, pu); - break; - } - return 0; -} - -static int -hwloc_bgq_get_thisthread_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - if (topology->pid) { - errno = ENOSYS; - return -1; - } - hwloc_bitmap_only(hwloc_set, Kernel_ProcessorID()); - return 0; -} - -static int -hwloc_bgq_set_thread_cpubind(hwloc_topology_t topology, pthread_t thread, hwloc_const_bitmap_t hwloc_set, int flags) -{ - unsigned pu; - cpu_set_t bg_set; - int err; - - if (topology->pid) { - errno = ENOSYS; - return -1; - } - /* the binding cannot contain multiple PUs. - * keep the first PU only, and error out if STRICT. 
- */ - if (hwloc_bitmap_weight(hwloc_set) != 1) { - if ((flags & HWLOC_CPUBIND_STRICT)) { - errno = ENOSYS; - return -1; - } - } - pu = hwloc_bitmap_first(hwloc_set); - CPU_ZERO(&bg_set); - CPU_SET(pu, &bg_set); - err = pthread_setaffinity_np(thread, sizeof(bg_set), &bg_set); - if (err) { - errno = err; - return -1; - } - return 0; -} - -static int -hwloc_bgq_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_bgq_set_thread_cpubind(topology, pthread_self(), hwloc_set, flags); -} - -void -hwloc_set_bgq_hooks(struct hwloc_binding_hooks *hooks __hwloc_attribute_unused, - struct hwloc_topology_support *support __hwloc_attribute_unused) -{ - hooks->set_thisthread_cpubind = hwloc_bgq_set_thisthread_cpubind; - hooks->set_thread_cpubind = hwloc_bgq_set_thread_cpubind; - hooks->get_thisthread_cpubind = hwloc_bgq_get_thisthread_cpubind; - hooks->get_thread_cpubind = hwloc_bgq_get_thread_cpubind; - /* threads cannot be bound to more than one PU, so get_last_cpu_location == get_cpubind */ - hooks->get_thisthread_last_cpu_location = hwloc_bgq_get_thisthread_cpubind; - /* hooks->get_thread_last_cpu_location = hwloc_bgq_get_thread_cpubind; */ -} - -static struct hwloc_backend * -hwloc_bgq_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct utsname utsname; - struct hwloc_backend *backend; - const char *env; - int err; - - env = getenv("HWLOC_FORCE_BGQ"); - if (!env || !atoi(env)) { - err = uname(&utsname); - if (err || strcmp(utsname.sysname, "CNK") || strcmp(utsname.machine, "BGQ")) { - fprintf(stderr, "*** Found unexpected uname sysname `%s' machine `%s'\n", utsname.sysname, utsname.machine); - fprintf(stderr, "*** The BGQ backend is only enabled on compute nodes by default (sysname=CNK machine=BGQ)\n"); - fprintf(stderr, "*** Set HWLOC_FORCE_BGQ=1 in the 
environment to enforce the BGQ backend anyway.\n"); - return NULL; - } - } - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - backend->discover = hwloc_look_bgq; - return backend; -} - -static struct hwloc_disc_component hwloc_bgq_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - "bgq", - ~0, - hwloc_bgq_component_instantiate, - 50, - NULL -}; - -const struct hwloc_component hwloc_bgq_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_bgq_disc_component -}; - -#endif /* !HWLOC_DISABLE_BGQ_PORT_TEST */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-cuda.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-cuda.c deleted file mode 100644 index 103319f523f..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-cuda.c +++ /dev/null @@ -1,250 +0,0 @@ -/* - * Copyright © 2011 Université Bordeaux - * Copyright © 2012-2014 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include - -/* private headers allowed for convenience because this plugin is built within hwloc */ -#include -#include - -#include - -struct hwloc_cuda_backend_data_s { - unsigned nr_devices; /* -1 when unknown yet, first callback will setup */ - struct hwloc_cuda_device_info_s { - int idx; - unsigned pcidomain, pcibus, pcidev, pcifunc; - } * devices; -}; - -/* query all PCI bus ids for later */ -static void -hwloc_cuda_query_devices(struct hwloc_cuda_backend_data_s *data) -{ - cudaError_t cures; - int nb, i; - - /* mark the number of devices as 0 in case we fail below, - * so that we don't try again later. 
- */ - data->nr_devices = 0; - - cures = cudaGetDeviceCount(&nb); - if (cures) - return; - - /* allocate structs */ - data->devices = malloc(nb * sizeof(*data->devices)); - if (!data->devices) - return; - - for (i = 0; i < nb; i++) { - struct hwloc_cuda_device_info_s *info = &data->devices[data->nr_devices]; - int domain, bus, dev; - - if (hwloc_cudart_get_device_pci_ids(NULL /* topology unused */, i, &domain, &bus, &dev)) - continue; - - info->idx = i; - info->pcidomain = (unsigned) domain; - info->pcibus = (unsigned) bus; - info->pcidev = (unsigned) dev; - info->pcifunc = 0; - - /* validate this device */ - data->nr_devices++; - } - - return; -} - -static unsigned hwloc_cuda_cores_per_MP(int major, int minor) -{ - /* based on CUDA C Programming Guide, Annex G */ - switch (major) { - case 1: - switch (minor) { - case 0: - case 1: - case 2: - case 3: return 8; - } - break; - case 2: - switch (minor) { - case 0: return 32; - case 1: return 48; - } - break; - case 3: - return 192; - case 5: - return 128; - case 6: - return 64; - } - hwloc_debug("unknown compute capability %u.%u, disabling core display.\n", major, minor); - return 0; -} - -static int -hwloc_cuda_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused, - struct hwloc_obj *pcidev) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_cuda_backend_data_s *data = backend->private_data; - unsigned i; - - if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) - return 0; - - if (!hwloc_topology_is_thissystem(topology)) { - hwloc_debug("%s", "\nno CUDA detection (not thissystem)\n"); - return 0; - } - - if (HWLOC_OBJ_PCI_DEVICE != pcidev->type) - return 0; - - if (data->nr_devices == (unsigned) -1) { - /* first call, lookup all devices */ - hwloc_cuda_query_devices(data); - /* if it fails, data->nr_devices = 0 so we won't do anything below and in next callbacks */ - } - - if 
(!data->nr_devices) - /* found no devices */ - return 0; - - for(i=0; inr_devices; i++) { - struct hwloc_cuda_device_info_s *info = &data->devices[i]; - char cuda_name[32]; - char number[32]; - struct cudaDeviceProp prop; - hwloc_obj_t cuda_device; - cudaError_t cures; - unsigned cores; - - if (info->pcidomain != pcidev->attr->pcidev.domain) - continue; - if (info->pcibus != pcidev->attr->pcidev.bus) - continue; - if (info->pcidev != pcidev->attr->pcidev.dev) - continue; - if (info->pcifunc != pcidev->attr->pcidev.func) - continue; - - cuda_device = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1); - snprintf(cuda_name, sizeof(cuda_name), "cuda%d", info->idx); - cuda_device->name = strdup(cuda_name); - cuda_device->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; - cuda_device->attr->osdev.type = HWLOC_OBJ_OSDEV_COPROC; - - hwloc_obj_add_info(cuda_device, "CoProcType", "CUDA"); - hwloc_obj_add_info(cuda_device, "Backend", "CUDA"); - hwloc_obj_add_info(cuda_device, "GPUVendor", "NVIDIA Corporation"); - - cures = cudaGetDeviceProperties(&prop, info->idx); - if (!cures) - hwloc_obj_add_info(cuda_device, "GPUModel", prop.name); - - snprintf(number, sizeof(number), "%llu", ((unsigned long long) prop.totalGlobalMem) >> 10); - hwloc_obj_add_info(cuda_device, "CUDAGlobalMemorySize", number); - - snprintf(number, sizeof(number), "%llu", ((unsigned long long) prop.l2CacheSize) >> 10); - hwloc_obj_add_info(cuda_device, "CUDAL2CacheSize", number); - - snprintf(number, sizeof(number), "%d", prop.multiProcessorCount); - hwloc_obj_add_info(cuda_device, "CUDAMultiProcessors", number); - - cores = hwloc_cuda_cores_per_MP(prop.major, prop.minor); - if (cores) { - snprintf(number, sizeof(number), "%u", cores); - hwloc_obj_add_info(cuda_device, "CUDACoresPerMP", number); - } - - snprintf(number, sizeof(number), "%llu", ((unsigned long long) prop.sharedMemPerBlock) >> 10); - hwloc_obj_add_info(cuda_device, "CUDASharedMemorySizePerMP", number); - - hwloc_insert_object_by_parent(topology, 
pcidev, cuda_device); - return 1; - } - - return 0; -} - -static void -hwloc_cuda_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_cuda_backend_data_s *data = backend->private_data; - free(data->devices); - free(data); -} - -static struct hwloc_backend * -hwloc_cuda_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_cuda_backend_data_s *data; - - /* thissystem may not be fully initialized yet, we'll check flags in discover() */ - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - - data = malloc(sizeof(*data)); - if (!data) { - free(backend); - return NULL; - } - /* the first callback will initialize those */ - data->nr_devices = (unsigned) -1; /* unknown yet */ - data->devices = NULL; - - backend->private_data = data; - backend->disable = hwloc_cuda_backend_disable; - - backend->notify_new_object = hwloc_cuda_backend_notify_new_object; - return backend; -} - -static struct hwloc_disc_component hwloc_cuda_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_MISC, - "cuda", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_cuda_component_instantiate, - 10, /* after pci */ - NULL -}; - -static int -hwloc_cuda_component_init(unsigned long flags) -{ - if (flags) - return -1; - if (hwloc_plugin_check_namespace("cuda", "hwloc_backend_alloc") < 0) - return -1; - return 0; -} - -#ifdef HWLOC_INSIDE_PLUGIN -HWLOC_DECLSPEC extern const struct hwloc_component hwloc_cuda_component; -#endif - -const struct hwloc_component hwloc_cuda_component = { - HWLOC_COMPONENT_ABI, - hwloc_cuda_component_init, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_cuda_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-custom.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-custom.c deleted file mode 100644 index c3ccfaadc9c..00000000000 --- 
a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-custom.c +++ /dev/null @@ -1,100 +0,0 @@ -/* - * Copyright © 2011-2014 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include - -hwloc_obj_t -hwloc_custom_insert_group_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, int groupdepth) -{ - hwloc_obj_t obj; - - /* must be called between set_custom() and load(), so there's a single backend, the custom one */ - if (topology->is_loaded || !topology->backends || !topology->backends->is_custom) { - errno = EINVAL; - return NULL; - } - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, -1); - obj->attr->group.depth = groupdepth; - hwloc_obj_add_info(obj, "Backend", "Custom"); - hwloc_insert_object_by_parent(topology, parent, obj); - /* insert_object_by_parent() doesn't merge during insert, so obj is still valid */ - - return obj; -} - -int -hwloc_custom_insert_topology(struct hwloc_topology *newtopology, - struct hwloc_obj *newparent, - struct hwloc_topology *oldtopology, - struct hwloc_obj *oldroot) -{ - /* must be called between set_custom() and load(), so there's a single backend, the custom one */ - if (newtopology->is_loaded || !newtopology->backends || !newtopology->backends->is_custom) { - errno = EINVAL; - return -1; - } - - if (!oldtopology->is_loaded) { - errno = EINVAL; - return -1; - } - - hwloc__duplicate_objects(newtopology, newparent, oldroot ? 
oldroot : oldtopology->levels[0][0]); - return 0; -} - -static int -hwloc_look_custom(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - hwloc_obj_t root = topology->levels[0][0]; - - assert(!root->cpuset); - - if (!root->first_child) { - errno = EINVAL; - return -1; - } - - root->type = HWLOC_OBJ_SYSTEM; - hwloc_obj_add_info(root, "Backend", "Custom"); - return 1; -} - -static struct hwloc_backend * -hwloc_custom_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - backend->discover = hwloc_look_custom; - backend->is_custom = 1; - backend->is_thissystem = 0; - return backend; -} - -static struct hwloc_disc_component hwloc_custom_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - "custom", - ~0, - hwloc_custom_component_instantiate, - 30, - NULL -}; - -const struct hwloc_component hwloc_custom_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_custom_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-gl.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-gl.c deleted file mode 100644 index 45e9e2dff66..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-gl.c +++ /dev/null @@ -1,271 +0,0 @@ -/* - * Copyright © 2012-2013 Blue Brain Project, BBP/EPFL. All rights reserved. - * Copyright © 2012-2014 Inria. All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include -#include -#include - -/* private headers allowed for convenience because this plugin is built within hwloc */ -#include -#include - -#include -#include -#include -#include -#include - -#define HWLOC_GL_SERVER_MAX 10 -#define HWLOC_GL_SCREEN_MAX 10 -struct hwloc_gl_backend_data_s { - unsigned nr_display; - struct hwloc_gl_display_info_s { - char name[10]; - unsigned port, device; - unsigned pcidomain, pcibus, pcidevice, pcifunc; - char *productname; - } display[HWLOC_GL_SERVER_MAX*HWLOC_GL_SCREEN_MAX]; -}; - -static void -hwloc_gl_query_devices(struct hwloc_gl_backend_data_s *data) -{ - int err; - unsigned i,j; - - /* mark the number of display as 0 in case we fail below, - * so that we don't try again later. - */ - data->nr_display = 0; - - for (i = 0; i < HWLOC_GL_SERVER_MAX; ++i) { - Display* display; - char displayName[10]; - int opcode, event, error; - - /* open X server */ - snprintf(displayName, sizeof(displayName), ":%u", i); - display = XOpenDisplay(displayName); - if (!display) - continue; - - /* Check for NV-CONTROL extension (it's per server) */ - if(!XQueryExtension(display, "NV-CONTROL", &opcode, &event, &error)) { - XCloseDisplay(display); - continue; - } - - for (j = 0; j < (unsigned) ScreenCount(display) && j < HWLOC_GL_SCREEN_MAX; j++) { - struct hwloc_gl_display_info_s *info = &data->display[data->nr_display]; - const int screen = j; - unsigned int *ptr_binary_data; - int data_length; - int gpu_number; - int nv_ctrl_pci_bus; - int nv_ctrl_pci_device; - int nv_ctrl_pci_domain; - int nv_ctrl_pci_func; - char *productname; - - /* the server supports NV-CONTROL but it may contain non-NVIDIA screen that don't support it */ - if (!XNVCTRLIsNvScreen(display, screen)) - continue; - - /* Gets the GPU number attached to the default screen. 
*/ - /* For further details, see the */ - err = XNVCTRLQueryTargetBinaryData (display, NV_CTRL_TARGET_TYPE_X_SCREEN, screen, 0, - NV_CTRL_BINARY_DATA_GPUS_USED_BY_XSCREEN, - (unsigned char **) &ptr_binary_data, &data_length); - if (!err) - continue; - - gpu_number = ptr_binary_data[1]; - free(ptr_binary_data); - -#ifdef NV_CTRL_PCI_DOMAIN - /* Gets the ID's of the GPU defined by gpu_number - * For further details, see the */ - err = XNVCTRLQueryTargetAttribute(display, NV_CTRL_TARGET_TYPE_GPU, gpu_number, 0, - NV_CTRL_PCI_DOMAIN, &nv_ctrl_pci_domain); - if (!err) - continue; -#else - nv_ctrl_pci_domain = 0; -#endif - - err = XNVCTRLQueryTargetAttribute(display, NV_CTRL_TARGET_TYPE_GPU, gpu_number, 0, - NV_CTRL_PCI_BUS, &nv_ctrl_pci_bus); - if (!err) - continue; - - err = XNVCTRLQueryTargetAttribute(display, NV_CTRL_TARGET_TYPE_GPU, gpu_number, 0, - NV_CTRL_PCI_DEVICE, &nv_ctrl_pci_device); - if (!err) - continue; - - err = XNVCTRLQueryTargetAttribute(display, NV_CTRL_TARGET_TYPE_GPU, gpu_number, 0, - NV_CTRL_PCI_FUNCTION, &nv_ctrl_pci_func); - if (!err) - continue; - - productname = NULL; - err = XNVCTRLQueryTargetStringAttribute(display, NV_CTRL_TARGET_TYPE_GPU, gpu_number, 0, - NV_CTRL_STRING_PRODUCT_NAME, &productname); - - snprintf(info->name, sizeof(info->name), ":%u.%u", i, j); - info->port = i; - info->device = j; - info->pcidomain = nv_ctrl_pci_domain; - info->pcibus = nv_ctrl_pci_bus; - info->pcidevice = nv_ctrl_pci_device; - info->pcifunc = nv_ctrl_pci_func; - info->productname = productname; - - hwloc_debug("GL device %s (product %s) on PCI 0000:%02x:%02x.%u\n", info->name, productname, - nv_ctrl_pci_domain, nv_ctrl_pci_bus, nv_ctrl_pci_device, nv_ctrl_pci_func); - - /* validate this device */ - data->nr_display++; - } - XCloseDisplay(display); - } -} - -static int -hwloc_gl_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused, - struct hwloc_obj *pcidev) -{ - struct hwloc_topology *topology = 
backend->topology; - struct hwloc_gl_backend_data_s *data = backend->private_data; - unsigned i, res; - - if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) - return 0; - - if (!hwloc_topology_is_thissystem(topology)) { - hwloc_debug("%s", "\nno GL detection (not thissystem)\n"); - return 0; - } - - if (HWLOC_OBJ_PCI_DEVICE != pcidev->type) - return 0; - - if (data->nr_display == (unsigned) -1) { - /* first call, lookup all display */ - hwloc_gl_query_devices(data); - /* if it fails, data->nr_display = 0 so we won't do anything below and in next callbacks */ - } - - if (!data->nr_display) - /* found no display */ - return 0; - - /* now the display array is ready to use */ - res = 0; - for(i=0; inr_display; i++) { - struct hwloc_gl_display_info_s *info = &data->display[i]; - hwloc_obj_t osdev; - - if (info->pcidomain != pcidev->attr->pcidev.domain) - continue; - if (info->pcibus != pcidev->attr->pcidev.bus) - continue; - if (info->pcidevice != pcidev->attr->pcidev.dev) - continue; - if (info->pcifunc != pcidev->attr->pcidev.func) - continue; - - osdev = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1); - osdev->name = strdup(info->name); - osdev->logical_index = -1; - osdev->attr->osdev.type = HWLOC_OBJ_OSDEV_GPU; - hwloc_obj_add_info(osdev, "Backend", "GL"); - hwloc_obj_add_info(osdev, "GPUVendor", "NVIDIA Corporation"); - if (info->productname) - hwloc_obj_add_info(osdev, "GPUModel", info->productname); - hwloc_insert_object_by_parent(topology, pcidev, osdev); - - res++; - /* there may be others */ - } - - return res; -} - -static void -hwloc_gl_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_gl_backend_data_s *data = backend->private_data; - unsigned i; - if (data->nr_display != (unsigned) -1) { /* could be -1 if --no-io */ - for(i=0; inr_display; i++) { - struct hwloc_gl_display_info_s *info = &data->display[i]; - free(info->productname); - } - } - free(backend->private_data); -} - -static 
struct hwloc_backend * -hwloc_gl_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_gl_backend_data_s *data; - - /* thissystem may not be fully initialized yet, we'll check flags in discover() */ - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - - data = malloc(sizeof(*data)); - if (!data) { - free(backend); - return NULL; - } - /* the first callback will initialize those */ - data->nr_display = (unsigned) -1; /* unknown yet */ - - backend->private_data = data; - backend->disable = hwloc_gl_backend_disable; - - backend->notify_new_object = hwloc_gl_backend_notify_new_object; - return backend; -} - -static struct hwloc_disc_component hwloc_gl_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_MISC, - "gl", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_gl_component_instantiate, - 10, /* after pci */ - NULL -}; - -static int -hwloc_gl_component_init(unsigned long flags) -{ - if (flags) - return -1; - if (hwloc_plugin_check_namespace("gl", "hwloc_backend_alloc") < 0) - return -1; - return 0; -} - -#ifdef HWLOC_INSIDE_PLUGIN -HWLOC_DECLSPEC extern const struct hwloc_component hwloc_gl_component; -#endif - -const struct hwloc_component hwloc_gl_component = { - HWLOC_COMPONENT_ABI, - hwloc_gl_component_init, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_gl_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-hardwired.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-hardwired.c deleted file mode 100644 index d448f3d55b7..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-hardwired.c +++ /dev/null @@ -1,197 +0,0 @@ -/* - * Copyright © 2015-2016 Inria. All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include - -#include -#include - -int hwloc_look_hardwired_fujitsu_k(struct hwloc_topology *topology) -{ - /* If a broken core gets disabled, its bit disappears and other core bits are NOT shifted towards 0. - * Node is not given to user job, not need to handle that case properly. - */ - unsigned i; - hwloc_obj_t obj; - hwloc_bitmap_t set; - - for(i=0; i<8; i++) { - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set(set, i); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 32*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 2; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 32*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 2; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, i); - obj->cpuset = set; - hwloc_insert_object_by_cpuset(topology, obj); - } - - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(set, 0, 7); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - obj->attr->cache.depth = 2; - obj->attr->cache.size = 6*1024*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 12; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, 0); - obj->cpuset = set; - hwloc_obj_add_info(obj, "CPUVendor", "Fujitsu"); - hwloc_obj_add_info(obj, "CPUModel", "SPARC64 VIIIfx"); - hwloc_insert_object_by_cpuset(topology, obj); - - hwloc_setup_pu_level(topology, 8); - - return 0; -} - -int hwloc_look_hardwired_fujitsu_fx10(struct hwloc_topology *topology) 
-{ - /* If a broken core gets disabled, its bit disappears and other core bits are NOT shifted towards 0. - * Node is not given to user job, not need to handle that case properly. - */ - unsigned i; - hwloc_obj_t obj; - hwloc_bitmap_t set; - - for(i=0; i<16; i++) { - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set(set, i); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 32*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 2; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 32*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 2; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, i); - obj->cpuset = set; - hwloc_insert_object_by_cpuset(topology, obj); - } - - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(set, 0, 15); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - obj->attr->cache.depth = 2; - obj->attr->cache.size = 12*1024*1024; - obj->attr->cache.linesize = 128; - obj->attr->cache.associativity = 24; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, 0); - obj->cpuset = set; - hwloc_obj_add_info(obj, "CPUVendor", "Fujitsu"); - hwloc_obj_add_info(obj, "CPUModel", "SPARC64 IXfx"); - hwloc_insert_object_by_cpuset(topology, obj); - - hwloc_setup_pu_level(topology, 16); - - return 0; -} - -int hwloc_look_hardwired_fujitsu_fx100(struct hwloc_topology *topology) -{ - /* If a broken core gets disabled, its bit disappears and other core bits are NOT shifted towards 0. 
- * Node is not given to user job, not need to handle that case properly. - */ - unsigned i; - hwloc_obj_t obj; - hwloc_bitmap_t set; - - for(i=0; i<34; i++) { - set = hwloc_bitmap_alloc(); - hwloc_bitmap_set(set, i); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 64*1024; - obj->attr->cache.linesize = 256; - obj->attr->cache.associativity = 4; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_dup(set); - obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - obj->attr->cache.depth = 1; - obj->attr->cache.size = 64*1024; - obj->attr->cache.linesize = 256; - obj->attr->cache.associativity = 4; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, i); - obj->cpuset = set; - hwloc_insert_object_by_cpuset(topology, obj); - } - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(obj->cpuset, 0, 15); - hwloc_bitmap_set(obj->cpuset, 32); - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - obj->attr->cache.depth = 2; - obj->attr->cache.size = 12*1024*1024; - obj->attr->cache.linesize = 256; - obj->attr->cache.associativity = 24; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(obj->cpuset, 16, 31); - hwloc_bitmap_set(obj->cpuset, 33); - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - obj->attr->cache.depth = 2; - obj->attr->cache.size = 12*1024*1024; - obj->attr->cache.linesize = 256; - obj->attr->cache.associativity = 24; - hwloc_insert_object_by_cpuset(topology, obj); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, 0); - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(obj->cpuset, 0, 33); - 
hwloc_obj_add_info(obj, "CPUVendor", "Fujitsu"); - hwloc_obj_add_info(obj, "CPUModel", "SPARC64 XIfx"); - hwloc_insert_object_by_cpuset(topology, obj); - - hwloc_setup_pu_level(topology, 34); - - return 0; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-linux.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-linux.c deleted file mode 100644 index fc8dc510ab8..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-linux.c +++ /dev/null @@ -1,5524 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2013, 2015 Université Bordeaux - * Copyright © 2009-2014 Cisco Systems, Inc. All rights reserved. - * Copyright © 2015 Intel, Inc. All rights reserved. - * Copyright © 2010 IBM - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#ifdef HAVE_DIRENT_H -#include -#endif -#ifdef HAVE_UNISTD_H -#include -#endif -#ifdef HWLOC_HAVE_LIBUDEV -#include -#endif -#include -#include -#include -#include -#include -#include -#include -#if defined HWLOC_HAVE_SET_MEMPOLICY || defined HWLOC_HAVE_MBIND || defined HWLOC_HAVE_MOVE_PAGES -#define migratepages migrate_pages /* workaround broken migratepages prototype in numaif.h before libnuma 2.0.2 */ -#include -#endif - -struct hwloc_linux_backend_data_s { - char *root_path; /* NULL if unused */ - int root_fd; /* The file descriptor for the file system root, used when browsing, e.g., Linux' sysfs and procfs. 
*/ - int is_real_fsroot; /* Boolean saying whether root_fd points to the real filesystem root of the system */ -#ifdef HWLOC_HAVE_LIBUDEV - struct udev *udev; /* Global udev context */ -#endif - char *dumped_hwdata_dirname; - enum { - HWLOC_LINUX_ARCH_X86, /* x86 32 or 64bits, including k1om (KNC) */ - HWLOC_LINUX_ARCH_IA64, - HWLOC_LINUX_ARCH_ARM, - HWLOC_LINUX_ARCH_POWER, - HWLOC_LINUX_ARCH_UNKNOWN - } arch; - int is_knl; - struct utsname utsname; /* fields contain \0 when unknown */ - unsigned fallback_nbprocessors; - unsigned pagesize; - - int deprecated_classlinks_model; /* -2 if never tried, -1 if unknown, 0 if new (device contains class/name), 1 if old (device contains class:name) */ - int mic_need_directlookup; /* if not tried yet, 0 if not needed, 1 if needed */ - unsigned mic_directlookup_id_max; /* -1 if not tried yet, 0 if none to lookup, maxid+1 otherwise */ -}; - - - -/*************************** - * Misc Abstraction layers * - ***************************/ - -#if !(defined HWLOC_HAVE_SCHED_SETAFFINITY) && (defined HWLOC_HAVE_SYSCALL) -/* libc doesn't have support for sched_setaffinity, make system call - * ourselves: */ -# include -# ifndef __NR_sched_setaffinity -# ifdef __i386__ -# define __NR_sched_setaffinity 241 -# elif defined(__x86_64__) -# define __NR_sched_setaffinity 203 -# elif defined(__ia64__) -# define __NR_sched_setaffinity 1231 -# elif defined(__hppa__) -# define __NR_sched_setaffinity 211 -# elif defined(__alpha__) -# define __NR_sched_setaffinity 395 -# elif defined(__s390__) -# define __NR_sched_setaffinity 239 -# elif defined(__sparc__) -# define __NR_sched_setaffinity 261 -# elif defined(__m68k__) -# define __NR_sched_setaffinity 311 -# elif defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || defined(__powerpc64__) || defined(__ppc64__) -# define __NR_sched_setaffinity 222 -# elif defined(__arm__) -# define __NR_sched_setaffinity 241 -# elif defined(__cris__) -# define __NR_sched_setaffinity 241 -/*# elif 
defined(__mips__) - # define __NR_sched_setaffinity TODO (32/64/nabi) */ -# else -# warning "don't know the syscall number for sched_setaffinity on this architecture, will not support binding" -# define sched_setaffinity(pid, lg, mask) (errno = ENOSYS, -1) -# endif -# endif -# ifndef sched_setaffinity -# define sched_setaffinity(pid, lg, mask) syscall(__NR_sched_setaffinity, pid, lg, mask) -# endif -# ifndef __NR_sched_getaffinity -# ifdef __i386__ -# define __NR_sched_getaffinity 242 -# elif defined(__x86_64__) -# define __NR_sched_getaffinity 204 -# elif defined(__ia64__) -# define __NR_sched_getaffinity 1232 -# elif defined(__hppa__) -# define __NR_sched_getaffinity 212 -# elif defined(__alpha__) -# define __NR_sched_getaffinity 396 -# elif defined(__s390__) -# define __NR_sched_getaffinity 240 -# elif defined(__sparc__) -# define __NR_sched_getaffinity 260 -# elif defined(__m68k__) -# define __NR_sched_getaffinity 312 -# elif defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || defined(__powerpc64__) || defined(__ppc64__) -# define __NR_sched_getaffinity 223 -# elif defined(__arm__) -# define __NR_sched_getaffinity 242 -# elif defined(__cris__) -# define __NR_sched_getaffinity 242 -/*# elif defined(__mips__) - # define __NR_sched_getaffinity TODO (32/64/nabi) */ -# else -# warning "don't know the syscall number for sched_getaffinity on this architecture, will not support getting binding" -# define sched_getaffinity(pid, lg, mask) (errno = ENOSYS, -1) -# endif -# endif -# ifndef sched_getaffinity -# define sched_getaffinity(pid, lg, mask) (syscall(__NR_sched_getaffinity, pid, lg, mask) < 0 ? -1 : 0) -# endif -#endif - -/* Added for ntohl() */ -#include - -#ifdef HAVE_OPENAT -/* Use our own filesystem functions if we have openat */ - -static const char * -hwloc_checkat(const char *path, int fsroot_fd) -{ - const char *relative_path; - if (fsroot_fd < 0) { - errno = EBADF; - return NULL; - } - - /* Skip leading slashes. 
*/ - for (relative_path = path; *relative_path == '/'; relative_path++); - - return relative_path; -} - -static int -hwloc_openat(const char *path, int fsroot_fd) -{ - const char *relative_path; - - relative_path = hwloc_checkat(path, fsroot_fd); - if (!relative_path) - return -1; - - return openat (fsroot_fd, relative_path, O_RDONLY); -} - -static FILE * -hwloc_fopenat(const char *path, const char *mode, int fsroot_fd) -{ - int fd; - - if (strcmp(mode, "r")) { - errno = ENOTSUP; - return NULL; - } - - fd = hwloc_openat (path, fsroot_fd); - if (fd == -1) - return NULL; - - return fdopen(fd, mode); -} - -static int -hwloc_accessat(const char *path, int mode, int fsroot_fd) -{ - const char *relative_path; - - relative_path = hwloc_checkat(path, fsroot_fd); - if (!relative_path) - return -1; - - return faccessat(fsroot_fd, relative_path, mode, 0); -} - -static int -hwloc_fstatat(const char *path, struct stat *st, int flags, int fsroot_fd) -{ - const char *relative_path; - - relative_path = hwloc_checkat(path, fsroot_fd); - if (!relative_path) - return -1; - - return fstatat(fsroot_fd, relative_path, st, flags); -} - -static DIR* -hwloc_opendirat(const char *path, int fsroot_fd) -{ - int dir_fd; - const char *relative_path; - - relative_path = hwloc_checkat(path, fsroot_fd); - if (!relative_path) - return NULL; - - dir_fd = openat(fsroot_fd, relative_path, O_RDONLY | O_DIRECTORY); - if (dir_fd < 0) - return NULL; - - return fdopendir(dir_fd); -} - -#endif /* HAVE_OPENAT */ - -/* Static inline version of fopen so that we can use openat if we have - it, but still preserve compiler parameter checking */ -static __hwloc_inline int -hwloc_open(const char *p, int d __hwloc_attribute_unused) -{ -#ifdef HAVE_OPENAT - return hwloc_openat(p, d); -#else - return open(p, O_RDONLY); -#endif -} - -static __hwloc_inline FILE * -hwloc_fopen(const char *p, const char *m, int d __hwloc_attribute_unused) -{ -#ifdef HAVE_OPENAT - return hwloc_fopenat(p, m, d); -#else - return fopen(p, m); 
-#endif -} - -/* Static inline version of access so that we can use openat if we have - it, but still preserve compiler parameter checking */ -static __hwloc_inline int -hwloc_access(const char *p, int m, int d __hwloc_attribute_unused) -{ -#ifdef HAVE_OPENAT - return hwloc_accessat(p, m, d); -#else - return access(p, m); -#endif -} - -static __hwloc_inline int -hwloc_stat(const char *p, struct stat *st, int d __hwloc_attribute_unused) -{ -#ifdef HAVE_OPENAT - return hwloc_fstatat(p, st, 0, d); -#else - return stat(p, st); -#endif -} - -static __hwloc_inline int -hwloc_lstat(const char *p, struct stat *st, int d __hwloc_attribute_unused) -{ -#ifdef HAVE_OPENAT - return hwloc_fstatat(p, st, AT_SYMLINK_NOFOLLOW, d); -#else - return lstat(p, st); -#endif -} - -/* Static inline version of opendir so that we can use openat if we have - it, but still preserve compiler parameter checking */ -static __hwloc_inline DIR * -hwloc_opendir(const char *p, int d __hwloc_attribute_unused) -{ -#ifdef HAVE_OPENAT - return hwloc_opendirat(p, d); -#else - return opendir(p); -#endif -} - - -/***************************** - ******* CpuBind Hooks ******* - *****************************/ - -int -hwloc_linux_set_tid_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, pid_t tid __hwloc_attribute_unused, hwloc_const_bitmap_t hwloc_set __hwloc_attribute_unused) -{ - /* TODO Kerrighed: Use - * int migrate (pid_t pid, int destination_node); - * int migrate_self (int destination_node); - * int thread_migrate (int thread_id, int destination_node); - */ - - /* The resulting binding is always strict */ - -#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) - cpu_set_t *plinux_set; - unsigned cpu; - int last; - size_t setsize; - int err; - - last = hwloc_bitmap_last(hwloc_set); - if (last == -1) { - errno = EINVAL; - return -1; - } - - setsize = CPU_ALLOC_SIZE(last+1); - plinux_set = CPU_ALLOC(last+1); - - CPU_ZERO_S(setsize, plinux_set); - 
hwloc_bitmap_foreach_begin(cpu, hwloc_set) - CPU_SET_S(cpu, setsize, plinux_set); - hwloc_bitmap_foreach_end(); - - err = sched_setaffinity(tid, setsize, plinux_set); - - CPU_FREE(plinux_set); - return err; -#elif defined(HWLOC_HAVE_CPU_SET) - cpu_set_t linux_set; - unsigned cpu; - - CPU_ZERO(&linux_set); - hwloc_bitmap_foreach_begin(cpu, hwloc_set) - CPU_SET(cpu, &linux_set); - hwloc_bitmap_foreach_end(); - -#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - return sched_setaffinity(tid, &linux_set); -#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - return sched_setaffinity(tid, sizeof(linux_set), &linux_set); -#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ -#elif defined(HWLOC_HAVE_SYSCALL) - unsigned long mask = hwloc_bitmap_to_ulong(hwloc_set); - -#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - return sched_setaffinity(tid, (void*) &mask); -#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - return sched_setaffinity(tid, sizeof(mask), (void*) &mask); -#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ -#else /* !SYSCALL */ - errno = ENOSYS; - return -1; -#endif /* !SYSCALL */ -} - -#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) -static int -hwloc_linux_parse_cpuset_file(FILE *file, hwloc_bitmap_t set) -{ - unsigned long start, stop; - - /* reset to zero first */ - hwloc_bitmap_zero(set); - - while (fscanf(file, "%lu", &start) == 1) - { - int c = fgetc(file); - - stop = start; - - if (c == '-') { - /* Range */ - if (fscanf(file, "%lu", &stop) != 1) { - /* Expected a number here */ - errno = EINVAL; - return -1; - } - c = fgetc(file); - } - - if (c == EOF || c == '\n') { - hwloc_bitmap_set_range(set, start, stop); - break; - } - - if (c != ',') { - /* Expected EOF, EOL, or a comma */ - errno = EINVAL; - return -1; - } - - hwloc_bitmap_set_range(set, start, stop); - } - - return 0; -} - -/* - * On some kernels, sched_getaffinity requires the output size to be larger - * than the kernel cpu_set size (defined by CONFIG_NR_CPUS). 
- * Try sched_affinity on ourself until we find a nr_cpus value that makes - * the kernel happy. - */ -static int -hwloc_linux_find_kernel_nr_cpus(hwloc_topology_t topology) -{ - static int _nr_cpus = -1; - int nr_cpus = _nr_cpus; - FILE *possible; - - if (nr_cpus != -1) - /* already computed */ - return nr_cpus; - - if (topology->levels[0][0]->complete_cpuset) - /* start with a nr_cpus that may contain the whole topology */ - nr_cpus = hwloc_bitmap_last(topology->levels[0][0]->complete_cpuset) + 1; - if (nr_cpus <= 0) - /* start from scratch, the topology isn't ready yet (complete_cpuset is missing (-1) or empty (0))*/ - nr_cpus = 1; - - possible = fopen("/sys/devices/system/cpu/possible", "r"); /* binding only supported in real fsroot, no need for data->root_fd */ - if (possible) { - hwloc_bitmap_t possible_bitmap = hwloc_bitmap_alloc(); - if (hwloc_linux_parse_cpuset_file(possible, possible_bitmap) == 0) { - int max_possible = hwloc_bitmap_last(possible_bitmap); - - hwloc_debug_bitmap("possible CPUs are %s\n", possible_bitmap); - - if (nr_cpus < max_possible + 1) - nr_cpus = max_possible + 1; - } - fclose(possible); - hwloc_bitmap_free(possible_bitmap); - } - - while (1) { - cpu_set_t *set = CPU_ALLOC(nr_cpus); - size_t setsize = CPU_ALLOC_SIZE(nr_cpus); - int err = sched_getaffinity(0, setsize, set); /* always works, unless setsize is too small */ - CPU_FREE(set); - nr_cpus = setsize * 8; /* that's the value that was actually tested */ - if (!err) - /* found it */ - return _nr_cpus = nr_cpus; - nr_cpus *= 2; - } -} -#endif - -int -hwloc_linux_get_tid_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, pid_t tid __hwloc_attribute_unused, hwloc_bitmap_t hwloc_set __hwloc_attribute_unused) -{ - int err __hwloc_attribute_unused; - /* TODO Kerrighed */ - -#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) - cpu_set_t *plinux_set; - unsigned cpu; - int last; - size_t setsize; - int kernel_nr_cpus; - - /* find the kernel nr_cpus so 
as to use a large enough cpu_set size */ - kernel_nr_cpus = hwloc_linux_find_kernel_nr_cpus(topology); - setsize = CPU_ALLOC_SIZE(kernel_nr_cpus); - plinux_set = CPU_ALLOC(kernel_nr_cpus); - - err = sched_getaffinity(tid, setsize, plinux_set); - - if (err < 0) { - CPU_FREE(plinux_set); - return -1; - } - - last = -1; - if (topology->levels[0][0]->complete_cpuset) - last = hwloc_bitmap_last(topology->levels[0][0]->complete_cpuset); - if (last == -1) - /* round the maximal support number, the topology isn't ready yet (complete_cpuset is missing or empty)*/ - last = kernel_nr_cpus-1; - - hwloc_bitmap_zero(hwloc_set); - for(cpu=0; cpu<=(unsigned) last; cpu++) - if (CPU_ISSET_S(cpu, setsize, plinux_set)) - hwloc_bitmap_set(hwloc_set, cpu); - - CPU_FREE(plinux_set); -#elif defined(HWLOC_HAVE_CPU_SET) - cpu_set_t linux_set; - unsigned cpu; - -#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - err = sched_getaffinity(tid, &linux_set); -#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - err = sched_getaffinity(tid, sizeof(linux_set), &linux_set); -#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - if (err < 0) - return -1; - - hwloc_bitmap_zero(hwloc_set); - for(cpu=0; cpud_name, ".") || !strcmp(dirent->d_name, "..")) - continue; - tids[nr_tids++] = atoi(dirent->d_name); - } - - *nr_tidsp = nr_tids; - *tidsp = tids; - return 0; -} - -/* Per-tid callbacks */ -typedef int (*hwloc_linux_foreach_proc_tid_cb_t)(hwloc_topology_t topology, pid_t tid, void *data, int idx); - -static int -hwloc_linux_foreach_proc_tid(hwloc_topology_t topology, - pid_t pid, hwloc_linux_foreach_proc_tid_cb_t cb, - void *data) -{ - char taskdir_path[128]; - DIR *taskdir; - pid_t *tids, *newtids; - unsigned i, nr, newnr, failed = 0, failed_errno = 0; - unsigned retrynr = 0; - int err; - - if (pid) - snprintf(taskdir_path, sizeof(taskdir_path), "/proc/%u/task", (unsigned) pid); - else - snprintf(taskdir_path, sizeof(taskdir_path), "/proc/self/task"); - - taskdir = opendir(taskdir_path); - if (!taskdir) { - if (errno == 
ENOENT) - errno = EINVAL; - err = -1; - goto out; - } - - /* read the current list of threads */ - err = hwloc_linux_get_proc_tids(taskdir, &nr, &tids); - if (err < 0) - goto out_with_dir; - - retry: - /* apply the callback to all threads */ - failed=0; - for(i=0; i 10) { - /* we tried 10 times, it didn't work, the application is probably creating/destroying many threads, stop trying */ - errno = EAGAIN; - err = -1; - goto out_with_tids; - } - goto retry; - } else { - free(newtids); - } - - /* if all threads failed, return the last errno. */ - if (failed) { - err = -1; - errno = failed_errno; - goto out_with_tids; - } - - err = 0; - out_with_tids: - free(tids); - out_with_dir: - closedir(taskdir); - out: - return err; -} - -/* Per-tid proc_set_cpubind callback and caller. - * Callback data is a hwloc_bitmap_t. */ -static int -hwloc_linux_foreach_proc_tid_set_cpubind_cb(hwloc_topology_t topology, pid_t tid, void *data, int idx __hwloc_attribute_unused) -{ - return hwloc_linux_set_tid_cpubind(topology, tid, (hwloc_bitmap_t) data); -} - -static int -hwloc_linux_set_pid_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - return hwloc_linux_foreach_proc_tid(topology, pid, - hwloc_linux_foreach_proc_tid_set_cpubind_cb, - (void*) hwloc_set); -} - -/* Per-tid proc_get_cpubind callback data, callback function and caller */ -struct hwloc_linux_foreach_proc_tid_get_cpubind_cb_data_s { - hwloc_bitmap_t cpuset; - hwloc_bitmap_t tidset; - int flags; -}; - -static int -hwloc_linux_foreach_proc_tid_get_cpubind_cb(hwloc_topology_t topology, pid_t tid, void *_data, int idx) -{ - struct hwloc_linux_foreach_proc_tid_get_cpubind_cb_data_s *data = _data; - hwloc_bitmap_t cpuset = data->cpuset; - hwloc_bitmap_t tidset = data->tidset; - int flags = data->flags; - - if (hwloc_linux_get_tid_cpubind(topology, tid, tidset)) - return -1; - - /* reset the cpuset on first iteration */ - if (!idx) - hwloc_bitmap_zero(cpuset); - - 
if (flags & HWLOC_CPUBIND_STRICT) { - /* if STRICT, we want all threads to have the same binding */ - if (!idx) { - /* this is the first thread, copy its binding */ - hwloc_bitmap_copy(cpuset, tidset); - } else if (!hwloc_bitmap_isequal(cpuset, tidset)) { - /* this is not the first thread, and it's binding is different */ - errno = EXDEV; - return -1; - } - } else { - /* if not STRICT, just OR all thread bindings */ - hwloc_bitmap_or(cpuset, cpuset, tidset); - } - return 0; -} - -static int -hwloc_linux_get_pid_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags) -{ - struct hwloc_linux_foreach_proc_tid_get_cpubind_cb_data_s data; - hwloc_bitmap_t tidset = hwloc_bitmap_alloc(); - int ret; - - data.cpuset = hwloc_set; - data.tidset = tidset; - data.flags = flags; - ret = hwloc_linux_foreach_proc_tid(topology, pid, - hwloc_linux_foreach_proc_tid_get_cpubind_cb, - (void*) &data); - hwloc_bitmap_free(tidset); - return ret; -} - -static int -hwloc_linux_set_proc_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags) -{ - if (pid == 0) - pid = topology->pid; - if (flags & HWLOC_CPUBIND_THREAD) - return hwloc_linux_set_tid_cpubind(topology, pid, hwloc_set); - else - return hwloc_linux_set_pid_cpubind(topology, pid, hwloc_set, flags); -} - -static int -hwloc_linux_get_proc_cpubind(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags) -{ - if (pid == 0) - pid = topology->pid; - if (flags & HWLOC_CPUBIND_THREAD) - return hwloc_linux_get_tid_cpubind(topology, pid, hwloc_set); - else - return hwloc_linux_get_pid_cpubind(topology, pid, hwloc_set, flags); -} - -static int -hwloc_linux_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_linux_set_pid_cpubind(topology, topology->pid, hwloc_set, flags); -} - -static int -hwloc_linux_get_thisproc_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags) -{ - return 
hwloc_linux_get_pid_cpubind(topology, topology->pid, hwloc_set, flags); -} - -static int -hwloc_linux_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - if (topology->pid) { - errno = ENOSYS; - return -1; - } - return hwloc_linux_set_tid_cpubind(topology, 0, hwloc_set); -} - -static int -hwloc_linux_get_thisthread_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - if (topology->pid) { - errno = ENOSYS; - return -1; - } - return hwloc_linux_get_tid_cpubind(topology, 0, hwloc_set); -} - -#if HAVE_DECL_PTHREAD_SETAFFINITY_NP -#pragma weak pthread_setaffinity_np -#pragma weak pthread_self - -static int -hwloc_linux_set_thread_cpubind(hwloc_topology_t topology, pthread_t tid, hwloc_const_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - int err; - - if (topology->pid) { - errno = ENOSYS; - return -1; - } - - if (!pthread_self) { - /* ?! Application uses set_thread_cpubind, but doesn't link against libpthread ?! 
*/ - errno = ENOSYS; - return -1; - } - if (tid == pthread_self()) - return hwloc_linux_set_tid_cpubind(topology, 0, hwloc_set); - - if (!pthread_setaffinity_np) { - errno = ENOSYS; - return -1; - } - /* TODO Kerrighed: Use - * int migrate (pid_t pid, int destination_node); - * int migrate_self (int destination_node); - * int thread_migrate (int thread_id, int destination_node); - */ - -#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) - /* Use a separate block so that we can define specific variable - types here */ - { - cpu_set_t *plinux_set; - unsigned cpu; - int last; - size_t setsize; - - last = hwloc_bitmap_last(hwloc_set); - if (last == -1) { - errno = EINVAL; - return -1; - } - - setsize = CPU_ALLOC_SIZE(last+1); - plinux_set = CPU_ALLOC(last+1); - - CPU_ZERO_S(setsize, plinux_set); - hwloc_bitmap_foreach_begin(cpu, hwloc_set) - CPU_SET_S(cpu, setsize, plinux_set); - hwloc_bitmap_foreach_end(); - - err = pthread_setaffinity_np(tid, setsize, plinux_set); - - CPU_FREE(plinux_set); - } -#elif defined(HWLOC_HAVE_CPU_SET) - /* Use a separate block so that we can define specific variable - types here */ - { - cpu_set_t linux_set; - unsigned cpu; - - CPU_ZERO(&linux_set); - hwloc_bitmap_foreach_begin(cpu, hwloc_set) - CPU_SET(cpu, &linux_set); - hwloc_bitmap_foreach_end(); - -#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - err = pthread_setaffinity_np(tid, &linux_set); -#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - err = pthread_setaffinity_np(tid, sizeof(linux_set), &linux_set); -#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - } -#else /* CPU_SET */ - /* Use a separate block so that we can define specific variable - types here */ - { - unsigned long mask = hwloc_bitmap_to_ulong(hwloc_set); - -#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - err = pthread_setaffinity_np(tid, (void*) &mask); -#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - err = pthread_setaffinity_np(tid, sizeof(mask), (void*) &mask); -#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ 
- } -#endif /* CPU_SET */ - - if (err) { - errno = err; - return -1; - } - return 0; -} -#endif /* HAVE_DECL_PTHREAD_SETAFFINITY_NP */ - -#if HAVE_DECL_PTHREAD_GETAFFINITY_NP -#pragma weak pthread_getaffinity_np -#pragma weak pthread_self - -static int -hwloc_linux_get_thread_cpubind(hwloc_topology_t topology, pthread_t tid, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - int err; - - if (topology->pid) { - errno = ENOSYS; - return -1; - } - - if (!pthread_self) { - /* ?! Application uses set_thread_cpubind, but doesn't link against libpthread ?! */ - errno = ENOSYS; - return -1; - } - if (tid == pthread_self()) - return hwloc_linux_get_tid_cpubind(topology, 0, hwloc_set); - - if (!pthread_getaffinity_np) { - errno = ENOSYS; - return -1; - } - /* TODO Kerrighed */ - -#if defined(HWLOC_HAVE_CPU_SET_S) && !defined(HWLOC_HAVE_OLD_SCHED_SETAFFINITY) - /* Use a separate block so that we can define specific variable - types here */ - { - cpu_set_t *plinux_set; - unsigned cpu; - int last; - size_t setsize; - - last = hwloc_bitmap_last(topology->levels[0][0]->complete_cpuset); - assert (last != -1); - - setsize = CPU_ALLOC_SIZE(last+1); - plinux_set = CPU_ALLOC(last+1); - - err = pthread_getaffinity_np(tid, setsize, plinux_set); - if (err) { - CPU_FREE(plinux_set); - errno = err; - return -1; - } - - hwloc_bitmap_zero(hwloc_set); - for(cpu=0; cpu<=(unsigned) last; cpu++) - if (CPU_ISSET_S(cpu, setsize, plinux_set)) - hwloc_bitmap_set(hwloc_set, cpu); - - CPU_FREE(plinux_set); - } -#elif defined(HWLOC_HAVE_CPU_SET) - /* Use a separate block so that we can define specific variable - types here */ - { - cpu_set_t linux_set; - unsigned cpu; - -#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY - err = pthread_getaffinity_np(tid, &linux_set); -#else /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - err = pthread_getaffinity_np(tid, sizeof(linux_set), &linux_set); -#endif /* HWLOC_HAVE_OLD_SCHED_SETAFFINITY */ - if (err) { - errno = err; - return -1; - } - - 
hwloc_bitmap_zero(hwloc_set); - for(cpu=0; cpucpuset; - hwloc_bitmap_t tidset = data->tidset; - - if (hwloc_linux_get_tid_last_cpu_location(topology, tid, tidset)) - return -1; - - /* reset the cpuset on first iteration */ - if (!idx) - hwloc_bitmap_zero(cpuset); - - hwloc_bitmap_or(cpuset, cpuset, tidset); - return 0; -} - -static int -hwloc_linux_get_pid_last_cpu_location(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - struct hwloc_linux_foreach_proc_tid_get_last_cpu_location_cb_data_s data; - hwloc_bitmap_t tidset = hwloc_bitmap_alloc(); - int ret; - - data.cpuset = hwloc_set; - data.tidset = tidset; - ret = hwloc_linux_foreach_proc_tid(topology, pid, - hwloc_linux_foreach_proc_tid_get_last_cpu_location_cb, - &data); - hwloc_bitmap_free(tidset); - return ret; -} - -static int -hwloc_linux_get_proc_last_cpu_location(hwloc_topology_t topology, pid_t pid, hwloc_bitmap_t hwloc_set, int flags) -{ - if (pid == 0) - pid = topology->pid; - if (flags & HWLOC_CPUBIND_THREAD) - return hwloc_linux_get_tid_last_cpu_location(topology, pid, hwloc_set); - else - return hwloc_linux_get_pid_last_cpu_location(topology, pid, hwloc_set, flags); -} - -static int -hwloc_linux_get_thisproc_last_cpu_location(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags) -{ - return hwloc_linux_get_pid_last_cpu_location(topology, topology->pid, hwloc_set, flags); -} - -static int -hwloc_linux_get_thisthread_last_cpu_location(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - if (topology->pid) { - errno = ENOSYS; - return -1; - } - return hwloc_linux_get_tid_last_cpu_location(topology, 0, hwloc_set); -} - - - -/*************************** - ****** Membind hooks ****** - ***************************/ - -#if defined HWLOC_HAVE_SET_MEMPOLICY || defined HWLOC_HAVE_MBIND -static int -hwloc_linux_membind_policy_from_hwloc(int *linuxpolicy, hwloc_membind_policy_t policy, int flags) -{ - switch 
(policy) { - case HWLOC_MEMBIND_DEFAULT: - case HWLOC_MEMBIND_FIRSTTOUCH: - *linuxpolicy = MPOL_DEFAULT; - break; - case HWLOC_MEMBIND_BIND: - if (flags & HWLOC_MEMBIND_STRICT) - *linuxpolicy = MPOL_BIND; - else - *linuxpolicy = MPOL_PREFERRED; - break; - case HWLOC_MEMBIND_INTERLEAVE: - *linuxpolicy = MPOL_INTERLEAVE; - break; - /* TODO: next-touch when (if?) patch applied upstream */ - default: - errno = ENOSYS; - return -1; - } - return 0; -} - -static int -hwloc_linux_membind_mask_from_nodeset(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_const_nodeset_t nodeset, - unsigned *max_os_index_p, unsigned long **linuxmaskp) -{ - unsigned max_os_index = 0; /* highest os_index + 1 */ - unsigned long *linuxmask; - unsigned i; - hwloc_nodeset_t linux_nodeset = NULL; - - if (hwloc_bitmap_isfull(nodeset)) { - linux_nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(linux_nodeset, 0); - nodeset = linux_nodeset; - } - - max_os_index = hwloc_bitmap_last(nodeset); - if (max_os_index == (unsigned) -1) - max_os_index = 0; - /* add 1 to convert the last os_index into a max_os_index, - * and round up to the nearest multiple of BITS_PER_LONG */ - max_os_index = (max_os_index + 1 + HWLOC_BITS_PER_LONG - 1) & ~(HWLOC_BITS_PER_LONG - 1); - - linuxmask = calloc(max_os_index/HWLOC_BITS_PER_LONG, sizeof(long)); - if (!linuxmask) { - hwloc_bitmap_free(linux_nodeset); - errno = ENOMEM; - return -1; - } - - for(i=0; i= 0) - hwloc_bitmap_set(nodeset, status[i]); - ret = 0; - - out_with_pages: - free(pages); - free(status); - return ret; -} -#endif /* HWLOC_HAVE_MOVE_PAGES */ - -void -hwloc_set_linuxfs_hooks(struct hwloc_binding_hooks *hooks, - struct hwloc_topology_support *support __hwloc_attribute_unused) -{ - hooks->set_thisthread_cpubind = hwloc_linux_set_thisthread_cpubind; - hooks->get_thisthread_cpubind = hwloc_linux_get_thisthread_cpubind; - hooks->set_thisproc_cpubind = hwloc_linux_set_thisproc_cpubind; - hooks->get_thisproc_cpubind = hwloc_linux_get_thisproc_cpubind; 
- hooks->set_proc_cpubind = hwloc_linux_set_proc_cpubind; - hooks->get_proc_cpubind = hwloc_linux_get_proc_cpubind; -#if HAVE_DECL_PTHREAD_SETAFFINITY_NP - hooks->set_thread_cpubind = hwloc_linux_set_thread_cpubind; -#endif /* HAVE_DECL_PTHREAD_SETAFFINITY_NP */ -#if HAVE_DECL_PTHREAD_GETAFFINITY_NP - hooks->get_thread_cpubind = hwloc_linux_get_thread_cpubind; -#endif /* HAVE_DECL_PTHREAD_GETAFFINITY_NP */ - hooks->get_thisthread_last_cpu_location = hwloc_linux_get_thisthread_last_cpu_location; - hooks->get_thisproc_last_cpu_location = hwloc_linux_get_thisproc_last_cpu_location; - hooks->get_proc_last_cpu_location = hwloc_linux_get_proc_last_cpu_location; -#ifdef HWLOC_HAVE_SET_MEMPOLICY - hooks->set_thisthread_membind = hwloc_linux_set_thisthread_membind; - hooks->get_thisthread_membind = hwloc_linux_get_thisthread_membind; - hooks->get_area_membind = hwloc_linux_get_area_membind; -#endif /* HWLOC_HAVE_SET_MEMPOLICY */ -#ifdef HWLOC_HAVE_MBIND - hooks->set_area_membind = hwloc_linux_set_area_membind; -#ifdef HWLOC_HAVE_MOVE_PAGES - hooks->get_area_memlocation = hwloc_linux_get_area_memlocation; -#endif /* HWLOC_HAVE_MOVE_PAGES */ - hooks->alloc_membind = hwloc_linux_alloc_membind; - hooks->alloc = hwloc_alloc_mmap; - hooks->free_membind = hwloc_free_mmap; - support->membind->firsttouch_membind = 1; - support->membind->bind_membind = 1; - support->membind->interleave_membind = 1; -#endif /* HWLOC_HAVE_MBIND */ -#if (defined HWLOC_HAVE_MIGRATE_PAGES) || ((defined HWLOC_HAVE_MBIND) && (defined MPOL_MF_MOVE)) - support->membind->migrate_membind = 1; -#endif -} - - - -/******************************************* - *** Misc Helpers for Topology Discovery *** - *******************************************/ - -/* cpuinfo array */ -struct hwloc_linux_cpuinfo_proc { - /* set during hwloc_linux_parse_cpuinfo */ - unsigned long Pproc; - /* set during hwloc_linux_parse_cpuinfo or -1 if unknown*/ - long Pcore, Ppkg; - /* set later, or -1 if unknown */ - long Lcore, Lpkg; - - /* 
custom info, set during hwloc_linux_parse_cpuinfo */ - struct hwloc_obj_info_s *infos; - unsigned infos_count; -}; - -static int -hwloc_parse_sysfs_unsigned(const char *mappath, unsigned *value, int fsroot_fd) -{ - char string[11]; - FILE * fd; - - fd = hwloc_fopen(mappath, "r", fsroot_fd); - if (!fd) { - *value = -1; - return -1; - } - - if (!fgets(string, 11, fd)) { - *value = -1; - fclose(fd); - return -1; - } - *value = strtoul(string, NULL, 10); - - fclose(fd); - - return 0; -} - - -/* kernel cpumaps are composed of an array of 32bits cpumasks */ -#define KERNEL_CPU_MASK_BITS 32 -#define KERNEL_CPU_MAP_LEN (KERNEL_CPU_MASK_BITS/4+2) - -int -hwloc_linux_parse_cpumap_file(FILE *file, hwloc_bitmap_t set) -{ - unsigned long *maps; - unsigned long map; - int nr_maps = 0; - static int nr_maps_allocated = 8; /* only compute the power-of-two above the kernel cpumask size once */ - int i; - - maps = malloc(nr_maps_allocated * sizeof(*maps)); - - /* reset to zero first */ - hwloc_bitmap_zero(set); - - /* parse the whole mask */ - while (fscanf(file, "%lx,", &map) == 1) /* read one kernel cpu mask and the ending comma */ - { - if (nr_maps == nr_maps_allocated) { - nr_maps_allocated *= 2; - maps = realloc(maps, nr_maps_allocated * sizeof(*maps)); - } - - if (!map && !nr_maps) - /* ignore the first map if it's empty */ - continue; - - memmove(&maps[1], &maps[0], nr_maps*sizeof(*maps)); - maps[0] = map; - nr_maps++; - } - - /* convert into a set */ -#if KERNEL_CPU_MASK_BITS == HWLOC_BITS_PER_LONG - for(i=0; imnt_type, "cpuset")) { - hwloc_debug("Found cpuset mount point on %s\n", mntent->mnt_dir); - *cpuset_mntpnt = strdup(mntent->mnt_dir); - break; - } else if (!strcmp(mntent->mnt_type, "cgroup")) { - /* found a cgroup mntpnt */ - char *opt, *opts = mntent->mnt_opts; - int cpuset_opt = 0; - int noprefix_opt = 0; - /* look at options */ - while ((opt = strsep(&opts, ",")) != NULL) { - if (!strcmp(opt, "cpuset")) - cpuset_opt = 1; - else if (!strcmp(opt, "noprefix")) - 
noprefix_opt = 1; - } - if (!cpuset_opt) - continue; - if (noprefix_opt) { - hwloc_debug("Found cgroup emulating a cpuset mount point on %s\n", mntent->mnt_dir); - *cpuset_mntpnt = strdup(mntent->mnt_dir); - } else { - hwloc_debug("Found cgroup/cpuset mount point on %s\n", mntent->mnt_dir); - *cgroup_mntpnt = strdup(mntent->mnt_dir); - } - break; - } - } - - endmntent(fd); -} - -/* - * Linux cpusets may be managed directly or through cgroup. - * If cgroup is used, tasks get a /proc/pid/cgroup which may contain a - * single line %d:cpuset:. If cpuset are used they get /proc/pid/cpuset - * containing . - */ -static char * -hwloc_read_linux_cpuset_name(int fsroot_fd, hwloc_pid_t pid) -{ -#define CPUSET_NAME_LEN 128 - char cpuset_name[CPUSET_NAME_LEN]; - FILE *fd; - char *tmp; - - /* check whether a cgroup-cpuset is enabled */ - if (!pid) - fd = hwloc_fopen("/proc/self/cgroup", "r", fsroot_fd); - else { - char path[] = "/proc/XXXXXXXXXX/cgroup"; - snprintf(path, sizeof(path), "/proc/%d/cgroup", pid); - fd = hwloc_fopen(path, "r", fsroot_fd); - } - if (fd) { - /* find a cpuset line */ -#define CGROUP_LINE_LEN 256 - char line[CGROUP_LINE_LEN]; - while (fgets(line, sizeof(line), fd)) { - char *end, *colon = strchr(line, ':'); - if (!colon) - continue; - if (strncmp(colon, ":cpuset:", 8)) - continue; - - /* found a cgroup-cpuset line, return the name */ - fclose(fd); - end = strchr(colon, '\n'); - if (end) - *end = '\0'; - hwloc_debug("Found cgroup-cpuset %s\n", colon+8); - return strdup(colon+8); - } - fclose(fd); - } - - /* check whether a cpuset is enabled */ - if (!pid) - fd = hwloc_fopen("/proc/self/cpuset", "r", fsroot_fd); - else { - char path[] = "/proc/XXXXXXXXXX/cpuset"; - snprintf(path, sizeof(path), "/proc/%d/cpuset", pid); - fd = hwloc_fopen(path, "r", fsroot_fd); - } - if (!fd) { - /* found nothing */ - hwloc_debug("%s", "No cgroup or cpuset found\n"); - return NULL; - } - - /* found a cpuset, return the name */ - tmp = fgets(cpuset_name, sizeof(cpuset_name), 
fd); - fclose(fd); - if (!tmp) - return NULL; - tmp = strchr(cpuset_name, '\n'); - if (tmp) - *tmp = '\0'; - hwloc_debug("Found cpuset %s\n", cpuset_name); - return strdup(cpuset_name); -} - -/* - * Then, the cpuset description is available from either the cgroup or - * the cpuset filesystem (usually mounted in / or /dev) where there - * are cgroup/cpuset.{cpus,mems} or cpuset/{cpus,mems} files. - */ -static char * -hwloc_read_linux_cpuset_mask(const char *cgroup_mntpnt, const char *cpuset_mntpnt, const char *cpuset_name, const char *attr_name, int fsroot_fd) -{ -#define CPUSET_FILENAME_LEN 256 - char cpuset_filename[CPUSET_FILENAME_LEN]; - FILE *fd; - char *info = NULL, *tmp; - ssize_t ssize; - size_t size; - - if (cgroup_mntpnt) { - /* try to read the cpuset from cgroup */ - snprintf(cpuset_filename, CPUSET_FILENAME_LEN, "%s%s/cpuset.%s", cgroup_mntpnt, cpuset_name, attr_name); - hwloc_debug("Trying to read cgroup file <%s>\n", cpuset_filename); - fd = hwloc_fopen(cpuset_filename, "r", fsroot_fd); - if (fd) - goto gotfile; - } else if (cpuset_mntpnt) { - /* try to read the cpuset directly */ - snprintf(cpuset_filename, CPUSET_FILENAME_LEN, "%s%s/%s", cpuset_mntpnt, cpuset_name, attr_name); - hwloc_debug("Trying to read cpuset file <%s>\n", cpuset_filename); - fd = hwloc_fopen(cpuset_filename, "r", fsroot_fd); - if (fd) - goto gotfile; - } - - /* found no cpuset description, ignore it */ - hwloc_debug("Couldn't find cpuset <%s> description, ignoring\n", cpuset_name); - goto out; - -gotfile: - ssize = getline(&info, &size, fd); - fclose(fd); - if (ssize < 0) - goto out; - if (!info) - goto out; - - tmp = strchr(info, '\n'); - if (tmp) - *tmp = '\0'; - -out: - return info; -} - -static void -hwloc_admin_disable_set_from_cpuset(struct hwloc_linux_backend_data_s *data, - const char *cgroup_mntpnt, const char *cpuset_mntpnt, const char *cpuset_name, - const char *attr_name, - hwloc_bitmap_t admin_enabled_cpus_set) -{ - char *cpuset_mask; - char *current, *comma, *tmp; 
- int prevlast, nextfirst, nextlast; /* beginning/end of enabled-segments */ - hwloc_bitmap_t tmpset; - - cpuset_mask = hwloc_read_linux_cpuset_mask(cgroup_mntpnt, cpuset_mntpnt, cpuset_name, - attr_name, data->root_fd); - if (!cpuset_mask) - return; - - hwloc_debug("found cpuset %s: %s\n", attr_name, cpuset_mask); - - current = cpuset_mask; - prevlast = -1; - - while (1) { - /* save a pointer to the next comma and erase it to simplify things */ - comma = strchr(current, ','); - if (comma) - *comma = '\0'; - - /* find current enabled-segment bounds */ - nextfirst = strtoul(current, &tmp, 0); - if (*tmp == '-') - nextlast = strtoul(tmp+1, NULL, 0); - else - nextlast = nextfirst; - if (prevlast+1 <= nextfirst-1) { - hwloc_debug("%s [%d:%d] excluded by cpuset\n", attr_name, prevlast+1, nextfirst-1); - hwloc_bitmap_clr_range(admin_enabled_cpus_set, prevlast+1, nextfirst-1); - } - - /* switch to next enabled-segment */ - prevlast = nextlast; - if (!comma) - break; - current = comma+1; - } - - hwloc_debug("%s [%d:%d] excluded by cpuset\n", attr_name, prevlast+1, nextfirst-1); - /* no easy way to clear until the infinity */ - tmpset = hwloc_bitmap_alloc(); - hwloc_bitmap_set_range(tmpset, 0, prevlast); - hwloc_bitmap_and(admin_enabled_cpus_set, admin_enabled_cpus_set, tmpset); - hwloc_bitmap_free(tmpset); - - free(cpuset_mask); -} - -static void -hwloc_parse_meminfo_info(struct hwloc_linux_backend_data_s *data, - const char *path, - int prefixlength, - uint64_t *local_memory, - uint64_t *meminfo_hugepages_count, - uint64_t *meminfo_hugepages_size, - int onlytotal) -{ - char string[64]; - FILE *fd; - - fd = hwloc_fopen(path, "r", data->root_fd); - if (!fd) - return; - - while (fgets(string, sizeof(string), fd) && *string != '\0') - { - unsigned long long number; - if (strlen(string) < (size_t) prefixlength) - continue; - if (sscanf(string+prefixlength, "MemTotal: %llu kB", (unsigned long long *) &number) == 1) { - *local_memory = number << 10; - if (onlytotal) - break; - } 
- else if (!onlytotal) { - if (sscanf(string+prefixlength, "Hugepagesize: %llu", (unsigned long long *) &number) == 1) - *meminfo_hugepages_size = number << 10; - else if (sscanf(string+prefixlength, "HugePages_Free: %llu", (unsigned long long *) &number) == 1) - /* these are free hugepages, not the total amount of huge pages */ - *meminfo_hugepages_count = number; - } - } - - fclose(fd); -} - -#define SYSFS_NUMA_NODE_PATH_LEN 128 - -static void -hwloc_parse_hugepages_info(struct hwloc_linux_backend_data_s *data, - const char *dirpath, - struct hwloc_obj_memory_s *memory, - uint64_t *remaining_local_memory) -{ - DIR *dir; - struct dirent *dirent; - unsigned long index_ = 1; - FILE *hpfd; - char line[64]; - char path[SYSFS_NUMA_NODE_PATH_LEN]; - - dir = hwloc_opendir(dirpath, data->root_fd); - if (dir) { - while ((dirent = readdir(dir)) != NULL) { - if (strncmp(dirent->d_name, "hugepages-", 10)) - continue; - memory->page_types[index_].size = strtoul(dirent->d_name+10, NULL, 0) * 1024ULL; - sprintf(path, "%s/%s/nr_hugepages", dirpath, dirent->d_name); - hpfd = hwloc_fopen(path, "r", data->root_fd); - if (hpfd) { - if (fgets(line, sizeof(line), hpfd)) { - /* these are the actual total amount of huge pages */ - memory->page_types[index_].count = strtoull(line, NULL, 0); - *remaining_local_memory -= memory->page_types[index_].count * memory->page_types[index_].size; - index_++; - } - fclose(hpfd); - } - } - closedir(dir); - memory->page_types_len = index_; - } -} - -static void -hwloc_get_kerrighed_node_meminfo_info(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data, - unsigned long node, struct hwloc_obj_memory_s *memory) -{ - char path[128]; - uint64_t meminfo_hugepages_count, meminfo_hugepages_size = 0; - - if (topology->is_thissystem) { - memory->page_types_len = 2; - memory->page_types = malloc(2*sizeof(*memory->page_types)); - memset(memory->page_types, 0, 2*sizeof(*memory->page_types)); - /* Try to get the hugepage size from sysconf in 
case we fail to get it from /proc/meminfo later */ -#ifdef HAVE__SC_LARGE_PAGESIZE - memory->page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); -#endif - memory->page_types[0].size = data->pagesize; - } - - snprintf(path, sizeof(path), "/proc/nodes/node%lu/meminfo", node); - hwloc_parse_meminfo_info(data, path, 0 /* no prefix */, - &memory->local_memory, - &meminfo_hugepages_count, &meminfo_hugepages_size, - memory->page_types == NULL); - - if (memory->page_types) { - uint64_t remaining_local_memory = memory->local_memory; - if (meminfo_hugepages_size) { - memory->page_types[1].size = meminfo_hugepages_size; - memory->page_types[1].count = meminfo_hugepages_count; - remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size; - } else { - memory->page_types_len = 1; - } - memory->page_types[0].count = remaining_local_memory / memory->page_types[0].size; - } -} - -static void -hwloc_get_procfs_meminfo_info(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data, - struct hwloc_obj_memory_s *memory) -{ - uint64_t meminfo_hugepages_count, meminfo_hugepages_size = 0; - struct stat st; - int has_sysfs_hugepages = 0; - const char *pagesize_env = getenv("HWLOC_DEBUG_PAGESIZE"); - int types = 2; - int err; - - err = hwloc_stat("/sys/kernel/mm/hugepages", &st, data->root_fd); - if (!err) { - types = 1 + st.st_nlink-2; - has_sysfs_hugepages = 1; - } - - if (topology->is_thissystem || pagesize_env) { - /* we cannot report any page_type info unless we have the page size. 
- * we'll take it either from the system if local, or from the debug env variable - */ - memory->page_types_len = types; - memory->page_types = calloc(types, sizeof(*memory->page_types)); - } - - if (topology->is_thissystem) { - /* Get the page and hugepage sizes from sysconf */ -#ifdef HAVE__SC_LARGE_PAGESIZE - memory->page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); -#endif - memory->page_types[0].size = data->pagesize; /* might be overwritten later by /proc/meminfo or sysfs */ - } - - hwloc_parse_meminfo_info(data, "/proc/meminfo", 0 /* no prefix */, - &memory->local_memory, - &meminfo_hugepages_count, &meminfo_hugepages_size, - memory->page_types == NULL); - - if (memory->page_types) { - uint64_t remaining_local_memory = memory->local_memory; - if (has_sysfs_hugepages) { - /* read from node%d/hugepages/hugepages-%skB/nr_hugepages */ - hwloc_parse_hugepages_info(data, "/sys/kernel/mm/hugepages", memory, &remaining_local_memory); - } else { - /* use what we found in meminfo */ - if (meminfo_hugepages_size) { - memory->page_types[1].size = meminfo_hugepages_size; - memory->page_types[1].count = meminfo_hugepages_count; - remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size; - } else { - memory->page_types_len = 1; - } - } - - if (pagesize_env) { - /* We cannot get the pagesize if not thissystem, use the env-given one to experience the code during make check */ - memory->page_types[0].size = strtoull(pagesize_env, NULL, 10); - /* If failed, use 4kB */ - if (!memory->page_types[0].size) - memory->page_types[0].size = 4096; - } - assert(memory->page_types[0].size); /* from sysconf if local or from the env */ - /* memory->page_types[1].size from sysconf if local, or from /proc/meminfo, or from sysfs, - * may be 0 if no hugepage support in the kernel */ - - memory->page_types[0].count = remaining_local_memory / memory->page_types[0].size; - } -} - -static void -hwloc_sysfs_node_meminfo_info(struct hwloc_topology *topology, - struct 
hwloc_linux_backend_data_s *data, - const char *syspath, int node, - struct hwloc_obj_memory_s *memory) -{ - char path[SYSFS_NUMA_NODE_PATH_LEN]; - char meminfopath[SYSFS_NUMA_NODE_PATH_LEN]; - uint64_t meminfo_hugepages_count = 0; - uint64_t meminfo_hugepages_size = 0; - struct stat st; - int has_sysfs_hugepages = 0; - int types = 2; - int err; - - sprintf(path, "%s/node%d/hugepages", syspath, node); - err = hwloc_stat(path, &st, data->root_fd); - if (!err) { - types = 1 + st.st_nlink-2; - has_sysfs_hugepages = 1; - } - - if (topology->is_thissystem) { - memory->page_types_len = types; - memory->page_types = malloc(types*sizeof(*memory->page_types)); - memset(memory->page_types, 0, types*sizeof(*memory->page_types)); - } - - sprintf(meminfopath, "%s/node%d/meminfo", syspath, node); - hwloc_parse_meminfo_info(data, meminfopath, - snprintf(NULL, 0, "Node %d ", node), - &memory->local_memory, - &meminfo_hugepages_count, NULL /* no hugepage size in node-specific meminfo */, - memory->page_types == NULL); - - if (memory->page_types) { - uint64_t remaining_local_memory = memory->local_memory; - if (has_sysfs_hugepages) { - /* read from node%d/hugepages/hugepages-%skB/nr_hugepages */ - hwloc_parse_hugepages_info(data, path, memory, &remaining_local_memory); - } else { - /* get hugepage size from machine-specific meminfo since there is no size in node-specific meminfo, - * hwloc_get_procfs_meminfo_info must have been called earlier */ - meminfo_hugepages_size = topology->levels[0][0]->memory.page_types[1].size; - /* use what we found in meminfo */ - if (meminfo_hugepages_size) { - memory->page_types[1].count = meminfo_hugepages_count; - memory->page_types[1].size = meminfo_hugepages_size; - remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size; - } else { - memory->page_types_len = 1; - } - } - /* update what's remaining as normal pages */ - memory->page_types[0].size = data->pagesize; - memory->page_types[0].count = remaining_local_memory / 
memory->page_types[0].size; - } -} - -static void -hwloc_parse_node_distance(const char *distancepath, unsigned nbnodes, float *distances, int fsroot_fd) -{ - char string[4096]; /* enough for hundreds of nodes */ - char *tmp, *next; - FILE * fd; - - fd = hwloc_fopen(distancepath, "r", fsroot_fd); - if (!fd) - return; - - if (!fgets(string, sizeof(string), fd)) { - fclose(fd); - return; - } - - tmp = string; - while (tmp) { - unsigned distance = strtoul(tmp, &next, 0); - if (next == tmp) - break; - *distances = (float) distance; - distances++; - nbnodes--; - if (!nbnodes) - break; - tmp = next+1; - } - - fclose(fd); -} - -static void -hwloc__get_dmi_id_one_info(struct hwloc_linux_backend_data_s *data, - hwloc_obj_t obj, - char *path, unsigned pathlen, - const char *dmi_name, const char *hwloc_name) -{ - char dmi_line[64]; - char *tmp; - FILE *fd; - - strcpy(path+pathlen, dmi_name); - fd = hwloc_fopen(path, "r", data->root_fd); - if (!fd) - return; - - dmi_line[0] = '\0'; - tmp = fgets(dmi_line, sizeof(dmi_line), fd); - fclose (fd); - - if (tmp && dmi_line[0] != '\0') { - tmp = strchr(dmi_line, '\n'); - if (tmp) - *tmp = '\0'; - hwloc_debug("found %s '%s'\n", hwloc_name, dmi_line); - hwloc_obj_add_info(obj, hwloc_name, dmi_line); - } -} - -static void -hwloc__get_dmi_id_info(struct hwloc_linux_backend_data_s *data, hwloc_obj_t obj) -{ - char path[128]; - unsigned pathlen; - DIR *dir; - - strcpy(path, "/sys/devices/virtual/dmi/id"); - dir = hwloc_opendir(path, data->root_fd); - if (dir) { - pathlen = 27; - } else { - strcpy(path, "/sys/class/dmi/id"); - dir = hwloc_opendir(path, data->root_fd); - if (dir) - pathlen = 17; - else - return; - } - closedir(dir); - - path[pathlen++] = '/'; - - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "product_name", "DMIProductName"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "product_version", "DMIProductVersion"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "product_serial", "DMIProductSerial"); - 
hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "product_uuid", "DMIProductUUID"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "board_vendor", "DMIBoardVendor"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "board_name", "DMIBoardName"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "board_version", "DMIBoardVersion"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "board_serial", "DMIBoardSerial"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "board_asset_tag", "DMIBoardAssetTag"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "chassis_vendor", "DMIChassisVendor"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "chassis_type", "DMIChassisType"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "chassis_version", "DMIChassisVersion"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "chassis_serial", "DMIChassisSerial"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "chassis_asset_tag", "DMIChassisAssetTag"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "bios_vendor", "DMIBIOSVendor"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "bios_version", "DMIBIOSVersion"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "bios_date", "DMIBIOSDate"); - hwloc__get_dmi_id_one_info(data, obj, path, pathlen, "sys_vendor", "DMISysVendor"); -} - -struct hwloc_firmware_dmi_mem_device_header { - unsigned char type; - unsigned char length; - unsigned char handle[2]; - unsigned char phy_mem_handle[2]; - unsigned char mem_err_handle[2]; - unsigned char tot_width[2]; - unsigned char dat_width[2]; - unsigned char size[2]; - unsigned char ff; - unsigned char dev_set; - unsigned char dev_loc_str_num; - unsigned char bank_loc_str_num; - unsigned char mem_type; - unsigned char type_detail[2]; - unsigned char speed[2]; - unsigned char manuf_str_num; - unsigned char serial_str_num; - unsigned char asset_tag_str_num; - unsigned char part_num_str_num; - /* don't include the 
following fields since we don't need them, - * some old implementations may miss them. - */ -}; - -static int check_dmi_entry(const char *buffer) -{ - /* reject empty strings */ - if (!*buffer) - return 0; - /* reject strings of spaces (at least Dell use this for empty memory slots) */ - if (strspn(buffer, " ") == strlen(buffer)) - return 0; - return 1; -} - -static void -hwloc__get_firmware_dmi_memory_info_one(struct hwloc_topology *topology, - unsigned idx, const char *path, FILE *fd, - struct hwloc_firmware_dmi_mem_device_header *header) -{ - unsigned slen; - char buffer[256]; /* enough for memory device strings, or at least for each of them */ - unsigned foff; /* offset in raw file */ - unsigned boff; /* offset in buffer read from raw file */ - unsigned i; - struct hwloc_obj_info_s *infos = NULL; - unsigned infos_count = 0; - hwloc_obj_t misc; - int foundinfo = 0; - - hwloc__add_info(&infos, &infos_count, "Type", "MemoryModule"); - - /* start after the header */ - foff = header->length; - i = 1; - while (1) { - /* read one buffer */ - if (fseek(fd, foff, SEEK_SET) < 0) - break; - if (!fgets(buffer, sizeof(buffer), fd)) - break; - /* read string at the beginning of the buffer */ - boff = 0; - while (1) { - /* stop on empty string */ - if (!buffer[boff]) - goto done; - /* stop if this string goes to the end of the buffer */ - slen = strlen(buffer+boff); - if (boff + slen+1 == sizeof(buffer)) - break; - /* string didn't get truncated, should be OK */ - if (i == header->manuf_str_num) { - if (check_dmi_entry(buffer+boff)) { - hwloc__add_info(&infos, &infos_count, "Vendor", buffer+boff); - foundinfo = 1; - } - } else if (i == header->serial_str_num) { - if (check_dmi_entry(buffer+boff)) { - hwloc__add_info(&infos, &infos_count, "SerialNumber", buffer+boff); - foundinfo = 1; - } - } else if (i == header->asset_tag_str_num) { - if (check_dmi_entry(buffer+boff)) { - hwloc__add_info(&infos, &infos_count, "AssetTag", buffer+boff); - foundinfo = 1; - } - } else if (i == 
header->part_num_str_num) { - if (check_dmi_entry(buffer+boff)) { - hwloc__add_info(&infos, &infos_count, "PartNumber", buffer+boff); - foundinfo = 1; - } - } else if (i == header->dev_loc_str_num) { - if (check_dmi_entry(buffer+boff)) { - hwloc__add_info(&infos, &infos_count, "DeviceLocation", buffer+boff); - /* only a location, not an actual info about the device */ - } - } else if (i == header->bank_loc_str_num) { - if (check_dmi_entry(buffer+boff)) { - hwloc__add_info(&infos, &infos_count, "BankLocation", buffer+boff); - /* only a location, not an actual info about the device */ - } - } else { - goto done; - } - /* next string in buffer */ - boff += slen+1; - i++; - } - /* couldn't read a single full string from that buffer, we're screwed */ - if (!boff) { - fprintf(stderr, "hwloc could read a DMI firmware entry #%u in %s\n", - i, path); - break; - } - /* reread buffer after previous string */ - foff += boff; - } - -done: - if (!foundinfo) { - /* found no actual info about the device. if there's only location info, the slot may be empty */ - goto out_with_infos; - } - - misc = hwloc_alloc_setup_object(HWLOC_OBJ_MISC, idx); - if (!misc) - goto out_with_infos; - - hwloc__move_infos(&misc->infos, &misc->infos_count, &infos, &infos_count); - /* FIXME: find a way to identify the corresponding NUMA node and attach these objects there. - * but it means we need to parse DeviceLocation=DIMM_B4 but these vary significantly - * with the vendor, and it's hard to be 100% sure 'B' is second socket. 
- * Examples at http://sourceforge.net/p/edac-utils/code/HEAD/tree/trunk/src/etc/labels.db - * or https://github.com/grondo/edac-utils/blob/master/src/etc/labels.db - */ - hwloc_insert_object_by_parent(topology, hwloc_get_root_obj(topology), misc); - return; - - out_with_infos: - hwloc__free_infos(infos, infos_count); -} - -static void -hwloc__get_firmware_dmi_memory_info(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data) -{ - char path[128]; - unsigned i; - - for(i=0; ; i++) { - FILE *fd; - struct hwloc_firmware_dmi_mem_device_header header; - int err; - - snprintf(path, sizeof(path), "/sys/firmware/dmi/entries/17-%u/raw", i); - fd = hwloc_fopen(path, "r", data->root_fd); - if (!fd) - break; - - err = fread(&header, sizeof(header), 1, fd); - if (err != 1) - break; - if (header.length < sizeof(header)) { - /* invalid, or too old entry/spec that doesn't contain what we need */ - fclose(fd); - break; - } - - hwloc__get_firmware_dmi_memory_info_one(topology, i, path, fd, &header); - - fclose(fd); - } -} - - -/*********************************** - ****** Device tree Discovery ****** - ***********************************/ - -/* Reads the entire file and returns bytes read if bytes_read != NULL - * Returned pointer can be freed by using free(). 
*/ -static void * -hwloc_read_raw(const char *p, const char *p1, size_t *bytes_read, int root_fd) -{ - char fname[256]; - char *ret = NULL; - struct stat fs; - int file = -1; - - snprintf(fname, sizeof(fname), "%s/%s", p, p1); - - file = hwloc_open(fname, root_fd); - if (-1 == file) { - goto out_no_close; - } - if (fstat(file, &fs)) { - goto out; - } - - ret = (char *) malloc(fs.st_size); - if (NULL != ret) { - ssize_t cb = read(file, ret, fs.st_size); - if (cb == -1) { - free(ret); - ret = NULL; - } else { - if (NULL != bytes_read) - *bytes_read = cb; - } - } - - out: - close(file); - out_no_close: - return ret; -} - -/* Reads the entire file and returns it as a 0-terminated string - * Returned pointer can be freed by using free(). */ -static char * -hwloc_read_str(const char *p, const char *p1, int root_fd) -{ - size_t cb = 0; - char *ret = hwloc_read_raw(p, p1, &cb, root_fd); - if ((NULL != ret) && (0 < cb) && (0 != ret[cb-1])) { - ret = realloc(ret, cb + 1); - ret[cb] = 0; - } - return ret; -} - -/* Reads first 32bit bigendian value */ -static ssize_t -hwloc_read_unit32be(const char *p, const char *p1, uint32_t *buf, int root_fd) -{ - size_t cb = 0; - uint32_t *tmp = hwloc_read_raw(p, p1, &cb, root_fd); - if (sizeof(*buf) != cb) { - errno = EINVAL; - free(tmp); /* tmp is either NULL or contains useless things */ - return -1; - } - *buf = htonl(*tmp); - free(tmp); - return sizeof(*buf); -} - -typedef struct { - unsigned int n, allocated; - struct { - hwloc_bitmap_t cpuset; - uint32_t phandle; - uint32_t l2_cache; - char *name; - } *p; -} device_tree_cpus_t; - -static void -add_device_tree_cpus_node(device_tree_cpus_t *cpus, hwloc_bitmap_t cpuset, - uint32_t l2_cache, uint32_t phandle, const char *name) -{ - if (cpus->n == cpus->allocated) { - if (!cpus->allocated) - cpus->allocated = 64; - else - cpus->allocated *= 2; - cpus->p = realloc(cpus->p, cpus->allocated * sizeof(cpus->p[0])); - } - cpus->p[cpus->n].phandle = phandle; - cpus->p[cpus->n].cpuset = (NULL == 
cpuset)?NULL:hwloc_bitmap_dup(cpuset); - cpus->p[cpus->n].l2_cache = l2_cache; - cpus->p[cpus->n].name = strdup(name); - ++cpus->n; -} - -/* Walks over the cache list in order to detect nested caches and CPU mask for each */ -static int -look_powerpc_device_tree_discover_cache(device_tree_cpus_t *cpus, - uint32_t phandle, unsigned int *level, hwloc_bitmap_t cpuset) -{ - unsigned int i; - int ret = -1; - if ((NULL == level) || (NULL == cpuset) || phandle == (uint32_t) -1) - return ret; - for (i = 0; i < cpus->n; ++i) { - if (phandle != cpus->p[i].l2_cache) - continue; - if (NULL != cpus->p[i].cpuset) { - hwloc_bitmap_or(cpuset, cpuset, cpus->p[i].cpuset); - ret = 0; - } else { - ++(*level); - if (0 == look_powerpc_device_tree_discover_cache(cpus, - cpus->p[i].phandle, level, cpuset)) - ret = 0; - } - } - return ret; -} - -static void -try__add_cache_from_device_tree_cpu(struct hwloc_topology *topology, - unsigned int level, hwloc_obj_cache_type_t type, - uint32_t cache_line_size, uint32_t cache_size, uint32_t cache_sets, - hwloc_bitmap_t cpuset) -{ - struct hwloc_obj *c = NULL; - - if (0 == cache_size) - return; - - c = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - c->attr->cache.depth = level; - c->attr->cache.linesize = cache_line_size; - c->attr->cache.size = cache_size; - c->attr->cache.type = type; - if (cache_sets == 1) - /* likely wrong, make it unknown */ - cache_sets = 0; - if (cache_sets && cache_line_size) - c->attr->cache.associativity = cache_size / (cache_sets * cache_line_size); - else - c->attr->cache.associativity = 0; - c->cpuset = hwloc_bitmap_dup(cpuset); - hwloc_debug_2args_bitmap("cache (%s) depth %d has cpuset %s\n", - type == HWLOC_OBJ_CACHE_UNIFIED ? "unified" : (type == HWLOC_OBJ_CACHE_DATA ? 
"data" : "instruction"), - level, c->cpuset); - hwloc_insert_object_by_cpuset(topology, c); -} - -static void -try_add_cache_from_device_tree_cpu(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data, - const char *cpu, unsigned int level, hwloc_bitmap_t cpuset) -{ - /* d-cache-block-size - ignore */ - /* d-cache-line-size - to read, in bytes */ - /* d-cache-sets - ignore */ - /* d-cache-size - to read, in bytes */ - /* i-cache, same for instruction */ - /* cache-unified only exist if data and instruction caches are unified */ - /* d-tlb-sets - ignore */ - /* d-tlb-size - ignore, always 0 on power6 */ - /* i-tlb-*, same */ - uint32_t d_cache_line_size = 0, d_cache_size = 0, d_cache_sets = 0; - uint32_t i_cache_line_size = 0, i_cache_size = 0, i_cache_sets = 0; - char unified_path[1024]; - struct stat statbuf; - int unified; - - snprintf(unified_path, sizeof(unified_path), "%s/cache-unified", cpu); - unified = (hwloc_stat(unified_path, &statbuf, data->root_fd) == 0); - - hwloc_read_unit32be(cpu, "d-cache-line-size", &d_cache_line_size, - data->root_fd); - hwloc_read_unit32be(cpu, "d-cache-size", &d_cache_size, - data->root_fd); - hwloc_read_unit32be(cpu, "d-cache-sets", &d_cache_sets, - data->root_fd); - hwloc_read_unit32be(cpu, "i-cache-line-size", &i_cache_line_size, - data->root_fd); - hwloc_read_unit32be(cpu, "i-cache-size", &i_cache_size, - data->root_fd); - hwloc_read_unit32be(cpu, "i-cache-sets", &i_cache_sets, - data->root_fd); - - if (!unified) - try__add_cache_from_device_tree_cpu(topology, level, HWLOC_OBJ_CACHE_INSTRUCTION, - i_cache_line_size, i_cache_size, i_cache_sets, cpuset); - try__add_cache_from_device_tree_cpu(topology, level, unified ? 
HWLOC_OBJ_CACHE_UNIFIED : HWLOC_OBJ_CACHE_DATA, - d_cache_line_size, d_cache_size, d_cache_sets, cpuset); -} - -/* - * Discovers L1/L2/L3 cache information on IBM PowerPC systems for old kernels (RHEL5.*) - * which provide NUMA nodes information without any details - */ -static void -look_powerpc_device_tree(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data) -{ - device_tree_cpus_t cpus; - const char ofroot[] = "/proc/device-tree/cpus"; - unsigned int i; - int root_fd = data->root_fd; - DIR *dt = hwloc_opendir(ofroot, root_fd); - struct dirent *dirent; - - if (NULL == dt) - return; - - /* only works for Power so far, and not useful on ARM */ - if (data->arch != HWLOC_LINUX_ARCH_POWER) - return; - - cpus.n = 0; - cpus.p = NULL; - cpus.allocated = 0; - - while (NULL != (dirent = readdir(dt))) { - char cpu[256]; - char *device_type; - uint32_t reg = -1, l2_cache = -1, phandle = -1; - - if ('.' == dirent->d_name[0]) - continue; - - snprintf(cpu, sizeof(cpu), "%s/%s", ofroot, dirent->d_name); - - device_type = hwloc_read_str(cpu, "device_type", root_fd); - if (NULL == device_type) - continue; - - hwloc_read_unit32be(cpu, "reg", ®, root_fd); - if (hwloc_read_unit32be(cpu, "next-level-cache", &l2_cache, root_fd) == -1) - hwloc_read_unit32be(cpu, "l2-cache", &l2_cache, root_fd); - if (hwloc_read_unit32be(cpu, "phandle", &phandle, root_fd) == -1) - if (hwloc_read_unit32be(cpu, "ibm,phandle", &phandle, root_fd) == -1) - hwloc_read_unit32be(cpu, "linux,phandle", &phandle, root_fd); - - if (0 == strcmp(device_type, "cache")) { - add_device_tree_cpus_node(&cpus, NULL, l2_cache, phandle, dirent->d_name); - } - else if (0 == strcmp(device_type, "cpu")) { - /* Found CPU */ - hwloc_bitmap_t cpuset = NULL; - size_t cb = 0; - uint32_t *threads = hwloc_read_raw(cpu, "ibm,ppc-interrupt-server#s", &cb, root_fd); - uint32_t nthreads = cb / sizeof(threads[0]); - - if (NULL != threads) { - cpuset = hwloc_bitmap_alloc(); - for (i = 0; i < nthreads; ++i) { - if 
(hwloc_bitmap_isset(topology->levels[0][0]->complete_cpuset, ntohl(threads[i]))) - hwloc_bitmap_set(cpuset, ntohl(threads[i])); - } - free(threads); - } else if ((unsigned int)-1 != reg) { - /* Doesn't work on ARM because cpu "reg" do not start at 0. - * We know the first cpu "reg" is the lowest. The others are likely - * in order assuming the device-tree shows objects in order. - */ - cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(cpuset, reg); - } - - if (NULL == cpuset) { - hwloc_debug("%s has no \"reg\" property, skipping\n", cpu); - } else { - struct hwloc_obj *core = NULL; - add_device_tree_cpus_node(&cpus, cpuset, l2_cache, phandle, dirent->d_name); - - /* Add core */ - core = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, reg); - core->cpuset = hwloc_bitmap_dup(cpuset); - hwloc_insert_object_by_cpuset(topology, core); - - /* Add L1 cache */ - try_add_cache_from_device_tree_cpu(topology, data, cpu, 1, cpuset); - - hwloc_bitmap_free(cpuset); - } - } - free(device_type); - } - closedir(dt); - - /* No cores and L2 cache were found, exiting */ - if (0 == cpus.n) { - hwloc_debug("No cores and L2 cache were found in %s, exiting\n", ofroot); - return; - } - -#ifdef HWLOC_DEBUG - for (i = 0; i < cpus.n; ++i) { - hwloc_debug("%i: %s ibm,phandle=%08X l2_cache=%08X ", - i, cpus.p[i].name, cpus.p[i].phandle, cpus.p[i].l2_cache); - if (NULL == cpus.p[i].cpuset) { - hwloc_debug("%s\n", "no cpuset"); - } else { - hwloc_debug_bitmap("cpuset %s\n", cpus.p[i].cpuset); - } - } -#endif - - /* Scan L2/L3/... 
caches */ - for (i = 0; i < cpus.n; ++i) { - unsigned int level = 2; - hwloc_bitmap_t cpuset; - /* Skip real CPUs */ - if (NULL != cpus.p[i].cpuset) - continue; - - /* Calculate cache level and CPU mask */ - cpuset = hwloc_bitmap_alloc(); - if (0 == look_powerpc_device_tree_discover_cache(&cpus, - cpus.p[i].phandle, &level, cpuset)) { - char cpu[256]; - snprintf(cpu, sizeof(cpu), "%s/%s", ofroot, cpus.p[i].name); - try_add_cache_from_device_tree_cpu(topology, data, cpu, level, cpuset); - } - hwloc_bitmap_free(cpuset); - } - - /* Do cleanup */ - for (i = 0; i < cpus.n; ++i) { - hwloc_bitmap_free(cpus.p[i].cpuset); - free(cpus.p[i].name); - } - free(cpus.p); -} - -/* Try to add memory-side caches for KNL. - * Returns 0 on success and -1 otherwise */ -static int hwloc_linux_try_add_knl_mcdram_caches(hwloc_topology_t topology, struct hwloc_linux_backend_data_s *data, hwloc_obj_t *nodes, unsigned nbnodes) -{ - char *knl_cache_file; - long long int cache_size = -1; - int associativity = -1; - int inclusiveness = -1; - int line_size = -1; - unsigned i; - FILE *f; - char buffer[512] = {0}; - char *data_beg = NULL; - char *data_end = NULL; - - if (asprintf(&knl_cache_file, "%s/knl_memoryside_cache", data->dumped_hwdata_dirname) < 0) - return -1; - - hwloc_debug("Reading knl cache data from: %s\n", knl_cache_file); - f = hwloc_fopen(knl_cache_file, "r", data->root_fd); - if (!f) { - hwloc_debug("Unable to open KNL data file `%s' (%s)\n", knl_cache_file, strerror(errno)); - free(knl_cache_file); - return -1; - } - free(knl_cache_file); - - data_beg = &buffer[0]; - data_end = data_beg + fread(buffer, 1, sizeof(buffer), f); - - /* file must start with version information, only 1 accepted for now */ - if (strncmp("version: 1\n", data_beg, strlen("version: 1\n"))) { - fprintf(stderr, "Invalid knl_memoryside_cache header, expected \"version: 1\".\n"); - fclose(f); - return -1; - } - data_beg += strlen("version: 1\n"); - - while (data_beg < data_end) { - char *line_end = 
strstr(data_beg, "\n"); - if (!line_end) - break; - if (!strncmp("cache_size:", data_beg, strlen("cache_size"))) { - sscanf(data_beg, "cache_size: %lld", &cache_size); - hwloc_debug("read cache_size=%lld\n", cache_size); - } else if (!strncmp("line_size:", data_beg, strlen("line_size:"))) { - sscanf(data_beg, "line_size: %d", &line_size); - hwloc_debug("read line_size=%d\n", line_size); - } else if (!strncmp("inclusiveness:", data_beg, strlen("inclusiveness:"))) { - sscanf(data_beg, "inclusiveness: %d", &inclusiveness); - hwloc_debug("read inclusiveness=%d\n", inclusiveness); - } else if (!strncmp("associativity:", data_beg, strlen("associativity:"))) { - sscanf(data_beg, "associativity: %d\n", &associativity); - hwloc_debug("read associativity=%d\n", associativity); - } - data_beg += line_end - data_beg +1; - } - - fclose(f); - - if (line_size == -1 || cache_size == -1 || associativity == -1 || inclusiveness == -1) { - hwloc_debug("Incorrect file format line_size=%d cache_size=%lld associativity=%d inclusiveness=%d\n", - line_size, cache_size, associativity, inclusiveness); - return -1; - } - - for(i=0; icpuset)) - /* one L3 per DDR, none for MCDRAM nodes */ - continue; - - cache = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - if (!cache) - return -1; - - cache->attr->cache.depth = 3; - cache->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - cache->attr->cache.associativity = associativity; - hwloc_obj_add_info(cache, "Inclusive", inclusiveness ? 
"1" : "0"); - cache->attr->cache.size = cache_size; - cache->attr->cache.linesize = line_size; - cache->cpuset = hwloc_bitmap_dup(nodes[i]->cpuset); - hwloc_obj_add_info(cache, "Type", "MemorySideCache"); - hwloc_insert_object_by_cpuset(topology, cache); - } - return 0; -} - - - -/************************************** - ****** Sysfs Topology Discovery ****** - **************************************/ - -static int -look_sysfsnode(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data, - const char *path, unsigned *found) -{ - unsigned osnode; - unsigned nbnodes = 0; - DIR *dir; - struct dirent *dirent; - hwloc_bitmap_t nodeset; - - *found = 0; - - /* Get the list of nodes first */ - dir = hwloc_opendir(path, data->root_fd); - if (dir) - { - nodeset = hwloc_bitmap_alloc(); - while ((dirent = readdir(dir)) != NULL) - { - if (strncmp(dirent->d_name, "node", 4)) - continue; - osnode = strtoul(dirent->d_name+4, NULL, 0); - hwloc_bitmap_set(nodeset, osnode); - nbnodes++; - } - closedir(dir); - } - else - return -1; - - if (!nbnodes || (nbnodes == 1 && !data->is_knl)) { /* always keep NUMA for KNL, or configs might look too different */ - hwloc_bitmap_free(nodeset); - return 0; - } - - /* For convenience, put these declarations inside a block. */ - - { - hwloc_obj_t * nodes = calloc(nbnodes, sizeof(hwloc_obj_t)); - unsigned *indexes = calloc(nbnodes, sizeof(unsigned)); - float * distances = NULL; - int failednodes = 0; - unsigned index_; - - if (NULL == nodes || NULL == indexes) { - free(nodes); - free(indexes); - hwloc_bitmap_free(nodeset); - nbnodes = 0; - goto out; - } - - /* Unsparsify node indexes. - * We'll need them later because Linux groups sparse distances - * and keeps them in order in the sysfs distance files. - * It'll simplify things in the meantime. 
- */ - index_ = 0; - hwloc_bitmap_foreach_begin (osnode, nodeset) { - indexes[index_] = osnode; - index_++; - } hwloc_bitmap_foreach_end(); - hwloc_bitmap_free(nodeset); - -#ifdef HWLOC_DEBUG - hwloc_debug("%s", "NUMA indexes: "); - for (index_ = 0; index_ < nbnodes; index_++) { - hwloc_debug(" %u", indexes[index_]); - } - hwloc_debug("%s", "\n"); -#endif - - /* Create NUMA objects */ - for (index_ = 0; index_ < nbnodes; index_++) { - char nodepath[SYSFS_NUMA_NODE_PATH_LEN]; - hwloc_bitmap_t cpuset; - hwloc_obj_t node, res_obj; - int annotate; - - osnode = indexes[index_]; - - sprintf(nodepath, "%s/node%u/cpumap", path, osnode); - cpuset = hwloc_parse_cpumap(nodepath, data->root_fd); - if (!cpuset) { - /* This NUMA object won't be inserted, we'll ignore distances */ - failednodes++; - continue; - } - - node = hwloc_get_numanode_obj_by_os_index(topology, osnode); - annotate = (node != NULL); - if (!annotate) { - /* create a new node */ - node = hwloc_alloc_setup_object(HWLOC_OBJ_NUMANODE, osnode); - node->cpuset = cpuset; - node->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(node->nodeset, osnode); - } - hwloc_sysfs_node_meminfo_info(topology, data, path, osnode, &node->memory); - - hwloc_debug_1arg_bitmap("os node %u has cpuset %s\n", - osnode, node->cpuset); - - if (annotate) { - nodes[index_] = node; - } else { - res_obj = hwloc_insert_object_by_cpuset(topology, node); - if (node == res_obj) { - nodes[index_] = node; - } else { - /* We got merged somehow, could be a buggy BIOS reporting wrong NUMA node cpuset. - * This object disappeared, we'll ignore distances */ - failednodes++; - } - } - } - - if (!failednodes && data->is_knl) - hwloc_linux_try_add_knl_mcdram_caches(topology, data, nodes, nbnodes); - - if (failednodes) { - /* failed to read/create some nodes, don't bother reading/fixing - * a distance matrix that would likely be wrong anyway. 
- */ - nbnodes -= failednodes; - } else if (nbnodes > 1) { - distances = calloc(nbnodes*nbnodes, sizeof(float)); - } - - if (NULL == distances) { - free(nodes); - free(indexes); - goto out; - } - - /* Get actual distances now */ - for (index_ = 0; index_ < nbnodes; index_++) { - char nodepath[SYSFS_NUMA_NODE_PATH_LEN]; - - osnode = indexes[index_]; - - /* Linux nodeX/distance file contains distance from X to other localities (from ACPI SLIT table or so), - * store them in slots X*N...X*N+N-1 */ - sprintf(nodepath, "%s/node%u/distance", path, osnode); - hwloc_parse_node_distance(nodepath, nbnodes, distances+index_*nbnodes, data->root_fd); - } - - if (data->is_knl) { - char *env = getenv("HWLOC_KNL_NUMA_QUIRK"); - if (!(env && !atoi(env)) && nbnodes>=2) { /* SNC2 or SNC4, with 0 or 2/4 MCDRAM, and 0-4 DDR nodes */ - unsigned i, j, closest; - for(i=0; icpuset)) - /* nodes with CPU, that's DDR, skip it */ - continue; - hwloc_obj_add_info(nodes[i], "Type", "MCDRAM"); - - /* DDR is the closest node with CPUs */ - closest = (unsigned)-1; - for(j=0; jcpuset)) - /* nodes without CPU, that's another MCDRAM, skip it */ - continue; - if (closest == (unsigned)-1 || distances[i*nbnodes+j]cpuset = hwloc_bitmap_dup(nodes[i]->cpuset); - cluster->nodeset = hwloc_bitmap_dup(nodes[i]->nodeset); - hwloc_bitmap_or(cluster->cpuset, cluster->cpuset, nodes[closest]->cpuset); - hwloc_bitmap_or(cluster->nodeset, cluster->nodeset, nodes[closest]->nodeset); - hwloc_obj_add_info(cluster, "Type", "Cluster"); - hwloc_insert_object_by_cpuset(topology, cluster); - } - } - /* drop the distance matrix, it contradicts the above NUMA layout groups */ - free(distances); - free(nodes); - free(indexes); - goto out; - } - } - - hwloc_distances_set(topology, HWLOC_OBJ_NUMANODE, nbnodes, indexes, nodes, distances, 0 /* OS cannot force */); - } - - out: - *found = nbnodes; - return 0; -} - -/* Look at Linux' /sys/devices/system/cpu/cpu%d/topology/ */ -static int -look_sysfscpu(struct hwloc_topology *topology, 
- struct hwloc_linux_backend_data_s *data, - const char *path, - struct hwloc_linux_cpuinfo_proc * cpuinfo_Lprocs, unsigned cpuinfo_numprocs) -{ - hwloc_bitmap_t cpuset; /* Set of cpus for which we have topology information */ -#define CPU_TOPOLOGY_STR_LEN 128 - char str[CPU_TOPOLOGY_STR_LEN]; - DIR *dir; - int i,j; - FILE *fd; - unsigned caches_added, merge_buggy_core_siblings; - hwloc_obj_t packages = NULL; /* temporary list of packages before actual insert in the tree */ - int threadwithcoreid = -1; /* we don't know yet if threads have their own coreids within thread_siblings */ - - /* fill the cpuset of interesting cpus */ - dir = hwloc_opendir(path, data->root_fd); - if (!dir) - return -1; - else { - struct dirent *dirent; - cpuset = hwloc_bitmap_alloc(); - - while ((dirent = readdir(dir)) != NULL) { - unsigned long cpu; - char online[2]; - - if (strncmp(dirent->d_name, "cpu", 3)) - continue; - cpu = strtoul(dirent->d_name+3, NULL, 0); - - /* Maybe we don't have topology information but at least it exists */ - hwloc_bitmap_set(topology->levels[0][0]->complete_cpuset, cpu); - - /* check whether this processor is online */ - sprintf(str, "%s/cpu%lu/online", path, cpu); - fd = hwloc_fopen(str, "r", data->root_fd); - if (fd) { - if (fgets(online, sizeof(online), fd)) { - fclose(fd); - if (atoi(online)) { - hwloc_debug("os proc %lu is online\n", cpu); - } else { - hwloc_debug("os proc %lu is offline\n", cpu); - hwloc_bitmap_clr(topology->levels[0][0]->online_cpuset, cpu); - } - } else { - fclose(fd); - } - } - - /* check whether the kernel exports topology information for this cpu */ - sprintf(str, "%s/cpu%lu/topology", path, cpu); - if (hwloc_access(str, X_OK, data->root_fd) < 0 && errno == ENOENT) { - hwloc_debug("os proc %lu has no accessible %s/cpu%lu/topology\n", - cpu, path, cpu); - continue; - } - - hwloc_bitmap_set(cpuset, cpu); - } - closedir(dir); - } - - topology->support.discovery->pu = 1; - hwloc_debug_1arg_bitmap("found %d cpu topologies, cpuset 
%s\n", - hwloc_bitmap_weight(cpuset), cpuset); - - merge_buggy_core_siblings = (data->arch == HWLOC_LINUX_ARCH_X86); - caches_added = 0; - hwloc_bitmap_foreach_begin(i, cpuset) - { - hwloc_bitmap_t packageset, coreset, bookset, threadset; - unsigned mypackageid, mycoreid, mybookid; - - /* look at the package */ - mypackageid = 0; /* shut-up the compiler */ - sprintf(str, "%s/cpu%d/topology/physical_package_id", path, i); - hwloc_parse_sysfs_unsigned(str, &mypackageid, data->root_fd); - - sprintf(str, "%s/cpu%d/topology/core_siblings", path, i); - packageset = hwloc_parse_cpumap(str, data->root_fd); - if (packageset && hwloc_bitmap_first(packageset) == i) { - /* first cpu in this package, add the package */ - struct hwloc_obj *package; - - if (merge_buggy_core_siblings) { - /* check for another package with same physical_package_id */ - hwloc_obj_t curpackage = packages; - while (curpackage) { - if (curpackage->os_index == mypackageid) { - /* found another package with same physical_package_id but different core_siblings. - * looks like a buggy kernel on Intel Xeon E5 v3 processor with two rings. - * merge these core_siblings to extend the existing first package object. 
- */ - static int reported = 0; - if (!reported && !hwloc_hide_errors()) { - char *a, *b; - hwloc_bitmap_asprintf(&a, curpackage->cpuset); - hwloc_bitmap_asprintf(&b, packageset); - fprintf(stderr, "****************************************************************************\n"); - fprintf(stderr, "* hwloc %s has detected buggy sysfs package information: Two packages have\n", HWLOC_VERSION); - fprintf(stderr, "* the same physical package id %u but different core_siblings %s and %s\n", - mypackageid, a, b); - fprintf(stderr, "* hwloc is merging these packages into a single one assuming your Linux kernel\n"); - fprintf(stderr, "* does not support this processor correctly.\n"); - fprintf(stderr, "* You may hide this warning by setting HWLOC_HIDE_ERRORS=1 in the environment.\n"); - fprintf(stderr, "*\n"); - fprintf(stderr, "* If hwloc does not report the right number of packages,\n"); - fprintf(stderr, "* please report this error message to the hwloc user's mailing list,\n"); - fprintf(stderr, "* along with the output+tarball generated by the hwloc-gather-topology script.\n"); - fprintf(stderr, "****************************************************************************\n"); - reported = 1; - free(a); - free(b); - } - hwloc_bitmap_or(curpackage->cpuset, curpackage->cpuset, packageset); - goto package_done; - } - curpackage = curpackage->next_cousin; - } - } - - /* no package with same physical_package_id, create a new one */ - package = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, mypackageid); - package->cpuset = packageset; - hwloc_debug_1arg_bitmap("os package %u has cpuset %s\n", - mypackageid, packageset); - /* add cpuinfo */ - if (cpuinfo_Lprocs) { - for(j=0; j<(int) cpuinfo_numprocs; j++) - if ((int) cpuinfo_Lprocs[j].Pproc == i) { - hwloc__move_infos(&package->infos, &package->infos_count, - &cpuinfo_Lprocs[j].infos, &cpuinfo_Lprocs[j].infos_count); - } - } - /* insert in a temporary list in case we have to modify the cpuset by merging other core_siblings 
later. - * we'll actually insert the tree at the end of the entire sysfs cpu loop. - */ - package->next_cousin = packages; - packages = package; - - packageset = NULL; /* don't free it */ - } -package_done: - hwloc_bitmap_free(packageset); - - /* look at the core */ - mycoreid = 0; /* shut-up the compiler */ - sprintf(str, "%s/cpu%d/topology/core_id", path, i); - hwloc_parse_sysfs_unsigned(str, &mycoreid, data->root_fd); - - sprintf(str, "%s/cpu%d/topology/thread_siblings", path, i); - coreset = hwloc_parse_cpumap(str, data->root_fd); - - if (coreset) { - if (hwloc_bitmap_weight(coreset) > 1 && threadwithcoreid == -1) { - /* check if this is hyper-threading or different coreids */ - unsigned siblingid, siblingcoreid; - siblingid = hwloc_bitmap_first(coreset); - if (siblingid == (unsigned) i) - siblingid = hwloc_bitmap_next(coreset, i); - siblingcoreid = mycoreid; - sprintf(str, "%s/cpu%d/topology/core_id", path, siblingid); - hwloc_parse_sysfs_unsigned(str, &siblingcoreid, data->root_fd); - threadwithcoreid = (siblingcoreid != mycoreid); - } - if (hwloc_bitmap_first(coreset) == i || threadwithcoreid) { - /* regular core */ - struct hwloc_obj *core = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, mycoreid); - if (threadwithcoreid) - /* amd multicore compute-unit, create one core per thread */ - hwloc_bitmap_only(coreset, i); - core->cpuset = coreset; - hwloc_debug_1arg_bitmap("os core %u has cpuset %s\n", - mycoreid, core->cpuset); - hwloc_insert_object_by_cpuset(topology, core); - coreset = NULL; /* don't free it */ - } - hwloc_bitmap_free(coreset); - } - - /* look at the books */ - mybookid = 0; /* shut-up the compiler */ - sprintf(str, "%s/cpu%d/topology/book_id", path, i); - if (hwloc_parse_sysfs_unsigned(str, &mybookid, data->root_fd) == 0) { - - sprintf(str, "%s/cpu%d/topology/book_siblings", path, i); - bookset = hwloc_parse_cpumap(str, data->root_fd); - if (bookset && hwloc_bitmap_first(bookset) == i) { - struct hwloc_obj *book = 
hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, mybookid); - book->cpuset = bookset; - hwloc_debug_1arg_bitmap("os book %u has cpuset %s\n", - mybookid, bookset); - hwloc_obj_add_info(book, "Type", "Book"); - hwloc_insert_object_by_cpuset(topology, book); - bookset = NULL; /* don't free it */ - } - hwloc_bitmap_free(bookset); - } - - { - /* look at the thread */ - struct hwloc_obj *thread = hwloc_alloc_setup_object(HWLOC_OBJ_PU, i); - threadset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(threadset, i); - thread->cpuset = threadset; - hwloc_debug_1arg_bitmap("thread %d has cpuset %s\n", - i, threadset); - hwloc_insert_object_by_cpuset(topology, thread); - } - - /* look at the caches */ - for(j=0; j<10; j++) { -#define SHARED_CPU_MAP_STRLEN 128 - char mappath[SHARED_CPU_MAP_STRLEN]; - char str2[20]; /* enough for a level number (one digit) or a type (Data/Instruction/Unified) */ - hwloc_bitmap_t cacheset; - unsigned long kB = 0; - unsigned linesize = 0; - unsigned sets = 0, lines_per_tag = 1; - int depth; /* 0 for L1, .... 
*/ - hwloc_obj_cache_type_t type = HWLOC_OBJ_CACHE_UNIFIED; /* default */ - - /* get the cache level depth */ - sprintf(mappath, "%s/cpu%d/cache/index%d/level", path, i, j); - fd = hwloc_fopen(mappath, "r", data->root_fd); - if (fd) { - char *res = fgets(str2,sizeof(str2), fd); - fclose(fd); - if (res) - depth = strtoul(str2, NULL, 10)-1; - else - continue; - } else - continue; - - /* cache type */ - sprintf(mappath, "%s/cpu%d/cache/index%d/type", path, i, j); - fd = hwloc_fopen(mappath, "r", data->root_fd); - if (fd) { - if (fgets(str2, sizeof(str2), fd)) { - fclose(fd); - if (!strncmp(str2, "Data", 4)) - type = HWLOC_OBJ_CACHE_DATA; - else if (!strncmp(str2, "Unified", 7)) - type = HWLOC_OBJ_CACHE_UNIFIED; - else if (!strncmp(str2, "Instruction", 11)) - type = HWLOC_OBJ_CACHE_INSTRUCTION; - else - continue; - } else { - fclose(fd); - continue; - } - } else - continue; - - /* get the cache size */ - sprintf(mappath, "%s/cpu%d/cache/index%d/size", path, i, j); - fd = hwloc_fopen(mappath, "r", data->root_fd); - if (fd) { - if (fgets(str2,sizeof(str2), fd)) - kB = atol(str2); /* in kB */ - fclose(fd); - } - /* KNL reports L3 with size=0 and full cpuset in cpuid. - * Let hwloc_linux_try_add_knl_mcdram_cache() detect it better. - */ - if (!kB && depth == 2 && data->is_knl) - continue; - - /* get the line size */ - sprintf(mappath, "%s/cpu%d/cache/index%d/coherency_line_size", path, i, j); - fd = hwloc_fopen(mappath, "r", data->root_fd); - if (fd) { - if (fgets(str2,sizeof(str2), fd)) - linesize = atol(str2); /* in bytes */ - fclose(fd); - } - - /* get the number of sets and lines per tag. - * don't take the associativity directly in "ways_of_associativity" because - * some archs (ia64, ppc) put 0 there when fully-associative, while others (x86) put something like -1 there. 
- */ - sprintf(mappath, "%s/cpu%d/cache/index%d/number_of_sets", path, i, j); - fd = hwloc_fopen(mappath, "r", data->root_fd); - if (fd) { - if (fgets(str2,sizeof(str2), fd)) - sets = atol(str2); - fclose(fd); - } - sprintf(mappath, "%s/cpu%d/cache/index%d/physical_line_partition", path, i, j); - fd = hwloc_fopen(mappath, "r", data->root_fd); - if (fd) { - if (fgets(str2,sizeof(str2), fd)) - lines_per_tag = atol(str2); - fclose(fd); - } - - sprintf(mappath, "%s/cpu%d/cache/index%d/shared_cpu_map", path, i, j); - cacheset = hwloc_parse_cpumap(mappath, data->root_fd); - if (cacheset) { - if (hwloc_bitmap_iszero(cacheset)) { - /* ia64 returning empty L3 and L2i? use the core set instead */ - hwloc_bitmap_free(cacheset); - sprintf(mappath, "%s/cpu%d/topology/thread_siblings", path, i); - cacheset = hwloc_parse_cpumap(mappath, data->root_fd); - } - - if (hwloc_bitmap_first(cacheset) == i) { - /* first cpu in this cache, add the cache */ - struct hwloc_obj *cache = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, -1); - cache->attr->cache.size = kB << 10; - cache->attr->cache.depth = depth+1; - cache->attr->cache.linesize = linesize; - cache->attr->cache.type = type; - if (!linesize || !lines_per_tag || !sets) - cache->attr->cache.associativity = 0; /* unknown */ - else if (sets == 1) - cache->attr->cache.associativity = 0; /* likely wrong, make it unknown */ - else - cache->attr->cache.associativity = (kB << 10) / linesize / lines_per_tag / sets; - cache->cpuset = cacheset; - hwloc_debug_1arg_bitmap("cache depth %d has cpuset %s\n", - depth, cacheset); - hwloc_insert_object_by_cpuset(topology, cache); - cacheset = NULL; /* don't free it */ - ++caches_added; - } - } - hwloc_bitmap_free(cacheset); - } - } - hwloc_bitmap_foreach_end(); - - /* actually insert in the tree now that package cpusets have been fixed-up */ - while (packages) { - hwloc_obj_t next = packages->next_cousin; - packages->next_cousin = NULL; - hwloc_insert_object_by_cpuset(topology, packages); - packages = 
next; - } - - if (0 == caches_added) - look_powerpc_device_tree(topology, data); - - hwloc_bitmap_free(cpuset); - - return 0; -} - - - -/**************************************** - ****** cpuinfo Topology Discovery ****** - ****************************************/ - -static int -hwloc_linux_parse_cpuinfo_x86(const char *prefix, const char *value, - struct hwloc_obj_info_s **infos, unsigned *infos_count, - int is_global __hwloc_attribute_unused) -{ - if (!strcmp("vendor_id", prefix)) { - hwloc__add_info(infos, infos_count, "CPUVendor", value); - } else if (!strcmp("model name", prefix)) { - hwloc__add_info(infos, infos_count, "CPUModel", value); - } else if (!strcmp("model", prefix)) { - hwloc__add_info(infos, infos_count, "CPUModelNumber", value); - } else if (!strcmp("cpu family", prefix)) { - hwloc__add_info(infos, infos_count, "CPUFamilyNumber", value); - } else if (!strcmp("stepping", prefix)) { - hwloc__add_info(infos, infos_count, "CPUStepping", value); - } - return 0; -} - -static int -hwloc_linux_parse_cpuinfo_ia64(const char *prefix, const char *value, - struct hwloc_obj_info_s **infos, unsigned *infos_count, - int is_global __hwloc_attribute_unused) -{ - if (!strcmp("vendor", prefix)) { - hwloc__add_info(infos, infos_count, "CPUVendor", value); - } else if (!strcmp("model name", prefix)) { - hwloc__add_info(infos, infos_count, "CPUModel", value); - } else if (!strcmp("model", prefix)) { - hwloc__add_info(infos, infos_count, "CPUModelNumber", value); - } else if (!strcmp("family", prefix)) { - hwloc__add_info(infos, infos_count, "CPUFamilyNumber", value); - } - return 0; -} - -static int -hwloc_linux_parse_cpuinfo_arm(const char *prefix, const char *value, - struct hwloc_obj_info_s **infos, unsigned *infos_count, - int is_global __hwloc_attribute_unused) -{ - if (!strcmp("Processor", prefix) /* old kernels with one Processor header */ - || !strcmp("model name", prefix) /* new kernels with one model name per core */) { - hwloc__add_info(infos, infos_count, 
"CPUModel", value); - } else if (!strcmp("CPU implementer", prefix)) { - hwloc__add_info(infos, infos_count, "CPUImplementer", value); - } else if (!strcmp("CPU architecture", prefix)) { - hwloc__add_info(infos, infos_count, "CPUArchitecture", value); - } else if (!strcmp("CPU variant", prefix)) { - hwloc__add_info(infos, infos_count, "CPUVariant", value); - } else if (!strcmp("CPU part", prefix)) { - hwloc__add_info(infos, infos_count, "CPUPart", value); - } else if (!strcmp("CPU revision", prefix)) { - hwloc__add_info(infos, infos_count, "CPURevision", value); - } else if (!strcmp("Hardware", prefix)) { - hwloc__add_info(infos, infos_count, "HardwareName", value); - } else if (!strcmp("Revision", prefix)) { - hwloc__add_info(infos, infos_count, "HardwareRevision", value); - } else if (!strcmp("Serial", prefix)) { - hwloc__add_info(infos, infos_count, "HardwareSerial", value); - } - return 0; -} - -static int -hwloc_linux_parse_cpuinfo_ppc(const char *prefix, const char *value, - struct hwloc_obj_info_s **infos, unsigned *infos_count, - int is_global) -{ - /* common fields */ - if (!strcmp("cpu", prefix)) { - hwloc__add_info(infos, infos_count, "CPUModel", value); - } else if (!strcmp("platform", prefix)) { - hwloc__add_info(infos, infos_count, "PlatformName", value); - } else if (!strcmp("model", prefix)) { - hwloc__add_info(infos, infos_count, "PlatformModel", value); - } - /* platform-specific fields */ - else if (!strcasecmp("vendor", prefix)) { - hwloc__add_info(infos, infos_count, "PlatformVendor", value); - } else if (!strcmp("Board ID", prefix)) { - hwloc__add_info(infos, infos_count, "PlatformBoardID", value); - } else if (!strcmp("Board", prefix) - || !strcasecmp("Machine", prefix)) { - /* machine and board are similar (and often more precise) than model above */ - char **valuep = hwloc__find_info_slot(infos, infos_count, "PlatformModel"); - if (*valuep) - free(*valuep); - *valuep = strdup(value); - } else if (!strcasecmp("Revision", prefix) - || 
!strcmp("Hardware rev", prefix)) { - hwloc__add_info(infos, infos_count, is_global ? "PlatformRevision" : "CPURevision", value); - } else if (!strcmp("SVR", prefix)) { - hwloc__add_info(infos, infos_count, "SystemVersionRegister", value); - } else if (!strcmp("PVR", prefix)) { - hwloc__add_info(infos, infos_count, "ProcessorVersionRegister", value); - } - /* don't match 'board*' because there's also "board l2" on some platforms */ - return 0; -} - -/* - * avr32: "chip type\t:" => OK - * blackfin: "model name\t:" => OK - * h8300: "CPU:" => OK - * m68k: "CPU:" => OK - * mips: "cpu model\t\t:" => OK - * openrisc: "CPU:" => OK - * sparc: "cpu\t\t:" => OK - * tile: "model name\t:" => OK - * unicore32: "Processor\t:" => OK - * alpha: "cpu\t\t\t: Alpha" + "cpu model\t\t:" => "cpu" overwritten by "cpu model", no processor indexes - * cris: "cpu\t\t:" + "cpu model\t:" => only "cpu" - * frv: "CPU-Core:" + "CPU:" => only "CPU" - * mn10300: "cpu core :" + "model name :" => only "model name" - * parisc: "cpu family\t:" + "cpu\t\t:" => only "cpu" - * - * not supported because of conflicts with other arch minor lines: - * m32r: "cpu family\t:" => KO (adding "cpu family" would break "blackfin") - * microblaze: "CPU-Family:" => KO - * sh: "cpu family\t:" + "cpu type\t:" => KO - * xtensa: "model\t\t:" => KO - */ -static int -hwloc_linux_parse_cpuinfo_generic(const char *prefix, const char *value, - struct hwloc_obj_info_s **infos, unsigned *infos_count, - int is_global __hwloc_attribute_unused) -{ - if (!strcmp("model name", prefix) - || !strcmp("Processor", prefix) - || !strcmp("chip type", prefix) - || !strcmp("cpu model", prefix) - || !strcasecmp("cpu", prefix)) { - /* keep the last one, assume it's more precise than the first one. - * we should have the Architecture keypair for basic information anyway. 
- */ - char **valuep = hwloc__find_info_slot(infos, infos_count, "CPUModel"); - if (*valuep) - free(*valuep); - *valuep = strdup(value); - } - return 0; -} - -/* Lprocs_p set to NULL unless returns > 0 */ -static int -hwloc_linux_parse_cpuinfo(struct hwloc_linux_backend_data_s *data, - const char *path, - struct hwloc_linux_cpuinfo_proc ** Lprocs_p, - struct hwloc_obj_info_s **global_infos, unsigned *global_infos_count) -{ - FILE *fd; - char *str = NULL; - char *endptr; - unsigned len; - unsigned allocated_Lprocs = 0; - struct hwloc_linux_cpuinfo_proc * Lprocs = NULL; - unsigned numprocs = 0; - int curproc = -1; - int (*parse_cpuinfo_func)(const char *, const char *, struct hwloc_obj_info_s **, unsigned *, int) = NULL; - - if (!(fd=hwloc_fopen(path,"r", data->root_fd))) - { - hwloc_debug("could not open %s\n", path); - return -1; - } - -# define PROCESSOR "processor" -# define PACKAGEID "physical id" /* the longest one */ -# define COREID "core id" - len = 128; /* vendor/model can be very long */ - str = malloc(len); - hwloc_debug("\n\n * Topology extraction from %s *\n\n", path); - while (fgets(str,len,fd)!=NULL) { - unsigned long Ppkg, Pcore, Pproc; - char *end, *dot, *prefix, *value; - int noend = 0; - - /* remove the ending \n */ - end = strchr(str, '\n'); - if (end) - *end = 0; - else - noend = 1; - /* if empty line, skip and reset curproc */ - if (!*str) { - curproc = -1; - continue; - } - /* skip lines with no dot */ - dot = strchr(str, ':'); - if (!dot) - continue; - /* skip lines not starting with a letter */ - if ((*str > 'z' || *str < 'a') - && (*str > 'Z' || *str < 'A')) - continue; - - /* mark the end of the prefix */ - prefix = str; - end = dot; - while (end[-1] == ' ' || end[-1] == ' ') end--; /* need a strrspn() */ - *end = 0; - /* find beginning of value, its end is already marked */ - value = dot+1 + strspn(dot+1, " "); - - /* defines for parsing numbers */ -# define getprocnb_begin(field, var) \ - if (!strcmp(field,prefix)) { \ - var = 
strtoul(value,&endptr,0); \ - if (endptr==value) { \ - hwloc_debug("no number in "field" field of %s\n", path); \ - goto err; \ - } else if (var==ULONG_MAX) { \ - hwloc_debug("too big "field" number in %s\n", path); \ - goto err; \ - } \ - hwloc_debug(field " %lu\n", var) -# define getprocnb_end() \ - } - /* actually parse numbers */ - getprocnb_begin(PROCESSOR, Pproc); - curproc = numprocs++; - if (numprocs > allocated_Lprocs) { - if (!allocated_Lprocs) - allocated_Lprocs = 8; - else - allocated_Lprocs *= 2; - Lprocs = realloc(Lprocs, allocated_Lprocs * sizeof(*Lprocs)); - } - Lprocs[curproc].Pproc = Pproc; - Lprocs[curproc].Pcore = -1; - Lprocs[curproc].Ppkg = -1; - Lprocs[curproc].Lcore = -1; - Lprocs[curproc].Lpkg = -1; - Lprocs[curproc].infos = NULL; - Lprocs[curproc].infos_count = 0; - getprocnb_end() else - getprocnb_begin(PACKAGEID, Ppkg); - Lprocs[curproc].Ppkg = Ppkg; - getprocnb_end() else - getprocnb_begin(COREID, Pcore); - Lprocs[curproc].Pcore = Pcore; - getprocnb_end() else { - - /* architecture specific or default routine for parsing cpumodel */ - switch (data->arch) { - case HWLOC_LINUX_ARCH_X86: - parse_cpuinfo_func = hwloc_linux_parse_cpuinfo_x86; - break; - case HWLOC_LINUX_ARCH_ARM: - parse_cpuinfo_func = hwloc_linux_parse_cpuinfo_arm; - break; - case HWLOC_LINUX_ARCH_POWER: - parse_cpuinfo_func = hwloc_linux_parse_cpuinfo_ppc; - break; - case HWLOC_LINUX_ARCH_IA64: - parse_cpuinfo_func = hwloc_linux_parse_cpuinfo_ia64; - break; - default: - parse_cpuinfo_func = hwloc_linux_parse_cpuinfo_generic; - } - - /* we can't assume that we already got a processor index line: - * alpha/frv/h8300/m68k/microblaze/sparc have no processor lines at all, only a global entry. - * tile has a global section with model name before the list of processor lines. - */ - parse_cpuinfo_func(prefix, value, - curproc >= 0 ? &Lprocs[curproc].infos : global_infos, - curproc >= 0 ? 
&Lprocs[curproc].infos_count : global_infos_count, - curproc < 0); - } - - if (noend) { - /* ignore end of line */ - if (fscanf(fd,"%*[^\n]") == EOF) - break; - getc(fd); - } - } - fclose(fd); - free(str); - - *Lprocs_p = Lprocs; - return numprocs; - - err: - fclose(fd); - free(str); - free(Lprocs); - *Lprocs_p = NULL; - return -1; -} - -static void -hwloc_linux_free_cpuinfo(struct hwloc_linux_cpuinfo_proc * Lprocs, unsigned numprocs, - struct hwloc_obj_info_s *global_infos, unsigned global_infos_count) -{ - if (Lprocs) { - unsigned i; - for(i=0; icpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(obj->cpuset, Pproc); - hwloc_debug_2args_bitmap("cpu %lu (os %lu) has cpuset %s\n", - Lproc, Pproc, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - - topology->support.discovery->pu = 1; - hwloc_bitmap_copy(online_cpuset, cpuset); - hwloc_bitmap_free(cpuset); - - hwloc_debug("%u online processors found\n", numprocs); - hwloc_debug_bitmap("online processor cpuset: %s\n", online_cpuset); - - hwloc_debug("%s", "\n * Topology summary *\n"); - hwloc_debug("%u processors)\n", numprocs); - - /* fill Lprocs[].Lpkg and Lpkg_to_Ppkg */ - for(Lproc=0; Lproc0) { - for (i = 0; i < numpkgs; i++) { - struct hwloc_obj *obj = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, Lpkg_to_Ppkg[i]); - int doneinfos = 0; - obj->cpuset = hwloc_bitmap_alloc(); - for(j=0; jcpuset, Lprocs[j].Pproc); - if (!doneinfos) { - hwloc__move_infos(&obj->infos, &obj->infos_count, &Lprocs[j].infos, &Lprocs[j].infos_count); - doneinfos = 1; - } - } - hwloc_debug_1arg_bitmap("package %d has cpuset %s\n", i, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - hwloc_debug("%s", "\n"); - } - - /* fill Lprocs[].Lcore, Lcore_to_Ppkg and Lcore_to_Pcore */ - for(Lproc=0; Lproc0) { - for (i = 0; i < numcores; i++) { - struct hwloc_obj *obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, Lcore_to_Pcore[i]); - obj->cpuset = hwloc_bitmap_alloc(); - for(j=0; jcpuset, Lprocs[j].Pproc); - 
hwloc_debug_1arg_bitmap("Core %d has cpuset %s\n", i, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - hwloc_debug("%s", "\n"); - } - - free(Lcore_to_Pcore); - free(Lcore_to_Ppkg); - free(Lpkg_to_Ppkg); - return 0; -} - - - -/************************************* - ****** Main Topology Discovery ****** - *************************************/ - -static void -hwloc__linux_get_mic_sn(struct hwloc_topology *topology, struct hwloc_linux_backend_data_s *data) -{ - FILE *file; - char line[64], *tmp, *end; - file = hwloc_fopen("/proc/elog", "r", data->root_fd); - if (!file) - return; - if (!fgets(line, sizeof(line), file)) - goto out_with_file; - if (strncmp(line, "Card ", 5)) - goto out_with_file; - tmp = line + 5; - end = strchr(tmp, ':'); - if (!end) - goto out_with_file; - *end = '\0'; - hwloc_obj_add_info(hwloc_get_root_obj(topology), "MICSerialNumber", tmp); - - out_with_file: - fclose(file); -} - -static void -hwloc_gather_system_info(struct hwloc_topology *topology, - struct hwloc_linux_backend_data_s *data) -{ - FILE *file; - char line[128]; /* enough for utsname fields */ - const char *env; - - /* initialize to something sane, in case !is_thissystem and we can't find things in /proc/hwloc-nofile-info */ - memset(&data->utsname, 0, sizeof(data->utsname)); - data->fallback_nbprocessors = 1; - data->pagesize = 4096; - - /* read thissystem info */ - if (topology->is_thissystem) { - uname(&data->utsname); - data->fallback_nbprocessors = hwloc_fallback_nbprocessors(topology); - data->pagesize = hwloc_getpagesize(); - } - - /* overwrite with optional /proc/hwloc-nofile-info */ - file = hwloc_fopen("/proc/hwloc-nofile-info", "r", data->root_fd); - if (file) { - while (fgets(line, sizeof(line), file)) { - char *tmp = strchr(line, '\n'); - if (!strncmp("OSName: ", line, 8)) { - if (tmp) - *tmp = '\0'; - strncpy(data->utsname.sysname, line+8, sizeof(data->utsname.sysname)); - data->utsname.sysname[sizeof(data->utsname.sysname)-1] = '\0'; - } else if 
(!strncmp("OSRelease: ", line, 11)) { - if (tmp) - *tmp = '\0'; - strncpy(data->utsname.release, line+11, sizeof(data->utsname.release)); - data->utsname.release[sizeof(data->utsname.release)-1] = '\0'; - } else if (!strncmp("OSVersion: ", line, 11)) { - if (tmp) - *tmp = '\0'; - strncpy(data->utsname.version, line+11, sizeof(data->utsname.version)); - data->utsname.version[sizeof(data->utsname.version)-1] = '\0'; - } else if (!strncmp("HostName: ", line, 10)) { - if (tmp) - *tmp = '\0'; - strncpy(data->utsname.nodename, line+10, sizeof(data->utsname.nodename)); - data->utsname.nodename[sizeof(data->utsname.nodename)-1] = '\0'; - } else if (!strncmp("Architecture: ", line, 14)) { - if (tmp) - *tmp = '\0'; - strncpy(data->utsname.machine, line+14, sizeof(data->utsname.machine)); - data->utsname.machine[sizeof(data->utsname.machine)-1] = '\0'; - } else if (!strncmp("FallbackNbProcessors: ", line, 22)) { - if (tmp) - *tmp = '\0'; - data->fallback_nbprocessors = atoi(line+22); - } else if (!strncmp("PageSize: ", line, 10)) { - if (tmp) - *tmp = '\0'; - data->pagesize = strtoull(line+10, NULL, 10); - } else { - hwloc_debug("ignored /proc/hwloc-nofile-info line %s\n", line); - /* ignored */ - } - } - fclose(file); - } - - env = getenv("HWLOC_DUMP_NOFILE_INFO"); - if (env && *env) { - file = fopen(env, "w"); - if (file) { - if (*data->utsname.sysname) - fprintf(file, "OSName: %s\n", data->utsname.sysname); - if (*data->utsname.release) - fprintf(file, "OSRelease: %s\n", data->utsname.release); - if (*data->utsname.version) - fprintf(file, "OSVersion: %s\n", data->utsname.version); - if (*data->utsname.nodename) - fprintf(file, "HostName: %s\n", data->utsname.nodename); - if (*data->utsname.machine) - fprintf(file, "Architecture: %s\n", data->utsname.machine); - fprintf(file, "FallbackNbProcessors: %u\n", data->fallback_nbprocessors); - fprintf(file, "PageSize: %llu\n", (unsigned long long) data->pagesize); - fclose(file); - } - } - - /* detect arch for quirks, using 
configure #defines if possible, or uname */ -#if (defined HWLOC_X86_32_ARCH) || (defined HWLOC_X86_64_ARCH) /* does not cover KNC */ - if (topology->is_thissystem) - data->arch = HWLOC_LINUX_ARCH_X86; -#endif - if (data->arch == HWLOC_LINUX_ARCH_UNKNOWN && *data->utsname.machine) { - if (!strcmp(data->utsname.machine, "x86_64") - || (data->utsname.machine[0] == 'i' && !strcmp(data->utsname.machine+2, "86")) - || !strcmp(data->utsname.machine, "k1om")) - data->arch = HWLOC_LINUX_ARCH_X86; - else if (!strncmp(data->utsname.machine, "arm", 3)) - data->arch = HWLOC_LINUX_ARCH_ARM; - else if (!strncmp(data->utsname.machine, "ppc", 3) - || !strncmp(data->utsname.machine, "power", 5)) - data->arch = HWLOC_LINUX_ARCH_POWER; - else if (!strcmp(data->utsname.machine, "ia64")) - data->arch = HWLOC_LINUX_ARCH_IA64; - } -} - -/* returns 0 on success, -1 on non-match or error during hardwired load */ -static int -hwloc_linux_try_hardwired_cpuinfo(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_linux_backend_data_s *data = backend->private_data; - FILE *fd; - char line[128]; - - if (getenv("HWLOC_NO_HARDWIRED_TOPOLOGY")) - return -1; - - if (!strcmp(data->utsname.machine, "s64fx")) { - /* Fujistu K-computer, FX10, and FX100 use specific processors - * whose Linux topology support is broken until 4.1 (acc455cffa75070d55e74fc7802b49edbc080e92and) - * and existing machines will likely never be fixed by kernel upgrade. 
- */ - - /* /proc/cpuinfo starts with one of these lines: - * "cpu : Fujitsu SPARC64 VIIIfx" - * "cpu : Fujitsu SPARC64 XIfx" - * "cpu : Fujitsu SPARC64 IXfx" - */ - fd = hwloc_fopen("/proc/cpuinfo", "r", data->root_fd); - if (!fd) - return -1; - - if (!fgets(line, sizeof(line), fd)) { - fclose(fd); - return -1; - } - fclose(fd); - - if (strncmp(line, "cpu ", 4)) - return -1; - - if (strstr(line, "Fujitsu SPARC64 VIIIfx")) - return hwloc_look_hardwired_fujitsu_k(topology); - else if (strstr(line, "Fujitsu SPARC64 IXfx")) - return hwloc_look_hardwired_fujitsu_fx10(topology); - else if (strstr(line, "FUJITSU SPARC64 XIfx")) - return hwloc_look_hardwired_fujitsu_fx100(topology); - } - return -1; -} - -static int -hwloc_look_linuxfs(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_linux_backend_data_s *data = backend->private_data; - DIR *nodes_dir; - unsigned nbnodes; - char *cpuset_mntpnt, *cgroup_mntpnt, *cpuset_name = NULL; - struct hwloc_linux_cpuinfo_proc * Lprocs = NULL; - struct hwloc_obj_info_s *global_infos = NULL; - unsigned global_infos_count = 0; - int numprocs = 0; - int already_pus; - int err; - - already_pus = (topology->levels[0][0]->complete_cpuset != NULL - && !hwloc_bitmap_iszero(topology->levels[0][0]->complete_cpuset)); - /* if there are PUs, still look at memory information - * since x86 misses NUMA node information (unless the processor supports topoext) - * memory size. 
- */ - - /* allocate root sets in case not done yet */ - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - - /********************************* - * Platform information for later - */ - hwloc_gather_system_info(topology, data); - - /********************** - * /proc/cpuinfo - */ - numprocs = hwloc_linux_parse_cpuinfo(data, "/proc/cpuinfo", &Lprocs, &global_infos, &global_infos_count); - - /************************** - * detect model for quirks - */ - if (data->arch == HWLOC_LINUX_ARCH_X86 && numprocs > 0) { - unsigned i; - const char *cpuvendor = NULL, *cpufamilynumber = NULL, *cpumodelnumber = NULL; - for(i=0; iis_knl = 1; - } - - /********************** - * Gather the list of admin-disabled cpus and mems - */ - hwloc_find_linux_cpuset_mntpnt(&cgroup_mntpnt, &cpuset_mntpnt, data->root_path); - if (cgroup_mntpnt || cpuset_mntpnt) { - cpuset_name = hwloc_read_linux_cpuset_name(data->root_fd, topology->pid); - if (cpuset_name) { - hwloc_admin_disable_set_from_cpuset(data, cgroup_mntpnt, cpuset_mntpnt, cpuset_name, "cpus", topology->levels[0][0]->allowed_cpuset); - hwloc_admin_disable_set_from_cpuset(data, cgroup_mntpnt, cpuset_mntpnt, cpuset_name, "mems", topology->levels[0][0]->allowed_nodeset); - } - free(cgroup_mntpnt); - free(cpuset_mntpnt); - } - - nodes_dir = hwloc_opendir("/proc/nodes", data->root_fd); - if (nodes_dir) { - /* Kerrighed */ - struct dirent *dirent; - char path[128]; - hwloc_obj_t machine; - hwloc_bitmap_t machine_online_set; - - if (already_pus) - /* we don't support extending kerrighed topologies */ - return 0; - - /* replace top-level object type with SYSTEM and add some MACHINE underneath */ - - topology->levels[0][0]->type = HWLOC_OBJ_SYSTEM; - topology->levels[0][0]->name = strdup("Kerrighed"); - - /* No cpuset support for now. */ - /* No sys support for now. 
*/ - while ((dirent = readdir(nodes_dir)) != NULL) { - struct hwloc_linux_cpuinfo_proc * machine_Lprocs = NULL; - struct hwloc_obj_info_s *machine_global_infos = NULL; - unsigned machine_global_infos_count = 0; - int machine_numprocs = 0; - unsigned long node; - if (strncmp(dirent->d_name, "node", 4)) - continue; - machine_online_set = hwloc_bitmap_alloc(); - node = strtoul(dirent->d_name+4, NULL, 0); - snprintf(path, sizeof(path), "/proc/nodes/node%lu/cpuinfo", node); - machine_numprocs = hwloc_linux_parse_cpuinfo(data, path, &machine_Lprocs, &machine_global_infos, &machine_global_infos_count); - err = look_cpuinfo(topology, machine_Lprocs, machine_numprocs, machine_online_set); - hwloc_linux_free_cpuinfo(machine_Lprocs, machine_numprocs, machine_global_infos, machine_global_infos_count); - if (err < 0) { - hwloc_bitmap_free(machine_online_set); - continue; - } - hwloc_bitmap_or(topology->levels[0][0]->online_cpuset, topology->levels[0][0]->online_cpuset, machine_online_set); - machine = hwloc_alloc_setup_object(HWLOC_OBJ_MACHINE, node); - machine->cpuset = machine_online_set; - hwloc_debug_1arg_bitmap("machine number %lu has cpuset %s\n", - node, machine_online_set); - - /* Get the machine memory attributes */ - hwloc_get_kerrighed_node_meminfo_info(topology, data, node, &machine->memory); - - /* Gather DMI info */ - /* FIXME: get the right DMI info of each machine */ - hwloc__get_dmi_id_info(data, machine); - - hwloc_insert_object_by_cpuset(topology, machine); - } - closedir(nodes_dir); - } else { - /********************* - * Memory information - */ - - /* Get the machine memory attributes */ - hwloc_get_procfs_meminfo_info(topology, data, &topology->levels[0][0]->memory); - - /* Gather NUMA information. 
Must be after hwloc_get_procfs_meminfo_info so that the hugepage size is known */ - if (look_sysfsnode(topology, data, "/sys/bus/node/devices", &nbnodes) < 0) - look_sysfsnode(topology, data, "/sys/devices/system/node", &nbnodes); - - /* if we found some numa nodes, the machine object has no local memory */ - if (nbnodes) { - unsigned i; - topology->levels[0][0]->memory.local_memory = 0; - if (topology->levels[0][0]->memory.page_types) - for(i=0; ilevels[0][0]->memory.page_types_len; i++) - topology->levels[0][0]->memory.page_types[i].count = 0; - } - - /********************** - * CPU information - */ - - /* Don't rediscover CPU resources if already done */ - if (already_pus) - goto done; - - /* Gather the list of cpus now */ - err = hwloc_linux_try_hardwired_cpuinfo(backend); - if (!err) - goto done; - - /* setup root info */ - hwloc__move_infos(&hwloc_get_root_obj(topology)->infos, &hwloc_get_root_obj(topology)->infos_count, - &global_infos, &global_infos_count); - - if (getenv("HWLOC_LINUX_USE_CPUINFO") - || (hwloc_access("/sys/devices/system/cpu/cpu0/topology/core_siblings", R_OK, data->root_fd) < 0 - && hwloc_access("/sys/devices/system/cpu/cpu0/topology/thread_siblings", R_OK, data->root_fd) < 0 - && hwloc_access("/sys/bus/cpu/devices/cpu0/topology/thread_siblings", R_OK, data->root_fd) < 0 - && hwloc_access("/sys/bus/cpu/devices/cpu0/topology/core_siblings", R_OK, data->root_fd) < 0)) { - /* revert to reading cpuinfo only if /sys/.../topology unavailable (before 2.6.16) - * or not containing anything interesting */ - if (numprocs > 0) - err = look_cpuinfo(topology, Lprocs, numprocs, topology->levels[0][0]->online_cpuset); - else - err = -1; - if (err < 0) - hwloc_setup_pu_level(topology, data->fallback_nbprocessors); - look_powerpc_device_tree(topology, data); - - } else { - /* sysfs */ - if (look_sysfscpu(topology, data, "/sys/bus/cpu/devices", Lprocs, numprocs) < 0) - if (look_sysfscpu(topology, data, "/sys/devices/system/cpu", Lprocs, numprocs) < 0) - /* 
sysfs but we failed to read cpu topology, fallback */ - hwloc_setup_pu_level(topology, data->fallback_nbprocessors); - } - - done: - - /********************** - * Misc - */ - - /* Gather DMI info */ - hwloc__get_dmi_id_info(data, topology->levels[0][0]); - if (hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) - hwloc__get_firmware_dmi_memory_info(topology, data); - } - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "Linux"); - if (cpuset_name) { - hwloc_obj_add_info(topology->levels[0][0], "LinuxCgroup", cpuset_name); - free(cpuset_name); - } - - hwloc__linux_get_mic_sn(topology, data); - - /* data->utsname was filled with real uname or \0, we can safely pass it */ - hwloc_add_uname_info(topology, &data->utsname); - - hwloc_linux_free_cpuinfo(Lprocs, numprocs, global_infos, global_infos_count); - return 1; -} - - - -/**************************************** - ***** Linux PCI backend callbacks ****** - **************************************** - * Do not support changing the fsroot (use sysfs) - */ - -static hwloc_obj_t -hwloc_linux_add_os_device(struct hwloc_backend *backend, struct hwloc_obj *pcidev, hwloc_obj_osdev_type_t type, const char *name) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_obj *obj = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1); - obj->name = strdup(name); - obj->logical_index = -1; - obj->attr->osdev.type = type; - - hwloc_insert_object_by_parent(topology, pcidev, obj); - /* insert_object_by_parent() doesn't merge during insert, so obj is still valid */ - - return obj; -} - -typedef void (*hwloc_linux_class_fillinfos_t)(struct hwloc_backend *backend, struct hwloc_obj *osdev, const char *osdevpath); - -/* cannot be used in fsroot-aware code, would have to move to a per-topology variable */ - -static void -hwloc_linux_check_deprecated_classlinks_model(struct hwloc_linux_backend_data_s *data) -{ - int root_fd = data->root_fd; - DIR *dir; - struct dirent 
*dirent; - char path[128]; - struct stat st; - - data->deprecated_classlinks_model = -1; - - dir = hwloc_opendir("/sys/class/net", root_fd); - if (!dir) - return; - while ((dirent = readdir(dir)) != NULL) { - if (!strcmp(dirent->d_name, ".") || !strcmp(dirent->d_name, "..") || !strcmp(dirent->d_name, "lo")) - continue; - snprintf(path, sizeof(path), "/sys/class/net/%s/device/net/%s", dirent->d_name, dirent->d_name); - if (hwloc_stat(path, &st, root_fd) == 0) { - data->deprecated_classlinks_model = 0; - goto out; - } - snprintf(path, sizeof(path), "/sys/class/net/%s/device/net:%s", dirent->d_name, dirent->d_name); - if (hwloc_stat(path, &st, root_fd) == 0) { - data->deprecated_classlinks_model = 1; - goto out; - } - } -out: - closedir(dir); -} - -/* class objects that are immediately below pci devices: - * look for objects of the given classname below a sysfs (pcidev) directory - */ -static int -hwloc_linux_class_readdir(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *devicepath, - hwloc_obj_osdev_type_t type, const char *classname, - hwloc_linux_class_fillinfos_t fillinfo) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - size_t classnamelen = strlen(classname); - char path[256]; - DIR *dir; - struct dirent *dirent; - hwloc_obj_t obj; - int res = 0, err; - - if (data->deprecated_classlinks_model == -2) - hwloc_linux_check_deprecated_classlinks_model(data); - - if (data->deprecated_classlinks_model != 1) { - /* modern sysfs: // */ - struct stat st; - snprintf(path, sizeof(path), "%s/%s", devicepath, classname); - - /* some very host kernel (2.6.9/RHEL4) have / symlink without any way to find . - * make sure / is a directory to avoid this case. 
- */ - err = hwloc_lstat(path, &st, root_fd); - if (err < 0 || !S_ISDIR(st.st_mode)) - goto trydeprecated; - - dir = hwloc_opendir(path, root_fd); - if (dir) { - data->deprecated_classlinks_model = 0; - while ((dirent = readdir(dir)) != NULL) { - if (!strcmp(dirent->d_name, ".") || !strcmp(dirent->d_name, "..")) - continue; - obj = hwloc_linux_add_os_device(backend, pcidev, type, dirent->d_name); - if (fillinfo) { - snprintf(path, sizeof(path), "%s/%s/%s", devicepath, classname, dirent->d_name); - fillinfo(backend, obj, path); - } - res++; - } - closedir(dir); - return res; - } - } - -trydeprecated: - if (data->deprecated_classlinks_model != 0) { - /* deprecated sysfs: /: */ - dir = hwloc_opendir(devicepath, root_fd); - if (dir) { - while ((dirent = readdir(dir)) != NULL) { - if (strncmp(dirent->d_name, classname, classnamelen) || dirent->d_name[classnamelen] != ':') - continue; - data->deprecated_classlinks_model = 1; - obj = hwloc_linux_add_os_device(backend, pcidev, type, dirent->d_name + classnamelen+1); - if (fillinfo) { - snprintf(path, sizeof(path), "%s/%s", devicepath, dirent->d_name); - fillinfo(backend, obj, path); - } - res++; - } - closedir(dir); - return res; - } - } - - return 0; -} - -/* - * look for net objects below a pcidev in sysfs - */ -static void -hwloc_linux_net_class_fillinfos(struct hwloc_backend *backend, - struct hwloc_obj *obj, const char *osdevpath) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - FILE *fd; - struct stat st; - char path[256]; - snprintf(path, sizeof(path), "%s/address", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char address[128]; - if (fgets(address, sizeof(address), fd)) { - char *eol = strchr(address, '\n'); - if (eol) - *eol = 0; - hwloc_obj_add_info(obj, "Address", address); - } - fclose(fd); - } - snprintf(path, sizeof(path), "%s/device/infiniband", osdevpath); - if (!hwloc_stat(path, &st, root_fd)) { - snprintf(path, sizeof(path), 
"%s/dev_id", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char hexid[16]; - if (fgets(hexid, sizeof(hexid), fd)) { - char *eoid; - unsigned long port; - port = strtoul(hexid, &eoid, 0); - if (eoid != hexid) { - char portstr[16]; - snprintf(portstr, sizeof(portstr), "%ld", port+1); - hwloc_obj_add_info(obj, "Port", portstr); - } - } - fclose(fd); - } - } -} - -static int -hwloc_linux_lookup_net_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *pcidevpath) -{ - return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_NETWORK, "net", hwloc_linux_net_class_fillinfos); -} - -/* - * look for infiniband objects below a pcidev in sysfs - */ -static void -hwloc_linux_infiniband_class_fillinfos(struct hwloc_backend *backend, - struct hwloc_obj *obj, const char *osdevpath) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - FILE *fd; - char path[256]; - unsigned i,j; - - snprintf(path, sizeof(path), "%s/node_guid", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char guidvalue[20]; - if (fgets(guidvalue, sizeof(guidvalue), fd)) { - size_t len; - len = strspn(guidvalue, "0123456789abcdefx:"); - assert(len == 19); - guidvalue[len] = '\0'; - hwloc_obj_add_info(obj, "NodeGUID", guidvalue); - } - fclose(fd); - } - - snprintf(path, sizeof(path), "%s/sys_image_guid", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char guidvalue[20]; - if (fgets(guidvalue, sizeof(guidvalue), fd)) { - size_t len; - len = strspn(guidvalue, "0123456789abcdefx:"); - assert(len == 19); - guidvalue[len] = '\0'; - hwloc_obj_add_info(obj, "SysImageGUID", guidvalue); - } - fclose(fd); - } - - for(i=1; ; i++) { - snprintf(path, sizeof(path), "%s/ports/%u/state", osdevpath, i); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char statevalue[2]; - if (fgets(statevalue, sizeof(statevalue), fd)) { - char statename[32]; - statevalue[1] = '\0'; /* only 
keep the first byte/digit */ - snprintf(statename, sizeof(statename), "Port%uState", i); - hwloc_obj_add_info(obj, statename, statevalue); - } - fclose(fd); - } else { - /* no such port */ - break; - } - - snprintf(path, sizeof(path), "%s/ports/%u/lid", osdevpath, i); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char lidvalue[11]; - if (fgets(lidvalue, sizeof(lidvalue), fd)) { - char lidname[32]; - size_t len; - len = strspn(lidvalue, "0123456789abcdefx"); - lidvalue[len] = '\0'; - snprintf(lidname, sizeof(lidname), "Port%uLID", i); - hwloc_obj_add_info(obj, lidname, lidvalue); - } - fclose(fd); - } - - snprintf(path, sizeof(path), "%s/ports/%u/lid_mask_count", osdevpath, i); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char lidvalue[11]; - if (fgets(lidvalue, sizeof(lidvalue), fd)) { - char lidname[32]; - size_t len; - len = strspn(lidvalue, "0123456789"); - lidvalue[len] = '\0'; - snprintf(lidname, sizeof(lidname), "Port%uLMC", i); - hwloc_obj_add_info(obj, lidname, lidvalue); - } - fclose(fd); - } - - for(j=0; ; j++) { - snprintf(path, sizeof(path), "%s/ports/%u/gids/%u", osdevpath, i, j); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char gidvalue[40]; - if (fgets(gidvalue, sizeof(gidvalue), fd)) { - char gidname[32]; - size_t len; - len = strspn(gidvalue, "0123456789abcdefx:"); - assert(len == 39); - gidvalue[len] = '\0'; - if (strncmp(gidvalue+20, "0000:0000:0000:0000", 19)) { - /* only keep initialized GIDs */ - snprintf(gidname, sizeof(gidname), "Port%uGID%u", i, j); - hwloc_obj_add_info(obj, gidname, gidvalue); - } - } - fclose(fd); - } else { - /* no such port */ - break; - } - } - } -} - -static int -hwloc_linux_lookup_openfabrics_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *pcidevpath) -{ - return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_OPENFABRICS, "infiniband", hwloc_linux_infiniband_class_fillinfos); -} - -/* look for dma objects below a pcidev in sysfs */ 
-static int -hwloc_linux_lookup_dma_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *pcidevpath) -{ - return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_DMA, "dma", NULL); -} - -/* look for drm objects below a pcidev in sysfs */ -static int -hwloc_linux_lookup_drm_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *pcidevpath) -{ - return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_GPU, "drm", NULL); - - /* we could look at the "graphics" class too, but it doesn't help for proprietary drivers either */ - - /* GPU devices (even with a proprietary driver) seem to have a boot_vga field in their PCI device directory (since 2.6.30), - * so we could create a OS device for each PCI devices with such a field. - * boot_vga is actually created when class >> 8 == VGA (it contains 1 for boot vga device), so it's trivial anyway. - */ -} - -/* - * look for block objects below a pcidev in sysfs - */ - -static void -hwloc_linux_block_class_fillinfos(struct hwloc_backend *backend, - struct hwloc_obj *obj, const char *osdevpath) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - FILE *fd; - char path[256]; - char line[128]; - char vendor[64] = ""; - char model[64] = ""; - char serial[64] = ""; - char revision[64] = ""; - char blocktype[64] = ""; - unsigned major_id, minor_id; - char *tmp; - - snprintf(path, sizeof(path), "%s/dev", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (!fd) - return; - - if (NULL == fgets(line, sizeof(line), fd)) { - fclose(fd); - return; - } - fclose(fd); - - if (sscanf(line, "%u:%u", &major_id, &minor_id) != 2) - return; - tmp = strchr(line, '\n'); - if (tmp) - *tmp = '\0'; - hwloc_obj_add_info(obj, "LinuxDeviceID", line); - -#ifdef HWLOC_HAVE_LIBUDEV - if (data->udev) { - struct udev_device *dev; - const char *prop; - dev = udev_device_new_from_subsystem_sysname(data->udev, "block", 
obj->name); - if (!dev) - return; - prop = udev_device_get_property_value(dev, "ID_VENDOR"); - if (prop) { - strncpy(vendor, prop, sizeof(vendor)); - vendor[sizeof(vendor)-1] = '\0'; - } - prop = udev_device_get_property_value(dev, "ID_MODEL"); - if (prop) { - strncpy(model, prop, sizeof(model)); - model[sizeof(model)-1] = '\0'; - } - prop = udev_device_get_property_value(dev, "ID_REVISION"); - if (prop) { - strncpy(revision, prop, sizeof(revision)); - revision[sizeof(revision)-1] = '\0'; - } - prop = udev_device_get_property_value(dev, "ID_SERIAL_SHORT"); - if (prop) { - strncpy(serial, prop, sizeof(serial)); - serial[sizeof(serial)-1] = '\0'; - } - prop = udev_device_get_property_value(dev, "ID_TYPE"); - if (prop) { - strncpy(blocktype, prop, sizeof(blocktype)); - blocktype[sizeof(blocktype)-1] = '\0'; - } - - udev_device_unref(dev); - } else - /* fallback to reading files, works with any fsroot */ -#endif - { - snprintf(path, sizeof(path), "/run/udev/data/b%u:%u", major_id, minor_id); - fd = hwloc_fopen(path, "r", root_fd); - if (!fd) - return; - - while (NULL != fgets(line, sizeof(line), fd)) { - tmp = strchr(line, '\n'); - if (tmp) - *tmp = '\0'; - if (!strncmp(line, "E:ID_VENDOR=", strlen("E:ID_VENDOR="))) { - strncpy(vendor, line+strlen("E:ID_VENDOR="), sizeof(vendor)); - vendor[sizeof(vendor)-1] = '\0'; - } else if (!strncmp(line, "E:ID_MODEL=", strlen("E:ID_MODEL="))) { - strncpy(model, line+strlen("E:ID_MODEL="), sizeof(model)); - model[sizeof(model)-1] = '\0'; - } else if (!strncmp(line, "E:ID_REVISION=", strlen("E:ID_REVISION="))) { - strncpy(revision, line+strlen("E:ID_REVISION="), sizeof(revision)); - revision[sizeof(revision)-1] = '\0'; - } else if (!strncmp(line, "E:ID_SERIAL_SHORT=", strlen("E:ID_SERIAL_SHORT="))) { - strncpy(serial, line+strlen("E:ID_SERIAL_SHORT="), sizeof(serial)); - serial[sizeof(serial)-1] = '\0'; - } else if (!strncmp(line, "E:ID_TYPE=", strlen("E:ID_TYPE="))) { - strncpy(blocktype, line+strlen("E:ID_TYPE="), 
sizeof(blocktype)); - blocktype[sizeof(blocktype)-1] = '\0'; - } - } - fclose(fd); - } - - /* clear fake "ATA" vendor name */ - if (!strcasecmp(vendor, "ATA")) - *vendor = '\0'; - /* overwrite vendor name from model when possible */ - if (!*vendor) { - if (!strncasecmp(model, "wd", 2)) - strcpy(vendor, "Western Digital"); - else if (!strncasecmp(model, "st", 2)) - strcpy(vendor, "Seagate"); - else if (!strncasecmp(model, "samsung", 7)) - strcpy(vendor, "Samsung"); - else if (!strncasecmp(model, "sandisk", 7)) - strcpy(vendor, "SanDisk"); - else if (!strncasecmp(model, "toshiba", 7)) - strcpy(vendor, "Toshiba"); - } - - if (*vendor) - hwloc_obj_add_info(obj, "Vendor", vendor); - if (*model) - hwloc_obj_add_info(obj, "Model", model); - if (*revision) - hwloc_obj_add_info(obj, "Revision", revision); - if (*serial) - hwloc_obj_add_info(obj, "SerialNumber", serial); - - if (!strcmp(blocktype, "disk")) - hwloc_obj_add_info(obj, "Type", "Disk"); - else if (!strcmp(blocktype, "tape")) - hwloc_obj_add_info(obj, "Type", "Tape"); - else if (!strcmp(blocktype, "cd") || !strcmp(blocktype, "floppy") || !strcmp(blocktype, "optical")) - hwloc_obj_add_info(obj, "Type", "Removable Media Device"); - else /* generic, usb mass storage/rbc, usb mass storage/scsi */ - hwloc_obj_add_info(obj, "Type", "Other"); -} - -/* block class objects are in - * host%d/target%d:%d:%d/%d:%d:%d:%d/ - * or - * host%d/port-%d:%d/end_device-%d:%d/target%d:%d:%d/%d:%d:%d:%d/ - * or - * ide%d/%d.%d/ - * below pci devices */ -static int -hwloc_linux_lookup_host_block_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, char *path, size_t pathlen) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - DIR *hostdir, *portdir, *targetdir; - struct dirent *hostdirent, *portdirent, *targetdirent; - size_t hostdlen, portdlen, targetdlen; - int dummy; - int res = 0; - - hostdir = hwloc_opendir(path, root_fd); - if (!hostdir) - return 0; - - while 
((hostdirent = readdir(hostdir)) != NULL) { - if (sscanf(hostdirent->d_name, "port-%d:%d", &dummy, &dummy) == 2) - { - /* found host%d/port-%d:%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], hostdirent->d_name); - pathlen += hostdlen = 1+strlen(hostdirent->d_name); - portdir = hwloc_opendir(path, root_fd); - if (!portdir) - continue; - while ((portdirent = readdir(portdir)) != NULL) { - if (sscanf(portdirent->d_name, "end_device-%d:%d", &dummy, &dummy) == 2) { - /* found host%d/port-%d:%d/end_device-%d:%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], portdirent->d_name); - pathlen += portdlen = 1+strlen(portdirent->d_name); - res += hwloc_linux_lookup_host_block_class(backend, pcidev, path, pathlen); - /* restore parent path */ - pathlen -= portdlen; - path[pathlen] = '\0'; - } - } - closedir(portdir); - /* restore parent path */ - pathlen -= hostdlen; - path[pathlen] = '\0'; - continue; - } else if (sscanf(hostdirent->d_name, "target%d:%d:%d", &dummy, &dummy, &dummy) == 3) { - /* found host%d/target%d:%d:%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], hostdirent->d_name); - pathlen += hostdlen = 1+strlen(hostdirent->d_name); - targetdir = hwloc_opendir(path, root_fd); - if (!targetdir) - continue; - while ((targetdirent = readdir(targetdir)) != NULL) { - if (sscanf(targetdirent->d_name, "%d:%d:%d:%d", &dummy, &dummy, &dummy, &dummy) != 4) - continue; - /* found host%d/target%d:%d:%d/%d:%d:%d:%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], targetdirent->d_name); - pathlen += targetdlen = 1+strlen(targetdirent->d_name); - /* lookup block class for real */ - res += hwloc_linux_class_readdir(backend, pcidev, path, HWLOC_OBJ_OSDEV_BLOCK, "block", hwloc_linux_block_class_fillinfos); - /* restore parent path */ - pathlen -= targetdlen; - path[pathlen] = '\0'; - } - closedir(targetdir); - /* restore parent path */ - pathlen -= hostdlen; - path[pathlen] = '\0'; - } - } - closedir(hostdir); - - return res; -} - -static int 
-hwloc_linux_lookup_block_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *pcidevpath) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - size_t pathlen; - DIR *devicedir, *hostdir; - struct dirent *devicedirent, *hostdirent; - size_t devicedlen, hostdlen; - char path[256]; - int dummy; - int res = 0; - - strcpy(path, pcidevpath); - pathlen = strlen(path); - - /* look for a direct block device here (such as NVMe, something without controller subdirs in the middle) */ - res += hwloc_linux_class_readdir(backend, pcidev, path, - HWLOC_OBJ_OSDEV_BLOCK, "block", - hwloc_linux_block_class_fillinfos); - if (res) - return res; - /* otherwise try to find controller subdirectories */ - - devicedir = hwloc_opendir(pcidevpath, root_fd); - if (!devicedir) - return 0; - - while ((devicedirent = readdir(devicedir)) != NULL) { - if (sscanf(devicedirent->d_name, "ide%d", &dummy) == 1) { - /* found ide%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], devicedirent->d_name); - pathlen += devicedlen = 1+strlen(devicedirent->d_name); - hostdir = hwloc_opendir(path, root_fd); - if (!hostdir) - continue; - while ((hostdirent = readdir(hostdir)) != NULL) { - if (sscanf(hostdirent->d_name, "%d.%d", &dummy, &dummy) == 2) { - /* found ide%d/%d.%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], hostdirent->d_name); - pathlen += hostdlen = 1+strlen(hostdirent->d_name); - /* lookup block class for real */ - res += hwloc_linux_class_readdir(backend, pcidev, path, HWLOC_OBJ_OSDEV_BLOCK, "block", NULL); - /* restore parent path */ - pathlen -= hostdlen; - path[pathlen] = '\0'; - } - } - closedir(hostdir); - /* restore parent path */ - pathlen -= devicedlen; - path[pathlen] = '\0'; - } else if (sscanf(devicedirent->d_name, "host%d", &dummy) == 1) { - /* found host%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], devicedirent->d_name); - pathlen += devicedlen = 1+strlen(devicedirent->d_name); - res += 
hwloc_linux_lookup_host_block_class(backend, pcidev, path, pathlen); - /* restore parent path */ - pathlen -= devicedlen; - path[pathlen] = '\0'; - } else if (sscanf(devicedirent->d_name, "ata%d", &dummy) == 1) { - /* found ata%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], devicedirent->d_name); - pathlen += devicedlen = 1+strlen(devicedirent->d_name); - hostdir = hwloc_opendir(path, root_fd); - if (!hostdir) - continue; - while ((hostdirent = readdir(hostdir)) != NULL) { - if (sscanf(hostdirent->d_name, "host%d", &dummy) == 1) { - /* found ata%d/host%d */ - path[pathlen] = '/'; - strcpy(&path[pathlen+1], hostdirent->d_name); - pathlen += hostdlen = 1+strlen(hostdirent->d_name); - /* lookup block class for real */ - res += hwloc_linux_lookup_host_block_class(backend, pcidev, path, pathlen); - /* restore parent path */ - pathlen -= hostdlen; - path[pathlen] = '\0'; - } - } - closedir(hostdir); - /* restore parent path */ - pathlen -= devicedlen; - path[pathlen] = '\0'; - } - } - closedir(devicedir); - - return res; -} - -static void -hwloc_linux_mic_class_fillinfos(struct hwloc_backend *backend, - struct hwloc_obj *obj, const char *osdevpath) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - FILE *fd; - char path[256]; - - hwloc_obj_add_info(obj, "CoProcType", "MIC"); - - snprintf(path, sizeof(path), "%s/family", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char family[64]; - if (fgets(family, sizeof(family), fd)) { - char *eol = strchr(family, '\n'); - if (eol) - *eol = 0; - hwloc_obj_add_info(obj, "MICFamily", family); - } - fclose(fd); - } - - snprintf(path, sizeof(path), "%s/sku", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char sku[64]; - if (fgets(sku, sizeof(sku), fd)) { - char *eol = strchr(sku, '\n'); - if (eol) - *eol = 0; - hwloc_obj_add_info(obj, "MICSKU", sku); - } - fclose(fd); - } - - snprintf(path, sizeof(path), "%s/serialnumber", osdevpath); - fd 
= hwloc_fopen(path, "r", root_fd); - if (fd) { - char sn[64]; - if (fgets(sn, sizeof(sn), fd)) { - char *eol = strchr(sn, '\n'); - if (eol) - *eol = 0; - hwloc_obj_add_info(obj, "MICSerialNumber", sn); - } - fclose(fd); - } - - snprintf(path, sizeof(path), "%s/active_cores", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char string[10]; - if (fgets(string, sizeof(string), fd)) { - unsigned long count = strtoul(string, NULL, 16); - snprintf(string, sizeof(string), "%lu", count); - hwloc_obj_add_info(obj, "MICActiveCores", string); - } - fclose(fd); - } - - snprintf(path, sizeof(path), "%s/memsize", osdevpath); - fd = hwloc_fopen(path, "r", root_fd); - if (fd) { - char string[20]; - if (fgets(string, sizeof(string), fd)) { - unsigned long count = strtoul(string, NULL, 16); - snprintf(string, sizeof(string), "%lu", count); - hwloc_obj_add_info(obj, "MICMemorySize", string); - } - fclose(fd); - } -} - -static int -hwloc_linux_lookup_mic_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev, const char *pcidevpath) -{ - return hwloc_linux_class_readdir(backend, pcidev, pcidevpath, HWLOC_OBJ_OSDEV_COPROC, "mic", hwloc_linux_mic_class_fillinfos); -} - -static int -hwloc_linux_directlookup_mic_class(struct hwloc_backend *backend, - struct hwloc_obj *pcidev) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - int root_fd = data->root_fd; - char path[256]; - struct stat st; - hwloc_obj_t obj; - unsigned idx; - int res = 0; - - if (!data->mic_directlookup_id_max) - /* already tried, nothing to do */ - return 0; - - if (data->mic_directlookup_id_max == (unsigned) -1) { - /* never tried, find out the max id */ - DIR *dir; - struct dirent *dirent; - - /* make sure we never do this lookup again */ - data->mic_directlookup_id_max = 0; - - /* read the entire class and find the max id of mic%u dirents */ - dir = hwloc_opendir("/sys/devices/virtual/mic", root_fd); - if (!dir) { - dir = hwloc_opendir("/sys/class/mic", root_fd); - if 
(!dir) - return 0; - } - while ((dirent = readdir(dir)) != NULL) { - if (!strcmp(dirent->d_name, ".") || !strcmp(dirent->d_name, "..")) - continue; - if (sscanf(dirent->d_name, "mic%u", &idx) != 1) - continue; - if (idx >= data->mic_directlookup_id_max) - data->mic_directlookup_id_max = idx+1; - } - closedir(dir); - } - - /* now iterate over the mic ids and see if one matches our pcidev */ - for(idx=0; idx<data->mic_directlookup_id_max; idx++) { - snprintf(path, sizeof(path), "/sys/class/mic/mic%u/pci_%02x:%02x.%02x", - idx, pcidev->attr->pcidev.bus, pcidev->attr->pcidev.dev, pcidev->attr->pcidev.func); - if (hwloc_stat(path, &st, root_fd) < 0) - continue; - snprintf(path, sizeof(path), "mic%u", idx); - obj = hwloc_linux_add_os_device(backend, pcidev, HWLOC_OBJ_OSDEV_COPROC, path); - snprintf(path, sizeof(path), "/sys/class/mic/mic%u", idx); - hwloc_linux_mic_class_fillinfos(backend, obj, path); - res++; - } - - return res; -} - -/* - * backend callback for inserting objects inside a pci device - */ -static int -hwloc_linux_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused, - struct hwloc_obj *obj) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - char pcidevpath[256]; - int res = 0; - - /* this callback is only used in the libpci backend for now */ - assert(obj->type == HWLOC_OBJ_PCI_DEVICE); - - snprintf(pcidevpath, sizeof(pcidevpath), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/", - obj->attr->pcidev.domain, obj->attr->pcidev.bus, - obj->attr->pcidev.dev, obj->attr->pcidev.func); - - res += hwloc_linux_lookup_net_class(backend, obj, pcidevpath); - res += hwloc_linux_lookup_openfabrics_class(backend, obj, pcidevpath); - res += hwloc_linux_lookup_dma_class(backend, obj, pcidevpath); - res += hwloc_linux_lookup_drm_class(backend, obj, pcidevpath); - res += hwloc_linux_lookup_block_class(backend, obj, pcidevpath); - - if (data->mic_need_directlookup == -1) { - struct stat st; - if
(hwloc_stat("/sys/class/mic/mic0", &st, data->root_fd) == 0 - && hwloc_stat("/sys/class/mic/mic0/device/mic/mic0", &st, data->root_fd) == -1) - /* hwloc_linux_lookup_mic_class will fail because pcidev sysfs directories - * do not have mic/mic%u symlinks to mic devices (old mic driver). - * if so, try from the mic class. - */ - data->mic_need_directlookup = 1; - else - data->mic_need_directlookup = 0; - } - if (data->mic_need_directlookup) - res += hwloc_linux_directlookup_mic_class(backend, obj); - else - res += hwloc_linux_lookup_mic_class(backend, obj, pcidevpath); - - return res; -} - -/* - * backend callback for retrieving the location of a pci device - */ -static int -hwloc_linux_backend_get_obj_cpuset(struct hwloc_backend *backend, - struct hwloc_backend *caller __hwloc_attribute_unused, - struct hwloc_obj *obj, hwloc_bitmap_t cpuset) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; - char path[256]; - FILE *file; - int err; - - /* this callback is only used in the libpci backend for now */ - assert(obj->type == HWLOC_OBJ_PCI_DEVICE - || (obj->type == HWLOC_OBJ_BRIDGE && obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI)); - - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/local_cpus", - obj->attr->pcidev.domain, obj->attr->pcidev.bus, - obj->attr->pcidev.dev, obj->attr->pcidev.func); - file = hwloc_fopen(path, "r", data->root_fd); - if (file) { - err = hwloc_linux_parse_cpumap_file(file, cpuset); - fclose(file); - if (!err && !hwloc_bitmap_iszero(cpuset)) - return 0; - } - return -1; -} - - - -/******************************* - ******* Linux component ******* - *******************************/ - -static void -hwloc_linux_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_linux_backend_data_s *data = backend->private_data; -#ifdef HAVE_OPENAT - if (data->root_path) - free(data->root_path); - close(data->root_fd); -#endif -#ifdef HWLOC_HAVE_LIBUDEV - if (data->udev) - udev_unref(data->udev); 
-#endif - free(data); -} - -static struct hwloc_backend * -hwloc_linux_component_instantiate(struct hwloc_disc_component *component, - const void *_data1, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_linux_backend_data_s *data; - const char * fsroot_path = _data1; - int flags, root = -1; - - backend = hwloc_backend_alloc(component); - if (!backend) - goto out; - - data = malloc(sizeof(*data)); - if (!data) { - errno = ENOMEM; - goto out_with_backend; - } - - backend->private_data = data; - backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS; - backend->discover = hwloc_look_linuxfs; - backend->get_obj_cpuset = hwloc_linux_backend_get_obj_cpuset; - backend->notify_new_object = hwloc_linux_backend_notify_new_object; - backend->disable = hwloc_linux_backend_disable; - - /* default values */ - data->arch = HWLOC_LINUX_ARCH_UNKNOWN; - data->is_knl = 0; - data->is_real_fsroot = 1; - data->root_path = NULL; - if (!fsroot_path) - fsroot_path = "/"; - -#ifdef HAVE_OPENAT - root = open(fsroot_path, O_RDONLY | O_DIRECTORY); - if (root < 0) - goto out_with_data; - - if (strcmp(fsroot_path, "/")) { - backend->is_thissystem = 0; - data->is_real_fsroot = 0; - data->root_path = strdup(fsroot_path); - } - - /* Since this fd stays open after hwloc returns, mark it as - close-on-exec so that children don't inherit it. Stevens says - that we should GETFD before we SETFD, so we do. 
*/ - flags = fcntl(root, F_GETFD, 0); - if (-1 == flags || - -1 == fcntl(root, F_SETFD, FD_CLOEXEC | flags)) { - close(root); - root = -1; - goto out_with_data; - } -#else - if (strcmp(fsroot_path, "/")) { - errno = ENOSYS; - goto out_with_data; - } -#endif - data->root_fd = root; - -#ifdef HWLOC_HAVE_LIBUDEV - data->udev = NULL; - if (data->is_real_fsroot) { - data->udev = udev_new(); - } -#endif - - data->dumped_hwdata_dirname = getenv("HWLOC_DUMPED_HWDATA_DIR"); - if (!data->dumped_hwdata_dirname) - data->dumped_hwdata_dirname = RUNSTATEDIR "/hwloc/"; - - data->deprecated_classlinks_model = -2; /* never tried */ - data->mic_need_directlookup = -1; /* not initialized */ - data->mic_directlookup_id_max = -1; /* not initialized */ - - return backend; - - out_with_data: -#ifdef HAVE_OPENAT - if (data->root_path) - free(data->root_path); -#endif - free(data); - out_with_backend: - free(backend); - out: - return NULL; -} - -static struct hwloc_disc_component hwloc_linux_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_CPU, - "linux", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_linux_component_instantiate, - 50, - NULL -}; - -const struct hwloc_component hwloc_linux_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_linux_disc_component -}; - - - - -#ifdef HWLOC_HAVE_LINUXPCI - -/*********************************** - ******* Linux PCI component ******* - ***********************************/ - -#define HWLOC_PCI_REVISION_ID 0x08 -#define HWLOC_PCI_CAP_ID_EXP 0x10 -#define HWLOC_PCI_CLASS_NOT_DEFINED 0x0000 - -static int -hwloc_look_linuxfs_pci(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_backend *tmpbackend; - hwloc_obj_t first_obj = NULL, last_obj = NULL; - int root_fd = -1; - DIR *dir; - struct dirent *dirent; - int res = 0; - - if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) - return 0; - - if 
(hwloc_get_next_pcidev(topology, NULL)) { - hwloc_debug("%s", "PCI objects already added, ignoring linuxpci backend.\n"); - return 0; - } - - /* hackily find the linux backend to steal its fsroot */ - tmpbackend = topology->backends; - while (tmpbackend) { - if (tmpbackend->component == &hwloc_linux_disc_component) { - root_fd = ((struct hwloc_linux_backend_data_s *) tmpbackend->private_data)->root_fd; - hwloc_debug("linuxpci backend stole linux backend root_fd %d\n", root_fd); - break; } - tmpbackend = tmpbackend->next; - } - /* take our own descriptor, either pointing to linux fsroot, or to / if not found */ - if (root_fd >= 0) - root_fd = dup(root_fd); - else - root_fd = open("/", O_RDONLY | O_DIRECTORY); - - dir = hwloc_opendir("/sys/bus/pci/devices/", root_fd); - if (!dir) - goto out_with_rootfd; - - while ((dirent = readdir(dir)) != NULL) { - unsigned domain, bus, dev, func; - hwloc_obj_t obj; - struct hwloc_pcidev_attr_s *attr; - unsigned os_index; - char path[64]; - char value[16]; - size_t read; - FILE *file; - - if (sscanf(dirent->d_name, "%04x:%02x:%02x.%01x", &domain, &bus, &dev, &func) != 4) - continue; - - os_index = (domain << 20) + (bus << 12) + (dev << 4) + func; - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PCI_DEVICE, os_index); - if (!obj) - break; - attr = &obj->attr->pcidev; - - attr->domain = domain; - attr->bus = bus; - attr->dev = dev; - attr->func = func; - - /* default (unknown) values */ - attr->vendor_id = 0; - attr->device_id = 0; - attr->class_id = HWLOC_PCI_CLASS_NOT_DEFINED; - attr->revision = 0; - attr->subvendor_id = 0; - attr->subdevice_id = 0; - attr->linkspeed = 0; - - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/vendor", dirent->d_name); - file = hwloc_fopen(path, "r", root_fd); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - attr->vendor_id = strtoul(value, NULL, 16); - } - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/device", dirent->d_name); - file = 
hwloc_fopen(path, "r", root_fd); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - attr->device_id = strtoul(value, NULL, 16); - } - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/class", dirent->d_name); - file = hwloc_fopen(path, "r", root_fd); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - attr->class_id = strtoul(value, NULL, 16) >> 8; - } - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/subsystem_vendor", dirent->d_name); - file = hwloc_fopen(path, "r", root_fd); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - attr->subvendor_id = strtoul(value, NULL, 16); - } - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/subsystem_device", dirent->d_name); - file = hwloc_fopen(path, "r", root_fd); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - attr->subdevice_id = strtoul(value, NULL, 16); - } - - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", dirent->d_name); - file = hwloc_fopen(path, "r", root_fd); - if (file) { -#define CONFIG_SPACE_CACHESIZE 256 - unsigned char config_space_cache[CONFIG_SPACE_CACHESIZE]; - unsigned offset; - - /* initialize the config space in case we fail to read it (missing permissions, etc). */ - memset(config_space_cache, 0xff, CONFIG_SPACE_CACHESIZE); - read = fread(config_space_cache, 1, CONFIG_SPACE_CACHESIZE, file); - (void) read; /* we initialized config_space_cache in case we don't read enough, ignore the read length */ - fclose(file); - - /* is this a bridge? 
*/ - if (hwloc_pci_prepare_bridge(obj, config_space_cache) < 0) - continue; - - /* get the revision */ - attr->revision = config_space_cache[HWLOC_PCI_REVISION_ID]; - - /* try to get the link speed */ - offset = hwloc_pci_find_cap(config_space_cache, HWLOC_PCI_CAP_ID_EXP); - if (offset > 0 && offset + 20 /* size of PCI express block up to link status */ <= CONFIG_SPACE_CACHESIZE) - hwloc_pci_find_linkspeed(config_space_cache, offset, &attr->linkspeed); - } - - if (first_obj) - last_obj->next_sibling = obj; - else - first_obj = obj; - last_obj = obj; - } - - closedir(dir); - - dir = hwloc_opendir("/sys/bus/pci/slots/", root_fd); - if (dir) { - while ((dirent = readdir(dir)) != NULL) { - char path[64]; - FILE *file; - if (dirent->d_name[0] == '.') - continue; - snprintf(path, sizeof(path), "/sys/bus/pci/slots/%s/address", dirent->d_name); - file = hwloc_fopen(path, "r", root_fd); - if (file) { - unsigned domain, bus, dev; - if (fscanf(file, "%x:%x:%x", &domain, &bus, &dev) == 3) { - hwloc_obj_t obj = first_obj; - while (obj) { - if (obj->attr->pcidev.domain == domain - && obj->attr->pcidev.bus == bus - && obj->attr->pcidev.dev == dev) { - hwloc_obj_add_info(obj, "PCISlot", dirent->d_name); - } - obj = obj->next_sibling; - } - } - fclose(file); - } - } - closedir(dir); - } - - res = hwloc_insert_pci_device_list(backend, first_obj); - - out_with_rootfd: - close(root_fd); - return res; -} - -static struct hwloc_backend * -hwloc_linuxpci_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - - /* thissystem may not be fully initialized yet, we'll check flags in discover() */ - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS; - backend->discover = hwloc_look_linuxfs_pci; - return backend; -} - -static struct 
hwloc_disc_component hwloc_linuxpci_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_MISC, - "linuxpci", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_linuxpci_component_instantiate, - 19, /* after pci */ - NULL -}; - -const struct hwloc_component hwloc_linuxpci_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_linuxpci_disc_component -}; - -#endif /* HWLOC_HAVE_LINUXPCI */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-nvml.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-nvml.c deleted file mode 100644 index 9c36d0a40b7..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-nvml.c +++ /dev/null @@ -1,239 +0,0 @@ -/* - * Copyright © 2012-2014 Inria. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include - -/* private headers allowed for convenience because this plugin is built within hwloc */ -#include -#include - -#include - -struct hwloc_nvml_backend_data_s { - unsigned nr_devices; /* -1 when unknown yet, first callback will setup */ - struct hwloc_nvml_device_info_s { - char name[64]; - char serial[64]; - char uuid[64]; - unsigned pcidomain, pcibus, pcidev, pcifunc; - float maxlinkspeed; - } * devices; -}; - -static void -hwloc_nvml_query_devices(struct hwloc_nvml_backend_data_s *data) -{ - nvmlReturn_t ret; - unsigned nb, i; - - /* mark the number of devices as 0 in case we fail below, - * so that we don't try again later. 
- */ - data->nr_devices = 0; - - ret = nvmlInit(); - if (NVML_SUCCESS != ret) - goto out; - ret = nvmlDeviceGetCount(&nb); - if (NVML_SUCCESS != ret) - goto out_with_init; - - /* allocate structs */ - data->devices = malloc(nb * sizeof(*data->devices)); - if (!data->devices) - goto out_with_init; - - for(i=0; i<nb; i++) { - struct hwloc_nvml_device_info_s *info = &data->devices[data->nr_devices]; - nvmlPciInfo_t pci; - nvmlDevice_t device; - - ret = nvmlDeviceGetHandleByIndex(i, &device); - assert(ret == NVML_SUCCESS); - - ret = nvmlDeviceGetPciInfo(device, &pci); - if (NVML_SUCCESS != ret) - continue; - - info->pcidomain = pci.domain; - info->pcibus = pci.bus; - info->pcidev = pci.device; - info->pcifunc = 0; - - info->name[0] = '\0'; - ret = nvmlDeviceGetName(device, info->name, sizeof(info->name)); - /* these may fail with NVML_ERROR_NOT_SUPPORTED on old devices */ - info->serial[0] = '\0'; - ret = nvmlDeviceGetSerial(device, info->serial, sizeof(info->serial)); - info->uuid[0] = '\0'; - ret = nvmlDeviceGetUUID(device, info->uuid, sizeof(info->uuid)); - - info->maxlinkspeed = 0.0f; -#if HAVE_DECL_NVMLDEVICEGETMAXPCIELINKGENERATION - { - unsigned maxwidth = 0, maxgen = 0; - float lanespeed; - nvmlDeviceGetMaxPcieLinkWidth(device, &maxwidth); - nvmlDeviceGetMaxPcieLinkGeneration(device, &maxgen); - /* PCIe Gen1 = 2.5GT/s signal-rate per lane with 8/10 encoding = 0.25GB/s data-rate per lane - * PCIe Gen2 = 5 GT/s signal-rate per lane with 8/10 encoding = 0.5 GB/s data-rate per lane - * PCIe Gen3 = 8 GT/s signal-rate per lane with 128/130 encoding = 1 GB/s data-rate per lane - */ - lanespeed = maxgen <= 2 ?
2.5 * maxgen * 0.8 : 8.0 * 128/130; /* Gbit/s per lane */ - info->maxlinkspeed = lanespeed * maxwidth / 8; /* GB/s */ - } -#endif - - /* validate this device */ - data->nr_devices++; - } - -out_with_init: - nvmlShutdown(); -out: - return; -} - -static int -hwloc_nvml_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused, - struct hwloc_obj *pcidev) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_nvml_backend_data_s *data = backend->private_data; - unsigned i; - - if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) - return 0; - - if (!hwloc_topology_is_thissystem(topology)) { - hwloc_debug("%s", "\nno NVML detection (not thissystem)\n"); - return 0; - } - - if (HWLOC_OBJ_PCI_DEVICE != pcidev->type) - return 0; - - if (data->nr_devices == (unsigned) -1) { - /* first call, lookup all devices */ - hwloc_nvml_query_devices(data); - /* if it fails, data->nr_devices = 0 so we won't do anything below and in next callbacks */ - } - - if (!data->nr_devices) - /* found no devices */ - return 0; - - /* now the devices array is ready to use */ - for(i=0; i<data->nr_devices; i++) { - struct hwloc_nvml_device_info_s *info = &data->devices[i]; - hwloc_obj_t osdev; - char buffer[64]; - - if (info->pcidomain != pcidev->attr->pcidev.domain) - continue; - if (info->pcibus != pcidev->attr->pcidev.bus) - continue; - if (info->pcidev != pcidev->attr->pcidev.dev) - continue; - if (info->pcifunc != pcidev->attr->pcidev.func) - continue; - - osdev = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1); - snprintf(buffer, sizeof(buffer), "nvml%d", i); - osdev->name = strdup(buffer); - osdev->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; - osdev->attr->osdev.type = HWLOC_OBJ_OSDEV_GPU; - - hwloc_obj_add_info(osdev, "Backend", "NVML"); - hwloc_obj_add_info(osdev, "GPUVendor", "NVIDIA Corporation"); - hwloc_obj_add_info(osdev, "GPUModel", info->name); - if
(info->serial[0] != '\0') - hwloc_obj_add_info(osdev, "NVIDIASerial", info->serial); - if (info->uuid[0] != '\0') - hwloc_obj_add_info(osdev, "NVIDIAUUID", info->uuid); - - hwloc_insert_object_by_parent(topology, pcidev, osdev); - - if (info->maxlinkspeed != 0.0f) - /* we found the max link speed, replace the current link speed found by pci (or none) */ - pcidev->attr->pcidev.linkspeed = info->maxlinkspeed; - - return 1; - } - - return 0; -} - -static void -hwloc_nvml_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_nvml_backend_data_s *data = backend->private_data; - free(data->devices); - free(data); -} - -static struct hwloc_backend * -hwloc_nvml_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_nvml_backend_data_s *data; - - /* thissystem may not be fully initialized yet, we'll check flags in discover() */ - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - - data = malloc(sizeof(*data)); - if (!data) { - free(backend); - return NULL; - } - /* the first callback will initialize those */ - data->nr_devices = (unsigned) -1; /* unknown yet */ - data->devices = NULL; - - backend->private_data = data; - backend->disable = hwloc_nvml_backend_disable; - - backend->notify_new_object = hwloc_nvml_backend_notify_new_object; - return backend; -} - -static struct hwloc_disc_component hwloc_nvml_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_MISC, - "nvml", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_nvml_component_instantiate, - 5, /* after pci, and after cuda since likely less useful */ - NULL -}; - -static int -hwloc_nvml_component_init(unsigned long flags) -{ - if (flags) - return -1; - if (hwloc_plugin_check_namespace("nvml", "hwloc_backend_alloc") < 0) - return -1; - return 0; -} - -#ifdef HWLOC_INSIDE_PLUGIN -HWLOC_DECLSPEC 
extern const struct hwloc_component hwloc_nvml_component; -#endif - -const struct hwloc_component hwloc_nvml_component = { - HWLOC_COMPONENT_ABI, - hwloc_nvml_component_init, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_nvml_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-opencl.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-opencl.c deleted file mode 100644 index 85057c7c15b..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-opencl.c +++ /dev/null @@ -1,346 +0,0 @@ -/* - * Copyright © 2012-2014 Inria. All rights reserved. - * Copyright © 2013 Université Bordeaux. All right reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include - -/* private headers allowed for convenience because this plugin is built within hwloc */ -#include -#include - -#include - -typedef enum hwloc_opencl_device_type_e { - HWLOC_OPENCL_DEVICE_AMD -} hwloc_opencl_device_type_t; - -struct hwloc_opencl_backend_data_s { - unsigned nr_devices; /* -1 when unknown yet, first callback will setup */ - struct hwloc_opencl_device_info_s { - hwloc_opencl_device_type_t type; - - unsigned platformidx; - char platformname[64]; - unsigned platformdeviceidx; - char devicename[64]; - char devicevendor[64]; - char devicetype[64]; - - unsigned computeunits; - unsigned long long globalmemsize; - - union hwloc_opencl_device_info_u { - struct hwloc_opencl_device_info_amd_s { - unsigned pcidomain, pcibus, pcidev, pcifunc; - } amd; - } specific; - } * devices; -}; - -static void -hwloc_opencl_query_devices(struct hwloc_opencl_backend_data_s *data) -{ - cl_platform_id *platform_ids = NULL; - cl_uint nr_platforms; - cl_device_id *device_ids = NULL; - cl_uint nr_devices, nr_total_devices, tmp; - cl_int clret; - unsigned curpfidx, curpfdvidx, i; - - /* mark the number of devices as 0 in case we fail below, - * so that we don't try again later. 
- */ - data->nr_devices = 0; - - /* count platforms, allocate and get them */ - clret = clGetPlatformIDs(0, NULL, &nr_platforms); - if (CL_SUCCESS != clret || !nr_platforms) - goto out; - hwloc_debug("%u OpenCL platforms\n", nr_platforms); - platform_ids = malloc(nr_platforms * sizeof(*platform_ids)); - if (!platform_ids) - goto out; - clret = clGetPlatformIDs(nr_platforms, platform_ids, &nr_platforms); - if (CL_SUCCESS != clret || !nr_platforms) - goto out_with_platform_ids; - - /* how many devices, total? */ - tmp = 0; - for(i=0; idevices = malloc(nr_total_devices * sizeof(*data->devices)); - if (!data->devices || !device_ids) - goto out_with_device_ids; - /* actually query device ids */ - tmp = 0; - for(i=0; idevices[data->nr_devices]; - cl_platform_id platform_id = 0; - cl_device_type type; -#ifdef CL_DEVICE_TOPOLOGY_AMD - cl_device_topology_amd amdtopo; -#endif - cl_ulong globalmemsize; - cl_uint computeunits; - - hwloc_debug("Looking device %p\n", device_ids[i]); - - info->platformname[0] = '\0'; - clret = clGetDeviceInfo(device_ids[i], CL_DEVICE_PLATFORM, sizeof(platform_id), &platform_id, NULL); - if (CL_SUCCESS != clret) - continue; - clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, sizeof(info->platformname), info->platformname, NULL); - - info->devicename[0] = '\0'; -#ifdef CL_DEVICE_BOARD_NAME_AMD - clGetDeviceInfo(device_ids[i], CL_DEVICE_BOARD_NAME_AMD, sizeof(info->devicename), info->devicename, NULL); -#else - clGetDeviceInfo(device_ids[i], CL_DEVICE_NAME, sizeof(info->devicename), info->devicename, NULL); -#endif - info->devicevendor[0] = '\0'; - clGetDeviceInfo(device_ids[i], CL_DEVICE_VENDOR, sizeof(info->devicevendor), info->devicevendor, NULL); - - clGetDeviceInfo(device_ids[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL); - switch (type) { - case CL_DEVICE_TYPE_CPU: /* FIXME: cannot happen in PCI devices? 
*/ - strcpy(info->devicetype, "CPU"); - break; - case CL_DEVICE_TYPE_GPU: - strcpy(info->devicetype, "GPU"); - break; - case CL_DEVICE_TYPE_ACCELERATOR: - strcpy(info->devicetype, "Accelerator"); - break; - default: - strcpy(info->devicetype, "Unknown"); - break; - } - - clGetDeviceInfo(device_ids[i], CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(globalmemsize), &globalmemsize, NULL); - info->globalmemsize = globalmemsize / 1024; - - clGetDeviceInfo(device_ids[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(computeunits), &computeunits, NULL); - info->computeunits = computeunits; - - hwloc_debug("platform %s device %s vendor %s type %s\n", info->platformname, info->devicename, info->devicevendor, info->devicetype); - - /* find our indexes */ - while (platform_id != platform_ids[curpfidx]) { - curpfidx++; - curpfdvidx = 0; - } - info->platformidx = curpfidx; - info->platformdeviceidx = curpfdvidx; - curpfdvidx++; - - hwloc_debug("This is opencl%dd%d\n", info->platformidx, info->platformdeviceidx); - -#ifdef CL_DEVICE_TOPOLOGY_AMD - clret = clGetDeviceInfo(device_ids[i], CL_DEVICE_TOPOLOGY_AMD, sizeof(amdtopo), &amdtopo, NULL); - if (CL_SUCCESS != clret) { - hwloc_debug("no AMD-specific device information: %d\n", clret); - continue; - } - if (CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD != amdtopo.raw.type) { - hwloc_debug("not a PCIe device: %u\n", amdtopo.raw.type); - continue; - } - - info->type = HWLOC_OPENCL_DEVICE_AMD; - info->specific.amd.pcidomain = 0; - info->specific.amd.pcibus = amdtopo.pcie.bus; - info->specific.amd.pcidev = amdtopo.pcie.device; - info->specific.amd.pcifunc = amdtopo.pcie.function; - - hwloc_debug("OpenCL device on PCI 0000:%02x:%02x.%u\n", amdtopo.pcie.bus, amdtopo.pcie.device, amdtopo.pcie.function); - - /* validate this device */ - data->nr_devices++; -#endif /* HAVE_DECL_CL_DEVICE_TOPOLOGY_AMD */ - } - free(device_ids); - free(platform_ids); - return; - -out_with_device_ids: - free(device_ids); - free(data->devices); - data->devices = NULL; -out_with_platform_ids: - 
free(platform_ids); -out: - return; -} - -static int -hwloc_opencl_backend_notify_new_object(struct hwloc_backend *backend, struct hwloc_backend *caller __hwloc_attribute_unused, - struct hwloc_obj *pcidev) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_opencl_backend_data_s *data = backend->private_data; - unsigned i; - - if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) - return 0; - - if (!hwloc_topology_is_thissystem(topology)) { - hwloc_debug("%s", "\nno OpenCL detection (not thissystem)\n"); - return 0; - } - - if (HWLOC_OBJ_PCI_DEVICE != pcidev->type) - return 0; - - if (data->nr_devices == (unsigned) -1) { - /* first call, lookup all devices */ - hwloc_opencl_query_devices(data); - /* if it fails, data->nr_devices = 0 so we won't do anything below and in next callbacks */ - } - - if (!data->nr_devices) - /* found no devices */ - return 0; - - /* now the devices array is ready to use */ - for(i=0; inr_devices; i++) { - struct hwloc_opencl_device_info_s *info = &data->devices[i]; - hwloc_obj_t osdev; - char buffer[64]; - - assert(info->type == HWLOC_OPENCL_DEVICE_AMD); - if (info->specific.amd.pcidomain != pcidev->attr->pcidev.domain) - continue; - if (info->specific.amd.pcibus != pcidev->attr->pcidev.bus) - continue; - if (info->specific.amd.pcidev != pcidev->attr->pcidev.dev) - continue; - if (info->specific.amd.pcifunc != pcidev->attr->pcidev.func) - continue; - - osdev = hwloc_alloc_setup_object(HWLOC_OBJ_OS_DEVICE, -1); - snprintf(buffer, sizeof(buffer), "opencl%dd%d", info->platformidx, info->platformdeviceidx); - osdev->name = strdup(buffer); - osdev->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; - osdev->attr->osdev.type = HWLOC_OBJ_OSDEV_COPROC; - - hwloc_obj_add_info(osdev, "CoProcType", "OpenCL"); - hwloc_obj_add_info(osdev, "Backend", "OpenCL"); - hwloc_obj_add_info(osdev, "OpenCLDeviceType", info->devicetype); - - if (info->devicevendor[0] != '\0') - 
hwloc_obj_add_info(osdev, "GPUVendor", info->devicevendor); - if (info->devicename[0] != '\0') - hwloc_obj_add_info(osdev, "GPUModel", info->devicename); - - snprintf(buffer, sizeof(buffer), "%u", info->platformidx); - hwloc_obj_add_info(osdev, "OpenCLPlatformIndex", buffer); - if (info->platformname[0] != '\0') - hwloc_obj_add_info(osdev, "OpenCLPlatformName", info->platformname); - - snprintf(buffer, sizeof(buffer), "%u", info->platformdeviceidx); - hwloc_obj_add_info(osdev, "OpenCLPlatformDeviceIndex", buffer); - - snprintf(buffer, sizeof(buffer), "%u", info->computeunits); - hwloc_obj_add_info(osdev, "OpenCLComputeUnits", buffer); - - snprintf(buffer, sizeof(buffer), "%llu", info->globalmemsize); - hwloc_obj_add_info(osdev, "OpenCLGlobalMemorySize", buffer); - - hwloc_insert_object_by_parent(topology, pcidev, osdev); - return 1; - } - - return 0; -} - -static void -hwloc_opencl_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_opencl_backend_data_s *data = backend->private_data; - free(data->devices); - free(data); -} - -static struct hwloc_backend * -hwloc_opencl_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_opencl_backend_data_s *data; - - /* thissystem may not be fully initialized yet, we'll check flags in discover() */ - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - - data = malloc(sizeof(*data)); - if (!data) { - free(backend); - return NULL; - } - /* the first callback will initialize those */ - data->nr_devices = (unsigned) -1; /* unknown yet */ - data->devices = NULL; - - backend->private_data = data; - backend->disable = hwloc_opencl_backend_disable; - - backend->notify_new_object = hwloc_opencl_backend_notify_new_object; - return backend; -} - -static struct hwloc_disc_component 
hwloc_opencl_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_MISC, - "opencl", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_opencl_component_instantiate, - 10, /* after pci */ - NULL -}; - -static int -hwloc_opencl_component_init(unsigned long flags) -{ - if (flags) - return -1; - if (hwloc_plugin_check_namespace("opencl", "hwloc_backend_alloc") < 0) - return -1; - return 0; -} - -#ifdef HWLOC_INSIDE_PLUGIN -HWLOC_DECLSPEC extern const struct hwloc_component hwloc_opencl_component; -#endif - -const struct hwloc_component hwloc_opencl_component = { - HWLOC_COMPONENT_ABI, - hwloc_opencl_component_init, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_opencl_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-osf.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-osf.c deleted file mode 100644 index b403d1343fc..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-osf.c +++ /dev/null @@ -1,392 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2015 Inria. All rights reserved. - * Copyright © 2009-2011 Université Bordeaux - * Copyright © 2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -#include - -#include -#ifdef HAVE_DIRENT_H -#include -#endif -#ifdef HAVE_UNISTD_H -#include -#endif -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -#include -#include -#include -#include - -/* - * TODO - * - * nsg_init(), nsg_attach_pid(), RAD_MIGRATE/RAD_WAIT - * assign_pid_to_pset() - * - * pthread_use_only_cpu too? 
- */ - -static int -prepare_radset(hwloc_topology_t topology __hwloc_attribute_unused, radset_t *radset, hwloc_const_bitmap_t hwloc_set) -{ - unsigned cpu; - cpuset_t target_cpuset; - cpuset_t cpuset, xor_cpuset; - radid_t radid; - int ret = 0; - int ret_errno = 0; - int nbnodes = rad_get_num(); - - cpusetcreate(&target_cpuset); - cpuemptyset(target_cpuset); - hwloc_bitmap_foreach_begin(cpu, hwloc_set) - cpuaddset(target_cpuset, cpu); - hwloc_bitmap_foreach_end(); - - cpusetcreate(&cpuset); - cpusetcreate(&xor_cpuset); - for (radid = 0; radid < nbnodes; radid++) { - cpuemptyset(cpuset); - if (rad_get_cpus(radid, cpuset)==-1) { - fprintf(stderr,"rad_get_cpus(%d) failed: %s\n",radid,strerror(errno)); - continue; - } - cpuxorset(target_cpuset, cpuset, xor_cpuset); - if (cpucountset(xor_cpuset) == 0) { - /* Found it */ - radsetcreate(radset); - rademptyset(*radset); - radaddset(*radset, radid); - ret = 1; - goto out; - } - } - /* radset containing exactly this set of CPUs not found */ - ret_errno = EXDEV; - -out: - cpusetdestroy(&target_cpuset); - cpusetdestroy(&cpuset); - cpusetdestroy(&xor_cpuset); - errno = ret_errno; - return ret; -} - -/* Note: get_cpubind not available on OSF */ - -static int -hwloc_osf_set_thread_cpubind(hwloc_topology_t topology, hwloc_thread_t thread, hwloc_const_bitmap_t hwloc_set, int flags) -{ - radset_t radset; - - if (hwloc_bitmap_isequal(hwloc_set, hwloc_topology_get_complete_cpuset(topology))) { - if ((errno = pthread_rad_detach(thread))) - return -1; - return 0; - } - - /* Apparently OSF migrates pages */ - if (flags & HWLOC_CPUBIND_NOMEMBIND) { - errno = ENOSYS; - return -1; - } - - if (!prepare_radset(topology, &radset, hwloc_set)) - return -1; - - if (flags & HWLOC_CPUBIND_STRICT) { - if ((errno = pthread_rad_bind(thread, radset, RAD_INSIST | RAD_WAIT))) - return -1; - } else { - if ((errno = pthread_rad_attach(thread, radset, RAD_WAIT))) - return -1; - } - radsetdestroy(&radset); - - return 0; -} - -static int 
-hwloc_osf_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags) -{ - radset_t radset; - - if (hwloc_bitmap_isequal(hwloc_set, hwloc_topology_get_complete_cpuset(topology))) { - if (rad_detach_pid(pid)) - return -1; - return 0; - } - - /* Apparently OSF migrates pages */ - if (flags & HWLOC_CPUBIND_NOMEMBIND) { - errno = ENOSYS; - return -1; - } - - if (!prepare_radset(topology, &radset, hwloc_set)) - return -1; - - if (flags & HWLOC_CPUBIND_STRICT) { - if (rad_bind_pid(pid, radset, RAD_INSIST | RAD_WAIT)) - return -1; - } else { - if (rad_attach_pid(pid, radset, RAD_WAIT)) - return -1; - } - radsetdestroy(&radset); - - return 0; -} - -static int -hwloc_osf_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_osf_set_thread_cpubind(topology, pthread_self(), hwloc_set, flags); -} - -static int -hwloc_osf_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_osf_set_proc_cpubind(topology, getpid(), hwloc_set, flags); -} - -static int -hwloc_osf_prepare_mattr(hwloc_topology_t topology __hwloc_attribute_unused, memalloc_attr_t *mattr, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags __hwloc_attribute_unused) -{ - unsigned long osf_policy; - int node; - - switch (policy) { - case HWLOC_MEMBIND_FIRSTTOUCH: - osf_policy = MPOL_THREAD; - break; - case HWLOC_MEMBIND_DEFAULT: - case HWLOC_MEMBIND_BIND: - osf_policy = MPOL_DIRECTED; - break; - case HWLOC_MEMBIND_INTERLEAVE: - osf_policy = MPOL_STRIPPED; - break; - case HWLOC_MEMBIND_REPLICATE: - osf_policy = MPOL_REPLICATED; - break; - default: - errno = ENOSYS; - return -1; - } - - memset(mattr, 0, sizeof(*mattr)); - mattr->mattr_policy = osf_policy; - mattr->mattr_rad = RAD_NONE; - radsetcreate(&mattr->mattr_radset); - rademptyset(mattr->mattr_radset); - - hwloc_bitmap_foreach_begin(node, nodeset) - radaddset(mattr->mattr_radset, node); - 
hwloc_bitmap_foreach_end(); - return 0; -} - -static int -hwloc_osf_set_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - memalloc_attr_t mattr; - int behavior = 0; - int ret; - - if (flags & HWLOC_MEMBIND_MIGRATE) - behavior |= MADV_CURRENT; - if (flags & HWLOC_MEMBIND_STRICT) - behavior |= MADV_INSIST; - - if (hwloc_osf_prepare_mattr(topology, &mattr, nodeset, policy, flags)) - return -1; - - ret = nmadvise(addr, len, MADV_CURRENT, &mattr); - radsetdestroy(&mattr.mattr_radset); - return ret; -} - -static void * -hwloc_osf_alloc_membind(hwloc_topology_t topology, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - memalloc_attr_t mattr; - void *ptr; - - if (hwloc_osf_prepare_mattr(topology, &mattr, nodeset, policy, flags)) - return hwloc_alloc_or_fail(topology, len, flags); - - /* TODO: rather use acreate/amalloc ? */ - ptr = nmmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, - 0, &mattr); - radsetdestroy(&mattr.mattr_radset); - return ptr == MAP_FAILED ? 
NULL : ptr; -} - -static int -hwloc_look_osf(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - cpu_cursor_t cursor; - unsigned nbnodes; - radid_t radid, radid2; - radset_t radset, radset2; - cpuid_t cpuid; - cpuset_t cpuset; - struct hwloc_obj *obj; - unsigned distance; - - if (topology->levels[0][0]->cpuset) - /* somebody discovered things */ - return 0; - - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - - nbnodes = rad_get_num(); - - cpusetcreate(&cpuset); - radsetcreate(&radset); - radsetcreate(&radset2); - { - hwloc_obj_t *nodes = calloc(nbnodes, sizeof(hwloc_obj_t)); - unsigned *indexes = calloc(nbnodes, sizeof(unsigned)); - float *distances = calloc(nbnodes*nbnodes, sizeof(float)); - unsigned nfound; - numa_attr_t attr; - - attr.nattr_type = R_RAD; - attr.nattr_descr.rd_radset = radset; - attr.nattr_flags = 0; - - for (radid = 0; radid < (radid_t) nbnodes; radid++) { - rademptyset(radset); - radaddset(radset, radid); - cpuemptyset(cpuset); - if (rad_get_cpus(radid, cpuset)==-1) { - fprintf(stderr,"rad_get_cpus(%d) failed: %s\n",radid,strerror(errno)); - continue; - } - - indexes[radid] = radid; - nodes[radid] = obj = hwloc_alloc_setup_object(HWLOC_OBJ_NUMANODE, radid); - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(obj->nodeset, radid); - obj->cpuset = hwloc_bitmap_alloc(); - obj->memory.local_memory = rad_get_physmem(radid) * hwloc_getpagesize(); - obj->memory.page_types_len = 2; - obj->memory.page_types = malloc(2*sizeof(*obj->memory.page_types)); - memset(obj->memory.page_types, 0, 2*sizeof(*obj->memory.page_types)); - obj->memory.page_types[0].size = hwloc_getpagesize(); -#ifdef HAVE__SC_LARGE_PAGESIZE - obj->memory.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); -#endif - - cursor = SET_CURSOR_INIT; - while((cpuid = cpu_foreach(cpuset, 0, &cursor)) != CPU_NONE) - hwloc_bitmap_set(obj->cpuset, cpuid); - - hwloc_debug_1arg_bitmap("node %d has cpuset %s\n", - radid, obj->cpuset); - - 
hwloc_insert_object_by_cpuset(topology, obj); - - nfound = 0; - for (radid2 = 0; radid2 < (radid_t) nbnodes; radid2++) - distances[radid*nbnodes+radid2] = RAD_DIST_REMOTE; - for (distance = RAD_DIST_LOCAL; distance < RAD_DIST_REMOTE; distance++) { - attr.nattr_distance = distance; - /* get set of NUMA nodes at distance <= DISTANCE */ - if (nloc(&attr, radset2)) { - fprintf(stderr,"nloc failed: %s\n", strerror(errno)); - continue; - } - cursor = SET_CURSOR_INIT; - while ((radid2 = rad_foreach(radset2, 0, &cursor)) != RAD_NONE) { - if (distances[radid*nbnodes+radid2] == RAD_DIST_REMOTE) { - distances[radid*nbnodes+radid2] = (float) distance; - nfound++; - } - } - if (nfound == nbnodes) - /* Finished finding distances, no need to go up to RAD_DIST_REMOTE */ - break; - } - } - - hwloc_distances_set(topology, HWLOC_OBJ_NUMANODE, nbnodes, indexes, nodes, distances, 0 /* OS cannot force */); - } - radsetdestroy(&radset2); - radsetdestroy(&radset); - cpusetdestroy(&cpuset); - - /* add PU objects */ - hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology)); - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "OSF"); - if (topology->is_thissystem) - hwloc_add_uname_info(topology, NULL); - return 1; -} - -void -hwloc_set_osf_hooks(struct hwloc_binding_hooks *hooks, - struct hwloc_topology_support *support) -{ - hooks->set_thread_cpubind = hwloc_osf_set_thread_cpubind; - hooks->set_thisthread_cpubind = hwloc_osf_set_thisthread_cpubind; - hooks->set_proc_cpubind = hwloc_osf_set_proc_cpubind; - hooks->set_thisproc_cpubind = hwloc_osf_set_thisproc_cpubind; - hooks->set_area_membind = hwloc_osf_set_area_membind; - hooks->alloc_membind = hwloc_osf_alloc_membind; - hooks->alloc = hwloc_alloc_mmap; - hooks->free_membind = hwloc_free_mmap; - support->membind->firsttouch_membind = 1; - support->membind->bind_membind = 1; - support->membind->interleave_membind = 1; - support->membind->replicate_membind = 1; -} - -static struct hwloc_backend * 
-hwloc_osf_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - backend->discover = hwloc_look_osf; - return backend; -} - -static struct hwloc_disc_component hwloc_osf_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_CPU, - "osf", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_osf_component_instantiate, - 50, - NULL -}; - -const struct hwloc_component hwloc_osf_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_osf_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-pci.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-pci.c deleted file mode 100644 index d7028e9076f..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-pci.c +++ /dev/null @@ -1,346 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2011, 2013 Université Bordeaux - * Copyright © 2014 Cisco Systems, Inc. All rights reserved. - * Copyright © 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include -#include -#include -#include - -/* private headers allowed for convenience because this plugin is built within hwloc */ -#include -#include - -#include -#include -#include -#include -#include -#ifdef HWLOC_LINUX_SYS -#include -#endif - -#include - -#ifndef PCI_HEADER_TYPE -#define PCI_HEADER_TYPE 0x0e -#endif -#ifndef PCI_HEADER_TYPE_BRIDGE -#define PCI_HEADER_TYPE_BRIDGE 1 -#endif - -#ifndef PCI_CLASS_DEVICE -#define PCI_CLASS_DEVICE 0x0a -#endif -#ifndef PCI_CLASS_BRIDGE_PCI -#define PCI_CLASS_BRIDGE_PCI 0x0604 -#endif - -#ifndef PCI_REVISION_ID -#define PCI_REVISION_ID 0x08 -#endif - -#ifndef PCI_SUBSYSTEM_VENDOR_ID -#define PCI_SUBSYSTEM_VENDOR_ID 0x2c -#endif -#ifndef PCI_SUBSYSTEM_ID -#define PCI_SUBSYSTEM_ID 0x2e -#endif - -#ifndef PCI_PRIMARY_BUS -#define PCI_PRIMARY_BUS 0x18 -#endif -#ifndef PCI_SECONDARY_BUS -#define PCI_SECONDARY_BUS 0x19 -#endif -#ifndef PCI_SUBORDINATE_BUS -#define PCI_SUBORDINATE_BUS 0x1a -#endif - -#ifndef PCI_CAP_ID_EXP -#define PCI_CAP_ID_EXP 0x10 -#endif - -#ifndef PCI_CAP_NORMAL -#define PCI_CAP_NORMAL 1 -#endif - -#define CONFIG_SPACE_CACHESIZE 256 - - -static int -hwloc_look_pci(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_obj *first_obj = NULL, *last_obj = NULL; - int ret; - struct pci_device_iterator *iter; - struct pci_device *pcidev; -#ifdef HWLOC_LINUX_SYS - DIR *dir; -#endif - - if (!(hwloc_topology_get_flags(topology) & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) - return 0; - - if (hwloc_get_next_pcidev(topology, NULL)) { - hwloc_debug("%s", "PCI objects already added, ignoring pci backend.\n"); - return 0; - } - - if (!hwloc_topology_is_thissystem(topology)) { - hwloc_debug("%s", "\nno PCI detection (not thissystem)\n"); - return 0; - } - - hwloc_debug("%s", "\nScanning PCI buses...\n"); - - /* initialize PCI scanning */ - ret = pci_system_init(); - if (ret) { - hwloc_debug("%s", "Can not initialize libpciaccess\n"); - 
return -1; - } - - iter = pci_slot_match_iterator_create(NULL); - - /* iterate over devices */ - for (pcidev = pci_device_next(iter); - pcidev; - pcidev = pci_device_next(iter)) - { - const char *vendorname, *devicename, *fullname; - unsigned char config_space_cache[CONFIG_SPACE_CACHESIZE]; - struct hwloc_obj *obj; - unsigned os_index; - unsigned domain; - unsigned device_class; - unsigned short tmp16; - char name[128]; - unsigned offset; - - /* initialize the config space in case we fail to read it (missing permissions, etc). */ - memset(config_space_cache, 0xff, CONFIG_SPACE_CACHESIZE); - pci_device_probe(pcidev); - pci_device_cfg_read(pcidev, config_space_cache, 0, CONFIG_SPACE_CACHESIZE, NULL); - - /* try to read the domain */ - domain = pcidev->domain; - - /* try to read the device_class */ - device_class = pcidev->device_class >> 8; - - /* fixup SR-IOV buggy VF device/vendor IDs */ - if (0xffff == pcidev->vendor_id && 0xffff == pcidev->device_id) { - /* SR-IOV puts ffff:ffff in Virtual Function config space. - * The actual VF device ID is stored at a special (dynamic) location in the Physical Function config space. - * VF and PF have the same vendor ID. - * - * libpciaccess just returns ffff:ffff, needs to be fixed. - * linuxpci is OK because sysfs files are already fixed the kernel. - * (pciutils is OK when it uses those Linux sysfs files.) - * - * Reading these files is an easy way to work around the libpciaccess issue on Linux, - * but we have no way to know if this is caused by SR-IOV or not. - * - * TODO: - * If PF has CAP_ID_PCIX or CAP_ID_EXP (offset>0), - * look for extended capability PCI_EXT_CAP_ID_SRIOV (need extended config space (more than 256 bytes)), - * then read the VF device ID after it (PCI_IOV_DID bytes later). - * Needs access to extended config space (needs root on Linux). - * TODO: - * Add string info attributes in VF and PF objects? - */ -#ifdef HWLOC_LINUX_SYS - /* Workaround for Linux (the kernel returns the VF device/vendor IDs). 
*/ - char path[64]; - char value[16]; - FILE *file; - size_t read; - - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/vendor", - domain, pcidev->bus, pcidev->dev, pcidev->func); - file = fopen(path, "r"); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - /* fixup the pciaccess struct so that pci_device_get_vendor_name() is correct later. */ - pcidev->vendor_id = strtoul(value, NULL, 16); - } - - snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/device", - domain, pcidev->bus, pcidev->dev, pcidev->func); - file = fopen(path, "r"); - if (file) { - read = fread(value, 1, sizeof(value), file); - fclose(file); - if (read) - /* fixup the pciaccess struct so that pci_device_get_device_name() is correct later. */ - pcidev->device_id = strtoul(value, NULL, 16); - } -#endif - } - - /* might be useful for debugging (note that domain might be truncated) */ - os_index = (domain << 20) + (pcidev->bus << 12) + (pcidev->dev << 4) + pcidev->func; - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PCI_DEVICE, os_index); - obj->attr->pcidev.domain = domain; - obj->attr->pcidev.bus = pcidev->bus; - obj->attr->pcidev.dev = pcidev->dev; - obj->attr->pcidev.func = pcidev->func; - obj->attr->pcidev.vendor_id = pcidev->vendor_id; - obj->attr->pcidev.device_id = pcidev->device_id; - obj->attr->pcidev.class_id = device_class; - obj->attr->pcidev.revision = config_space_cache[PCI_REVISION_ID]; - - obj->attr->pcidev.linkspeed = 0; /* unknown */ - offset = hwloc_pci_find_cap(config_space_cache, PCI_CAP_ID_EXP); - - if (offset > 0 && offset + 20 /* size of PCI express block up to link status */ <= CONFIG_SPACE_CACHESIZE) - hwloc_pci_find_linkspeed(config_space_cache, offset, &obj->attr->pcidev.linkspeed); - - if (hwloc_pci_prepare_bridge(obj, config_space_cache) < 0) - continue; - - if (obj->type == HWLOC_OBJ_PCI_DEVICE) { - memcpy(&tmp16, &config_space_cache[PCI_SUBSYSTEM_VENDOR_ID], sizeof(tmp16)); - 
obj->attr->pcidev.subvendor_id = tmp16; - memcpy(&tmp16, &config_space_cache[PCI_SUBSYSTEM_ID], sizeof(tmp16)); - obj->attr->pcidev.subdevice_id = tmp16; - } else { - /* TODO: - * bridge must lookup PCI_CAP_ID_SSVID and then look at offset+PCI_SSVID_VENDOR/DEVICE_ID - * cardbus must look at PCI_CB_SUBSYSTEM_VENDOR_ID and PCI_CB_SUBSYSTEM_ID - */ - } - - /* get the vendor name */ - vendorname = pci_device_get_vendor_name(pcidev); - if (vendorname && *vendorname) - hwloc_obj_add_info(obj, "PCIVendor", vendorname); - - /* get the device name */ - devicename = pci_device_get_device_name(pcidev); - if (devicename && *devicename) - hwloc_obj_add_info(obj, "PCIDevice", devicename); - - /* generate or get the fullname */ - snprintf(name, sizeof(name), "%s%s%s", - vendorname ? vendorname : "", - vendorname && devicename ? " " : "", - devicename ? devicename : ""); - fullname = name; - if (*name) - obj->name = strdup(name); - hwloc_debug(" %04x:%02x:%02x.%01x %04x %04x:%04x %s\n", - domain, pcidev->bus, pcidev->dev, pcidev->func, - device_class, pcidev->vendor_id, pcidev->device_id, - fullname && *fullname ? 
fullname : "??"); - - /* queue the object for now */ - if (first_obj) - last_obj->next_sibling = obj; - else - first_obj = obj; - last_obj = obj; - } - - /* finalize device scanning */ - pci_iterator_destroy(iter); - pci_system_cleanup(); - -#ifdef HWLOC_LINUX_SYS - dir = opendir("/sys/bus/pci/slots/"); - if (dir) { - struct dirent *dirent; - while ((dirent = readdir(dir)) != NULL) { - char path[64]; - FILE *file; - if (dirent->d_name[0] == '.') - continue; - snprintf(path, sizeof(path), "/sys/bus/pci/slots/%s/address", dirent->d_name); - file = fopen(path, "r"); - if (file) { - unsigned domain, bus, dev; - if (fscanf(file, "%x:%x:%x", &domain, &bus, &dev) == 3) { - hwloc_obj_t obj = first_obj; - while (obj) { - if (obj->attr->pcidev.domain == domain - && obj->attr->pcidev.bus == bus - && obj->attr->pcidev.dev == dev) { - hwloc_obj_add_info(obj, "PCISlot", dirent->d_name); - } - obj = obj->next_sibling; - } - } - fclose(file); - } - } - closedir(dir); - } -#endif - - return hwloc_insert_pci_device_list(backend, first_obj); -} - -static struct hwloc_backend * -hwloc_pci_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - - /* thissystem may not be fully initialized yet, we'll check flags in discover() */ - - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS; -#ifdef HWLOC_SOLARIS_SYS - if ((uid_t)0 != geteuid()) - backend->discover = NULL; - else -#endif - backend->discover = hwloc_look_pci; - return backend; -} - -static struct hwloc_disc_component hwloc_pci_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_MISC, - "pci", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_pci_component_instantiate, - 20, - NULL -}; - -static int -hwloc_pci_component_init(unsigned long flags) -{ - if (flags) - return -1; - if 
(hwloc_plugin_check_namespace("pci", "hwloc_backend_alloc") < 0) - return -1; - return 0; -} - -#ifdef HWLOC_INSIDE_PLUGIN -HWLOC_DECLSPEC extern const struct hwloc_component hwloc_pci_component; -#endif - -const struct hwloc_component hwloc_pci_component = { - HWLOC_COMPONENT_ABI, - hwloc_pci_component_init, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_pci_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-solaris.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-solaris.c deleted file mode 100644 index 06a4115e5ae..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-solaris.c +++ /dev/null @@ -1,803 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2011 Université Bordeaux - * Copyright © 2011 Cisco Systems, Inc. All rights reserved. - * Copyright © 2011 Oracle and/or its affiliates. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include -#include - -#include -#include -#ifdef HAVE_DIRENT_H -#include -#endif -#ifdef HAVE_UNISTD_H -#include -#endif -#include -#include -#include -#include -#include - -#ifdef HAVE_LIBLGRP -# include -#endif - -/* TODO: use psets? (only for root) - * TODO: get cache info from prtdiag? (it is setgid sys to be able to read from - * crw-r----- 1 root sys 88, 0 nov 3 14:35 /devices/pseudo/devinfo@0:devinfo - * and run (apparently undocumented) ioctls on it. 
- */ - -static int -hwloc_solaris_set_sth_cpubind(hwloc_topology_t topology, idtype_t idtype, id_t id, hwloc_const_bitmap_t hwloc_set, int flags) -{ - unsigned target_cpu; - - /* The resulting binding is always strict */ - - if (hwloc_bitmap_isequal(hwloc_set, hwloc_topology_get_complete_cpuset(topology))) { - if (processor_bind(idtype, id, PBIND_NONE, NULL) != 0) - return -1; -#ifdef HAVE_LIBLGRP - if (!(flags & HWLOC_CPUBIND_NOMEMBIND)) { - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - if (depth >= 0) { - int n = hwloc_get_nbobjs_by_depth(topology, depth); - int i; - - for (i = 0; i < n; i++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, i); - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_NONE); - } - } - } -#endif /* HAVE_LIBLGRP */ - return 0; - } - -#ifdef HAVE_LIBLGRP - if (!(flags & HWLOC_CPUBIND_NOMEMBIND)) { - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - if (depth >= 0) { - int n = hwloc_get_nbobjs_by_depth(topology, depth); - int i; - int ok; - hwloc_bitmap_t target = hwloc_bitmap_alloc(); - - for (i = 0; i < n; i++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, i); - if (hwloc_bitmap_isincluded(obj->cpuset, hwloc_set)) - hwloc_bitmap_or(target, target, obj->cpuset); - } - - ok = hwloc_bitmap_isequal(target, hwloc_set); - hwloc_bitmap_free(target); - - if (ok) { - /* Ok, managed to achieve hwloc_set by just combining NUMA nodes */ - - for (i = 0; i < n; i++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, i); - - if (hwloc_bitmap_isincluded(obj->cpuset, hwloc_set)) { - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_STRONG); - } else { - if (flags & HWLOC_CPUBIND_STRICT) - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_NONE); - else - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_WEAK); - } - } - - return 0; - } - } - } -#endif /* HAVE_LIBLGRP */ - - if (hwloc_bitmap_weight(hwloc_set) != 1) { - errno = EXDEV; - return -1; - } - - target_cpu 
= hwloc_bitmap_first(hwloc_set); - - if (processor_bind(idtype, id, - (processorid_t) (target_cpu), NULL) != 0) - return -1; - - return 0; -} - -static int -hwloc_solaris_set_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_solaris_set_sth_cpubind(topology, P_PID, pid, hwloc_set, flags); -} - -static int -hwloc_solaris_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_solaris_set_sth_cpubind(topology, P_PID, P_MYID, hwloc_set, flags); -} - -static int -hwloc_solaris_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_solaris_set_sth_cpubind(topology, P_LWPID, P_MYID, hwloc_set, flags); -} - -#ifdef HAVE_LIBLGRP -static int -hwloc_solaris_get_sth_cpubind(hwloc_topology_t topology, idtype_t idtype, id_t id, hwloc_bitmap_t hwloc_set, int flags __hwloc_attribute_unused) -{ - processorid_t binding; - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - int n; - int i; - - if (depth < 0) { - errno = ENOSYS; - return -1; - } - - /* first check if processor_bind() was used to bind to a single processor rather than to an lgroup */ - if ( processor_bind(idtype, id, PBIND_QUERY, &binding) == 0 && binding != PBIND_NONE ) { - hwloc_bitmap_only(hwloc_set, binding); - return 0; - } - - /* if not, check lgroups */ - hwloc_bitmap_zero(hwloc_set); - n = hwloc_get_nbobjs_by_depth(topology, depth); - for (i = 0; i < n; i++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, i); - lgrp_affinity_t aff = lgrp_affinity_get(idtype, id, obj->os_index); - - if (aff == LGRP_AFF_STRONG) - hwloc_bitmap_or(hwloc_set, hwloc_set, obj->cpuset); - } - - if (hwloc_bitmap_iszero(hwloc_set)) - hwloc_bitmap_copy(hwloc_set, hwloc_topology_get_complete_cpuset(topology)); - - return 0; -} - -static int -hwloc_solaris_get_proc_cpubind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_bitmap_t 
hwloc_set, int flags) -{ - return hwloc_solaris_get_sth_cpubind(topology, P_PID, pid, hwloc_set, flags); -} - -static int -hwloc_solaris_get_thisproc_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags) -{ - return hwloc_solaris_get_sth_cpubind(topology, P_PID, P_MYID, hwloc_set, flags); -} - -static int -hwloc_solaris_get_thisthread_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_set, int flags) -{ - return hwloc_solaris_get_sth_cpubind(topology, P_LWPID, P_MYID, hwloc_set, flags); -} -#endif /* HAVE_LIBLGRP */ - -/* TODO: given thread, probably not easy because of the historical n:m implementation */ -#ifdef HAVE_LIBLGRP -static int -hwloc_solaris_set_sth_membind(hwloc_topology_t topology, idtype_t idtype, id_t id, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - int depth; - int n, i; - - switch (policy) { - case HWLOC_MEMBIND_DEFAULT: - case HWLOC_MEMBIND_BIND: - break; - default: - errno = ENOSYS; - return -1; - } - - if (flags & HWLOC_MEMBIND_NOCPUBIND) { - errno = ENOSYS; - return -1; - } - - depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - if (depth < 0) { - errno = EXDEV; - return -1; - } - n = hwloc_get_nbobjs_by_depth(topology, depth); - - for (i = 0; i < n; i++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, i); - if (hwloc_bitmap_isset(nodeset, obj->os_index)) { - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_STRONG); - } else { - if (flags & HWLOC_CPUBIND_STRICT) - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_NONE); - else - lgrp_affinity_set(idtype, id, obj->os_index, LGRP_AFF_WEAK); - } - } - - return 0; -} - -static int -hwloc_solaris_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - return hwloc_solaris_set_sth_membind(topology, P_PID, pid, nodeset, policy, flags); -} - -static int -hwloc_solaris_set_thisproc_membind(hwloc_topology_t topology, 
hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - return hwloc_solaris_set_sth_membind(topology, P_PID, P_MYID, nodeset, policy, flags); -} - -static int -hwloc_solaris_set_thisthread_membind(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - return hwloc_solaris_set_sth_membind(topology, P_LWPID, P_MYID, nodeset, policy, flags); -} - -static int -hwloc_solaris_get_sth_membind(hwloc_topology_t topology, idtype_t idtype, id_t id, hwloc_nodeset_t nodeset, hwloc_membind_policy_t *policy, int flags __hwloc_attribute_unused) -{ - int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_NUMANODE); - int n; - int i; - - if (depth < 0) { - errno = ENOSYS; - return -1; - } - - hwloc_bitmap_zero(nodeset); - n = hwloc_get_nbobjs_by_depth(topology, depth); - - for (i = 0; i < n; i++) { - hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, i); - lgrp_affinity_t aff = lgrp_affinity_get(idtype, id, obj->os_index); - - if (aff == LGRP_AFF_STRONG) - hwloc_bitmap_set(nodeset, obj->os_index); - } - - if (hwloc_bitmap_iszero(nodeset)) - hwloc_bitmap_copy(nodeset, hwloc_topology_get_complete_nodeset(topology)); - - *policy = HWLOC_MEMBIND_BIND; - return 0; -} - -static int -hwloc_solaris_get_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t *policy, int flags) -{ - return hwloc_solaris_get_sth_membind(topology, P_PID, pid, nodeset, policy, flags); -} - -static int -hwloc_solaris_get_thisproc_membind(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t *policy, int flags) -{ - return hwloc_solaris_get_sth_membind(topology, P_PID, P_MYID, nodeset, policy, flags); -} - -static int -hwloc_solaris_get_thisthread_membind(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t *policy, int flags) -{ - return hwloc_solaris_get_sth_membind(topology, P_LWPID, P_MYID, nodeset, policy, flags); -} -#endif /* HAVE_LIBLGRP 
*/ - - -#ifdef MADV_ACCESS_LWP -static int -hwloc_solaris_set_area_membind(hwloc_topology_t topology, const void *addr, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags __hwloc_attribute_unused) -{ - int advice; - size_t remainder; - - /* Can not give a set of nodes just for an area. */ - if (!hwloc_bitmap_isequal(nodeset, hwloc_topology_get_complete_nodeset(topology))) { - errno = EXDEV; - return -1; - } - - switch (policy) { - case HWLOC_MEMBIND_DEFAULT: - case HWLOC_MEMBIND_BIND: - advice = MADV_ACCESS_DEFAULT; - break; - case HWLOC_MEMBIND_FIRSTTOUCH: - case HWLOC_MEMBIND_NEXTTOUCH: - advice = MADV_ACCESS_LWP; - break; - case HWLOC_MEMBIND_INTERLEAVE: - advice = MADV_ACCESS_MANY; - break; - default: - errno = ENOSYS; - return -1; - } - - remainder = (uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1); - addr = (char*) addr - remainder; - len += remainder; - return madvise((void*) addr, len, advice); -} -#endif - -#ifdef HAVE_LIBLGRP -static void -browse(struct hwloc_topology *topology, lgrp_cookie_t cookie, lgrp_id_t lgrp, hwloc_obj_t *glob_lgrps, unsigned *curlgrp) -{ - int n; - hwloc_obj_t obj; - lgrp_mem_size_t mem_size; - - n = lgrp_cpus(cookie, lgrp, NULL, 0, LGRP_CONTENT_HIERARCHY); - if (n == -1) - return; - - /* Is this lgrp a NUMA node? 
*/ - if ((mem_size = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_INSTALLED, LGRP_CONTENT_DIRECT)) > 0) - { - int i; - processorid_t *cpuids; - cpuids = malloc(sizeof(processorid_t) * n); - assert(cpuids != NULL); - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_NUMANODE, lgrp); - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(obj->nodeset, lgrp); - obj->cpuset = hwloc_bitmap_alloc(); - glob_lgrps[(*curlgrp)++] = obj; - - lgrp_cpus(cookie, lgrp, cpuids, n, LGRP_CONTENT_HIERARCHY); - for (i = 0; i < n ; i++) { - hwloc_debug("node %ld's cpu %d is %d\n", lgrp, i, cpuids[i]); - hwloc_bitmap_set(obj->cpuset, cpuids[i]); - } - hwloc_debug_1arg_bitmap("node %ld has cpuset %s\n", - lgrp, obj->cpuset); - - /* or LGRP_MEM_SZ_FREE */ - hwloc_debug("node %ld has %lldkB\n", lgrp, mem_size/1024); - obj->memory.local_memory = mem_size; - obj->memory.page_types_len = 2; - obj->memory.page_types = malloc(2*sizeof(*obj->memory.page_types)); - memset(obj->memory.page_types, 0, 2*sizeof(*obj->memory.page_types)); - obj->memory.page_types[0].size = hwloc_getpagesize(); -#ifdef HAVE__SC_LARGE_PAGESIZE - obj->memory.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); -#endif - hwloc_insert_object_by_cpuset(topology, obj); - free(cpuids); - } - - n = lgrp_children(cookie, lgrp, NULL, 0); - { - lgrp_id_t *lgrps; - int i; - - lgrps = malloc(sizeof(lgrp_id_t) * n); - assert(lgrps != NULL); - lgrp_children(cookie, lgrp, lgrps, n); - hwloc_debug("lgrp %ld has %d children\n", lgrp, n); - for (i = 0; i < n ; i++) - { - browse(topology, cookie, lgrps[i], glob_lgrps, curlgrp); - } - hwloc_debug("lgrp %ld's children done\n", lgrp); - free(lgrps); - } -} - -static void -hwloc_look_lgrp(struct hwloc_topology *topology) -{ - lgrp_cookie_t cookie; - unsigned curlgrp = 0; - int nlgrps; - lgrp_id_t root; - - if ((topology->flags & HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM)) - cookie = lgrp_init(LGRP_VIEW_OS); - else - cookie = lgrp_init(LGRP_VIEW_CALLER); - if (cookie == LGRP_COOKIE_NONE) - { - 
hwloc_debug("lgrp_init failed: %s\n", strerror(errno)); - return; - } - nlgrps = lgrp_nlgrps(cookie); - root = lgrp_root(cookie); - if (nlgrps > 0) { - hwloc_obj_t *glob_lgrps = calloc(nlgrps, sizeof(hwloc_obj_t)); - browse(topology, cookie, root, glob_lgrps, &curlgrp); -#ifdef HAVE_LGRP_LATENCY_COOKIE - if (nlgrps > 1) { - float *distances = calloc(curlgrp*curlgrp, sizeof(float)); - unsigned *indexes = calloc(curlgrp,sizeof(unsigned)); - unsigned i, j; - for (i = 0; i < curlgrp; i++) { - indexes[i] = glob_lgrps[i]->os_index; - for (j = 0; j < curlgrp; j++) - distances[i*curlgrp+j] = (float) lgrp_latency_cookie(cookie, glob_lgrps[i]->os_index, glob_lgrps[j]->os_index, LGRP_LAT_CPU_TO_MEM); - } - hwloc_distances_set(topology, HWLOC_OBJ_NUMANODE, curlgrp, indexes, glob_lgrps, distances, 0 /* OS cannot force */); - } -#endif /* HAVE_LGRP_LATENCY_COOKIE */ - } - lgrp_fini(cookie); -} -#endif /* LIBLGRP */ - -#ifdef HAVE_LIBKSTAT -#include -static int -hwloc_look_kstat(struct hwloc_topology *topology) -{ - /* FIXME this assumes that all packages are identical */ - char *CPUType = hwloc_solaris_get_chip_type(); - char *CPUModel = hwloc_solaris_get_chip_model(); - - kstat_ctl_t *kc = kstat_open(); - kstat_t *ksp; - kstat_named_t *stat; - unsigned look_cores = 1, look_chips = 1; - - unsigned Pproc_max = 0; - unsigned Pproc_alloc = 256; - struct hwloc_solaris_Pproc { - unsigned Lpkg, Ppkg, Lcore, Lproc; - } * Pproc = malloc(Pproc_alloc * sizeof(*Pproc)); - - unsigned Lproc_num = 0; - unsigned Lproc_alloc = 256; - struct hwloc_solaris_Lproc { - unsigned Pproc; - } * Lproc = malloc(Lproc_alloc * sizeof(*Lproc)); - - unsigned Lcore_num = 0; - unsigned Lcore_alloc = 256; - struct hwloc_solaris_Lcore { - unsigned Pcore, Ppkg; - } * Lcore = malloc(Lcore_alloc * sizeof(*Lcore)); - - unsigned Lpkg_num = 0; - unsigned Lpkg_alloc = 256; - struct hwloc_solaris_Lpkg { - unsigned Ppkg; - } * Lpkg = malloc(Lpkg_alloc * sizeof(*Lpkg)); - - unsigned pkgid, coreid, cpuid; - unsigned i; - - 
for (i = 0; i < Pproc_alloc; i++) { - Pproc[i].Lproc = -1; - Pproc[i].Lpkg = -1; - Pproc[i].Ppkg = -1; - Pproc[i].Lcore = -1; - } - - if (!kc) { - hwloc_debug("kstat_open failed: %s\n", strerror(errno)); - free(Pproc); - free(Lproc); - free(Lcore); - free(Lpkg); - return 0; - } - - for (ksp = kc->kc_chain; ksp; ksp = ksp->ks_next) - { - if (strncmp("cpu_info", ksp->ks_module, 8)) - continue; - - cpuid = ksp->ks_instance; - - if (kstat_read(kc, ksp, NULL) == -1) - { - fprintf(stderr, "kstat_read failed for CPU%u: %s\n", cpuid, strerror(errno)); - continue; - } - - hwloc_debug("cpu%u\n", cpuid); - - if (cpuid >= Pproc_alloc) { - Pproc_alloc *= 2; - Pproc = realloc(Pproc, Pproc_alloc * sizeof(*Pproc)); - for(i = Pproc_alloc/2; i < Pproc_alloc; i++) { - Pproc[i].Lproc = -1; - Pproc[i].Lpkg = -1; - Pproc[i].Ppkg = -1; - Pproc[i].Lcore = -1; - } - } - Pproc[cpuid].Lproc = Lproc_num; - - if (Lproc_num >= Lproc_alloc) { - Lproc_alloc *= 2; - Lproc = realloc(Lproc, Lproc_alloc * sizeof(*Lproc)); - } - Lproc[Lproc_num].Pproc = cpuid; - Lproc_num++; - - if (cpuid >= Pproc_max) - Pproc_max = cpuid + 1; - - stat = (kstat_named_t *) kstat_data_lookup(ksp, "state"); - if (!stat) - hwloc_debug("could not read state for CPU%u: %s\n", cpuid, strerror(errno)); - else if (stat->data_type != KSTAT_DATA_CHAR) - hwloc_debug("unknown kstat type %d for cpu state\n", stat->data_type); - else - { - hwloc_debug("cpu%u's state is %s\n", cpuid, stat->value.c); - if (strcmp(stat->value.c, "on-line")) - /* not online */ - hwloc_bitmap_clr(topology->levels[0][0]->online_cpuset, cpuid); - } - - if (look_chips) do { - /* Get Chip ID */ - stat = (kstat_named_t *) kstat_data_lookup(ksp, "chip_id"); - if (!stat) - { - if (Lpkg_num) - fprintf(stderr, "could not read package id for CPU%u: %s\n", cpuid, strerror(errno)); - else - hwloc_debug("could not read package id for CPU%u: %s\n", cpuid, strerror(errno)); - look_chips = 0; - continue; - } - switch (stat->data_type) { - case KSTAT_DATA_INT32: - pkgid 
= stat->value.i32; - break; - case KSTAT_DATA_UINT32: - pkgid = stat->value.ui32; - break; -#ifdef _INT64_TYPE - case KSTAT_DATA_UINT64: - pkgid = stat->value.ui64; - break; - case KSTAT_DATA_INT64: - pkgid = stat->value.i64; - break; -#endif - default: - fprintf(stderr, "chip_id type %d unknown\n", stat->data_type); - look_chips = 0; - continue; - } - Pproc[cpuid].Ppkg = pkgid; - for (i = 0; i < Lpkg_num; i++) - if (pkgid == Lpkg[i].Ppkg) - break; - Pproc[cpuid].Lpkg = i; - hwloc_debug("%u on package %u (%u)\n", cpuid, i, pkgid); - if (i == Lpkg_num) { - if (Lpkg_num == Lpkg_alloc) { - Lpkg_alloc *= 2; - Lpkg = realloc(Lpkg, Lpkg_alloc * sizeof(*Lpkg)); - } - Lpkg[Lpkg_num++].Ppkg = pkgid; - } - } while(0); - - if (look_cores) do { - /* Get Core ID */ - stat = (kstat_named_t *) kstat_data_lookup(ksp, "core_id"); - if (!stat) - { - if (Lcore_num) - fprintf(stderr, "could not read core id for CPU%u: %s\n", cpuid, strerror(errno)); - else - hwloc_debug("could not read core id for CPU%u: %s\n", cpuid, strerror(errno)); - look_cores = 0; - continue; - } - switch (stat->data_type) { - case KSTAT_DATA_INT32: - coreid = stat->value.i32; - break; - case KSTAT_DATA_UINT32: - coreid = stat->value.ui32; - break; -#ifdef _INT64_TYPE - case KSTAT_DATA_UINT64: - coreid = stat->value.ui64; - break; - case KSTAT_DATA_INT64: - coreid = stat->value.i64; - break; -#endif - default: - fprintf(stderr, "core_id type %d unknown\n", stat->data_type); - look_cores = 0; - continue; - } - for (i = 0; i < Lcore_num; i++) - if (coreid == Lcore[i].Pcore && Pproc[cpuid].Ppkg == Lcore[i].Ppkg) - break; - Pproc[cpuid].Lcore = i; - hwloc_debug("%u on core %u (%u)\n", cpuid, i, coreid); - if (i == Lcore_num) { - if (Lcore_num == Lcore_alloc) { - Lcore_alloc *= 2; - Lcore = realloc(Lcore, Lcore_alloc * sizeof(*Lcore)); - } - Lcore[Lcore_num].Ppkg = Pproc[cpuid].Ppkg; - Lcore[Lcore_num++].Pcore = coreid; - } - } while(0); - - /* Note: there is also clog_id for the Thread ID (not unique) and - * 
pkg_core_id for the core ID (not unique). They are not useful to us - * however. */ - } - - if (look_chips) { - struct hwloc_obj *obj; - unsigned j,k; - hwloc_debug("%d Packages\n", Lpkg_num); - for (j = 0; j < Lpkg_num; j++) { - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, Lpkg[j].Ppkg); - if (CPUType) - hwloc_obj_add_info(obj, "CPUType", CPUType); - if (CPUModel) - hwloc_obj_add_info(obj, "CPUModel", CPUModel); - obj->cpuset = hwloc_bitmap_alloc(); - for(k=0; k<Pproc_max; k++) - if (Pproc[k].Lpkg == j) - hwloc_bitmap_set(obj->cpuset, k); - hwloc_debug_1arg_bitmap("Package %d has cpuset %s\n", j, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - hwloc_debug("%s", "\n"); - } - - if (look_cores) { - struct hwloc_obj *obj; - unsigned j,k; - hwloc_debug("%d Cores\n", Lcore_num); - for (j = 0; j < Lcore_num; j++) { - obj = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, Lcore[j].Pcore); - obj->cpuset = hwloc_bitmap_alloc(); - for(k=0; k<Pproc_max; k++) - if (Pproc[k].Lcore == j) - hwloc_bitmap_set(obj->cpuset, k); - hwloc_debug_1arg_bitmap("Core %d has cpuset %s\n", j, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - hwloc_debug("%s", "\n"); - } - if (Lproc_num) { - struct hwloc_obj *obj; - unsigned j,k; - hwloc_debug("%d PUs\n", Lproc_num); - for (j = 0; j < Lproc_num; j++) { - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PU, Lproc[j].Pproc); - obj->cpuset = hwloc_bitmap_alloc(); - for(k=0; k<Pproc_max; k++) - if (Pproc[k].Lproc == j) - hwloc_bitmap_set(obj->cpuset, k); - hwloc_debug_1arg_bitmap("PU %d has cpuset %s\n", j, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - hwloc_debug("%s", "\n"); - } - - kstat_close(kc); - - free(Pproc); - free(Lproc); - free(Lcore); - free(Lpkg); - - return Lproc_num > 0; -} -#endif /* LIBKSTAT */ - -static int -hwloc_look_solaris(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - unsigned nbprocs = hwloc_fallback_nbprocessors (topology); - int alreadypus = 0; - - if (topology->levels[0][0]->cpuset) - /* somebody discovered things */ - return 0; - - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - -#ifdef HAVE_LIBLGRP - 
hwloc_look_lgrp(topology); -#endif /* HAVE_LIBLGRP */ -#ifdef HAVE_LIBKSTAT - if (hwloc_look_kstat(topology) > 0) - alreadypus = 1; -#endif /* HAVE_LIBKSTAT */ - if (!alreadypus) - hwloc_setup_pu_level(topology, nbprocs); - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "Solaris"); - if (topology->is_thissystem) - hwloc_add_uname_info(topology, NULL); - return 1; -} - -void -hwloc_set_solaris_hooks(struct hwloc_binding_hooks *hooks, - struct hwloc_topology_support *support __hwloc_attribute_unused) -{ - hooks->set_proc_cpubind = hwloc_solaris_set_proc_cpubind; - hooks->set_thisproc_cpubind = hwloc_solaris_set_thisproc_cpubind; - hooks->set_thisthread_cpubind = hwloc_solaris_set_thisthread_cpubind; -#ifdef HAVE_LIBLGRP - hooks->get_proc_cpubind = hwloc_solaris_get_proc_cpubind; - hooks->get_thisproc_cpubind = hwloc_solaris_get_thisproc_cpubind; - hooks->get_thisthread_cpubind = hwloc_solaris_get_thisthread_cpubind; - hooks->set_proc_membind = hwloc_solaris_set_proc_membind; - hooks->set_thisproc_membind = hwloc_solaris_set_thisproc_membind; - hooks->set_thisthread_membind = hwloc_solaris_set_thisthread_membind; - hooks->get_proc_membind = hwloc_solaris_get_proc_membind; - hooks->get_thisproc_membind = hwloc_solaris_get_thisproc_membind; - hooks->get_thisthread_membind = hwloc_solaris_get_thisthread_membind; -#endif /* HAVE_LIBLGRP */ -#ifdef MADV_ACCESS_LWP - hooks->set_area_membind = hwloc_solaris_set_area_membind; - support->membind->firsttouch_membind = 1; - support->membind->bind_membind = 1; - support->membind->interleave_membind = 1; - support->membind->nexttouch_membind = 1; -#endif -} - -static struct hwloc_backend * -hwloc_solaris_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - 
backend->discover = hwloc_look_solaris; - return backend; -} - -static struct hwloc_disc_component hwloc_solaris_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_CPU, - "solaris", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_solaris_component_instantiate, - 50, - NULL -}; - -const struct hwloc_component hwloc_solaris_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_solaris_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-synthetic.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-synthetic.c deleted file mode 100644 index 5e7a4260470..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-synthetic.c +++ /dev/null @@ -1,1117 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2015 Inria. All rights reserved. - * Copyright © 2009-2010 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include -#include - -#include -#include -#ifdef HAVE_STRINGS_H -#include -#endif - -struct hwloc_synthetic_level_data_s { - unsigned arity; - unsigned long totalwidth; - hwloc_obj_type_t type; - unsigned depth; /* For caches/groups */ - hwloc_obj_cache_type_t cachetype; /* For caches */ - hwloc_uint64_t memorysize; /* For caches/memory */ - - /* the indexes= attribute before parsing */ - const char *index_string; - unsigned long index_string_length; - /* the array of explicit indexes after parsing */ - unsigned *index_array; - - /* used while filling the topology */ - unsigned next_os_index; /* id of the next object for that level */ -}; - -struct hwloc_synthetic_backend_data_s { - /* synthetic backend parameters */ - char *string; -#define HWLOC_SYNTHETIC_MAX_DEPTH 128 - struct hwloc_synthetic_level_data_s level[HWLOC_SYNTHETIC_MAX_DEPTH]; -}; - -struct hwloc_synthetic_intlv_loop_s { - unsigned step; - unsigned nb; - unsigned level_depth; -}; - -static void 
-hwloc_synthetic_process_level_indexes(struct hwloc_synthetic_backend_data_s *data, - unsigned curleveldepth, - int verbose) -{ - struct hwloc_synthetic_level_data_s *curlevel = &data->level[curleveldepth]; - unsigned long total = curlevel->totalwidth; - const char *attr = curlevel->index_string; - unsigned long length = curlevel->index_string_length; - unsigned *array = NULL; - struct hwloc_synthetic_intlv_loop_s * loops = NULL; - size_t i; - - if (!attr) - return; - - array = calloc(total, sizeof(*array)); - if (!array) { - if (verbose) - fprintf(stderr, "Failed to allocate synthetic index array of size %lu\n", total); - goto out; - } - - i = strspn(attr, "0123456789,"); - if (i == length) { - /* explicit array of indexes */ - - for(i=0; iindex_array = array; - - } else { - /* interleaving */ - unsigned nr_loops = 1, cur_loop; - unsigned minstep = total; - unsigned long nbs = 1; - unsigned j, mul; - const char *tmp; - - tmp = attr; - while (tmp) { - tmp = strchr(tmp, ':'); - if (!tmp || tmp >= attr+length) - break; - nr_loops++; - tmp++; - } - /* nr_loops colon-separated fields, but we may need one more at the end */ - loops = malloc((nr_loops+1)*sizeof(*loops)); - if (!loops) { - if (verbose) - fprintf(stderr, "Failed to allocate synthetic index interleave loop array of size %u\n", nr_loops); - goto out_with_array; - } - - if (*attr >= '0' && *attr <= '9') { - /* interleaving as x*y:z*t:... 
*/ - unsigned step, nb; - - tmp = attr; - cur_loop = 0; - while (tmp) { - char *tmp2, *tmp3; - step = (unsigned) strtol(tmp, &tmp2, 0); - if (tmp2 == tmp || *tmp2 != '*') { - if (verbose) - fprintf(stderr, "Failed to read synthetic index interleaving loop '%s' without number before '*'\n", tmp); - goto out_with_loops; - } - if (!step) { - if (verbose) - fprintf(stderr, "Invalid interleaving loop with step 0 at '%s'\n", tmp); - goto out_with_loops; - } - tmp2++; - nb = (unsigned) strtol(tmp2, &tmp3, 0); - if (tmp3 == tmp2 || (*tmp3 && *tmp3 != ':' && *tmp3 != ')' && *tmp3 != ' ')) { - if (verbose) - fprintf(stderr, "Failed to read synthetic index interleaving loop '%s' without number between '*' and ':'\n", tmp); - goto out_with_loops; - } - if (!nb) { - if (verbose) - fprintf(stderr, "Invalid interleaving loop with number 0 at '%s'\n", tmp2); - goto out_with_loops; - } - loops[cur_loop].step = step; - loops[cur_loop].nb = nb; - if (step < minstep) - minstep = step; - nbs *= nb; - cur_loop++; - if (*tmp3 == ')' || *tmp3 == ' ') - break; - tmp = (const char*) (tmp3+1); - } - - } else { - /* interleaving as type1:type2:... 
*/ - hwloc_obj_type_t type; - hwloc_obj_cache_type_t cachetypeattr; - int depthattr; - int err; - - /* find level depths for each interleaving loop */ - tmp = attr; - cur_loop = 0; - while (tmp) { - err = hwloc_obj_type_sscanf(tmp, &type, &depthattr, &cachetypeattr, sizeof(cachetypeattr)); - if (err < 0) { - if (verbose) - fprintf(stderr, "Failed to read synthetic index interleaving loop type '%s'\n", tmp); - goto out_with_loops; - } - if (type == HWLOC_OBJ_MISC || type == HWLOC_OBJ_BRIDGE || type == HWLOC_OBJ_PCI_DEVICE || type == HWLOC_OBJ_OS_DEVICE) { - if (verbose) - fprintf(stderr, "Misc object type disallowed in synthetic index interleaving loop type '%s'\n", tmp); - goto out_with_loops; - } - for(i=0; ilevel[i].type) - continue; - if ((type == HWLOC_OBJ_GROUP || type == HWLOC_OBJ_CACHE) - && depthattr != -1 - && (unsigned) depthattr != data->level[i].depth) - continue; - if (type == HWLOC_OBJ_CACHE - && cachetypeattr != (hwloc_obj_cache_type_t) -1 - && cachetypeattr != data->level[i].cachetype) - continue; - loops[cur_loop].level_depth = (unsigned)i; - break; - } - if (i == curleveldepth) { - if (verbose) - fprintf(stderr, "Failed to find level for synthetic index interleaving loop type '%s' above '%s'\n", - tmp, hwloc_obj_type_string(curlevel->type)); - goto out_with_loops; - } - tmp = strchr(tmp, ':'); - if (!tmp || tmp > attr+length) - break; - tmp++; - cur_loop++; - } - - /* compute actual loop step/nb */ - for(cur_loop=0; cur_loop prevdepth) - prevdepth = loops[i].level_depth; - } - step = curlevel->totalwidth / data->level[mydepth].totalwidth; /* number of objects below us */ - nb = data->level[mydepth].totalwidth / data->level[prevdepth].totalwidth; /* number of us within parent */ - - loops[cur_loop].step = step; - loops[cur_loop].nb = nb; - assert(nb); - assert(step); - if (step < minstep) - minstep = step; - nbs *= nb; - } - } - assert(nbs); - - if (nbs != total) { - /* one loop of total/nbs steps is missing, add it if it's just the smallest one */ 
- if (minstep == total/nbs) { - loops[nr_loops].step = 1; - loops[nr_loops].nb = total/nbs; - nr_loops++; - } else { - if (verbose) - fprintf(stderr, "Invalid index interleaving total width %lu instead of %lu\n", nbs, total); - goto out_with_loops; - } - } - - /* generate the array of indexes */ - mul = 1; - for(i=0; i= total) { - if (verbose) - fprintf(stderr, "Invalid index interleaving generates out-of-range index %u\n", array[j]); - goto out_with_loops; - } - if (!array[j] && j) { - if (verbose) - fprintf(stderr, "Invalid index interleaving generates duplicate index values\n"); - goto out_with_loops; - } - } - - free(loops); - curlevel->index_array = array; - } - - return; - - out_with_loops: - free(loops); - out_with_array: - free(array); - out: - return; -} - -static hwloc_uint64_t -hwloc_synthetic_parse_memory_attr(const char *attr, const char **endp) -{ - const char *endptr; - hwloc_uint64_t size; - size = strtoull(attr, (char **) &endptr, 0); - if (!hwloc_strncasecmp(endptr, "TB", 2)) { - size <<= 40; - endptr += 2; - } else if (!hwloc_strncasecmp(endptr, "GB", 2)) { - size <<= 30; - endptr += 2; - } else if (!hwloc_strncasecmp(endptr, "MB", 2)) { - size <<= 20; - endptr += 2; - } else if (!hwloc_strncasecmp(endptr, "kB", 2)) { - size <<= 10; - endptr += 2; - } - *endp = endptr; - return size; -} - -static int -hwloc_synthetic_parse_level_attrs(const char *attrs, const char **next_posp, - struct hwloc_synthetic_level_data_s *curlevel, - int verbose) -{ - hwloc_obj_type_t type = curlevel->type; - const char *next_pos; - hwloc_uint64_t memorysize = 0; - const char *index_string = NULL; - size_t index_string_length = 0; - - next_pos = (const char *) strchr(attrs, ')'); - if (!next_pos) { - if (verbose) - fprintf(stderr, "Missing attribute closing bracket in synthetic string doesn't have a number of objects at '%s'\n", attrs); - errno = EINVAL; - return -1; - } - - while (')' != *attrs) { - if (HWLOC_OBJ_CACHE == type && !strncmp("size=", attrs, 5)) { - 
memorysize = hwloc_synthetic_parse_memory_attr(attrs+5, &attrs); - - } else if (HWLOC_OBJ_CACHE != type && !strncmp("memory=", attrs, 7)) { - memorysize = hwloc_synthetic_parse_memory_attr(attrs+7, &attrs); - - } else if (!strncmp("indexes=", attrs, 8)) { - index_string = attrs+8; - attrs += 8; - index_string_length = strcspn(attrs, " )"); - attrs += index_string_length; - - } else { - if (verbose) - fprintf(stderr, "Unknown attribute at '%s'\n", attrs); - errno = EINVAL; - return -1; - } - - if (' ' == *attrs) - attrs++; - else if (')' != *attrs) { - if (verbose) - fprintf(stderr, "Missing parameter separator at '%s'\n", attrs); - errno = EINVAL; - return -1; - } - } - - curlevel->memorysize = memorysize; - curlevel->index_string = index_string; - curlevel->index_string_length = (unsigned long)index_string_length; - *next_posp = next_pos+1; - return 0; -} - -/* Read from description a series of integers describing a symmetrical - topology and update the hwloc_synthetic_backend_data_s accordingly. On - success, return zero. 
*/ -static int -hwloc_backend_synthetic_init(struct hwloc_synthetic_backend_data_s *data, - const char *description) -{ - const char *pos, *next_pos; - unsigned long item, count; - unsigned i; - int cache_depth = 0, group_depth = 0; - int nb_machine_levels = 0, nb_node_levels = 0; - int nb_pu_levels = 0; - int verbose = 0; - const char *env = getenv("HWLOC_SYNTHETIC_VERBOSE"); - int err; - unsigned long totalarity = 1; - - if (env) - verbose = atoi(env); - - /* default values before we add root attributes */ - data->level[0].totalwidth = 1; - data->level[0].type = HWLOC_OBJ_MACHINE; - data->level[0].index_string = NULL; - data->level[0].index_array = NULL; - data->level[0].memorysize = 0; - if (*description == '(') { - err = hwloc_synthetic_parse_level_attrs(description+1, &description, &data->level[0], verbose); - if (err < 0) - return err; - } - - for (pos = description, count = 1; *pos; pos = next_pos) { -#define HWLOC_OBJ_TYPE_UNKNOWN ((hwloc_obj_type_t) -1) - hwloc_obj_type_t type = HWLOC_OBJ_TYPE_UNKNOWN; - int typedepth = -1; - hwloc_obj_cache_type_t cachetype = (hwloc_obj_cache_type_t) -1; - - /* initialize parent arity to 0 so that the levels are not infinite */ - data->level[count-1].arity = 0; - - while (*pos == ' ') - pos++; - - if (!*pos) - break; - - if (*pos < '0' || *pos > '9') { - if (hwloc_obj_type_sscanf(pos, &type, &typedepth, &cachetype, sizeof(cachetype)) < 0) { - if (verbose) - fprintf(stderr, "Synthetic string with unknown object type at '%s'\n", pos); - errno = EINVAL; - goto error; - } - if (type == HWLOC_OBJ_SYSTEM || type == HWLOC_OBJ_MISC || type == HWLOC_OBJ_BRIDGE || type == HWLOC_OBJ_PCI_DEVICE || type == HWLOC_OBJ_OS_DEVICE) { - if (verbose) - fprintf(stderr, "Synthetic string with disallowed object type at '%s'\n", pos); - errno = EINVAL; - goto error; - } - - next_pos = strchr(pos, ':'); - if (!next_pos) { - if (verbose) - fprintf(stderr,"Synthetic string doesn't have a `:' after object type at '%s'\n", pos); - errno = EINVAL; - 
goto error; - } - pos = next_pos + 1; - } - data->level[count].type = type; - data->level[count].depth = (unsigned) typedepth; - data->level[count].cachetype = cachetype; - - item = strtoul(pos, (char **)&next_pos, 0); - if (next_pos == pos) { - if (verbose) - fprintf(stderr,"Synthetic string doesn't have a number of objects at '%s'\n", pos); - errno = EINVAL; - goto error; - } - if (!item) { - if (verbose) - fprintf(stderr,"Synthetic string with disallow 0 number of objects at '%s'\n", pos); - errno = EINVAL; - goto error; - } - data->level[count-1].arity = (unsigned)item; - - totalarity *= item; - data->level[count].totalwidth = totalarity; - data->level[count].index_string = NULL; - data->level[count].index_array = NULL; - data->level[count].memorysize = 0; - if (*next_pos == '(') { - err = hwloc_synthetic_parse_level_attrs(next_pos+1, &next_pos, &data->level[count], verbose); - if (err < 0) - goto error; - } - - if (count + 1 >= HWLOC_SYNTHETIC_MAX_DEPTH) { - if (verbose) - fprintf(stderr,"Too many synthetic levels, max %d\n", HWLOC_SYNTHETIC_MAX_DEPTH); - errno = EINVAL; - goto error; - } - if (item > UINT_MAX) { - if (verbose) - fprintf(stderr,"Too big arity, max %u\n", UINT_MAX); - errno = EINVAL; - goto error; - } - - count++; - } - - if (count <= 0) { - if (verbose) - fprintf(stderr, "Synthetic string doesn't contain any object\n"); - errno = EINVAL; - goto error; - } - - for(i=count-1; i>0; i--) { - struct hwloc_synthetic_level_data_s *curlevel = &data->level[i]; - hwloc_obj_type_t type; - - type = curlevel->type; - - if (i == count-1 && type != HWLOC_OBJ_TYPE_UNKNOWN && type != HWLOC_OBJ_PU) { - if (verbose) - fprintf(stderr, "Synthetic string cannot use non-PU type for last level\n"); - errno = EINVAL; - return -1; - } - if (i != count-1 && type == HWLOC_OBJ_PU) { - if (verbose) - fprintf(stderr, "Synthetic string cannot use PU type for non-last level\n"); - errno = EINVAL; - return -1; - } - - if (type == HWLOC_OBJ_TYPE_UNKNOWN) { - if (i == count-1) - 
type = HWLOC_OBJ_PU; - else { - switch (data->level[i+1].type) { - case HWLOC_OBJ_PU: type = HWLOC_OBJ_CORE; break; - case HWLOC_OBJ_CORE: type = HWLOC_OBJ_CACHE; break; - case HWLOC_OBJ_CACHE: type = HWLOC_OBJ_PACKAGE; break; - case HWLOC_OBJ_PACKAGE: type = HWLOC_OBJ_NUMANODE; break; - case HWLOC_OBJ_NUMANODE: - case HWLOC_OBJ_MACHINE: - case HWLOC_OBJ_GROUP: type = HWLOC_OBJ_GROUP; break; - default: - assert(0); - } - } - curlevel->type = type; - } - switch (type) { - case HWLOC_OBJ_PU: - nb_pu_levels++; - break; - case HWLOC_OBJ_CACHE: - cache_depth++; - break; - case HWLOC_OBJ_GROUP: - group_depth++; - break; - case HWLOC_OBJ_NUMANODE: - nb_node_levels++; - break; - case HWLOC_OBJ_MACHINE: - nb_machine_levels++; - break; - default: - break; - } - } - - if (!nb_pu_levels) { - if (verbose) - fprintf(stderr, "Synthetic string missing ending number of PUs\n"); - errno = EINVAL; - return -1; - } - if (nb_pu_levels > 1) { - if (verbose) - fprintf(stderr, "Synthetic string can not have several PU levels\n"); - errno = EINVAL; - return -1; - } - if (nb_node_levels > 1) { - if (verbose) - fprintf(stderr, "Synthetic string can not have several NUMA node levels\n"); - errno = EINVAL; - return -1; - } - if (nb_machine_levels > 1) { - if (verbose) - fprintf(stderr, "Synthetic string can not have several machine levels\n"); - errno = EINVAL; - return -1; - } - - if (nb_machine_levels) - data->level[0].type = HWLOC_OBJ_SYSTEM; - else { - data->level[0].type = HWLOC_OBJ_MACHINE; - nb_machine_levels++; - } - - if (cache_depth == 1) - /* if there is a single cache level, make it L2 */ - cache_depth = 2; - - for (i=0; ilevel[i]; - hwloc_obj_type_t type = curlevel->type; - - if (type == HWLOC_OBJ_GROUP) { - if (curlevel->depth == (unsigned)-1) - curlevel->depth = group_depth--; - - } else if (type == HWLOC_OBJ_CACHE) { - if (curlevel->depth == (unsigned)-1) - curlevel->depth = cache_depth--; - if (curlevel->cachetype == (hwloc_obj_cache_type_t) -1) - curlevel->cachetype = 
curlevel->depth == 1 ? HWLOC_OBJ_CACHE_DATA : HWLOC_OBJ_CACHE_UNIFIED; - if (!curlevel->memorysize) { - if (1 == curlevel->depth) - /* 32Kb in L1 */ - curlevel->memorysize = 32*1024; - else - /* *4 at each level, starting from 1MB for L2, unified */ - curlevel->memorysize = 256*1024 << (2*curlevel->depth); - } - - } else if (type == HWLOC_OBJ_NUMANODE && !curlevel->memorysize) { - /* 1GB in memory nodes. */ - curlevel->memorysize = 1024*1024*1024; - } - - hwloc_synthetic_process_level_indexes(data, i, verbose); - } - - data->string = strdup(description); - data->level[count-1].arity = 0; - return 0; - - error: - for(i=0; ilevel[i]; - free(curlevel->index_array); - if (!curlevel->arity) - break; - } - return -1; -} - -static void -hwloc_synthetic__post_look_hooks(struct hwloc_synthetic_level_data_s *curlevel, - hwloc_obj_t obj) -{ - switch (obj->type) { - case HWLOC_OBJ_GROUP: - obj->attr->group.depth = curlevel->depth; - break; - case HWLOC_OBJ_SYSTEM: - break; - case HWLOC_OBJ_MACHINE: - break; - case HWLOC_OBJ_NUMANODE: - break; - case HWLOC_OBJ_PACKAGE: - break; - case HWLOC_OBJ_CACHE: - obj->attr->cache.depth = curlevel->depth; - obj->attr->cache.linesize = 64; - obj->attr->cache.type = curlevel->cachetype; - obj->attr->cache.size = curlevel->memorysize; - break; - case HWLOC_OBJ_CORE: - break; - case HWLOC_OBJ_PU: - break; - case HWLOC_OBJ_BRIDGE: - case HWLOC_OBJ_PCI_DEVICE: - case HWLOC_OBJ_OS_DEVICE: - case HWLOC_OBJ_MISC: - case HWLOC_OBJ_TYPE_MAX: - /* Should never happen */ - assert(0); - break; - } - if (curlevel->memorysize && HWLOC_OBJ_CACHE != obj->type) { - obj->memory.local_memory = curlevel->memorysize; - obj->memory.page_types_len = 1; - obj->memory.page_types = malloc(sizeof(*obj->memory.page_types)); - memset(obj->memory.page_types, 0, sizeof(*obj->memory.page_types)); - obj->memory.page_types[0].size = 4096; - obj->memory.page_types[0].count = curlevel->memorysize / 4096; - } -} - -/* - * Recursively build objects whose cpu start at first_cpu 
- * - level gives where to look in the type, arity and id arrays - * - the id array is used as a variable to get unique IDs for a given level. - * - generated memory should be added to *memory_kB. - * - generated cpus should be added to parent_cpuset. - * - next cpu number to be used should be returned. - */ -static void -hwloc__look_synthetic(struct hwloc_topology *topology, - struct hwloc_synthetic_backend_data_s *data, - int level, - hwloc_bitmap_t parent_cpuset) -{ - hwloc_obj_t obj; - unsigned i; - struct hwloc_synthetic_level_data_s *curlevel = &data->level[level]; - hwloc_obj_type_t type = curlevel->type; - unsigned os_index; - - /* pre-hooks */ - switch (type) { - case HWLOC_OBJ_GROUP: - break; - case HWLOC_OBJ_MACHINE: - break; - case HWLOC_OBJ_NUMANODE: - break; - case HWLOC_OBJ_PACKAGE: - break; - case HWLOC_OBJ_CACHE: - break; - case HWLOC_OBJ_CORE: - break; - case HWLOC_OBJ_PU: - break; - case HWLOC_OBJ_SYSTEM: - case HWLOC_OBJ_BRIDGE: - case HWLOC_OBJ_PCI_DEVICE: - case HWLOC_OBJ_OS_DEVICE: - case HWLOC_OBJ_MISC: - case HWLOC_OBJ_TYPE_MAX: - /* Should never happen */ - assert(0); - break; - } - - os_index = curlevel->next_os_index++; - if (curlevel->index_array) - os_index = curlevel->index_array[os_index]; - obj = hwloc_alloc_setup_object(type, os_index); - obj->cpuset = hwloc_bitmap_alloc(); - - if (!curlevel->arity) { - hwloc_bitmap_set(obj->cpuset, os_index); - } else { - for (i = 0; i < curlevel->arity; i++) - hwloc__look_synthetic(topology, data, level + 1, obj->cpuset); - } - - if (type == HWLOC_OBJ_NUMANODE) { - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(obj->nodeset, os_index); - } - - hwloc_bitmap_or(parent_cpuset, parent_cpuset, obj->cpuset); - - hwloc_synthetic__post_look_hooks(curlevel, obj); - - hwloc_insert_object_by_cpuset(topology, obj); -} - -static int -hwloc_look_synthetic(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_synthetic_backend_data_s *data = 
backend->private_data; - hwloc_bitmap_t cpuset = hwloc_bitmap_alloc(); - unsigned i; - - assert(!topology->levels[0][0]->cpuset); - - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - - topology->support.discovery->pu = 1; - - /* start with os_index 0 for each level */ - for (i = 0; data->level[i].arity > 0; i++) - data->level[i].next_os_index = 0; - /* ... including the last one */ - data->level[i].next_os_index = 0; - - /* update first level type according to the synthetic type array */ - topology->levels[0][0]->type = data->level[0].type; - hwloc_synthetic__post_look_hooks(&data->level[0], topology->levels[0][0]); - - for (i = 0; i < data->level[0].arity; i++) - hwloc__look_synthetic(topology, data, 1, cpuset); - - hwloc_bitmap_free(cpuset); - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "Synthetic"); - hwloc_obj_add_info(topology->levels[0][0], "SyntheticDescription", data->string); - return 1; -} - -static void -hwloc_synthetic_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_synthetic_backend_data_s *data = backend->private_data; - unsigned i; - for(i=0; ilevel[i]; - free(curlevel->index_array); - if (!curlevel->arity) - break; - } - free(data->string); - free(data); -} - -static struct hwloc_backend * -hwloc_synthetic_component_instantiate(struct hwloc_disc_component *component, - const void *_data1, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_synthetic_backend_data_s *data; - int err; - - if (!_data1) { - errno = EINVAL; - goto out; - } - - backend = hwloc_backend_alloc(component); - if (!backend) - goto out; - - data = malloc(sizeof(*data)); - if (!data) { - errno = ENOMEM; - goto out_with_backend; - } - - err = hwloc_backend_synthetic_init(data, (const char *) _data1); - if (err < 0) - goto out_with_data; - - backend->private_data = data; - backend->discover = hwloc_look_synthetic; - backend->disable = 
hwloc_synthetic_backend_disable; - backend->is_thissystem = 0; - - return backend; - - out_with_data: - free(data); - out_with_backend: - free(backend); - out: - return NULL; -} - -static struct hwloc_disc_component hwloc_synthetic_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - "synthetic", - ~0, - hwloc_synthetic_component_instantiate, - 30, - NULL -}; - -const struct hwloc_component hwloc_synthetic_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_synthetic_disc_component -}; - -static int hwloc_topology_export_synthetic_indexes(struct hwloc_topology * topology, - hwloc_obj_t obj, - char *buffer, size_t buflen) -{ - unsigned depth = obj->depth; - unsigned total = topology->level_nbobjects[depth]; - unsigned step = 1; - unsigned nr_loops = 0; - struct hwloc_synthetic_intlv_loop_s *loops = NULL; - hwloc_obj_t cur; - unsigned i, j; - ssize_t tmplen = buflen; - char *tmp = buffer; - int res, ret = 0; - - /* must start with 0 */ - if (obj->os_index) - goto exportall; - - while (step != total) { - /* must be a divider of the total */ - if (total % step) - goto exportall; - - /* look for os_index == step */ - for(i=1; ilevels[depth][i]->os_index == step) - break; - if (i == total) - goto exportall; - for(j=2; jlevels[depth][i*j]->os_index != step*j) - break; - - nr_loops++; - loops = realloc(loops, nr_loops*sizeof(*loops)); - if (!loops) - goto exportall; - loops[nr_loops-1].step = i; - loops[nr_loops-1].nb = j; - step *= j; - } - - /* check this interleaving */ - for(i=0; ilevels[depth][i]->os_index != ind) - goto exportall; - } - - /* success, print it */ - for(j=0; j= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - } - - if (loops) - free(loops); - - return ret; - - exportall: - if (loops) - free(loops); - - /* dump all indexes */ - cur = obj; - while (cur) { - res = snprintf(tmp, tmplen, "%u%s", cur->os_index, - cur->next_cousin ? 
"," : ")"); - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - cur = cur->next_cousin; - } - return ret; -} - -static int hwloc_topology_export_synthetic_obj_attr(struct hwloc_topology * topology, - hwloc_obj_t obj, - char *buffer, size_t buflen) -{ - const char * separator = " "; - const char * prefix = "("; - char cachesize[64] = ""; - char memsize[64] = ""; - int needindexes = 0; - - if (HWLOC_OBJ_CACHE == obj->type && obj->attr->cache.size) { - snprintf(cachesize, sizeof(cachesize), "%ssize=%llu", - prefix, (unsigned long long) obj->attr->cache.size); - prefix = separator; - } - if (obj->memory.local_memory) { - snprintf(memsize, sizeof(memsize), "%smemory=%llu", - prefix, (unsigned long long) obj->memory.local_memory); - prefix = separator; - } - if (obj->type == HWLOC_OBJ_PU || obj->type == HWLOC_OBJ_NUMANODE) { - hwloc_obj_t cur = obj; - while (cur) { - if (cur->os_index != cur->logical_index) { - needindexes = 1; - break; - } - cur = cur->next_cousin; - } - } - if (*cachesize || *memsize || needindexes) { - ssize_t tmplen = buflen; - char *tmp = buffer; - int res, ret = 0; - - res = hwloc_snprintf(tmp, tmplen, "%s%s%s", cachesize, memsize, needindexes ? "" : ")"); - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - - if (needindexes) { - res = snprintf(tmp, tmplen, "%sindexes=", prefix); - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - - res = hwloc_topology_export_synthetic_indexes(topology, obj, tmp, tmplen); - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? 
(int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - } - return ret; - } else { - return 0; - } -} - -int -hwloc_topology_export_synthetic(struct hwloc_topology * topology, - char *buffer, size_t buflen, - unsigned long flags) -{ - hwloc_obj_t obj = hwloc_get_root_obj(topology); - ssize_t tmplen = buflen; - char *tmp = buffer; - int res, ret = 0; - int arity; - const char * separator = " "; - const char * prefix = ""; - - if (flags & ~(HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_EXTENDED_TYPES|HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_ATTRS)) { - errno = EINVAL; - return -1; - } - - /* TODO: add a flag to ignore symmetric_subtree and I/Os. - * just assume things are symmetric with the left branches of the tree. - * but the number of objects per level may be wrong, what to do with OS index array in this case? - * only allow ignoring symmetric_subtree if the level width remains OK? - */ - - /* TODO: add a root object by default, with a prefix such as tree= - * so that we can backward-compatibly recognize whether there's a root or not. - * and add a flag to disable it. - */ - - /* TODO: flag to force all indexes, not only for PU and NUMA? */ - - if (!obj->symmetric_subtree) { - errno = EINVAL; - return -1; - } - - if (!(flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_ATTRS)) { - /* root attributes */ - res = hwloc_topology_export_synthetic_obj_attr(topology, obj, tmp, tmplen); - if (res < 0) - return -1; - ret += res; - if (ret > 0) - prefix = separator; - if (res >= tmplen) - res = tmplen>0 ? 
(int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - } - - arity = obj->arity; - while (arity) { - /* for each level */ - obj = obj->first_child; - if (flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_EXTENDED_TYPES) { - res = hwloc_snprintf(tmp, tmplen, "%s%s:%u", prefix, hwloc_obj_type_string(obj->type), arity); - } else { - char types[64]; - hwloc_obj_type_snprintf(types, sizeof(types), obj, 1); - res = hwloc_snprintf(tmp, tmplen, "%s%s:%u", prefix, types, arity); - } - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - - if (!(flags & HWLOC_TOPOLOGY_EXPORT_SYNTHETIC_FLAG_NO_ATTRS)) { - /* obj attributes */ - res = hwloc_topology_export_synthetic_obj_attr(topology, obj, tmp, tmplen); - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - } - - /* next level */ - prefix = separator; - arity = obj->arity; - } - - return ret; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-windows.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-windows.c deleted file mode 100644 index bace45b230b..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-windows.c +++ /dev/null @@ -1,1130 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2015 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -/* To try to get all declarations duplicated below. 
*/ -#define _WIN32_WINNT 0x0601 - -#include -#include -#include -#include - -#include - -#ifndef HAVE_KAFFINITY -typedef ULONG_PTR KAFFINITY, *PKAFFINITY; -#endif - -#ifndef HAVE_PROCESSOR_CACHE_TYPE -typedef enum _PROCESSOR_CACHE_TYPE { - CacheUnified, - CacheInstruction, - CacheData, - CacheTrace -} PROCESSOR_CACHE_TYPE; -#endif - -#ifndef CACHE_FULLY_ASSOCIATIVE -#define CACHE_FULLY_ASSOCIATIVE 0xFF -#endif - -#ifndef MAXIMUM_PROC_PER_GROUP /* missing in MinGW */ -#define MAXIMUM_PROC_PER_GROUP 64 -#endif - -#ifndef HAVE_CACHE_DESCRIPTOR -typedef struct _CACHE_DESCRIPTOR { - BYTE Level; - BYTE Associativity; - WORD LineSize; - DWORD Size; /* in bytes */ - PROCESSOR_CACHE_TYPE Type; -} CACHE_DESCRIPTOR, *PCACHE_DESCRIPTOR; -#endif - -#ifndef HAVE_LOGICAL_PROCESSOR_RELATIONSHIP -typedef enum _LOGICAL_PROCESSOR_RELATIONSHIP { - RelationProcessorCore, - RelationNumaNode, - RelationCache, - RelationProcessorPackage, - RelationGroup, - RelationAll = 0xffff -} LOGICAL_PROCESSOR_RELATIONSHIP; -#else /* HAVE_LOGICAL_PROCESSOR_RELATIONSHIP */ -# ifndef HAVE_RELATIONPROCESSORPACKAGE -# define RelationProcessorPackage 3 -# define RelationGroup 4 -# define RelationAll 0xffff -# endif /* HAVE_RELATIONPROCESSORPACKAGE */ -#endif /* HAVE_LOGICAL_PROCESSOR_RELATIONSHIP */ - -#ifndef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION -typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION { - ULONG_PTR ProcessorMask; - LOGICAL_PROCESSOR_RELATIONSHIP Relationship; - _ANONYMOUS_UNION - union { - struct { - BYTE flags; - } ProcessorCore; - struct { - DWORD NodeNumber; - } NumaNode; - CACHE_DESCRIPTOR Cache; - ULONGLONG Reserved[2]; - } DUMMYUNIONNAME; -} SYSTEM_LOGICAL_PROCESSOR_INFORMATION, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION; -#endif - -/* Extended interface, for group support */ - -#ifndef HAVE_GROUP_AFFINITY -typedef struct _GROUP_AFFINITY { - KAFFINITY Mask; - WORD Group; - WORD Reserved[3]; -} GROUP_AFFINITY, *PGROUP_AFFINITY; -#endif - -#ifndef HAVE_PROCESSOR_RELATIONSHIP -typedef 
struct _PROCESSOR_RELATIONSHIP { - BYTE Flags; - BYTE Reserved[21]; - WORD GroupCount; - GROUP_AFFINITY GroupMask[ANYSIZE_ARRAY]; -} PROCESSOR_RELATIONSHIP, *PPROCESSOR_RELATIONSHIP; -#endif - -#ifndef HAVE_NUMA_NODE_RELATIONSHIP -typedef struct _NUMA_NODE_RELATIONSHIP { - DWORD NodeNumber; - BYTE Reserved[20]; - GROUP_AFFINITY GroupMask; -} NUMA_NODE_RELATIONSHIP, *PNUMA_NODE_RELATIONSHIP; -#endif - -#ifndef HAVE_CACHE_RELATIONSHIP -typedef struct _CACHE_RELATIONSHIP { - BYTE Level; - BYTE Associativity; - WORD LineSize; - DWORD CacheSize; - PROCESSOR_CACHE_TYPE Type; - BYTE Reserved[20]; - GROUP_AFFINITY GroupMask; -} CACHE_RELATIONSHIP, *PCACHE_RELATIONSHIP; -#endif - -#ifndef HAVE_PROCESSOR_GROUP_INFO -typedef struct _PROCESSOR_GROUP_INFO { - BYTE MaximumProcessorCount; - BYTE ActiveProcessorCount; - BYTE Reserved[38]; - KAFFINITY ActiveProcessorMask; -} PROCESSOR_GROUP_INFO, *PPROCESSOR_GROUP_INFO; -#endif - -#ifndef HAVE_GROUP_RELATIONSHIP -typedef struct _GROUP_RELATIONSHIP { - WORD MaximumGroupCount; - WORD ActiveGroupCount; - ULONGLONG Reserved[2]; - PROCESSOR_GROUP_INFO GroupInfo[ANYSIZE_ARRAY]; -} GROUP_RELATIONSHIP, *PGROUP_RELATIONSHIP; -#endif - -#ifndef HAVE_SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX -typedef struct _SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX { - LOGICAL_PROCESSOR_RELATIONSHIP Relationship; - DWORD Size; - _ANONYMOUS_UNION - union { - PROCESSOR_RELATIONSHIP Processor; - NUMA_NODE_RELATIONSHIP NumaNode; - CACHE_RELATIONSHIP Cache; - GROUP_RELATIONSHIP Group; - /* Odd: no member to tell the cpu mask of the package... 
*/ - } DUMMYUNIONNAME; -} SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX, *PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX; -#endif - -#ifndef HAVE_PSAPI_WORKING_SET_EX_BLOCK -typedef union _PSAPI_WORKING_SET_EX_BLOCK { - ULONG_PTR Flags; - struct { - unsigned Valid :1; - unsigned ShareCount :3; - unsigned Win32Protection :11; - unsigned Shared :1; - unsigned Node :6; - unsigned Locked :1; - unsigned LargePage :1; - }; -} PSAPI_WORKING_SET_EX_BLOCK; -#endif - -#ifndef HAVE_PSAPI_WORKING_SET_EX_INFORMATION -typedef struct _PSAPI_WORKING_SET_EX_INFORMATION { - PVOID VirtualAddress; - PSAPI_WORKING_SET_EX_BLOCK VirtualAttributes; -} PSAPI_WORKING_SET_EX_INFORMATION; -#endif - -#ifndef HAVE_PROCESSOR_NUMBER -typedef struct _PROCESSOR_NUMBER { - WORD Group; - BYTE Number; - BYTE Reserved; -} PROCESSOR_NUMBER, *PPROCESSOR_NUMBER; -#endif - -/* Function pointers */ - -typedef WORD (WINAPI *PFN_GETACTIVEPROCESSORGROUPCOUNT)(void); -static PFN_GETACTIVEPROCESSORGROUPCOUNT GetActiveProcessorGroupCountProc; - -static unsigned long nr_processor_groups = 1; - -typedef WORD (WINAPI *PFN_GETACTIVEPROCESSORCOUNT)(WORD); -static PFN_GETACTIVEPROCESSORCOUNT GetActiveProcessorCountProc; - -typedef DWORD (WINAPI *PFN_GETCURRENTPROCESSORNUMBER)(void); -static PFN_GETCURRENTPROCESSORNUMBER GetCurrentProcessorNumberProc; - -typedef VOID (WINAPI *PFN_GETCURRENTPROCESSORNUMBEREX)(PPROCESSOR_NUMBER); -static PFN_GETCURRENTPROCESSORNUMBEREX GetCurrentProcessorNumberExProc; - -typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATION)(PSYSTEM_LOGICAL_PROCESSOR_INFORMATION Buffer, PDWORD ReturnLength); -static PFN_GETLOGICALPROCESSORINFORMATION GetLogicalProcessorInformationProc; - -typedef BOOL (WINAPI *PFN_GETLOGICALPROCESSORINFORMATIONEX)(LOGICAL_PROCESSOR_RELATIONSHIP relationship, PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX Buffer, PDWORD ReturnLength); -static PFN_GETLOGICALPROCESSORINFORMATIONEX GetLogicalProcessorInformationExProc; - -typedef BOOL (WINAPI *PFN_SETTHREADGROUPAFFINITY)(HANDLE hThread, 
const GROUP_AFFINITY *GroupAffinity, PGROUP_AFFINITY PreviousGroupAffinity); -static PFN_SETTHREADGROUPAFFINITY SetThreadGroupAffinityProc; - -typedef BOOL (WINAPI *PFN_GETTHREADGROUPAFFINITY)(HANDLE hThread, PGROUP_AFFINITY GroupAffinity); -static PFN_GETTHREADGROUPAFFINITY GetThreadGroupAffinityProc; - -typedef BOOL (WINAPI *PFN_GETNUMAAVAILABLEMEMORYNODE)(UCHAR Node, PULONGLONG AvailableBytes); -static PFN_GETNUMAAVAILABLEMEMORYNODE GetNumaAvailableMemoryNodeProc; - -typedef BOOL (WINAPI *PFN_GETNUMAAVAILABLEMEMORYNODEEX)(USHORT Node, PULONGLONG AvailableBytes); -static PFN_GETNUMAAVAILABLEMEMORYNODEEX GetNumaAvailableMemoryNodeExProc; - -typedef LPVOID (WINAPI *PFN_VIRTUALALLOCEXNUMA)(HANDLE hProcess, LPVOID lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect, DWORD nndPreferred); -static PFN_VIRTUALALLOCEXNUMA VirtualAllocExNumaProc; - -typedef BOOL (WINAPI *PFN_VIRTUALFREEEX)(HANDLE hProcess, LPVOID lpAddress, SIZE_T dwSize, DWORD dwFreeType); -static PFN_VIRTUALFREEEX VirtualFreeExProc; - -typedef BOOL (WINAPI *PFN_QUERYWORKINGSETEX)(HANDLE hProcess, PVOID pv, DWORD cb); -static PFN_QUERYWORKINGSETEX QueryWorkingSetExProc; - -static void hwloc_win_get_function_ptrs(void) -{ - HMODULE kernel32; - - kernel32 = LoadLibrary("kernel32.dll"); - if (kernel32) { - GetActiveProcessorGroupCountProc = - (PFN_GETACTIVEPROCESSORGROUPCOUNT) GetProcAddress(kernel32, "GetActiveProcessorGroupCount"); - GetActiveProcessorCountProc = - (PFN_GETACTIVEPROCESSORCOUNT) GetProcAddress(kernel32, "GetActiveProcessorCount"); - GetLogicalProcessorInformationProc = - (PFN_GETLOGICALPROCESSORINFORMATION) GetProcAddress(kernel32, "GetLogicalProcessorInformation"); - GetCurrentProcessorNumberProc = - (PFN_GETCURRENTPROCESSORNUMBER) GetProcAddress(kernel32, "GetCurrentProcessorNumber"); - GetCurrentProcessorNumberExProc = - (PFN_GETCURRENTPROCESSORNUMBEREX) GetProcAddress(kernel32, "GetCurrentProcessorNumberEx"); - SetThreadGroupAffinityProc = - (PFN_SETTHREADGROUPAFFINITY) 
GetProcAddress(kernel32, "SetThreadGroupAffinity"); - GetThreadGroupAffinityProc = - (PFN_GETTHREADGROUPAFFINITY) GetProcAddress(kernel32, "GetThreadGroupAffinity"); - GetNumaAvailableMemoryNodeProc = - (PFN_GETNUMAAVAILABLEMEMORYNODE) GetProcAddress(kernel32, "GetNumaAvailableMemoryNode"); - GetNumaAvailableMemoryNodeExProc = - (PFN_GETNUMAAVAILABLEMEMORYNODEEX) GetProcAddress(kernel32, "GetNumaAvailableMemoryNodeEx"); - GetLogicalProcessorInformationExProc = - (PFN_GETLOGICALPROCESSORINFORMATIONEX)GetProcAddress(kernel32, "GetLogicalProcessorInformationEx"); - QueryWorkingSetExProc = - (PFN_QUERYWORKINGSETEX) GetProcAddress(kernel32, "K32QueryWorkingSetEx"); - VirtualAllocExNumaProc = - (PFN_VIRTUALALLOCEXNUMA) GetProcAddress(kernel32, "VirtualAllocExNuma"); - VirtualFreeExProc = - (PFN_VIRTUALFREEEX) GetProcAddress(kernel32, "VirtualFreeEx"); - } - - if (GetActiveProcessorGroupCountProc) - nr_processor_groups = GetActiveProcessorGroupCountProc(); - - if (!QueryWorkingSetExProc) { - HMODULE psapi = LoadLibrary("psapi.dll"); - if (psapi) - QueryWorkingSetExProc = (PFN_QUERYWORKINGSETEX) GetProcAddress(psapi, "QueryWorkingSetEx"); - } -} - -/* - * ULONG_PTR and DWORD_PTR are 64/32bits depending on the arch - * while bitmaps use unsigned long (always 32bits) - */ - -static void hwloc_bitmap_from_ULONG_PTR(hwloc_bitmap_t set, ULONG_PTR mask) -{ -#if SIZEOF_VOID_P == 8 - hwloc_bitmap_from_ulong(set, mask & 0xffffffff); - hwloc_bitmap_set_ith_ulong(set, 1, mask >> 32); -#else - hwloc_bitmap_from_ulong(set, mask); -#endif -} - -static void hwloc_bitmap_from_ith_ULONG_PTR(hwloc_bitmap_t set, unsigned i, ULONG_PTR mask) -{ -#if SIZEOF_VOID_P == 8 - hwloc_bitmap_from_ith_ulong(set, 2*i, mask & 0xffffffff); - hwloc_bitmap_set_ith_ulong(set, 2*i+1, mask >> 32); -#else - hwloc_bitmap_from_ith_ulong(set, i, mask); -#endif -} - -static void hwloc_bitmap_set_ith_ULONG_PTR(hwloc_bitmap_t set, unsigned i, ULONG_PTR mask) -{ -#if SIZEOF_VOID_P == 8 -
hwloc_bitmap_set_ith_ulong(set, 2*i, mask & 0xffffffff); - hwloc_bitmap_set_ith_ulong(set, 2*i+1, mask >> 32); -#else - hwloc_bitmap_set_ith_ulong(set, i, mask); -#endif -} - -static ULONG_PTR hwloc_bitmap_to_ULONG_PTR(hwloc_const_bitmap_t set) -{ -#if SIZEOF_VOID_P == 8 - ULONG_PTR up = hwloc_bitmap_to_ith_ulong(set, 1); - up <<= 32; - up |= hwloc_bitmap_to_ulong(set); - return up; -#else - return hwloc_bitmap_to_ulong(set); -#endif -} - -static ULONG_PTR hwloc_bitmap_to_ith_ULONG_PTR(hwloc_const_bitmap_t set, unsigned i) -{ -#if SIZEOF_VOID_P == 8 - ULONG_PTR up = hwloc_bitmap_to_ith_ulong(set, 2*i+1); - up <<= 32; - up |= hwloc_bitmap_to_ith_ulong(set, 2*i); - return up; -#else - return hwloc_bitmap_to_ith_ulong(set, i); -#endif -} - -/* convert set into index+mask if all set bits are in the same ULONG. - * otherwise return -1. - */ -static int hwloc_bitmap_to_single_ULONG_PTR(hwloc_const_bitmap_t set, unsigned *index, ULONG_PTR *mask) -{ - unsigned first_ulp, last_ulp; - if (hwloc_bitmap_weight(set) == -1) - return -1; - first_ulp = hwloc_bitmap_first(set) / (sizeof(ULONG_PTR)*8); - last_ulp = hwloc_bitmap_last(set) / (sizeof(ULONG_PTR)*8); - if (first_ulp != last_ulp) - return -1; - *mask = hwloc_bitmap_to_ith_ULONG_PTR(set, first_ulp); - *index = first_ulp; - return 0; -} - -/************************************************************** - * hwloc PU numbering with respect to Windows processor groups - * - * Everywhere below we reserve 64 physical indexes per processor groups because that's - * the maximum (MAXIMUM_PROC_PER_GROUP). Windows may actually use less bits than that - * in some groups (either to avoid splitting NUMA nodes across groups, or because of OS - * tweaks such as "bcdedit /set groupsize 8") but we keep some unused indexes for simplicity. - * That means PU physical indexes and cpusets may be non-contigous. - * That also means hwloc_fallback_nbprocessors() below must return the last PU index + 1 - * instead the actual number of processors. 
- */ - -/******************** - * last_cpu_location - */ - -static int -hwloc_win_get_thisthread_last_cpu_location(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_cpuset_t set, int flags __hwloc_attribute_unused) -{ - assert(GetCurrentProcessorNumberExProc || (GetCurrentProcessorNumberProc && nr_processor_groups == 1)); - - if (nr_processor_groups > 1 || !GetCurrentProcessorNumberProc) { - PROCESSOR_NUMBER num; - GetCurrentProcessorNumberExProc(&num); - hwloc_bitmap_from_ith_ULONG_PTR(set, num.Group, ((ULONG_PTR)1) << num.Number); - return 0; - } - - hwloc_bitmap_from_ith_ULONG_PTR(set, 0, ((ULONG_PTR)1) << GetCurrentProcessorNumberProc()); - return 0; -} - -/* TODO: hwloc_win_get_thisproc_last_cpu_location() using - * CreateToolhelp32Snapshot(), Thread32First/Next() - * th.th32OwnerProcessID == GetCurrentProcessId() for filtering within process - * OpenThread(THREAD_SET_INFORMATION|THREAD_QUERY_INFORMATION, FALSE, te32.th32ThreadID) to get a handle. - */ - - -/****************************** - * set cpu/membind for threads - */ - -/* TODO: SetThreadIdealProcessor{,Ex} */ - -static int -hwloc_win_set_thread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_thread_t thread, hwloc_const_bitmap_t hwloc_set, int flags) -{ - DWORD_PTR mask; - unsigned group; - - if (flags & HWLOC_CPUBIND_NOMEMBIND) { - errno = ENOSYS; - return -1; - } - - if (hwloc_bitmap_to_single_ULONG_PTR(hwloc_set, &group, &mask) < 0) { - errno = ENOSYS; - return -1; - } - - assert(nr_processor_groups == 1 || SetThreadGroupAffinityProc); - - if (nr_processor_groups > 1) { - GROUP_AFFINITY aff; - memset(&aff, 0, sizeof(aff)); /* we get Invalid Parameter error if Reserved field isn't cleared */ - aff.Group = group; - aff.Mask = mask; - if (!SetThreadGroupAffinityProc(thread, &aff, NULL)) - return -1; - - } else { - /* SetThreadAffinityMask() only changes the mask inside the current processor group */ - /* The resulting binding is always strict */ - if 
(!SetThreadAffinityMask(thread, mask)) - return -1; - } - return 0; -} - -/* TODO: SetThreadGroupAffinity to get affinity */ - -static int -hwloc_win_set_thisthread_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_win_set_thread_cpubind(topology, GetCurrentThread(), hwloc_set, flags); -} - -static int -hwloc_win_set_thisthread_membind(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - int ret; - hwloc_cpuset_t cpuset; - - if ((policy != HWLOC_MEMBIND_DEFAULT && policy != HWLOC_MEMBIND_BIND) - || flags & HWLOC_MEMBIND_NOCPUBIND) { - errno = ENOSYS; - return -1; - } - - cpuset = hwloc_bitmap_alloc(); - hwloc_cpuset_from_nodeset(topology, cpuset, nodeset); - ret = hwloc_win_set_thisthread_cpubind(topology, cpuset, flags & HWLOC_MEMBIND_STRICT?HWLOC_CPUBIND_STRICT:0); - hwloc_bitmap_free(cpuset); - return ret; -} - - -/****************************** - * get cpu/membind for threads - */ - - static int -hwloc_win_get_thread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_thread_t thread, hwloc_cpuset_t set, int flags __hwloc_attribute_unused) -{ - GROUP_AFFINITY aff; - - assert(GetThreadGroupAffinityProc); - - if (!GetThreadGroupAffinityProc(thread, &aff)) - return -1; - hwloc_bitmap_from_ith_ULONG_PTR(set, aff.Group, aff.Mask); - return 0; -} - -static int -hwloc_win_get_thisthread_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_cpuset_t set, int flags __hwloc_attribute_unused) -{ - return hwloc_win_get_thread_cpubind(topology, GetCurrentThread(), set, flags); -} - -static int -hwloc_win_get_thisthread_membind(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) -{ - int ret; - hwloc_cpuset_t cpuset = hwloc_bitmap_alloc(); - ret = hwloc_win_get_thread_cpubind(topology, GetCurrentThread(), cpuset, flags); - if (!ret) { - *policy = HWLOC_MEMBIND_BIND; - hwloc_cpuset_to_nodeset(topology, cpuset, 
nodeset); - } - hwloc_bitmap_free(cpuset); - return ret; -} - - -/******************************** - * set cpu/membind for processes - */ - -static int -hwloc_win_set_proc_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_pid_t proc, hwloc_const_bitmap_t hwloc_set, int flags) -{ - DWORD_PTR mask; - - assert(nr_processor_groups == 1); - - if (flags & HWLOC_CPUBIND_NOMEMBIND) { - errno = ENOSYS; - return -1; - } - - /* TODO: SetThreadGroupAffinity() for all threads doesn't enforce the whole process affinity, - * maybe because of process-specific resource locality */ - /* TODO: if we are in a single group (check with GetProcessGroupAffinity()), - * SetProcessAffinityMask() changes the binding within that same group. - */ - /* TODO: NtSetInformationProcess() works very well for binding to any mask in a single group, - * but it's an internal routine. - */ - /* TODO: checks whether hwloc-bind.c needs to pass INHERIT_PARENT_AFFINITY to CreateProcess() instead of execvp(). */ - - /* The resulting binding is always strict */ - mask = hwloc_bitmap_to_ULONG_PTR(hwloc_set); - if (!SetProcessAffinityMask(proc, mask)) - return -1; - return 0; -} - -static int -hwloc_win_set_thisproc_cpubind(hwloc_topology_t topology, hwloc_const_bitmap_t hwloc_set, int flags) -{ - return hwloc_win_set_proc_cpubind(topology, GetCurrentProcess(), hwloc_set, flags); -} - -static int -hwloc_win_set_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - int ret; - hwloc_cpuset_t cpuset; - - if ((policy != HWLOC_MEMBIND_DEFAULT && policy != HWLOC_MEMBIND_BIND) - || flags & HWLOC_MEMBIND_NOCPUBIND) { - errno = ENOSYS; - return -1; - } - - cpuset = hwloc_bitmap_alloc(); - hwloc_cpuset_from_nodeset(topology, cpuset, nodeset); - ret = hwloc_win_set_proc_cpubind(topology, pid, cpuset, flags & HWLOC_MEMBIND_STRICT?HWLOC_CPUBIND_STRICT:0); - hwloc_bitmap_free(cpuset); - return ret; -} - -static int 
-hwloc_win_set_thisproc_membind(hwloc_topology_t topology, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) -{ - return hwloc_win_set_proc_membind(topology, GetCurrentProcess(), nodeset, policy, flags); -} - - -/******************************** - * get cpu/membind for processes - */ - -static int -hwloc_win_get_proc_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_pid_t proc, hwloc_bitmap_t hwloc_set, int flags) -{ - DWORD_PTR proc_mask, sys_mask; - - assert(nr_processor_groups == 1); - - if (flags & HWLOC_CPUBIND_NOMEMBIND) { - errno = ENOSYS; - return -1; - } - - /* TODO: if we are in a single group (check with GetProcessGroupAffinity()), - * GetProcessAffinityMask() gives the mask within that group. - */ - /* TODO: if we are in multiple groups, GetProcessGroupAffinity() gives their IDs, - * but we don't know their masks. - */ - /* TODO: GetThreadGroupAffinity() for all threads can be smaller than the whole process affinity, - * maybe because of process-specific resource locality. 
- */ - - if (!GetProcessAffinityMask(proc, &proc_mask, &sys_mask)) - return -1; - hwloc_bitmap_from_ULONG_PTR(hwloc_set, proc_mask); - return 0; -} - -static int -hwloc_win_get_proc_membind(hwloc_topology_t topology, hwloc_pid_t pid, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) -{ - int ret; - hwloc_cpuset_t cpuset = hwloc_bitmap_alloc(); - ret = hwloc_win_get_proc_cpubind(topology, pid, cpuset, flags & HWLOC_MEMBIND_STRICT?HWLOC_CPUBIND_STRICT:0); - if (!ret) { - *policy = HWLOC_MEMBIND_BIND; - hwloc_cpuset_to_nodeset(topology, cpuset, nodeset); - } - hwloc_bitmap_free(cpuset); - return ret; -} - -static int -hwloc_win_get_thisproc_cpubind(hwloc_topology_t topology, hwloc_bitmap_t hwloc_cpuset, int flags) -{ - return hwloc_win_get_proc_cpubind(topology, GetCurrentProcess(), hwloc_cpuset, flags); -} - -static int -hwloc_win_get_thisproc_membind(hwloc_topology_t topology, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) -{ - return hwloc_win_get_proc_membind(topology, GetCurrentProcess(), nodeset, policy, flags); -} - - -/************************ - * membind alloc/free - */ - -static void * -hwloc_win_alloc(hwloc_topology_t topology __hwloc_attribute_unused, size_t len) { - return VirtualAlloc(NULL, len, MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE); -} - -static void * -hwloc_win_alloc_membind(hwloc_topology_t topology __hwloc_attribute_unused, size_t len, hwloc_const_nodeset_t nodeset, hwloc_membind_policy_t policy, int flags) { - int node; - - switch (policy) { - case HWLOC_MEMBIND_DEFAULT: - case HWLOC_MEMBIND_BIND: - break; - default: - errno = ENOSYS; - return hwloc_alloc_or_fail(topology, len, flags); - } - - if (flags & HWLOC_MEMBIND_STRICT) { - errno = ENOSYS; - return NULL; - } - - if (hwloc_bitmap_weight(nodeset) != 1) { - /* Not a single node, can't do this */ - errno = EXDEV; - return hwloc_alloc_or_fail(topology, len, flags); - } - - node = hwloc_bitmap_first(nodeset); - return 
VirtualAllocExNumaProc(GetCurrentProcess(), NULL, len, MEM_COMMIT|MEM_RESERVE, PAGE_EXECUTE_READWRITE, node); -} - -static int -hwloc_win_free_membind(hwloc_topology_t topology __hwloc_attribute_unused, void *addr, size_t len __hwloc_attribute_unused) { - if (!addr) - return 0; - if (!VirtualFreeExProc(GetCurrentProcess(), addr, 0, MEM_RELEASE)) - return -1; - return 0; -} - - -/********************** - * membind for areas - */ - -static int -hwloc_win_get_area_membind(hwloc_topology_t topology __hwloc_attribute_unused, const void *addr, size_t len, hwloc_nodeset_t nodeset, hwloc_membind_policy_t * policy, int flags) -{ - SYSTEM_INFO SystemInfo; - DWORD page_size; - uintptr_t start; - unsigned nb; - - GetSystemInfo(&SystemInfo); - page_size = SystemInfo.dwPageSize; - - start = (((uintptr_t) addr) / page_size) * page_size; - nb = (unsigned)((((uintptr_t) addr + len - start) + page_size - 1) / page_size); - - if (!nb) - nb = 1; - - { - PSAPI_WORKING_SET_EX_INFORMATION *pv; - unsigned i; - - pv = calloc(nb, sizeof(*pv)); - - for (i = 0; i < nb; i++) - pv[i].VirtualAddress = (void*) (start + i * page_size); - if (!QueryWorkingSetExProc(GetCurrentProcess(), pv, nb * sizeof(*pv))) { - free(pv); - return -1; - } - *policy = HWLOC_MEMBIND_BIND; - if (flags & HWLOC_MEMBIND_STRICT) { - unsigned node = pv[0].VirtualAttributes.Node; - for (i = 1; i < nb; i++) { - if (pv[i].VirtualAttributes.Node != node) { - errno = EXDEV; - free(pv); - return -1; - } - } - hwloc_bitmap_only(nodeset, node); - free(pv); - return 0; - } - hwloc_bitmap_zero(nodeset); - for (i = 0; i < nb; i++) - hwloc_bitmap_set(nodeset, pv[i].VirtualAttributes.Node); - free(pv); - return 0; - } -} - - -/************************* - * discovery - */ - -static int -hwloc_look_windows(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - hwloc_bitmap_t groups_pu_set = NULL; - SYSTEM_INFO SystemInfo; - DWORD length; - - if (topology->levels[0][0]->cpuset) - /* somebody discovered 
things */ - return 0; - - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - - GetSystemInfo(&SystemInfo); - - if (!GetLogicalProcessorInformationExProc && GetLogicalProcessorInformationProc) { - PSYSTEM_LOGICAL_PROCESSOR_INFORMATION procInfo; - unsigned id; - unsigned i; - struct hwloc_obj *obj; - hwloc_obj_type_t type; - - length = 0; - procInfo = NULL; - - while (1) { - if (GetLogicalProcessorInformationProc(procInfo, &length)) - break; - if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) - return -1; - procInfo = realloc(procInfo, length); - } - - assert(!length || procInfo); - - for (i = 0; i < length / sizeof(*procInfo); i++) { - - /* Ignore unknown caches */ - if (procInfo->Relationship == RelationCache - && procInfo->Cache.Type != CacheUnified - && procInfo->Cache.Type != CacheData - && procInfo->Cache.Type != CacheInstruction) - continue; - - id = -1; - switch (procInfo[i].Relationship) { - case RelationNumaNode: - type = HWLOC_OBJ_NUMANODE; - id = procInfo[i].NumaNode.NodeNumber; - break; - case RelationProcessorPackage: - type = HWLOC_OBJ_PACKAGE; - break; - case RelationCache: - type = HWLOC_OBJ_CACHE; - break; - case RelationProcessorCore: - type = HWLOC_OBJ_CORE; - break; - case RelationGroup: - default: - type = HWLOC_OBJ_GROUP; - break; - } - - obj = hwloc_alloc_setup_object(type, id); - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_debug("%s#%u mask %lx\n", hwloc_obj_type_string(type), id, procInfo[i].ProcessorMask); - /* ProcessorMask is a ULONG_PTR */ - hwloc_bitmap_set_ith_ULONG_PTR(obj->cpuset, 0, procInfo[i].ProcessorMask); - hwloc_debug_2args_bitmap("%s#%u bitmap %s\n", hwloc_obj_type_string(type), id, obj->cpuset); - - switch (type) { - case HWLOC_OBJ_NUMANODE: - { - ULONGLONG avail; - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(obj->nodeset, id); - if ((GetNumaAvailableMemoryNodeExProc && GetNumaAvailableMemoryNodeExProc(id, &avail)) - || (GetNumaAvailableMemoryNodeProc && GetNumaAvailableMemoryNodeProc(id, &avail))) - 
obj->memory.local_memory = avail; - obj->memory.page_types_len = 2; - obj->memory.page_types = malloc(2 * sizeof(*obj->memory.page_types)); - memset(obj->memory.page_types, 0, 2 * sizeof(*obj->memory.page_types)); - obj->memory.page_types_len = 1; - obj->memory.page_types[0].size = SystemInfo.dwPageSize; -#ifdef HAVE__SC_LARGE_PAGESIZE - obj->memory.page_types_len++; - obj->memory.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); -#endif - break; - } - case HWLOC_OBJ_CACHE: - obj->attr->cache.size = procInfo[i].Cache.Size; - obj->attr->cache.associativity = procInfo[i].Cache.Associativity == CACHE_FULLY_ASSOCIATIVE ? -1 : procInfo[i].Cache.Associativity ; - obj->attr->cache.linesize = procInfo[i].Cache.LineSize; - obj->attr->cache.depth = procInfo[i].Cache.Level; - switch (procInfo->Cache.Type) { - case CacheUnified: - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - break; - case CacheData: - obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - break; - case CacheInstruction: - obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - break; - default: - hwloc_free_unlinked_object(obj); - continue; - } - break; - case HWLOC_OBJ_GROUP: - obj->attr->group.depth = procInfo[i].Relationship == RelationGroup; - break; - default: - break; - } - hwloc_insert_object_by_cpuset(topology, obj); - } - - free(procInfo); - } - - if (GetLogicalProcessorInformationExProc) { - PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX procInfoTotal, procInfo; - - unsigned id; - struct hwloc_obj *obj; - hwloc_obj_type_t type; - - length = 0; - procInfoTotal = NULL; - - while (1) { - if (GetLogicalProcessorInformationExProc(RelationAll, procInfoTotal, &length)) - break; - if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) - return -1; - procInfoTotal = realloc(procInfoTotal, length); - } - - for (procInfo = procInfoTotal; - (void*) procInfo < (void*) ((uintptr_t) procInfoTotal + length); - procInfo = (void*) ((uintptr_t) procInfo + procInfo->Size)) { - unsigned num, i; - GROUP_AFFINITY *GroupMask; - - /* 
Ignore unknown caches */ - if (procInfo->Relationship == RelationCache - && procInfo->Cache.Type != CacheUnified - && procInfo->Cache.Type != CacheData - && procInfo->Cache.Type != CacheInstruction) - continue; - - id = -1; - switch (procInfo->Relationship) { - case RelationNumaNode: - type = HWLOC_OBJ_NUMANODE; - num = 1; - GroupMask = &procInfo->NumaNode.GroupMask; - id = procInfo->NumaNode.NodeNumber; - break; - case RelationProcessorPackage: - type = HWLOC_OBJ_PACKAGE; - num = procInfo->Processor.GroupCount; - GroupMask = procInfo->Processor.GroupMask; - break; - case RelationCache: - type = HWLOC_OBJ_CACHE; - num = 1; - GroupMask = &procInfo->Cache.GroupMask; - break; - case RelationProcessorCore: - type = HWLOC_OBJ_CORE; - num = procInfo->Processor.GroupCount; - GroupMask = procInfo->Processor.GroupMask; - break; - case RelationGroup: - /* So strange an interface... */ - for (id = 0; id < procInfo->Group.ActiveGroupCount; id++) { - KAFFINITY mask; - obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, id); - obj->cpuset = hwloc_bitmap_alloc(); - mask = procInfo->Group.GroupInfo[id].ActiveProcessorMask; - hwloc_debug("group %u %d cpus mask %lx\n", id, - procInfo->Group.GroupInfo[id].ActiveProcessorCount, mask); - /* KAFFINITY is ULONG_PTR */ - hwloc_bitmap_set_ith_ULONG_PTR(obj->cpuset, id, mask); - hwloc_debug_2args_bitmap("group %u %d bitmap %s\n", id, procInfo->Group.GroupInfo[id].ActiveProcessorCount, obj->cpuset); - - /* save the set of PUs so that we can create them at the end */ - if (!groups_pu_set) - groups_pu_set = hwloc_bitmap_alloc(); - hwloc_bitmap_or(groups_pu_set, groups_pu_set, obj->cpuset); - - hwloc_insert_object_by_cpuset(topology, obj); - } - continue; - default: - /* Don't know how to get the mask. 
*/ - hwloc_debug("unknown relation %d\n", procInfo->Relationship); - continue; - } - - obj = hwloc_alloc_setup_object(type, id); - obj->cpuset = hwloc_bitmap_alloc(); - for (i = 0; i < num; i++) { - hwloc_debug("%s#%u %d: mask %d:%lx\n", hwloc_obj_type_string(type), id, i, GroupMask[i].Group, GroupMask[i].Mask); - /* GROUP_AFFINITY.Mask is KAFFINITY, which is ULONG_PTR */ - hwloc_bitmap_set_ith_ULONG_PTR(obj->cpuset, GroupMask[i].Group, GroupMask[i].Mask); - } - hwloc_debug_2args_bitmap("%s#%u bitmap %s\n", hwloc_obj_type_string(type), id, obj->cpuset); - - switch (type) { - case HWLOC_OBJ_NUMANODE: - { - ULONGLONG avail; - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(obj->nodeset, id); - if ((GetNumaAvailableMemoryNodeExProc && GetNumaAvailableMemoryNodeExProc(id, &avail)) - || (GetNumaAvailableMemoryNodeProc && GetNumaAvailableMemoryNodeProc(id, &avail))) - obj->memory.local_memory = avail; - obj->memory.page_types = malloc(2 * sizeof(*obj->memory.page_types)); - memset(obj->memory.page_types, 0, 2 * sizeof(*obj->memory.page_types)); - obj->memory.page_types_len = 1; - obj->memory.page_types[0].size = SystemInfo.dwPageSize; -#ifdef HAVE__SC_LARGE_PAGESIZE - obj->memory.page_types_len++; - obj->memory.page_types[1].size = sysconf(_SC_LARGE_PAGESIZE); -#endif - break; - } - case HWLOC_OBJ_CACHE: - obj->attr->cache.size = procInfo->Cache.CacheSize; - obj->attr->cache.associativity = procInfo->Cache.Associativity == CACHE_FULLY_ASSOCIATIVE ? 
-1 : procInfo->Cache.Associativity ; - obj->attr->cache.linesize = procInfo->Cache.LineSize; - obj->attr->cache.depth = procInfo->Cache.Level; - switch (procInfo->Cache.Type) { - case CacheUnified: - obj->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - break; - case CacheData: - obj->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - break; - case CacheInstruction: - obj->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - break; - default: - hwloc_free_unlinked_object(obj); - continue; - } - break; - default: - break; - } - hwloc_insert_object_by_cpuset(topology, obj); - } - free(procInfoTotal); - } - - if (groups_pu_set) { - /* the system supports multiple Groups. - * PU indexes may be discontiguous, especially if Groups contain less than 64 procs. - */ - hwloc_obj_t obj; - unsigned idx; - hwloc_bitmap_foreach_begin(idx, groups_pu_set) { - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PU, idx); - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(obj->cpuset, idx); - hwloc_debug_1arg_bitmap("cpu %u has cpuset %s\n", - idx, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } hwloc_bitmap_foreach_end(); - hwloc_bitmap_free(groups_pu_set); - } else { - /* no processor groups */ - SYSTEM_INFO sysinfo; - hwloc_obj_t obj; - unsigned idx; - GetSystemInfo(&sysinfo); - for(idx=0; idx<32; idx++) - if (sysinfo.dwActiveProcessorMask & (((DWORD_PTR)1)<<idx)) { - obj = hwloc_alloc_setup_object(HWLOC_OBJ_PU, idx); - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(obj->cpuset, idx); - hwloc_debug_1arg_bitmap("cpu %u has cpuset %s\n", - idx, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - } - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "Windows"); - if (topology->is_thissystem) - hwloc_add_uname_info(topology, NULL); - return 1; -} - -void -hwloc_set_windows_hooks(struct hwloc_binding_hooks *hooks, - struct hwloc_topology_support *support) -{ - if (GetCurrentProcessorNumberExProc || (GetCurrentProcessorNumberProc && nr_processor_groups == 1)) - hooks->get_thisthread_last_cpu_location =
hwloc_win_get_thisthread_last_cpu_location; - - if (nr_processor_groups == 1) { - hooks->set_proc_cpubind = hwloc_win_set_proc_cpubind; - hooks->get_proc_cpubind = hwloc_win_get_proc_cpubind; - hooks->set_thisproc_cpubind = hwloc_win_set_thisproc_cpubind; - hooks->get_thisproc_cpubind = hwloc_win_get_thisproc_cpubind; - hooks->set_proc_membind = hwloc_win_set_proc_membind; - hooks->get_proc_membind = hwloc_win_get_proc_membind; - hooks->set_thisproc_membind = hwloc_win_set_thisproc_membind; - hooks->get_thisproc_membind = hwloc_win_get_thisproc_membind; - } - if (nr_processor_groups == 1 || SetThreadGroupAffinityProc) { - hooks->set_thread_cpubind = hwloc_win_set_thread_cpubind; - hooks->set_thisthread_cpubind = hwloc_win_set_thisthread_cpubind; - hooks->set_thisthread_membind = hwloc_win_set_thisthread_membind; - } - if (GetThreadGroupAffinityProc) { - hooks->get_thread_cpubind = hwloc_win_get_thread_cpubind; - hooks->get_thisthread_cpubind = hwloc_win_get_thisthread_cpubind; - hooks->get_thisthread_membind = hwloc_win_get_thisthread_membind; - } - - if (VirtualAllocExNumaProc) { - hooks->alloc_membind = hwloc_win_alloc_membind; - hooks->alloc = hwloc_win_alloc; - hooks->free_membind = hwloc_win_free_membind; - support->membind->bind_membind = 1; - } - - if (QueryWorkingSetExProc) - hooks->get_area_membind = hwloc_win_get_area_membind; -} - -static int hwloc_windows_component_init(unsigned long flags __hwloc_attribute_unused) -{ - hwloc_win_get_function_ptrs(); - return 0; -} - -static void hwloc_windows_component_finalize(unsigned long flags __hwloc_attribute_unused) -{ -} - -static struct hwloc_backend * -hwloc_windows_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - backend = hwloc_backend_alloc(component); - if (!backend) - return NULL; - backend->discover = 
hwloc_look_windows; - return backend; -} - -static struct hwloc_disc_component hwloc_windows_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_CPU, - "windows", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_windows_component_instantiate, - 50, - NULL -}; - -const struct hwloc_component hwloc_windows_component = { - HWLOC_COMPONENT_ABI, - hwloc_windows_component_init, hwloc_windows_component_finalize, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_windows_disc_component -}; - -unsigned -hwloc_fallback_nbprocessors(struct hwloc_topology *topology) { - int n; - SYSTEM_INFO sysinfo; - - /* by default, ignore groups (return only the number in the current group) */ - GetSystemInfo(&sysinfo); - n = sysinfo.dwNumberOfProcessors; /* FIXME could be non-contigous, rather return a mask from dwActiveProcessorMask? */ - - if (nr_processor_groups > 1) { - /* assume n-1 groups are complete, since that's how we store things in cpusets */ - if (GetActiveProcessorCountProc) - n = MAXIMUM_PROC_PER_GROUP*(nr_processor_groups-1) - + GetActiveProcessorCountProc((WORD)nr_processor_groups-1); - else - n = MAXIMUM_PROC_PER_GROUP*nr_processor_groups; - } - - if (n >= 1) - topology->support.discovery->pu = 1; - else - n = 1; - return n; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c deleted file mode 100644 index 72f96115d53..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c +++ /dev/null @@ -1,1208 +0,0 @@ -/* - * Copyright © 2010-2016 Inria. All rights reserved. - * Copyright © 2010-2013 Université Bordeaux - * Copyright © 2010-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - * - * - * This backend is only used when the operating system does not export - * the necessary hardware topology information to user-space applications. - * Currently, only the FreeBSD backend relies on this x86 backend. 
- * - * Other backends such as Linux have their own way to retrieve various - * pieces of hardware topology information from the operating system - * on various architectures, without having to use this x86-specific code. - */ - -#include -#include -#include -#include -#include - -#include - -#ifdef HAVE_VALGRIND_VALGRIND_H -#include -#endif - -struct hwloc_x86_backend_data_s { - unsigned nbprocs; - hwloc_bitmap_t apicid_set; - int apicid_unique; - int is_knl; -}; - -#define has_topoext(features) ((features)[6] & (1 << 22)) -#define has_x2apic(features) ((features)[4] & (1 << 21)) - -struct cacheinfo { - unsigned type; - unsigned level; - unsigned nbthreads_sharing; - unsigned cacheid; - - unsigned linesize; - unsigned linepart; - int inclusive; - int ways; - unsigned sets; - unsigned long size; -}; - -struct procinfo { - unsigned present; - unsigned apicid; - unsigned max_log_proc; - unsigned max_nbcores; - unsigned max_nbthreads; - unsigned packageid; - unsigned nodeid; - unsigned unitid; - unsigned logprocid; - unsigned threadid; - unsigned coreid; - unsigned *otherids; - unsigned levels; - unsigned numcaches; - struct cacheinfo *cache; - char cpuvendor[13]; - char cpumodel[3*4*4+1]; - unsigned cpustepping; - unsigned cpumodelnumber; - unsigned cpufamilynumber; -}; - -enum cpuid_type { - intel, - amd, - unknown -}; - -static void fill_amd_cache(struct procinfo *infos, unsigned level, int type, unsigned cpuid) -{ - struct cacheinfo *cache; - unsigned cachenum; - unsigned long size = 0; - - if (level == 1) - size = ((cpuid >> 24)) << 10; - else if (level == 2) - size = ((cpuid >> 16)) << 10; - else if (level == 3) - size = ((cpuid >> 18)) << 19; - if (!size) - return; - - cachenum = infos->numcaches++; - infos->cache = realloc(infos->cache, infos->numcaches*sizeof(*infos->cache)); - cache = &infos->cache[cachenum]; - - cache->type = type; - cache->level = level; - if (level <= 2) - cache->nbthreads_sharing = 1; - else - cache->nbthreads_sharing = 
infos->max_log_proc; - cache->linesize = cpuid & 0xff; - cache->linepart = 0; - cache->inclusive = 0; /* old AMD (K8-K10) supposed to have exclusive caches */ - - if (level == 1) { - cache->ways = (cpuid >> 16) & 0xff; - if (cache->ways == 0xff) - /* Fully associative */ - cache->ways = -1; - } else { - static const unsigned ways_tab[] = { 0, 1, 2, 0, 4, 0, 8, 0, 16, 0, 32, 48, 64, 96, 128, -1 }; - unsigned ways = (cpuid >> 12) & 0xf; - cache->ways = ways_tab[ways]; - } - cache->size = size; - cache->sets = 0; - - hwloc_debug("cache L%u t%u linesize %u ways %u size %luKB\n", cache->level, cache->nbthreads_sharing, cache->linesize, cache->ways, cache->size >> 10); -} - -/* Fetch information from the processor itself thanks to cpuid and store it in - * infos for summarize to analyze them globally */ -static void look_proc(struct hwloc_backend *backend, struct procinfo *infos, unsigned highest_cpuid, unsigned highest_ext_cpuid, unsigned *features, enum cpuid_type cpuid_type) -{ - struct hwloc_x86_backend_data_s *data = backend->private_data; - unsigned eax, ebx, ecx = 0, edx; - unsigned cachenum; - struct cacheinfo *cache; - unsigned regs[4]; - unsigned _model, _extendedmodel, _family, _extendedfamily; - - infos->present = 1; - - /* on return from this function, the following fields must be set in infos: - * packageid, nodeid, unitid, coreid, threadid, or -1 - * apicid - * levels and levels slots in otherids[] - * numcaches and numcaches slots in caches[] - * - * max_log_proc, max_nbthreads, max_nbcores, logprocid - * are only used temporarily inside this function and its callees. 
- */ - - /* Get apicid, max_log_proc, packageid, logprocid from cpuid 0x01 */ - eax = 0x01; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - infos->apicid = ebx >> 24; - if (edx & (1 << 28)) - infos->max_log_proc = 1 << hwloc_flsl(((ebx >> 16) & 0xff) - 1); - else - infos->max_log_proc = 1; - hwloc_debug("APIC ID 0x%02x max_log_proc %u\n", infos->apicid, infos->max_log_proc); - infos->packageid = infos->apicid / infos->max_log_proc; - infos->logprocid = infos->apicid % infos->max_log_proc; - hwloc_debug("phys %u thread %u\n", infos->packageid, infos->logprocid); - - /* Get cpu model/family/stepping numbers from same cpuid */ - _model = (eax>>4) & 0xf; - _extendedmodel = (eax>>16) & 0xf; - _family = (eax>>8) & 0xf; - _extendedfamily = (eax>>20) & 0xff; - if ((cpuid_type == intel || cpuid_type == amd) && _family == 0xf) { - infos->cpufamilynumber = _family + _extendedfamily; - } else { - infos->cpufamilynumber = _family; - } - if ((cpuid_type == intel && (_family == 0x6 || _family == 0xf)) - || (cpuid_type == amd && _family == 0xf)) { - infos->cpumodelnumber = _model + (_extendedmodel << 4); - } else { - infos->cpumodelnumber = _model; - } - infos->cpustepping = eax & 0xf; - - if (cpuid_type == intel && infos->cpufamilynumber == 0x6 && infos->cpumodelnumber == 0x57) - data->is_knl = 1; - - /* Get cpu vendor string from cpuid 0x00 */ - memset(regs, 0, sizeof(regs)); - regs[0] = 0; - hwloc_x86_cpuid(&regs[0], &regs[1], &regs[3], &regs[2]); - memcpy(infos->cpuvendor, regs+1, 4*3); - /* infos was calloc'ed, already ends with \0 */ - - /* Get cpu model string from cpuid 0x80000002-4 */ - if (highest_ext_cpuid >= 0x80000004) { - memset(regs, 0, sizeof(regs)); - regs[0] = 0x80000002; - hwloc_x86_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]); - memcpy(infos->cpumodel, regs, 4*4); - regs[0] = 0x80000003; - hwloc_x86_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]); - memcpy(infos->cpumodel + 4*4, regs, 4*4); - regs[0] = 0x80000004; - hwloc_x86_cpuid(&regs[0], &regs[1], &regs[2], &regs[3]); - memcpy(infos->cpumodel + 4*4*2, regs,
4*4); - /* infos was calloc'ed, already ends with \0 */ - } - - /* Get core/thread information from cpuid 0x80000008 - * (not supported on Intel) - */ - if (cpuid_type != intel && highest_ext_cpuid >= 0x80000008) { - unsigned coreidsize; - eax = 0x80000008; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - coreidsize = (ecx >> 12) & 0xf; - hwloc_debug("core ID size: %u\n", coreidsize); - if (!coreidsize) { - infos->max_nbcores = (ecx & 0xff) + 1; - } else - infos->max_nbcores = 1 << coreidsize; - hwloc_debug("Thus max # of cores: %u\n", infos->max_nbcores); - /* Still no multithreaded AMD */ - infos->max_nbthreads = 1 ; - hwloc_debug("and max # of threads: %u\n", infos->max_nbthreads); - /* The legacy max_log_proc is deprecated, it can be smaller than max_nbcores, - * which is the maximum number of cores that the processor could theoretically support - * (see "Multiple Core Calculation" in the AMD CPUID specification). - * Recompute packageid/logprocid/threadid/coreid accordingly. - */ - infos->packageid = infos->apicid / infos->max_nbcores; - infos->logprocid = infos->apicid % infos->max_nbcores; - infos->threadid = infos->logprocid % infos->max_nbthreads; - infos->coreid = infos->logprocid / infos->max_nbthreads; - hwloc_debug("this is thread %u of core %u\n", infos->threadid, infos->coreid); - } - - infos->numcaches = 0; - infos->cache = NULL; - - /* Get apicid, nodeid, unitid from cpuid 0x8000001e - * and cache information from cpuid 0x8000001d - * (AMD topology extension) - */ - if (cpuid_type != intel && has_topoext(features)) { - unsigned apic_id, node_id, nodes_per_proc, unit_id, cores_per_unit; - - eax = 0x8000001e; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - infos->apicid = apic_id = eax; - infos->nodeid = node_id = ecx & 0xff; - nodes_per_proc = ((ecx >> 8) & 7) + 1; - if (nodes_per_proc > 2) { - hwloc_debug("warning: undefined value %d, assuming it means %d\n", nodes_per_proc, nodes_per_proc); - } - infos->unitid = unit_id = ebx & 0xff; - cores_per_unit = 
((ebx >> 8) & 3) + 1; - hwloc_debug("x2APIC %08x, %d nodes, node %d, %d cores in unit %d\n", apic_id, nodes_per_proc, node_id, cores_per_unit, unit_id); - - for (cachenum = 0; ; cachenum++) { - unsigned type; - eax = 0x8000001d; - ecx = cachenum; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - type = eax & 0x1f; - if (type == 0) - break; - infos->numcaches++; - } - - cache = infos->cache = malloc(infos->numcaches * sizeof(*infos->cache)); - - for (cachenum = 0; ; cachenum++) { - unsigned long linesize, linepart, ways, sets; - unsigned type; - eax = 0x8000001d; - ecx = cachenum; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - - type = eax & 0x1f; - - if (type == 0) - break; - - cache->type = type; - cache->level = (eax >> 5) & 0x7; - /* Note: actually number of cores */ - cache->nbthreads_sharing = ((eax >> 14) & 0xfff) + 1; - - cache->linesize = linesize = (ebx & 0xfff) + 1; - cache->linepart = linepart = ((ebx >> 12) & 0x3ff) + 1; - ways = ((ebx >> 22) & 0x3ff) + 1; - - if (eax & (1 << 9)) - /* Fully associative */ - cache->ways = -1; - else - cache->ways = ways; - cache->sets = sets = ecx + 1; - cache->size = linesize * linepart * ways * sets; - cache->inclusive = edx & 0x2; - - hwloc_debug("cache %u type %u L%u t%u c%u linesize %lu linepart %lu ways %lu sets %lu, size %uKB\n", cachenum, cache->type, cache->level, cache->nbthreads_sharing, infos->max_nbcores, linesize, linepart, ways, sets, cache->size >> 10); - - cache++; - } - } else { - /* If there's no topoext, - * get cache information from cpuid 0x80000005 and 0x80000006 - * (not supported on Intel) - */ - if (cpuid_type != intel && highest_ext_cpuid >= 0x80000005) { - eax = 0x80000005; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - fill_amd_cache(infos, 1, 1, ecx); /* L1d */ - fill_amd_cache(infos, 1, 2, edx); /* L1i */ - } - if (cpuid_type != intel && highest_ext_cpuid >= 0x80000006) { - eax = 0x80000006; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - if (ecx & 0xf000) - /* This is actually supported on Intel but 
LinePerTag isn't returned in bits 8-11. - * Could be useful if some Intels (at least before Core micro-architecture) - * support this leaf without leaf 0x4. - */ - fill_amd_cache(infos, 2, 3, ecx); /* L2u */ - if (edx & 0xf000) - fill_amd_cache(infos, 3, 3, edx); /* L3u */ - } - } - - /* Get thread/core + cache information from cpuid 0x04 - * (not supported on AMD) - */ - if (cpuid_type != amd && highest_cpuid >= 0x04) { - unsigned level; - for (cachenum = 0; ; cachenum++) { - unsigned type; - eax = 0x04; - ecx = cachenum; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - - type = eax & 0x1f; - - hwloc_debug("cache %u type %u\n", cachenum, type); - - if (type == 0) - break; - level = (eax >> 5) & 0x7; - if (data->is_knl && level == 3) - /* KNL reports wrong L3 information (size always 0, cpuset always the entire machine, ignore it */ - break; - infos->numcaches++; - - if (!cachenum) { - /* by the way, get thread/core information from the first cache */ - infos->max_nbcores = ((eax >> 26) & 0x3f) + 1; - infos->max_nbthreads = infos->max_log_proc / infos->max_nbcores; - hwloc_debug("thus %u threads\n", infos->max_nbthreads); - infos->threadid = infos->logprocid % infos->max_nbthreads; - infos->coreid = infos->logprocid / infos->max_nbthreads; - hwloc_debug("this is thread %u of core %u\n", infos->threadid, infos->coreid); - } - } - - cache = infos->cache = malloc(infos->numcaches * sizeof(*infos->cache)); - - for (cachenum = 0; ; cachenum++) { - unsigned long linesize, linepart, ways, sets; - unsigned type; - eax = 0x04; - ecx = cachenum; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - - type = eax & 0x1f; - - if (type == 0) - break; - level = (eax >> 5) & 0x7; - if (data->is_knl && level == 3) - /* KNL reports wrong L3 information (size always 0, cpuset always the entire machine, ignore it */ - break; - - cache->type = type; - cache->level = level; - cache->nbthreads_sharing = ((eax >> 14) & 0xfff) + 1; - - cache->linesize = linesize = (ebx & 0xfff) + 1; - cache->linepart = 
linepart = ((ebx >> 12) & 0x3ff) + 1; - ways = ((ebx >> 22) & 0x3ff) + 1; - if (eax & (1 << 9)) - /* Fully associative */ - cache->ways = -1; - else - cache->ways = ways; - cache->sets = sets = ecx + 1; - cache->size = linesize * linepart * ways * sets; - cache->inclusive = edx & 0x2; - - hwloc_debug("cache %u type %u L%u t%u c%u linesize %lu linepart %lu ways %lu sets %lu, size %uKB\n", cachenum, cache->type, cache->level, cache->nbthreads_sharing, infos->max_nbcores, linesize, linepart, ways, sets, cache->size >> 10); - - cache++; - } - } - - /* Get package/core/thread information from cpuid 0x0b - * (Intel x2APIC) - */ - if (cpuid_type == intel && highest_cpuid >= 0x0b && has_x2apic(features)) { - unsigned level, apic_nextshift, apic_number, apic_type, apic_id = 0, apic_shift = 0, id; - for (level = 0; ; level++) { - ecx = level; - eax = 0x0b; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - if (!eax && !ebx) - break; - } - if (level) { - infos->levels = level; - infos->otherids = malloc(level * sizeof(*infos->otherids)); - for (level = 0; ; level++) { - ecx = level; - eax = 0x0b; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - if (!eax && !ebx) - break; - apic_nextshift = eax & 0x1f; - apic_number = ebx & 0xffff; - apic_type = (ecx & 0xff00) >> 8; - apic_id = edx; - id = (apic_id >> apic_shift) & ((1 << (apic_nextshift - apic_shift)) - 1); - hwloc_debug("x2APIC %08x %d: nextshift %d num %2d type %d id %2d\n", apic_id, level, apic_nextshift, apic_number, apic_type, id); - infos->apicid = apic_id; - infos->otherids[level] = UINT_MAX; - switch (apic_type) { - case 1: - infos->threadid = id; - break; - case 2: - infos->coreid = id; - break; - default: - hwloc_debug("x2APIC %d: unknown type %d\n", level, apic_type); - infos->otherids[level] = apic_id >> apic_shift; - break; - } - apic_shift = apic_nextshift; - } - infos->apicid = apic_id; - infos->packageid = apic_id >> apic_shift; - hwloc_debug("x2APIC remainder: %d\n", infos->packageid); - hwloc_debug("this is thread %u 
of core %u\n", infos->threadid, infos->coreid); - } - } - - /* Now that we have all info, compute cacheids and apply quirks */ - for (cachenum = 0; cachenum < infos->numcaches; cachenum++) { - struct cacheinfo *cache = &infos->cache[cachenum]; - - /* default cacheid value */ - cache->cacheid = infos->apicid / cache->nbthreads_sharing; - - /* AMD quirk */ - if (cpuid_type == amd - && infos->cpufamilynumber== 0x10 && infos->cpumodelnumber == 0x9 - && cache->level == 3 - && (cache->ways == -1 || (cache->ways % 2 == 0)) && cache->nbthreads_sharing >= 8) { - /* Fix AMD family 0x10 model 0x9 (Magny-Cours) with 8 or 12 cores. - * The L3 (and its associativity) is actually split into two halves). - */ - if (cache->nbthreads_sharing == 16) - cache->nbthreads_sharing = 12; /* nbthreads_sharing is a power of 2 but the processor actually has 8 or 12 cores */ - cache->nbthreads_sharing /= 2; - cache->size /= 2; - if (cache->ways != -1) - cache->ways /= 2; - /* AMD Magny-Cours 12-cores processor reserve APIC ids as AAAAAABBBBBB.... - * among first L3 (A), second L3 (B), and unexisting cores (.). - * On multi-socket servers, L3 in non-first sockets may have APIC id ranges - * such as [16-21] that are not aligned on multiple of nbthreads_sharing (6). - * That means, we can't just compare apicid/nbthreads_sharing to identify siblings. - */ - cache->cacheid = (infos->apicid % infos->max_log_proc) / cache->nbthreads_sharing /* cacheid within the package */ - + 2 * (infos->apicid / infos->max_log_proc); /* add 2 caches per previous package */ - - } else if (cpuid_type == amd - && infos->cpufamilynumber == 0x15 - && (infos->cpumodelnumber == 0x1 /* Bulldozer */ || infos->cpumodelnumber == 0x2 /* Piledriver */) - && cache->level == 3 && cache->nbthreads_sharing == 6) { - /* AMD Bulldozer and Piledriver 12-core processors have same APIC ids as Magny-Cours above, - * but we can't merge the checks because the original nbthreads_sharing must be exactly 6 here. 
- */ - cache->cacheid = (infos->apicid % infos->max_log_proc) / cache->nbthreads_sharing /* cacheid within the package */ - + 2 * (infos->apicid / infos->max_log_proc); /* add 2 cache per previous package */ - } - } - - if (hwloc_bitmap_isset(data->apicid_set, infos->apicid)) - data->apicid_unique = 0; - else - hwloc_bitmap_set(data->apicid_set, infos->apicid); -} - -static void -hwloc_x86_add_cpuinfos(hwloc_obj_t obj, struct procinfo *info, int nodup) -{ - char number[8]; - hwloc_obj_add_info_nodup(obj, "CPUVendor", info->cpuvendor, nodup); - snprintf(number, sizeof(number), "%u", info->cpufamilynumber); - hwloc_obj_add_info_nodup(obj, "CPUFamilyNumber", number, nodup); - snprintf(number, sizeof(number), "%u", info->cpumodelnumber); - hwloc_obj_add_info_nodup(obj, "CPUModelNumber", number, nodup); - if (info->cpumodel[0]) { - const char *c = info->cpumodel; - while (*c == ' ') - c++; - hwloc_obj_add_info_nodup(obj, "CPUModel", c, nodup); - } - snprintf(number, sizeof(number), "%u", info->cpustepping); - hwloc_obj_add_info_nodup(obj, "CPUStepping", number, nodup); -} - -/* Analyse information stored in infos, and build/annotate topology levels accordingly */ -static int summarize(struct hwloc_backend *backend, struct procinfo *infos, int fulldiscovery) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_x86_backend_data_s *data = backend->private_data; - unsigned nbprocs = data->nbprocs; - hwloc_bitmap_t complete_cpuset = hwloc_bitmap_alloc(); - unsigned i, j, l, level, type; - unsigned nbpackages = 0; - int one = -1; - unsigned next_group_depth = topology->next_group_depth; - int caches_added = 0; - hwloc_bitmap_t remaining_cpuset; - - for (i = 0; i < nbprocs; i++) - if (infos[i].present) { - hwloc_bitmap_set(complete_cpuset, i); - one = i; - } - - if (one == -1) { - hwloc_bitmap_free(complete_cpuset); - return 0; - } - - remaining_cpuset = hwloc_bitmap_alloc(); - - /* Ideally, when fulldiscovery=0, we could add any object that doesn't exist 
yet. - * But what if the x86 and the native backends disagree because one is buggy? Which one to trust? - * Only annotate existing objects for now. - */ - - /* Look for packages */ - if (fulldiscovery) { - hwloc_bitmap_t package_cpuset; - hwloc_obj_t package; - - hwloc_bitmap_copy(remaining_cpuset, complete_cpuset); - while ((i = hwloc_bitmap_first(remaining_cpuset)) != (unsigned) -1) { - unsigned packageid = infos[i].packageid; - - package_cpuset = hwloc_bitmap_alloc(); - for (j = i; j < nbprocs; j++) { - if (infos[j].packageid == packageid) { - hwloc_bitmap_set(package_cpuset, j); - hwloc_bitmap_clr(remaining_cpuset, j); - } - } - package = hwloc_alloc_setup_object(HWLOC_OBJ_PACKAGE, packageid); - package->cpuset = package_cpuset; - - hwloc_x86_add_cpuinfos(package, &infos[i], 0); - - hwloc_debug_1arg_bitmap("os package %u has cpuset %s\n", - packageid, package_cpuset); - hwloc_insert_object_by_cpuset(topology, package); - nbpackages++; - } - - } else { - /* Annotate packages previously-existing packages */ - hwloc_obj_t package = NULL; - int same = 1; - nbpackages = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PACKAGE); - /* check whether all packages have the same info */ - for(i=1; ios_index == (unsigned) -1) { - /* try to fix the package OS index if unknown. - * FIXME: ideally, we should check all bits in case x86 and the native backend disagree. 
- */ - for(i=0; icpuset, i)) { - package->os_index = infos[i].packageid; - break; - } - } - } - for(i=0; ios_index || (same && package->os_index == (unsigned) -1)) { - hwloc_x86_add_cpuinfos(package, &infos[i], 1); - break; - } - } - } - } - /* If there was no package, annotate the Machine instead */ - if ((!nbpackages) && infos[0].cpumodel[0]) { - hwloc_x86_add_cpuinfos(hwloc_get_root_obj(topology), &infos[0], 1); - } - - /* Look for Numa nodes inside packages */ - if (fulldiscovery) { - hwloc_bitmap_t node_cpuset; - hwloc_obj_t node; - - hwloc_bitmap_copy(remaining_cpuset, complete_cpuset); - while ((i = hwloc_bitmap_first(remaining_cpuset)) != (unsigned) -1) { - unsigned packageid = infos[i].packageid; - unsigned nodeid = infos[i].nodeid; - - if (nodeid == (unsigned)-1) { - hwloc_bitmap_clr(remaining_cpuset, i); - continue; - } - - node_cpuset = hwloc_bitmap_alloc(); - for (j = i; j < nbprocs; j++) { - if (infos[j].nodeid == (unsigned) -1) { - hwloc_bitmap_clr(remaining_cpuset, j); - continue; - } - - if (infos[j].packageid == packageid && infos[j].nodeid == nodeid) { - hwloc_bitmap_set(node_cpuset, j); - hwloc_bitmap_clr(remaining_cpuset, j); - } - } - node = hwloc_alloc_setup_object(HWLOC_OBJ_NUMANODE, nodeid); - node->cpuset = node_cpuset; - node->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(node->nodeset, nodeid); - hwloc_debug_1arg_bitmap("os node %u has cpuset %s\n", - nodeid, node_cpuset); - hwloc_insert_object_by_cpuset(topology, node); - } - } - - /* Look for Compute units inside packages */ - if (fulldiscovery) { - hwloc_bitmap_t unit_cpuset; - hwloc_obj_t unit; - - hwloc_bitmap_copy(remaining_cpuset, complete_cpuset); - while ((i = hwloc_bitmap_first(remaining_cpuset)) != (unsigned) -1) { - unsigned packageid = infos[i].packageid; - unsigned unitid = infos[i].unitid; - - if (unitid == (unsigned)-1) { - hwloc_bitmap_clr(remaining_cpuset, i); - continue; - } - - unit_cpuset = hwloc_bitmap_alloc(); - for (j = i; j < nbprocs; j++) { - if 
(infos[j].unitid == (unsigned) -1) { - hwloc_bitmap_clr(remaining_cpuset, j); - continue; - } - - if (infos[j].packageid == packageid && infos[j].unitid == unitid) { - hwloc_bitmap_set(unit_cpuset, j); - hwloc_bitmap_clr(remaining_cpuset, j); - } - } - unit = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, unitid); - unit->cpuset = unit_cpuset; - hwloc_obj_add_info(unit, "Type", "ComputeUnit"); - hwloc_debug_1arg_bitmap("os unit %u has cpuset %s\n", - unitid, unit_cpuset); - hwloc_insert_object_by_cpuset(topology, unit); - } - } - - /* Look for unknown objects */ - if (infos[one].otherids) { - for (level = infos[one].levels-1; level <= infos[one].levels-1; level--) { - if (infos[one].otherids[level] != UINT_MAX) { - hwloc_bitmap_t unknown_cpuset; - hwloc_obj_t unknown_obj; - - hwloc_bitmap_copy(remaining_cpuset, complete_cpuset); - while ((i = hwloc_bitmap_first(remaining_cpuset)) != (unsigned) -1) { - unsigned unknownid = infos[i].otherids[level]; - - unknown_cpuset = hwloc_bitmap_alloc(); - for (j = i; j < nbprocs; j++) { - if (infos[j].otherids[level] == unknownid) { - hwloc_bitmap_set(unknown_cpuset, j); - hwloc_bitmap_clr(remaining_cpuset, j); - } - } - unknown_obj = hwloc_alloc_setup_object(HWLOC_OBJ_GROUP, unknownid); - unknown_obj->cpuset = unknown_cpuset; - unknown_obj->os_level = level; - unknown_obj->attr->group.depth = topology->next_group_depth + level; - if (next_group_depth <= topology->next_group_depth + level) - next_group_depth = topology->next_group_depth + level + 1; - hwloc_debug_2args_bitmap("os unknown%d %u has cpuset %s\n", - level, unknownid, unknown_cpuset); - hwloc_insert_object_by_cpuset(topology, unknown_obj); - } - } - } - } - - /* Look for cores */ - if (fulldiscovery) { - hwloc_bitmap_t core_cpuset; - hwloc_obj_t core; - - hwloc_bitmap_copy(remaining_cpuset, complete_cpuset); - while ((i = hwloc_bitmap_first(remaining_cpuset)) != (unsigned) -1) { - unsigned packageid = infos[i].packageid; - unsigned coreid = infos[i].coreid; - - if (coreid 
== (unsigned) -1) { - hwloc_bitmap_clr(remaining_cpuset, i); - continue; - } - - core_cpuset = hwloc_bitmap_alloc(); - for (j = i; j < nbprocs; j++) { - if (infos[j].coreid == (unsigned) -1) { - hwloc_bitmap_clr(remaining_cpuset, j); - continue; - } - - if (infos[j].packageid == packageid && infos[j].coreid == coreid) { - hwloc_bitmap_set(core_cpuset, j); - hwloc_bitmap_clr(remaining_cpuset, j); - } - } - core = hwloc_alloc_setup_object(HWLOC_OBJ_CORE, coreid); - core->cpuset = core_cpuset; - hwloc_debug_1arg_bitmap("os core %u has cpuset %s\n", - coreid, core_cpuset); - hwloc_insert_object_by_cpuset(topology, core); - } - } - - /* Look for PUs */ - if (fulldiscovery) { - unsigned i; - hwloc_debug("%s", "\n\n * CPU cpusets *\n\n"); - for (i=0; icpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(obj->cpuset, i); - hwloc_debug_1arg_bitmap("PU %u has cpuset %s\n", i, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - } - } - - /* Look for caches */ - /* First find max level */ - level = 0; - for (i = 0; i < nbprocs; i++) - for (j = 0; j < infos[i].numcaches; j++) - if (infos[i].cache[j].level > level) - level = infos[i].cache[j].level; - while (level > 0) { - for (type = 1; type <= 3; type++) { - /* Look for caches of that type at level level */ - { - hwloc_obj_t cache; - - hwloc_bitmap_copy(remaining_cpuset, complete_cpuset); - while ((i = hwloc_bitmap_first(remaining_cpuset)) != (unsigned) -1) { - hwloc_bitmap_t puset; - int depth; - - for (l = 0; l < infos[i].numcaches; l++) { - if (infos[i].cache[l].level == level && infos[i].cache[l].type == type) - break; - } - if (l == infos[i].numcaches) { - /* no cache Llevel of that type in i */ - hwloc_bitmap_clr(remaining_cpuset, i); - continue; - } - - puset = hwloc_bitmap_alloc(); - hwloc_bitmap_set(puset, i); - depth = hwloc_get_cache_type_depth(topology, level, - type == 1 ? HWLOC_OBJ_CACHE_DATA : type == 2 ? 
HWLOC_OBJ_CACHE_INSTRUCTION : HWLOC_OBJ_CACHE_UNIFIED); - if (depth != HWLOC_TYPE_DEPTH_UNKNOWN) - cache = hwloc_get_next_obj_covering_cpuset_by_depth(topology, puset, depth, NULL); - else - cache = NULL; - hwloc_bitmap_free(puset); - - if (cache) { - /* Found cache above that PU, annotate if no such attribute yet */ - if (!hwloc_obj_get_info_by_name(cache, "Inclusive")) - hwloc_obj_add_info(cache, "Inclusive", infos[i].cache[l].inclusive ? "1" : "0"); - hwloc_bitmap_andnot(remaining_cpuset, remaining_cpuset, cache->cpuset); - } else { - /* Add the missing cache */ - hwloc_bitmap_t cache_cpuset; - unsigned packageid = infos[i].packageid; - unsigned cacheid = infos[i].cache[l].cacheid; - /* Now look for others sharing it */ - cache_cpuset = hwloc_bitmap_alloc(); - for (j = i; j < nbprocs; j++) { - unsigned l2; - for (l2 = 0; l2 < infos[j].numcaches; l2++) { - if (infos[j].cache[l2].level == level && infos[j].cache[l2].type == type) - break; - } - if (l2 == infos[j].numcaches) { - /* no cache Llevel of that type in j */ - hwloc_bitmap_clr(remaining_cpuset, j); - continue; - } - if (infos[j].packageid == packageid && infos[j].cache[l2].cacheid == cacheid) { - hwloc_bitmap_set(cache_cpuset, j); - hwloc_bitmap_clr(remaining_cpuset, j); - } - } - cache = hwloc_alloc_setup_object(HWLOC_OBJ_CACHE, cacheid); - cache->attr->cache.depth = level; - cache->attr->cache.size = infos[i].cache[l].size; - cache->attr->cache.linesize = infos[i].cache[l].linesize; - cache->attr->cache.associativity = infos[i].cache[l].ways; - switch (infos[i].cache[l].type) { - case 1: - cache->attr->cache.type = HWLOC_OBJ_CACHE_DATA; - break; - case 2: - cache->attr->cache.type = HWLOC_OBJ_CACHE_INSTRUCTION; - break; - case 3: - cache->attr->cache.type = HWLOC_OBJ_CACHE_UNIFIED; - break; - } - cache->cpuset = cache_cpuset; - hwloc_obj_add_info(cache, "Inclusive", infos[i].cache[l].inclusive ? 
"1" : "0"); - hwloc_debug_2args_bitmap("os L%u cache %u has cpuset %s\n", - level, cacheid, cache_cpuset); - hwloc_insert_object_by_cpuset(topology, cache); - caches_added++; - } - } - } - } - level--; - } - - hwloc_bitmap_free(remaining_cpuset); - hwloc_bitmap_free(complete_cpuset); - topology->next_group_depth = next_group_depth; - - return fulldiscovery || caches_added; -} - -static int -look_procs(struct hwloc_backend *backend, struct procinfo *infos, int fulldiscovery, - unsigned highest_cpuid, unsigned highest_ext_cpuid, unsigned *features, enum cpuid_type cpuid_type, - int (*get_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags), - int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags)) -{ - struct hwloc_x86_backend_data_s *data = backend->private_data; - struct hwloc_topology *topology = backend->topology; - unsigned nbprocs = data->nbprocs; - hwloc_bitmap_t orig_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_t set; - unsigned i; - int ret = 0; - - if (get_cpubind(topology, orig_cpuset, HWLOC_CPUBIND_STRICT)) { - hwloc_bitmap_free(orig_cpuset); - return -1; - } - - set = hwloc_bitmap_alloc(); - - for (i = 0; i < nbprocs; i++) { - hwloc_bitmap_only(set, i); - hwloc_debug("binding to CPU%d\n", i); - if (set_cpubind(topology, set, HWLOC_CPUBIND_STRICT)) { - hwloc_debug("could not bind to CPU%d: %s\n", i, strerror(errno)); - continue; - } - look_proc(backend, &infos[i], highest_cpuid, highest_ext_cpuid, features, cpuid_type); - } - - set_cpubind(topology, orig_cpuset, 0); - hwloc_bitmap_free(set); - hwloc_bitmap_free(orig_cpuset); - - if (!data->apicid_unique) - fulldiscovery = 0; - else - ret = summarize(backend, infos, fulldiscovery); - return ret; -} - -#if defined HWLOC_FREEBSD_SYS && defined HAVE_CPUSET_SETID -#include -#include -typedef cpusetid_t hwloc_x86_os_state_t; -static void hwloc_x86_os_state_save(hwloc_x86_os_state_t *state) -{ - /* temporary make all cpus available during discovery */ - 
cpuset_getid(CPU_LEVEL_CPUSET, CPU_WHICH_PID, -1, state); - cpuset_setid(CPU_WHICH_PID, -1, 0); -} -static void hwloc_x86_os_state_restore(hwloc_x86_os_state_t *state) -{ - /* restore initial cpuset */ - cpuset_setid(CPU_WHICH_PID, -1, *state); -} -#else /* !defined HWLOC_FREEBSD_SYS || !defined HAVE_CPUSET_SETID */ -typedef void * hwloc_x86_os_state_t; -static void hwloc_x86_os_state_save(hwloc_x86_os_state_t *state __hwloc_attribute_unused) { } -static void hwloc_x86_os_state_restore(hwloc_x86_os_state_t *state __hwloc_attribute_unused) { } -#endif /* !defined HWLOC_FREEBSD_SYS || !defined HAVE_CPUSET_SETID */ - - -#define INTEL_EBX ('G' | ('e'<<8) | ('n'<<16) | ('u'<<24)) -#define INTEL_EDX ('i' | ('n'<<8) | ('e'<<16) | ('I'<<24)) -#define INTEL_ECX ('n' | ('t'<<8) | ('e'<<16) | ('l'<<24)) - -#define AMD_EBX ('A' | ('u'<<8) | ('t'<<16) | ('h'<<24)) -#define AMD_EDX ('e' | ('n'<<8) | ('t'<<16) | ('i'<<24)) -#define AMD_ECX ('c' | ('A'<<8) | ('M'<<16) | ('D'<<24)) - -/* fake cpubind for when nbprocs=1 and no binding support */ -static int fake_get_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_cpuset_t set __hwloc_attribute_unused, - int flags __hwloc_attribute_unused) -{ - return 0; -} -static int fake_set_cpubind(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_const_cpuset_t set __hwloc_attribute_unused, - int flags __hwloc_attribute_unused) -{ - return 0; -} - -static -int hwloc_look_x86(struct hwloc_backend *backend, int fulldiscovery) -{ - struct hwloc_x86_backend_data_s *data = backend->private_data; - unsigned nbprocs = data->nbprocs; - unsigned eax, ebx, ecx = 0, edx; - unsigned i; - unsigned highest_cpuid; - unsigned highest_ext_cpuid; - /* This stores cpuid features with the same indexing as Linux */ - unsigned features[10] = { 0 }; - struct procinfo *infos = NULL; - enum cpuid_type cpuid_type = unknown; - hwloc_x86_os_state_t os_state; - struct hwloc_binding_hooks hooks; - struct hwloc_topology_support support; - struct 
hwloc_topology_membind_support memsupport __hwloc_attribute_unused; - int (*get_cpubind)(hwloc_topology_t topology, hwloc_cpuset_t set, int flags); - int (*set_cpubind)(hwloc_topology_t topology, hwloc_const_cpuset_t set, int flags); - int ret = -1; - - /* check if binding works */ - memset(&hooks, 0, sizeof(hooks)); - support.membind = &memsupport; - hwloc_set_native_binding_hooks(&hooks, &support); - if (hooks.get_thisthread_cpubind && hooks.set_thisthread_cpubind) { - get_cpubind = hooks.get_thisthread_cpubind; - set_cpubind = hooks.set_thisthread_cpubind; - } else if (hooks.get_thisproc_cpubind && hooks.set_thisproc_cpubind) { - get_cpubind = hooks.get_thisproc_cpubind; - set_cpubind = hooks.set_thisproc_cpubind; - } else { - /* we need binding support if there are multiple PUs */ - if (nbprocs > 1) - goto out; - get_cpubind = fake_get_cpubind; - set_cpubind = fake_set_cpubind; - } - - if (!hwloc_have_x86_cpuid()) - goto out; - - infos = calloc(nbprocs, sizeof(struct procinfo)); - if (NULL == infos) - goto out; - for (i = 0; i < nbprocs; i++) { - infos[i].nodeid = (unsigned) -1; - infos[i].packageid = (unsigned) -1; - infos[i].unitid = (unsigned) -1; - infos[i].coreid = (unsigned) -1; - infos[i].threadid = (unsigned) -1; - } - - eax = 0x00; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - highest_cpuid = eax; - if (ebx == INTEL_EBX && ecx == INTEL_ECX && edx == INTEL_EDX) - cpuid_type = intel; - if (ebx == AMD_EBX && ecx == AMD_ECX && edx == AMD_EDX) - cpuid_type = amd; - - hwloc_debug("highest cpuid %x, cpuid type %u\n", highest_cpuid, cpuid_type); - if (highest_cpuid < 0x01) { - goto out_with_infos; - } - - eax = 0x01; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - features[0] = edx; - features[4] = ecx; - - eax = 0x80000000; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - highest_ext_cpuid = eax; - - hwloc_debug("highest extended cpuid %x\n", highest_ext_cpuid); - - if (highest_cpuid >= 0x7) { - eax = 0x7; - ecx = 0; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - 
features[9] = ebx; - } - - if (cpuid_type != intel && highest_ext_cpuid >= 0x80000001) { - eax = 0x80000001; - hwloc_x86_cpuid(&eax, &ebx, &ecx, &edx); - features[1] = edx; - features[6] = ecx; - } - - hwloc_x86_os_state_save(&os_state); - - ret = look_procs(backend, infos, fulldiscovery, - highest_cpuid, highest_ext_cpuid, features, cpuid_type, - get_cpubind, set_cpubind); - if (ret >= 0) - /* success, we're done */ - goto out_with_os_state; - - if (nbprocs == 1) { - /* only one processor, no need to bind */ - look_proc(backend, &infos[0], highest_cpuid, highest_ext_cpuid, features, cpuid_type); - ret = summarize(backend, infos, fulldiscovery); - } - -out_with_os_state: - hwloc_x86_os_state_restore(&os_state); - -out_with_infos: - if (NULL != infos) { - for (i = 0; i < nbprocs; i++) { - free(infos[i].cache); - if (infos[i].otherids) - free(infos[i].otherids); - } - free(infos); - } - -out: - return ret; -} - -static int -hwloc_x86_discover(struct hwloc_backend *backend) -{ - struct hwloc_x86_backend_data_s *data = backend->private_data; - struct hwloc_topology *topology = backend->topology; - int alreadypus = 0; - int ret; - -#if HAVE_DECL_RUNNING_ON_VALGRIND - if (RUNNING_ON_VALGRIND) { - fprintf(stderr, "hwloc x86 backend cannot work under Valgrind, disabling.\n"); - return 0; - } -#endif - - data->nbprocs = hwloc_fallback_nbprocessors(topology); - - if (!topology->is_thissystem) { - hwloc_debug("%s", "\nno x86 detection (not thissystem)\n"); - return 0; - } - - if (topology->levels[0][0]->cpuset) { - /* somebody else discovered things */ - if (topology->nb_levels == 2 && topology->level_nbobjects[1] == data->nbprocs) { - /* only PUs were discovered, as much as we would, complete the topology with everything else */ - alreadypus = 1; - goto fulldiscovery; - } - - /* several object types were added, we can't easily complete, just do partial discovery */ - ret = hwloc_look_x86(backend, 0); - if (ret) - hwloc_obj_add_info(topology->levels[0][0], "Backend", "x86"); 
- return ret; - } else { - /* topology is empty, initialize it */ - hwloc_alloc_obj_cpusets(topology->levels[0][0]); - } - -fulldiscovery: - if (hwloc_look_x86(backend, 1) < 0) { - /* if failed, create PUs */ - if (!alreadypus) - hwloc_setup_pu_level(topology, data->nbprocs); - } - - hwloc_obj_add_info(topology->levels[0][0], "Backend", "x86"); - -#ifdef HAVE_UNAME - hwloc_add_uname_info(topology, NULL); /* we already know is_thissystem() is true */ -#else - /* uname isn't available, manually setup the "Architecture" info */ -#ifdef HWLOC_X86_64_ARCH - hwloc_obj_add_info(topology->levels[0][0], "Architecture", "x86_64"); -#else - hwloc_obj_add_info(topology->levels[0][0], "Architecture", "x86"); -#endif -#endif - return 1; -} - -static void -hwloc_x86_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_x86_backend_data_s *data = backend->private_data; - hwloc_bitmap_free(data->apicid_set); - free(data); -} - -static struct hwloc_backend * -hwloc_x86_component_instantiate(struct hwloc_disc_component *component, - const void *_data1 __hwloc_attribute_unused, - const void *_data2 __hwloc_attribute_unused, - const void *_data3 __hwloc_attribute_unused) -{ - struct hwloc_backend *backend; - struct hwloc_x86_backend_data_s *data; - - backend = hwloc_backend_alloc(component); - if (!backend) - goto out; - - data = malloc(sizeof(*data)); - if (!data) { - errno = ENOMEM; - goto out_with_backend; - } - - backend->private_data = data; - backend->flags = HWLOC_BACKEND_FLAG_NEED_LEVELS; - backend->discover = hwloc_x86_discover; - backend->disable = hwloc_x86_backend_disable; - - /* default values */ - data->is_knl = 0; - data->apicid_set = hwloc_bitmap_alloc(); - data->apicid_unique = 1; - - return backend; - - out_with_backend: - free(backend); - out: - return NULL; -} - -static struct hwloc_disc_component hwloc_x86_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_CPU, - "x86", - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - hwloc_x86_component_instantiate, - 45, /* between 
native and no_os */ - NULL -}; - -const struct hwloc_component hwloc_x86_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_x86_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-xml.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology-xml.c deleted file mode 100644 index 220afd1a45d..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology-xml.c +++ /dev/null @@ -1,1789 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2011 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. - */ - -#include -#include -#include -#include -#include -#include - -int -hwloc__xml_verbose(void) -{ - static int first = 1; - static int verbose = 0; - if (first) { - const char *env = getenv("HWLOC_XML_VERBOSE"); - if (env) - verbose = atoi(env); - first = 0; - } - return verbose; -} - -static int -hwloc_nolibxml_import(void) -{ - static int first = 1; - static int nolibxml = 0; - if (first) { - const char *env = getenv("HWLOC_NO_LIBXML_IMPORT"); - if (env) - nolibxml = atoi(env); - first = 0; - } - return nolibxml; -} - -static int -hwloc_nolibxml_export(void) -{ - static int first = 1; - static int nolibxml = 0; - if (first) { - const char *env = getenv("HWLOC_NO_LIBXML_EXPORT"); - if (env) - nolibxml = atoi(env); - first = 0; - } - return nolibxml; -} - -#define BASE64_ENCODED_LENGTH(length) (4*(((length)+2)/3)) - -/********************************* - ********* XML callbacks ********* - *********************************/ - -/* set when registering nolibxml and libxml components. - * modifications protected by the components mutex. - * read by the common XML code in topology-xml.c to jump to the right XML backend. 
- */ -static struct hwloc_xml_callbacks *hwloc_nolibxml_callbacks = NULL, *hwloc_libxml_callbacks = NULL; - -void -hwloc_xml_callbacks_register(struct hwloc_xml_component *comp) -{ - if (!hwloc_nolibxml_callbacks) - hwloc_nolibxml_callbacks = comp->nolibxml_callbacks; - if (!hwloc_libxml_callbacks) - hwloc_libxml_callbacks = comp->libxml_callbacks; -} - -void -hwloc_xml_callbacks_reset(void) -{ - hwloc_nolibxml_callbacks = NULL; - hwloc_libxml_callbacks = NULL; -} - -/************************************************ - ********* XML import (common routines) ********* - ************************************************/ - -static void -hwloc__xml_import_object_attr(struct hwloc_topology *topology __hwloc_attribute_unused, struct hwloc_obj *obj, - const char *name, const char *value, - hwloc__xml_import_state_t state) -{ - if (!strcmp(name, "type")) { - /* already handled */ - return; - } - - else if (!strcmp(name, "os_level")) - obj->os_level = strtoul(value, NULL, 10); - else if (!strcmp(name, "os_index")) - obj->os_index = strtoul(value, NULL, 10); - else if (!strcmp(name, "cpuset")) { - obj->cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_sscanf(obj->cpuset, value); - } else if (!strcmp(name, "complete_cpuset")) { - obj->complete_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_sscanf(obj->complete_cpuset,value); - } else if (!strcmp(name, "online_cpuset")) { - obj->online_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_sscanf(obj->online_cpuset, value); - } else if (!strcmp(name, "allowed_cpuset")) { - obj->allowed_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_sscanf(obj->allowed_cpuset, value); - } else if (!strcmp(name, "nodeset")) { - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_sscanf(obj->nodeset, value); - } else if (!strcmp(name, "complete_nodeset")) { - obj->complete_nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_sscanf(obj->complete_nodeset, value); - } else if (!strcmp(name, "allowed_nodeset")) { - obj->allowed_nodeset = hwloc_bitmap_alloc(); - 
hwloc_bitmap_sscanf(obj->allowed_nodeset, value); - } else if (!strcmp(name, "name")) - obj->name = strdup(value); - - else if (!strcmp(name, "cache_size")) { - unsigned long long lvalue = strtoull(value, NULL, 10); - if (obj->type == HWLOC_OBJ_CACHE) - obj->attr->cache.size = lvalue; - else if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring cache_size attribute for non-cache object type\n", - state->global->msgprefix); - } - - else if (!strcmp(name, "cache_linesize")) { - unsigned long lvalue = strtoul(value, NULL, 10); - if (obj->type == HWLOC_OBJ_CACHE) - obj->attr->cache.linesize = lvalue; - else if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring cache_linesize attribute for non-cache object type\n", - state->global->msgprefix); - } - - else if (!strcmp(name, "cache_associativity")) { - unsigned long lvalue = strtoul(value, NULL, 10); - if (obj->type == HWLOC_OBJ_CACHE) - obj->attr->cache.associativity = lvalue; - else if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring cache_associativity attribute for non-cache object type\n", - state->global->msgprefix); - } - - else if (!strcmp(name, "cache_type")) { - unsigned long lvalue = strtoul(value, NULL, 10); - if (obj->type == HWLOC_OBJ_CACHE) { - if (lvalue == HWLOC_OBJ_CACHE_UNIFIED - || lvalue == HWLOC_OBJ_CACHE_DATA - || lvalue == HWLOC_OBJ_CACHE_INSTRUCTION) - obj->attr->cache.type = (hwloc_obj_cache_type_t) lvalue; - else - fprintf(stderr, "%s: ignoring invalid cache_type attribute %ld\n", - state->global->msgprefix, lvalue); - } else if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring cache_type attribute for non-cache object type\n", - state->global->msgprefix); - } - - else if (!strcmp(name, "local_memory")) - obj->memory.local_memory = strtoull(value, NULL, 10); - - else if (!strcmp(name, "depth")) { - unsigned long lvalue = strtoul(value, NULL, 10); - switch (obj->type) { - case HWLOC_OBJ_CACHE: - obj->attr->cache.depth = lvalue; - break; - case HWLOC_OBJ_GROUP: - 
obj->attr->group.depth = lvalue; - break; - case HWLOC_OBJ_BRIDGE: - obj->attr->bridge.depth = lvalue; - break; - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring depth attribute for object type without depth\n", - state->global->msgprefix); - break; - } - } - - else if (!strcmp(name, "pci_busid")) { - switch (obj->type) { - case HWLOC_OBJ_PCI_DEVICE: - case HWLOC_OBJ_BRIDGE: { - unsigned domain, bus, dev, func; - if (sscanf(value, "%04x:%02x:%02x.%01x", - &domain, &bus, &dev, &func) != 4) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid pci_busid format string %s\n", - state->global->msgprefix, value); - } else { - obj->attr->pcidev.domain = domain; - obj->attr->pcidev.bus = bus; - obj->attr->pcidev.dev = dev; - obj->attr->pcidev.func = func; - } - break; - } - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring pci_busid attribute for non-PCI object\n", - state->global->msgprefix); - break; - } - } - - else if (!strcmp(name, "pci_type")) { - switch (obj->type) { - case HWLOC_OBJ_PCI_DEVICE: - case HWLOC_OBJ_BRIDGE: { - unsigned classid, vendor, device, subvendor, subdevice, revision; - if (sscanf(value, "%04x [%04x:%04x] [%04x:%04x] %02x", - &classid, &vendor, &device, &subvendor, &subdevice, &revision) != 6) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid pci_type format string %s\n", - state->global->msgprefix, value); - } else { - obj->attr->pcidev.class_id = classid; - obj->attr->pcidev.vendor_id = vendor; - obj->attr->pcidev.device_id = device; - obj->attr->pcidev.subvendor_id = subvendor; - obj->attr->pcidev.subdevice_id = subdevice; - obj->attr->pcidev.revision = revision; - } - break; - } - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring pci_type attribute for non-PCI object\n", - state->global->msgprefix); - break; - } - } - - else if (!strcmp(name, "pci_link_speed")) { - switch (obj->type) { - case HWLOC_OBJ_PCI_DEVICE: - case HWLOC_OBJ_BRIDGE: { - 
obj->attr->pcidev.linkspeed = (float) atof(value); - break; - } - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring pci_link_speed attribute for non-PCI object\n", - state->global->msgprefix); - break; - } - } - - else if (!strcmp(name, "bridge_type")) { - switch (obj->type) { - case HWLOC_OBJ_BRIDGE: { - unsigned upstream_type, downstream_type; - if (sscanf(value, "%u-%u", &upstream_type, &downstream_type) != 2) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid bridge_type format string %s\n", - state->global->msgprefix, value); - } else { - obj->attr->bridge.upstream_type = (hwloc_obj_bridge_type_t) upstream_type; - obj->attr->bridge.downstream_type = (hwloc_obj_bridge_type_t) downstream_type; - }; - break; - } - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring bridge_type attribute for non-bridge object\n", - state->global->msgprefix); - break; - } - } - - else if (!strcmp(name, "bridge_pci")) { - switch (obj->type) { - case HWLOC_OBJ_BRIDGE: { - unsigned domain, secbus, subbus; - if (sscanf(value, "%04x:[%02x-%02x]", - &domain, &secbus, &subbus) != 3) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid bridge_pci format string %s\n", - state->global->msgprefix, value); - } else { - obj->attr->bridge.downstream.pci.domain = domain; - obj->attr->bridge.downstream.pci.secondary_bus = secbus; - obj->attr->bridge.downstream.pci.subordinate_bus = subbus; - } - break; - } - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring bridge_pci attribute for non-bridge object\n", - state->global->msgprefix); - break; - } - } - - else if (!strcmp(name, "osdev_type")) { - switch (obj->type) { - case HWLOC_OBJ_OS_DEVICE: { - unsigned osdev_type; - if (sscanf(value, "%u", &osdev_type) != 1) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid osdev_type format string %s\n", - state->global->msgprefix, value); - } else - obj->attr->osdev.type = (hwloc_obj_osdev_type_t) 
osdev_type; - break; - } - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring osdev_type attribute for non-osdev object\n", - state->global->msgprefix); - break; - } - } - - /************************** - * forward compat with 2.0 - */ - else if (!strcmp(name, "kind") || !strcmp(name, "subkind")) { - if (obj->type == HWLOC_OBJ_GROUP) { - /* ignored, unused in <2.0 */ - } else { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring %s attribute for non-group object\n", - state->global->msgprefix, name); - } - } - else if (!strcmp(name, "subtype")) { - /* FIXME: should be "CoProcType" for osdev/coproc but we don't have that type-specific attribute yet */ - hwloc_obj_add_info(obj, "Type", value); - } - - - /************************* - * deprecated (from 1.0) - */ - else if (!strcmp(name, "dmi_board_vendor")) { - hwloc_obj_add_info(obj, "DMIBoardVendor", value); - } - else if (!strcmp(name, "dmi_board_name")) { - hwloc_obj_add_info(obj, "DMIBoardName", value); - } - - /************************* - * deprecated (from 0.9) - */ - else if (!strcmp(name, "memory_kB")) { - unsigned long long lvalue = strtoull(value, NULL, 10); - switch (obj->type) { - case HWLOC_OBJ_CACHE: - obj->attr->cache.size = lvalue << 10; - break; - case HWLOC_OBJ_NUMANODE: - case HWLOC_OBJ_MACHINE: - case HWLOC_OBJ_SYSTEM: - obj->memory.local_memory = lvalue << 10; - break; - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring memory_kB attribute for object type without memory\n", - state->global->msgprefix); - break; - } - } - else if (!strcmp(name, "huge_page_size_kB")) { - unsigned long lvalue = strtoul(value, NULL, 10); - switch (obj->type) { - case HWLOC_OBJ_NUMANODE: - case HWLOC_OBJ_MACHINE: - case HWLOC_OBJ_SYSTEM: - if (!obj->memory.page_types) { - obj->memory.page_types = malloc(sizeof(*obj->memory.page_types)); - obj->memory.page_types_len = 1; - } - obj->memory.page_types[0].size = lvalue << 10; - break; - default: - if (hwloc__xml_verbose()) - 
fprintf(stderr, "%s: ignoring huge_page_size_kB attribute for object type without huge pages\n", - state->global->msgprefix); - break; - } - } - else if (!strcmp(name, "huge_page_free")) { - unsigned long lvalue = strtoul(value, NULL, 10); - switch (obj->type) { - case HWLOC_OBJ_NUMANODE: - case HWLOC_OBJ_MACHINE: - case HWLOC_OBJ_SYSTEM: - if (!obj->memory.page_types) { - obj->memory.page_types = malloc(sizeof(*obj->memory.page_types)); - obj->memory.page_types_len = 1; - } - obj->memory.page_types[0].count = lvalue; - break; - default: - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring huge_page_free attribute for object type without huge pages\n", - state->global->msgprefix); - break; - } - } - /* - * end of deprecated (from 0.9) - *******************************/ - - - - else if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring unknown object attribute %s\n", - state->global->msgprefix, name); -} - - -static int -hwloc__xml_import_info(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, - hwloc__xml_import_state_t state) -{ - char *infoname = NULL; - char *infovalue = NULL; - - while (1) { - char *attrname, *attrvalue; - if (state->global->next_attr(state, &attrname, &attrvalue) < 0) - break; - if (!strcmp(attrname, "name")) - infoname = attrvalue; - else if (!strcmp(attrname, "value")) - infovalue = attrvalue; - else - return -1; - } - - if (infoname) - /* empty strings are ignored by libxml */ - hwloc_obj_add_info(obj, infoname, infovalue ? 
infovalue : ""); - - return state->global->close_tag(state); -} - -static int -hwloc__xml_import_pagetype(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, - hwloc__xml_import_state_t state) -{ - uint64_t size = 0, count = 0; - - while (1) { - char *attrname, *attrvalue; - if (state->global->next_attr(state, &attrname, &attrvalue) < 0) - break; - if (!strcmp(attrname, "size")) - size = strtoull(attrvalue, NULL, 10); - else if (!strcmp(attrname, "count")) - count = strtoull(attrvalue, NULL, 10); - else - return -1; - } - - if (size) { - int idx = obj->memory.page_types_len; - obj->memory.page_types = realloc(obj->memory.page_types, (idx+1)*sizeof(*obj->memory.page_types)); - obj->memory.page_types_len = idx+1; - obj->memory.page_types[idx].size = size; - obj->memory.page_types[idx].count = count; - } - - return state->global->close_tag(state); -} - -static int -hwloc__xml_import_distances(struct hwloc_xml_backend_data_s *data, - hwloc_obj_t obj, - hwloc__xml_import_state_t state) -{ - unsigned long reldepth = 0, nbobjs = 0; - float latbase = 0; - char *tag; - int ret; - - while (1) { - char *attrname, *attrvalue; - if (state->global->next_attr(state, &attrname, &attrvalue) < 0) - break; - if (!strcmp(attrname, "nbobjs")) - nbobjs = strtoul(attrvalue, NULL, 10); - else if (!strcmp(attrname, "relative_depth")) - reldepth = strtoul(attrvalue, NULL, 10); - else if (!strcmp(attrname, "latency_base")) - latbase = (float) atof(attrvalue); - else - return -1; - } - - if (nbobjs && reldepth && latbase) { - unsigned i; - float *matrix, latmax = 0; - struct hwloc_xml_imported_distances_s *distances; - - matrix = malloc(nbobjs*nbobjs*sizeof(float)); - distances = malloc(sizeof(*distances)); - if (!matrix || !distances) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: failed to allocate distance matrix for %lu objects\n", - state->global->msgprefix, nbobjs); - free(distances); - free(matrix); - return -1; - } - - distances->root = obj; - 
distances->distances.relative_depth = reldepth; - distances->distances.nbobjs = nbobjs; - distances->distances.latency = matrix; - distances->distances.latency_base = latbase; - - for(i=0; i<nbobjs*nbobjs; i++) { - struct hwloc__xml_import_state_s childstate; - char *attrname, *attrvalue; - float val; - - ret = state->global->find_child(state, &childstate, &tag); - if (ret <= 0 || strcmp(tag, "latency")) { - /* a latency child is needed */ - free(distances->distances.latency); - free(distances); - return -1; - } - - ret = state->global->next_attr(&childstate, &attrname, &attrvalue); - if (ret < 0 || strcmp(attrname, "value")) { - free(distances->distances.latency); - free(distances); - return -1; - } - - val = (float) atof((char *) attrvalue); - matrix[i] = val; - if (val > latmax) - latmax = val; - - ret = state->global->close_tag(&childstate); - if (ret < 0) - return -1; - - state->global->close_child(&childstate); - } - - distances->distances.latency_max = latmax; - - if (nbobjs < 2) { - /* distances with a single object are useless, even if the XML isn't invalid */ - assert(nbobjs == 1); - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid distance matrix with only 1 object\n", - state->global->msgprefix); - free(matrix); - free(distances); - } else { - /* queue the distance */ - if (data->last_distances) - data->last_distances->next = distances; - else - data->first_distances = distances; - distances->prev = data->last_distances; - distances->next = NULL; - } - } - - return state->global->close_tag(state); -} - -static int -hwloc__xml_import_userdata(hwloc_topology_t topology __hwloc_attribute_unused, hwloc_obj_t obj, - hwloc__xml_import_state_t state) -{ - size_t length = 0; - int encoded = 0; - char *name = NULL; /* optional */ - int ret; - - while (1) { - char *attrname, *attrvalue; - if (state->global->next_attr(state, &attrname, &attrvalue) < 0) - break; - if (!strcmp(attrname, "length")) - length = strtoul(attrvalue, NULL, 10); - else if (!strcmp(attrname, "encoding")) - encoded = !strcmp(attrvalue, "base64"); - else if (!strcmp(attrname, "name")) - name = attrvalue;
- else - return -1; - } - - if (!topology->userdata_import_cb) { - char *buffer; - size_t reallength = encoded ? BASE64_ENCODED_LENGTH(length) : length; - ret = state->global->get_content(state, &buffer, reallength); - if (ret < 0) - return -1; - - } else if (topology->userdata_not_decoded) { - char *buffer, *fakename; - size_t reallength = encoded ? BASE64_ENCODED_LENGTH(length) : length; - ret = state->global->get_content(state, &buffer, reallength); - if (ret < 0) - return -1; - fakename = malloc(6 + 1 + (name ? strlen(name) : 4) + 1); - if (!fakename) - return -1; - sprintf(fakename, encoded ? "base64%c%s" : "normal%c%s", name ? ':' : '-', name ? name : "anon"); - topology->userdata_import_cb(topology, obj, fakename, buffer, length); - free(fakename); - - } else if (encoded && length) { - char *encoded_buffer; - size_t encoded_length = BASE64_ENCODED_LENGTH(length); - ret = state->global->get_content(state, &encoded_buffer, encoded_length); - if (ret < 0) - return -1; - if (ret) { - char *decoded_buffer = malloc(length+1); - if (!decoded_buffer) - return -1; - assert(encoded_buffer[encoded_length] == 0); - ret = hwloc_decode_from_base64(encoded_buffer, decoded_buffer, length+1); - if (ret != (int) length) { - free(decoded_buffer); - return -1; - } - topology->userdata_import_cb(topology, obj, name, decoded_buffer, length); - free(decoded_buffer); - } - - } else { /* always handle length==0 in the non-encoded case */ - char *buffer = ""; - if (length) { - ret = state->global->get_content(state, &buffer, length); - if (ret < 0) - return -1; - } - topology->userdata_import_cb(topology, obj, name, buffer, length); - } - - state->global->close_content(state); - return state->global->close_tag(state); -} - -static int -hwloc__xml_import_object(hwloc_topology_t topology, - struct hwloc_xml_backend_data_s *data, - hwloc_obj_t obj, - hwloc__xml_import_state_t state) -{ - hwloc_obj_t parent = obj->parent; - - /* process attributes */ - while (1) { - char *attrname, 
*attrvalue; - if (state->global->next_attr(state, &attrname, &attrvalue) < 0) - break; - if (!strcmp(attrname, "type")) { - if (hwloc_obj_type_sscanf(attrvalue, &obj->type, NULL, NULL, 0) < 0) - goto error_with_object; - } else { - /* type needed first */ - if (obj->type == (hwloc_obj_type_t)-1) - goto error_with_object; - hwloc__xml_import_object_attr(topology, obj, attrname, attrvalue, state); - } - } - - if (parent) { - /* root->parent is NULL, and root is already inserted */ - - /* warn if inserting out-of-order */ - if (parent->cpuset) { /* don't compare children if multinode parent */ - hwloc_obj_t *current; - for (current = &parent->first_child; *current; current = &(*current)->next_sibling) { - hwloc_bitmap_t curcpuset = (*current)->cpuset; - if (obj->cpuset && (!curcpuset || hwloc__object_cpusets_compare_first(obj, *current) < 0)) { - static int reported = 0; - if (!reported && !hwloc_hide_errors()) { - char *progname = hwloc_progname(topology); - const char *origversion = hwloc_obj_get_info_by_name(topology->levels[0][0], "hwlocVersion"); - const char *origprogname = hwloc_obj_get_info_by_name(topology->levels[0][0], "ProcessName"); - char *c1, *cc1, t1[64]; - char *c2 = NULL, *cc2 = NULL, t2[64]; - hwloc_bitmap_asprintf(&c1, obj->cpuset); - hwloc_bitmap_asprintf(&cc1, obj->complete_cpuset); - hwloc_obj_type_snprintf(t1, sizeof(t1), obj, 0); - if (curcpuset) - hwloc_bitmap_asprintf(&c2, curcpuset); - if ((*current)->complete_cpuset) - hwloc_bitmap_asprintf(&cc2, (*current)->complete_cpuset); - hwloc_obj_type_snprintf(t2, sizeof(t2), *current, 0); - fprintf(stderr, "****************************************************************************\n"); - fprintf(stderr, "* hwloc has encountered an out-of-order XML topology load.\n"); - fprintf(stderr, "* Object %s cpuset %s complete %s\n", - t1, c1, cc1); - fprintf(stderr, "* was inserted after object %s with %s and %s.\n", - t2, c2 ? c2 : "none", cc2 ? 
cc2 : "none"); - fprintf(stderr, "* The error occured in hwloc %s inside process `%s', while\n", - HWLOC_VERSION, - progname ? progname : ""); - if (origversion || origprogname) - fprintf(stderr, "* the input XML was generated by hwloc %s inside process `%s'.\n", - origversion ? origversion : "(unknown version)", - origprogname ? origprogname : ""); - else - fprintf(stderr, "* the input XML was generated by an unspecified ancient hwloc release.\n"); - fprintf(stderr, "* Please check that your input topology XML file is valid.\n"); - fprintf(stderr, "****************************************************************************\n"); - free(c1); - free(cc1); - if (c2) - free(c2); - if (cc2) - free(cc2); - free(progname); - reported = 1; - } - } - } - } - - hwloc_insert_object_by_parent(topology, obj->parent /* filled by the caller */, obj); - /* insert_object_by_parent() doesn't merge during insert, so obj is still valid */ - } - - /* process subnodes */ - while (1) { - struct hwloc__xml_import_state_s childstate; - char *tag; - int ret; - - ret = state->global->find_child(state, &childstate, &tag); - if (ret < 0) - goto error; - if (!ret) - break; - - if (!strcmp(tag, "object")) { - hwloc_obj_t childobj = hwloc_alloc_setup_object(HWLOC_OBJ_TYPE_MAX, -1); - childobj->parent = obj; /* store the parent pointer for use in insert() below */ - ret = hwloc__xml_import_object(topology, data, childobj, &childstate); - } else if (!strcmp(tag, "page_type")) { - ret = hwloc__xml_import_pagetype(topology, obj, &childstate); - } else if (!strcmp(tag, "info")) { - ret = hwloc__xml_import_info(topology, obj, &childstate); - } else if (!strcmp(tag, "distances")) { - ret = hwloc__xml_import_distances(data, obj, &childstate); - } else if (!strcmp(tag, "userdata")) { - ret = hwloc__xml_import_userdata(topology, obj, &childstate); - } else - ret = -1; - - if (ret < 0) - goto error; - - state->global->close_child(&childstate); - } - - return state->global->close_tag(state); - - 
error_with_object: - hwloc_free_unlinked_object(obj); - error: - return -1; -} - -static int -hwloc__xml_import_diff_one(hwloc__xml_import_state_t state, - hwloc_topology_diff_t *firstdiffp, - hwloc_topology_diff_t *lastdiffp) -{ - char *type_s = NULL; - char *obj_depth_s = NULL; - char *obj_index_s = NULL; - char *obj_attr_type_s = NULL; -/* char *obj_attr_index_s = NULL; unused for now */ - char *obj_attr_name_s = NULL; - char *obj_attr_oldvalue_s = NULL; - char *obj_attr_newvalue_s = NULL; - - while (1) { - char *attrname, *attrvalue; - if (state->global->next_attr(state, &attrname, &attrvalue) < 0) - break; - if (!strcmp(attrname, "type")) - type_s = attrvalue; - else if (!strcmp(attrname, "obj_depth")) - obj_depth_s = attrvalue; - else if (!strcmp(attrname, "obj_index")) - obj_index_s = attrvalue; - else if (!strcmp(attrname, "obj_attr_type")) - obj_attr_type_s = attrvalue; - else if (!strcmp(attrname, "obj_attr_index")) - { /* obj_attr_index_s = attrvalue; unused for now */ } - else if (!strcmp(attrname, "obj_attr_name")) - obj_attr_name_s = attrvalue; - else if (!strcmp(attrname, "obj_attr_oldvalue")) - obj_attr_oldvalue_s = attrvalue; - else if (!strcmp(attrname, "obj_attr_newvalue")) - obj_attr_newvalue_s = attrvalue; - else { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring unknown diff attribute %s\n", - state->global->msgprefix, attrname); - return -1; - } - } - - if (type_s) { - switch (atoi(type_s)) { - default: - break; - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR: { - /* object attribute diff */ - hwloc_topology_diff_obj_attr_type_t obj_attr_type; - hwloc_topology_diff_t diff; - - /* obj_attr mandatory generic attributes */ - if (!obj_depth_s || !obj_index_s || !obj_attr_type_s) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: missing mandatory obj attr generic attributes\n", - state->global->msgprefix); - break; - } - - /* obj_attr mandatory attributes common to all subtypes */ - if (!obj_attr_oldvalue_s || !obj_attr_newvalue_s) { - if 
(hwloc__xml_verbose()) - fprintf(stderr, "%s: missing mandatory obj attr value attributes\n", - state->global->msgprefix); - break; - } - - /* mandatory attributes for obj_attr_info subtype */ - obj_attr_type = atoi(obj_attr_type_s); - if (obj_attr_type == HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO && !obj_attr_name_s) { - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: missing mandatory obj attr info name attribute\n", - state->global->msgprefix); - break; - } - - /* now we know we have everything we need */ - diff = malloc(sizeof(*diff)); - if (!diff) - return -1; - diff->obj_attr.type = HWLOC_TOPOLOGY_DIFF_OBJ_ATTR; - diff->obj_attr.obj_depth = atoi(obj_depth_s); - diff->obj_attr.obj_index = atoi(obj_index_s); - memset(&diff->obj_attr.diff, 0, sizeof(diff->obj_attr.diff)); - diff->obj_attr.diff.generic.type = obj_attr_type; - - switch (atoi(obj_attr_type_s)) { - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE: - diff->obj_attr.diff.uint64.oldvalue = strtoull(obj_attr_oldvalue_s, NULL, 0); - diff->obj_attr.diff.uint64.newvalue = strtoull(obj_attr_newvalue_s, NULL, 0); - break; - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO: - diff->obj_attr.diff.string.name = strdup(obj_attr_name_s); - /* fallthrough */ - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME: - diff->obj_attr.diff.string.oldvalue = strdup(obj_attr_oldvalue_s); - diff->obj_attr.diff.string.newvalue = strdup(obj_attr_newvalue_s); - break; - } - - if (*firstdiffp) - (*lastdiffp)->generic.next = diff; - else - *firstdiffp = diff; - *lastdiffp = diff; - diff->generic.next = NULL; - } - } - } - - return state->global->close_tag(state); -} - -int -hwloc__xml_import_diff(hwloc__xml_import_state_t state, - hwloc_topology_diff_t *firstdiffp) -{ - hwloc_topology_diff_t firstdiff = NULL, lastdiff = NULL; - *firstdiffp = NULL; - - while (1) { - struct hwloc__xml_import_state_s childstate; - char *tag; - int ret; - - ret = state->global->find_child(state, &childstate, &tag); - if (ret < 0) - return -1; - if (!ret) - break; - - if (!strcmp(tag, 
"diff")) { - ret = hwloc__xml_import_diff_one(&childstate, &firstdiff, &lastdiff); - } else - ret = -1; - - if (ret < 0) - return ret; - - state->global->close_child(&childstate); - } - - *firstdiffp = firstdiff; - return 0; -} - -/*********************************** - ********* main XML import ********* - ***********************************/ - -static void -hwloc_xml__free_distances(struct hwloc_xml_backend_data_s *data) -{ - struct hwloc_xml_imported_distances_s *xmldist; - while ((xmldist = data->first_distances) != NULL) { - data->first_distances = xmldist->next; - free(xmldist->distances.latency); - free(xmldist); - } -} - -static int -hwloc_xml__handle_distances(struct hwloc_topology *topology, - struct hwloc_xml_backend_data_s *data, - const char *msgprefix) -{ - struct hwloc_xml_imported_distances_s *xmldist; - - /* connect things now because we need levels to check/build, they'll be reconnected properly later anyway */ - hwloc_connect_children(topology->levels[0][0]); - if (hwloc_connect_levels(topology) < 0) { - hwloc_xml__free_distances(data); - return -1; - } - - while ((xmldist = data->first_distances) != NULL) { - hwloc_obj_t root = xmldist->root; - unsigned depth = root->depth + xmldist->distances.relative_depth; - unsigned nbobjs = hwloc_get_nbobjs_inside_cpuset_by_depth(topology, root->cpuset, depth); - - data->first_distances = xmldist->next; - - if (nbobjs != xmldist->distances.nbobjs) { - /* distances invalid, drop */ - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: ignoring invalid distance matrix with %u objs instead of %u\n", - msgprefix, xmldist->distances.nbobjs, nbobjs); - free(xmldist->distances.latency); - } else { - /* distances valid, add it to the internal OS distances list for grouping */ - unsigned *indexes = malloc(nbobjs * sizeof(unsigned)); - hwloc_obj_t child, *objs = malloc(nbobjs * sizeof(hwloc_obj_t)); - unsigned j; - for(j=0, child = hwloc_get_next_obj_inside_cpuset_by_depth(topology, root->cpuset, depth, NULL); - jcpuset, 
depth, child)) { - indexes[j] = child->os_index; - objs[j] = child; - } - for(j=0; j<nbobjs*nbobjs; j++) - xmldist->distances.latency[j] *= xmldist->distances.latency_base; - hwloc_distances_set(topology, objs[0]->type, nbobjs, indexes, objs, xmldist->distances.latency, 0 /* XML cannot force */); - } - - free(xmldist); - } - - return 0; -} - -/* this canNOT be the first XML call */ -static int -hwloc_look_xml(struct hwloc_backend *backend) -{ - struct hwloc_topology *topology = backend->topology; - struct hwloc_xml_backend_data_s *data = backend->private_data; - struct hwloc__xml_import_state_s state, childstate; - char *tag; - hwloc_localeswitch_declare; - int ret; - - state.global = data; - - assert(!topology->levels[0][0]->cpuset); - - hwloc_localeswitch_init(); - - data->first_distances = data->last_distances = NULL; - - ret = data->look_init(data, &state); - if (ret < 0) - goto failed; - - /* find root object tag and import it */ - ret = state.global->find_child(&state, &childstate, &tag); - if (ret < 0 || !ret || strcmp(tag, "object")) - goto failed; - ret = hwloc__xml_import_object(topology, data, topology->levels[0][0], &childstate); - if (ret < 0) - goto failed; - state.global->close_child(&childstate); - - /* find end of topology tag */ - state.global->close_tag(&state); - - /* keep the "Backend" information intact */ - /* we could add "BackendSource=XML" to notify that XML was used between the actual backend and here */ - - /* if we added some distances, we must check them, and make them groupable */ - if (hwloc_xml__handle_distances(topology, data, data->msgprefix) < 0) - goto err; - data->first_distances = data->last_distances = NULL; - topology->support.discovery->pu = 1; - - hwloc_localeswitch_fini(); - return 1; - - failed: - if (data->look_failed) - data->look_failed(data); - if (hwloc__xml_verbose()) - fprintf(stderr, "%s: XML component discovery failed.\n", - data->msgprefix); - err: - hwloc_xml__free_distances(data); - hwloc_localeswitch_fini(); - return -1; -} - -/* this can
be the first XML call */ -int -hwloc_topology_diff_load_xml(hwloc_topology_t topology __hwloc_attribute_unused, - const char *xmlpath, - hwloc_topology_diff_t *firstdiffp, char **refnamep) -{ - struct hwloc__xml_import_state_s state; - struct hwloc_xml_backend_data_s fakedata; /* only for storing global info during parsing */ - hwloc_localeswitch_declare; - const char *basename; - int force_nolibxml; - int ret; - - state.global = &fakedata; - - basename = strrchr(xmlpath, '/'); - if (basename) - basename++; - else - basename = xmlpath; - fakedata.msgprefix = strdup(basename); - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - free(fakedata.msgprefix); - errno = ENOSYS; - return -1; - } - - hwloc_localeswitch_init(); - - *firstdiffp = NULL; - - force_nolibxml = hwloc_nolibxml_import(); -retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - ret = hwloc_nolibxml_callbacks->import_diff(&state, xmlpath, NULL, 0, firstdiffp, refnamep); - else { - ret = hwloc_libxml_callbacks->import_diff(&state, xmlpath, NULL, 0, firstdiffp, refnamep); - if (ret < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - - hwloc_localeswitch_fini(); - - free(fakedata.msgprefix); - return ret; -} - -/* this can be the first XML call */ -int -hwloc_topology_diff_load_xmlbuffer(hwloc_topology_t topology __hwloc_attribute_unused, - const char *xmlbuffer, int buflen, - hwloc_topology_diff_t *firstdiffp, char **refnamep) -{ - struct hwloc__xml_import_state_s state; - struct hwloc_xml_backend_data_s fakedata; /* only for storing global info during parsing */ - hwloc_localeswitch_declare; - int force_nolibxml; - int ret; - - state.global = &fakedata; - fakedata.msgprefix = strdup("xmldiffbuffer"); - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - free(fakedata.msgprefix); - errno = ENOSYS; - return -1; - } - - hwloc_localeswitch_init(); - - *firstdiffp = NULL; - - force_nolibxml = hwloc_nolibxml_import(); - 
retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - ret = hwloc_nolibxml_callbacks->import_diff(&state, NULL, xmlbuffer, buflen, firstdiffp, refnamep); - else { - ret = hwloc_libxml_callbacks->import_diff(&state, NULL, xmlbuffer, buflen, firstdiffp, refnamep); - if (ret < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - - hwloc_localeswitch_fini(); - - free(fakedata.msgprefix); - return ret; -} - -/************************************************ - ********* XML export (common routines) ********* - ************************************************/ - -#define HWLOC_XML_CHAR_VALID(c) (((c) >= 32 && (c) <= 126) || (c) == '\t' || (c) == '\n' || (c) == '\r') - -static int -hwloc__xml_export_check_buffer(const char *buf, size_t length) -{ - unsigned i; - for(i=0; inew_child(parentstate, &state, "object"); - - state.new_prop(&state, "type", hwloc_obj_type_string(obj->type)); - if (obj->os_level != -1) { - sprintf(tmp, "%d", obj->os_level); - state.new_prop(&state, "os_level", tmp); - } - if (obj->os_index != (unsigned) -1) { - sprintf(tmp, "%u", obj->os_index); - state.new_prop(&state, "os_index", tmp); - } - if (obj->cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->cpuset); - state.new_prop(&state, "cpuset", cpuset); - free(cpuset); - } - if (obj->complete_cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->complete_cpuset); - state.new_prop(&state, "complete_cpuset", cpuset); - free(cpuset); - } - if (obj->online_cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->online_cpuset); - state.new_prop(&state, "online_cpuset", cpuset); - free(cpuset); - } - if (obj->allowed_cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->allowed_cpuset); - state.new_prop(&state, "allowed_cpuset", cpuset); - free(cpuset); - } - if (obj->nodeset && !hwloc_bitmap_isfull(obj->nodeset)) { - hwloc_bitmap_asprintf(&cpuset, obj->nodeset); - state.new_prop(&state, "nodeset", cpuset); - free(cpuset); - } - if (obj->complete_nodeset && 
!hwloc_bitmap_isfull(obj->complete_nodeset)) { - hwloc_bitmap_asprintf(&cpuset, obj->complete_nodeset); - state.new_prop(&state, "complete_nodeset", cpuset); - free(cpuset); - } - if (obj->allowed_nodeset && !hwloc_bitmap_isfull(obj->allowed_nodeset)) { - hwloc_bitmap_asprintf(&cpuset, obj->allowed_nodeset); - state.new_prop(&state, "allowed_nodeset", cpuset); - free(cpuset); - } - - if (obj->name) { - char *name = hwloc__xml_export_safestrdup(obj->name); - state.new_prop(&state, "name", name); - free(name); - } - - switch (obj->type) { - case HWLOC_OBJ_CACHE: - sprintf(tmp, "%llu", (unsigned long long) obj->attr->cache.size); - state.new_prop(&state, "cache_size", tmp); - sprintf(tmp, "%u", obj->attr->cache.depth); - state.new_prop(&state, "depth", tmp); - sprintf(tmp, "%u", (unsigned) obj->attr->cache.linesize); - state.new_prop(&state, "cache_linesize", tmp); - sprintf(tmp, "%d", (unsigned) obj->attr->cache.associativity); - state.new_prop(&state, "cache_associativity", tmp); - sprintf(tmp, "%d", (unsigned) obj->attr->cache.type); - state.new_prop(&state, "cache_type", tmp); - break; - case HWLOC_OBJ_GROUP: - sprintf(tmp, "%u", obj->attr->group.depth); - state.new_prop(&state, "depth", tmp); - break; - case HWLOC_OBJ_BRIDGE: - sprintf(tmp, "%u-%u", obj->attr->bridge.upstream_type, obj->attr->bridge.downstream_type); - state.new_prop(&state, "bridge_type", tmp); - sprintf(tmp, "%u", obj->attr->bridge.depth); - state.new_prop(&state, "depth", tmp); - if (obj->attr->bridge.downstream_type == HWLOC_OBJ_BRIDGE_PCI) { - sprintf(tmp, "%04x:[%02x-%02x]", - (unsigned) obj->attr->bridge.downstream.pci.domain, - (unsigned) obj->attr->bridge.downstream.pci.secondary_bus, - (unsigned) obj->attr->bridge.downstream.pci.subordinate_bus); - state.new_prop(&state, "bridge_pci", tmp); - } - if (obj->attr->bridge.upstream_type != HWLOC_OBJ_BRIDGE_PCI) - break; - /* fallthrough */ - case HWLOC_OBJ_PCI_DEVICE: - sprintf(tmp, "%04x:%02x:%02x.%01x", - (unsigned) 
obj->attr->pcidev.domain, - (unsigned) obj->attr->pcidev.bus, - (unsigned) obj->attr->pcidev.dev, - (unsigned) obj->attr->pcidev.func); - state.new_prop(&state, "pci_busid", tmp); - sprintf(tmp, "%04x [%04x:%04x] [%04x:%04x] %02x", - (unsigned) obj->attr->pcidev.class_id, - (unsigned) obj->attr->pcidev.vendor_id, (unsigned) obj->attr->pcidev.device_id, - (unsigned) obj->attr->pcidev.subvendor_id, (unsigned) obj->attr->pcidev.subdevice_id, - (unsigned) obj->attr->pcidev.revision); - state.new_prop(&state, "pci_type", tmp); - sprintf(tmp, "%f", obj->attr->pcidev.linkspeed); - state.new_prop(&state, "pci_link_speed", tmp); - break; - case HWLOC_OBJ_OS_DEVICE: - sprintf(tmp, "%u", obj->attr->osdev.type); - state.new_prop(&state, "osdev_type", tmp); - break; - default: - break; - } - - if (obj->memory.local_memory) { - sprintf(tmp, "%llu", (unsigned long long) obj->memory.local_memory); - state.new_prop(&state, "local_memory", tmp); - } - - for(i=0; i<obj->memory.page_types_len; i++) { - struct hwloc__xml_export_state_s childstate; - state.new_child(&state, &childstate, "page_type"); - sprintf(tmp, "%llu", (unsigned long long) obj->memory.page_types[i].size); - childstate.new_prop(&childstate, "size", tmp); - sprintf(tmp, "%llu", (unsigned long long) obj->memory.page_types[i].count); - childstate.new_prop(&childstate, "count", tmp); - childstate.end_object(&childstate, "page_type"); - } - - for(i=0; i<obj->infos_count; i++) { - char *name = hwloc__xml_export_safestrdup(obj->infos[i].name); - char *value = hwloc__xml_export_safestrdup(obj->infos[i].value); - struct hwloc__xml_export_state_s childstate; - state.new_child(&state, &childstate, "info"); - childstate.new_prop(&childstate, "name", name); - childstate.new_prop(&childstate, "value", value); - childstate.end_object(&childstate, "info"); - free(name); - free(value); - } - - for(i=0; i<obj->distances_count; i++) { - unsigned nbobjs = obj->distances[i]->nbobjs; - unsigned j; - struct hwloc__xml_export_state_s childstate; -
state.new_child(&state, &childstate, "distances"); - sprintf(tmp, "%u", nbobjs); - childstate.new_prop(&childstate, "nbobjs", tmp); - sprintf(tmp, "%u", obj->distances[i]->relative_depth); - childstate.new_prop(&childstate, "relative_depth", tmp); - sprintf(tmp, "%f", obj->distances[i]->latency_base); - childstate.new_prop(&childstate, "latency_base", tmp); - for(j=0; j<nbobjs*nbobjs; j++) { - struct hwloc__xml_export_state_s greatchildstate; - childstate.new_child(&childstate, &greatchildstate, "latency"); - sprintf(tmp, "%f", obj->distances[i]->latency[j]); - greatchildstate.new_prop(&greatchildstate, "value", tmp); - greatchildstate.end_object(&greatchildstate, "latency"); - } - childstate.end_object(&childstate, "distances"); - } - - if (obj->userdata && topology->userdata_export_cb) - topology->userdata_export_cb((void*) &state, topology, obj); - - if (obj->arity) { - unsigned x; - for (x=0; x<obj->arity; x++) - hwloc__xml_export_object (&state, topology, obj->children[x]); - } - - state.end_object(&state, "object"); -} - -void -hwloc__xml_export_diff(hwloc__xml_export_state_t parentstate, hwloc_topology_diff_t diff) -{ - while (diff) { - struct hwloc__xml_export_state_s state; - char tmp[255]; - - parentstate->new_child(parentstate, &state, "diff"); - - sprintf(tmp, "%u", diff->generic.type); - state.new_prop(&state, "type", tmp); - - switch (diff->generic.type) { - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR: - sprintf(tmp, "%d", diff->obj_attr.obj_depth); - state.new_prop(&state, "obj_depth", tmp); - sprintf(tmp, "%u", diff->obj_attr.obj_index); - state.new_prop(&state, "obj_index", tmp); - - sprintf(tmp, "%u", diff->obj_attr.diff.generic.type); - state.new_prop(&state, "obj_attr_type", tmp); - - switch (diff->obj_attr.diff.generic.type) { - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_SIZE: - sprintf(tmp, "%llu", (unsigned long long) diff->obj_attr.diff.uint64.index); - state.new_prop(&state, "obj_attr_index", tmp); - sprintf(tmp, "%llu", (unsigned long long) diff->obj_attr.diff.uint64.oldvalue); - state.new_prop(&state, "obj_attr_oldvalue", tmp); - sprintf(tmp, "%llu", (unsigned long long) diff->obj_attr.diff.uint64.newvalue); -
state.new_prop(&state, "obj_attr_newvalue", tmp); - break; - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_NAME: - case HWLOC_TOPOLOGY_DIFF_OBJ_ATTR_INFO: - if (diff->obj_attr.diff.string.name) - state.new_prop(&state, "obj_attr_name", diff->obj_attr.diff.string.name); - state.new_prop(&state, "obj_attr_oldvalue", diff->obj_attr.diff.string.oldvalue); - state.new_prop(&state, "obj_attr_newvalue", diff->obj_attr.diff.string.newvalue); - break; - } - - break; - default: - assert(0); - } - state.end_object(&state, "diff"); - - diff = diff->generic.next; - } -} - -/********************************** - ********* main XML export ******** - **********************************/ - -/* this can be the first XML call */ -int hwloc_topology_export_xml(hwloc_topology_t topology, const char *filename) -{ - hwloc_localeswitch_declare; - int force_nolibxml; - int ret; - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - errno = ENOSYS; - return -1; - } - - hwloc_localeswitch_init(); - - force_nolibxml = hwloc_nolibxml_export(); -retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - ret = hwloc_nolibxml_callbacks->export_file(topology, filename); - else { - ret = hwloc_libxml_callbacks->export_file(topology, filename); - if (ret < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - - hwloc_localeswitch_fini(); - return ret; -} - -/* this can be the first XML call */ -int hwloc_topology_export_xmlbuffer(hwloc_topology_t topology, char **xmlbuffer, int *buflen) -{ - hwloc_localeswitch_declare; - int force_nolibxml; - int ret; - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - errno = ENOSYS; - return -1; - } - - hwloc_localeswitch_init(); - - force_nolibxml = hwloc_nolibxml_export(); -retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - ret = hwloc_nolibxml_callbacks->export_buffer(topology, xmlbuffer, buflen); - else { - ret = 
hwloc_libxml_callbacks->export_buffer(topology, xmlbuffer, buflen); - if (ret < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - - hwloc_localeswitch_fini(); - return ret; -} - -/* this can be the first XML call */ -int -hwloc_topology_diff_export_xml(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_topology_diff_t diff, const char *refname, - const char *filename) -{ - hwloc_localeswitch_declare; - hwloc_topology_diff_t tmpdiff; - int force_nolibxml; - int ret; - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - errno = ENOSYS; - return -1; - } - - tmpdiff = diff; - while (tmpdiff) { - if (tmpdiff->generic.type == HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX) { - errno = EINVAL; - return -1; - } - tmpdiff = tmpdiff->generic.next; - } - - hwloc_localeswitch_init(); - - force_nolibxml = hwloc_nolibxml_export(); -retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - ret = hwloc_nolibxml_callbacks->export_diff_file(diff, refname, filename); - else { - ret = hwloc_libxml_callbacks->export_diff_file(diff, refname, filename); - if (ret < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - - hwloc_localeswitch_fini(); - return ret; -} - -/* this can be the first XML call */ -int -hwloc_topology_diff_export_xmlbuffer(hwloc_topology_t topology __hwloc_attribute_unused, - hwloc_topology_diff_t diff, const char *refname, - char **xmlbuffer, int *buflen) -{ - hwloc_localeswitch_declare; - hwloc_topology_diff_t tmpdiff; - int force_nolibxml; - int ret; - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - errno = ENOSYS; - return -1; - } - - tmpdiff = diff; - while (tmpdiff) { - if (tmpdiff->generic.type == HWLOC_TOPOLOGY_DIFF_TOO_COMPLEX) { - errno = EINVAL; - return -1; - } - tmpdiff = tmpdiff->generic.next; - } - - hwloc_localeswitch_init(); - - force_nolibxml = hwloc_nolibxml_export(); -retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && 
force_nolibxml)) - ret = hwloc_nolibxml_callbacks->export_diff_buffer(diff, refname, xmlbuffer, buflen); - else { - ret = hwloc_libxml_callbacks->export_diff_buffer(diff, refname, xmlbuffer, buflen); - if (ret < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - - hwloc_localeswitch_fini(); - return ret; -} - -void hwloc_free_xmlbuffer(hwloc_topology_t topology __hwloc_attribute_unused, char *xmlbuffer) -{ - int force_nolibxml; - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - errno = ENOSYS; - return ; - } - - force_nolibxml = hwloc_nolibxml_export(); - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - hwloc_nolibxml_callbacks->free_buffer(xmlbuffer); - else - hwloc_libxml_callbacks->free_buffer(xmlbuffer); -} - -void -hwloc_topology_set_userdata_export_callback(hwloc_topology_t topology, - void (*export)(void *reserved, struct hwloc_topology *topology, struct hwloc_obj *obj)) -{ - topology->userdata_export_cb = export; -} - -static void -hwloc__export_obj_userdata(hwloc__xml_export_state_t parentstate, int encoded, - const char *name, size_t length, const void *buffer, size_t encoded_length) -{ - struct hwloc__xml_export_state_s state; - char tmp[255]; - parentstate->new_child(parentstate, &state, "userdata"); - if (name) - state.new_prop(&state, "name", name); - sprintf(tmp, "%lu", (unsigned long) length); - state.new_prop(&state, "length", tmp); - if (encoded) - state.new_prop(&state, "encoding", "base64"); - if (encoded_length) - state.add_content(&state, buffer, encoded ? 
encoded_length : length); - state.end_object(&state, "userdata"); -} - -int -hwloc_export_obj_userdata(void *reserved, - struct hwloc_topology *topology, struct hwloc_obj *obj __hwloc_attribute_unused, - const char *name, const void *buffer, size_t length) -{ - hwloc__xml_export_state_t state = reserved; - - if (!buffer) { - errno = EINVAL; - return -1; - } - - if ((name && hwloc__xml_export_check_buffer(name, strlen(name)) < 0) - || hwloc__xml_export_check_buffer(buffer, length) < 0) { - errno = EINVAL; - return -1; - } - - if (topology->userdata_not_decoded) { - int encoded; - size_t encoded_length; - const char *realname; - if (!strncmp(name, "normal", 6)) { - encoded = 0; - encoded_length = length; - } else if (!strncmp(name, "base64", 6)) { - encoded = 1; - encoded_length = BASE64_ENCODED_LENGTH(length); - } else - assert(0); - if (name[6] == ':') - realname = name+7; - else if (!strcmp(name+6, "-anon")) - realname = NULL; - else - assert(0); - hwloc__export_obj_userdata(state, encoded, realname, length, buffer, encoded_length); - - } else - hwloc__export_obj_userdata(state, 0, name, length, buffer, length); - - return 0; -} - -int -hwloc_export_obj_userdata_base64(void *reserved, - struct hwloc_topology *topology __hwloc_attribute_unused, struct hwloc_obj *obj __hwloc_attribute_unused, - const char *name, const void *buffer, size_t length) -{ - hwloc__xml_export_state_t state = reserved; - size_t encoded_length; - char *encoded_buffer; - int ret __hwloc_attribute_unused; - - if (!buffer) { - errno = EINVAL; - return -1; - } - - assert(!topology->userdata_not_decoded); - - if (name && hwloc__xml_export_check_buffer(name, strlen(name)) < 0) { - errno = EINVAL; - return -1; - } - - encoded_length = BASE64_ENCODED_LENGTH(length); - encoded_buffer = malloc(encoded_length+1); - if (!encoded_buffer) { - errno = ENOMEM; - return -1; - } - - ret = hwloc_encode_to_base64(buffer, length, encoded_buffer, encoded_length+1); - assert(ret == (int) encoded_length); - - 
hwloc__export_obj_userdata(state, 1, name, length, encoded_buffer, encoded_length); - - free(encoded_buffer); - return 0; -} - -void -hwloc_topology_set_userdata_import_callback(hwloc_topology_t topology, - void (*import)(struct hwloc_topology *topology, struct hwloc_obj *obj, const char *name, const void *buffer, size_t length)) -{ - topology->userdata_import_cb = import; -} - -/*************************************** - ************ XML component ************ - ***************************************/ - -static void -hwloc_xml_backend_disable(struct hwloc_backend *backend) -{ - struct hwloc_xml_backend_data_s *data = backend->private_data; - data->backend_exit(data); - free(data->msgprefix); - free(data); -} - -static struct hwloc_backend * -hwloc_xml_component_instantiate(struct hwloc_disc_component *component, - const void *_data1, - const void *_data2, - const void *_data3) -{ - struct hwloc_xml_backend_data_s *data; - struct hwloc_backend *backend; - int force_nolibxml; - const char * xmlpath = (const char *) _data1; - const char * xmlbuffer = (const char *) _data2; - int xmlbuflen = (int)(uintptr_t) _data3; - const char *basename; - int err; - - if (!hwloc_libxml_callbacks && !hwloc_nolibxml_callbacks) { - errno = ENOSYS; - goto out; - } - - if (!xmlpath && !xmlbuffer) { - errno = EINVAL; - goto out; - } - - backend = hwloc_backend_alloc(component); - if (!backend) - goto out; - - data = malloc(sizeof(*data)); - if (!data) { - errno = ENOMEM; - goto out_with_backend; - } - - backend->private_data = data; - backend->discover = hwloc_look_xml; - backend->disable = hwloc_xml_backend_disable; - backend->is_thissystem = 0; - - if (xmlpath) { - basename = strrchr(xmlpath, '/'); - if (basename) - basename++; - else - basename = xmlpath; - } else { - basename = "xmlbuffer"; - } - data->msgprefix = strdup(basename); - - force_nolibxml = hwloc_nolibxml_import(); -retry: - if (!hwloc_libxml_callbacks || (hwloc_nolibxml_callbacks && force_nolibxml)) - err = 
hwloc_nolibxml_callbacks->backend_init(data, xmlpath, xmlbuffer, xmlbuflen); - else { - err = hwloc_libxml_callbacks->backend_init(data, xmlpath, xmlbuffer, xmlbuflen); - if (err < 0 && errno == ENOSYS) { - hwloc_libxml_callbacks = NULL; - goto retry; - } - } - if (err < 0) - goto out_with_data; - - return backend; - - out_with_data: - free(data->msgprefix); - free(data); - out_with_backend: - free(backend); - out: - return NULL; -} - -static struct hwloc_disc_component hwloc_xml_disc_component = { - HWLOC_DISC_COMPONENT_TYPE_GLOBAL, - "xml", - ~0, - hwloc_xml_component_instantiate, - 30, - NULL -}; - -const struct hwloc_component hwloc_xml_component = { - HWLOC_COMPONENT_ABI, - NULL, NULL, - HWLOC_COMPONENT_TYPE_DISC, - 0, - &hwloc_xml_disc_component -}; diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/topology.c b/opal/mca/hwloc/hwloc1113/hwloc/src/topology.c deleted file mode 100644 index f11beaeb400..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/topology.c +++ /dev/null @@ -1,3294 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2012 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include - -#define _ATFILE_SOURCE -#include -#include -#ifdef HAVE_DIRENT_H -#include -#endif -#ifdef HAVE_UNISTD_H -#include -#endif -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -#ifdef HAVE_MACH_MACH_INIT_H -#include -#endif -#ifdef HAVE_MACH_MACH_HOST_H -#include -#endif - -#ifdef HAVE_SYS_PARAM_H -#include -#endif - -#ifdef HAVE_SYS_SYSCTL_H -#include -#endif - -#ifdef HWLOC_WIN_SYS -#include -#endif - -unsigned hwloc_get_api_version(void) -{ - return HWLOC_API_VERSION; -} - -int hwloc_hide_errors(void) -{ - static int hide = 0; - static int checked = 0; - if (!checked) { - const char *envvar = getenv("HWLOC_HIDE_ERRORS"); - if (envvar) - hide = atoi(envvar); - checked = 1; - } - return hide; -} - -void hwloc_report_os_error(const char *msg, int line) -{ - static int reported = 0; - - if (!reported && !hwloc_hide_errors()) { - fprintf(stderr, "****************************************************************************\n"); - fprintf(stderr, "* hwloc %s has encountered what looks like an error from the operating system.\n", HWLOC_VERSION); - fprintf(stderr, "*\n"); - fprintf(stderr, "* %s\n", msg); - fprintf(stderr, "* Error occurred in topology.c line %d\n", line); - fprintf(stderr, "*\n"); - fprintf(stderr, "* The following FAQ entry in the hwloc documentation may help:\n"); - fprintf(stderr, "* What should I do when hwloc reports \"operating system\" warnings?\n"); - fprintf(stderr, "* Otherwise please report this error message to the hwloc user's mailing list,\n"); -#ifdef HWLOC_LINUX_SYS - fprintf(stderr, "* along with the output+tarball generated by the hwloc-gather-topology script.\n"); -#else - fprintf(stderr, "* along with any relevant topology information from your platform.\n"); -#endif - fprintf(stderr, "****************************************************************************\n"); - reported = 1; - } -} - -#if defined(HAVE_SYSCTLBYNAME) -int 
hwloc_get_sysctlbyname(const char *name, int64_t *ret) -{ - union { - int32_t i32; - int64_t i64; - } n; - size_t size = sizeof(n); - if (sysctlbyname(name, &n, &size, NULL, 0)) - return -1; - switch (size) { - case sizeof(n.i32): - *ret = n.i32; - break; - case sizeof(n.i64): - *ret = n.i64; - break; - default: - return -1; - } - return 0; -} -#endif - -#if defined(HAVE_SYSCTL) -int hwloc_get_sysctl(int name[], unsigned namelen, int *ret) -{ - int n; - size_t size = sizeof(n); - if (sysctl(name, namelen, &n, &size, NULL, 0)) - return -1; - if (size != sizeof(n)) - return -1; - *ret = n; - return 0; -} -#endif - -/* Return the OS-provided number of processors. Unlike other methods such as - reading sysfs on Linux, this method is not virtualizable; thus it's only - used as a fall-back method, allowing `hwloc_set_fsroot ()' to - have the desired effect. */ -#ifndef HWLOC_WIN_SYS /* The windows implementation is in topology-windows.c */ -unsigned -hwloc_fallback_nbprocessors(struct hwloc_topology *topology) { - int n; -#if HAVE_DECL__SC_NPROCESSORS_ONLN - n = sysconf(_SC_NPROCESSORS_ONLN); -#elif HAVE_DECL__SC_NPROC_ONLN - n = sysconf(_SC_NPROC_ONLN); -#elif HAVE_DECL__SC_NPROCESSORS_CONF - n = sysconf(_SC_NPROCESSORS_CONF); -#elif HAVE_DECL__SC_NPROC_CONF - n = sysconf(_SC_NPROC_CONF); -#elif defined(HAVE_HOST_INFO) && HAVE_HOST_INFO - struct host_basic_info info; - mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT; - host_info(mach_host_self(), HOST_BASIC_INFO, (integer_t*) &info, &count); - n = info.avail_cpus; -#elif defined(HAVE_SYSCTLBYNAME) - int64_t nn; - if (hwloc_get_sysctlbyname("hw.ncpu", &nn)) - nn = -1; - n = nn; -#elif defined(HAVE_SYSCTL) && HAVE_DECL_CTL_HW && HAVE_DECL_HW_NCPU - static int name[2] = {CTL_HW, HW_NPCU}; - if (hwloc_get_sysctl(name, sizeof(name)/sizeof(*name)), &n) - n = -1; -#else -#ifdef __GNUC__ -#warning No known way to discover number of available processors on this system -#warning hwloc_fallback_nbprocessors will default to 1 
-#endif - n = -1; -#endif - if (n >= 1) - topology->support.discovery->pu = 1; - else - n = 1; - return n; -} -#endif /* !HWLOC_WIN_SYS */ - -/* - * Use the given number of processors and the optional online cpuset if given - * to set a PU level. - */ -void -hwloc_setup_pu_level(struct hwloc_topology *topology, - unsigned nb_pus) -{ - struct hwloc_obj *obj; - unsigned oscpu,cpu; - - hwloc_debug("%s", "\n\n * CPU cpusets *\n\n"); - for (cpu=0,oscpu=0; cpucpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_only(obj->cpuset, oscpu); - - hwloc_debug_2args_bitmap("cpu %u (os %u) has cpuset %s\n", - cpu, oscpu, obj->cpuset); - hwloc_insert_object_by_cpuset(topology, obj); - - cpu++; - } -} - -#ifdef HWLOC_DEBUG -/* Just for debugging. */ -static void -hwloc_debug_print_object(int indent __hwloc_attribute_unused, hwloc_obj_t obj) -{ - char type[64], idx[10], attr[1024], *cpuset = NULL; - hwloc_debug("%*s", 2*indent, ""); - hwloc_obj_type_snprintf(type, sizeof(type), obj, 1); - if (obj->os_index != (unsigned) -1) - snprintf(idx, sizeof(idx), "#%u", obj->os_index); - else - *idx = '\0'; - hwloc_obj_attr_snprintf(attr, sizeof(attr), obj, " ", 1); - hwloc_debug("%s%s%s%s%s", type, idx, *attr ? "(" : "", attr, *attr ? 
")" : ""); - if (obj->name) - hwloc_debug(" name %s", obj->name); - if (obj->cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->cpuset); - hwloc_debug(" cpuset %s", cpuset); - free(cpuset); - } - if (obj->complete_cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->complete_cpuset); - hwloc_debug(" complete %s", cpuset); - free(cpuset); - } - if (obj->online_cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->online_cpuset); - hwloc_debug(" online %s", cpuset); - free(cpuset); - } - if (obj->allowed_cpuset) { - hwloc_bitmap_asprintf(&cpuset, obj->allowed_cpuset); - hwloc_debug(" allowed %s", cpuset); - free(cpuset); - } - if (obj->nodeset) { - hwloc_bitmap_asprintf(&cpuset, obj->nodeset); - hwloc_debug(" nodeset %s", cpuset); - free(cpuset); - } - if (obj->complete_nodeset) { - hwloc_bitmap_asprintf(&cpuset, obj->complete_nodeset); - hwloc_debug(" completeN %s", cpuset); - free(cpuset); - } - if (obj->allowed_nodeset) { - hwloc_bitmap_asprintf(&cpuset, obj->allowed_nodeset); - hwloc_debug(" allowedN %s", cpuset); - free(cpuset); - } - if (obj->arity) - hwloc_debug(" arity %u", obj->arity); - hwloc_debug("%s", "\n"); -} - -static void -hwloc_debug_print_objects(int indent __hwloc_attribute_unused, hwloc_obj_t obj) -{ - hwloc_debug_print_object(indent, obj); - for (obj = obj->first_child; obj; obj = obj->next_sibling) - hwloc_debug_print_objects(indent + 1, obj); -} -#else /* !HWLOC_DEBUG */ -#define hwloc_debug_print_object(indent, obj) do { /* nothing */ } while (0) -#define hwloc_debug_print_objects(indent, obj) do { /* nothing */ } while (0) -#endif /* !HWLOC_DEBUG */ - -void hwloc__free_infos(struct hwloc_obj_info_s *infos, unsigned count) -{ - unsigned i; - for(i=0; iinfos, &obj->infos_count, name, value); -} - -void hwloc_obj_add_info_nodup(hwloc_obj_t obj, const char *name, const char *value, int nodup) -{ - if (nodup && hwloc_obj_get_info_by_name(obj, name)) - return; - hwloc__add_info(&obj->infos, &obj->infos_count, name, value); -} - -/* Traverse children of a parent 
in a safe way: reread the next pointer as - * appropriate to prevent crash on child deletion: */ -#define for_each_child_safe(child, parent, pchild) \ - for (pchild = &(parent)->first_child, child = *pchild; \ - child; \ - /* Check whether the current child was not dropped. */ \ - (*pchild == child ? pchild = &(child->next_sibling) : NULL), \ - /* Get pointer to next childect. */ \ - child = *pchild) - -static void -hwloc__free_object_contents(hwloc_obj_t obj) -{ - switch (obj->type) { - default: - break; - } - hwloc__free_infos(obj->infos, obj->infos_count); - hwloc_clear_object_distances(obj); - free(obj->memory.page_types); - free(obj->attr); - free(obj->children); - free(obj->name); - hwloc_bitmap_free(obj->cpuset); - hwloc_bitmap_free(obj->complete_cpuset); - hwloc_bitmap_free(obj->online_cpuset); - hwloc_bitmap_free(obj->allowed_cpuset); - hwloc_bitmap_free(obj->nodeset); - hwloc_bitmap_free(obj->complete_nodeset); - hwloc_bitmap_free(obj->allowed_nodeset); -} - -/* Free an object and all its content. */ -void -hwloc_free_unlinked_object(hwloc_obj_t obj) -{ - hwloc__free_object_contents(obj); - free(obj); -} - -/* Replace old with contents of new object, and make new freeable by the caller. - * Only updates next_sibling/first_child pointers, - * so may only be used during early discovery. 
- */ -static void -hwloc_replace_linked_object(hwloc_obj_t old, hwloc_obj_t new) -{ - /* drop old fields */ - hwloc__free_object_contents(old); - /* copy old tree pointers to new */ - new->next_sibling = old->next_sibling; - new->first_child = old->first_child; - /* copy new contents to old now that tree pointers are OK */ - memcpy(old, new, sizeof(*old)); - /* clear new to that we may free it */ - memset(new, 0,sizeof(*new)); -} - -/* insert the (non-empty) list of sibling starting at firstnew as new children of newparent, - * and return the address of the pointer to the next one - */ -static hwloc_obj_t * -insert_siblings_list(hwloc_obj_t *firstp, hwloc_obj_t firstnew, hwloc_obj_t newparent) -{ - hwloc_obj_t tmp; - assert(firstnew); - *firstp = tmp = firstnew; - tmp->parent = newparent; - while (tmp->next_sibling) { - tmp = tmp->next_sibling; - } - return &tmp->next_sibling; -} - -/* Remove an object from its parent and free it. - * Only updates next_sibling/first_child pointers, - * so may only be used during early discovery. - * Children are inserted where the object was. - */ -static void -unlink_and_free_single_object(hwloc_obj_t *pparent) -{ - hwloc_obj_t old = *pparent; - hwloc_obj_t *lastp; - - if (old->first_child) - /* insert old object children as new siblings below parent instead of old */ - lastp = insert_siblings_list(pparent, old->first_child, old->parent); - else - lastp = pparent; - /* append old siblings back */ - *lastp = old->next_sibling; - - hwloc_free_unlinked_object(old); -} - -/* Remove an object and its children from its parent and free them. - * Only updates next_sibling/first_child pointers, - * so may only be used during early discovery. 
- */ -static void -unlink_and_free_object_and_children(hwloc_obj_t *pobj) -{ - hwloc_obj_t obj = *pobj, child, *pchild; - - for_each_child_safe(child, obj, pchild) - unlink_and_free_object_and_children(pchild); - - *pobj = obj->next_sibling; - hwloc_free_unlinked_object(obj); -} - -static void -hwloc__duplicate_object(struct hwloc_obj *newobj, - struct hwloc_obj *src) -{ - size_t len; - unsigned i; - - newobj->type = src->type; - newobj->os_index = src->os_index; - - if (src->name) - newobj->name = strdup(src->name); - newobj->userdata = src->userdata; - - memcpy(&newobj->memory, &src->memory, sizeof(struct hwloc_obj_memory_s)); - if (src->memory.page_types_len) { - len = src->memory.page_types_len * sizeof(struct hwloc_obj_memory_page_type_s); - newobj->memory.page_types = malloc(len); - memcpy(newobj->memory.page_types, src->memory.page_types, len); - } - - memcpy(newobj->attr, src->attr, sizeof(*newobj->attr)); - - newobj->cpuset = hwloc_bitmap_dup(src->cpuset); - newobj->complete_cpuset = hwloc_bitmap_dup(src->complete_cpuset); - newobj->allowed_cpuset = hwloc_bitmap_dup(src->allowed_cpuset); - newobj->online_cpuset = hwloc_bitmap_dup(src->online_cpuset); - newobj->nodeset = hwloc_bitmap_dup(src->nodeset); - newobj->complete_nodeset = hwloc_bitmap_dup(src->complete_nodeset); - newobj->allowed_nodeset = hwloc_bitmap_dup(src->allowed_nodeset); - - /* don't duplicate distances, they'll be recreated at the end of the topology build */ - - for(i=0; i<src->infos_count; i++) - hwloc__add_info(&newobj->infos, &newobj->infos_count, src->infos[i].name, src->infos[i].value); -} - -void -hwloc__duplicate_objects(struct hwloc_topology *newtopology, - struct hwloc_obj *newparent, - struct hwloc_obj *src) -{ - hwloc_obj_t newobj; - hwloc_obj_t child; - - newobj = hwloc_alloc_setup_object(src->type, src->os_index); - hwloc__duplicate_object(newobj, src); - - child = NULL; - while ((child = hwloc_get_next_child(newtopology, src, child)) != NULL) - 
hwloc__duplicate_objects(newtopology, newobj, child); - - /* no need to check the children order here, the source topology - * is supposed to be OK already, and we have debug asserts. - */ - hwloc_insert_object_by_parent(newtopology, newparent, newobj); -} - -int -hwloc_topology_dup(hwloc_topology_t *newp, - hwloc_topology_t old) -{ - hwloc_topology_t new; - hwloc_obj_t newroot; - hwloc_obj_t oldroot = hwloc_get_root_obj(old); - unsigned i; - - if (!old->is_loaded) { - errno = -EINVAL; - return -1; - } - - hwloc_topology_init(&new); - - new->flags = old->flags; - memcpy(new->ignored_types, old->ignored_types, sizeof(old->ignored_types)); - new->is_thissystem = old->is_thissystem; - new->is_loaded = 1; - new->pid = old->pid; - - memcpy(&new->binding_hooks, &old->binding_hooks, sizeof(old->binding_hooks)); - - memcpy(new->support.discovery, old->support.discovery, sizeof(*old->support.discovery)); - memcpy(new->support.cpubind, old->support.cpubind, sizeof(*old->support.cpubind)); - memcpy(new->support.membind, old->support.membind, sizeof(*old->support.membind)); - - new->userdata_export_cb = old->userdata_export_cb; - new->userdata_import_cb = old->userdata_import_cb; - new->userdata_not_decoded = old->userdata_not_decoded; - - newroot = hwloc_get_root_obj(new); - hwloc__duplicate_object(newroot, oldroot); - for(i=0; i<oldroot->arity; i++) - hwloc__duplicate_objects(new, newroot, oldroot->children[i]); - - if (old->first_osdist) { - struct hwloc_os_distances_s *olddist = old->first_osdist; - while (olddist) { - struct hwloc_os_distances_s *newdist = malloc(sizeof(*newdist)); - newdist->type = olddist->type; - newdist->nbobjs = olddist->nbobjs; - newdist->indexes = malloc(newdist->nbobjs * sizeof(*newdist->indexes)); - memcpy(newdist->indexes, olddist->indexes, newdist->nbobjs * sizeof(*newdist->indexes)); - newdist->objs = NULL; /* will be recomputed when needed */ - newdist->distances = malloc(newdist->nbobjs * newdist->nbobjs * sizeof(*newdist->distances)); - 
memcpy(newdist->distances, olddist->distances, newdist->nbobjs * newdist->nbobjs * sizeof(*newdist->distances)); - - newdist->forced = olddist->forced; - if (new->first_osdist) { - new->last_osdist->next = newdist; - newdist->prev = new->last_osdist; - } else { - new->first_osdist = newdist; - newdist->prev = NULL; - } - new->last_osdist = newdist; - newdist->next = NULL; - - olddist = olddist->next; - } - } else - new->first_osdist = old->last_osdist = NULL; - - /* no need to duplicate backends, topology is already loaded */ - new->backends = NULL; - - hwloc_connect_children(new->levels[0][0]); - if (hwloc_connect_levels(new) < 0) - goto out; - - hwloc_distances_finalize_os(new); - hwloc_distances_finalize_logical(new); - -#ifndef HWLOC_DEBUG - if (getenv("HWLOC_DEBUG_CHECK")) -#endif - hwloc_topology_check(new); - - *newp = new; - return 0; - - out: - hwloc_topology_destroy(new); - return -1; -} - -/* WARNING: The indexes of this array MUST match the ordering that of - the obj_order_type[] array, below. Specifically, the values must - be laid out such that: - - obj_order_type[obj_type_order[N]] = N - - for all HWLOC_OBJ_* values of N. Put differently: - - obj_type_order[A] = B - - where the A values are in order of the hwloc_obj_type_t enum, and - the B values are the corresponding indexes of obj_order_type. - - We can't use C99 syntax to initialize this in a little safer manner - -- bummer. :-( - - ************************************************************* - *** DO NOT CHANGE THE ORDERING OF THIS ARRAY WITHOUT TRIPLE - *** CHECKING ITS CORRECTNESS! 
- ************************************************************* - */ -static const unsigned obj_type_order[] = { - /* first entry is HWLOC_OBJ_SYSTEM */ 0, - /* next entry is HWLOC_OBJ_MACHINE */ 1, - /* next entry is HWLOC_OBJ_NUMANODE */ 3, - /* next entry is HWLOC_OBJ_PACKAGE */ 4, - /* next entry is HWLOC_OBJ_CACHE */ 5, - /* next entry is HWLOC_OBJ_CORE */ 6, - /* next entry is HWLOC_OBJ_PU */ 10, - /* next entry is HWLOC_OBJ_GROUP */ 2, - /* next entry is HWLOC_OBJ_MISC */ 11, - /* next entry is HWLOC_OBJ_BRIDGE */ 7, - /* next entry is HWLOC_OBJ_PCI_DEVICE */ 8, - /* next entry is HWLOC_OBJ_OS_DEVICE */ 9 -}; - -static const hwloc_obj_type_t obj_order_type[] = { - HWLOC_OBJ_SYSTEM, - HWLOC_OBJ_MACHINE, - HWLOC_OBJ_GROUP, - HWLOC_OBJ_NUMANODE, - HWLOC_OBJ_PACKAGE, - HWLOC_OBJ_CACHE, - HWLOC_OBJ_CORE, - HWLOC_OBJ_BRIDGE, - HWLOC_OBJ_PCI_DEVICE, - HWLOC_OBJ_OS_DEVICE, - HWLOC_OBJ_PU, - HWLOC_OBJ_MISC, -}; - -/* priority to be used when merging identical parent/children object - * (in merge_useless_child), keep the highest priority one. - * - * Always keep Machine/PU/PCIDev/OSDev - * then System/Node - * then Core - * then Package - * then Cache - * then always drop Group/Misc/Bridge. - * - * Some type won't actually ever be involved in such merging. 
- */ -static const int obj_type_priority[] = { - /* first entry is HWLOC_OBJ_SYSTEM */ 80, - /* next entry is HWLOC_OBJ_MACHINE */ 100, - /* next entry is HWLOC_OBJ_NUMANODE */ 80, - /* next entry is HWLOC_OBJ_PACKAGE */ 40, - /* next entry is HWLOC_OBJ_CACHE */ 20, - /* next entry is HWLOC_OBJ_CORE */ 60, - /* next entry is HWLOC_OBJ_PU */ 100, - /* next entry is HWLOC_OBJ_GROUP */ 0, - /* next entry is HWLOC_OBJ_MISC */ 0, - /* next entry is HWLOC_OBJ_BRIDGE */ 0, - /* next entry is HWLOC_OBJ_PCI_DEVICE */ 100, - /* next entry is HWLOC_OBJ_OS_DEVICE */ 100 -}; - -static unsigned __hwloc_attribute_const -hwloc_get_type_order(hwloc_obj_type_t type) -{ - return obj_type_order[type]; -} - -#if !defined(NDEBUG) -static hwloc_obj_type_t hwloc_get_order_type(int order) -{ - return obj_order_type[order]; -} -#endif - -static int hwloc_obj_type_is_io (hwloc_obj_type_t type) -{ - return type == HWLOC_OBJ_BRIDGE || type == HWLOC_OBJ_PCI_DEVICE || type == HWLOC_OBJ_OS_DEVICE; -} - -int hwloc_compare_types (hwloc_obj_type_t type1, hwloc_obj_type_t type2) -{ - unsigned order1 = hwloc_get_type_order(type1); - unsigned order2 = hwloc_get_type_order(type2); - - /* bridge and devices are only comparable with each others and with machine and system */ - if (hwloc_obj_type_is_io(type1) - && !hwloc_obj_type_is_io(type2) && type2 != HWLOC_OBJ_SYSTEM && type2 != HWLOC_OBJ_MACHINE) - return HWLOC_TYPE_UNORDERED; - if (hwloc_obj_type_is_io(type2) - && !hwloc_obj_type_is_io(type1) && type1 != HWLOC_OBJ_SYSTEM && type1 != HWLOC_OBJ_MACHINE) - return HWLOC_TYPE_UNORDERED; - - return order1 - order2; -} - -enum hwloc_obj_cmp_e { - HWLOC_OBJ_EQUAL = HWLOC_BITMAP_EQUAL, /**< \brief Equal */ - HWLOC_OBJ_INCLUDED = HWLOC_BITMAP_INCLUDED, /**< \brief Strictly included into */ - HWLOC_OBJ_CONTAINS = HWLOC_BITMAP_CONTAINS, /**< \brief Strictly contains */ - HWLOC_OBJ_INTERSECTS = HWLOC_BITMAP_INTERSECTS, /**< \brief Intersects, but no inclusion! 
*/ - HWLOC_OBJ_DIFFERENT = HWLOC_BITMAP_DIFFERENT /**< \brief No intersection */ -}; - -static enum hwloc_obj_cmp_e -hwloc_type_cmp(hwloc_obj_t obj1, hwloc_obj_t obj2) -{ - hwloc_obj_type_t type1 = obj1->type; - hwloc_obj_type_t type2 = obj2->type; - int compare; - - compare = hwloc_compare_types(type1, type2); - if (compare == HWLOC_TYPE_UNORDERED) - return HWLOC_OBJ_DIFFERENT; /* we cannot do better */ - if (compare > 0) - return HWLOC_OBJ_INCLUDED; - if (compare < 0) - return HWLOC_OBJ_CONTAINS; - - /* Caches have the same types but can have different depths. */ - if (type1 == HWLOC_OBJ_CACHE) { - if (obj1->attr->cache.depth < obj2->attr->cache.depth) - return HWLOC_OBJ_INCLUDED; - else if (obj1->attr->cache.depth > obj2->attr->cache.depth) - return HWLOC_OBJ_CONTAINS; - else if (obj1->attr->cache.type > obj2->attr->cache.type) - /* consider icache deeper than dcache and dcache deeper than unified */ - return HWLOC_OBJ_INCLUDED; - else if (obj1->attr->cache.type < obj2->attr->cache.type) - /* consider icache deeper than dcache and dcache deeper than unified */ - return HWLOC_OBJ_CONTAINS; - } - - /* Group objects have the same types but can have different depths. */ - if (type1 == HWLOC_OBJ_GROUP) { - if (obj1->attr->group.depth == (unsigned) -1 - || obj2->attr->group.depth == (unsigned) -1) - return HWLOC_OBJ_EQUAL; - if (obj1->attr->group.depth < obj2->attr->group.depth) - return HWLOC_OBJ_INCLUDED; - else if (obj1->attr->group.depth > obj2->attr->group.depth) - return HWLOC_OBJ_CONTAINS; - } - - /* Bridges objects have the same types but can have different depths. */ - if (type1 == HWLOC_OBJ_BRIDGE) { - if (obj1->attr->bridge.depth < obj2->attr->bridge.depth) - return HWLOC_OBJ_INCLUDED; - else if (obj1->attr->bridge.depth > obj2->attr->bridge.depth) - return HWLOC_OBJ_CONTAINS; - } - - return HWLOC_OBJ_EQUAL; -} - -/* - * How to compare objects based on cpusets. 
- */ - -static int -hwloc_obj_cmp_sets(hwloc_obj_t obj1, hwloc_obj_t obj2) -{ - hwloc_bitmap_t set1, set2; - int res = HWLOC_OBJ_DIFFERENT; - - /* compare cpusets first */ - if (obj1->complete_cpuset && obj2->complete_cpuset) { - set1 = obj1->complete_cpuset; - set2 = obj2->complete_cpuset; - } else { - set1 = obj1->cpuset; - set2 = obj2->cpuset; - } - if (set1 && set2 && !hwloc_bitmap_iszero(set1) && !hwloc_bitmap_iszero(set2)) { - res = hwloc_bitmap_compare_inclusion(set1, set2); - if (res == HWLOC_OBJ_INTERSECTS) - return HWLOC_OBJ_INTERSECTS; - } - - /* then compare nodesets, and combine the results */ - if (obj1->complete_nodeset && obj2->complete_nodeset) { - set1 = obj1->complete_nodeset; - set2 = obj2->complete_nodeset; - } else { - set1 = obj1->nodeset; - set2 = obj2->nodeset; - } - if (set1 && set2 && !hwloc_bitmap_iszero(set1) && !hwloc_bitmap_iszero(set2)) { - int noderes = hwloc_bitmap_compare_inclusion(set1, set2); - /* deal with conflicting cpusets/nodesets inclusions */ - if (noderes == HWLOC_OBJ_INCLUDED) { - if (res == HWLOC_OBJ_CONTAINS) - /* contradicting order for cpusets and nodesets */ - return HWLOC_OBJ_INTERSECTS; - res = HWLOC_OBJ_INCLUDED; - - } else if (noderes == HWLOC_OBJ_CONTAINS) { - if (res == HWLOC_OBJ_INCLUDED) - /* contradicting order for cpusets and nodesets */ - return HWLOC_OBJ_INTERSECTS; - res = HWLOC_OBJ_CONTAINS; - - } else if (noderes == HWLOC_OBJ_INTERSECTS) { - return HWLOC_OBJ_INTERSECTS; - - } else { - /* nodesets are different, keep the cpuset order */ - /* FIXME: with upcoming multiple levels of NUMA, we may have to report INCLUDED or CONTAINED here */ - - } - } - - return res; -} - -/* Compare object cpusets based on complete_cpuset if defined (always correctly ordered), - * or fallback to the main cpusets (only correctly ordered during early insert before disallowed/offline bits are cleared). - * - * This is the sane way to compare object among a horizontal level. 
- */ -int -hwloc__object_cpusets_compare_first(hwloc_obj_t obj1, hwloc_obj_t obj2) -{ - if (obj1->complete_cpuset && obj2->complete_cpuset) - return hwloc_bitmap_compare_first(obj1->complete_cpuset, obj2->complete_cpuset); - else - return hwloc_bitmap_compare_first(obj1->cpuset, obj2->cpuset); -} - -/* format the obj info to print in error messages */ -static void -hwloc__report_error_format_obj(char *buf, size_t buflen, hwloc_obj_t obj) -{ - char typestr[64]; - char *cpusetstr; - hwloc_obj_type_snprintf(typestr, sizeof(typestr), obj, 0); - hwloc_bitmap_asprintf(&cpusetstr, obj->cpuset); - if (obj->os_index != (unsigned) -1) - snprintf(buf, buflen, "%s (P#%u cpuset %s)", - typestr, obj->os_index, cpusetstr); - else - snprintf(buf, buflen, "%s (cpuset %s)", - typestr, cpusetstr); - free(cpusetstr); -} - -/* - * How to insert objects into the topology. - * - * Note: during detection, only the first_child and next_sibling pointers are - * kept up to date. Others are computed only once topology detection is - * complete. 
- */ - -#define merge_index(new, old, field, type) \ - if ((old)->field == (type) -1) \ - (old)->field = (new)->field; -#define merge_sizes(new, old, field) \ - if (!(old)->field) \ - (old)->field = (new)->field; -#ifdef HWLOC_DEBUG -#define check_sizes(new, old, field) \ - if ((new)->field) \ - assert((old)->field == (new)->field) -#else -#define check_sizes(new, old, field) -#endif - -static void -merge_insert_equal(hwloc_obj_t new, hwloc_obj_t old) -{ - merge_index(new, old, os_index, unsigned); - - if (new->distances_count) { - if (old->distances_count) { - old->distances_count += new->distances_count; - old->distances = realloc(old->distances, old->distances_count * sizeof(*old->distances)); - memcpy(old->distances + new->distances_count, new->distances, new->distances_count * sizeof(*old->distances)); - free(new->distances); - } else { - old->distances_count = new->distances_count; - old->distances = new->distances; - } - new->distances_count = 0; - new->distances = NULL; - } - - if (new->infos_count) { - hwloc__move_infos(&old->infos, &old->infos_count, - &new->infos, &new->infos_count); - } - - if (new->name && !old->name) { - old->name = new->name; - new->name = NULL; - } - - assert(!new->userdata); /* user could not set userdata here (we're before load() */ - - switch(new->type) { - case HWLOC_OBJ_NUMANODE: - if (new->memory.local_memory && !old->memory.local_memory) { - /* no memory in old, use new memory */ - old->memory.local_memory = new->memory.local_memory; - if (old->memory.page_types) - free(old->memory.page_types); - old->memory.page_types_len = new->memory.page_types_len; - old->memory.page_types = new->memory.page_types; - new->memory.page_types = NULL; - new->memory.page_types_len = 0; - } - /* old->memory.total_memory will be updated by propagate_total_memory() */ - break; - case HWLOC_OBJ_CACHE: - merge_sizes(new, old, attr->cache.size); - check_sizes(new, old, attr->cache.size); - merge_sizes(new, old, attr->cache.linesize); - 
check_sizes(new, old, attr->cache.linesize); - break; - default: - break; - } -} - -/* Try to insert OBJ in CUR, recurse if needed. - * Returns the object if it was inserted, - * the remaining object it was merged, - * NULL if failed to insert. - */ -static struct hwloc_obj * -hwloc___insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t cur, hwloc_obj_t obj, - hwloc_report_error_t report_error) -{ - hwloc_obj_t child, next_child = NULL; - /* These will always point to the pointer to their next last child. */ - hwloc_obj_t *cur_children = &cur->first_child; - hwloc_obj_t *obj_children = &obj->first_child; - /* Pointer where OBJ should be put */ - hwloc_obj_t *putp = NULL; /* OBJ position isn't found yet */ - - /* Make sure we haven't gone too deep. */ - if (!hwloc_bitmap_isincluded(obj->cpuset, cur->cpuset)) { - fprintf(stderr,"recursion has gone too deep?!\n"); - return NULL; - } - - /* Iteration with prefetching to be completely safe against CHILD removal. - * The list is already sorted by cpuset, and there's no intersection between siblings. - */ - for (child = cur->first_child, child ? next_child = child->next_sibling : NULL; - child; - child = next_child, child ? next_child = child->next_sibling : NULL) { - - int res = hwloc_obj_cmp_sets(obj, child); - - if (res == HWLOC_OBJ_EQUAL) { - if (obj->type == HWLOC_OBJ_GROUP) { - /* Groups are ignored keep_structure or always. Non-ignored Groups isn't possible. */ - assert(topology->ignored_types[HWLOC_OBJ_GROUP] != HWLOC_IGNORE_TYPE_NEVER); - /* Remove the Group now. The normal ignore code path wouldn't tell us whether the Group was removed or not. - * - * The Group doesn't contain anything to keep, just let the caller free it. 
- */ - return child; - - } else if (child->type == HWLOC_OBJ_GROUP) { - - /* Replace the Group with the new object contents - * and let the caller free the new object - */ - hwloc_replace_linked_object(child, obj); - return child; - - } else { - /* otherwise compare actual types to decide of the inclusion */ - res = hwloc_type_cmp(obj, child); - if (res == HWLOC_OBJ_EQUAL && obj->type == HWLOC_OBJ_MISC) { - /* Misc objects may vary by name */ - int ret = strcmp(obj->name, child->name); - if (ret < 0) - res = HWLOC_OBJ_INCLUDED; - else if (ret > 0) - res = HWLOC_OBJ_CONTAINS; - } - } - } - - switch (res) { - case HWLOC_OBJ_EQUAL: - merge_index(obj, child, os_level, signed); - if (obj->os_level != child->os_level) { - static int reported = 0; - if (!reported && !hwloc_hide_errors()) { - fprintf(stderr, "Cannot merge similar %s objects with different OS levels %u and %u\n", - hwloc_obj_type_string(obj->type), child->os_level, obj->os_level); - reported = 1; - } - return NULL; - } - /* Two objects with same type. - * Groups are handled above. - */ - if (obj->type == child->type - && (obj->type == HWLOC_OBJ_PU || obj->type == HWLOC_OBJ_NUMANODE) - && obj->os_index != child->os_index) { - static int reported = 0; - if (!reported && !hwloc_hide_errors()) { - fprintf(stderr, "Cannot merge similar %s objects with different OS indexes %u and %u\n", - hwloc_obj_type_string(obj->type), child->os_index, obj->os_index); - reported = 1; - } - return NULL; - } - merge_insert_equal(obj, child); - /* Already present, no need to insert. */ - return child; - - case HWLOC_OBJ_INCLUDED: - /* OBJ is strictly contained is some child of CUR, go deeper. 
*/ - return hwloc___insert_object_by_cpuset(topology, child, obj, report_error); - - case HWLOC_OBJ_INTERSECTS: - if (report_error) { - char childstr[512]; - char objstr[512]; - char msg[1024]; - hwloc__report_error_format_obj(objstr, sizeof(objstr), obj); - hwloc__report_error_format_obj(childstr, sizeof(childstr), child); - snprintf(msg, sizeof(msg), "%s intersects with %s without inclusion!", objstr, childstr); - report_error(msg, __LINE__); - } - goto putback; - - case HWLOC_OBJ_DIFFERENT: - /* OBJ should be a child of CUR before CHILD, mark its position if not found yet. */ - if (!putp && (!child->cpuset || hwloc__object_cpusets_compare_first(obj, child) < 0)) - /* Don't insert yet, there could be intersect errors later */ - putp = cur_children; - /* Advance cur_children. */ - cur_children = &child->next_sibling; - break; - - case HWLOC_OBJ_CONTAINS: - /* OBJ contains CHILD, remove CHILD from CUR */ - *cur_children = child->next_sibling; - child->next_sibling = NULL; - /* Put CHILD in OBJ */ - *obj_children = child; - obj_children = &child->next_sibling; - break; - } - } - /* cur/obj_children points to last CUR/OBJ child next_sibling pointer, which must be NULL. */ - assert(!*obj_children); - assert(!*cur_children); - - /* Put OBJ where it belongs, or in last in CUR's children. */ - if (!putp) - putp = cur_children; - obj->next_sibling = *putp; - *putp = obj; - - return obj; - - putback: - /* Put-back OBJ children in CUR and return an error. */ - if (putp) - cur_children = putp; /* No need to try to insert before where OBJ was supposed to go */ - else - cur_children = &cur->first_child; /* Start from the beginning */ - /* We can insert in order, but there can be holes in the middle. */ - while ((child = obj->first_child) != NULL) { - /* Remove from OBJ */ - obj->first_child = child->next_sibling; - /* Find child position in CUR, and insert. 
*/ - while (*cur_children && (*cur_children)->cpuset && hwloc__object_cpusets_compare_first(*cur_children, child) < 0) - cur_children = &(*cur_children)->next_sibling; - child->next_sibling = *cur_children; - *cur_children = child; - } - return NULL; -} - -/* insertion routine that lets you change the error reporting callback */ -struct hwloc_obj * -hwloc__insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj, - hwloc_report_error_t report_error) -{ - struct hwloc_obj *result; - /* Start at the top. */ - result = hwloc___insert_object_by_cpuset(topology, topology->levels[0][0], obj, report_error); - if (result != obj) { - /* either failed to insert, or got merged, free the original object */ - hwloc_free_unlinked_object(obj); - } else { - /* Add the cpuset to the top */ - hwloc_bitmap_or(topology->levels[0][0]->complete_cpuset, topology->levels[0][0]->complete_cpuset, obj->cpuset); - if (obj->nodeset) - hwloc_bitmap_or(topology->levels[0][0]->complete_nodeset, topology->levels[0][0]->complete_nodeset, obj->nodeset); - } - return result; -} - -/* the default insertion routine warns in case of error. - * it's used by most backends */ -struct hwloc_obj * -hwloc_insert_object_by_cpuset(struct hwloc_topology *topology, hwloc_obj_t obj) -{ - return hwloc__insert_object_by_cpuset(topology, obj, hwloc_report_os_error); -} - -void -hwloc_insert_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, hwloc_obj_t obj) -{ - hwloc_obj_t child, next_child = obj->first_child; - hwloc_obj_t *current; - - /* Append to the end of the list. - * The caller takes care of inserting children in the right cpuset order. - * XML checks the order. - * Duplicating doesn't need to check the order since the source topology is supposed to be OK already. - * Other callers just insert random objects such as I/O or Misc. 
- */ - for (current = &parent->first_child; *current; current = &(*current)->next_sibling); - *current = obj; - obj->next_sibling = NULL; - obj->first_child = NULL; - - /* Use the new object to insert children */ - parent = obj; - - /* Recursively insert children below */ - while (next_child) { - child = next_child; - next_child = child->next_sibling; - hwloc_insert_object_by_parent(topology, parent, child); - } - - if (obj->type == HWLOC_OBJ_MISC) { - /* misc objects go in no level (needed here because level building doesn't see Misc objects inside I/O trees) */ - obj->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; - } -} - -/* Adds a misc object _after_ detection, and thus has to reconnect all the pointers */ -hwloc_obj_t -hwloc_topology_insert_misc_object_by_cpuset(struct hwloc_topology *topology, hwloc_const_bitmap_t cpuset, const char *name) -{ - hwloc_obj_t obj, child; - - if (!topology->is_loaded) { - errno = EINVAL; - return NULL; - } - - if (hwloc_bitmap_iszero(cpuset)) - return NULL; - if (!hwloc_bitmap_isincluded(cpuset, hwloc_topology_get_topology_cpuset(topology))) - return NULL; - - obj = hwloc_alloc_setup_object(HWLOC_OBJ_MISC, -1); - if (name) - obj->name = strdup(name); - - /* misc objects go in no level */ - obj->depth = (unsigned) HWLOC_TYPE_DEPTH_UNKNOWN; - - obj->cpuset = hwloc_bitmap_dup(cpuset); - /* initialize default cpusets, we'll adjust them later */ - obj->complete_cpuset = hwloc_bitmap_dup(cpuset); - obj->allowed_cpuset = hwloc_bitmap_dup(cpuset); - obj->online_cpuset = hwloc_bitmap_dup(cpuset); - - obj = hwloc__insert_object_by_cpuset(topology, obj, NULL /* do not show errors on stdout */); - if (!obj) - return NULL; - - hwloc_connect_children(topology->levels[0][0]); - - if ((child = obj->first_child) != NULL && child->cpuset) { - /* keep the main cpuset untouched, but update other cpusets and nodesets from children */ - obj->nodeset = hwloc_bitmap_alloc(); - obj->complete_nodeset = hwloc_bitmap_alloc(); - obj->allowed_nodeset = 
hwloc_bitmap_alloc(); - while (child) { - if (child->complete_cpuset) - hwloc_bitmap_or(obj->complete_cpuset, obj->complete_cpuset, child->complete_cpuset); - if (child->allowed_cpuset) - hwloc_bitmap_or(obj->allowed_cpuset, obj->allowed_cpuset, child->allowed_cpuset); - if (child->online_cpuset) - hwloc_bitmap_or(obj->online_cpuset, obj->online_cpuset, child->online_cpuset); - if (child->nodeset) - hwloc_bitmap_or(obj->nodeset, obj->nodeset, child->nodeset); - if (child->complete_nodeset) - hwloc_bitmap_or(obj->complete_nodeset, obj->complete_nodeset, child->complete_nodeset); - if (child->allowed_nodeset) - hwloc_bitmap_or(obj->allowed_nodeset, obj->allowed_nodeset, child->allowed_nodeset); - child = child->next_sibling; - } - } else { - /* copy the parent nodesets */ - obj->nodeset = hwloc_bitmap_dup(obj->parent->nodeset); - obj->complete_nodeset = hwloc_bitmap_dup(obj->parent->complete_nodeset); - obj->allowed_nodeset = hwloc_bitmap_dup(obj->parent->allowed_nodeset); - } - - return obj; -} - -hwloc_obj_t -hwloc_topology_insert_misc_object_by_parent(struct hwloc_topology *topology, hwloc_obj_t parent, const char *name) -{ - hwloc_obj_t obj = hwloc_alloc_setup_object(HWLOC_OBJ_MISC, -1); - if (name) - obj->name = strdup(name); - - if (!topology->is_loaded) { - hwloc_free_unlinked_object(obj); - errno = EINVAL; - return NULL; - } - - hwloc_insert_object_by_parent(topology, parent, obj); - - hwloc_connect_children(topology->levels[0][0]); - /* no need to hwloc_connect_levels() since misc object are not in levels */ - - return obj; -} - -/* Append I/O devices below this object to their list */ -static void -append_iodevs(hwloc_topology_t topology, hwloc_obj_t obj) -{ - hwloc_obj_t child, *temp; - - /* make sure we don't have remaining stale pointers from a previous load */ - obj->next_cousin = NULL; - obj->prev_cousin = NULL; - - if (obj->type == HWLOC_OBJ_BRIDGE) { - obj->depth = HWLOC_TYPE_DEPTH_BRIDGE; - /* Insert in the main bridge list */ - if 
(topology->first_bridge) { - obj->prev_cousin = topology->last_bridge; - obj->prev_cousin->next_cousin = obj; - topology->last_bridge = obj; - } else { - topology->first_bridge = topology->last_bridge = obj; - } - } else if (obj->type == HWLOC_OBJ_PCI_DEVICE) { - obj->depth = HWLOC_TYPE_DEPTH_PCI_DEVICE; - /* Insert in the main pcidev list */ - if (topology->first_pcidev) { - obj->prev_cousin = topology->last_pcidev; - obj->prev_cousin->next_cousin = obj; - topology->last_pcidev = obj; - } else { - topology->first_pcidev = topology->last_pcidev = obj; - } - } else if (obj->type == HWLOC_OBJ_OS_DEVICE) { - obj->depth = HWLOC_TYPE_DEPTH_OS_DEVICE; - /* Insert in the main osdev list */ - if (topology->first_osdev) { - obj->prev_cousin = topology->last_osdev; - obj->prev_cousin->next_cousin = obj; - topology->last_osdev = obj; - } else { - topology->first_osdev = topology->last_osdev = obj; - } - } - - for_each_child_safe(child, obj, temp) - append_iodevs(topology, child); -} - -static int hwloc_memory_page_type_compare(const void *_a, const void *_b) -{ - const struct hwloc_obj_memory_page_type_s *a = _a; - const struct hwloc_obj_memory_page_type_s *b = _b; - /* consider 0 as larger so that 0-size page_type go to the end */ - if (!b->size) - return -1; - /* don't cast a-b in int since those are ullongs */ - if (b->size == a->size) - return 0; - return a->size < b->size ? -1 : 1; -} - -/* Propagate memory counts */ -static void -propagate_total_memory(hwloc_obj_t obj) -{ - hwloc_obj_t *temp, child; - unsigned i; - - /* reset total before counting local and children memory */ - obj->memory.total_memory = 0; - - /* Propagate memory up */ - for_each_child_safe(child, obj, temp) { - propagate_total_memory(child); - obj->memory.total_memory += child->memory.total_memory; - } - obj->memory.total_memory += obj->memory.local_memory; - - /* By the way, sort the page_type array. - * Cannot do it on insert since some backends (e.g. XML) add page_types after inserting the object. 
- */ - qsort(obj->memory.page_types, obj->memory.page_types_len, sizeof(*obj->memory.page_types), hwloc_memory_page_type_compare); - /* Ignore 0-size page_types, they are at the end */ - for(i=obj->memory.page_types_len; i>=1; i--) - if (obj->memory.page_types[i-1].size) - break; - obj->memory.page_types_len = i; -} - -/* Collect the cpuset of all the PU objects. */ -static void -collect_proc_cpuset(hwloc_obj_t obj, hwloc_obj_t sys) -{ - hwloc_obj_t child, *temp; - - if (sys) { - /* We are already given a pointer to a system object */ - if (obj->type == HWLOC_OBJ_PU) - hwloc_bitmap_or(sys->cpuset, sys->cpuset, obj->cpuset); - } else { - if (obj->cpuset) { - /* This object is the root of a machine */ - sys = obj; - /* Assume no PU for now */ - hwloc_bitmap_zero(obj->cpuset); - } - } - - for_each_child_safe(child, obj, temp) - collect_proc_cpuset(child, sys); -} - -/* While traversing down and up, propagate the offline/disallowed cpus by - * and'ing them to and from the first object that has a cpuset */ -static void -propagate_unused_cpuset(hwloc_obj_t obj, hwloc_obj_t sys) -{ - hwloc_obj_t child, *temp; - - if (obj->cpuset) { - if (sys) { - /* We are already given a pointer to an system object, update it and update ourselves */ - hwloc_bitmap_t mask = hwloc_bitmap_alloc(); - - /* Apply the topology cpuset */ - hwloc_bitmap_and(obj->cpuset, obj->cpuset, sys->cpuset); - - /* Update complete cpuset down */ - if (obj->complete_cpuset) { - hwloc_bitmap_and(obj->complete_cpuset, obj->complete_cpuset, sys->complete_cpuset); - } else { - obj->complete_cpuset = hwloc_bitmap_dup(sys->complete_cpuset); - hwloc_bitmap_and(obj->complete_cpuset, obj->complete_cpuset, obj->cpuset); - } - - /* Update online cpusets */ - if (obj->online_cpuset) { - /* Update ours */ - hwloc_bitmap_and(obj->online_cpuset, obj->online_cpuset, sys->online_cpuset); - - /* Update the given cpuset, but only what we know */ - hwloc_bitmap_copy(mask, obj->cpuset); - hwloc_bitmap_not(mask, mask); - 
hwloc_bitmap_or(mask, mask, obj->online_cpuset); - hwloc_bitmap_and(sys->online_cpuset, sys->online_cpuset, mask); - } else { - /* Just take it as such */ - obj->online_cpuset = hwloc_bitmap_dup(sys->online_cpuset); - hwloc_bitmap_and(obj->online_cpuset, obj->online_cpuset, obj->cpuset); - } - - /* Update allowed cpusets */ - if (obj->allowed_cpuset) { - /* Update ours */ - hwloc_bitmap_and(obj->allowed_cpuset, obj->allowed_cpuset, sys->allowed_cpuset); - - /* Update the given cpuset, but only what we know */ - hwloc_bitmap_copy(mask, obj->cpuset); - hwloc_bitmap_not(mask, mask); - hwloc_bitmap_or(mask, mask, obj->allowed_cpuset); - hwloc_bitmap_and(sys->allowed_cpuset, sys->allowed_cpuset, mask); - } else { - /* Just take it as such */ - obj->allowed_cpuset = hwloc_bitmap_dup(sys->allowed_cpuset); - hwloc_bitmap_and(obj->allowed_cpuset, obj->allowed_cpuset, obj->cpuset); - } - - hwloc_bitmap_free(mask); - } else { - /* This object is the root of a machine */ - sys = obj; - /* Apply complete cpuset to cpuset, online_cpuset and allowed_cpuset, it - * will automatically be applied below */ - if (obj->complete_cpuset) - hwloc_bitmap_and(obj->cpuset, obj->cpuset, obj->complete_cpuset); - else - obj->complete_cpuset = hwloc_bitmap_dup(obj->cpuset); - if (obj->online_cpuset) - hwloc_bitmap_and(obj->online_cpuset, obj->online_cpuset, obj->complete_cpuset); - else - obj->online_cpuset = hwloc_bitmap_dup(obj->complete_cpuset); - if (obj->allowed_cpuset) - hwloc_bitmap_and(obj->allowed_cpuset, obj->allowed_cpuset, obj->complete_cpuset); - else - obj->allowed_cpuset = hwloc_bitmap_dup(obj->complete_cpuset); - } - } - - for_each_child_safe(child, obj, temp) - propagate_unused_cpuset(child, sys); -} - -/* Force full nodeset for non-NUMA machines */ -static void -add_default_object_sets(hwloc_obj_t obj, int parent_has_sets) -{ - hwloc_obj_t child, *temp; - - /* I/O devices (and their children) have no sets */ - if (hwloc_obj_type_is_io(obj->type)) - return; - - if 
(parent_has_sets && obj->type != HWLOC_OBJ_MISC) { - /* non-MISC object must have cpuset if parent has one. */ - assert(obj->cpuset); - } - - /* other sets must be consistent with main cpuset: - * check cpusets and add nodesets if needed. - * - * MISC may have no sets at all (if added by parent), or usual ones (if added by cpuset), - * but that's not easy to detect, so just make sure sets are consistent as usual. - */ - if (obj->cpuset) { - assert(obj->online_cpuset); - assert(obj->complete_cpuset); - assert(obj->allowed_cpuset); - if (!obj->nodeset) - obj->nodeset = hwloc_bitmap_alloc_full(); - if (!obj->complete_nodeset) - obj->complete_nodeset = hwloc_bitmap_alloc_full(); - if (!obj->allowed_nodeset) - obj->allowed_nodeset = hwloc_bitmap_alloc_full(); - } else { - assert(!obj->online_cpuset); - assert(!obj->complete_cpuset); - assert(!obj->allowed_cpuset); - assert(!obj->nodeset); - assert(!obj->complete_nodeset); - assert(!obj->allowed_nodeset); - } - - for_each_child_safe(child, obj, temp) - add_default_object_sets(child, obj->cpuset != NULL); -} - -/* Setup object cpusets/nodesets by OR'ing its children. 
*/ -HWLOC_DECLSPEC int -hwloc_fill_object_sets(hwloc_obj_t obj) -{ - hwloc_obj_t child; - assert(obj->cpuset != NULL); - child = obj->first_child; - while (child) { - assert(child->cpuset != NULL); - if (child->complete_cpuset) { - if (!obj->complete_cpuset) - obj->complete_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(obj->complete_cpuset, obj->complete_cpuset, child->complete_cpuset); - } - if (child->online_cpuset) { - if (!obj->online_cpuset) - obj->online_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(obj->online_cpuset, obj->online_cpuset, child->online_cpuset); - } - if (child->allowed_cpuset) { - if (!obj->allowed_cpuset) - obj->allowed_cpuset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(obj->allowed_cpuset, obj->allowed_cpuset, child->allowed_cpuset); - } - if (child->nodeset) { - if (!obj->nodeset) - obj->nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(obj->nodeset, obj->nodeset, child->nodeset); - } - if (child->complete_nodeset) { - if (!obj->complete_nodeset) - obj->complete_nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(obj->complete_nodeset, obj->complete_nodeset, child->complete_nodeset); - } - if (child->allowed_nodeset) { - if (!obj->allowed_nodeset) - obj->allowed_nodeset = hwloc_bitmap_alloc(); - hwloc_bitmap_or(obj->allowed_nodeset, obj->allowed_nodeset, child->allowed_nodeset); - } - child = child->next_sibling; - } - return 0; -} - -/* Propagate nodesets up and down */ -static void -propagate_nodeset(hwloc_obj_t obj, hwloc_obj_t sys) -{ - hwloc_obj_t child, *temp; - hwloc_bitmap_t parent_nodeset = NULL; - int parent_weight = 0; - - if (!sys && obj->nodeset) { - sys = obj; - if (!obj->complete_nodeset) - obj->complete_nodeset = hwloc_bitmap_dup(obj->nodeset); - if (!obj->allowed_nodeset) - obj->allowed_nodeset = hwloc_bitmap_dup(obj->complete_nodeset); - } - - if (sys) { - if (obj->nodeset) { - /* Some existing nodeset coming from above, to possibly propagate down */ - parent_nodeset = obj->nodeset; - parent_weight = 
hwloc_bitmap_weight(parent_nodeset); - } else - obj->nodeset = hwloc_bitmap_alloc(); - } - - for_each_child_safe(child, obj, temp) { - /* don't propagate nodesets in I/O objects, keep them NULL */ - if (hwloc_obj_type_is_io(child->type)) - return; - /* don't propagate nodesets in Misc inserted by parent (no nodeset if no cpuset) */ - if (child->type == HWLOC_OBJ_MISC && !child->cpuset) - return; - - /* Propagate singleton nodesets down */ - if (parent_weight == 1) { - if (!child->nodeset) - child->nodeset = hwloc_bitmap_dup(obj->nodeset); - else if (!hwloc_bitmap_isequal(child->nodeset, parent_nodeset)) { - hwloc_debug_bitmap("Oops, parent nodeset %s", parent_nodeset); - hwloc_debug_bitmap(" is different from child nodeset %s, ignoring the child one\n", child->nodeset); - hwloc_bitmap_copy(child->nodeset, parent_nodeset); - } - } - - /* Recurse */ - propagate_nodeset(child, sys); - - /* Propagate children nodesets up */ - if (sys && child->nodeset) - hwloc_bitmap_or(obj->nodeset, obj->nodeset, child->nodeset); - } -} - -/* Propagate allowed and complete nodesets */ -static void -propagate_nodesets(hwloc_obj_t obj) -{ - hwloc_bitmap_t mask = hwloc_bitmap_alloc(); - hwloc_obj_t child, *temp; - - for_each_child_safe(child, obj, temp) { - /* don't propagate nodesets in I/O objects, keep them NULL */ - if (hwloc_obj_type_is_io(child->type)) - continue; - - if (obj->nodeset) { - /* Update complete nodesets down */ - if (child->complete_nodeset) { - hwloc_bitmap_and(child->complete_nodeset, child->complete_nodeset, obj->complete_nodeset); - } else if (child->nodeset) { - child->complete_nodeset = hwloc_bitmap_dup(obj->complete_nodeset); - hwloc_bitmap_and(child->complete_nodeset, child->complete_nodeset, child->nodeset); - } /* else the child doesn't have nodeset information, we can not provide a complete nodeset */ - - /* Update allowed nodesets down */ - if (child->allowed_nodeset) { - hwloc_bitmap_and(child->allowed_nodeset, child->allowed_nodeset, 
obj->allowed_nodeset); - } else if (child->nodeset) { - child->allowed_nodeset = hwloc_bitmap_dup(obj->allowed_nodeset); - hwloc_bitmap_and(child->allowed_nodeset, child->allowed_nodeset, child->nodeset); - } - } - - propagate_nodesets(child); - - if (obj->nodeset) { - /* Update allowed nodesets up */ - if (child->nodeset && child->allowed_nodeset) { - hwloc_bitmap_copy(mask, child->nodeset); - hwloc_bitmap_andnot(mask, mask, child->allowed_nodeset); - hwloc_bitmap_andnot(obj->allowed_nodeset, obj->allowed_nodeset, mask); - } - } - } - hwloc_bitmap_free(mask); - - if (obj->nodeset) { - /* Apply complete nodeset to nodeset and allowed_nodeset */ - if (obj->complete_nodeset) - hwloc_bitmap_and(obj->nodeset, obj->nodeset, obj->complete_nodeset); - else - obj->complete_nodeset = hwloc_bitmap_dup(obj->nodeset); - if (obj->allowed_nodeset) - hwloc_bitmap_and(obj->allowed_nodeset, obj->allowed_nodeset, obj->complete_nodeset); - else - obj->allowed_nodeset = hwloc_bitmap_dup(obj->complete_nodeset); - } -} - -static void -remove_unused_sets(hwloc_obj_t obj) -{ - hwloc_obj_t child, *temp; - - if (obj->cpuset) { - hwloc_bitmap_and(obj->cpuset, obj->cpuset, obj->online_cpuset); - hwloc_bitmap_and(obj->cpuset, obj->cpuset, obj->allowed_cpuset); - } - if (obj->nodeset) { - hwloc_bitmap_and(obj->nodeset, obj->nodeset, obj->allowed_nodeset); - } - if (obj->type == HWLOC_OBJ_NUMANODE && obj->os_index != (unsigned) -1 && - !hwloc_bitmap_isset(obj->allowed_nodeset, obj->os_index)) { - unsigned i; - hwloc_debug("Dropping memory from disallowed node %u\n", obj->os_index); - obj->memory.local_memory = 0; - obj->memory.total_memory = 0; - for(i=0; i<obj->memory.page_types_len; i++) - obj->memory.page_types[i].count = 0; - } - - for_each_child_safe(child, obj, temp) - remove_unused_sets(child); -} - -static void -reorder_children(hwloc_obj_t parent) -{ - /* move the children list on the side */ - hwloc_obj_t *prev, child, children = parent->first_child; - parent->first_child = NULL; - while
(children) { - /* dequeue child */ - child = children; - children = child->next_sibling; - /* find where to enqueue it */ - prev = &parent->first_child; - while (*prev - && (!child->cpuset || !(*prev)->cpuset - || hwloc__object_cpusets_compare_first(child, *prev) > 0)) - prev = &((*prev)->next_sibling); - /* enqueue */ - child->next_sibling = *prev; - *prev = child; - } -} - -/* Remove all ignored objects. */ -static int -remove_ignored(hwloc_topology_t topology, hwloc_obj_t *pparent) -{ - hwloc_obj_t parent = *pparent, child, *pchild; - int dropped_children = 0; - int dropped = 0; - - for_each_child_safe(child, parent, pchild) - dropped_children += remove_ignored(topology, pchild); - - if ((parent != topology->levels[0][0] && - topology->ignored_types[parent->type] == HWLOC_IGNORE_TYPE_ALWAYS) - || (parent->type == HWLOC_OBJ_CACHE && parent->attr->cache.type == HWLOC_OBJ_CACHE_INSTRUCTION - && !(topology->flags & HWLOC_TOPOLOGY_FLAG_ICACHES))) { - hwloc_debug("%s", "\nDropping ignored object "); - hwloc_debug_print_object(0, parent); - unlink_and_free_single_object(pparent); - dropped = 1; - - } else if (dropped_children) { - /* we keep this object but its children changed, reorder them by complete_cpuset */ - reorder_children(parent); - } - - return dropped; -} - -/* Remove all children whose cpuset is empty, except NUMA nodes - * since we want to keep memory information, and except PCI bridges and devices. 
- */ -static void -remove_empty(hwloc_topology_t topology, hwloc_obj_t *pobj) -{ - hwloc_obj_t obj = *pobj, child, *pchild; - - for_each_child_safe(child, obj, pchild) - remove_empty(topology, pchild); - - if (obj->type != HWLOC_OBJ_NUMANODE - && !obj->first_child /* only remove if all children were removed above, so that we don't remove parents of NUMAnode */ - && !hwloc_obj_type_is_io(obj->type) && obj->type != HWLOC_OBJ_MISC - && obj->cpuset /* don't remove if no cpuset at all, there's likely a good reason why it's different from having an empty cpuset */ - && hwloc_bitmap_iszero(obj->cpuset)) { - /* Remove empty children */ - hwloc_debug("%s", "\nRemoving empty object "); - hwloc_debug_print_object(0, obj); - unlink_and_free_single_object(pobj); - } -} - -/* adjust object cpusets according the given droppedcpuset, - * drop object whose cpuset becomes empty, - * and mark dropped nodes in droppednodeset - */ -static void -restrict_object(hwloc_topology_t topology, unsigned long flags, hwloc_obj_t *pobj, hwloc_const_cpuset_t droppedcpuset, hwloc_nodeset_t droppednodeset, int droppingparent) -{ - hwloc_obj_t obj = *pobj, child, *pchild; - int dropping; - int modified = obj->complete_cpuset && hwloc_bitmap_intersects(obj->complete_cpuset, droppedcpuset); - - hwloc_clear_object_distances(obj); - - if (obj->cpuset) - hwloc_bitmap_andnot(obj->cpuset, obj->cpuset, droppedcpuset); - if (obj->complete_cpuset) - hwloc_bitmap_andnot(obj->complete_cpuset, obj->complete_cpuset, droppedcpuset); - if (obj->online_cpuset) - hwloc_bitmap_andnot(obj->online_cpuset, obj->online_cpuset, droppedcpuset); - if (obj->allowed_cpuset) - hwloc_bitmap_andnot(obj->allowed_cpuset, obj->allowed_cpuset, droppedcpuset); - - if (obj->type == HWLOC_OBJ_MISC) { - dropping = droppingparent && !(flags & HWLOC_RESTRICT_FLAG_ADAPT_MISC); - } else if (hwloc_obj_type_is_io(obj->type)) { - dropping = droppingparent && !(flags & HWLOC_RESTRICT_FLAG_ADAPT_IO); - } else { - dropping = droppingparent || 
(obj->cpuset && hwloc_bitmap_iszero(obj->cpuset)); - } - - if (modified) - for_each_child_safe(child, obj, pchild) - restrict_object(topology, flags, pchild, droppedcpuset, droppednodeset, dropping); - - if (dropping) { - hwloc_debug("%s", "\nRemoving object during restrict"); - hwloc_debug_print_objects(0, obj); - if (obj->type == HWLOC_OBJ_NUMANODE) - hwloc_bitmap_set(droppednodeset, obj->os_index); - /* remove the object from the tree (no need to remove from levels, they will be entirely rebuilt by the caller) */ - unlink_and_free_single_object(pobj); - /* do not remove children. if they were to be removed, they would have been already */ - } -} - -/* adjust object nodesets accordingly the given droppednodeset - */ -static void -restrict_object_nodeset(hwloc_topology_t topology, hwloc_obj_t *pobj, hwloc_nodeset_t droppednodeset) -{ - hwloc_obj_t obj = *pobj, child, *pchild; - - /* if this object isn't modified, don't bother looking at children */ - if (obj->complete_nodeset && !hwloc_bitmap_intersects(obj->complete_nodeset, droppednodeset)) - return; - - if (obj->nodeset) - hwloc_bitmap_andnot(obj->nodeset, obj->nodeset, droppednodeset); - if (obj->complete_nodeset) - hwloc_bitmap_andnot(obj->complete_nodeset, obj->complete_nodeset, droppednodeset); - if (obj->allowed_nodeset) - hwloc_bitmap_andnot(obj->allowed_nodeset, obj->allowed_nodeset, droppednodeset); - - for_each_child_safe(child, obj, pchild) - restrict_object_nodeset(topology, pchild, droppednodeset); -} - -/* we don't want to merge groups that were inserted explicitly with the custom interface */ -static int -can_merge_group(hwloc_topology_t topology, hwloc_obj_t obj) -{ - const char *value; - /* custom-inserted groups are in custom topologies and have no cpusets, - * don't bother calling hwloc_obj_get_info_by_name() and strcmp() uselessly. 
- */ - if (!topology->backends->is_custom || obj->cpuset) - return 1; - value = hwloc_obj_get_info_by_name(obj, "Backend"); - return (!value) || strcmp(value, "Custom"); -} - -/* - * Merge with the only child if either the parent or the child has a type to be - * ignored while keeping structure - */ -static int -merge_useless_child(hwloc_topology_t topology, hwloc_obj_t *pparent) -{ - hwloc_obj_t parent = *pparent, child, *pchild, ios; - int replacechild = 0, replaceparent = 0, droppedchildren = 0; - - if (!parent->first_child) - /* There are no child, nothing to merge. */ - return 0; - - for_each_child_safe(child, parent, pchild) - droppedchildren += merge_useless_child(topology, pchild); - - if (droppedchildren) - reorder_children(parent); - - child = parent->first_child; - /* we don't merge if there are multiple "important" children. - * non-important ones are at the end of the list. - * look at the second child to find out. - */ - if (child->next_sibling - /* I/O objects may be ignored when trying to merge */ - && !hwloc_obj_type_is_io(child->next_sibling->type) - /* Misc objects without cpuset may be ignored as well */ - && !(child->next_sibling->type == HWLOC_OBJ_MISC && !child->next_sibling->cpuset)) - /* There are several children that prevent from merging */ - return 0; - - /* There is one important child, and some children that may be ignored - * during merging because they can be attached to anything with the same locality. - * Move them to the side during merging, and append them back later. - * This is easy because children with no cpuset are always last in the list. - */ - ios = child->next_sibling; - child->next_sibling = NULL; - - /* Check whether parent and/or child can be replaced */ - if (topology->ignored_types[parent->type] == HWLOC_IGNORE_TYPE_KEEP_STRUCTURE) { - if (parent->type != HWLOC_OBJ_GROUP || can_merge_group(topology, parent)) - /* Parent can be ignored in favor of the child. 
*/ - replaceparent = 1; - } - if (topology->ignored_types[child->type] == HWLOC_IGNORE_TYPE_KEEP_STRUCTURE) { - if (child->type != HWLOC_OBJ_GROUP || can_merge_group(topology, child)) - /* Child can be ignored in favor of the parent. */ - replacechild = 1; - } - - /* Decide which one to actually replace */ - if (replaceparent && replacechild) { - /* If both may be replaced, look at obj_type_priority */ - if (obj_type_priority[parent->type] > obj_type_priority[child->type]) - replaceparent = 0; - else - replacechild = 0; - } - - if (replaceparent) { - /* Replace parent with child */ - hwloc_debug("%s", "\nIgnoring parent "); - hwloc_debug_print_object(0, parent); - if (parent == topology->levels[0][0]) { - child->parent = NULL; - child->depth = 0; - } - unlink_and_free_single_object(pparent); - - } else if (replacechild) { - /* Replace child with parent */ - hwloc_debug("%s", "\nIgnoring child "); - hwloc_debug_print_object(0, child); - unlink_and_free_single_object(&parent->first_child); - } - - if (ios) { - /* append the remaining list of children to the remaining object */ - pchild = &((*pparent)->first_child); - while (*pchild) - pchild = &((*pchild)->next_sibling); - *pchild = ios; - } - - return replaceparent ? 1 : 0; -} - -static void -hwloc_drop_all_io(hwloc_topology_t topology, hwloc_obj_t root) -{ - hwloc_obj_t child, *pchild; - for_each_child_safe(child, root, pchild) { - if (hwloc_obj_type_is_io(child->type)) - unlink_and_free_object_and_children(pchild); - else - hwloc_drop_all_io(topology, child); - } -} - -/* - * If IO_DEVICES and WHOLE_IO are not set, we drop everything. - * If WHOLE_IO is not set, we drop non-interesting devices, - * and bridges that have no children. - * If IO_BRIDGES is also not set, we also drop all bridges - * except the hostbridges. 
- */ -static void -hwloc_drop_useless_io(hwloc_topology_t topology, hwloc_obj_t root) -{ - hwloc_obj_t child, *pchild; - - if (!(topology->flags & (HWLOC_TOPOLOGY_FLAG_IO_DEVICES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) { - /* drop all I/O children */ - hwloc_drop_all_io(topology, root); - return; - } - - if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) { - /* drop non-interesting devices */ - for_each_child_safe(child, root, pchild) { - if (child->type == HWLOC_OBJ_PCI_DEVICE) { - unsigned classid = child->attr->pcidev.class_id; - unsigned baseclass = classid >> 8; - if (baseclass != 0x03 /* PCI_BASE_CLASS_DISPLAY */ - && baseclass != 0x02 /* PCI_BASE_CLASS_NETWORK */ - && baseclass != 0x01 /* PCI_BASE_CLASS_STORAGE */ - && baseclass != 0x0b /* PCI_BASE_CLASS_PROCESSOR */ - && classid != 0x0c04 /* PCI_CLASS_SERIAL_FIBER */ - && classid != 0x0c06 /* PCI_CLASS_SERIAL_INFINIBAND */ - && baseclass != 0x12 /* Processing Accelerators */) - unlink_and_free_object_and_children(pchild); - } - } - } - - /* look at remaining children, process recursively, and remove useless bridges */ - for_each_child_safe(child, root, pchild) { - hwloc_drop_useless_io(topology, child); - - if (child->type == HWLOC_OBJ_BRIDGE) { - if (!child->first_child) { - /* bridges with no children are removed if WHOLE_IO isn't given */ - if (!(topology->flags & (HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) { - unlink_and_free_single_object(pchild); - } - - } else if (child->attr->bridge.upstream_type != HWLOC_OBJ_BRIDGE_HOST) { - /* only hostbridges are kept if WHOLE_IO or IO_BRIDGE are not given */ - if (!(topology->flags & (HWLOC_TOPOLOGY_FLAG_IO_BRIDGES|HWLOC_TOPOLOGY_FLAG_WHOLE_IO))) { - unlink_and_free_single_object(pchild); - } - } - } - } -} - -static void -hwloc_propagate_bridge_depth(hwloc_topology_t topology, hwloc_obj_t root, unsigned depth) -{ - hwloc_obj_t child = root->first_child; - while (child) { - if (child->type == HWLOC_OBJ_BRIDGE) { - child->attr->bridge.depth = depth; - 
hwloc_propagate_bridge_depth(topology, child, depth+1); - } else if (!hwloc_obj_type_is_io(child->type)) { - hwloc_propagate_bridge_depth(topology, child, 0); - } - child = child->next_sibling; - } -} - -static void -hwloc_propagate_symmetric_subtree(hwloc_topology_t topology, hwloc_obj_t root) -{ - hwloc_obj_t child, *array; - - /* assume we're not symmetric by default */ - root->symmetric_subtree = 0; - - /* if no child, we are symmetric */ - if (!root->arity) { - root->symmetric_subtree = 1; - return; - } - - /* look at children, and return if they are not symmetric */ - child = NULL; - while ((child = hwloc_get_next_child(topology, root, child)) != NULL) - hwloc_propagate_symmetric_subtree(topology, child); - while ((child = hwloc_get_next_child(topology, root, child)) != NULL) - if (!child->symmetric_subtree) - return; - - /* now check that children subtrees are identical. - * just walk down the first child in each tree and compare their depth and arities - */ - array = malloc(root->arity * sizeof(*array)); - memcpy(array, root->children, root->arity * sizeof(*array)); - while (1) { - unsigned i; - /* check current level arities and depth */ - for(i=1; i<root->arity; i++) - if (array[i]->depth != array[0]->depth - || array[i]->arity != array[0]->arity) { - free(array); - return; - } - if (!array[0]->arity) - /* no more children level, we're ok */ - break; - /* look at first child of each element now */ - for(i=0; i<root->arity; i++) - array[i] = array[i]->first_child; - } - free(array); - - /* everything went fine, we're symmetric */ - root->symmetric_subtree = 1; -} - -/* - * Initialize handy pointers in the whole topology. - * The topology only had first_child and next_sibling pointers. - * When this funtions return, all parent/children pointers are initialized. - * The remaining fields (levels, cousins, logical_index, depth, ...) will - * be setup later in hwloc_connect_levels(). - * - * Can be called several times, so may have to update the array.
- */ -void -hwloc_connect_children(hwloc_obj_t parent) -{ - unsigned n, oldn = parent->arity; - hwloc_obj_t child, prev_child = NULL; - int ok = 1; - - for (n = 0, child = parent->first_child; - child; - n++, prev_child = child, child = child->next_sibling) { - child->parent = parent; - child->sibling_rank = n; - child->prev_sibling = prev_child; - /* already OK in the array? */ - if (n >= oldn || parent->children[n] != child) - ok = 0; - /* recurse */ - hwloc_connect_children(child); - } - parent->last_child = prev_child; - parent->arity = n; - if (!n) { - /* no need for an array anymore */ - free(parent->children); - parent->children = NULL; - return; - } - if (ok) - /* array is already OK (even if too large) */ - return; - - /* alloc a larger array if needed */ - if (oldn < n) { - free(parent->children); - parent->children = malloc(n * sizeof(*parent->children)); - } - /* refill */ - for (n = 0, child = parent->first_child; - child; - n++, child = child->next_sibling) { - parent->children[n] = child; - } -} - -/* - * Check whether there is an object below ROOT that has the same type as OBJ. - * Only used for building levels. - * Stop at I/O or Misc since these don't go into levels, and we never have - * normal objects under them. - */ -static int -find_same_type(hwloc_obj_t root, hwloc_obj_t obj) -{ - hwloc_obj_t child; - - if (hwloc_type_cmp(root, obj) == HWLOC_OBJ_EQUAL) - return 1; - - for (child = root->first_child; child; child = child->next_sibling) - if (!hwloc_obj_type_is_io(child->type) - && child->type != HWLOC_OBJ_MISC - && find_same_type(child, obj)) - return 1; - - return 0; -} - -/* traverse the array of current object and compare them with top_obj. - * if equal, take the object and put its children into the remaining objs. - * if not equal, put the object into the remaining objs. 
- */ -static int -hwloc_level_take_objects(hwloc_obj_t top_obj, - hwloc_obj_t *current_objs, unsigned n_current_objs, - hwloc_obj_t *taken_objs, unsigned n_taken_objs __hwloc_attribute_unused, - hwloc_obj_t *remaining_objs, unsigned n_remaining_objs __hwloc_attribute_unused) -{ - unsigned taken_i = 0; - unsigned new_i = 0; - unsigned i, j; - - for (i = 0; i < n_current_objs; i++) - if (hwloc_type_cmp(top_obj, current_objs[i]) == HWLOC_OBJ_EQUAL) { - /* Take it, add children. */ - taken_objs[taken_i++] = current_objs[i]; - for (j = 0; j < current_objs[i]->arity; j++) - remaining_objs[new_i++] = current_objs[i]->children[j]; - } else { - /* Leave it. */ - remaining_objs[new_i++] = current_objs[i]; - } - -#ifdef HWLOC_DEBUG - /* Make sure we didn't mess up. */ - assert(taken_i == n_taken_objs); - assert(new_i == n_current_objs - n_taken_objs + n_remaining_objs); -#endif - - return new_i; -} - -/* Given an input object, copy it or its interesting children into the output array. - * If new_obj is NULL, we're just counting interesting ohjects. - */ -static unsigned -hwloc_level_filter_object(hwloc_topology_t topology, - hwloc_obj_t *new_obj, hwloc_obj_t old) -{ - unsigned i, total; - if (hwloc_obj_type_is_io(old->type)) { - if (new_obj) - append_iodevs(topology, old); - return 0; - } - if (old->type != HWLOC_OBJ_MISC) { - if (new_obj) - *new_obj = old; - return 1; - } - for(i=0, total=0; i<old->arity; i++) { - int nb = hwloc_level_filter_object(topology, new_obj, old->children[i]); - if (new_obj) { - new_obj += nb; - } - total += nb; - } - return total; -} - -/* Replace an input array of objects with an input array containing - * only interesting objects for levels. - * Misc objects are removed, their interesting children are added. - * I/O devices are removed and queue to their own lists.
- */ -static int -hwloc_level_filter_objects(hwloc_topology_t topology, - hwloc_obj_t **objs, unsigned *n_objs) -{ - hwloc_obj_t *old = *objs, *new; - unsigned nold = *n_objs, nnew, i; - - /* anything to filter? */ - for(i=0; i<nold; i++) - if (hwloc_obj_type_is_io(old[i]->type) - || old[i]->type == HWLOC_OBJ_MISC) - break; - if (i==nold) - return 0; - - /* count interesting objects and allocate the new array */ - for(i=0, nnew=0; inext_cousin; - } - nb = i; - - /* allocate and fill level */ - *levelp = malloc(nb * sizeof(struct hwloc_obj *)); - obj = first; - i = 0; - while (obj) { - obj->logical_index = i; - (*levelp)[i] = obj; - i++; - obj = obj->next_cousin; - } - - return nb; -} - -/* - * Do the remaining work that hwloc_connect_children() did not do earlier. - */ -int -hwloc_connect_levels(hwloc_topology_t topology) -{ - unsigned l, i=0; - hwloc_obj_t *objs, *taken_objs, *new_objs, top_obj; - unsigned n_objs, n_taken_objs, n_new_objs; - int err; - - /* reset non-root levels (root was initialized during init and will not change here) */ - for(l=1; l<HWLOC_DEPTH_MAX; l++) - free(topology->levels[l]); - memset(topology->levels+1, 0, (HWLOC_DEPTH_MAX-1)*sizeof(*topology->levels)); - memset(topology->level_nbobjects+1, 0, (HWLOC_DEPTH_MAX-1)*sizeof(*topology->level_nbobjects)); - topology->nb_levels = 1; - /* don't touch next_group_depth, the Group objects are still here */ - - /* initialize all depth to unknown */ - for (l = HWLOC_OBJ_SYSTEM; l < HWLOC_OBJ_MISC; l++) - topology->type_depth[l] = HWLOC_TYPE_DEPTH_UNKNOWN; - /* initialize root type depth */ - topology->type_depth[topology->levels[0][0]->type] = 0; - - /* initialize I/O special levels */ - free(topology->bridge_level); - topology->bridge_level = NULL; - topology->bridge_nbobjects = 0; - topology->first_bridge = topology->last_bridge = NULL; - free(topology->pcidev_level); - topology->pcidev_level = NULL; - topology->pcidev_nbobjects = 0; - topology->first_pcidev = topology->last_pcidev = NULL; - free(topology->osdev_level); - topology->osdev_level = NULL; - topology->osdev_nbobjects = 
0; - topology->first_osdev = topology->last_osdev = NULL; - - /* Start with children of the whole system. */ - n_objs = topology->levels[0][0]->arity; - objs = malloc(n_objs * sizeof(objs[0])); - if (!objs) { - errno = ENOMEM; - return -1; - } - memcpy(objs, topology->levels[0][0]->children, n_objs*sizeof(objs[0])); - - /* Filter-out interesting objects */ - err = hwloc_level_filter_objects(topology, &objs, &n_objs); - if (err < 0) - return -1; - - /* Keep building levels while there are objects left in OBJS. */ - while (n_objs) { - /* At this point, the objs array contains only objects that may go into levels */ - - /* First find which type of object is the topmost. - * Don't use PU if there are other types since we want to keep PU at the bottom. - */ - - /* Look for the first non-PU object, and use the first PU if we really find nothing else */ - for (i = 0; i < n_objs; i++) - if (objs[i]->type != HWLOC_OBJ_PU) - break; - top_obj = i == n_objs ? objs[0] : objs[i]; - - /* See if this is actually the topmost object */ - for (i = 0; i < n_objs; i++) { - if (hwloc_type_cmp(top_obj, objs[i]) != HWLOC_OBJ_EQUAL) { - if (find_same_type(objs[i], top_obj)) { - /* OBJS[i] is strictly above an object of the same type as TOP_OBJ, so it - * is above TOP_OBJ. */ - top_obj = objs[i]; - } - } - } - - /* Now peek all objects of the same type, build a level with that and - * replace them with their children. */ - - /* First count them. */ - n_taken_objs = 0; - n_new_objs = 0; - for (i = 0; i < n_objs; i++) - if (hwloc_type_cmp(top_obj, objs[i]) == HWLOC_OBJ_EQUAL) { - n_taken_objs++; - n_new_objs += objs[i]->arity; - } - - /* New level. */ - taken_objs = malloc((n_taken_objs + 1) * sizeof(taken_objs[0])); - /* New list of pending objects. 
*/ - if (n_objs - n_taken_objs + n_new_objs) { - new_objs = malloc((n_objs - n_taken_objs + n_new_objs) * sizeof(new_objs[0])); - } else { -#ifdef HWLOC_DEBUG - assert(!n_new_objs); - assert(n_objs == n_taken_objs); -#endif - new_objs = NULL; - } - - n_new_objs = hwloc_level_take_objects(top_obj, - objs, n_objs, - taken_objs, n_taken_objs, - new_objs, n_new_objs); - - /* Ok, put numbers in the level and link cousins. */ - for (i = 0; i < n_taken_objs; i++) { - taken_objs[i]->depth = topology->nb_levels; - taken_objs[i]->logical_index = i; - if (i) { - taken_objs[i]->prev_cousin = taken_objs[i-1]; - taken_objs[i-1]->next_cousin = taken_objs[i]; - } - } - taken_objs[0]->prev_cousin = NULL; - taken_objs[n_taken_objs-1]->next_cousin = NULL; - - /* One more level! */ - if (top_obj->type == HWLOC_OBJ_CACHE) - hwloc_debug("--- Cache level depth %u", top_obj->attr->cache.depth); - else - hwloc_debug("--- %s level", hwloc_obj_type_string(top_obj->type)); - hwloc_debug(" has number %u\n\n", topology->nb_levels); - - if (topology->type_depth[top_obj->type] == HWLOC_TYPE_DEPTH_UNKNOWN) - topology->type_depth[top_obj->type] = topology->nb_levels; - else - topology->type_depth[top_obj->type] = HWLOC_TYPE_DEPTH_MULTIPLE; /* mark as unknown */ - - taken_objs[n_taken_objs] = NULL; - - topology->level_nbobjects[topology->nb_levels] = n_taken_objs; - topology->levels[topology->nb_levels] = taken_objs; - - topology->nb_levels++; - - free(objs); - - /* Switch to new_objs, after filtering-out interesting objects */ - err = hwloc_level_filter_objects(topology, &new_objs, &n_new_objs); - if (err < 0) - return -1; - - objs = new_objs; - n_objs = n_new_objs; - } - - /* It's empty now. 
*/ - if (objs) - free(objs); - - topology->bridge_nbobjects = hwloc_build_level_from_list(topology->first_bridge, &topology->bridge_level); - topology->pcidev_nbobjects = hwloc_build_level_from_list(topology->first_pcidev, &topology->pcidev_level); - topology->osdev_nbobjects = hwloc_build_level_from_list(topology->first_osdev, &topology->osdev_level); - - hwloc_propagate_symmetric_subtree(topology, topology->levels[0][0]); - - return 0; -} - -void hwloc_alloc_obj_cpusets(hwloc_obj_t obj) -{ - if (!obj->cpuset) - obj->cpuset = hwloc_bitmap_alloc_full(); - if (!obj->complete_cpuset) - obj->complete_cpuset = hwloc_bitmap_alloc(); - if (!obj->online_cpuset) - obj->online_cpuset = hwloc_bitmap_alloc_full(); - if (!obj->allowed_cpuset) - obj->allowed_cpuset = hwloc_bitmap_alloc_full(); - if (!obj->nodeset) - obj->nodeset = hwloc_bitmap_alloc(); - if (!obj->complete_nodeset) - obj->complete_nodeset = hwloc_bitmap_alloc(); - if (!obj->allowed_nodeset) - obj->allowed_nodeset = hwloc_bitmap_alloc_full(); -} - -/* Main discovery loop */ -static int -hwloc_discover(struct hwloc_topology *topology) -{ - struct hwloc_backend *backend; - int gotsomeio = 0; - unsigned discoveries = 0; - unsigned need_reconnect = 0; - - /* discover() callbacks should use hwloc_insert to add objects initialized - * through hwloc_alloc_setup_object. - * For node levels, nodeset and memory must be initialized. - * For cache levels, memory and type/depth must be initialized. - * For group levels, depth must be initialized. - */ - - /* There must be at least a PU object for each logical processor, at worse - * produced by hwloc_setup_pu_level() - */ - - /* To be able to just use hwloc_insert_object_by_cpuset to insert the object - * in the topology according to the cpuset, the cpuset field must be - * initialized. - */ - - /* A priori, All processors are visible in the topology, online, and allowed - * for the application. 
- * - * - If some processors exist but topology information is unknown for them - * (and thus the backend couldn't create objects for them), they should be - * added to the complete_cpuset field of the lowest object where the object - * could reside. - * - * - If some processors are not online, they should be dropped from the - * online_cpuset field. - * - * - If some processors are not allowed for the application (e.g. for - * administration reasons), they should be dropped from the allowed_cpuset - * field. - * - * The same applies to the node sets complete_nodeset and allowed_cpuset. - * - * If such field doesn't exist yet, it can be allocated, and initialized to - * zero (for complete), or to full (for online and allowed). The values are - * automatically propagated to the whole tree after detection. - */ - - /* - * Discover CPUs first - */ - backend = topology->backends; - while (NULL != backend) { - int err; - if (backend->component->type != HWLOC_DISC_COMPONENT_TYPE_CPU - && backend->component->type != HWLOC_DISC_COMPONENT_TYPE_GLOBAL) - /* not yet */ - goto next_cpubackend; - if (!backend->discover) - goto next_cpubackend; - - if (need_reconnect && (backend->flags & HWLOC_BACKEND_FLAG_NEED_LEVELS)) { - hwloc_debug("Backend %s forcing a reconnect of levels\n", backend->component->name); - hwloc_connect_children(topology->levels[0][0]); - if (hwloc_connect_levels(topology) < 0) - return -1; - need_reconnect = 0; - } - - err = backend->discover(backend); - if (err >= 0) { - if (backend->component->type == HWLOC_DISC_COMPONENT_TYPE_GLOBAL) - gotsomeio += err; - discoveries++; - if (err > 0) - need_reconnect++; - } - hwloc_debug_print_objects(0, topology->levels[0][0]); - -next_cpubackend: - backend = backend->next; - } - - if (!discoveries) { - hwloc_debug("%s", "No CPU backend enabled or no discovery succeeded\n"); - errno = EINVAL; - return -1; - } - - /* - * Group levels by distances - */ - hwloc_distances_finalize_os(topology); - 
hwloc_group_by_distances(topology); - - /* Update objects cpusets and nodesets now that the CPU/GLOBAL backend populated PUs and nodes */ - - hwloc_debug("%s", "\nRestrict topology cpusets to existing PU and NODE objects\n"); - collect_proc_cpuset(topology->levels[0][0], NULL); - - hwloc_debug("%s", "\nPropagate offline and disallowed cpus down and up\n"); - propagate_unused_cpuset(topology->levels[0][0], NULL); - - if (topology->levels[0][0]->complete_nodeset && hwloc_bitmap_iszero(topology->levels[0][0]->complete_nodeset)) { - /* No nodeset, drop all of them */ - hwloc_bitmap_free(topology->levels[0][0]->nodeset); - topology->levels[0][0]->nodeset = NULL; - hwloc_bitmap_free(topology->levels[0][0]->complete_nodeset); - topology->levels[0][0]->complete_nodeset = NULL; - hwloc_bitmap_free(topology->levels[0][0]->allowed_nodeset); - topology->levels[0][0]->allowed_nodeset = NULL; - } - hwloc_debug("%s", "\nPropagate nodesets\n"); - propagate_nodeset(topology->levels[0][0], NULL); - propagate_nodesets(topology->levels[0][0]); - - hwloc_debug_print_objects(0, topology->levels[0][0]); - - if (!(topology->flags & HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM)) { - hwloc_debug("%s", "\nRemoving unauthorized and offline sets from all sets\n"); - remove_unused_sets(topology->levels[0][0]); - hwloc_debug_print_objects(0, topology->levels[0][0]); - } - - hwloc_debug("%s", "\nAdd default object sets\n"); - add_default_object_sets(topology->levels[0][0], 0); - - /* Now connect handy pointers to make remaining discovery easier. 
*/ - hwloc_debug("%s", "\nOk, finished tweaking, now connect\n"); - hwloc_connect_children(topology->levels[0][0]); - if (hwloc_connect_levels(topology) < 0) - return -1; - hwloc_debug_print_objects(0, topology->levels[0][0]); - - /* - * Additional discovery with other backends - */ - - backend = topology->backends; - need_reconnect = 0; - while (NULL != backend) { - int err; - if (backend->component->type == HWLOC_DISC_COMPONENT_TYPE_CPU - || backend->component->type == HWLOC_DISC_COMPONENT_TYPE_GLOBAL) - /* already done above */ - goto next_noncpubackend; - if (!backend->discover) - goto next_noncpubackend; - - if (need_reconnect && (backend->flags & HWLOC_BACKEND_FLAG_NEED_LEVELS)) { - hwloc_debug("Backend %s forcing a reconnect of levels\n", backend->component->name); - hwloc_connect_children(topology->levels[0][0]); - if (hwloc_connect_levels(topology) < 0) - return -1; - need_reconnect = 0; - } - - err = backend->discover(backend); - if (err >= 0) { - gotsomeio += err; - if (err > 0) - need_reconnect++; - } - hwloc_debug_print_objects(0, topology->levels[0][0]); - -next_noncpubackend: - backend = backend->next; - } - - /* if we got anything, filter interesting objects and update the tree */ - if (gotsomeio) { - hwloc_drop_useless_io(topology, topology->levels[0][0]); - hwloc_debug("%s", "\nNow reconnecting\n"); - hwloc_debug_print_objects(0, topology->levels[0][0]); - hwloc_propagate_bridge_depth(topology, topology->levels[0][0], 0); - } - - /* Removed some stuff */ - - hwloc_debug("%s", "\nRemoving ignored objects\n"); - remove_ignored(topology, &topology->levels[0][0]); - hwloc_debug_print_objects(0, topology->levels[0][0]); - - hwloc_debug("%s", "\nRemoving empty objects except numa nodes and PCI devices\n"); - remove_empty(topology, &topology->levels[0][0]); - if (!topology->levels[0][0]) { - fprintf(stderr, "Topology became empty, aborting!\n"); - abort(); - } - hwloc_debug_print_objects(0, topology->levels[0][0]); - - hwloc_debug("%s", "\nRemoving 
objects whose type has HWLOC_IGNORE_TYPE_KEEP_STRUCTURE and have only one child or are the only child\n"); - merge_useless_child(topology, &topology->levels[0][0]); - hwloc_debug_print_objects(0, topology->levels[0][0]); - - /* Reconnect things after all these changes */ - hwloc_connect_children(topology->levels[0][0]); - if (hwloc_connect_levels(topology) < 0) - return -1; - - /* accumulate children memory in total_memory fields (only once parent is set) */ - hwloc_debug("%s", "\nPropagate total memory up\n"); - propagate_total_memory(topology->levels[0][0]); - - /* - * Now that objects are numbered, take distance matrices from backends and put them in the main topology. - * - * Some objects may have disappeared (in removed_empty or removed_ignored) since we setup os distances - * (hwloc_distances_finalize_os()) above. Reset them so as to not point to disappeared objects anymore. - */ - hwloc_distances_restrict_os(topology); - hwloc_distances_finalize_os(topology); - hwloc_distances_finalize_logical(topology); - - /* add some identification attributes if not loading from XML */ - if (topology->backends - && strcmp(topology->backends->component->name, "xml")) { - char *value; - /* add a hwlocVersion */ - hwloc_obj_add_info(topology->levels[0][0], "hwlocVersion", HWLOC_VERSION); - /* add a ProcessName */ - value = hwloc_progname(topology); - if (value) { - hwloc_obj_add_info(topology->levels[0][0], "ProcessName", value); - free(value); - } - } - - /* - * Now set binding hooks according to topology->is_thissystem - * what the native OS backend offers. - */ - hwloc_set_binding_hooks(topology); - - return 0; -} - -/* To be before discovery is actually launched, - * Resets everything in case a previous load initialized some stuff. 
- */ -void -hwloc_topology_setup_defaults(struct hwloc_topology *topology) -{ - struct hwloc_obj *root_obj; - unsigned l; - - /* reset support */ - memset(&topology->binding_hooks, 0, sizeof(topology->binding_hooks)); - memset(topology->support.discovery, 0, sizeof(*topology->support.discovery)); - memset(topology->support.cpubind, 0, sizeof(*topology->support.cpubind)); - memset(topology->support.membind, 0, sizeof(*topology->support.membind)); - - /* Only the System object on top by default */ - topology->nb_levels = 1; /* there's at least SYSTEM */ - topology->next_group_depth = 0; - topology->levels[0] = malloc (sizeof (hwloc_obj_t)); - topology->level_nbobjects[0] = 1; - /* NULLify other levels so that we can detect and free old ones in hwloc_connect_levels() if needed */ - memset(topology->levels+1, 0, (HWLOC_DEPTH_MAX-1)*sizeof(*topology->levels)); - topology->bridge_level = NULL; - topology->pcidev_level = NULL; - topology->osdev_level = NULL; - topology->first_bridge = topology->last_bridge = NULL; - topology->first_pcidev = topology->last_pcidev = NULL; - topology->first_osdev = topology->last_osdev = NULL; - /* sane values to type_depth */ - for (l = HWLOC_OBJ_SYSTEM; l < HWLOC_OBJ_MISC; l++) - topology->type_depth[l] = HWLOC_TYPE_DEPTH_UNKNOWN; - topology->type_depth[HWLOC_OBJ_BRIDGE] = HWLOC_TYPE_DEPTH_BRIDGE; - topology->type_depth[HWLOC_OBJ_PCI_DEVICE] = HWLOC_TYPE_DEPTH_PCI_DEVICE; - topology->type_depth[HWLOC_OBJ_OS_DEVICE] = HWLOC_TYPE_DEPTH_OS_DEVICE; - - /* Create the actual machine object, but don't touch its attributes yet - * since the OS backend may still change the object into something else - * (for instance System) - */ - root_obj = hwloc_alloc_setup_object(HWLOC_OBJ_MACHINE, 0); - root_obj->depth = 0; - root_obj->logical_index = 0; - root_obj->sibling_rank = 0; - topology->levels[0][0] = root_obj; -} - -int -hwloc_topology_init (struct hwloc_topology **topologyp) -{ - struct hwloc_topology *topology; - int i; - - topology = malloc 
(sizeof (struct hwloc_topology)); - if(!topology) - return -1; - - hwloc_components_init(topology); - - /* Setup topology context */ - topology->is_loaded = 0; - topology->flags = 0; - topology->is_thissystem = 1; - topology->pid = 0; - topology->userdata = NULL; - - topology->support.discovery = malloc(sizeof(*topology->support.discovery)); - topology->support.cpubind = malloc(sizeof(*topology->support.cpubind)); - topology->support.membind = malloc(sizeof(*topology->support.membind)); - - /* Only ignore useless cruft by default */ - for(i = HWLOC_OBJ_SYSTEM; i < HWLOC_OBJ_TYPE_MAX; i++) - topology->ignored_types[i] = HWLOC_IGNORE_TYPE_NEVER; - topology->ignored_types[HWLOC_OBJ_GROUP] = HWLOC_IGNORE_TYPE_KEEP_STRUCTURE; - - hwloc_distances_init(topology); - - topology->userdata_export_cb = NULL; - topology->userdata_import_cb = NULL; - topology->userdata_not_decoded = 0; - - /* Make the topology look like something coherent but empty */ - hwloc_topology_setup_defaults(topology); - - *topologyp = topology; - return 0; -} - -int -hwloc_topology_set_pid(struct hwloc_topology *topology __hwloc_attribute_unused, - hwloc_pid_t pid __hwloc_attribute_unused) -{ - /* this does *not* change the backend */ -#ifdef HWLOC_LINUX_SYS - topology->pid = pid; - return 0; -#else /* HWLOC_LINUX_SYS */ - errno = ENOSYS; - return -1; -#endif /* HWLOC_LINUX_SYS */ -} - -int -hwloc_topology_set_fsroot(struct hwloc_topology *topology, const char *fsroot_path) -{ - return hwloc_disc_component_force_enable(topology, - 0 /* api */, - HWLOC_DISC_COMPONENT_TYPE_CPU, "linux", - fsroot_path, NULL, NULL); -} - -int -hwloc_topology_set_synthetic(struct hwloc_topology *topology, const char *description) -{ - return hwloc_disc_component_force_enable(topology, - 0 /* api */, - -1, "synthetic", - description, NULL, NULL); -} - -int -hwloc_topology_set_xml(struct hwloc_topology *topology, - const char *xmlpath) -{ - return hwloc_disc_component_force_enable(topology, - 0 /* api */, - -1, "xml", - 
xmlpath, NULL, NULL); -} - -int -hwloc_topology_set_xmlbuffer(struct hwloc_topology *topology, - const char *xmlbuffer, - int size) -{ - return hwloc_disc_component_force_enable(topology, - 0 /* api */, - -1, "xml", NULL, - xmlbuffer, (void*) (uintptr_t) size); -} - -int -hwloc_topology_set_custom(struct hwloc_topology *topology) -{ - return hwloc_disc_component_force_enable(topology, - 0 /* api */, - -1, "custom", - NULL, NULL, NULL); -} - -int -hwloc_topology_set_flags (struct hwloc_topology *topology, unsigned long flags) -{ - if (topology->is_loaded) { - /* actually harmless */ - errno = EBUSY; - return -1; - } - topology->flags = flags; - return 0; -} - -unsigned long -hwloc_topology_get_flags (struct hwloc_topology *topology) -{ - return topology->flags; -} - -int -hwloc_topology_ignore_type(struct hwloc_topology *topology, hwloc_obj_type_t type) -{ - if (type >= HWLOC_OBJ_TYPE_MAX) { - errno = EINVAL; - return -1; - } - - if (type == HWLOC_OBJ_PU) { - /* we need the PU level */ - errno = EINVAL; - return -1; - } else if (hwloc_obj_type_is_io(type)) { - /* I/O devices aren't in any level, use topology flags to ignore them */ - errno = EINVAL; - return -1; - } - - topology->ignored_types[type] = HWLOC_IGNORE_TYPE_ALWAYS; - return 0; -} - -int -hwloc_topology_ignore_type_keep_structure(struct hwloc_topology *topology, hwloc_obj_type_t type) -{ - if (type >= HWLOC_OBJ_TYPE_MAX) { - errno = EINVAL; - return -1; - } - - if (type == HWLOC_OBJ_PU) { - /* we need the PU level */ - errno = EINVAL; - return -1; - } else if (hwloc_obj_type_is_io(type)) { - /* I/O devices aren't in any level, use topology flags to ignore them */ - errno = EINVAL; - return -1; - } - - topology->ignored_types[type] = HWLOC_IGNORE_TYPE_KEEP_STRUCTURE; - return 0; -} - -int -hwloc_topology_ignore_all_keep_structure(struct hwloc_topology *topology) -{ - unsigned type; - for(type = HWLOC_OBJ_SYSTEM; type < HWLOC_OBJ_TYPE_MAX; type++) - if (type != HWLOC_OBJ_PU - && 
!hwloc_obj_type_is_io((hwloc_obj_type_t) type)) - topology->ignored_types[type] = HWLOC_IGNORE_TYPE_KEEP_STRUCTURE; - return 0; -} - -/* traverse the tree and free everything. - * only use first_child/next_sibling so that it works before load() - * and may be used when switching between backend. - */ -static void -hwloc_topology_clear_tree (struct hwloc_topology *topology, struct hwloc_obj *root) -{ - hwloc_obj_t child = root->first_child; - while (child) { - hwloc_obj_t nextchild = child->next_sibling; - hwloc_topology_clear_tree (topology, child); - child = nextchild; - } - hwloc_free_unlinked_object (root); -} - -void -hwloc_topology_clear (struct hwloc_topology *topology) -{ - unsigned l; - hwloc_topology_clear_tree (topology, topology->levels[0][0]); - for (l=0; l<topology->nb_levels; l++) { - free(topology->levels[l]); - topology->levels[l] = NULL; - } - free(topology->bridge_level); - free(topology->pcidev_level); - free(topology->osdev_level); -} - -void -hwloc_topology_destroy (struct hwloc_topology *topology) -{ - hwloc_backends_disable_all(topology); - hwloc_components_destroy_all(topology); - - hwloc_topology_clear(topology); - hwloc_distances_destroy(topology); - - free(topology->support.discovery); - free(topology->support.cpubind); - free(topology->support.membind); - free(topology); -} - -int -hwloc_topology_load (struct hwloc_topology *topology) -{ - int err; - - if (topology->is_loaded) { - errno = EBUSY; - return -1; - } - - if (getenv("HWLOC_XML_USERDATA_NOT_DECODED")) - topology->userdata_not_decoded = 1; - - /* enforce backend anyway if a FORCE variable was given */ - { - const char *fsroot_path_env = getenv("HWLOC_FORCE_FSROOT"); - if (fsroot_path_env) - hwloc_disc_component_force_enable(topology, - 1 /* env force */, - HWLOC_DISC_COMPONENT_TYPE_CPU, "linux", - fsroot_path_env, NULL, NULL); - } - { - const char *xmlpath_env = getenv("HWLOC_FORCE_XMLFILE"); - if (xmlpath_env) - hwloc_disc_component_force_enable(topology, - 1 /* env force */, - -1, "xml",
- xmlpath_env, NULL, NULL); - } - - /* only apply non-FORCE variables if we have not changed the backend yet */ - if (!topology->backends) { - const char *fsroot_path_env = getenv("HWLOC_FSROOT"); - if (fsroot_path_env) - hwloc_disc_component_force_enable(topology, - 1 /* env force */, - HWLOC_DISC_COMPONENT_TYPE_CPU, "linux", - fsroot_path_env, NULL, NULL); - } - if (!topology->backends) { - const char *xmlpath_env = getenv("HWLOC_XMLFILE"); - if (xmlpath_env) - hwloc_disc_component_force_enable(topology, - 1 /* env force */, - -1, "xml", - xmlpath_env, NULL, NULL); - } - - /* instantiate all possible other backends now */ - hwloc_disc_components_enable_others(topology); - /* now that backends are enabled, update the thissystem flag */ - hwloc_backends_is_thissystem(topology); - - /* get distance matrix from the environment are store them (as indexes) in the topology. - * indexes will be converted into objects later once the tree will be filled - */ - hwloc_distances_set_from_env(topology); - - /* actual topology discovery */ - err = hwloc_discover(topology); - if (err < 0) - goto out; - -#ifndef HWLOC_DEBUG - if (getenv("HWLOC_DEBUG_CHECK")) -#endif - hwloc_topology_check(topology); - - topology->is_loaded = 1; - return 0; - - out: - hwloc_topology_clear(topology); - hwloc_distances_destroy(topology); - hwloc_topology_setup_defaults(topology); - hwloc_backends_disable_all(topology); - return -1; -} - -int -hwloc_topology_restrict(struct hwloc_topology *topology, hwloc_const_cpuset_t cpuset, unsigned long flags) -{ - hwloc_bitmap_t droppedcpuset, droppednodeset; - - /* make sure we'll keep something in the topology */ - if (!hwloc_bitmap_intersects(cpuset, topology->levels[0][0]->cpuset)) { - errno = EINVAL; /* easy failure, just don't touch the topology */ - return -1; - } - - droppedcpuset = hwloc_bitmap_alloc(); - droppednodeset = hwloc_bitmap_alloc(); - - /* drop object based on the reverse of cpuset, and fill the 'dropped' nodeset */ - 
hwloc_bitmap_not(droppedcpuset, cpuset); - restrict_object(topology, flags, &topology->levels[0][0], droppedcpuset, droppednodeset, 0 /* root cannot be removed */); - /* update nodesets according to dropped nodeset */ - restrict_object_nodeset(topology, &topology->levels[0][0], droppednodeset); - - hwloc_bitmap_free(droppedcpuset); - hwloc_bitmap_free(droppednodeset); - - hwloc_connect_children(topology->levels[0][0]); - if (hwloc_connect_levels(topology) < 0) - goto out; - - propagate_total_memory(topology->levels[0][0]); - hwloc_distances_restrict(topology, flags); - hwloc_distances_finalize_os(topology); - hwloc_distances_finalize_logical(topology); - return 0; - - out: - /* unrecoverable failure, re-init the topology */ - hwloc_topology_clear(topology); - hwloc_distances_destroy(topology); - hwloc_topology_setup_defaults(topology); - return -1; -} - -int -hwloc_topology_is_thissystem(struct hwloc_topology *topology) -{ - return topology->is_thissystem; -} - -unsigned -hwloc_topology_get_depth(struct hwloc_topology *topology) -{ - return topology->nb_levels; -} - -/* check children between a parent object */ -static void -hwloc__check_children(struct hwloc_obj *parent) -{ - unsigned j; - - if (!parent->arity) { - /* check whether that parent has no children for real */ - assert(!parent->children); - assert(!parent->first_child); - assert(!parent->last_child); - return; - } - /* check whether that parent has children for real */ - assert(parent->children); - assert(parent->first_child); - assert(parent->last_child); - - /* first child specific checks */ - assert(parent->first_child->sibling_rank == 0); - assert(parent->first_child == parent->children[0]); - assert(parent->first_child->prev_sibling == NULL); - - /* last child specific checks */ - assert(parent->last_child->sibling_rank == parent->arity-1); - assert(parent->last_child == parent->children[parent->arity-1]); - assert(parent->last_child->next_sibling == NULL); - - /* check that parent->cpuset == 
exclusive OR of children - * (can be wrong for complete_cpuset since disallowed/offline/unknown PUs can be removed) - */ - if (parent->cpuset) { - hwloc_bitmap_t remaining_parent_set = hwloc_bitmap_dup(parent->cpuset); - for(j=0; j<parent->arity; j++) { - if (!parent->children[j]->cpuset) - continue; - /* check that child cpuset is included in the reminder of the parent */ - assert(hwloc_bitmap_isincluded(parent->children[j]->cpuset, remaining_parent_set)); - hwloc_bitmap_andnot(remaining_parent_set, remaining_parent_set, parent->children[j]->cpuset); - } - if (parent->type == HWLOC_OBJ_PU) { - /* if parent is a PU, its os_index bit may remain. - * it may be in a Misc child inserted by cpuset, or could be in no child */ - if (hwloc_bitmap_weight(remaining_parent_set) == 1) - assert((unsigned) hwloc_bitmap_first(remaining_parent_set) == parent->os_index); - else - assert(hwloc_bitmap_iszero(remaining_parent_set)); - } else { - /* nothing remains */ - assert(hwloc_bitmap_iszero(remaining_parent_set)); - } - hwloc_bitmap_free(remaining_parent_set); - } - - /* check that children complete_cpuset are properly ordered, empty ones may be anywhere - * (can be wrong for main cpuset since removed PUs can break the ordering). 
- */ - if (parent->complete_cpuset) { - int firstchild; - int prev_firstchild = -1; /* -1 works fine with first comparisons below */ - for(j=0; j<parent->arity; j++) { - if (!parent->children[j]->complete_cpuset - || hwloc_bitmap_iszero(parent->children[j]->complete_cpuset)) - continue; - - firstchild = hwloc_bitmap_first(parent->children[j]->complete_cpuset); - assert(prev_firstchild < firstchild); - prev_firstchild = firstchild; - } - } - - /* checks for all children */ - for(j=1; j<parent->arity; j++) { - assert(parent->children[j]->parent == parent); - assert(parent->children[j]->sibling_rank == j); - assert(parent->children[j-1]->next_sibling == parent->children[j]); - assert(parent->children[j]->prev_sibling == parent->children[j-1]); - } -} - -static void -hwloc__check_children_depth(struct hwloc_topology *topology, struct hwloc_obj *parent) -{ - hwloc_obj_t child = NULL; - while ((child = hwloc_get_next_child(topology, parent, child)) != NULL) { - if (child->type == HWLOC_OBJ_BRIDGE) - assert(child->depth == (unsigned) HWLOC_TYPE_DEPTH_BRIDGE); - else if (child->type == HWLOC_OBJ_PCI_DEVICE) - assert(child->depth == (unsigned) HWLOC_TYPE_DEPTH_PCI_DEVICE); - else if (child->type == HWLOC_OBJ_OS_DEVICE) - assert(child->depth == (unsigned) HWLOC_TYPE_DEPTH_OS_DEVICE); - else if (child->type == HWLOC_OBJ_MISC) - assert(child->depth == (unsigned) -1); - else if (parent->depth != (unsigned) -1) - assert(child->depth > parent->depth); - hwloc__check_children_depth(topology, child); - } -} - -/* check a whole topology structure */ -void -hwloc_topology_check(struct hwloc_topology *topology) -{ - struct hwloc_obj *obj; - hwloc_obj_type_t type; - unsigned i, j, depth; - - /* check type orders */ - for (type = HWLOC_OBJ_SYSTEM; type < HWLOC_OBJ_TYPE_MAX; type++) { - assert(hwloc_get_order_type(hwloc_get_type_order(type)) == type); - } - for (i = hwloc_get_type_order(HWLOC_OBJ_SYSTEM); - i <= 
hwloc_get_type_order(HWLOC_OBJ_CORE); i++) { - assert(i == 
hwloc_get_type_order(hwloc_get_order_type(i))); - } - - /* check that last level is PU */ - assert(hwloc_get_depth_type(topology, hwloc_topology_get_depth(topology)-1) == HWLOC_OBJ_PU); - /* check that other levels are not PU */ - for(i=1; iparent); - - depth = hwloc_topology_get_depth(topology); - - /* check each level */ - for(i=0; idepth == i); - assert(obj->logical_index == j); - /* check that all objects in the level have the same type */ - if (prev) { - assert(hwloc_type_cmp(obj, prev) == HWLOC_OBJ_EQUAL); - assert(prev->next_cousin == obj); - assert(obj->prev_cousin == prev); - } - if (obj->complete_cpuset) { - if (obj->cpuset) - assert(hwloc_bitmap_isincluded(obj->cpuset, obj->complete_cpuset)); - if (obj->online_cpuset) - assert(hwloc_bitmap_isincluded(obj->online_cpuset, obj->complete_cpuset)); - if (obj->allowed_cpuset) - assert(hwloc_bitmap_isincluded(obj->allowed_cpuset, obj->complete_cpuset)); - } - if (obj->complete_nodeset) { - if (obj->nodeset) - assert(hwloc_bitmap_isincluded(obj->nodeset, obj->complete_nodeset)); - if (obj->allowed_nodeset) - assert(hwloc_bitmap_isincluded(obj->allowed_nodeset, obj->complete_nodeset)); - } - /* check that PUs and NUMA nodes have cpuset/nodeset */ - if (obj->type == HWLOC_OBJ_PU) { - assert(obj->cpuset); - assert(hwloc_bitmap_weight(obj->complete_cpuset) == 1); - assert(hwloc_bitmap_first(obj->complete_cpuset) == (int) obj->os_index); - } - if (obj->type == HWLOC_OBJ_NUMANODE) { - assert(obj->nodeset); - assert(hwloc_bitmap_weight(obj->complete_nodeset) == 1); - assert(hwloc_bitmap_first(obj->complete_nodeset) == (int) obj->os_index); - } - /* check children */ - hwloc__check_children(obj); - prev = obj; - } - - /* check first object of the level */ - obj = hwloc_get_obj_by_depth(topology, i, 0); - assert(obj); - assert(!obj->prev_cousin); - - /* check type */ - assert(hwloc_get_depth_type(topology, i) == obj->type); - assert(i == (unsigned) hwloc_get_type_depth(topology, obj->type) || - HWLOC_TYPE_DEPTH_MULTIPLE 
== hwloc_get_type_depth(topology, obj->type)); - - /* check last object of the level */ - obj = hwloc_get_obj_by_depth(topology, i, width-1); - assert(obj); - assert(!obj->next_cousin); - - /* check last+1 object of the level */ - obj = hwloc_get_obj_by_depth(topology, i, width); - assert(!obj); - } - - /* check bottom objects */ - assert(hwloc_get_nbobjs_by_depth(topology, depth-1) > 0); - for(j=0; jtype == HWLOC_OBJ_PU); - } - - /* check relative depths */ - obj = hwloc_get_root_obj(topology); - assert(obj->depth == 0); - hwloc__check_children_depth(topology, obj); -} - -const struct hwloc_topology_support * -hwloc_topology_get_support(struct hwloc_topology * topology) -{ - return &topology->support; -} - -void hwloc_topology_set_userdata(struct hwloc_topology * topology, const void *userdata) -{ - topology->userdata = (void *) userdata; -} - -void * hwloc_topology_get_userdata(struct hwloc_topology * topology) -{ - return topology->userdata; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc/src/traversal.c b/opal/mca/hwloc/hwloc1113/hwloc/src/traversal.c deleted file mode 100644 index ac10d501789..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc/src/traversal.c +++ /dev/null @@ -1,714 +0,0 @@ -/* - * Copyright © 2009 CNRS - * Copyright © 2009-2016 Inria. All rights reserved. - * Copyright © 2009-2010 Université Bordeaux - * Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved. - * See COPYING in top-level directory. 
- */ - -#include -#include -#include -#include -#include -#ifdef HAVE_STRINGS_H -#include -#endif /* HAVE_STRINGS_H */ - -int -hwloc_get_type_depth (struct hwloc_topology *topology, hwloc_obj_type_t type) -{ - return topology->type_depth[type]; -} - -hwloc_obj_type_t -hwloc_get_depth_type (hwloc_topology_t topology, unsigned depth) -{ - if (depth >= topology->nb_levels) - switch (depth) { - case HWLOC_TYPE_DEPTH_BRIDGE: - return HWLOC_OBJ_BRIDGE; - case HWLOC_TYPE_DEPTH_PCI_DEVICE: - return HWLOC_OBJ_PCI_DEVICE; - case HWLOC_TYPE_DEPTH_OS_DEVICE: - return HWLOC_OBJ_OS_DEVICE; - default: - return (hwloc_obj_type_t) -1; - } - return topology->levels[depth][0]->type; -} - -unsigned -hwloc_get_nbobjs_by_depth (struct hwloc_topology *topology, unsigned depth) -{ - if (depth >= topology->nb_levels) - switch (depth) { - case HWLOC_TYPE_DEPTH_BRIDGE: - return topology->bridge_nbobjects; - case HWLOC_TYPE_DEPTH_PCI_DEVICE: - return topology->pcidev_nbobjects; - case HWLOC_TYPE_DEPTH_OS_DEVICE: - return topology->osdev_nbobjects; - default: - return 0; - } - return topology->level_nbobjects[depth]; -} - -struct hwloc_obj * -hwloc_get_obj_by_depth (struct hwloc_topology *topology, unsigned depth, unsigned idx) -{ - if (depth >= topology->nb_levels) - switch (depth) { - case HWLOC_TYPE_DEPTH_BRIDGE: - return idx < topology->bridge_nbobjects ? topology->bridge_level[idx] : NULL; - case HWLOC_TYPE_DEPTH_PCI_DEVICE: - return idx < topology->pcidev_nbobjects ? topology->pcidev_level[idx] : NULL; - case HWLOC_TYPE_DEPTH_OS_DEVICE: - return idx < topology->osdev_nbobjects ? 
topology->osdev_level[idx] : NULL; - default: - return NULL; - } - if (idx >= topology->level_nbobjects[depth]) - return NULL; - return topology->levels[depth][idx]; -} - -unsigned hwloc_get_closest_objs (struct hwloc_topology *topology, struct hwloc_obj *src, struct hwloc_obj **objs, unsigned max) -{ - struct hwloc_obj *parent, *nextparent, **src_objs; - int i,src_nbobjects; - unsigned stored = 0; - - if (!src->cpuset) - return 0; - - src_nbobjects = topology->level_nbobjects[src->depth]; - src_objs = topology->levels[src->depth]; - - parent = src; - while (stored < max) { - while (1) { - nextparent = parent->parent; - if (!nextparent) - goto out; - if (!nextparent->cpuset || !hwloc_bitmap_isequal(parent->cpuset, nextparent->cpuset)) - break; - parent = nextparent; - } - - if (!nextparent->cpuset) - break; - - /* traverse src's objects and find those that are in nextparent and were not in parent */ - for(i=0; i<src_nbobjects; i++) { - if (hwloc_bitmap_isincluded(src_objs[i]->cpuset, nextparent->cpuset) - && !hwloc_bitmap_isincluded(src_objs[i]->cpuset, parent->cpuset)) { - objs[stored++] = src_objs[i]; - if (stored == max) - goto out; - } - } - parent = nextparent; - } - - out: - return stored; -} - -static int -hwloc__get_largest_objs_inside_cpuset (struct hwloc_obj *current, hwloc_const_bitmap_t set, - struct hwloc_obj ***res, int *max) -{ - int gotten = 0; - unsigned i; - - /* the caller must ensure this */ - if (*max <= 0) - return 0; - - if (hwloc_bitmap_isequal(current->cpuset, set)) { - **res = current; - (*res)++; - (*max)--; - return 1; - } - - for (i=0; i<current->arity; i++) { - hwloc_bitmap_t subset = hwloc_bitmap_dup(set); - int ret; - - /* split out the cpuset part corresponding to this child and see if there's anything to do */ - if (current->children[i]->cpuset) { - hwloc_bitmap_and(subset, subset, current->children[i]->cpuset); - if (hwloc_bitmap_iszero(subset)) { - hwloc_bitmap_free(subset); - continue; - } - } - - ret = 
hwloc__get_largest_objs_inside_cpuset (current->children[i], subset, res, max); - gotten += ret; - 
hwloc_bitmap_free(subset); - - /* if no more room to store remaining objects, return what we got so far */ - if (!*max) - break; - } - - return gotten; -} - -int -hwloc_get_largest_objs_inside_cpuset (struct hwloc_topology *topology, hwloc_const_bitmap_t set, - struct hwloc_obj **objs, int max) -{ - struct hwloc_obj *current = topology->levels[0][0]; - - if (!current->cpuset || !hwloc_bitmap_isincluded(set, current->cpuset)) - return -1; - - if (max <= 0) - return 0; - - return hwloc__get_largest_objs_inside_cpuset (current, set, &objs, &max); -} - -const char * -hwloc_obj_type_string (hwloc_obj_type_t obj) -{ - switch (obj) - { - case HWLOC_OBJ_SYSTEM: return "System"; - case HWLOC_OBJ_MACHINE: return "Machine"; - case HWLOC_OBJ_MISC: return "Misc"; - case HWLOC_OBJ_GROUP: return "Group"; - case HWLOC_OBJ_NUMANODE: return "NUMANode"; - case HWLOC_OBJ_PACKAGE: return "Package"; - case HWLOC_OBJ_CACHE: return "Cache"; - case HWLOC_OBJ_CORE: return "Core"; - case HWLOC_OBJ_BRIDGE: return "Bridge"; - case HWLOC_OBJ_PCI_DEVICE: return "PCIDev"; - case HWLOC_OBJ_OS_DEVICE: return "OSDev"; - case HWLOC_OBJ_PU: return "PU"; - default: return "Unknown"; - } -} - -hwloc_obj_type_t -hwloc_obj_type_of_string (const char * string) -{ - if (!strcasecmp(string, "System")) return HWLOC_OBJ_SYSTEM; - if (!strcasecmp(string, "Machine")) return HWLOC_OBJ_MACHINE; - if (!strcasecmp(string, "Misc")) return HWLOC_OBJ_MISC; - if (!strcasecmp(string, "Group")) return HWLOC_OBJ_GROUP; - if (!strcasecmp(string, "NUMANode") || !strcasecmp(string, "Node")) return HWLOC_OBJ_NUMANODE; - if (!strcasecmp(string, "Package") || !strcasecmp(string, "Socket") /* backward compat with v1.10 */) return HWLOC_OBJ_PACKAGE; - if (!strcasecmp(string, "Cache")) return HWLOC_OBJ_CACHE; - if (!strcasecmp(string, "Core")) return HWLOC_OBJ_CORE; - if (!strcasecmp(string, "PU")) return HWLOC_OBJ_PU; - if (!strcasecmp(string, "Bridge") || !strcasecmp(string, "HostBridge") || !strcasecmp(string, "PCIBridge")) 
return HWLOC_OBJ_BRIDGE; - if (!strcasecmp(string, "PCIDev")) return HWLOC_OBJ_PCI_DEVICE; - if (!strcasecmp(string, "OSDev")) return HWLOC_OBJ_OS_DEVICE; - return (hwloc_obj_type_t) -1; -} - -int -hwloc_obj_type_sscanf(const char *string, hwloc_obj_type_t *typep, int *depthattrp, void *typeattrp, size_t typeattrsize) -{ - hwloc_obj_type_t type = (hwloc_obj_type_t) -1; - int depthattr = -1; - hwloc_obj_cache_type_t cachetypeattr = (hwloc_obj_cache_type_t) -1; /* unspecified */ - char *end; - - /* never match the ending \0 since we want to match things like core:2 too. - * just use hwloc_strncasecmp() everywhere. - */ - - /* types without depthattr */ - if (!hwloc_strncasecmp(string, "system", 2)) { - type = HWLOC_OBJ_SYSTEM; - } else if (!hwloc_strncasecmp(string, "machine", 2)) { - type = HWLOC_OBJ_MACHINE; - } else if (!hwloc_strncasecmp(string, "node", 2) - || !hwloc_strncasecmp(string, "numa", 2)) { /* matches node and numanode */ - type = HWLOC_OBJ_NUMANODE; - } else if (!hwloc_strncasecmp(string, "package", 2) - || !hwloc_strncasecmp(string, "socket", 2)) { /* backward compat with v1.10 */ - type = HWLOC_OBJ_PACKAGE; - } else if (!hwloc_strncasecmp(string, "core", 2)) { - type = HWLOC_OBJ_CORE; - } else if (!hwloc_strncasecmp(string, "pu", 2)) { - type = HWLOC_OBJ_PU; - } else if (!hwloc_strncasecmp(string, "misc", 4)) { - type = HWLOC_OBJ_MISC; - } else if (!hwloc_strncasecmp(string, "bridge", 4) - || !hwloc_strncasecmp(string, "hostbridge", 6) - || !hwloc_strncasecmp(string, "pcibridge", 5)) { - type = HWLOC_OBJ_BRIDGE; - } else if (!hwloc_strncasecmp(string, "pci", 3)) { - type = HWLOC_OBJ_PCI_DEVICE; - } else if (!hwloc_strncasecmp(string, "os", 2) - || !hwloc_strncasecmp(string, "bloc", 4) - || !hwloc_strncasecmp(string, "net", 3) - || !hwloc_strncasecmp(string, "openfab", 7) - || !hwloc_strncasecmp(string, "dma", 3) - || !hwloc_strncasecmp(string, "gpu", 3) - || !hwloc_strncasecmp(string, "copro", 5) - || !hwloc_strncasecmp(string, "co-pro", 6)) { - 
type = HWLOC_OBJ_OS_DEVICE; - - /* types with depthattr */ - } else if (!hwloc_strncasecmp(string, "cache", 2)) { - type = HWLOC_OBJ_CACHE; - - } else if ((string[0] == 'l' || string[0] == 'L') && string[1] >= '0' && string[1] <= '9') { - type = HWLOC_OBJ_CACHE; - depthattr = strtol(string+1, &end, 10); - if (*end == 'd') { - cachetypeattr = HWLOC_OBJ_CACHE_DATA; - } else if (*end == 'i') { - cachetypeattr = HWLOC_OBJ_CACHE_INSTRUCTION; - } else if (*end == 'u') { - cachetypeattr = HWLOC_OBJ_CACHE_UNIFIED; - } - - } else if (!hwloc_strncasecmp(string, "group", 2)) { - size_t length; - type = HWLOC_OBJ_GROUP; - length = strcspn(string, "0123456789"); - if (length <= 5 && !hwloc_strncasecmp(string, "group", length) - && string[length] >= '0' && string[length] <= '9') { - depthattr = strtol(string+length, &end, 10); - } - } else - return -1; - - *typep = type; - if (depthattrp) - *depthattrp = depthattr; - if (typeattrp) { - if (type == HWLOC_OBJ_CACHE && sizeof(hwloc_obj_cache_type_t) <= typeattrsize) - memcpy(typeattrp, &cachetypeattr, sizeof(hwloc_obj_cache_type_t)); - } - - return 0; -} - -static const char * -hwloc_pci_class_string(unsigned short class_id) -{ - switch ((class_id & 0xff00) >> 8) { - case 0x00: - switch (class_id) { - case 0x0001: return "VGA"; - } - return "PCI"; - case 0x01: - switch (class_id) { - case 0x0100: return "SCSI"; - case 0x0101: return "IDE"; - case 0x0102: return "Flop"; - case 0x0103: return "IPI"; - case 0x0104: return "RAID"; - case 0x0105: return "ATA"; - case 0x0106: return "SATA"; - case 0x0107: return "SAS"; - case 0x0108: return "NVMExp"; - } - return "Stor"; - case 0x02: - switch (class_id) { - case 0x0200: return "Ether"; - case 0x0201: return "TokRn"; - case 0x0202: return "FDDI"; - case 0x0203: return "ATM"; - case 0x0204: return "ISDN"; - case 0x0205: return "WrdFip"; - case 0x0206: return "PICMG"; - case 0x0207: return "IB"; - } - return "Net"; - case 0x03: - switch (class_id) { - case 0x0300: return "VGA"; - case 
0x0301: return "XGA"; - case 0x0302: return "3D"; - } - return "Disp"; - case 0x04: - switch (class_id) { - case 0x0400: return "Video"; - case 0x0401: return "Audio"; - case 0x0402: return "Phone"; - case 0x0403: return "Auddv"; - } - return "MM"; - case 0x05: - switch (class_id) { - case 0x0500: return "RAM"; - case 0x0501: return "Flash"; - } - return "Mem"; - case 0x06: - switch (class_id) { - case 0x0600: return "Host"; - case 0x0601: return "ISA"; - case 0x0602: return "EISA"; - case 0x0603: return "MC"; - case 0x0604: return "PCI_B"; - case 0x0605: return "PCMCIA"; - case 0x0606: return "Nubus"; - case 0x0607: return "CardBus"; - case 0x0608: return "RACEway"; - case 0x0609: return "PCI_SB"; - case 0x060a: return "IB_B"; - } - return "Bridg"; - case 0x07: - switch (class_id) { - case 0x0700: return "Ser"; - case 0x0701: return "Para"; - case 0x0702: return "MSer"; - case 0x0703: return "Modm"; - case 0x0704: return "GPIB"; - case 0x0705: return "SmrtCrd"; - } - return "Comm"; - case 0x08: - switch (class_id) { - case 0x0800: return "PIC"; - case 0x0801: return "DMA"; - case 0x0802: return "Time"; - case 0x0803: return "RTC"; - case 0x0804: return "HtPl"; - case 0x0805: return "SD-HtPl"; - case 0x0806: return "IOMMU"; - } - return "Syst"; - case 0x09: - switch (class_id) { - case 0x0900: return "Kbd"; - case 0x0901: return "Pen"; - case 0x0902: return "Mouse"; - case 0x0903: return "Scan"; - case 0x0904: return "Game"; - } - return "In"; - case 0x0a: - return "Dock"; - case 0x0b: - switch (class_id) { - case 0x0b00: return "386"; - case 0x0b01: return "486"; - case 0x0b02: return "Pent"; - case 0x0b10: return "Alpha"; - case 0x0b20: return "PPC"; - case 0x0b30: return "MIPS"; - case 0x0b40: return "CoProc"; - } - return "Proc"; - case 0x0c: - switch (class_id) { - case 0x0c00: return "Firw"; - case 0x0c01: return "ACCES"; - case 0x0c02: return "SSA"; - case 0x0c03: return "USB"; - case 0x0c04: return "Fibre"; - case 0x0c05: return "SMBus"; - case 0x0c06: 
return "IB"; - case 0x0c07: return "IPMI"; - case 0x0c08: return "SERCOS"; - case 0x0c09: return "CANBUS"; - } - return "Ser"; - case 0x0d: - switch (class_id) { - case 0x0d00: return "IRDA"; - case 0x0d01: return "IR"; - case 0x0d10: return "RF"; - case 0x0d11: return "Blueth"; - case 0x0d12: return "BroadB"; - case 0x0d20: return "802.1a"; - case 0x0d21: return "802.1b"; - } - return "Wifi"; - case 0x0e: - switch (class_id) { - case 0x0e00: return "I2O"; - } - return "Intll"; - case 0x0f: - switch (class_id) { - case 0x0f00: return "S-TV"; - case 0x0f01: return "S-Aud"; - case 0x0f02: return "S-Voice"; - case 0x0f03: return "S-Data"; - } - return "Satel"; - case 0x10: - return "Crypt"; - case 0x11: - return "Signl"; - case 0x12: - return "Accel"; - case 0x13: - return "Instr"; - case 0xff: - return "Oth"; - } - return "PCI"; -} - -static const char* hwloc_obj_cache_type_letter(hwloc_obj_cache_type_t type) -{ - switch (type) { - case HWLOC_OBJ_CACHE_UNIFIED: return ""; - case HWLOC_OBJ_CACHE_DATA: return "d"; - case HWLOC_OBJ_CACHE_INSTRUCTION: return "i"; - default: return "unknown"; - } -} - -int -hwloc_obj_type_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, int verbose) -{ - hwloc_obj_type_t type = obj->type; - switch (type) { - case HWLOC_OBJ_MISC: - case HWLOC_OBJ_SYSTEM: - case HWLOC_OBJ_MACHINE: - case HWLOC_OBJ_NUMANODE: - case HWLOC_OBJ_PACKAGE: - case HWLOC_OBJ_CORE: - case HWLOC_OBJ_PU: - return hwloc_snprintf(string, size, "%s", hwloc_obj_type_string(type)); - case HWLOC_OBJ_CACHE: - return hwloc_snprintf(string, size, "L%u%s%s", obj->attr->cache.depth, - hwloc_obj_cache_type_letter(obj->attr->cache.type), - verbose ? hwloc_obj_type_string(type): ""); - case HWLOC_OBJ_GROUP: - /* TODO: more pretty presentation? 
*/ - if (obj->attr->group.depth != (unsigned) -1) - return hwloc_snprintf(string, size, "%s%u", hwloc_obj_type_string(type), obj->attr->group.depth); - else - return hwloc_snprintf(string, size, "%s", hwloc_obj_type_string(type)); - case HWLOC_OBJ_BRIDGE: - if (verbose) - return snprintf(string, size, "Bridge %s->%s", - obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI ? "PCI" : "Host", - "PCI"); - else - return snprintf(string, size, obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI ? "PCIBridge" : "HostBridge"); - case HWLOC_OBJ_PCI_DEVICE: - return snprintf(string, size, "PCI %04x:%04x", - obj->attr->pcidev.vendor_id, obj->attr->pcidev.device_id); - case HWLOC_OBJ_OS_DEVICE: - switch (obj->attr->osdev.type) { - case HWLOC_OBJ_OSDEV_BLOCK: return hwloc_snprintf(string, size, "Block"); - case HWLOC_OBJ_OSDEV_NETWORK: return hwloc_snprintf(string, size, verbose ? "Network" : "Net"); - case HWLOC_OBJ_OSDEV_OPENFABRICS: return hwloc_snprintf(string, size, "OpenFabrics"); - case HWLOC_OBJ_OSDEV_DMA: return hwloc_snprintf(string, size, "DMA"); - case HWLOC_OBJ_OSDEV_GPU: return hwloc_snprintf(string, size, "GPU"); - case HWLOC_OBJ_OSDEV_COPROC: return hwloc_snprintf(string, size, verbose ? 
"Co-Processor" : "CoProc"); - default: - if (size > 0) - *string = '\0'; - return 0; - } - break; - default: - if (size > 0) - *string = '\0'; - return 0; - } -} - -int -hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t obj, const char * separator, int verbose) -{ - const char *prefix = ""; - char *tmp = string; - ssize_t tmplen = size; - int ret = 0; - int res; - - /* make sure we output at least an empty string */ - if (size) - *string = '\0'; - - /* print memory attributes */ - res = 0; - if (verbose) { - if (obj->memory.local_memory) - res = hwloc_snprintf(tmp, tmplen, "%slocal=%lu%s%stotal=%lu%s", - prefix, - (unsigned long) hwloc_memory_size_printf_value(obj->memory.local_memory, verbose), - hwloc_memory_size_printf_unit(obj->memory.total_memory, verbose), - separator, - (unsigned long) hwloc_memory_size_printf_value(obj->memory.total_memory, verbose), - hwloc_memory_size_printf_unit(obj->memory.local_memory, verbose)); - else if (obj->memory.total_memory) - res = hwloc_snprintf(tmp, tmplen, "%stotal=%lu%s", - prefix, - (unsigned long) hwloc_memory_size_printf_value(obj->memory.total_memory, verbose), - hwloc_memory_size_printf_unit(obj->memory.total_memory, verbose)); - } else { - if (obj->memory.local_memory) - res = hwloc_snprintf(tmp, tmplen, "%s%lu%s", - prefix, - (unsigned long) hwloc_memory_size_printf_value(obj->memory.local_memory, verbose), - hwloc_memory_size_printf_unit(obj->memory.local_memory, verbose)); - } - if (res < 0) - return -1; - ret += res; - if (ret > 0) - prefix = separator; - if (res >= tmplen) - res = tmplen>0 ? 
(int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - - /* printf type-specific attributes */ - res = 0; - switch (obj->type) { - case HWLOC_OBJ_CACHE: - if (verbose) { - char assoc[32]; - if (obj->attr->cache.associativity == -1) - snprintf(assoc, sizeof(assoc), "%sfully-associative", separator); - else if (obj->attr->cache.associativity == 0) - *assoc = '\0'; - else - snprintf(assoc, sizeof(assoc), "%sways=%d", separator, obj->attr->cache.associativity); - res = hwloc_snprintf(tmp, tmplen, "%ssize=%lu%s%slinesize=%u%s", - prefix, - (unsigned long) hwloc_memory_size_printf_value(obj->attr->cache.size, verbose), - hwloc_memory_size_printf_unit(obj->attr->cache.size, verbose), - separator, obj->attr->cache.linesize, - assoc); - } else - res = hwloc_snprintf(tmp, tmplen, "%s%lu%s", - prefix, - (unsigned long) hwloc_memory_size_printf_value(obj->attr->cache.size, verbose), - hwloc_memory_size_printf_unit(obj->attr->cache.size, verbose)); - break; - case HWLOC_OBJ_BRIDGE: - if (verbose) { - char up[128], down[64]; - /* upstream is PCI or HOST */ - if (obj->attr->bridge.upstream_type == HWLOC_OBJ_BRIDGE_PCI) { - char linkspeed[64]= ""; - if (obj->attr->pcidev.linkspeed) - snprintf(linkspeed, sizeof(linkspeed), "%slink=%.2fGB/s", separator, obj->attr->pcidev.linkspeed); - snprintf(up, sizeof(up), "busid=%04x:%02x:%02x.%01x%sid=%04x:%04x%sclass=%04x(%s)%s", - obj->attr->pcidev.domain, obj->attr->pcidev.bus, obj->attr->pcidev.dev, obj->attr->pcidev.func, separator, - obj->attr->pcidev.vendor_id, obj->attr->pcidev.device_id, separator, - obj->attr->pcidev.class_id, hwloc_pci_class_string(obj->attr->pcidev.class_id), linkspeed); - } else - *up = '\0'; - /* downstream is_PCI */ - snprintf(down, sizeof(down), "buses=%04x:[%02x-%02x]", - obj->attr->bridge.downstream.pci.domain, obj->attr->bridge.downstream.pci.secondary_bus, obj->attr->bridge.downstream.pci.subordinate_bus); - if (*up) - res = snprintf(string, size, "%s%s%s", up, separator, down); - else - res = 
snprintf(string, size, "%s", down); - } - break; - case HWLOC_OBJ_PCI_DEVICE: - if (verbose) { - char linkspeed[64]= ""; - char busid[16] = "[collapsed]"; - if (obj->attr->pcidev.linkspeed) - snprintf(linkspeed, sizeof(linkspeed), "%slink=%.2fGB/s", separator, obj->attr->pcidev.linkspeed); - if (!hwloc_obj_get_info_by_name(obj, "lstopoCollapse")) - snprintf(busid, sizeof(busid), "%04x:%02x:%02x.%01x", - obj->attr->pcidev.domain, obj->attr->pcidev.bus, obj->attr->pcidev.dev, obj->attr->pcidev.func); - res = snprintf(string, size, "busid=%s%sclass=%04x(%s)%s", - busid, separator, - obj->attr->pcidev.class_id, hwloc_pci_class_string(obj->attr->pcidev.class_id), linkspeed); - } - break; - default: - break; - } - if (res < 0) - return -1; - ret += res; - if (ret > 0) - prefix = separator; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - - /* printf infos */ - if (verbose) { - unsigned i; - for(i=0; i<obj->infos_count; i++) { - if (!strcmp(obj->infos[i].name, "lstopoCollapse")) - continue; - if (strchr(obj->infos[i].value, ' ')) - res = hwloc_snprintf(tmp, tmplen, "%s%s=\"%s\"", - prefix, - obj->infos[i].name, obj->infos[i].value); - else - res = hwloc_snprintf(tmp, tmplen, "%s%s=%s", - prefix, - obj->infos[i].name, obj->infos[i].value); - if (res < 0) - return -1; - ret += res; - if (res >= tmplen) - res = tmplen>0 ? (int)tmplen - 1 : 0; - tmp += res; - tmplen -= res; - if (ret > 0) - prefix = separator; - } - } - - return ret; -} - - -int -hwloc_obj_snprintf(char *string, size_t size, - struct hwloc_topology *topology __hwloc_attribute_unused, struct hwloc_obj *l, const char *_indexprefix, int verbose) -{ - const char *indexprefix = _indexprefix ? 
_indexprefix : "#"; - char os_index[12] = ""; - char type[64]; - char attr[128]; - int attrlen; - - if (l->os_index != (unsigned) -1) { - hwloc_snprintf(os_index, 12, "%s%u", indexprefix, l->os_index); - } - - hwloc_obj_type_snprintf(type, sizeof(type), l, verbose); - attrlen = hwloc_obj_attr_snprintf(attr, sizeof(attr), l, " ", verbose); - - if (attrlen > 0) - return hwloc_snprintf(string, size, "%s%s(%s)", type, os_index, attr); - else - return hwloc_snprintf(string, size, "%s%s", type, os_index); -} - -int hwloc_obj_cpuset_snprintf(char *str, size_t size, size_t nobj, struct hwloc_obj * const *objs) -{ - hwloc_bitmap_t set = hwloc_bitmap_alloc(); - int res; - unsigned i; - - hwloc_bitmap_zero(set); - for(i=0; i<nobj; i++) - if (objs[i]->cpuset) - hwloc_bitmap_or(set, set, objs[i]->cpuset); - - res = hwloc_bitmap_snprintf(str, size, set); - hwloc_bitmap_free(set); - return res; -} diff --git a/opal/mca/hwloc/hwloc1113/hwloc1113.h b/opal/mca/hwloc/hwloc1113/hwloc1113.h deleted file mode 100644 index 94a4ae98622..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc1113.h +++ /dev/null @@ -1,48 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2011-2017 Cisco Systems, Inc. All rights reserved - * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. - * Copyright (c) 2016 Los Alamos National Security, LLC. All rights - * reserved. - * - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * When this component is used, this file is included in the rest of - * the OPAL/ORTE/OMPI code base via opal/mca/hwloc/hwloc-internal.h. As such, - * this header represents the public interface to this static component. - */ - -#ifndef MCA_OPAL_HWLOC_HWLOC1113_H -#define MCA_OPAL_HWLOC_HWLOC1113_H - -BEGIN_C_DECLS - -#include "hwloc/include/hwloc.h" - -/* If the including file requested it, also include the hwloc verbs - helper file. 
We can't just always include this file (even if we - know we have <infiniband/verbs.h>) because there are some inline - functions in that file that invoke ibv_* functions. Some linkers - (e.g., Solaris Studio Compilers) will instantiate those static - inline functions even if we don't use them, and therefore we need - to be able to resolve the ibv_* symbols at link time. - - Since -libverbs is only specified in places where we use other - ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that - linking random executables can/will fail (e.g., orterun). - */ -#if defined(OPAL_HWLOC_WANT_VERBS_HELPER) && OPAL_HWLOC_WANT_VERBS_HELPER -# if defined(HAVE_INFINIBAND_VERBS_H) -# include "hwloc/include/hwloc/openfabrics-verbs.h" -# else -# error Tried to include hwloc verbs helper file, but hwloc was compiled with no OpenFabrics support -# endif -#endif - -END_C_DECLS - -#endif /* MCA_OPAL_HWLOC_HWLOC1113_H */ diff --git a/opal/mca/hwloc/hwloc1113/hwloc1113_component.c b/opal/mca/hwloc/hwloc1113/hwloc1113_component.c deleted file mode 100644 index 759642975f0..00000000000 --- a/opal/mca/hwloc/hwloc1113/hwloc1113_component.c +++ /dev/null @@ -1,55 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2011-2017 Cisco Systems, Inc. All rights reserved - * Copyright (c) 2014-2015 Intel, Inc. All rights reserved. - * Copyright (c) 2015-2016 Los Alamos National Security, LLC. All rights - * reserved. - * - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - * - * These symbols are in a file by themselves to provide nice linker - * semantics. Since linkers generally pull in symbols by object - * files, keeping these symbols as the only symbols in this file - * prevents utility programs such as "ompi_info" from having to import - * entire components just to query their version and parameters.
- */ - -#include "opal_config.h" -#include "opal/constants.h" - -#include "opal/mca/hwloc/hwloc-internal.h" -#include "hwloc1113.h" - -/* - * Public string showing the sysinfo ompi_linux component version number - */ -const char *opal_hwloc_hwloc1113_component_version_string = - "OPAL hwloc1113 hwloc MCA component version " OPAL_VERSION; - -/* - * Instantiate the public struct with all of our public information - * and pointers to our public functions in it - */ - -const opal_hwloc_component_t mca_hwloc_hwloc1113_component = { - - /* First, the mca_component_t struct containing meta information - about the component itself */ - - .base_version = { - OPAL_HWLOC_BASE_VERSION_2_0_0, - - /* Component name and version */ - .mca_component_name = "hwloc1113", - MCA_BASE_MAKE_VERSION(component, OPAL_MAJOR_VERSION, OPAL_MINOR_VERSION, - OPAL_RELEASE_VERSION), - }, - .base_data = { - /* The component is checkpoint ready */ - MCA_BASE_METADATA_PARAM_CHECKPOINT - } -}; diff --git a/opal/mca/hwloc/hwloc2a/Makefile.am b/opal/mca/hwloc/hwloc2a/Makefile.am new file mode 100644 index 00000000000..49cc5325dab --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/Makefile.am @@ -0,0 +1,96 @@ +# +# Copyright (c) 2011-2016 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2014-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2016 Los Alamos National Security, LLC. All rights +# reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# Due to what might be a bug in Automake, we need to remove stamp-h? +# files manually. See +# http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19418. +DISTCLEANFILES = \ + hwloc/include/hwloc/autogen/stamp-h? \ + hwloc/include/private/autogen/stamp-h? + +# Need to include these files so that these directories are carried in +# the tarball (in case someone invokes autogen.sh on a dist tarball). 
+EXTRA_DIST = \ + hwloc/doc/README.txt \ + hwloc/contrib/systemd/README.txt \ + hwloc/tests/README.txt \ + hwloc/utils/README.txt + +SUBDIRS = hwloc + +# Headers and sources +headers = hwloc2a.h +sources = hwloc2a_component.c + +# We only ever build this component statically +noinst_LTLIBRARIES = libmca_hwloc_hwloc2a.la +libmca_hwloc_hwloc2a_la_SOURCES = $(headers) $(sources) +nodist_libmca_hwloc_hwloc2a_la_SOURCES = $(nodist_headers) +libmca_hwloc_hwloc2a_la_LDFLAGS = -module -avoid-version $(opal_hwloc_hwloc2a_LDFLAGS) +libmca_hwloc_hwloc2a_la_LIBADD = $(opal_hwloc_hwloc2a_LIBS) +libmca_hwloc_hwloc2a_la_DEPENDENCIES = \ + $(HWLOC_top_builddir)/hwloc/libhwloc_embedded.la + +# Since the rest of the code base includes the underlying hwloc.h, we +# also have to install the underlying header files when +# --with-devel-headers is specified. hwloc doesn't support this; the +# least gross way to make this happen is just to list all of hwloc's +# header files here. :-( +headers += \ + hwloc/include/hwloc.h \ + hwloc/include/hwloc/bitmap.h \ + hwloc/include/hwloc/cuda.h \ + hwloc/include/hwloc/cudart.h \ + hwloc/include/hwloc/deprecated.h \ + hwloc/include/hwloc/diff.h \ + hwloc/include/hwloc/distances.h \ + hwloc/include/hwloc/export.h \ + hwloc/include/hwloc/gl.h \ + hwloc/include/hwloc/helper.h \ + hwloc/include/hwloc/inlines.h \ + hwloc/include/hwloc/intel-mic.h \ + hwloc/include/hwloc/myriexpress.h \ + hwloc/include/hwloc/nvml.h \ + hwloc/include/hwloc/opencl.h \ + hwloc/include/hwloc/openfabrics-verbs.h \ + hwloc/include/hwloc/plugins.h \ + hwloc/include/hwloc/rename.h \ + hwloc/include/hwloc/shmem.h \ + hwloc/include/private/private.h \ + hwloc/include/private/debug.h \ + hwloc/include/private/misc.h \ + hwloc/include/private/cpuid-x86.h +nodist_headers = hwloc/include/hwloc/autogen/config.h + +if HWLOC_HAVE_LINUX +headers += \ + hwloc/include/hwloc/linux.h \ + hwloc/include/hwloc/linux-libnuma.h +endif HWLOC_HAVE_LINUX + +if HWLOC_HAVE_SOLARIS +headers += \ + 
hwloc/include/private/solaris-chiptype.h +endif HWLOC_HAVE_SOLARIS + +if HWLOC_HAVE_SCHED_SETAFFINITY +headers += hwloc/include/hwloc/glibc-sched.h +endif HWLOC_HAVE_SCHED_SETAFFINITY + +# Conditionally install the header files +if WANT_INSTALL_HEADERS +opaldir = $(opalincludedir)/$(subdir) +nobase_opal_HEADERS = $(headers) +nobase_nodist_opal_HEADERS = $(nodist_headers) +endif diff --git a/opal/mca/hwloc/hwloc2a/README-ompi.txt b/opal/mca/hwloc/hwloc2a/README-ompi.txt new file mode 100644 index 00000000000..a6acc981c7e --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/README-ompi.txt @@ -0,0 +1 @@ +Cherry-picked commits after 2.0.0: diff --git a/opal/mca/hwloc/hwloc2a/configure.m4 b/opal/mca/hwloc/hwloc2a/configure.m4 new file mode 100644 index 00000000000..f5c8db30108 --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/configure.m4 @@ -0,0 +1,210 @@ +# -*- shell-script -*- +# +# Copyright (c) 2009-2017 Cisco Systems, Inc. All rights reserved +# Copyright (c) 2014-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2015-2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. +# Copyright (c) 2016 Los Alamos National Security, LLC. All rights +# reserved. 
+# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# +# Priority +# +AC_DEFUN([MCA_opal_hwloc_hwloc2a_PRIORITY], [90]) + +# +# Force this component to compile in static-only mode +# +AC_DEFUN([MCA_opal_hwloc_hwloc2a_COMPILE_MODE], [ + AC_MSG_CHECKING([for MCA component $2:$3 compile mode]) + $4="static" + AC_MSG_RESULT([$$4]) +]) + +# Include hwloc m4 files +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_pkg.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_attributes.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_visibility.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_vendor.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_components.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_internal.m4) +m4_include(opal/mca/hwloc/hwloc2a/hwloc/config/netloc.m4) + +# MCA_hwloc_hwloc2a_POST_CONFIG() +# --------------------------------- +AC_DEFUN([MCA_opal_hwloc_hwloc2a_POST_CONFIG],[ + OPAL_VAR_SCOPE_PUSH([opal_hwloc_hwloc2a_basedir]) + + # If we won, then do all the rest of the setup + AS_IF([test "$1" = "1" && test "$opal_hwloc_hwloc2a_support" = "yes"], + [ + # Set this variable so that the framework m4 knows what + # file to include in opal/mca/hwloc/hwloc-internal.h + opal_hwloc_hwloc2a_basedir=opal/mca/hwloc/hwloc2a + opal_hwloc_base_include="$opal_hwloc_hwloc2a_basedir/hwloc2a.h" + + # Add some stuff to CPPFLAGS so that the rest of the source + # tree can be built + file=$opal_hwloc_hwloc2a_basedir/hwloc + CPPFLAGS="-I$OPAL_TOP_SRCDIR/$file/include $CPPFLAGS" + AS_IF([test "$OPAL_TOP_BUILDDIR" != "$OPAL_TOP_SRCDIR"], + [CPPFLAGS="-I$OPAL_TOP_BUILDDIR/$file/include $CPPFLAGS"]) + unset file + ]) + OPAL_VAR_SCOPE_POP + + # This must be run unconditionally + HWLOC_DO_AM_CONDITIONALS +])dnl + + +# MCA_hwloc_hwloc2a_CONFIG([action-if-found], [action-if-not-found]) +# 
-------------------------------------------------------------------- +AC_DEFUN([MCA_opal_hwloc_hwloc2a_CONFIG],[ + # Hwloc needs to know if we have Verbs support + AC_REQUIRE([OPAL_CHECK_VERBS_DIR]) + + AC_CONFIG_FILES([opal/mca/hwloc/hwloc2a/Makefile]) + + OPAL_VAR_SCOPE_PUSH([HWLOC_VERSION opal_hwloc_hwloc2a_save_CPPFLAGS opal_hwloc_hwloc2a_save_LDFLAGS opal_hwloc_hwloc2a_save_LIBS opal_hwloc_hwloc2a_save_cairo opal_hwloc_hwloc2a_save_xml opal_hwloc_hwloc2a_save_mode opal_hwloc_hwloc2a_basedir opal_hwloc_hwloc2a_file opal_hwloc_hwloc2a_save_cflags CPPFLAGS_save LIBS_save opal_hwloc_external]) + + # default to this component not providing support + opal_hwloc_hwloc2a_basedir=opal/mca/hwloc/hwloc2a + opal_hwloc_hwloc2a_support=no + + AS_IF([test "$with_hwloc" = "internal" || test -z "$with_hwloc" || test "$with_hwloc" = "yes"], + [opal_hwloc_external="no"], + [opal_hwloc_external="yes"]) + + opal_hwloc_hwloc2a_save_CPPFLAGS=$CPPFLAGS + opal_hwloc_hwloc2a_save_LDFLAGS=$LDFLAGS + opal_hwloc_hwloc2a_save_LIBS=$LIBS + + # Run the hwloc configuration - if no external hwloc, then set the prefix + # to minimize the chance that someone will use the internal symbols + AS_IF([test "$opal_hwloc_external" = "no" && + test "$with_hwloc" != "future"], + [HWLOC_SET_SYMBOL_PREFIX([opal_hwloc2a_])]) + + # save XML or graphical options + opal_hwloc_hwloc2a_save_cairo=$enable_cairo + opal_hwloc_hwloc2a_save_xml=$enable_xml + opal_hwloc_hwloc2a_save_static=$enable_static + opal_hwloc_hwloc2a_save_shared=$enable_shared + opal_hwloc_hwloc2a_save_plugins=$enable_plugins + opal_hwloc_hwloc2a_save_mode=$hwloc_mode + + # never enable hwloc's graphical option + enable_cairo=no + + # never enable hwloc's plugin system + enable_plugins=no + enable_static=yes + enable_shared=no + + # Override -- disable hwloc's libxml2 support, but enable the + # native hwloc XML support + enable_libxml2=no + enable_xml=yes + + # ensure we are in "embedded" mode + hwloc_mode=embedded + + # GL and OpenCL OS
devices aren't used in OMPI + enable_gl=no + enable_opencl=no + + # Per https://github.com/open-mpi/ompi/pull/4257, ALWAYS + # disable cuda support + enable_cuda=no + + # Open MPI currently does not use hwloc's NVML support + enable_nvml=no + + # hwloc checks for compiler visibility, and it needs to do + # this without "picky" flags. + opal_hwloc_hwloc2a_save_cflags=$CFLAGS + CFLAGS=$OPAL_CFLAGS_BEFORE_PICKY + AS_IF([test -n "$opal_datatype_cuda_CPPFLAGS"], + [CPPFLAGS="$CPPFLAGS $opal_datatype_cuda_CPPFLAGS"]) + + HWLOC_SETUP_CORE([opal/mca/hwloc/hwloc2a/hwloc], + [AC_MSG_CHECKING([whether hwloc configure succeeded]) + AC_MSG_RESULT([yes]) + HWLOC_VERSION="internal v`$srcdir/$opal_hwloc_hwloc2a_basedir/hwloc/config/hwloc_get_version.sh $srcdir/$opal_hwloc_hwloc2a_basedir/hwloc/VERSION`" + + # Build flags for our Makefile.am + opal_hwloc_hwloc2a_LDFLAGS='$(HWLOC_EMBEDDED_LDFLAGS)' + opal_hwloc_hwloc2a_LIBS='$(OPAL_TOP_BUILDDIR)/'"$opal_hwloc_hwloc2a_basedir"'/hwloc/hwloc/libhwloc_embedded.la $(HWLOC_EMBEDDED_LIBS)' + opal_hwloc_hwloc2a_support=yes + + AC_DEFINE_UNQUOTED([HWLOC_HWLOC2a_HWLOC_VERSION], + ["$HWLOC_VERSION"], + [Version of hwloc]) + + # Do we have verbs support?
+ CPPFLAGS_save=$CPPFLAGS + AS_IF([test "$opal_want_verbs" = "yes"], + [CPPFLAGS="-I$opal_verbs_dir/include $CPPFLAGS"]) + AC_CHECK_HEADERS([infiniband/verbs.h]) + CPPFLAGS=$CPPFLAGS_save + ], + [AC_MSG_CHECKING([whether hwloc configure succeeded]) + AC_MSG_RESULT([no]) + opal_hwloc_hwloc2a_support=no]) + CFLAGS=$opal_hwloc_hwloc2a_save_cflags + + # Restore some env variables, if necessary + AS_IF([test -n "$opal_hwloc_hwloc2a_save_cairo"], + [enable_cairo=$opal_hwloc_hwloc2a_save_cairo]) + AS_IF([test -n "$opal_hwloc_hwloc2a_save_xml"], + [enable_xml=$opal_hwloc_hwloc2a_save_xml]) + AS_IF([test -n "$opal_hwloc_hwloc2a_save_static"], + [enable_static=$opal_hwloc_hwloc2a_save_static]) + AS_IF([test -n "$opal_hwloc_hwloc2a_save_shared"], + [enable_shared=$opal_hwloc_hwloc2a_save_shared]) + AS_IF([test -n "$opal_hwloc_hwloc2a_save_plugins"], + [enable_plugins=$opal_hwloc_hwloc2a_save_plugins]) + + CPPFLAGS=$opal_hwloc_hwloc2a_save_CPPFLAGS + LDFLAGS=$opal_hwloc_hwloc2a_save_LDFLAGS + LIBS=$opal_hwloc_hwloc2a_save_LIBS + + AC_SUBST([opal_hwloc_hwloc2a_CFLAGS]) + AC_SUBST([opal_hwloc_hwloc2a_CPPFLAGS]) + AC_SUBST([opal_hwloc_hwloc2a_LDFLAGS]) + AC_SUBST([opal_hwloc_hwloc2a_LIBS]) + + # Finally, add some flags to the wrapper compiler so that our + # headers can be found. + hwloc_hwloc2a_WRAPPER_EXTRA_LDFLAGS="$HWLOC_EMBEDDED_LDFLAGS" + hwloc_hwloc2a_WRAPPER_EXTRA_LIBS="$HWLOC_EMBEDDED_LIBS" + hwloc_hwloc2a_WRAPPER_EXTRA_CPPFLAGS='-I${pkgincludedir}/'"$opal_hwloc_hwloc2a_basedir/hwloc/include" + + # If we are not building the internal hwloc, then indicate that + # this component should not be built. NOTE: we still did all the + # above configury so that all the proper GNU Autotools + # infrastructure is set up properly (e.g., w.r.t. SUBDIRS=hwloc in + # this directory's Makefile.am, we still need the Autotools "make + # distclean" infrastructure to work properly).
+ AS_IF([test "$opal_hwloc_external" = "yes"], + [AC_MSG_WARN([using an external hwloc; disqualifying this component]) + opal_hwloc_hwloc2a_support=no], + [AC_DEFINE([HAVE_DECL_HWLOC_OBJ_OSDEV_COPROC], [1]) + AC_DEFINE([HAVE_HWLOC_TOPOLOGY_DUP], [1])]) + + # Done! + AS_IF([test "$opal_hwloc_hwloc2a_support" = "yes"], + [$1], + [$2]) + + OPAL_VAR_SCOPE_POP +])dnl diff --git a/opal/mca/hwloc/hwloc2a/hwloc/AUTHORS b/opal/mca/hwloc/hwloc2a/hwloc/AUTHORS new file mode 100644 index 00000000000..740de337b20 --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/hwloc/AUTHORS @@ -0,0 +1,29 @@ +netloc Authors +============== + +The following cumulative list contains the names of most individuals who +have committed code to the hwloc repository. + +Name Affiliation(s) +--------------------------- -------------------- +Cédric Augonnet University of Bordeaux +Guillaume Beauchamp Inria +Ahmad Boissetri Binzagr Inria +Cyril Bordage Inria +Nicholas Buroker UWL +Jérôme Clet-Ortega University of Bordeaux +Ludovic Courtès Inria +Nathalie Furmento CNRS +Brice Goglin Inria +Joshua Hursey UWL +Alexey Kardashevskiy IBM +Douglas MacFarland UWL +Antoine Rougier intern from University of Bordeaux +Jeff Squyres Cisco +Samuel Thibault University of Bordeaux + +Affiliation abbreviations: +------------------------- +Cisco = Cisco Systems, Inc. +CNRS = Centre national de la recherche scientifique (France) +UWL = University of Wisconsin-La Crosse diff --git a/opal/mca/hwloc/hwloc1113/hwloc/COPYING b/opal/mca/hwloc/hwloc2a/hwloc/COPYING similarity index 96% rename from opal/mca/hwloc/hwloc1113/hwloc/COPYING rename to opal/mca/hwloc/hwloc2a/hwloc/COPYING index 485798f7052..e77516e1801 100644 --- a/opal/mca/hwloc/hwloc1113/hwloc/COPYING +++ b/opal/mca/hwloc/hwloc2a/hwloc/COPYING @@ -11,6 +11,7 @@ Copyright © 2010 IBM Copyright © 2010 Jirka Hladky Copyright © 2012 Aleksej Saushev, The NetBSD Foundation Copyright © 2012 Blue Brain Project, EPFL. All rights reserved.
+Copyright © 2013-2014 University of Wisconsin-La Crosse. All rights reserved. Copyright © 2015 Research Organization for Information Science and Technology (RIST). All rights reserved. Copyright © 2015-2016 Intel, Inc. All rights reserved. See COPYING in top-level directory. diff --git a/opal/mca/hwloc/hwloc2a/hwloc/Makefile.am b/opal/mca/hwloc/hwloc2a/hwloc/Makefile.am new file mode 100644 index 00000000000..3ad8113959a --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/hwloc/Makefile.am @@ -0,0 +1,23 @@ +# Copyright © 2009-2016 Inria. All rights reserved. +# Copyright © 2009 Université Bordeaux +# Copyright © 2009-2014 Cisco Systems, Inc. All rights reserved. +# See COPYING in top-level directory. + +# Note that the -I directory must *exactly* match what was specified +# via AC_CONFIG_MACRO_DIR in configure.ac. +ACLOCAL_AMFLAGS = -I ./config + +# +# "make distcheck" requires that tarballs are able to "make +# dist", so we have to include config/distscript.sh. +# +EXTRA_DIST = \ + README VERSION COPYING AUTHORS \ + config/hwloc_get_version.sh \ + config/distscript.sh + +SUBDIRS = include hwloc + +# Do not let automake automatically add the non-standalone dirs to the +# distribution tarball if we're building in embedded mode. +DIST_SUBDIRS = $(SUBDIRS) diff --git a/opal/mca/hwloc/hwloc2a/hwloc/NEWS b/opal/mca/hwloc/hwloc2a/hwloc/NEWS new file mode 100644 index 00000000000..772e42ae5dc --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/hwloc/NEWS @@ -0,0 +1,1484 @@ +Copyright © 2009 CNRS +Copyright © 2009-2017 Inria. All rights reserved. +Copyright © 2009-2013 Université Bordeaux +Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.
+ +$COPYRIGHT$ + +Additional copyrights may follow + +$HEADER$ + +=========================================================================== + +This file contains the main features as well as overviews of specific +bug fixes (and other actions) for each version of hwloc since version +0.9 (as initially released as "libtopology", then re-branded to "hwloc" +in v0.9.1). + + +Version 2.0.0 +------------- +* The ABI of the library has changed. For instance some hwloc_obj fields + were reordered. + - HWLOC_API_VERSION and hwloc_get_api_version() now give 0x00020000. + - See "How do I handle ABI breaks and API upgrades ?" in the FAQ + and https://github.com/open-mpi/hwloc/wiki/Upgrading-to-v2.0-API +* Major changes + + Topologies always have at least one NUMA object. On non-NUMA machines, + a single NUMA object is added to describe the entire machine memory. + The NUMA level cannot be ignored anymore. + + The HWLOC_OBJ_CACHE type is replaced with 8 types HWLOC_OBJ_L[1-5]CACHE + and HWLOC_OBJ_L[1-3]ICACHE that remove the need to disambiguate levels + when looking for caches with _by_type() functions. + - New hwloc_obj_type_is_{,d,i}cache() functions may be used to check whether + a given type is a cache. + + Replace hwloc_topology_ignore*() functions with hwloc_topology_set_type_filter() + and hwloc_topology_set_all_types_filter(). + - Contrary to hwloc_topology_ignore_{type,all}_keep_structure() which + removed individual objects, HWLOC_TYPE_FILTER_KEEP_STRUCTURE only removes + entire levels (so that topology do not become too asymmetric). + + Remove HWLOC_TOPOLOGY_FLAG_ICACHES in favor of hwloc_topology_set_icache_types_filter() + with HWLOC_TYPE_FILTER_KEEP_ALL. + + Remove HWLOC_TOPOLOGY_FLAG_IO_DEVICES, _IO_BRIDGES and _WHOLE_IO in favor of + hwloc_topology_set_io_types_filter() with HWLOC_TYPE_FILTER_KEEP_ALL or + HWLOC_TYPE_FILTER_KEEP_IMPORTANT. + + hwloc_topology_restrict() doesn't remove objects that contain memory + by default anymore. 
+ - The list of existing restrict flags was modified. + + XML export functions take an additional flags argument, + for instance for exporting XMLs that are compatible with hwloc 1.x. + + The distance API has been completely reworked. It is now described + in hwloc/distances.h. + + Add hwloc/shmem.h for sharing topologies between processes running on + the same machine (for reducing the memory footprint). + + Add the experimental netloc subproject. It is enabled by default when + supported and can be disabled with --disable-netloc. + It currently brings command-line tools to gather and visualize the + topology of InfiniBand fabrics, and an API to convert such topologies + into Scotch architectures for process mapping. + See the documentation for details. + + Remove the online_cpuset from struct hwloc_obj. Offline PUs get unknown + topologies on Linux nowadays, and wrong topology on Solaris. Other OS + do not support them. And one cannot do much about them anyway. Just keep + them in complete_cpuset. + + Remove the custom interface for assembling the topologies of different + nodes as well as the hwloc-assembler tools. + + Remove Kerrighed support from the Linux backend. + + Remove Tru64 (OSF/1) support. + - Remove HWLOC_MEMBIND_REPLICATE which wasn't available anywhere else. +* API + + Objects now have a "subtype" field that supersedes former "Type" and + "CoProcType" info attributes. + + The almost-unused "os_level" attribute has been removed from the + hwloc_obj structure. + + I/O and Misc objects are now stored in a dedicated children list, only + normal children with non-NULL cpusets and nodesets are in the main + children list. + - hwloc_get_next_child() may still be used to iterate over these 3 lists + of children at once. + + Replace hwloc_topology_insert_misc_object_by_cpuset() with + hwloc_topology_insert_group_object() to precisely specify the location + of an additional hierarchy level in the topology. 
+ + Misc objects have their own level and depth to iterate over all of them. + + Misc objects may now only be inserted as a leaf object with + hwloc_topology_insert_misc_object() which deprecates + hwloc_topology_insert_misc_object_by_parent(). + + hwloc_topology_set_fsroot() is removed, the environment variable + HWLOC_FSROOT may be used for the same remote testing/debugging purpose. + + hwloc_type_sscanf() deprecates the old hwloc_obj_type_sscanf(). + + hwloc_type_sscanf_as_depth() is added to convert a type name into + a level depth. + + hwloc_type_name() deprecates the old hwloc_obj_type_string(). + + Remove the deprecated hwloc_obj_snprintf(), hwloc_obj_type_of_string(), + hwloc_distribute[v](). + + hwloc_obj_cpuset_snprintf() is deprecated in favor of hwloc_bitmap_snprintf(). + + Functions diff_load_xml*(), diff_export_xml*() and diff_destroy() in + hwloc/diff.h do not need a topology as first parameter anymore. + + hwloc_parse_cpumap_file () superseded by hwloc_linux_read_path_as_cpumask() + in hwloc/linux.h. +* Tools + - lstopo and hwloc-info have a new --filter option matching the new filtering API. + - hwloc-distances was removed and replaced with lstopo --distances. +* Plugin API + + hwloc_fill_object_sets() is renamed into hwloc_obj_add_children_sets(). +* Misc + + Linux OS devices do not have to be attached through PCI anymore, + for instance enabling the discovery of NVDIMM block devices. + + Add a SectorSize attribute to block OS devices on Linux. + + Misc MemoryModule objects are only added when full I/O discovery is enabled + (WHOLE_IO topology flag). + + Do not set PCI devices and bridges name automatically. Vendor and device + names are already in info attributes. + + Exporting to synthetic now ignores I/O and Misc objects. + + XML and Synthetic export functions have moved to hwloc/export.h, + automatically included from hwloc.h. + + Separate OS device discovery from PCI discovery. 
Only the latter is disabled + with --disable-pci at configure time. Both may be disabled with --disable-io. + + The old `libpci' component name from hwloc 1.6 is not supported anymore, + only the `pci' name from hwloc 1.7 is now recognized. + + The `linuxpci' component is now renamed into `linuxio'. + + The HWLOC_PCI_<domain>_<bus>_LOCALCPUS environment variables are superseded + with a single HWLOC_PCI_LOCALITY where bus ranges may be specified. + + Add HWLOC_SYNTHETIC environment variable to enforce a synthetic topology + as if hwloc_topology_set_synthetic() had been called. + + HWLOC_COMPONENTS doesn't support xml or synthetic component attributes + anymore, they should be passed in HWLOC_XMLFILE or HWLOC_SYNTHETIC instead. + + HWLOC_COMPONENTS takes precedence over other environment variables + for selecting components. + + Remove the dependency on libnuma on Linux. + + +Version 1.11.7 +-------------- +* Fix hwloc-bind --membind for CPU-less NUMA nodes (again). + Thanks to Gilles Gouaillardet for reporting the issue. +* Fix a memory leak on IBM S/390 platforms running Linux. +* Fix a memory leak when forcing the x86 backend first on amd64/topoext + platforms running Linux. +* Command-line tools now support "hbm" instead of "numanode" for filtering + only high-bandwidth memory nodes when selecting locations. + + hwloc-bind also supports --hbm and --no-hbm for filtering only or + no HBM nodes. + Thanks to Nicolas Denoyelle for the suggestion. +* Add --children and --descendants to hwloc-info for listing object + children or object descendants of a specific type. +* Add --no-index, --index, --no-attrs, --attrs to disable/enable display + of index numbers or attributes in the graphical lstopo output. +* Try to gather hwloc-dump-hwdata output from all possible locations + in hwloc-gather-topology. +* Updates to the documentation of locations in hwloc(7) and + command-line tools manpages.
+ + +Version 1.11.6 +-------------- +* Make the Linux discovery about twice faster, especially on the CPU side, + by trying to avoid sysfs file accesses as much as possible. +* Add support for AMD Family 17h processors (Zen) SMT cores in the Linux + and x86 backends. +* Add the HWLOC_TOPOLOGY_FLAG_THISSYSTEM_ALLOWED_RESOURCES flag (and the + HWLOC_THISSYSTEM_ALLOWED_RESOURCES environment variable) for reading the + set of allowed resources from the local operating system even if the + topology was loaded from XML or synthetic. +* Fix hwloc_bitmap_set/clr_range() for infinite ranges that do not + overlap currently defined ranges in the bitmap. +* Don't reset the lstopo zoom scale when moving the X11 window. +* lstopo now has --flags for manually setting topology flags. +* hwloc_get_depth_type() returns HWLOC_TYPE_DEPTH_UNKNOWN for Misc objects. + + +Version 1.11.5 +-------------- +* Add support for Knights Mill Xeon Phi, thanks to Piotr Luc for the patch. +* Reenable distance gathering on Solaris, disabled by mistake since v1.0. + Thanks to TU Wien for the help. +* Fix hwloc_get_*obj*_inside_cpuset() functions to ignore objects with + empty CPU sets, for instance, CPU-less NUMA nodes such as KNL MCDRAM. + Thanks to Nicolas Denoyelle for the report. +* Fix XML import of multiple distance matrices. +* Add a FAQ entry about "hwloc is only a structural model, it ignores + performance models, memory bandwidth, etc.?" + + +Version 1.11.4 +-------------- +* Add MemoryMode and ClusterMode attributes in the Machine object on KNL. + Add doc/examples/get-knl-modes.c for an example of retrieving them. + Thanks to Grzegorz Andrejczuk. +* Fix Linux build with -m32 with respect to libudev. + Thanks to Paul Hargrove for reporting the issue. +* Fix build with Visual Studio 2015, thanks to Eloi Gaudry for reporting + the issue and providing the patch. +* Don't forget to display OS device children in the graphical lstopo. 
+* Fix a memory leak on Solaris, thanks to Bryon Gloden for the patch. +* Properly handle realloc() failures, thanks to Bryon Gloden for reporting + the issue. +* Fix lstopo crash in ascii/fig/windows outputs when some objects have a + lstopoStyle info attribute. + + +Version 1.11.3 +-------------- +* Bug fixes + + Fix a memory leak on Linux S/390 hosts with books. + + Fix /proc/mounts parsing on Linux by using mntent.h. + Thanks to Nathan Hjelm for reporting the issue. + + Fix a x86 infinite loop on VMware due to the x2APIC feature being + advertised without actually being fully supported. + Thanks to Jianjun Wen for reporting the problem and testing the patch. + + Fix the return value of hwloc_alloc() on mmap() failure. + Thanks to Hugo Brunie for reporting the issue. + + Fix the return value of command-line tools in some error cases. + + Do not break individual thread bindings during x86 backend discovery in a + multithreaded process. Thanks to Farouk Mansouri for the report. + + Fix hwloc-bind --membind for CPU-less NUMA nodes. + + Fix some corner cases in the XML export/import of application userdata. +* API Improvements + + Add HWLOC_MEMBIND_BYNODESET flag so that membind() functions accept + either cpusets or nodesets. + + Add hwloc_get_area_memlocation() to check where pages are actually + allocated. Only implemented on Linux for now. + - There's no _nodeset() variant, but the new flag HWLOC_MEMBIND_BYNODESET + is supported. + + Make hwloc_obj_type_sscanf() parse back everything that may be outputted + by hwloc_obj_type_snprintf(). +* Detection Improvements + + Allow the x86 backend to add missing cache levels, so that it completes + what the Solaris backend lacks. + Thanks to Ryan Zezeski for reporting the issue. + + Do not filter-out FibreChannel PCI adapters by default anymore. + Thanks to Matt Muggeridge for the report. + + Add support for CUDA compute capability 6.x. 
+* Tools + + Add --support to hwloc-info to list supported features, just like with + hwloc_topology_get_support(). + - Also add --objects and --topology to explicitly switch between the + default modes. + + Add --tid to let hwloc-bind operate on individual threads on Linux. + + Add --nodeset to let hwloc-bind report memory binding as NUMA node sets. + + hwloc-annotate and lstopo don't drop application userdata from XMLs anymore. + - Add --cu to hwloc-annotate to drop these application userdata. + + Make the hwloc-dump-hwdata dump directory configurable through configure + options such as --runstatedir or --localstatedir. +* Misc Improvements + + Add systemd service template contrib/systemd/hwloc-dump-hwdata.service + for launching hwloc-dump-hwdata at boot on Linux. + Thanks to Grzegorz Andrejczuk. + + Add HWLOC_PLUGINS_BLACKLIST environment variable to prevent some plugins + from being loaded. Thanks to Alexandre Denis for the suggestion. + + Small improvements for various Windows build systems, + thanks to Jonathan L Peyton and Marco Atzeri. + + +Version 1.11.2 +-------------- +* Improve support for Intel Knights Landing Xeon Phi on Linux: + + Group local NUMA nodes of normal memory (DDR) and high-bandwidth memory + (MCDRAM) together through "Cluster" groups so that the local MCDRAM is + easy to find. + - See "How do I find the local MCDRAM NUMA node on Intel Knights + Landing Xeon Phi?" in the documentation. + - For uniformity across all KNL configurations, always have a NUMA node + object even if the host is UMA. + + Fix the detection of the memory-side cache: + - Add the hwloc-dump-hwdata superuser utility to dump SMBIOS information + into /var/run/hwloc/ as root during boot, and load this dumped + information from the hwloc library at runtime. + - See "Why do I need hwloc-dump-hwdata for caches on Intel Knights + Landing Xeon Phi?" in the documentation. + Thanks to Grzegorz Andrejczuk for the patches and for the help. 
+* The x86 and linux backends may now be combined for discovering CPUs + through x86 CPUID and memory from the Linux kernel. + This is useful for working around buggy CPU information reported by Linux + (for instance the AMD Bulldozer/Piledriver bug below). + Combination is enabled by passing HWLOC_COMPONENTS=x86 in the environment. +* Fix L3 cache sharing on AMD Opteron 63xx (Piledriver) and 62xx (Bulldozer) + in the x86 backend. Thanks to many users who helped. +* Fix the overzealous L3 cache sharing fix added to the x86 backend in 1.11.1 + for AMD Opteron 61xx (Magny-Cours) processors. +* The x86 backend may now add the info attribute Inclusive=0 or 1 to caches + it discovers, or to caches discovered by other backends earlier. + Thanks to Guillaume Beauchamp for the patch. +* Fix the management on alloc_membind() allocation failures on AIX, HP-UX + and OSF/Tru64. +* Fix spurious failures to load with ENOMEM on AIX in case of Misc objects + below PUs. +* lstopo improvements in X11 and Windows graphical mode: + + Add + - f 1 shortcuts to manually zoom-in, zoom-out, reset the scale, + or fit the entire window. + + Display all keyboard shortcuts in the console. +* Debug messages may be disabled at runtime by passing HWLOC_DEBUG_VERBOSE=0 + in the environment when --enable-debug was passed to configure. +* Add a FAQ entry "What are these Group objects in my topology?". + + +Version 1.11.1 +-------------- +* Detection fixes + + Hardwire the topology of Fujitsu K-computer, FX10, FX100 servers to + workaround buggy Linux kernels. + Thanks to Takahiro Kawashima and Gilles Gouaillardet. + + Fix L3 cache information on AMD Opteron 61xx Magny-Cours processors + in the x86 backend. Thanks to Guillaume Beauchamp for the patch. + + Detect block devices directly attached to PCI without a controller, + for instance NVMe disks. Thanks to Barry M. Tannenbaum. + + Add the PCISlot attribute to all PCI functions instead of only the + first one. 
+* Miscellaneous internal fixes + + Ignore PCI bridges that could fail assertions by reporting buggy + secondary-subordinate bus numbers. + Thanks to George Bosilca for reporting the issue. + + Fix an overzealous assertion when inserting an intermediate Group object + while Groups are totally ignored. + + Fix a memory leak on Linux on AMD processors with dual-core compute units. + Thanks to Bob Benner. + + Fix a memory leak on failure to load an XML diff file. + + Fix some segfaults when inputting an invalid synthetic description. + + Fix a segfault when plugins fail to find core symbols. + Thanks to Guy Streeter. +* Many fixes and improvements in the Windows backend: + + Fix the discovery of more than 32 processors and multiple processor + groups. Thanks to Barry M. Tannenbaum for the help. + + Add thread binding set support in case of multiple process groups. + + Add thread binding get support. + + Add get_last_cpu_location() support for the current thread. + + Disable the unsupported process binding in case of multiple processor + groups. + + Fix/update the Visual Studio support under contrib/windows. + Thanks to Eloi Gaudry for the help. +* Tools fixes + + Fix a segfault when displaying logical indexes in the graphical lstopo. + Thanks to Guillaume Mercier for reporting the issue. + + Fix lstopo linking with X11 libraries, for instance on Mac OS X. + Thanks to Scott Atchley and Pierre Ramet for reporting the issue. + + hwloc-annotate, hwloc-diff and hwloc-patch do not drop unavailable + resources from the output anymore and those may be annotated as well. + + Command-line tools may now import XML from the standard input with -i -.xml + + Add missing documentation for the hwloc-info --no-icaches option. + + +Version 1.11.0 +------------- +* API + + Socket objects are renamed to Package to align with the terminology + used by processor vendors. The old HWLOC_OBJ_SOCKET type and "Socket" + name are still supported for backward compatibility.
+ + HWLOC_OBJ_NODE is replaced with HWLOC_OBJ_NUMANODE for clarification. + HWLOC_OBJ_NODE is still supported for backward compatibility. + "Node" and "NUMANode" strings are supported as in earlier releases. +* Detection improvements + + Add support for Intel Knights Landing Xeon Phi. + Thanks to Grzegorz Andrejczuk and Lukasz Anaczkowski. + + Add Vendor, Model, Revision, SerialNumber, Type and LinuxDeviceID + info attributes to Block OS devices on Linux. Thanks to Vineet Pedaballe + for the help. + - Add --disable-libudev to avoid dependency on the libudev library. + + Add "MemoryModule" Misc objects with information about DIMMs, on Linux + when privileged and when I/O is enabled. + Thanks to Vineet Pedaballe for the help. + + Add a PCISlot attribute to PCI devices on Linux when supported to + identify the physical PCI slot where the board is plugged. + + Add CPUStepping info attribute on x86 processors, + thanks to Thomas Röhl for the suggestion. + + Ignore the device-tree on non-Power architectures to avoid buggy + detection on ARM. Thanks to Orion Poplawski for reporting the issue. + + Work-around buggy Xeon E5v3 BIOS reporting invalid PCI-NUMA affinity + for the PCI links on the second processor. + + Add support for CUDA compute capability 5.x, thanks Benjamin Worpitz. + + Many fixes to the x86 backend + - Add L1i and fix L2/L3 type on old AMD processors without topoext support. + - Fix Intel CPU family and model numbers when basic family isn't 6 or 15. + - Fix package IDs on recent AMD processors. + - Fix misc issues due to incomplete APIC IDs on x2APIC processors. + - Avoid buggy discovery on old SGI Altix UVs with non-unique APIC IDs. + + Gather total machine memory on NetBSD. +* Tools + + lstopo + - Collapse identical PCI devices unless --no-collapse is given. + This avoids gigantic outputs when a PCI device contains dozens of + identical virtual functions. + - The ASCII art output is now called "ascii", for instance in + "lstopo -.ascii". 
+ The former "txt" extension is retained for backward compatibility. + - Automatically scales graphical box width to the inner text in Cairo, + ASCII and Windows outputs. + - Add --rect to lstopo to force rectangular layout even for NUMA nodes. + - Add --restrict-flags to configure the behavior of --restrict. + - Objects may have a "Type" info attribute to specify a better type name + and display it in lstopo. + - Really export all verbose information to the given output file. + + hwloc-annotate + - May now operate on all types of objects, including I/O. + - May now insert Misc objects in the topology. + - Do not drop instruction caches and I/O devices from the output anymore. + + Fix lstopo path in hwloc-gather-topology after install. +* Misc + + Fix hwloc/cudart.h for machines with multiple PCI domains, + thanks to Imre Kerr for reporting the problem. + + Fix PCI Bridge-specific depth attribute. + + Fix hwloc_bitmap_intersect() for two infinite bitmaps. + + Fix some corner cases in the building of levels on large NUMA machines + with non-uniform NUMA groups and I/Os. + + Improve the performance of object insertion by cpuset for large + topologies. + + Prefix verbose XML import errors with the source name. + + Improve pkg-config checks and error messages. + + Fix excluding after a component with an argument in the HWLOC_COMPONENTS + environment variable. +* Documentation + + Fix the recommended way in documentation and examples to allocate memory + on some node, it should use HWLOC_MEMBIND_BIND. + Thanks to Nicolas Bouzat for reporting the issue. + + Add a "Miscellaneous objects" section in the documentation. + + Add a FAQ entry "What happens to my topology if I disable symmetric + multithreading, hyper-threading, etc. ?" to the documentation. + + +Version 1.10.1 +-------------- +* Actually remove disallowed NUMA nodes from nodesets when the whole-system + flag isn't enabled. +* Fix the gathering of PCI domains. 
Thanks to James Custer for reporting + the issue and providing a patch. +* Fix the merging of identical parent and child in presence of Misc objects. + Thanks to Dave Love for reporting the issue. +* Fix some misordering of children when merging with ignore_keep_structure() + in partially allowed topologies. +* Fix an overzealous assertion in the debug code when running on a single-PU + host with I/O. Thanks to Thomas Van Doren for reporting the issue. +* Don't forget to set up NUMA node object nodesets in x86 backend (for BSDs) + and OSF/Tru64 backend. +* Fix cpuid-x86 build error with gcc -O3 on x86-32. Thanks to Thomas Van Doren + for reporting the issue. +* Fix support for future very large caches in the x86 backend. +* Fix vendor/device names for SR-IOV PCI devices on Linux. +* Fix an unlikely crash in case of buggy hierarchical distance matrix. +* Fix PU os_index on some AIX releases. Thanks to Hendryk Bockelmann and + Erik Schnetter for helping debug. +* Fix hwloc_bitmap_isincluded() in case of infinite sets. +* Change hwloc-ls.desktop into lstopo.desktop and only install it if + lstopo is built with Cairo/X11 support. It cannot work with a non-graphical + lstopo or hwloc-ls. +* Add support for the renaming of Socket to Package in future releases. +* Add support for the replacement of HWLOC_OBJ_NODE with HWLOC_OBJ_NUMANODE + in future releases. +* Clarify the documentation of distance matrices in hwloc.h and in the manpage + of hwloc-distances. Thanks to Dave Love for the suggestion. +* Improve some error messages by displaying more information about the + hwloc library in use. +* Document how to deal with the ABI break when upgrading to the upcoming 2.0. + See "How do I handle ABI breaks and API upgrades ?" in the FAQ.
+ + Add hwloc_topology_set/get_userdata() to let the application save + a private pointer in the topology whenever it needs a way to find + its own object corresponding to a topology. + + Add hwloc_get_numanode_obj_by_os_index() and document that this function + as well as hwloc_get_pu_obj_by_os_index() are good at converting + nodesets and cpusets into objects. + + hwloc_distrib() does not ignore any objects anymore when there are + too many of them. They get merged with others instead. + Thanks to Tim Creech for reporting the issue. +* Tools + + hwloc-bind --get now executes the command after displaying + the binding instead of ignoring the command entirely. + Thanks to John Donners for the suggestion. + + Clarify that memory sizes shown in lstopo are local by default + unless specified (total memory added in the root object). +* Synthetic topologies + + Synthetic topology descriptions may now specify attributes such as + memory sizes and OS indexes. See the Synthetic topologies section + in the documentation. + + lstopo now exports in this fully-detailed format by default. + The new option --export-synthetic-flags may be used to revert + back to the old format. +* Documentation + + Add the doc/examples/ subdirectory with several real-life examples, + including the already existing hwloc-hello.C for basics. + Thanks to Rob Aulwes for the suggestion. + + Improve the documentation of CPU and memory binding in the API. + + Add a FAQ entry about operating system errors, especially on AMD + platforms with buggy cache information. + + Add a FAQ entry about loading many topologies in a single program. +* Misc + + Work around buggy Linux kernels reporting 2 sockets instead of + 1 socket with 2 NUMA nodes for each Xeon E5 v3 (Haswell) processor. + + pciutils/libpci support is now removed since libpciaccess works + well and there's also a Linux-specific PCI backend. For the record, + pciutils was GPL and therefore disabled by default since v1.6.2.
+ + Add --disable-cpuid configure flag to work around buggy processor + simulators reporting invalid CPUID information. + Thanks to Andrew Friedley for reporting the issue. + + Fix a racy use of libltdl when manipulating multiple topologies in + different threads. + Thanks to Andra Hugo for reporting the issue and testing patches. + + Fix some build failures in private/misc.h. + Thanks to Pavan Balaji and Ralph Castain for the reports. + + Fix failures to detect X11/Xutil.h on some Solaris platforms. + Thanks to Siegmar Gross for reporting the failure. + + The plugin ABI has changed; this release will not load plugins + built against previous hwloc releases. + + +Version 1.9.1 +------------- +* Fix a crash when the PCI locality is invalid. Attach to the root object + instead. Thanks to Nicolas Denoyelle for reporting the issue. +* Fix -f in lstopo manpage. Thanks to Jirka Hladky for reporting the issue. +* Fix hwloc_obj_type_sscanf() and others when strncasecmp() is not properly + available. Thanks to Nick Papior Andersen for reporting the problem. +* Mark Linux file descriptors as close-on-exec to avoid leaks on exec. +* Fix some minor memory leaks. + + +Version 1.9.0 +------------- +* API + + Add hwloc_obj_type_sscanf() to extend hwloc_obj_type_of_string() with + type-specific attributes such as Cache/Group depth and Cache type. + hwloc_obj_type_of_string() is moved to hwloc/deprecated.h. + + Add hwloc_linux_get_tid_last_cpu_location() for retrieving the + last CPU where a Linux thread given by TID ran. + + Add hwloc_distrib() to extend the old hwloc_distribute[v]() functions. + hwloc_distribute[v]() is moved to hwloc/deprecated.h. + + Don't mix total and local memory when displaying verbose object attributes + with hwloc_obj_attr_snprintf() or in lstopo. +* Backends + + Add CPUVendor, CPUModelNumber and CPUFamilyNumber info attributes for + x86, ia64 and Xeon Phi sockets on Linux, to extend the x86-specific + support added in v1.8.1. Requested by Ralph Castain.
+ + Add many CPU- and Platform-related info attributes on ARM and POWER + platforms, in the Machine and Socket objects. + + Add CUDA info attributes describing the number of multiprocessors and + cores and the size of the global, shared and L2 cache memories in CUDA + OS devices. + + Add OpenCL info attributes describing the number of compute units and + the global memory size in OpenCL OS devices. + + The synthetic backend now accepts extended types such as L2Cache, L1i or + Group3. lstopo also exports synthetic strings using these extended types. +* Tools + + lstopo + - Do not overwrite output files by default anymore. + Pass -f or --force to enforce it. + - Display OpenCL, CUDA and Xeon Phi numbers of cores and memory sizes + in the graphical output. + - Fix export to stdout when specifying a Cairo-based output type + with --of. + + hwloc-ps + - Add -e or --get-last-cpu-location to report where processes/threads + run instead of where they are bound. + - Report locations as likely-more-useful objects such as Cores or Sockets + instead of Caches when possible. + + hwloc-bind + - Fix failure on Windows when not using --pid. + - Add -e as a synonym to --get-last-cpu-location. + + hwloc-distrib + - Add --reverse to distribute using last objects first and singlify + into last bits first. Thanks to Jirka Hladky for the suggestion. + + hwloc-info + - Report unified caches when looking for data or instruction cache + ancestor objects. +* Misc + + Add experimental Visual Studio support under contrib/windows. + Thanks to Eloi Gaudry for his help and for providing the first draft. + + Fix some overzealous assertions and warnings about the ordering of + objects on a level with respect to cpusets. The ordering is only + guaranteed for complete cpusets (based on the first bit in sets). + + Fix some memory leaks when importing xml diffs and when exporting a + "too complex" entry. 
+ + +Version 1.8.1 +------------- +* Fix the cpuid code on Windows 64bits so that the x86 backend gets + enabled as expected and can populate CPU information. + Thanks to Robin Scher for reporting the problem. +* Add CPUVendor/CPUModelNumber/CPUFamilyNumber attributes when running + on x86 architecture. Thanks to Ralph Castain for the suggestion. +* Work around buggy BIOS reporting duplicate NUMA nodes on Linux. + Thanks to Jeff Becker for reporting the problem and testing the patch. +* Add a name to the lstopo graphical window. Thanks to Michael Prokop + for reporting the issue. + + +Version 1.8.0 +------------- +* New components + + Add the "linuxpci" component that always works on Linux even when + libpciaccess and libpci aren't available (and even with a modified + file-system root). By default the old "pci" component runs first + because "linuxpci" lacks device names (obj->name is always NULL). +* API + + Add the topology difference API in hwloc/diff.h for manipulating + many similar topologies. + + Add hwloc_topology_dup() for duplicating an entire topology. + + hwloc.h and hwloc/helper.h have been reorganized to clarify the + documentation sections. The actual inline code has moved out of hwloc.h + into the new hwloc/inlines.h. + + Deprecated functions are now in hwloc/deprecated.h, and not in the + official documentation anymore. +* Tools + + Add hwloc-diff and hwloc-patch tools together with the new diff API. + + Add hwloc-compress-dir to (de)compress an entire directory of XML files + using hwloc-diff and hwloc-patch. + + Object colors in the graphical output of lstopo may be changed by adding + a "lstopoStyle" info attribute. See CUSTOM COLORS in the lstopo(1) manpage + for details. Thanks to Jirka Hladky for discussing the idea. + + hwloc-gather-topology may now gather I/O-related files on Linux when + --io is given. Only the linuxpci component supports discovering I/O + objects from these extended tarballs. 
+ + hwloc-annotate now supports --ri to remove/replace info attributes with + a given name. + + hwloc-info supports "root" and "all" special locations for dumping + information about the root object. + + lstopo now supports --append-legend to append custom lines of text + to the legend in the graphical output. Thanks to Jirka Hladky for + discussing the idea. + + hwloc-calc and friends have a more robust parsing of locations given + on the command-line and they report useful error messages about it. + + Add --whole-system to hwloc-bind, hwloc-calc, hwloc-distances and + hwloc-distrib, and add --restrict to hwloc-bind for uniformity among + tools. +* Misc + + Calling hwloc_topology_load() or hwloc_topology_set_*() on an already + loaded topology now returns an error (deprecated since release 1.6.1). + + Fix the initialisation of cpusets and nodesets in Group objects added + when inserting PCI hostbridges. + + Never merge Group objects that were added explicitly by the user with + hwloc_custom_insert_group_object_by_parent(). + + Add a sanity check during dynamic plugin loading to prevent some + crashes when hwloc is dynamically loaded by another plugin mechanism. + + Add --with-hwloc-plugins-path to specify the install/load directories + of plugins. + + Add the MICSerialNumber info attribute to the root object when running + hwloc inside a Xeon Phi to match the same attribute in the MIC OS device + when running in the host. + + +Version 1.7.2 +------------- +* Do not create invalid block OS devices on very old Linux kernels such + as RHEL4 2.6.9. +* Fix PCI subvendor/device IDs. +* Fix the management of Misc objects inserted by parent. + Thanks to Jirka Hladky for reporting the problem. +* Add a PortState info attribute to OpenFabrics OS devices. +* Add a MICSerialNumber info attribute to Xeon Phi/MIC OS devices. +* Improve verbose error messages when failing to load from XML.
+ + +Version 1.7.1 +------------- +* Fix a failed assertion in the distance grouping code when loading an XML + file that already contains some groups. + Thanks to Laercio Lima Pilla for reporting the problem. +* Remove unexpected Group objects when loading XML topologies with I/O + objects and NUMA distances. + Thanks to Elena Elkina for reporting the problem and testing patches. +* Fix PCI link speed discovery when using libpciaccess. +* Fix invalid libpciaccess virtual function device/vendor IDs when using + SR-IOV PCI devices on Linux. +* Fix GL component build with old NVCtrl releases. + Thanks to Jirka Hladky for reporting the problem. +* Fix embedding breakage caused by libltdl. + Thanks to Pavan Balaji for reporting the problem. +* Always use the system-wide libltdl instead of shipping one inside hwloc. +* Document issues when enabling plugins while embedding hwloc in another + project, in the documentation section Embedding hwloc in Other Software. +* Add a FAQ entry "How to get useful topology information on NetBSD?" + in the documentation. +* Some fixes in the renaming code for embedding. +* Miscellaneous minor build fixes. + + +Version 1.7.0 +------------- +* New operating system backends + + Add BlueGene/Q compute node kernel (CNK) support. See the FAQ in the + documentation for details. Thanks to Jeff Hammond, Christopher Samuel + and Erik Schnetter for their help. + + Add NetBSD support, thanks to Aleksej Saushev. +* New I/O device discovery + + Add co-processor OS devices such as "mic0" for Intel Xeon Phi (MIC) + on Linux. Thanks to Jerome Vienne for helping. + + Add co-processor OS devices such as "cuda0" for NVIDIA CUDA-capable GPUs. + + Add co-processor OS devices such as "opencl0d0" for OpenCL GPU devices + on the AMD OpenCL implementation. + + Add GPU OS devices such as ":0.0" for NVIDIA X11 displays. + + Add GPU OS devices such as "nvml0" for NVIDIA GPUs. + Thanks to Marwan Abdellah and Stefan Eilemann for helping.
+ These new OS devices have some string info attributes such as CoProcType, + GPUModel, etc. to better identify them. + See the I/O Devices and Attributes documentation sections for details. +* New components + + Add the "opencl", "cuda", "nvml" and "gl" components for I/O device + discovery. + + "nvml" also improves the discovery of NVIDIA GPU PCIe link speed. + All of these new components may be built as plugins. They may also be + disabled entirely by passing --disable-opencl/cuda/nvml/gl to configure. + See the I/O Devices, Components and Plugins, and FAQ documentation + sections for details. +* API + + Add hwloc_topology_get_flags(). + + Add hwloc/plugins.h for building external plugins. + See the Adding new discovery components and plugins section. +* Interoperability + + Add hwloc/opencl.h, hwloc/nvml.h, hwloc/gl.h and hwloc/intel-mic.h + to retrieve the locality of OS devices that correspond to AMD OpenCL + GPU devices or indexes, to NVML devices or indexes, to NVIDIA X11 + displays, or to Intel Xeon Phi (MIC) device indexes. + + Add new helpers in hwloc/cuda.h and hwloc/cudart.h to convert + between CUDA devices or indexes and hwloc OS devices. + + Add hwloc_ibv_get_device_osdev() and clarify the requirements + of the OpenFabrics Verbs helpers in hwloc/openfabrics-verbs.h. +* Tools + + hwloc-info is not only a synonym of lstopo -s anymore, it also + dumps information about objects given on the command-line. +* Documentation + + Add a section "Existing components and plugins". + + Add a list of common OS devices in section "Software devices". + + Add a new FAQ entry "Why is lstopo slow?" about lstopo slowness + issues because of GPUs. + + Clarify the documentation of inline helpers in hwloc/myriexpress.h + and hwloc/openfabrics-verbs.h. +* Misc + + Improve cache detection on AIX. + + The HWLOC_COMPONENTS variable now excludes the components whose + names are prefixed with '-'. 
+ + lstopo --ignore PU now works when displaying the topology in + graphical and textual mode (not when exporting to XML). + + Make sure I/O options always appear in lstopo usage, not only when + using pciutils/libpci. + + Remove some unneeded Linux specific includes from some interoperability + headers. + + Fix some inconsistencies in hwloc-distrib and hwloc-assembler-remote + manpages. Thanks to Guy Streeter for the report. + + Fix a memory leak on AIX when getting memory binding. + + Fix many small memory leaks on Linux. + + The `libpci' component is now called `pci' but the old name is still + accepted in the HWLOC_COMPONENTS variable for backward compatibility. + + +Version 1.6.2 +------------- +* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. + pciutils/libpci is only used if --enable-libpci is given to configure + because its GPL license may taint hwloc. See the Installation section + in the documentation for details. +* Fix get_cpubind on Solaris when bound to a single PU with + processor_bind(). Thanks to Eugene Loh for reporting the problem + and providing a patch. + + +Version 1.6.1 +------------- +* Fix some crash or buggy detection in the x86 backend when Linux + cgroups/cpusets restrict the available CPUs. +* Fix the pkg-config output with --libs --static. + Thanks to Erik Schnetter for reporting one of the problems. +* Fix the output of hwloc-calc -H --hierarchical when using logical + indexes in the output. +* Calling hwloc_topology_load() multiple times on the same topology + is officially deprecated. hwloc will warn in such cases. +* Add some documentation about existing plugins/components, package + dependencies, and I/O devices specification on the command-line. + + +Version 1.6.0 +------------- +* Major changes + + Reorganize the backend infrastructure to support dynamic selection + of components and dynamic loading of plugins. For details, see the + new documentation section Components and plugins. 
+ - The HWLOC_COMPONENTS variable lets one replace the default discovery + components. + - Dynamic loading of plugins may be enabled with --enable-plugins + (except on AIX and Windows). It will build libxml2 and libpci + support as separate modules. This helps reduce the dependencies + of the core hwloc library when distributed as a binary package. +* Backends + + Add CPUModel detection on Darwin and x86/FreeBSD. + Thanks to Robin Scher for providing ways to implement this. + + The x86 backend now adds CPUModel info attributes to socket objects + created by other backends that do not natively support this attribute. + + Fix detection on FreeBSD in case of cpuset restriction. Thanks to + Sebastian Kuzminsky for reporting the problem. +* XML + + Add hwloc_topology_set_userdata_import/export_callback(), + hwloc_export_obj_userdata() and _userdata_base64() to let + applications specify how to save/restore the custom data they placed + in the userdata private pointer field of hwloc objects. +* Tools + + Add hwloc-annotate program to add string info attributes to XML + topologies. + + Add --pid-cmd to hwloc-ps to append the output of a command to each + PID line. May be used for showing Open MPI process ranks; see the + hwloc-ps(1) manpage for details. + + hwloc-bind now exits with an error if binding fails; the executable + is not launched unless binding succeeded or --force was given. + + Add --quiet to hwloc-calc and hwloc-bind to hide non-fatal error + messages. + + Fix command-line pid support in Windows tools. + + All programs accept --verbose as a synonym to -v. +* Misc + + Fix some DIR descriptor leaks on Linux. + + Fix I/O device lists when some were filtered out after an XML import. + + Fix the removal of I/O objects when importing an I/O-enabled XML topology + without any I/O topology flag.
+ + When merging objects with HWLOC_IGNORE_TYPE_KEEP_STRUCTURE or + lstopo --merge, compare object types before deciding which one of two + identical objects to remove (e.g. keep sockets in favor of caches). + + Add some GUID- and LID-related info attributes to OpenFabrics + OS devices. + + Only add CPUType socket attributes on Solaris/Sparc. Other cases + don't report reliable information (Solaris/x86), and a replacement + is available as the Architecture string info in the Machine object. + + Add missing Backend string info on Solaris in most cases. + + Document object attributes and string infos in a new Attributes + section in the documentation. + + Add a section about Synthetic topologies in the documentation. + + +Version 1.5.2 (some of these changes are in v1.6.2 but not in v1.6) +------------- +* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. + pciutils/libpci is only used if --enable-libpci is given to configure + because its GPL license may taint hwloc. See the Installation section + in the documentation for details. +* Fix get_cpubind on Solaris when bound to a single PU with + processor_bind(). Thanks to Eugene Loh for reporting the problem + and providing a patch. +* Fix some DIR descriptor leaks on Linux. +* Fix I/O device lists when some were filtered out after an XML import. +* Add missing Backend string info on Solaris in most cases. +* Fix the removal of I/O objects when importing an I/O-enabled XML topology + without any I/O topology flag. +* Fix the output of hwloc-calc -H --hierarchical when using logical + indexes in the output. +* Fix the pkg-config output with --libs --static. + Thanks to Erik Schnetter for reporting one of the problems. + + +Version 1.5.1 +------------- +* Fix block OS device detection on Linux kernel 3.3 and later. + Thanks to Guy Streeter for reporting the problem and testing the fix. +* Fix the cpuid code in the x86 backend (for FreeBSD).
Thanks to + Sebastian Kuzminsky for reporting problems and testing patches. +* Fix 64bit detection on FreeBSD. +* Fix some corner cases in the management of the thissystem flag with + respect to topology flags and environment variables. +* Fix some corner cases in command-line parsing checks in hwloc-distrib + and hwloc-distances. +* Make sure we do not miss some block OS devices on old Linux kernels + when a single PCI device has multiple IDE hosts/devices behind it. +* Do not disable I/O devices or instruction caches in hwloc-assembler output. + + +Version 1.5.0 +------------- +* Backends + + Do not limit the number of processors to 1024 on Solaris anymore. + + Gather total machine memory on FreeBSD. Thanks to Cyril Roelandt. + + XML topology files do not depend on the locale anymore. Float numbers + such as NUMA distances or PCI link speeds now always use a dot as a + decimal separator. + + Add instruction caches detection on Linux, AIX, Windows and Darwin. + + Add get_last_cpu_location() support for the current thread on AIX. + + Support binding on AIX when threads or processes were bound with + bindprocessor(). Thanks to Hendryk Bockelmann for reporting the issue + and testing patches, and to Farid Parpia for explaining the binding + interfaces. + + Improve AMD topology detection in the x86 backend (for FreeBSD) using + the topoext feature. +* API + + Increase HWLOC_API_VERSION to 0x00010500 so that API changes may be + detected at build-time. + + Add a cache type attribute describing Data, Instruction and Unified + caches. Caches with different types but same depth (for instance L1d + and L1i) are placed on different levels. + + Add hwloc_get_cache_type_depth() to retrieve the hwloc level depth of + the given cache depth and type, for instance L1i or L2. + It helps disambiguate the case where hwloc_get_type_depth() returns + HWLOC_TYPE_DEPTH_MULTIPLE.
+ + Instruction caches are ignored unless HWLOC_TOPOLOGY_FLAG_ICACHES is + passed to hwloc_topology_set_flags() before load. + + Add hwloc_ibv_get_device_osdev_by_name() OpenFabrics helper in + openfabrics-verbs.h to find the hwloc OS device object corresponding to + an OpenFabrics device. +* Tools + + Add lstopo-no-graphics, a lstopo built without graphical support to + avoid dependencies on external libraries such as Cairo and X11. When + supported, graphical outputs are only available in the original lstopo + program. + - Packagers splitting lstopo and lstopo-no-graphics into different + packages are advised to use the alternatives system so that lstopo + points to the best available binary. + + Instruction caches are enabled in lstopo by default. Use --no-icaches + to disable them. + + Add -t/--threads to show threads in hwloc-ps. +* Removal of obsolete components + + Remove the old cpuset interface (hwloc/cpuset.h) which is deprecated and + superseded by the bitmap API (hwloc/bitmap.h) since v1.1. + hwloc_cpuset and nodeset types are still defined, but all hwloc_cpuset_* + compatibility wrappers are now gone. + + Remove Linux libnuma conversion helpers for the deprecated and + broken nodemask_t interface. + + Remove support for "Proc" type name, it was superseded by "PU" in v1.0. + + Remove hwloc-mask symlinks, it was replaced by hwloc-calc in v1.0. +* Misc + + Fix PCIe 3.0 link speed computation. + + Non-printable characters are dropped from strings during XML export. + + Fix importing of escaped characters with the minimalistic XML backend. + + Assert hwloc_is_thissystem() in several I/O related helpers. + + Fix some memory leaks in the x86 backend for FreeBSD. + + Minor fixes to ease native builds on Windows. + + Limit the number of retries when operating on all threads within a + process on Linux if the list of threads is heavily getting modified. 
+ + +Version 1.4.3 +------------- +* This release is only meant to fix the pciutils license issue when upgrading + to hwloc v1.5 or later is not possible. It contains several other minor + fixes but ignores many of them that are only in v1.5 or later. +* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. + pciutils/libpci is only used if --enable-libpci is given to configure + because its GPL license may taint hwloc. See the Installation section + in the documentation for details. +* Fix PCIe 3.0 link speed computation. +* Fix importing of escaped characters with the minimalistic XML backend. +* Fix a memory leak in the x86 backend. + + +Version 1.4.2 +------------- +* Fix build on Solaris 9 and earlier when fabsf() is not a compiler + built-in. Thanks to Igor Galić for reporting the problem. +* Fix support for more than 32 processors on Windows. Thanks to Hartmut + Kaiser for reporting the problem. +* Fix process-wide binding and cpulocation routines on Linux when some + threads disappear in the meantime. Thanks to Vlad Roubtsov for reporting + the issue. +* Make installed scripts executable. Thanks to Jirka Hladky for reporting + the problem. +* Fix libtool revision management when building for Windows. This fix was + also released as hwloc v1.4.1.1 Windows builds. Thanks to Hartmut Kaiser + for reporting the problem. +* Fix the __hwloc_inline keyword in public headers when compiling with a + C++ compiler. +* Add Port info attribute to network OS devices inside OpenFabrics PCI + devices so as to identify which interface corresponds to which port. +* Document requirements for interoperability helpers: I/O devices discovery + is required for some of them; the topology must match the current host + for most of them. + + +Version 1.4.1 +------------- +* This release contains all changes from v1.3.2. +* Fix hwloc_alloc_membind, thanks Karl Napf for reporting the issue. +* Fix memory leaks in some get_membind() functions. 
+* Fix helpers converting from Linux libnuma to hwloc (hwloc/linux-libnuma.h) + in case of out-of-order NUMA node ids. +* Fix some overzealous assertions in the distance grouping code. +* Workaround BIOS reporting empty I/O locality in CUDA and OpenFabrics + helpers on Linux. Thanks to Albert Solernou for reporting the problem. +* Install a valgrind suppressions file hwloc-valgrind.supp (see the FAQ). +* Fix memory binding documentation. Thanks to Karl Napf for reporting the + issues. + + +Version 1.4.0 (does not contain all v1.3.2 changes) +------------- +* Major features + + Add "custom" interface and "assembler" tools to build multi-node + topology. See the Multi-node Topologies section in the documentation + for details. +* Interface improvements + + Add symmetric_subtree object attribute to ease assumptions when consulting + regular symmetric topologies. + + Add a CPUModel and CPUType info attribute to Socket objects on Linux + and Solaris. + + Add hwloc_get_obj_index_inside_cpuset() to retrieve the "logical" index + of an object within a subtree of the topology. + + Add more NVIDIA CUDA helpers in cuda.h and cudart.h to find hwloc objects + corresponding to CUDA devices. +* Discovery improvements + + Add a group object above partial distance matrices to make sure + the matrices are available in the final topology, except when this + new object would contradict the existing hierarchy. + + Grouping by distances now also works when loading from XML. + + Fix some corner cases in object insertion, for instance when dealing + with NUMA nodes without any CPU. +* Backends + + Implement hwloc_get_area_membind() on Linux. + + Honor I/O topology flags when importing from XML. + + Further improve XML-related error checking and reporting. + + Hide synthetic topology error messages unless HWLOC_SYNTHETIC_VERBOSE=1. +* Tools + + Add synthetic exporting of symmetric topologies to lstopo. + + lstopo --horiz and --vert can now be applied to some specific object types. 
+ + lstopo -v -p now displays distance matrices with physical indexes. + + Add hwloc-distances utility to list distances. +* Documentation + + Fix and/or document the behavior of most inline functions in hwloc/helper.h + when the topology contains some I/O or Misc objects. + + Backend documentation enhancements. +* Bug fixes + + Fix missing last bit in hwloc_linux_get_thread_cpubind(). + Thanks to Carolina Gómez-Tostón Gutiérrez for reporting the issue. + + Fix FreeBSD build without cpuid support. + + Fix several Windows build issues. + + Fix inline keyword definition in public headers. + + Fix dependencies in the embedded library. + + Improve visibility support detection. Thanks to Dave Love for providing + the patch. + + Remove references to internal symbols in the tools. + + +Version 1.3.3 +------------- +* This release is only meant to fix the pciutils license issue when upgrading + to hwloc v1.4 or later is not possible. It contains several other minor + fixes but ignores many of them that are only in v1.4 or later. +* Use libpciaccess instead of pciutils/libpci by default for I/O discovery. + pciutils/libpci is only used if --enable-libpci is given to configure + because its GPL license may taint hwloc. See the Installation section + in the documentation for details. + + +Version 1.3.2 +------------- +* Fix missing last bit in hwloc_linux_get_thread_cpubind(). + Thanks to Carolina Gómez-Tostón Gutiérrez for reporting the issue. +* Fix build with -mcmodel=medium. Thanks to Devendar Bureddy for reporting + the issue. +* Fix build with Solaris Studio 12 compiler when XML is disabled. + Thanks to Paul H. Hargrove for reporting the problem. +* Fix installation with old GNU sed, for instance on Red Hat 8. + Thanks to Paul H. Hargrove for reporting the problem. +* Fix PCI locality when Linux cgroups restrict the available CPUs. +* Fix floating point issue when grouping by distance on mips64 architecture. + Thanks to Paul H. Hargrove for reporting the problem. 
+* Fix conversion from/to Linux libnuma when some NUMA nodes have no memory. +* Fix support for gccfss compilers with broken ffs() support. Thanks to + Paul H. Hargrove for reporting the problem and providing a patch. +* Fix FreeBSD build without cpuid support. +* Fix several Windows build issues. +* Fix inline keyword definition in public headers. +* Fix dependencies in the embedded library. +* Detect when a compiler such as xlc may not report compile errors + properly, causing some configure checks to be wrong. Thanks to + Paul H. Hargrove for reporting the problem and providing a patch. +* Improve visibility support detection. Thanks to Dave Love for providing + the patch. +* Remove references to internal symbols in the tools. +* Fix installation on systems with limited command-line size. + Thanks to Paul H. Hargrove for reporting the problem. +* Further improve XML-related error checking and reporting. + + +Version 1.3.1 +------------- +* Fix pciutils detection with pkg-config when not installed in standard + directories. +* Fix visibility options detection with the Solaris Studio compiler. + Thanks to Igor Galić and Terry Dontje for reporting the problems. +* Fix support for old Linux sched.h headers such as those found + on Red Hat 8. Thanks to Paul H. Hargrove for reporting the problems. +* Fix inline and attribute support for Solaris compilers. Thanks to + Dave Love for reporting the problems. +* Print a short summary at the end of the configure output. Thanks to + Stefan Eilemann for the suggestion. +* Add --disable-libnuma configure option to disable libnuma-based + memory binding support on Linux. Thanks to Rayson Ho for the + suggestion. +* Make hwloc's configure script properly obey $PKG_CONFIG. Thanks to + Nathan Phillip Brink for raising the issue. +* Silence some harmless pciutils warnings, thanks to Paul H. Hargrove + for reporting the problem. 
+* Fix the documentation with respect to hwloc_pid_t and hwloc_thread_t + being either pid_t and pthread_t on Unix, or HANDLE on Windows. + + +Version 1.3.0 +------------- +* Major features + + Add I/O devices and bridges to the topology using the pciutils + library. Only enabled after setting the relevant flag with + hwloc_topology_set_flags() before hwloc_topology_load(). See the + I/O Devices section in the documentation for details. +* Discovery improvements + + Add associativity to the cache attributes. + + Add support for s390/z11 "books" on Linux. + + Add the HWLOC_GROUPING_ACCURACY environment variable to relax + distance-based grouping constraints. See the Environment Variables + section in the documentation for details about grouping behavior + and configuration. + + Allow user-given distance matrices to remove or replace those + discovered by the OS backend. +* XML improvements + + XML is now always supported: a minimalistic custom import/export + code is used when libxml2 is not available. It is only guaranteed + to read XML files generated by hwloc. + + hwloc_topology_export_xml() and export_xmlbuffer() now return an + integer. + + Add hwloc_free_xmlbuffer() to free the buffer allocated by + hwloc_topology_export_xmlbuffer(). + + Hide XML topology error messages unless HWLOC_XML_VERBOSE=1. +* Minor API updates + + Add hwloc_obj_add_info to customize object info attributes. +* Tools + + lstopo now displays I/O devices by default. Several options are + added to configure the I/O discovery. + + hwloc-calc and hwloc-bind now accept I/O devices as input. + + Add --restrict option to hwloc-calc and hwloc-distribute. + + Add --sep option to change the output field separator in hwloc-calc. + + Add --whole-system option to hwloc-ps. + + +Version 1.2.2 +------------- +* Fix build on AIX 5.2, thanks Utpal Kumar Ray for the report. +* Fix XML import of very large page sizes or counts on 32bits platform, + thanks to Karsten Hopp for the RedHat ticket. 
+* Fix crash when administrator limitations such as Linux cgroup require + to restrict distance matrices. Thanks to Ake Sandgren for reporting the + problem. +* Fix the removal of objects such as AMD Magny-Cours dual-node sockets + in case of administrator restrictions. +* Improve error reporting and messages in case of wrong synthetic topology + description. +* Several other minor internal fixes and documentation improvements. + + +Version 1.2.1 +------------- +* Improve support of AMD Bulldozer "Compute-Unit" modules by detecting + logical processors with different core IDs on Linux. +* Fix hwloc-ps crash when listing processes from another Linux cpuset. + Thanks to Carl Smith for reporting the problem. +* Fix build on AIX and Solaris. Thanks to Carl Smith and Andreas Kupries + for reporting the problems. +* Fix cache size detection on Darwin. Thanks to Erkcan Özcan for reporting + the problem. +* Make configure fail if --enable-xml or --enable-cairo is given and + proper support cannot be found. Thanks to Andreas Kupries for reporting + the XML problem. +* Fix spurious L1 cache detection on AIX. Thanks to Hendryk Bockelmann + for reporting the problem. +* Fix hwloc_get_last_cpu_location(THREAD) on Linux. Thanks to Gabriele + Fatigati for reporting the problem. +* Fix object distance detection on Solaris. +* Add pthread_self weak symbol to ease static linking. +* Minor documentation fixes. + + +Version 1.2.0 +------------- +* Major features + + Expose latency matrices in the API as an array of distance structures + within objects. Add several helpers to find distances. + + Add hwloc_topology_set_distance_matrix() and environment variables + to provide a matrix of distances between a given set of objects. + + Add hwloc_get_last_cpu_location() and hwloc_get_proc_last_cpu_location() + to retrieve the processors where a process or thread recently ran. + - Add the corresponding --get-last-cpu-location option to hwloc-bind. 
+ + Add hwloc_topology_restrict() to restrict an existing topology to a + given cpuset. + - Add the corresponding --restrict option to lstopo. +* Minor API updates + + Add hwloc_bitmap_list_sscanf/snprintf/asprintf to convert between bitmaps + and strings such as 4-5,7-9,12,15- + + hwloc_bitmap_set/clr_range() now support infinite ranges. + + Clarify the difference between inserting Misc objects by cpuset or by + parent. + + hwloc_insert_misc_object_by_cpuset() now returns NULL in case of error. +* Discovery improvements + + x86 backend (for freebsd): add x2APIC support + + Support standard device-tree phandle, to get better support on e.g. ARM + systems providing it. + + Detect cache size on AIX. Thanks Christopher and IBM. + + Improve grouping to support asymmetric topologies. +* Tools + + Command-line tools now support "all" and "root" special locations + consisting in the entire topology, as well as type names with depth + attributes such as L2 or Group4. + + hwloc-calc improvements: + - Add --number-of/-N option to report the number of objects of a given + type or depth. + - -I is now equivalent to --intersect for listing the indexes of + objects of a given type or depth that intersects the input. + - Add -H to report the output as a hierarchical combination of types + and depths. + + Add --thissystem to lstopo. + + Add lstopo-win, a console-less lstopo variant on Windows. +* Miscellaneous + + Remove C99 usage from code base. + + Rename hwloc-gather-topology.sh into hwloc-gather-topology + + Fix AMD cache discovery on freebsd when there is no L3 cache, thanks + Andriy Gapon for the fix. + + +Version 1.1.2 +------------- +* Fix a segfault in the distance-based grouping code when some objects + are not placed in any group. Thanks to Bernd Kallies for reporting + the problem and providing a patch. +* Fix the command-line parsing of hwloc-bind --mempolicy interleave. + Thanks to Guy Streeter for reporting the problem. 
+* Stop truncating the output in hwloc_obj_attr_snprintf() and in the + corresponding lstopo output. Thanks to Guy Streeter for reporting the + problem. +* Fix object levels ordering in synthetic topologies. +* Fix potential incoherency between device tree and kernel information, + when SMT is disabled on Power machines. +* Fix and document the behavior of hwloc_topology_set_synthetic() in case + of invalid argument. Thanks to Guy Streeter for reporting the problem. +* Add some verbose error message reporting when it looks like the OS + gives erroneous information. +* Do not include unistd.h and stdint.h in public headers on Windows. +* Move config.h files into their own subdirectories to avoid name + conflicts when AC_CONFIG_HEADERS adds -I's for them. +* Remove the use of declaring variables inside "for" loops. +* Some other minor fixes. +* Many minor documentation fixes. + + +Version 1.1.1 +------------- +* Add hwloc_get_api_version() which returns the version of hwloc used + at runtime. Thanks to Guy Streeter for the suggestion. +* Fix the number of hugepages reported for NUMA nodes on Linux. +* Fix hwloc_bitmap_to_ulong() right after allocating the bitmap. + Thanks to Bernd Kallies for reporting the problem. +* Fix hwloc_bitmap_from_ith_ulong() to properly zero the first ulong. + Thanks to Guy Streeter for reporting the problem. +* Fix hwloc_get_membind_nodeset() on Linux. + Thanks to Bernd Kallies for reporting the problem and providing a patch. +* Fix some file descriptor leaks in the Linux discovery. +* Fix the minimum width of NUMA nodes, caches and the legend in the graphical + lstopo output. Thanks to Jirka Hladky for reporting the problem. +* Various fixes to bitmap conversion from/to taskset-strings. +* Fix and document snprintf functions behavior when the buffer size is too + small or zero. Thanks to Guy Streeter for reporting the problem. +* Fix configure to avoid spurious enabling of the cpuid backend. 
+ Thanks to Tim Anderson for reporting the problem. +* Cleanup error management in hwloc-gather-topology.sh. + Thanks to Jirka Hladky for reporting the problem and providing a patch. +* Add a manpage and usage for hwloc-gather-topology.sh on Linux. + Thanks to Jirka Hladky for providing a patch. +* Memory binding documentation enhancements. + + +Version 1.1.0 +------------- + +* API + + Increase HWLOC_API_VERSION to 0x00010100 so that API changes may be + detected at build-time. + + Add a memory binding interface. + + The cpuset API (hwloc/cpuset.h) is now deprecated. It is replaced by + the bitmap API (hwloc/bitmap.h) which offers the same features with more + generic names since it applies to CPU sets, node sets and more. + Backward compatibility with the cpuset API and ABI is still provided but + it will be removed in a future release. + Old types (hwloc_cpuset_t, ...) are still available as a way to clarify + what kind of hwloc_bitmap_t each API function manipulates. + Upgrading to the new API only requires to replace hwloc_cpuset_ function + calls with the corresponding hwloc_bitmap_ calls, with the following + renaming exceptions: + - hwloc_cpuset_cpu -> hwloc_bitmap_only + - hwloc_cpuset_all_but_cpu -> hwloc_bitmap_allbut + - hwloc_cpuset_from_string -> hwloc_bitmap_sscanf + + Add an `infos' array in each object to store couples of info names and + values. It enables generic storage of things like the old dmi board infos + that were previously stored in machine specific attributes. + + Add linesize cache attribute. +* Features + + Bitmaps (and thus CPU sets and node sets) are dynamically (re-)allocated, + the maximal number of CPUs (HWLOC_NBMAXCPUS) has been removed. + + Improve the distance-based grouping code to better support irregular + distance matrices. + + Add support for device-tree to get cache information (useful on Power + architectures). 
+* Helpers + + Add NVIDIA CUDA helpers in cuda.h and cudart.h to ease interoperability + with CUDA Runtime and Driver APIs. + + Add Myrinet Express helper in myriexpress.h to ease interoperability. +* Tools + + lstopo now displays physical/OS indexes by default in graphical mode + (use -l to switch back to logical indexes). The textual output still uses + logical by default (use -p to switch to physical indexes). + + lstopo prefixes logical indexes with `L#' and physical indexes with `P#'. + Physical indexes are also printed as `P#N' instead of `phys=N' within + object attributes (in parentheses). + + Add a legend at the bottom of the lstopo graphical output, use --no-legend + to remove it. + + Add hwloc-ps to list process' bindings. + + Add --membind and --mempolicy options to hwloc-bind. + + Improve tools command-line options by adding a generic --input option + (and more) which replaces the old --xml, --synthetic and --fsys-root. + + Cleanup lstopo output configuration by adding --output-format. + + Add --intersect in hwloc-calc, and replace --objects with --largest. + + Add the ability to work on standard input in hwloc-calc. + + Add --from, --to and --at in hwloc-distrib. + + Add taskset-specific functions and command-line tools options to + manipulate CPU set strings in the format of the taskset program. + + Install hwloc-gather-topology.sh on Linux. + + +Version 1.0.3 +------------- + +* Fix support for Linux cpuset when emulated by a cgroup mount point. +* Remove unneeded runtime dependency on libibverbs.so in the library and + all utils programs. +* Fix hwloc_cpuset_to_linux_libnuma_ulongs in case of non-linear OS-indexes + for NUMA nodes. +* lstopo now displays physical/OS indexes by default in graphical mode + (use -l to switch back to logical indexes). The textual output still uses + logical by default (use -p to switch to physical indexes). + + +Version 1.0.2 +------------- + +* Public headers can now be included directly from C++ programs. 
+* Solaris fix for non-contiguous cpu numbers. Thanks to Rolf vandeVaart for + reporting the issue. +* Darwin 10.4 fix. Thanks to Olivier Cessenat for reporting the issue. +* Revert 1.0.1 patch that ignored sockets with unknown ID values since it + only slightly helped POWER7 machines with old Linux kernels while it + prevents recent kernels from getting the complete POWER7 topology. +* Fix hwloc_get_common_ancestor_obj(). +* Remove arch-specific bits in public headers. +* Some fixes in the lstopo graphical output. +* Various man page clarifications and minor updates. + + +Version 1.0.1 +------------- + +* Various Solaris fixes. Thanks to Yannick Martin for reporting the issue. +* Fix "non-native" builds on x86 platforms (e.g., when building 32 + bit executables with compilers that natively build 64 bit). +* Ignore sockets with unknown ID values (which fixes issues on POWER7 + machines). Thanks to Greg Bauer for reporting the issue. +* Various man page clarifications and minor updates. +* Fixed memory leaks in hwloc_setup_group_from_min_distance_clique(). +* Fix cache type filtering on MS Windows 7. Thanks to Αλέξανδρος + Παπαδογιαννάκ for reporting the issue. +* Fixed warnings when compiling with -DNDEBUG. + + +Version 1.0.0 +------------- + +* The ABI of the library has changed. +* Backend updates + + Add FreeBSD support. + + Add x86 cpuid based backend. + + Add Linux cgroup support to the Linux cpuset code. + + Support binding of entire multithreaded process on Linux. + + Fix and enable Group support in Windows. + + Cleanup XML export/import. +* Objects + + HWLOC_OBJ_PROC is renamed into HWLOC_OBJ_PU for "Processing Unit", + its stringified type name is now "PU". + + Use new HWLOC_OBJ_GROUP objects instead of MISC when grouping + objects according to NUMA distances or arbitrary OS aggregation. + + Rework memory attributes. + + Add different cpusets in each object to specify processors that + are offline, unavailable, ... 
+ + Cleanup the storage of object names and DMI infos. +* Features + + Add support for looking up specific PID topology information. + + Add hwloc_topology_export_xml() to export the topology in a XML file. + + Add hwloc_topology_get_support() to retrieve the supported features + for the current topology context. + + Support non-SYSTEM object as the root of the tree, use MACHINE in + most common cases. + + Add hwloc_get_*cpubind() routines to retrieve the current binding + of processes and threads. +* API + + Add HWLOC_API_VERSION to help detect the currently used API version. + + Add missing ending "e" to *compare* functions. + + Add several routines to emulate PLPA functions. + + Rename and rework the cpuset and/or/xor/not/clear operators to output + their result in a dedicated argument instead of modifying one input. + + Deprecate hwloc_obj_snprintf() in favor of hwloc_obj_type/attr_snprintf(). + + Clarify the use of parent and ancestor in the API, do not use father. + + Replace hwloc_get_system_obj() with hwloc_get_root_obj(). + + Return -1 instead of HWLOC_OBJ_TYPE_MAX in the API since the latter + isn't public. + + Relax constraints in hwloc_obj_type_of_string(). + + Improve displaying of memory sizes. + + Add 0x prefix to cpuset strings. +* Tools + + lstopo now displays logical indexes by default, use --physical to + revert back to OS/physical indexes. + + Add colors in the lstopo graphical outputs to distinguish between online, + offline, reserved, ... objects. + + Extend lstopo to show cpusets, filter objects by type, ... + + Renamed hwloc-mask into hwloc-calc which supports many new options. +* Documentation + + Add a hwloc(7) manpage containing general information. + + Add documentation about how to switch from PLPA to hwloc. + + Cleanup the distributed documentation files. +* Miscellaneous + + Many compilers warning fixes. + + Cleanup the ABI by using the visibility attribute. + + Add project embedding support. 
+ + +Version 0.9.4 (unreleased) +-------------------------- + +* Fix resetting colors to normal in lstopo -.txt output. +* Fix Linux pthread_t binding error report. + + +Version 0.9.3 +------------- + +* Fix autogen.sh to work with Autoconf 2.63. +* Fix various crashes in particular conditions: + - xml files with root attributes + - offline CPUs + - partial sysfs support + - unparseable /proc/cpuinfo + - ignoring NUMA level while Misc level has been generated +* Tweak documentation a bit +* Do not require the pthread library for binding the current thread on Linux +* Do not erroneously consider the sched_setaffinity prototype is the old version + when there is actually none. +* Fix _syscall3 compilation on archs for which we do not have the + sched_setaffinity system call number. +* Fix AIX binding. +* Fix library dependencies: now only lstopo depends on libtermcap, fix + binutils-gold link +* Have make check always build and run hwloc-hello.c +* Do not limit size of a cpuset. + + +Version 0.9.2 +------------- + +* Trivial documentation changes. + + +Version 0.9.1 +------------- + +* Re-branded to "hwloc" and moved to the Open MPI project, relicensed under the + BSD license. +* The prefix of all functions and tools is now hwloc, and some public + functions were also renamed for real. +* Group NUMA nodes into Misc objects according to their physical distance + that may be reported by the OS/BIOS. + May be ignored by setting HWLOC_IGNORE_DISTANCES=1 in the environment. +* Ignore offline CPUs on Solaris. +* Improved binding support on AIX. +* Add HP-UX support. +* CPU sets are now allocated/freed dynamically. +* Add command line options to tune the lstopo graphical output, add + semi-graphical textual output +* Extend topobind to support multiple cpusets or objects on the command + line as topomask does. +* Add an Infiniband-specific helper hwloc/openfabrics-verbs.h to retrieve + the physical location of IB devices.
+ + +Version 0.9 (libtopology) +------------------------- + +* First release. diff --git a/opal/mca/hwloc/hwloc2a/hwloc/README b/opal/mca/hwloc/hwloc2a/hwloc/README new file mode 100644 index 00000000000..eadf3bc6a00 --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/hwloc/README @@ -0,0 +1,65 @@ +Introduction + +The Hardware Locality (hwloc) software project aims at easing the process of +discovering hardware resources in parallel architectures. It offers +command-line tools and a C API for consulting these resources, their locality, +attributes, and interconnection. hwloc primarily aims at helping +high-performance computing (HPC) applications, but is also applicable to any +project seeking to exploit code and/or data locality on modern computing +platforms. + +hwloc is actually made of two subprojects distributed together: + + * The original hwloc project for describing the internals of computing nodes. + It is described in detail between sections Hardware Locality (hwloc) + Introduction and Network Locality (netloc). + * The network-oriented companion called netloc (Network Locality), described + in detail starting at section Network Locality (netloc). Netloc may be + disabled, but the original hwloc cannot. Both hwloc and netloc APIs are + documented after these sections. + +Installation + +hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD +license. It is hosted as a sub-project of the overall Open MPI project (http:// +www.open-mpi.org/). Note that hwloc does not require any functionality from +Open MPI -- it is a wholly separate (and much smaller!) project and code base. +It just happens to be hosted as part of the overall Open MPI project. + +Nightly development snapshots are available on the web site.
Additionally, the +code can be directly cloned from Git: + +shell$ git clone https://github.com/open-mpi/hwloc.git +shell$ cd hwloc +shell$ ./autogen.sh + +Note that GNU Autoconf >=2.63, Automake >=1.11 and Libtool >=2.2.6 are required +when building from a Git clone. + +Installation by itself is the fairly common GNU-based process: + +shell$ ./configure --prefix=... +shell$ make +shell$ make install + +hwloc- and netloc-specific configure options and requirements are documented in +sections hwloc Installation and Netloc Installation respectively. + +Also note that if you install supplemental libraries in non-standard locations, +hwloc's configure script may not be able to find them without some help. You +may need to specify additional CPPFLAGS, LDFLAGS, or PKG_CONFIG_PATH values on +the configure command line. + +For example, if libpciaccess was installed into /opt/pciaccess, hwloc's +configure script may not find it by default. Try adding PKG_CONFIG_PATH to the +./configure command line, like this: + +./configure PKG_CONFIG_PATH=/opt/pciaccess/lib/pkgconfig ... + +Running the "lstopo" tool is a good way to check as a graphical output whether +hwloc properly detected the architecture of your node. Netloc command-line +tools can be used to display the network topology interconnecting your nodes. + + + +See https://www.open-mpi.org/projects/hwloc/doc/ for more hwloc documentation. diff --git a/opal/mca/hwloc/hwloc2a/hwloc/VERSION b/opal/mca/hwloc/hwloc2a/hwloc/VERSION new file mode 100644 index 00000000000..cb487e94a5c --- /dev/null +++ b/opal/mca/hwloc/hwloc2a/hwloc/VERSION @@ -0,0 +1,47 @@ +# This is the VERSION file for hwloc, describing the precise version +# of hwloc in this distribution. The various components of the version +# number below are combined to form a single version number string. + +# major, minor, and release are generally combined in the form +# <major>.<minor>.<release>. If release is zero, then it is omitted.
+ +# Please update HWLOC_VERSION in contrib/windows/private_config.h too. + +major=2 +minor=0 +release=0 + +# greek is used for alpha or beta release tags. If it is non-empty, +# it will be appended to the version number. It does not have to be +# numeric. Common examples include a1 (alpha release 1), b1 (beta +# release 1), sc2005 (Super Computing 2005 release). The only +# requirement is that it must be entirely printable ASCII characters +# and have no white space. + +greek=a1 + +# The date when this release was created + +date="Unreleased developer copy" + +# If snapshot=1, then use the value from snapshot_version as the +# entire hwloc version (i.e., ignore major, minor, release, and +# greek). This is only set to 1 when making snapshot tarballs. +snapshot=1 +snapshot_version=shmem-20170815.1857.git2478ce8 + +# The shared library version of hwloc's public library. This version +# is maintained in accordance with the "Library Interface Versions" +# chapter from the GNU Libtool documentation. Notes: + +# 1. Since version numbers are associated with *releases*, the version +# number maintained on the hwloc git master (and developer branches) +# is always 0:0:0. + +# 2. Version numbers are described in the Libtool current:revision:age +# format. 
+ +libhwloc_so_version=0:0:0 +libnetloc_so_version=0:0:0 + +# Please also update the lines in contrib/windows/libhwloc.vcxproj diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/distscript.sh b/opal/mca/hwloc/hwloc2a/hwloc/config/distscript.sh similarity index 100% rename from opal/mca/hwloc/hwloc1113/hwloc/config/distscript.sh rename to opal/mca/hwloc/hwloc2a/hwloc/config/distscript.sh diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc.m4 b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc.m4 similarity index 92% rename from opal/mca/hwloc/hwloc1113/hwloc/config/hwloc.m4 rename to opal/mca/hwloc/hwloc2a/hwloc/config/hwloc.m4 index 6807624cef8..b086e7c79b3 100644 --- a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc.m4 +++ b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc.m4 @@ -1,7 +1,7 @@ dnl -*- Autoconf -*- dnl dnl Copyright © 2009-2016 Inria. All rights reserved. -dnl Copyright © 2009-2012, 2015-2016 Université Bordeaux +dnl Copyright © 2009-2012, 2015-2017 Université Bordeaux dnl Copyright © 2004-2005 The Trustees of Indiana University and Indiana dnl University Research and Technology dnl Corporation. All rights reserved. @@ -9,7 +9,7 @@ dnl Copyright © 2004-2012 The Regents of the University of California. dnl All rights reserved. dnl Copyright © 2004-2008 High Performance Computing Center Stuttgart, dnl University of Stuttgart. All rights reserved. -dnl Copyright © 2006-2016 Cisco Systems, Inc. All rights reserved. +dnl Copyright © 2006-2017 Cisco Systems, Inc. All rights reserved. dnl Copyright © 2012 Blue Brain Project, BBP/EPFL. All rights reserved. dnl Copyright © 2012 Oracle and/or its affiliates. All rights reserved. dnl See COPYING in top-level directory. @@ -179,7 +179,7 @@ EOF]) # List of components to be built, either statically or dynamically. # To be enlarged below. 
# - hwloc_components="noos xml synthetic custom xml_nolibxml" + hwloc_components="noos xml synthetic xml_nolibxml" # # Check OS support @@ -197,10 +197,14 @@ EOF]) hwloc_linux=yes AC_MSG_RESULT([Linux]) hwloc_components="$hwloc_components linux" - if test x$enable_pci != xno; then - hwloc_components="$hwloc_components linuxpci" - AC_DEFINE(HWLOC_HAVE_LINUXPCI, 1, [Define to 1 if building the Linux PCI component]) - hwloc_linuxpci_happy=yes + if test "x$enable_io" != xno; then + hwloc_components="$hwloc_components linuxio" + AC_DEFINE(HWLOC_HAVE_LINUXIO, 1, [Define to 1 if building the Linux I/O component]) + hwloc_linuxio_happy=yes + if test x$enable_pci != xno; then + AC_DEFINE(HWLOC_HAVE_LINUXPCI, 1, [Define to 1 if enabling Linux-specific PCI discovery in the Linux I/O component]) + hwloc_linuxpci_happy=yes + fi fi ;; *-*-irix*) @@ -227,12 +231,6 @@ EOF]) AC_MSG_RESULT([AIX]) hwloc_components="$hwloc_components aix" ;; - *-*-osf*) - AC_DEFINE(HWLOC_OSF_SYS, 1, [Define to 1 on OSF]) - hwloc_osf=yes - AC_MSG_RESULT([OSF]) - hwloc_components="$hwloc_components osf" - ;; *-*-hpux*) AC_DEFINE(HWLOC_HPUX_SYS, 1, [Define to 1 on HP-UX]) hwloc_hpux=yes @@ -263,7 +261,8 @@ EOF]) AC_MSG_WARN([***********************************************************]) AC_MSG_WARN([*** hwloc does not support this system.]) AC_MSG_WARN([*** hwloc will *attempt* to build (but it may not work).]) - AC_MSG_WARN([*** hwloc run-time results may be reduced to showing just one processor.]) + AC_MSG_WARN([*** hwloc run-time results may be reduced to showing just one processor,]) + AC_MSG_WARN([*** and binding will not be supported.]) AC_MSG_WARN([*** You have been warned.]) AC_MSG_WARN([*** Pausing to give you time to read this message...]) AC_MSG_WARN([***********************************************************]) @@ -412,9 +411,11 @@ EOF]) ]) AC_CHECK_HEADERS([sys/lgrp_user.h], [ - AC_CHECK_LIB([lgrp], [lgrp_latency_cookie], + AC_CHECK_LIB([lgrp], [lgrp_init], [HWLOC_LIBS="-llgrp $HWLOC_LIBS" - 
AC_DEFINE([HAVE_LIBLGRP], 1, [Define to 1 if we have -llgrp])]) + AC_DEFINE([HAVE_LIBLGRP], 1, [Define to 1 if we have -llgrp]) + AC_CHECK_DECLS([lgrp_latency_cookie],,,[[#include ]]) + ]) ]) AC_CHECK_HEADERS([kstat.h], [ AC_CHECK_LIB([kstat], [main], @@ -432,12 +433,12 @@ EOF]) [HWLOC_LIBS="-lpicl $HWLOC_LIBS"])]) AC_CHECK_DECLS([_SC_NPROCESSORS_ONLN, - _SC_NPROCESSORS_CONF, - _SC_NPROC_ONLN, - _SC_NPROC_CONF, - _SC_PAGESIZE, - _SC_PAGE_SIZE, - _SC_LARGE_PAGESIZE],,[:],[[#include ]]) + _SC_NPROCESSORS_CONF, + _SC_NPROC_ONLN, + _SC_NPROC_CONF, + _SC_PAGESIZE, + _SC_PAGE_SIZE, + _SC_LARGE_PAGESIZE],,[:],[[#include ]]) AC_HAVE_HEADERS([mach/mach_host.h]) AC_HAVE_HEADERS([mach/mach_init.h], [ @@ -590,22 +591,16 @@ EOF]) AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no])]) - AC_MSG_CHECKING([for working syscall]) + AC_MSG_CHECKING([for working syscall with 6 parameters]) AC_LINK_IFELSE([ AC_LANG_PROGRAM([[ #include #include - ]], [[syscall(1, 2, 3);]])], - [AC_DEFINE([HWLOC_HAVE_SYSCALL], [1], [Define to 1 if function `syscall' is available]) + ]], [[syscall(0, 1, 2, 3, 4, 5, 6);]])], + [AC_DEFINE([HWLOC_HAVE_SYSCALL], [1], [Define to 1 if function `syscall' is available with 6 parameters]) AC_MSG_RESULT([yes])], [AC_MSG_RESULT([no])]) - # Check for kerrighed, but don't abort if not found. It's illegal - # to pass in an empty 3rd argument, but we trust the output of - # pkg-config, so just give it a value that will always work: - # printf. 
- HWLOC_PKG_CHECK_MODULES([KERRIGHED], [kerrighed >= 2.0], [printf], [stdio.h], [], [:]) - AC_PATH_PROGS([HWLOC_MS_LIB], [lib]) AC_ARG_VAR([HWLOC_MS_LIB], [Path to Microsoft's Visual Studio `lib' tool]) @@ -663,7 +658,8 @@ EOF]) AC_DEFINE([HWLOC_HAVE_CLZL], [1], [Define to 1 if you have the `clzl' function.]) ]) - AC_CHECK_FUNCS([openat], [hwloc_have_openat=yes]) + AS_IF([test "$hwloc_c_vendor" != "android"], [AC_CHECK_FUNCS([openat], [hwloc_have_openat=yes])]) + AC_CHECK_HEADERS([malloc.h]) AC_CHECK_FUNCS([getpagesize memalign posix_memalign]) @@ -706,44 +702,6 @@ EOF]) ) AC_CHECK_FUNCS([cpuset_setid]) - # Linux libnuma support - hwloc_linux_libnuma_happy=no - if test "x$enable_libnuma" != "xno"; then - hwloc_linux_libnuma_happy=yes - AC_CHECK_HEADERS([numaif.h], [ - AC_CHECK_LIB([numa], [numa_available], [HWLOC_LINUX_LIBNUMA_LIBS="-lnuma"], [hwloc_linux_libnuma_happy=no]) - ], [hwloc_linux_libnuma_happy=no]) - fi - AC_SUBST(HWLOC_LINUX_LIBNUMA_LIBS) - # If we asked for Linux libnuma support but couldn't deliver, fail - HWLOC_LIBS="$HWLOC_LIBS $HWLOC_LINUX_LIBNUMA_LIBS" - AS_IF([test "$enable_libnuma" = "yes" -a "$hwloc_linux_libnuma_happy" = "no"], - [AC_MSG_WARN([Specified --enable-libnuma switch, but could not]) - AC_MSG_WARN([find appropriate support]) - AC_MSG_ERROR([Cannot continue])]) - if test "x$hwloc_linux_libnuma_happy" = "xyes"; then - tmp_save_LIBS="$LIBS" - LIBS="$LIBS $HWLOC_LINUX_LIBNUMA_LIBS" - - AC_CHECK_LIB([numa], [set_mempolicy], [ - enable_set_mempolicy=yes - AC_DEFINE([HWLOC_HAVE_SET_MEMPOLICY], [1], [Define to 1 if set_mempolicy is available.]) - ]) - AC_CHECK_LIB([numa], [mbind], [ - enable_mbind=yes - AC_DEFINE([HWLOC_HAVE_MBIND], [1], [Define to 1 if mbind is available.]) - ]) - AC_CHECK_LIB([numa], [migrate_pages], [ - enable_migrate_pages=yes - AC_DEFINE([HWLOC_HAVE_MIGRATE_PAGES], [1], [Define to 1 if migrate_pages is available.]) - ]) - AC_CHECK_LIB([numa], [move_pages], [ - AC_DEFINE([HWLOC_HAVE_MOVE_PAGES], [1], [Define to 1 if 
move_pages is available.]) - ]) - - LIBS="$tmp_save_LIBS" - fi - # Linux libudev support if test "x$enable_libudev" != xno; then AC_CHECK_HEADERS([libudev.h], [ @@ -757,29 +715,31 @@ EOF]) # PCI support via libpciaccess. NOTE: we do not support # libpci/pciutils because that library is GPL and is incompatible # with our BSD license. - hwloc_pci_happy=no - if test "x$enable_pci" != xno; then - hwloc_pci_happy=yes - HWLOC_PKG_CHECK_MODULES([PCIACCESS], [pciaccess], [pci_slot_match_iterator_create], [pciaccess.h], [:], [hwloc_pci_happy=no]) + hwloc_pciaccess_happy=no + if test "x$enable_io" != xno && test "x$enable_pci" != xno; then + hwloc_pciaccess_happy=yes + HWLOC_PKG_CHECK_MODULES([PCIACCESS], [pciaccess], [pci_slot_match_iterator_create], [pciaccess.h], [:], [hwloc_pciaccess_happy=no]) + + # Only add the REQUIRES if we got pciaccess through pkg-config. + # Otherwise we don't know if pciaccess.pc is installed + AS_IF([test "$hwloc_pciaccess_happy" = "yes"], [HWLOC_PCIACCESS_REQUIRES=pciaccess]) # Just for giggles, if we didn't find a pciaccess pkg-config, # just try looking for its header file and library. 
- AS_IF([test "$hwloc_pci_happy" != "yes"], + AS_IF([test "$hwloc_pciaccess_happy" != "yes"], [AC_CHECK_HEADER([pciaccess.h], [AC_CHECK_LIB([pciaccess], [pci_slot_match_iterator_create], - [hwloc_pci_happy=yes + [hwloc_pciaccess_happy=yes HWLOC_PCIACCESS_LIBS="-lpciaccess"]) ]) ]) - AS_IF([test "$hwloc_pci_happy" = "yes"], - [HWLOC_PCIACCESS_REQUIRES=pciaccess - hwloc_pci_lib=pciaccess - hwloc_components="$hwloc_components pci" + AS_IF([test "$hwloc_pciaccess_happy" = "yes"], + [hwloc_components="$hwloc_components pci" hwloc_pci_component_maybeplugin=1]) fi # If we asked for pci support but couldn't deliver, fail - AS_IF([test "$enable_pci" = "yes" -a "$hwloc_pci_happy" = "no"], + AS_IF([test "$enable_pci" = "yes" -a "$hwloc_pciaccess_happy" = "no"], [AC_MSG_WARN([Specified --enable-pci switch, but could not]) AC_MSG_WARN([find appropriate support]) AC_MSG_ERROR([Cannot continue])]) @@ -787,7 +747,7 @@ EOF]) # OpenCL support hwloc_opencl_happy=no - if test "x$enable_opencl" != "xno"; then + if test "x$enable_io" != xno && test "x$enable_opencl" != "xno"; then hwloc_opencl_happy=yes AC_CHECK_HEADERS([CL/cl_ext.h], [ AC_CHECK_LIB([OpenCL], [clGetDeviceIDs], [HWLOC_OPENCL_LIBS="-lOpenCL"], [hwloc_opencl_happy=no]) @@ -824,7 +784,7 @@ EOF]) # CUDA support hwloc_have_cuda=no hwloc_have_cudart=no - if test "x$enable_cuda" != "xno"; then + if test "x$enable_io" != xno && test "x$enable_cuda" != "xno"; then AC_CHECK_HEADERS([cuda.h], [ AC_MSG_CHECKING(if CUDA_VERSION >= 3020) AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ @@ -873,7 +833,7 @@ EOF]) # NVML support hwloc_nvml_happy=no - if test "x$enable_nvml" != "xno"; then + if test "x$enable_io" != xno && test "x$enable_nvml" != "xno"; then hwloc_nvml_happy=yes AC_CHECK_HEADERS([nvml.h], [ AC_CHECK_LIB([nvidia-ml], [nvmlInit], [HWLOC_NVML_LIBS="-lnvidia-ml"], [hwloc_nvml_happy=no]) @@ -933,7 +893,7 @@ EOF]) # GL Support hwloc_gl_happy=no - if test "x$enable_gl" != "xno"; then + if test "x$enable_io" != xno && test "x$enable_gl" != 
"xno"; then hwloc_gl_happy=yes AS_IF([test "$hwloc_enable_X11" != "yes"], @@ -948,6 +908,8 @@ EOF]) AC_DEFINE([HWLOC_HAVE_GL], [1], [Define to 1 if you have the GL module components.]) HWLOC_GL_LIBS="-lXNVCtrl -lXext -lX11" AC_SUBST(HWLOC_GL_LIBS) + # FIXME we actually don't know if xext.pc and x11.pc are installed + # since we didn't look for Xext and X11 using pkg-config HWLOC_GL_REQUIRES="xext x11" hwloc_have_gl=yes hwloc_components="$hwloc_components gl" @@ -989,7 +951,7 @@ EOF]) AC_MSG_CHECKING([for x86 cpuid]) old_CPPFLAGS="$CPPFLAGS" CPPFLAGS="$CPPFLAGS -I$HWLOC_top_srcdir/include" - # We need hwloc_uint64_t but we can't use hwloc/autogen/config.h before configure ends. + # We need hwloc_uint64_t but we can't use autogen/config.h before configure ends. # So pass #include/#define manually here for now. CPUID_CHECK_HEADERS= CPUID_CHECK_DEFINE= @@ -1098,7 +1060,7 @@ EOF]) AC_SUBST(HWLOC_PLUGINS_DIR) # Static components output file - hwloc_static_components_dir=${HWLOC_top_builddir}/src + hwloc_static_components_dir=${HWLOC_top_builddir}/hwloc mkdir -p ${hwloc_static_components_dir} hwloc_static_components_file=${hwloc_static_components_dir}/static-components.h rm -f ${hwloc_static_components_file} @@ -1164,7 +1126,7 @@ EOF]) AS_IF([test "$hwloc_mode" = "embedded"], [HWLOC_EMBEDDED_CFLAGS=$HWLOC_CFLAGS HWLOC_EMBEDDED_CPPFLAGS=$HWLOC_CPPFLAGS - HWLOC_EMBEDDED_LDADD='$(HWLOC_top_builddir)/src/libhwloc_embedded.la' + HWLOC_EMBEDDED_LDADD='$(HWLOC_top_builddir)/hwloc/libhwloc_embedded.la' HWLOC_EMBEDDED_LIBS=$HWLOC_LIBS HWLOC_LIBS=]) AC_SUBST(HWLOC_EMBEDDED_CFLAGS) @@ -1176,7 +1138,7 @@ EOF]) AC_CONFIG_FILES( hwloc_config_prefix[Makefile] hwloc_config_prefix[include/Makefile] - hwloc_config_prefix[src/Makefile ] + hwloc_config_prefix[hwloc/Makefile ] ) # Cleanup @@ -1204,12 +1166,12 @@ AC_DEFUN([HWLOC_DO_AM_CONDITIONALS],[ AM_CONDITIONAL([HWLOC_HAVE_GCC], [test "x$GCC" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_MS_LIB], [test "x$HWLOC_MS_LIB" != "x"]) 
AM_CONDITIONAL([HWLOC_HAVE_OPENAT], [test "x$hwloc_have_openat" = "xyes"]) - AM_CONDITIONAL([HWLOC_HAVE_LINUX_LIBNUMA], - [test "x$hwloc_have_linux_libnuma" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_SCHED_SETAFFINITY], [test "x$hwloc_have_sched_setaffinity" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_PTHREAD], [test "x$hwloc_have_pthread" = "xyes"]) + AM_CONDITIONAL([HWLOC_HAVE_LINUX_LIBNUMA], + [test "x$hwloc_have_linux_libnuma" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_LIBIBVERBS], [test "x$hwloc_have_libibverbs" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_CUDA], @@ -1222,11 +1184,9 @@ AC_DEFUN([HWLOC_DO_AM_CONDITIONALS],[ [test "x$hwloc_have_cudart" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_LIBXML2], [test "$hwloc_libxml2_happy" = "yes"]) AM_CONDITIONAL([HWLOC_HAVE_CAIRO], [test "$hwloc_cairo_happy" = "yes"]) - AM_CONDITIONAL([HWLOC_HAVE_PCI], [test "$hwloc_pci_happy" = "yes"]) + AM_CONDITIONAL([HWLOC_HAVE_PCIACCESS], [test "$hwloc_pciaccess_happy" = "yes"]) AM_CONDITIONAL([HWLOC_HAVE_OPENCL], [test "$hwloc_opencl_happy" = "yes"]) AM_CONDITIONAL([HWLOC_HAVE_NVML], [test "$hwloc_nvml_happy" = "yes"]) - AM_CONDITIONAL([HWLOC_HAVE_SET_MEMPOLICY], [test "x$enable_set_mempolicy" != "xno"]) - AM_CONDITIONAL([HWLOC_HAVE_MBIND], [test "x$enable_mbind" != "xno"]) AM_CONDITIONAL([HWLOC_HAVE_BUNZIPP], [test "x$BUNZIPP" != "xfalse"]) AM_CONDITIONAL([HWLOC_HAVE_USER32], [test "x$hwloc_have_user32" = "xyes"]) @@ -1245,7 +1205,6 @@ AC_DEFUN([HWLOC_DO_AM_CONDITIONALS],[ AM_CONDITIONAL([HWLOC_HAVE_NETBSD], [test "x$hwloc_netbsd" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_SOLARIS], [test "x$hwloc_solaris" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_AIX], [test "x$hwloc_aix" = "xyes"]) - AM_CONDITIONAL([HWLOC_HAVE_OSF], [test "x$hwloc_osf" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_HPUX], [test "x$hwloc_hpux" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_WINDOWS], [test "x$hwloc_windows" = "xyes"]) AM_CONDITIONAL([HWLOC_HAVE_MINGW32], [test "x$target_os" = "xmingw32"]) @@ -1266,6 +1225,11 @@ 
AC_DEFUN([HWLOC_DO_AM_CONDITIONALS],[ AM_CONDITIONAL([HWLOC_HAVE_CXX], [test "x$hwloc_have_cxx" = "xyes"]) ]) hwloc_did_am_conditionals=yes + + # For backwards compatibility (i.e., packages that only call + # HWLOC_DO_AM_CONDITIONS, not NETLOC DO_AM_CONDITIONALS), we also have to + # do the netloc AM conditionals here + NETLOC_DO_AM_CONDITIONALS ])dnl #----------------------------------------------------------------------- @@ -1306,8 +1270,8 @@ AC_DEFUN([_HWLOC_CHECK_DECL], [ AC_MSG_CHECKING([whether function $1 has a complete prototype]) AC_REQUIRE([AC_PROG_CC]) AC_COMPILE_IFELSE([AC_LANG_PROGRAM( - [AC_INCLUDES_DEFAULT([$4])] - [$1(1,2,3,4,5,6,7,8,9,10);], + [AC_INCLUDES_DEFAULT([$4])], + [$1(1,2,3,4,5,6,7,8,9,10);] )], [AC_MSG_RESULT([no]) $3], diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_attributes.m4 b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_attributes.m4 similarity index 99% rename from opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_attributes.m4 rename to opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_attributes.m4 index 96348e819ee..86444a950c5 100644 --- a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_attributes.m4 +++ b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_attributes.m4 @@ -531,4 +531,3 @@ AC_DEFUN([_HWLOC_CHECK_ATTRIBUTES], [ AC_DEFINE_UNQUOTED(HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS, [$hwloc_cv___attribute__weak_alias], [Whether your compiler has __attribute__ weak alias or not]) ]) - diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_vendor.m4 b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_vendor.m4 similarity index 95% rename from opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_vendor.m4 rename to opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_vendor.m4 index 0963bc1749a..2281113bc64 100644 --- a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_vendor.m4 +++ b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_vendor.m4 @@ -11,6 +11,7 @@ dnl University of Stuttgart. All rights reserved. 
dnl Copyright © 2004-2005 The Regents of the University of California. dnl All rights reserved. dnl Copyright © 2011 Cisco Systems, Inc. All rights reserved. +dnl Copyright © 2015 Inria. All rights reserved. dnl $COPYRIGHT$ dnl dnl Additional copyrights may follow @@ -86,8 +87,13 @@ AC_DEFUN([_HWLOC_CHECK_COMPILER_VENDOR], [ hwloc_check_compiler_vendor_result="unknown" # GNU is probably the most common, so check that one as soon as - # possible. Intel pretends to be GNU, so need to check Intel - # before checking for GNU. + # possible. Intel and Android pretend to be GNU, so need to + # check Intel and Android before checking for GNU. + + # Android + AS_IF([test "$hwloc_check_compiler_vendor_result" = "unknown"], + [HWLOC_IFDEF_IFELSE([__ANDROID__], + [hwloc_check_compiler_vendor_result="android"])]) # Intel AS_IF([test "$hwloc_check_compiler_vendor_result" = "unknown"], @@ -115,6 +121,7 @@ AC_DEFUN([_HWLOC_CHECK_COMPILER_VENDOR], [ [hwloc_check_compiler_vendor_result="comeau"])]) # Compaq C/C++ + # OSF part actually not needed anymore but doesn't hurt AS_IF([test "$hwloc_check_compiler_vendor_result" = "unknown"], [HWLOC_IF_IFELSE([defined(__DECC) || defined(VAXC) || defined(__VAXC)], [hwloc_check_compiler_vendor_result="compaq"], diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_visibility.m4 b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_visibility.m4 similarity index 100% rename from opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_check_visibility.m4 rename to opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_check_visibility.m4 diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_components.m4 b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_components.m4 similarity index 100% rename from opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_components.m4 rename to opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_components.m4 diff --git a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_get_version.sh b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_get_version.sh similarity index 
75% rename from opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_get_version.sh rename to opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_get_version.sh index 74bca537cef..815385a8281 100755 --- a/opal/mca/hwloc/hwloc1113/hwloc/config/hwloc_get_version.sh +++ b/opal/mca/hwloc/hwloc2a/hwloc/config/hwloc_get_version.sh @@ -29,24 +29,24 @@ else if test -f "$srcfile"; then ompi_vers=`sed -n " - t clear - : clear - s/^major/HWLOC_MAJOR_VERSION/ - s/^minor/HWLOC_MINOR_VERSION/ - s/^release/HWLOC_RELEASE_VERSION/ - s/^greek/HWLOC_GREEK_VERSION/ - s/\\\${major}/\\\${HWLOC_MAJOR_VERSION}/ - s/\\\${minor}/\\\${HWLOC_MINOR_VERSION}/ - s/\\\${release}/\\\${HWLOC_RELEASE_VERSION}/ - s/\\\${greek}/\\\${HWLOC_GREEK_VERSION}/ - s/^date/HWLOC_RELEASE_DATE/ - s/^snapshot_version/HWLOC_SNAPSHOT_VERSION/ - s/^snapshot/HWLOC_SNAPSHOT/ - t print - b - : print - p" < "$srcfile"` - eval "$ompi_vers" + t clear + : clear + s/^major/HWLOC_MAJOR_VERSION/ + s/^minor/HWLOC_MINOR_VERSION/ + s/^release/HWLOC_RELEASE_VERSION/ + s/^greek/HWLOC_GREEK_VERSION/ + s/\\\${major}/\\\${HWLOC_MAJOR_VERSION}/ + s/\\\${minor}/\\\${HWLOC_MINOR_VERSION}/ + s/\\\${release}/\\\${HWLOC_RELEASE_VERSION}/ + s/\\\${greek}/\\\${HWLOC_GREEK_VERSION}/ + s/^date/HWLOC_RELEASE_DATE/ + s/^snapshot_version/HWLOC_SNAPSHOT_VERSION/ + s/^snapshot/HWLOC_SNAPSHOT/ + t print + b + : print + p" < "$srcfile"` + eval "$ompi_vers" HWLOC_VERSION="$HWLOC_MAJOR_VERSION.$HWLOC_MINOR_VERSION.$HWLOC_RELEASE_VERSION${HWLOC_GREEK_VERSION}" @@ -62,14 +62,14 @@ else fi if test "$option" = ""; then - option="--version" + option="--version" fi fi case "$option" in --version) - echo $HWLOC_VERSION - ;; + echo $HWLOC_VERSION + ;; --release-date) echo $HWLOC_RELEASE_DATE ;; @@ -77,7 +77,7 @@ case "$option" in echo $HWLOC_SNAPSHOT ;; -h|--help) - cat <

\fR value equivalent to the directory where \fIprun\fR +resides, minus its last subdirectory. For example: + + \fB%\fP /usr/local/bin/prun ... + +is equivalent to + + \fB%\fP prun --prefix /usr/local + +. +.\" ************************** +.\" Quick Summary Section +.\" ************************** +.SH QUICK SUMMARY +. +If you are simply looking for how to run an application, you +probably want to use a command line of the following form: + + \fB%\fP prun [ -np X ] [ --hostfile ] + +This will run X copies of \fI\fR in your current run-time +environment (if running under a supported resource manager, PSRVR's +\fIprun\fR will usually automatically use the corresponding resource manager +process starter, as opposed to, for example, \fIrsh\fR or \fIssh\fR, +which require the use of a hostfile, or will default to running all X +copies on the localhost), scheduling (by default) in a round-robin fashion by +CPU slot. See the rest of this page for more details. +.P +Please note that prun automatically binds processes. Three binding patterns are used in the absence of any further directives: +.TP 18 +.B Bind to core: +when the number of processes is <= 2 +. +. +.TP +.B Bind to socket: +when the number of processes is > 2 +. +. +.TP +.B Bind to none: +when oversubscribed +. +. +.P +If your application uses threads, then you probably want to ensure that you are +either not bound at all (by specifying --bind-to none), or bound to multiple cores +using an appropriate binding level or specific number of processing elements per +application process. +. +.\" ************************** +.\" Options Section +.\" ************************** +.SH OPTIONS +. +.I prun +will send the name of the directory where it was invoked on the local +node to each of the remote nodes, and attempt to change to that +directory. See the "Current Working Directory" section below for further +details. 
+.\" +.\" Start options listing +.\" Indent 10 characters from start of first column to start of second column +.TP 10 +.B +The program executable. This is identified as the first non-recognized argument +to prun. +. +. +.TP +.B +Pass these run-time arguments to every new process. These must always +be the last arguments to \fIprun\fP. If an app context file is used, +\fI\fP will be ignored. +. +. +.TP +.B -h\fR,\fP --help +Display help for this command +. +. +.TP +.B -q\fR,\fP --quiet +Suppress informative messages from prun during application execution. +. +. +.TP +.B -v\fR,\fP --verbose +Be verbose +. +. +.TP +.B -V\fR,\fP --version +Print version number. If no other arguments are given, this will also +cause prun to exit. +. +. +.TP +.B -N \fR\fP +.br +Launch num processes per node on all allocated nodes (synonym for npernode). +. +. +. +.TP +.B -display-map\fR,\fP --display-map +Display a table showing the mapped location of each process prior to launch. +. +. +. +.TP +.B -display-allocation\fR,\fP --display-allocation +Display the detected resource allocation. +. +. +. +.TP +.B -output-proctable\fR,\fP --output-proctable +Output the debugger proctable after launch. +. +. +. +.TP +.B -max-vm-size\fR,\fP --max-vm-size \fR\fP +Number of processes to run. +. +. +. +.TP +.B -novm\fR,\fP --novm +Execute without creating an allocation-spanning virtual machine (only start +daemons on nodes hosting application procs). +. +. +. +.TP +.B -hnp\fR,\fP --hnp \fR\fP +Specify the URI of the \fRpsrvr\fP process, or the name of the file (specified as +file:filename) that contains that info. +. +. +. +.P +Use one of the following options to specify which hosts (nodes) within the \fRpsrvr\fP to run on. +. +. +.TP +.B -H\fR,\fP -host\fR,\fP --host \fR\fP +List of hosts on which to invoke processes. +. +. +.TP +.B -hostfile\fR,\fP --hostfile \fR\fP +Provide a hostfile to use. +.\" JJH - Should have man page for how to format a hostfile properly. +. +. 
+.TP +.B -default-hostfile\fR,\fP --default-hostfile \fR\fP +Provide a default hostfile. +. +. +.TP +.B -machinefile\fR,\fP --machinefile \fR\fP +Synonym for \fI-hostfile\fP. +. +. +. +. +.TP +.B -cpu-set\fR,\fP --cpu-set \fR\fP +Restrict launched processes to the specified logical cpus on each node (comma-separated +list). Note that the binding options will still apply within the specified envelope - e.g., +you can elect to bind each process to only one cpu within the specified cpu set. +. +. +. +.P +The following options specify the number of processes to launch. Note that none +of the options imply a particular binding policy - e.g., requesting N processes +for each socket does not imply that the processes will be bound to the socket. +. +. +.TP +.B -c\fR,\fP -n\fR,\fP --n\fR,\fP -np \fR<#>\fP +Run this many copies of the program on the given nodes. This option +indicates that the specified file is an executable program and not an +application context. If no value is provided for the number of copies to +execute (i.e., neither the "-np" nor its synonyms are provided on the command +line), prun will automatically execute a copy of the program on +each process slot (see below for a description of a "process slot"). This +feature, however, can only be used in the SPMD model and will return an +error (without beginning execution of the application) otherwise. +. +. +.TP +.B --map-by ppr:N: +Launch N times the number of objects of the specified type on each node. +. +. +.TP +.B -npersocket\fR,\fP --npersocket \fR<#persocket>\fP +On each node, launch this many processes times the number of processor +sockets on the node. +The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option. +(deprecated in favor of --map-by ppr:n:socket) +. +. +.TP +.B -npernode\fR,\fP --npernode \fR<#pernode>\fP +On each node, launch this many processes. +(deprecated in favor of --map-by ppr:n:node) +. +.
+.TP +.B -pernode\fR,\fP --pernode +On each node, launch one process -- equivalent to \fI-npernode\fP 1. +(deprecated in favor of --map-by ppr:1:node) +. +. +. +. +.P +To map processes: +. +. +.TP +.B --map-by \fR\fP +Map to the specified object, defaults to \fIsocket\fP. Supported options +include slot, hwthread, core, L1cache, L2cache, L3cache, socket, numa, +board, node, sequential, distance, and ppr. Any object can include +modifiers by adding a \fR:\fP and any combination of PE=n (bind n +processing elements to each proc), SPAN (load +balance the processes across the allocation), OVERSUBSCRIBE (allow +more processes on a node than processing elements), and NOOVERSUBSCRIBE. +This includes PPR, where the pattern would be terminated by another colon +to separate it from the modifiers. +. +.TP +.B -bycore\fR,\fP --bycore +Map processes by core (deprecated in favor of --map-by core) +. +.TP +.B -byslot\fR,\fP --byslot +Map and rank processes round-robin by slot. +. +.TP +.B -nolocal\fR,\fP --nolocal +Do not run any copies of the launched application on the same node as +prun is running. This option will override listing the localhost +with \fB--host\fR or any other host-specifying mechanism. +. +.TP +.B -nooversubscribe\fR,\fP --nooversubscribe +Do not oversubscribe any nodes; error (without starting any processes) +if the requested number of processes would cause oversubscription. +This option implicitly sets "max_slots" equal to the "slots" value for +each node. (Enabled by default). +. +.TP +.B -oversubscribe\fR,\fP --oversubscribe +Allow nodes to be oversubscribed, even on a managed system, and allow +overloading of processing elements. +. +.TP +.B -bynode\fR,\fP --bynode +Launch processes one per node, cycling by node in a round-robin +fashion. This spreads processes evenly among nodes and assigns +ranks in a round-robin, "by node" manner. +. +.TP +.B -cpu-list\fR,\fP --cpu-list \fR\fP +List of processor IDs to bind processes to [default=NULL]. +. +. +. +.
+.P +To order processes' ranks: +. +. +.TP +.B --rank-by \fR\fP +Rank in round-robin fashion according to the specified object, +defaults to \fIslot\fP. Supported options +include slot, hwthread, core, L1cache, L2cache, L3cache, +socket, numa, board, and node. +. +. +. +. +.P +For process binding: +. +.TP +.B --bind-to \fR\fP +Bind processes to the specified object, defaults to \fIcore\fP. Supported options +include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, and none. +. +.TP +.B -cpus-per-proc\fR,\fP --cpus-per-proc \fR<#perproc>\fP +Bind each process to the specified number of cpus. +(deprecated in favor of --map-by :PE=n) +. +.TP +.B -cpus-per-rank\fR,\fP --cpus-per-rank \fR<#perrank>\fP +Alias for \fI-cpus-per-proc\fP. +(deprecated in favor of --map-by :PE=n) +. +.TP +.B -bind-to-core\fR,\fP --bind-to-core +Bind processes to cores (deprecated in favor of --bind-to core) +. +.TP +.B -bind-to-socket\fR,\fP --bind-to-socket +Bind processes to processor sockets (deprecated in favor of --bind-to socket) +. +.TP +.B -report-bindings\fR,\fP --report-bindings +Report any bindings for launched processes. +. +. +. +. +.P +For rankfiles: +. +. +.TP +.B -rf\fR,\fP --rankfile \fR\fP +Provide a rankfile file. +. +. +. +. +.P +To manage standard I/O: +. +. +.TP +.B -output-filename\fR,\fP --output-filename \fR\fP +Redirect the stdout, stderr, and stddiag of all processes to a process-unique version of +the specified filename. Any directories in the filename will automatically be created. +Each output file will consist of filename.id, where the id will be the +process's rank, left-filled with +zeros for correct ordering in listings. +. +. +.TP +.B -stdin\fR,\fP --stdin\fR \fP +The rank of the process that is to receive stdin. The +default is to forward stdin to rank 0, but this option +can be used to forward stdin to any process. It is also acceptable to +specify \fInone\fP, indicating that no processes are to receive stdin. +. +.
+.TP +.B -merge-stderr-to-stdout\fR,\fP --merge-stderr-to-stdout +Merge stderr to stdout for each process. +. +. +.TP +.B -tag-output\fR,\fP --tag-output +Tag each line of output to stdout, stderr, and stddiag with \fB[jobid, MCW_rank]\fP +indicating the process jobid and rank of the process that generated the output, +and the channel which generated it. +. +. +.TP +.B -timestamp-output\fR,\fP --timestamp-output +Timestamp each line of output to stdout, stderr, and stddiag. +. +. +.TP +.B -xml\fR,\fP --xml +Provide all output to stdout, stderr, and stddiag in an xml format. +. +. +.TP +.B -xml-file\fR,\fP --xml-file \fR\fP +Provide all output in XML format to the specified file. +. +. +.TP +.B -xterm\fR,\fP --xterm \fR\fP +Display the output from the processes identified by their ranks in separate xterm windows. The ranks are specified +as a comma-separated list of ranges, with a -1 indicating all. A separate +window will be created for each specified process. +.B Note: +xterm will normally terminate the window upon termination of the process running +within it. However, by adding a "!" to the end of the list of specified ranks, +the proper options will be provided to ensure that xterm keeps the window open +\fIafter\fP the process terminates, thus allowing you to see the process' output. +Each xterm window will subsequently need to be manually closed. +.B Note: +In some environments, xterm may require that the executable be in the user's +path, or be specified in absolute or relative terms. Thus, it may be necessary +to specify a local executable as "./foo" instead of just "foo". If xterm fails to +find the executable, prun will hang, but still respond correctly to a ctrl-c. +If this happens, please check that the executable is being specified correctly +and try again. +. +. +. +. +.P +To manage files and runtime environment: +. +. +.TP +.B -path\fR,\fP --path \fR\fP + that will be used when attempting to locate the requested +executables. 
This is used prior to using the local PATH setting. +. +. +.TP +.B --prefix \fR\fP +Prefix directory that will be used to set the \fIPATH\fR and +\fILD_LIBRARY_PATH\fR on the remote node before invoking +the target process. See the "Remote Execution" section, below. +. +. +.TP +.B --noprefix +Disable the automatic --prefix behavior. +. +. +.TP +.B -s\fR,\fP --preload-binary +Copy the specified executable(s) to remote machines prior to starting remote processes. The +executables will be copied to the session directory and will be deleted upon +completion of the job. +. +. +.TP +.B --preload-files \fR\fP +Preload the comma-separated list of files to the current working directory of the remote +machines where processes will be launched prior to starting those processes. +. +. +.TP +.B -set-cwd-to-session-dir\fR,\fP --set-cwd-to-session-dir +Set the working directory of the started processes to their session directory. +. +. +.TP +.B -wd \fR\fP +Synonym for \fI-wdir\fP. +. +. +.TP +.B -wdir \fR\fP +Change to the directory before the user's program executes. +See the "Current Working Directory" section for notes on relative paths. +.B Note: +If the \fI-wdir\fP option appears both on the command line and in an +application context, the context will take precedence over the command +line. Thus, if the path to the desired wdir is different +on the backend nodes, then it must be specified as an absolute path that +is correct for the backend node. +. +. +.TP +.B -x \fR\fP +Export the specified environment variables to the remote nodes before +executing the program. Only one environment variable can be specified +per \fI-x\fP option. Existing environment variables can be specified +or new variable names specified with corresponding values. For +example: + \fB%\fP prun -x DISPLAY -x OFILE=/tmp/out ... + +The parser for the \fI-x\fP option is not very sophisticated; it does +not even understand quoted values.
Users are advised to set variables +in the environment, and then use \fI-x\fP to export (not define) them. +. +. +. +. +.P +Setting MCA parameters: +. +. +.TP +.B -gpmca\fR,\fP --gpmca \fR \fP +Pass global MCA parameters that are applicable to all contexts. \fI\fP is +the parameter name; \fI\fP is the parameter value. +. +. +.TP +.B -pmca\fR,\fP --pmca \fR \fP +Send arguments to various MCA modules. See the "MCA" section, below. +. +. +.TP +.B -am \fR\fP +Aggregate MCA parameter set file list. +. +. +.TP +.B -tune\fR,\fP --tune \fR\fP +Specify a tune file to set arguments for various MCA modules and environment variables. +See the "Setting MCA parameters and environment variables from file" section, below. +. +. +. +. +.P +For debugging: +. +. +.TP +.B -debug\fR,\fP --debug +Invoke the user-level debugger indicated by the \fIorte_base_user_debugger\fP +MCA parameter. +. +. +.TP +.B --get-stack-traces +When paired with the +.B --timeout +option, +.I prun +will obtain and print out stack traces from all launched processes +that are still alive when the timeout expires. Note that obtaining +stack traces can take a little time and produce a lot of output, +especially for large process-count jobs. +. +. +.TP +.B -debugger\fR,\fP --debugger \fR\fP +Sequence of debuggers to search for when \fI--debug\fP is used (i.e., +a synonym for the \fIorte_base_user_debugger\fP MCA parameter). +. +. +.TP +.B --timeout \fR +The maximum number of seconds that +.I prun +will run. After this many seconds, +.I prun +will abort the launched job and exit with a non-zero exit status. +Using +.B --timeout +can also be useful when combined with the +.B --get-stack-traces +option. +. +. +.TP +.B -tv\fR,\fP --tv +Launch processes under the TotalView debugger. +Deprecated backwards compatibility flag. Synonym for \fI--debug\fP. +. +. +. +. +.P +There are also other options: +. +.
+.TP +.B --allow-run-as-root +Allow +.I prun +to run when executed by the root user +.RI ( prun +defaults to aborting when launched as the root user). +. +. +.TP +.B --app \fR\fP +Provide an appfile, ignoring all other command line options. +. +. +.TP +.B -cf\fR,\fP --cartofile \fR\fP +Provide a cartography file. +. +. +.TP +.B -continuous\fR,\fP --continuous +Job is to run until explicitly terminated. +. +. +.TP +.B -disable-recovery\fR,\fP --disable-recovery +Disable recovery (resets all recovery options to off). +. +. +.TP +.B -do-not-launch\fR,\fP --do-not-launch +Perform all necessary operations to prepare to launch the application, but do not actually launch it. +. +. +.TP +.B -do-not-resolve\fR,\fP --do-not-resolve +Do not attempt to resolve interfaces. +. +. +.TP +.B -enable-recovery\fR,\fP --enable-recovery +Enable recovery from process failure [Default = disabled]. +. +. +.TP +.B -index-argv-by-rank\fR,\fP --index-argv-by-rank +Uniquely index argv[0] for each process using its rank. +. +. +.TP +.B -max-restarts\fR,\fP --max-restarts \fR\fP +Max number of times to restart a failed process. +. +. +.TP +.B --ppr \fR\fP +Comma-separated list of number of processes on a given resource type [default: none]. +. +. +.TP +.B -report-child-jobs-separately\fR,\fP --report-child-jobs-separately +Return the exit status of the primary job only. +. +. +.TP +.B -report-events\fR,\fP --report-events \fR\fP +Report events to a tool listening at the specified URI. +. +. +.TP +.B -report-pid\fR,\fP --report-pid \fR\fP +Print out prun's PID during startup. The channel must be either a '-' to indicate +that the pid is to be output to stdout, a '+' to indicate that the pid is to be +output to stderr, or a filename to which the pid is to be written. +. +. +.TP +.B -report-uri\fR,\fP --report-uri \fR\fP +Print out prun's URI during startup. 
The channel must be either a '-' to indicate
+that the URI is to be output to stdout, a '+' to indicate that the URI is to be
+output to stderr, or a filename to which the URI is to be written.
+.
+.
+.TP
+.B -show-progress\fR,\fP --show-progress
+Output a brief periodic report on launch progress.
+.
+.
+.TP
+.B -terminate\fR,\fP --terminate
+Terminate the DVM.
+.
+.
+.TP
+.B -use-hwthread-cpus\fR,\fP --use-hwthread-cpus
+Use hardware threads as independent CPUs.
+.
+.
+.TP
+.B -use-regexp\fR,\fP --use-regexp
+Use regular expressions for launch.
+.
+.
+.
+.
+.P
+The following options are useful for developers; they are not generally
+useful to most users:
+.
+.TP
+.B -d\fR,\fP --debug-devel
+Enable debugging. This is not generally useful for most users.
+.
+.
+.TP
+.B -display-devel-allocation\fR,\fP --display-devel-allocation
+Display a detailed list of the allocation being used by this job.
+.
+.
+.TP
+.B -display-devel-map\fR,\fP --display-devel-map
+Display a more detailed table showing the mapped location of each process prior to launch.
+.
+.
+.TP
+.B -display-diffable-map\fR,\fP --display-diffable-map
+Display a diffable process map just before launch.
+.
+.
+.TP
+.B -display-topo\fR,\fP --display-topo
+Display the topology as part of the process map just before launch.
+.
+.
+.TP
+.B --report-state-on-timeout
+When paired with the
+.B --timeout
+command line option, report the run-time subsystem state of each
+process when the timeout expires.
+.
+.
+.P
+There may be other options listed with \fIprun --help\fP.
+.
+.
+.\" **************************
+.\" Description Section
+.\" **************************
+.SH DESCRIPTION
+.
+One invocation of \fIprun\fP starts an application running under PSRVR. If the application is single program multiple data (SPMD), the application
+can be specified on the \fIprun\fP command line. 

+
+If the application is multiple instruction multiple data (MIMD), comprising
+multiple programs, the set of programs and arguments can be specified in one of
+two ways: Extended Command Line Arguments and Application Context.
+.PP
+An application context describes the MIMD program set including all arguments
+in a separate file.
+.\" See appcontext(5) for a description of the application context syntax.
+This file essentially contains multiple \fIprun\fP command lines, less the
+command name itself. The ability to specify different options for different
+instantiations of a program is another reason to use an application context.
+.PP
+Extended command line arguments allow for the description of the application
+layout on the command line using colons (\fI:\fP) to separate the specification
+of programs and arguments. Some options are globally set across all specified
+programs (e.g., --hostfile), while others are specific to a single program
+(e.g., -np).
+.
+.
+.
+.SS Specifying Host Nodes
+.
+Host nodes can be identified on the \fIprun\fP command line with the \fI-host\fP
+option or in a hostfile.
+.
+.PP
+For example,
+.
+.TP 4
+prun -H aa,aa,bb ./a.out
+launches two processes on node aa and one on bb.
+.
+.PP
+Or, consider the hostfile
+.
+
+ \fB%\fP cat myhostfile
+ aa slots=2
+ bb slots=2
+ cc slots=2
+
+.
+.PP
+Here, we list not only the host names (aa, bb, and cc) but also how many "slots"
+there are for each. Slots indicate how many processes can potentially execute
+on a node. For best performance, the number of slots may be chosen to be the
+number of cores on the node or the number of processor sockets. If the hostfile
+does not provide slots information, PSRVR will attempt to discover the number
+of cores (or hwthreads, if the \fI--use-hwthread-cpus\fP option is set) and set the
+number of slots to that value. This default behavior also occurs when specifying
+the \fI-host\fP option with a single hostname. Thus, the command
+.

+.TP 4
+prun -H aa ./a.out
+launches a number of processes equal to the number of cores on node aa.
+.
+.PP
+.
+.TP 4
+prun -hostfile myhostfile ./a.out
+will launch two processes on each of the three nodes.
+.
+.TP 4
+prun -hostfile myhostfile -host aa ./a.out
+will launch two processes, both on node aa.
+.
+.TP 4
+prun -hostfile myhostfile -host dd ./a.out
+will find no hosts to run on and abort with an error.
+That is, the specified host dd is not in the specified hostfile.
+.
+.PP
+When running under resource managers (e.g., SLURM or Torque),
+PSRVR will obtain both the hostnames and the number of slots directly
+from the resource manager.
+.
+.SS Specifying Number of Processes
+.
+As we have just seen, the number of processes to run can be set using the
+hostfile. Other mechanisms exist.
+.
+.PP
+The number of processes launched can be specified as a multiple of the
+number of nodes or processor sockets available. For example,
+.
+.TP 4
+prun -H aa,bb -npersocket 2 ./a.out
+launches processes 0-3 on node aa and processes 4-7 on node bb,
+where aa and bb are both dual-socket nodes.
+The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option,
+which is discussed in a later section.
+.
+.TP 4
+prun -H aa,bb -npernode 2 ./a.out
+launches processes 0-1 on node aa and processes 2-3 on node bb.
+.
+.TP 4
+prun -H aa,bb -npernode 1 ./a.out
+launches one process per host node.
+.
+.TP 4
+prun -H aa,bb -pernode ./a.out
+is the same as \fI-npernode\fP 1.
+.
+.
+.PP
+Another alternative is to specify the number of processes with the
+\fI-np\fP option. Consider now the hostfile
+.
+
+ \fB%\fP cat myhostfile
+ aa slots=4
+ bb slots=4
+ cc slots=4
+
+.
+.PP
+Now,
+.
+.TP 4
+prun -hostfile myhostfile -np 6 ./a.out
+will launch processes 0-3 on node aa and processes 4-5 on node bb. The remaining
+slots in the hostfile will not be used since the \fI-np\fP option indicated
+that only 6 processes should be launched.
+.

+.SS Mapping Processes to Nodes: Using Policies
+.
+The examples above illustrate the default mapping of processes
+to nodes. This mapping can also be controlled with various
+\fIprun\fP options that describe mapping policies.
+.
+.
+.PP
+Consider the same hostfile as above, again with \fI-np\fP 6:
+.
+
+                          node aa    node bb    node cc
+
+ prun                     0 1 2 3    4 5
+
+ prun --map-by node       0 3        1 4        2 5
+
+ prun -nolocal                       0 1 2 3    4 5
+.
+.PP
+The \fI--map-by node\fP option will load-balance the processes across
+the available nodes, numbering each process in a round-robin fashion.
+.
+.PP
+The \fI-nolocal\fP option prevents any processes from being mapped onto the
+local host (in this case node aa). While \fIprun\fP typically consumes
+few system resources, \fI-nolocal\fP can be helpful for launching very
+large jobs where \fIprun\fP may actually need to use noticeable amounts
+of memory and/or processing time.
+.
+.PP
+Just as \fI-np\fP can specify fewer processes than there are slots, it can
+also oversubscribe the slots. For example, with the same hostfile:
+.
+.TP 4
+prun -hostfile myhostfile -np 14 ./a.out
+will launch processes 0-3 on node aa, 4-7 on bb, and 8-11 on cc. It will
+then add the remaining two processes to whichever nodes it chooses.
+.
+.PP
+One can also specify limits to oversubscription. For example, with the same
+hostfile:
+.
+.TP 4
+prun -hostfile myhostfile -np 14 -nooversubscribe ./a.out
+will produce an error since \fI-nooversubscribe\fP prevents oversubscription.
+.
+.PP
+Limits to oversubscription can also be specified in the hostfile itself:
+.
+ % cat myhostfile
+ aa slots=4 max_slots=4
+ bb max_slots=4
+ cc slots=4
+.
+.PP
+The \fImax_slots\fP field specifies such a limit. When it does, the
+\fIslots\fP value defaults to the limit. Now:
+.
+.TP 4
+prun -hostfile myhostfile -np 14 ./a.out
+causes the first 12 processes to be launched as before, but the remaining
+two processes will be forced onto node cc.
The other two nodes are
+protected by the hostfile against oversubscription by this job.
+.
+.PP
+Using the \fI--nooversubscribe\fR option can be helpful since PSRVR
+currently does not get "max_slots" values from the resource manager.
+.
+.PP
+Of course, \fI-np\fP can also be used with the \fI-H\fP or \fI-host\fP
+option. For example,
+.
+.TP 4
+prun -H aa,bb -np 8 ./a.out
+launches 8 processes. Since only two hosts are specified, after the first
+two processes are mapped, one to aa and one to bb, the remaining processes
+oversubscribe the specified hosts.
+.
+.PP
+And here is a MIMD example:
+.
+.TP 4
+prun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
+will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2
+each running \fIuptime\fP on nodes bb and cc, respectively.
+.
+.SS Mapping, Ranking, and Binding: Oh My!
+.
+PSRVR employs a three-phase procedure for assigning process locations and
+ranks:
+.
+.TP 10
+\fBmapping\fP
+Assigns a default location to each process
+.
+.TP 10
+\fBranking\fP
+Assigns a rank value to each process
+.
+.TP 10
+\fBbinding\fP
+Constrains each process to run on specific processors
+.
+.PP
+The \fImapping\fP step is used to assign a default location to each process
+based on the mapper being employed. Mapping by slot, by node, and sequentially results
+in the assignment of the processes to the node level. In contrast, mapping by object allows
+the mapper to assign the process to an actual object on each node.
+.
+.PP
+\fBNote:\fP the location assigned to the process is independent of where it will be bound; the
+assignment is used solely as input to the binding algorithm.
+.
+.PP
+The mapping of processes to nodes can be defined not just
+with general policies but also, if necessary, using arbitrary mappings
+that cannot be described by a simple policy. One can use the "sequential
+mapper," which reads the hostfile line by line, assigning processes
+to nodes in whatever order the hostfile specifies.
Use the
+\fI-pmca rmaps seq\fP option. For example, using the same hostfile
+as before:
+.
+.PP
+prun -hostfile myhostfile -pmca rmaps seq ./a.out
+.
+.PP
+will launch three processes, one on each of nodes aa, bb, and cc, respectively.
+The slot counts don't matter; one process is launched per line on
+whatever node is listed on the line.
+.
+.PP
+Another way to specify arbitrary mappings is with a rankfile, which
+gives you detailed control over process binding as well. Rankfiles
+are discussed below.
+.
+.PP
+The second phase focuses on the \fIranking\fP of the process within
+the job. PSRVR
+separates this from the mapping procedure to allow more flexibility in the
+relative placement of processes. This is best illustrated by considering the
+following cases, where we use the \fI--map-by ppr:2:socket\fP option:
+.
+.PP
+                       node aa      node bb
+
+ rank-by core          0 1 ! 2 3    4 5 ! 6 7
+
+ rank-by socket        0 2 ! 1 3    4 6 ! 5 7
+
+ rank-by socket:span   0 4 ! 1 5    2 6 ! 3 7
+.
+.PP
+Ranking by core and by slot provide the identical result: a simple
+progression of ranks across each node. Ranking by
+socket does a round-robin ranking within each node until all processes
+have been assigned a rank, and then progresses to the next
+node. Adding the \fIspan\fP modifier to the ranking directive causes
+the ranking algorithm to treat the entire allocation as a single
+entity; thus, the MCW ranks are assigned across all sockets before
+circling back around to the beginning.
+.
+.PP
+The \fIbinding\fP phase actually binds each process to a given set of processors. This can
+improve performance if the operating system is placing processes
+suboptimally. For example, it might oversubscribe some multi-core
+processor sockets, leaving other sockets idle; this can lead
+processes to contend unnecessarily for common resources. Or, it
+might spread processes out too widely; this can be suboptimal if
+application performance is sensitive to interprocess communication
+costs.
Binding can also keep the operating system from migrating
+processes excessively, regardless of how optimally those processes
+were placed to begin with.
+.
+.PP
+The processors to be used for binding can be identified in terms of
+topological groupings; e.g., binding to an l3cache will bind each
+process to all processors within the scope of a single L3 cache within
+their assigned location. Thus, if a process is assigned by the mapper
+to a certain socket, then a \fI--bind-to l3cache\fP directive will
+cause the process to be bound to the processors that share a single L3
+cache within that socket.
+.
+.PP
+To help balance loads, the binding directive uses a round-robin method when binding to
+levels lower than the one used by the mapper. For example, consider the case where a job is
+mapped to the socket level, and then bound to core. Each socket will have multiple cores,
+so if multiple processes are mapped to a given socket, the binding algorithm will assign
+each process mapped to that socket to a unique core in a round-robin manner.
+.
+.PP
+Alternatively, processes mapped by l2cache and then bound to socket will simply be bound
+to all the processors in the socket where they are located. In this manner, users can
+exert detailed control over relative MCW rank location and binding.
+.
+.PP
+Finally, \fI--report-bindings\fP can be used to report bindings.
+.
+.PP
+As an example, consider a node with two processor sockets, each comprising
+four cores. We run \fIprun\fP with \fI-np 4 --report-bindings\fP and
+the following additional options:
+.
+
+ % prun ... --map-by core --bind-to core
+ [...] ... binding child [...,0] to cpus 0001
+ [...] ... binding child [...,1] to cpus 0002
+ [...] ... binding child [...,2] to cpus 0004
+ [...] ... binding child [...,3] to cpus 0008
+
+ % prun ... --map-by socket --bind-to socket
+ [...] ... binding child [...,0] to socket 0 cpus 000f
+ [...] ... binding child [...,1] to socket 1 cpus 00f0
+ [...] ...
binding child [...,2] to socket 0 cpus 000f + [...] ... binding child [...,3] to socket 1 cpus 00f0 + + % prun ... --map-by core:PE=2 --bind-to core + [...] ... binding child [...,0] to cpus 0003 + [...] ... binding child [...,1] to cpus 000c + [...] ... binding child [...,2] to cpus 0030 + [...] ... binding child [...,3] to cpus 00c0 + + % prun ... --bind-to none +. +.PP +Here, \fI--report-bindings\fP shows the binding of each process as a mask. +In the first case, the processes bind to successive cores as indicated by +the masks 0001, 0002, 0004, and 0008. In the second case, processes bind +to all cores on successive sockets as indicated by the masks 000f and 00f0. +The processes cycle through the processor sockets in a round-robin fashion +as many times as are needed. In the third case, the masks show us that +2 cores have been bound per process. In the fourth case, binding is +turned off and no bindings are reported. +. +.PP +PSRVR's support for process binding depends on the underlying +operating system. Therefore, certain process binding options may not be available +on every system. +. +.PP +Process binding can also be set with MCA parameters. +Their usage is less convenient than that of \fIprun\fP options. +On the other hand, MCA parameters can be set not only on the \fIprun\fP +command line, but alternatively in a system or user mca-params.conf file +or as environment variables, as described in the MCA section below. +Some examples include: +. +.PP + prun option MCA parameter key value + + --map-by core rmaps_base_mapping_policy core + --map-by socket rmaps_base_mapping_policy socket + --rank-by core rmaps_base_ranking_policy core + --bind-to core hwloc_base_binding_policy core + --bind-to socket hwloc_base_binding_policy socket + --bind-to none hwloc_base_binding_policy none +. +. +.SS Rankfiles +. 

+Rankfiles are text files that specify detailed information about how
+individual processes should be mapped to nodes, and to which
+processor(s) they should be bound. Each line of a rankfile specifies
+the location of one process. The general form of each line in the
+rankfile is:
+.
+
+ rank <N>=<hostname> slot=<slot list>
+.
+.PP
+For example:
+.
+
+ $ cat myrankfile
+ rank 0=aa slot=1:0-2
+ rank 1=bb slot=0:0,1
+ rank 2=cc slot=1-2
+ $ prun -H aa,bb,cc,dd -rf myrankfile ./a.out
+.
+.PP
+This means that
+.
+
+ Rank 0 runs on node aa, bound to logical socket 1, cores 0-2.
+ Rank 1 runs on node bb, bound to logical socket 0, cores 0 and 1.
+ Rank 2 runs on node cc, bound to logical cores 1 and 2.
+.
+.PP
+Rankfiles can alternatively be used to specify \fIphysical\fP processor
+locations. In this case, the syntax is somewhat different. Sockets are
+no longer recognized, and the slot number given must be the number of
+the physical PU, as most operating systems do not assign a unique physical identifier
+to each core in the node. Thus, a proper physical rankfile looks something
+like the following:
+.
+
+ $ cat myphysicalrankfile
+ rank 0=aa slot=1
+ rank 1=bb slot=8
+ rank 2=cc slot=6
+.
+.PP
+This means that
+.
+
+ Rank 0 will run on node aa, bound to the core that contains physical PU 1
+ Rank 1 will run on node bb, bound to the core that contains physical PU 8
+ Rank 2 will run on node cc, bound to the core that contains physical PU 6
+.
+.PP
+Rankfiles are treated as \fIlogical\fP by default, and the MCA parameter
+rmaps_rank_file_physical must be set to 1 to indicate that the rankfile
+is to be considered as \fIphysical\fP.
+.
+.PP
+The hostnames listed above are "absolute," meaning that actual
+resolvable hostnames are specified. However, hostnames can also be
+specified as "relative," meaning that they are specified in relation
+to an externally-specified list of hostnames (e.g., by prun's --host
+argument, a hostfile, or a job scheduler).
+.

+.PP
+The "relative" specification is of the form "+n<X>", where X is an
+integer specifying the Xth hostname in the set of all available
+hostnames, indexed from 0. For example:
+.
+
+ $ cat myrankfile
+ rank 0=+n0 slot=1:0-2
+ rank 1=+n1 slot=0:0,1
+ rank 2=+n2 slot=1-2
+ $ prun -H aa,bb,cc,dd -rf myrankfile ./a.out
+.
+.PP
+All socket/core slot locations are
+specified as
+.I logical
+indexes. You can use tools such as HWLOC's "lstopo" to find the
+logical indexes of sockets and cores.
+.
+.
+.SS Application Context or Executable Program?
+.
+To distinguish the two different forms, \fIprun\fP
+looks on the command line for the \fI--app\fP option. If
+it is specified, then the file named on the command line is
+assumed to be an application context. If it is not
+specified, then the file is assumed to be an executable program.
+.
+.
+.
+.SS Locating Files
+.
+If no relative or absolute path is specified for a file, prun will first look for files by searching the directories specified
+by the \fI--path\fP option. If there is no \fI--path\fP option set or
+if the file is not found at the \fI--path\fP location, then prun
+will search the user's PATH environment variable as defined on the
+source node(s).
+.PP
+If a relative directory is specified, it must be relative to the initial
+working directory determined by the specific starter used. For example, when
+using the rsh or ssh starters, the initial directory is $HOME by default. Other
+starters may set the initial directory to the current working directory from
+the invocation of \fIprun\fP.
+.
+.
+.
+.SS Current Working Directory
+.
+The \fI\-wdir\fP prun option (and its synonym, \fI\-wd\fP) allows
+the user to change to an arbitrary directory before the program is
+invoked. It can also be used in application context files to specify
+working directories on specific nodes and/or for specific
+applications.
+.PP +If the \fI\-wdir\fP option appears both in a context file and on the +command line, the context file directory will override the command +line value. +.PP +If the \fI-wdir\fP option is specified, prun will attempt to +change to the specified directory on all of the remote nodes. If this +fails, \fIprun\fP will abort. +.PP +If the \fI-wdir\fP option is \fBnot\fP specified, prun will send +the directory name where \fIprun\fP was invoked to each of the +remote nodes. The remote nodes will try to change to that +directory. If they are unable (e.g., if the directory does not exist on +that node), then prun will use the default directory determined by +the starter. +.PP +All directory changing occurs before the user's program is invoked. +. +. +. +.SS Standard I/O +. +PSRVR directs UNIX standard input to /dev/null on all processes +except the rank 0 process. The rank 0 process +inherits standard input from \fIprun\fP. +.B Note: +The node that invoked \fIprun\fP need not be the same as the node where the +rank 0 process resides. PSRVR handles the redirection of +\fIprun\fP's standard input to the rank 0 process. +.PP +PSRVR directs UNIX standard output and error from remote nodes to the node +that invoked \fIprun\fP and prints it on the standard output/error of +\fIprun\fP. +Local processes inherit the standard output/error of \fIprun\fP and transfer +to it directly. +.PP +Thus it is possible to redirect standard I/O for applications by +using the typical shell redirection procedure on \fIprun\fP. + + \fB%\fP prun -np 2 my_app < my_input > my_output + +Note that in this example \fIonly\fP the rank 0 process will +receive the stream from \fImy_input\fP on stdin. The stdin on all the other +nodes will be tied to /dev/null. However, the stdout from all nodes will +be collected into the \fImy_output\fP file. +. +. +. +.SS Signal Propagation +. 

+When prun receives a SIGTERM or SIGINT, it will attempt to kill
+the entire job by sending all processes in the job a SIGTERM, waiting
+a small number of seconds, then sending all processes in the job a
+SIGKILL.
+.
+.PP
+SIGUSR1 and SIGUSR2 signals received by prun are propagated to
+all processes in the job.
+.
+.PP
+A SIGTSTP signal to prun will cause a SIGSTOP signal to be sent
+to all of the programs started by prun and likewise a SIGCONT signal
+to prun will cause a SIGCONT to be sent.
+.
+.PP
+Other signals are not currently propagated
+by prun.
+.
+.
+.SS Process Termination / Signal Handling
+.
+During the run of an application, if any process dies abnormally
+(either exiting before invoking \fIPMIx_Finalize\fP, or dying as the result of a
+signal), \fIprun\fP will print out an error message and kill the rest of the
+application.
+.PP
+.
+.
+.SS Process Environment
+.
+Processes in the application inherit their environment from the
+PSRVR daemon upon the node on which they are running. The
+environment is typically inherited from the user's shell. On remote
+nodes, the exact environment is determined by the boot MCA module
+used. The \fIrsh\fR launch module, for example, uses either
+\fIrsh\fR or \fIssh\fR to launch the PSRVR daemon on remote nodes, and
+typically executes one or more of the user's shell-setup files before
+launching the daemon. When running dynamically linked
+applications which require the \fILD_LIBRARY_PATH\fR environment
+variable to be set, care must be taken to ensure that it is correctly
+set when booting PSRVR.
+.PP
+See the "Remote Execution" section for more details.
+.
+.
+.SS Remote Execution
+.

+PSRVR requires that the \fIPATH\fR environment variable be set to
+find executables on remote nodes (this is typically only necessary in
+\fIrsh\fR- or \fIssh\fR-based environments -- batch/scheduled
+environments typically copy the current environment to the execution
+of remote jobs, so if the current environment has \fIPATH\fR and/or
+\fILD_LIBRARY_PATH\fR set properly, the remote nodes will also have them
+set properly). If PSRVR was compiled with shared library support,
+it may also be necessary to have the \fILD_LIBRARY_PATH\fR environment
+variable set on remote nodes (especially to find the shared
+libraries required to run user applications).
+.PP
+However, it is not always desirable or possible to edit shell
+startup files to set \fIPATH\fR and/or \fILD_LIBRARY_PATH\fR. The
+\fI--prefix\fR option is provided for some simple configurations where
+this is not possible.
+.PP
+The \fI--prefix\fR option takes a single argument: the base directory
+on the remote node where PSRVR is installed. PSRVR will use
+this directory to set the remote \fIPATH\fR and \fILD_LIBRARY_PATH\fR
+before executing any user applications. This allows
+running jobs without having pre-configured the \fIPATH\fR and
+\fILD_LIBRARY_PATH\fR on the remote nodes.
+.PP
+PSRVR adds the basename of the current
+node's "bindir" (the directory where PSRVR's executables are
+installed) to the prefix and uses that to set the \fIPATH\fR on the
+remote node. Similarly, PSRVR adds the basename of the current
+node's "libdir" (the directory where PSRVR's libraries are
+installed) to the prefix and uses that to set the
+\fILD_LIBRARY_PATH\fR on the remote node.
For example:
+.TP 15
+Local bindir:
+/local/node/directory/bin
+.TP
+Local libdir:
+/local/node/directory/lib64
+.PP
+If the following command line is used:
+
+ \fB%\fP prun --prefix /remote/node/directory
+
+PSRVR will add "/remote/node/directory/bin" to the \fIPATH\fR
+and "/remote/node/directory/lib64" to the \fILD_LIBRARY_PATH\fR on the
+remote node before attempting to execute anything.
+.PP
+The \fI--prefix\fR option is not sufficient if the installation paths
+on the remote node are different from those on the local node (e.g., if "/lib"
+is used on the local node, but "/lib64" is used on the remote node),
+or if the installation paths are something other than a subdirectory
+under a common prefix.
+.PP
+Note that executing \fIprun\fR via an absolute pathname is
+equivalent to specifying \fI--prefix\fR without the last subdirectory
+in the absolute pathname to \fIprun\fR. For example:
+
+ \fB%\fP /usr/local/bin/prun ...
+
+is equivalent to
+
+ \fB%\fP prun --prefix /usr/local
+.
+.
+.
+.SS Exported Environment Variables
+.
+All environment variables that are named in the form PMIX_* will automatically
+be exported to new processes on the local and remote nodes. Environmental
+parameters can also be set/forwarded to the new processes using the MCA
+parameter \fImca_base_env_list\fP. While the syntax of the \fI\-x\fP option and MCA parameter
+allows the definition of new variables, note that the parser
+for these options is currently not very sophisticated; it does not even
+understand quoted values. Users are advised to set variables in the
+environment and use the option to export them, not to define them.
+.
+.
+.
+.SS Setting MCA Parameters
+.
+The \fI-pmca\fP switch allows the passing of parameters to various MCA
+(Modular Component Architecture) modules.
+.\" PSRVR's MCA modules are described in detail in psrvrmca(7).

+MCA modules have a direct impact on programs because they allow tunable
+parameters to be set at run time (such as which BTL communication device driver
+to use, what parameters to pass to that BTL, etc.).
+.PP
+The \fI-pmca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
+The \fI<key>\fP argument generally specifies which MCA module will receive the value.
+For example, the \fI<key>\fP "btl" is used to select which BTL is used for
+transporting messages. The \fI<value>\fP argument is the value that is
+passed.
+For example:
+.
+.TP 4
+prun -pmca btl tcp,self -np 1 foo
+Tells PSRVR to use the "tcp" and "self" BTLs, and to run a single copy of
+"foo" on an allocated node.
+.
+.TP
+prun -pmca btl self -np 1 foo
+Tells PSRVR to use the "self" BTL, and to run a single copy of "foo" on an
+allocated node.
+.\" And so on. PSRVR's BTL MCA modules are described in psrvrmca_btl(7).
+.PP
+The \fI-pmca\fP switch can be used multiple times to specify different
+\fI<key>\fP and/or \fI<value>\fP arguments. If the same \fI<key>\fP is
+specified more than once, the \fI<value>\fPs are concatenated with a comma
+(",") separating them.
+.PP
+Note that the \fI-pmca\fP switch is simply a shortcut for setting environment variables.
+The same effect may be accomplished by setting corresponding environment
+variables before running \fIprun\fP.
+The form of the environment variables that PSRVR sets is:
+
+ PMIX_MCA_<key>=<value>
+.PP
+Thus, the \fI-pmca\fP switch overrides any previously set environment
+variables. The \fI-pmca\fP settings similarly override MCA parameters set
+in the
+$OPAL_PREFIX/etc/psrvr-mca-params.conf or $HOME/.psrvr/mca-params.conf
+file.
+.
+.PP
+Unknown \fI<key>\fP arguments are still set as
+environment variables -- they are not checked (by \fIprun\fP) for correctness.
+Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it
+depends on the specific MCA module.

+.PP
+To find the available component types under the MCA architecture, or to find the
+available parameters for a specific component, use the \fIpinfo\fP command.
+See the \fIpinfo(1)\fP man page for detailed information on the command.
+.
+.
+.
+.SS Setting MCA parameters and environment variables from file
+The \fI-tune\fP command line option and its synonym \fI-pmca mca_base_envar_file_prefix\fP allow a user
+to set MCA parameters and environment variables with the syntax described below.
+This option requires a single file or a comma-separated list of files to follow.
+.PP
+A valid line in the file may contain zero or more "-x", "-pmca", or "--pmca" arguments.
+The following patterns are supported: -pmca var val, -pmca var "val", -x var=val, and -x var.
+If any argument is duplicated in the file, the last value read will be used.
+.PP
+MCA parameters and environment variables specified on the command line take precedence over those specified in the file.
+.
+.
+.
+.SS Running as root
+.
+The PSRVR team strongly advises against executing
+.I prun
+as the root user. Applications should be run as regular
+(non-root) users.
+.
+.PP
+Reflecting this advice, prun will refuse to run as root by default.
+To override this default, you can add the
+.I --allow-run-as-root
+option to the
+.I prun
+command line.
+.
+.SS Exit status
+.
+There is no standard definition for what \fIprun\fP should return as an exit
+status. After considerable discussion, we settled on the following method for
+assigning the \fIprun\fP exit status (note: in the following description,
+the "primary" job is the initial application started by prun; all jobs that
+are spawned by that job are designated "secondary" jobs):
+.

+.IP \[bu] 2
+If all processes in the primary job normally terminate with exit status 0, we return 0.
+.IP \[bu]
+If one or more processes in the primary job normally terminate with non-zero exit status,
+we return the exit status of the process with the lowest rank to have a non-zero status.
+.IP \[bu]
+If all processes in the primary job normally terminate with exit status 0, and one or more
+processes in a secondary job normally terminate with non-zero exit status, we (a) return
+the exit status of the process with the lowest rank in the lowest jobid to have a non-zero
+status, and (b) output a message summarizing the exit status of the primary and all secondary jobs.
+.IP \[bu]
+If the command line option --report-child-jobs-separately is set, we return \fIonly\fP the
+exit status of the primary job. Any non-zero exit status in secondary jobs will be
+reported solely in a summary print statement.
+.
+.PP
+By default, PSRVR records and notes that processes exited with non-zero termination status.
+This is generally not considered an "abnormal termination"; i.e., PSRVR will not abort a
+job if one or more processes return a non-zero status. Instead, the default behavior simply
+reports the number of processes terminating with non-zero status upon completion of the job.
+.PP
+However, in some cases it can be desirable to have the job abort when any process terminates
+with non-zero status. For example, a non-PMIx job might detect a bad result from a calculation
+and want to abort, but doesn't want to generate a core file. Or a PMIx job might continue past
+a call to PMIx_Finalize, but indicate that all processes should abort due to some post-PMIx result.
+.PP
+It is not anticipated that this situation will occur frequently. However, in the interest of
+serving the broader community, PSRVR now has a means for allowing users to direct that jobs be
+aborted upon any process exiting with non-zero status.
Setting the MCA parameter +"orte_abort_on_non_zero_status" to 1 will cause PSRVR to abort all processes once any process + exits with non-zero status. +.PP +Terminations caused in this manner will be reported on the console as an "abnormal termination", +with the first process to so exit identified along with its exit status. +.PP +.\" ************************** +.\" Return Value Section +.\" ************************** +. +.SH RETURN VALUE +. +\fIprun\fP returns 0 if all processes started by \fIprun\fP exit after calling +PMIx_Finalize. A non-zero value is returned if an internal error occurred in +prun, or one or more processes exited before calling PMIx_Finalize. If an +internal error occurred in prun, the corresponding error code is returned. +In the event that one or more processes exit before calling PMIx_Finalize, the +return value of the rank of the process that \fIprun\fP first notices died +before calling PMIx_Finalize will be returned. Note that, in general, this will +be the first process that died but is not guaranteed to be so. +. +.PP +If the +.B --timeout +command line option is used and the timeout expires before the job +completes (thereby forcing +.I prun +to kill the job) +.I prun +will return an exit status equivalent to the value of +.B ETIMEDOUT +(which is typically 110 on Linux and OS X systems). + +. +.\" ************************** +.\" See Also Section +.\" ************************** +. diff --git a/orte/tools/prun/prun.c b/orte/tools/prun/prun.c new file mode 100644 index 00000000000..17683b803f5 --- /dev/null +++ b/orte/tools/prun/prun.c @@ -0,0 +1,1373 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ +/* + * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2008 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. 
+ * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2006-2017 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2007-2009 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2007-2017 Los Alamos National Security, LLC. All rights + * reserved. + * Copyright (c) 2013-2018 Intel, Inc. All rights reserved. + * Copyright (c) 2015 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "orte_config.h" +#include "orte/constants.h" + +#include +#include +#include +#ifdef HAVE_STRINGS_H +#include +#endif /* HAVE_STRINGS_H */ +#ifdef HAVE_UNISTD_H +#include +#endif +#ifdef HAVE_SYS_PARAM_H +#include +#endif +#include +#include +#include +#ifdef HAVE_SYS_TYPES_H +#include +#endif /* HAVE_SYS_TYPES_H */ +#ifdef HAVE_SYS_WAIT_H +#include +#endif /* HAVE_SYS_WAIT_H */ +#ifdef HAVE_SYS_TIME_H +#include +#endif /* HAVE_SYS_TIME_H */ +#include +#ifdef HAVE_SYS_STAT_H +#include +#endif + +#include "opal/mca/event/event.h" +#include "opal/mca/installdirs/installdirs.h" +#include "opal/mca/pmix/base/base.h" +#include "opal/mca/base/base.h" +#include "opal/util/argv.h" +#include "opal/util/output.h" +#include "opal/util/basename.h" +#include "opal/util/cmd_line.h" +#include "opal/util/opal_environ.h" +#include "opal/util/opal_getcwd.h" +#include "opal/util/show_help.h" +#include "opal/util/fd.h" +#include "opal/sys/atomic.h" + +#include "opal/version.h" +#include "opal/runtime/opal.h" +#include "opal/runtime/opal_info_support.h" +#include "opal/runtime/opal_progress_threads.h" +#include "opal/util/os_path.h" +#include "opal/util/path.h" +#include "opal/class/opal_pointer_array.h" +#include "opal/dss/dss.h" + +#include "orte/runtime/runtime.h" +#include 
"orte/runtime/orte_globals.h" +#include "orte/mca/errmgr/errmgr.h" +#include "orte/mca/schizo/base/base.h" +#include "orte/mca/state/state.h" +#include "orte/orted/orted_submit.h" + +/* ensure I can behave like a daemon */ +#include "prun.h" + +typedef struct { + opal_object_t super; + opal_pmix_lock_t lock; + opal_list_t info; +} myinfo_t; +static void mcon(myinfo_t *p) +{ + OPAL_PMIX_CONSTRUCT_LOCK(&p->lock); + OBJ_CONSTRUCT(&p->info, opal_list_t); +} +static void mdes(myinfo_t *p) +{ + OPAL_PMIX_DESTRUCT_LOCK(&p->lock); + OPAL_LIST_DESTRUCT(&p->info); +} +static OBJ_CLASS_INSTANCE(myinfo_t, opal_object_t, + mcon, mdes); + +static struct { + bool terminate_dvm; + bool system_server_first; + bool system_server_only; + int pid; +} myoptions; + +static opal_list_t job_info; +static volatile bool active = false; +static orte_jobid_t myjobid = ORTE_JOBID_INVALID; +static myinfo_t myinfo; + +static int create_app(int argc, char* argv[], + opal_list_t *jdata, + opal_pmix_app_t **app, + bool *made_app, char ***app_env); +static int parse_locals(opal_list_t *jdata, int argc, char* argv[]); +static void set_classpath_jar_file(opal_pmix_app_t *app, int index, char *jarfile); +static size_t evid = INT_MAX; + + +static opal_cmd_line_init_t cmd_line_init[] = { + /* tell the dvm to terminate */ + { NULL, '\0', "terminate", "terminate", 0, + &myoptions.terminate_dvm, OPAL_CMD_LINE_TYPE_BOOL, + "Terminate the DVM", OPAL_CMD_LINE_OTYPE_DVM }, + + /* look first for a system server */ + { NULL, '\0', "system-server-first", "system-server-first", 0, + &myoptions.system_server_first, OPAL_CMD_LINE_TYPE_BOOL, + "First look for a system server and connect to it if found", OPAL_CMD_LINE_OTYPE_DVM }, + + /* connect only to a system server */ + { NULL, '\0', "system-server-only", "system-server-only", 0, + &myoptions.system_server_only, OPAL_CMD_LINE_TYPE_BOOL, + "Connect only to a system-level server", OPAL_CMD_LINE_OTYPE_DVM }, + + /* provide a connection PID */ + { NULL, '\0', "pid", 
"pid", 1, + &myoptions.pid, OPAL_CMD_LINE_TYPE_INT, + "PID of the session-level daemon to which we should connect", + OPAL_CMD_LINE_OTYPE_DVM }, + + /* End of list */ + { NULL, '\0', NULL, NULL, 0, + NULL, OPAL_CMD_LINE_TYPE_NULL, NULL } +}; + + +static void infocb(int status, + opal_list_t *info, + void *cbdata, + opal_pmix_release_cbfunc_t release_fn, + void *release_cbdata) +{ + opal_pmix_lock_t *lock = (opal_pmix_lock_t*)cbdata; + OPAL_ACQUIRE_OBJECT(lock); + + if (NULL != release_fn) { + release_fn(release_cbdata); + } + OPAL_PMIX_WAKEUP_THREAD(lock); +} + +static void regcbfunc(int status, size_t ref, void *cbdata) +{ + opal_pmix_lock_t *lock = (opal_pmix_lock_t*)cbdata; + OPAL_ACQUIRE_OBJECT(lock); + evid = ref; + OPAL_PMIX_WAKEUP_THREAD(lock); +} + +static void opcbfunc(int status, void *cbdata) +{ + opal_pmix_lock_t *lock = (opal_pmix_lock_t*)cbdata; + OPAL_ACQUIRE_OBJECT(lock); + OPAL_PMIX_WAKEUP_THREAD(lock); +} + +static bool fired = false; +static void evhandler(int status, + const opal_process_name_t *source, + opal_list_t *info, opal_list_t *results, + opal_pmix_notification_complete_fn_t cbfunc, + void *cbdata) +{ + opal_value_t *val; + int jobstatus=0; + orte_jobid_t jobid = ORTE_JOBID_INVALID; + + /* we should always have info returned to us - if not, there is + * nothing we can do */ + if (NULL != info) { + OPAL_LIST_FOREACH(val, info, opal_value_t) { + if (0 == strcmp(val->key, OPAL_PMIX_JOB_TERM_STATUS)) { + jobstatus = val->data.integer; + } else if (0 == strcmp(val->key, OPAL_PMIX_PROCID)) { + jobid = val->data.name.jobid; + } + } + if (orte_cmd_options.verbose && (myjobid != ORTE_JOBID_INVALID && jobid == myjobid)) { + opal_output(0, "JOB %s COMPLETED WITH STATUS %d", + ORTE_JOBID_PRINT(jobid), jobstatus); + } + } + + /* only terminate if this was our job - keep in mind that we + * can get notifications of job termination prior to our spawn + * having completed! 
*/ + if (!fired && (myjobid != ORTE_JOBID_INVALID && jobid == myjobid)) { + fired = true; + active = false; + } + + /* we _always_ have to execute the evhandler callback or + * else the event progress engine will hang */ + if (NULL != cbfunc) { + cbfunc(OPAL_SUCCESS, NULL, NULL, NULL, cbdata); + } +} + +typedef struct { + opal_pmix_lock_t lock; + opal_list_t list; +} mylock_t; + + +static void setupcbfunc(int status, + opal_list_t *info, + void *provided_cbdata, + opal_pmix_op_cbfunc_t cbfunc, void *cbdata) +{ + mylock_t *mylock = (mylock_t*)provided_cbdata; + opal_value_t *kv; + + if (NULL != info) { + /* cycle across the provided info */ + while (NULL != (kv = (opal_value_t*)opal_list_remove_first(info))) { + opal_list_append(&mylock->list, &kv->super); + } + } + + /* release the caller */ + if (NULL != cbfunc) { + cbfunc(OPAL_SUCCESS, cbdata); + } + + OPAL_PMIX_WAKEUP_THREAD(&mylock->lock); +} + +static void launchhandler(int status, + const opal_process_name_t *source, + opal_list_t *info, opal_list_t *results, + opal_pmix_notification_complete_fn_t cbfunc, + void *cbdata) +{ + opal_value_t *p; + + /* the info list will include the launch directives, so + * transfer those to the myinfo_t for return to the main thread */ + while (NULL != (p = (opal_value_t*)opal_list_remove_first(info))) { + opal_list_append(&myinfo.info, &p->super); + } + + /* we _always_ have to execute the evhandler callback or + * else the event progress engine will hang */ + if (NULL != cbfunc) { + cbfunc(OPAL_SUCCESS, NULL, NULL, NULL, cbdata); + } + + /* now release the thread */ + OPAL_PMIX_WAKEUP_THREAD(&myinfo.lock); +} + +int prun(int argc, char *argv[]) +{ + int rc, i; + char *param; + opal_pmix_lock_t lock; + opal_list_t apps, *lt; + opal_pmix_app_t *app; + opal_value_t *val, *kv, *kv2; + opal_list_t info, codes; + struct timespec tp = {0, 100000}; + mylock_t mylock; + + /* init the globals */ + memset(&orte_cmd_options, 0, sizeof(orte_cmd_options)); + memset(&myoptions, 0, 
sizeof(myoptions)); + OBJ_CONSTRUCT(&job_info, opal_list_t); + OBJ_CONSTRUCT(&apps, opal_list_t); + + /* search the argv for MCA params */ + for (i=0; NULL != argv[i]; i++) { + if (':' == argv[i][0] || + NULL == argv[i+1] || NULL == argv[i+2]) { + break; + } + if (0 == strncmp(argv[i], "-"OPAL_MCA_CMD_LINE_ID, strlen("-"OPAL_MCA_CMD_LINE_ID)) || + 0 == strncmp(argv[i], "--"OPAL_MCA_CMD_LINE_ID, strlen("--"OPAL_MCA_CMD_LINE_ID)) || + 0 == strncmp(argv[i], "-g"OPAL_MCA_CMD_LINE_ID, strlen("-g"OPAL_MCA_CMD_LINE_ID)) || + 0 == strncmp(argv[i], "--g"OPAL_MCA_CMD_LINE_ID, strlen("--g"OPAL_MCA_CMD_LINE_ID))) { + (void) mca_base_var_env_name (argv[i+1], &param); + opal_setenv(param, argv[i+2], true, &environ); + free(param); + } else if (0 == strcmp(argv[i], "-am") || + 0 == strcmp(argv[i], "--am")) { + (void)mca_base_var_env_name("mca_base_param_file_prefix", &param); + opal_setenv(param, argv[i+1], true, &environ); + free(param); + } else if (0 == strcmp(argv[i], "-tune") || + 0 == strcmp(argv[i], "--tune")) { + (void)mca_base_var_env_name("mca_base_envar_file_prefix", &param); + opal_setenv(param, argv[i+1], true, &environ); + free(param); + } + } + + /* init only the util portion of OPAL */ + if (OPAL_SUCCESS != (rc = opal_init_util(&argc, &argv))) { + return rc; + } + + /* set our proc type for schizo selection */ + orte_process_info.proc_type = ORTE_PROC_TOOL; + + /* open the SCHIZO framework so we can setup the command line */ + if (ORTE_SUCCESS != (rc = mca_base_framework_open(&orte_schizo_base_framework, 0))) { + ORTE_ERROR_LOG(rc); + return rc; + } + if (ORTE_SUCCESS != (rc = orte_schizo_base_select())) { + ORTE_ERROR_LOG(rc); + return rc; + } + + /* setup our cmd line */ + orte_cmd_line = OBJ_NEW(opal_cmd_line_t); + if (OPAL_SUCCESS != (rc = opal_cmd_line_add(orte_cmd_line, cmd_line_init))) { + return rc; + } + + /* setup the rest of the cmd line only once */ + if (OPAL_SUCCESS != (rc = orte_schizo.define_cli(orte_cmd_line))) { + return rc; + } + + /* now that options have
been defined, finish setup */ + mca_base_cmd_line_setup(orte_cmd_line); + + /* parse the result to get values */ + if (OPAL_SUCCESS != (rc = opal_cmd_line_parse(orte_cmd_line, + true, false, argc, argv)) ) { + if (OPAL_ERR_SILENT != rc) { + fprintf(stderr, "%s: command line error (%s)\n", argv[0], + opal_strerror(rc)); + } + return rc; + } + + /* see if print version is requested. Do this before + * check for help so that --version --help works as + * one might expect. */ + if (orte_cmd_options.version) { + char *str; + str = opal_info_make_version_str("all", + OPAL_MAJOR_VERSION, OPAL_MINOR_VERSION, + OPAL_RELEASE_VERSION, + OPAL_GREEK_VERSION, + OPAL_REPO_REV); + if (NULL != str) { + fprintf(stdout, "%s (%s) %s\n\nReport bugs to %s\n", + "prun", "PMIx Reference Server", str, PACKAGE_BUGREPORT); + free(str); + } + exit(0); + } + + /* check if we are running as root - if we are, then only allow + * us to proceed if the allow-run-as-root flag was given. Otherwise, + * exit with a giant warning flag + */ + if (0 == geteuid() && !orte_cmd_options.run_as_root) { + /* show_help is not yet available, so print an error manually */ + fprintf(stderr, "--------------------------------------------------------------------------\n"); + if (orte_cmd_options.help) { + fprintf(stderr, "prun cannot provide the help message when run as root.\n\n"); + } else { + fprintf(stderr, "prun has detected an attempt to run as root.\n\n"); + } + + fprintf(stderr, "Running as root is *strongly* discouraged as any mistake (e.g., in\n"); + fprintf(stderr, "defining TMPDIR) or bug can result in catastrophic damage to the OS\n"); + fprintf(stderr, "file system, leaving your system in an unusable state.\n\n"); + + fprintf(stderr, "We strongly suggest that you run prun as a non-root user.\n\n"); + + fprintf(stderr, "You can override this protection by adding the --allow-run-as-root\n"); + fprintf(stderr, "option to your command line. 
However, we reiterate our strong advice\n"); + fprintf(stderr, "against doing so - please do so at your own risk.\n"); + fprintf(stderr, "--------------------------------------------------------------------------\n"); + exit(1); + } + + /* process any mca params */ + rc = mca_base_cmd_line_process_args(orte_cmd_line, &environ, &environ); + if (ORTE_SUCCESS != rc) { + return rc; + } + + /* Check for help request */ + if (orte_cmd_options.help) { + char *str, *args = NULL; + args = opal_cmd_line_get_usage_msg(orte_cmd_line); + str = opal_show_help_string("help-orterun.txt", "orterun:usage", false, + "prun", "PSVR", OPAL_VERSION, + "prun", args, + PACKAGE_BUGREPORT); + if (NULL != str) { + printf("%s", str); + free(str); + } + free(args); + + /* If someone asks for help, that should be all we do */ + exit(0); + } + + /* ensure we ONLY take the ess/tool component */ + opal_setenv(OPAL_MCA_PREFIX"ess", "tool", true, &environ); + /* tell the ess/tool component how we want to connect */ + if (myoptions.system_server_only) { + opal_setenv(OPAL_MCA_PREFIX"ess_tool_system_server_only", "1", true, &environ); + } + if (myoptions.system_server_first) { + opal_setenv(OPAL_MCA_PREFIX"ess_tool_system_server_first", "1", true, &environ); + } + /* if they specified the DVM's pid, then pass it along */ + if (0 != myoptions.pid) { + asprintf(&param, "%d", myoptions.pid); + opal_setenv(OPAL_MCA_PREFIX"ess_tool_server_pid", param, true, &environ); + free(param); + } + /* if they specified the URI, then pass it along */ + if (NULL != orte_cmd_options.hnp) { + opal_setenv("PMIX_MCA_ptl_tcp_server_uri", orte_cmd_options.hnp, true, &environ); + } + + /* now initialize ORTE */ + if (OPAL_SUCCESS != (rc = orte_init(&argc, &argv, ORTE_PROC_TOOL))) { + OPAL_ERROR_LOG(rc); + return rc; + } + + /* if the user just wants us to terminate a DVM, then do so */ + if (myoptions.terminate_dvm) { + OBJ_CONSTRUCT(&info, opal_list_t); + val = OBJ_NEW(opal_value_t); + val->key =
strdup(OPAL_PMIX_JOB_CTRL_TERMINATE); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&info, &val->super); + fprintf(stderr, "TERMINATING DVM..."); + OPAL_PMIX_CONSTRUCT_LOCK(&lock); + rc = opal_pmix.job_control(NULL, &info, infocb, (void*)&lock); + OPAL_PMIX_WAIT_THREAD(&lock); + OPAL_PMIX_DESTRUCT_LOCK(&lock); + OPAL_LIST_DESTRUCT(&info); + fprintf(stderr, "DONE\n"); + goto DONE; + } + + /* get here if they want to run an application, so let's parse + * the cmd line to get it */ + + if (OPAL_SUCCESS != (rc = parse_locals(&apps, argc, argv))) { + OPAL_ERROR_LOG(rc); + OPAL_LIST_DESTRUCT(&apps); + goto DONE; + } + + /* bozo check */ + if (0 == opal_list_get_size(&apps)) { + opal_output(0, "No application specified!"); + goto DONE; + } + + /* init flag */ + active = true; + + /* register for job terminations so we get notified when + * our job completes */ + OPAL_PMIX_CONSTRUCT_LOCK(&lock); + OBJ_CONSTRUCT(&info, opal_list_t); + val = OBJ_NEW(opal_value_t); + val->key = strdup("foo"); + val->type = OPAL_INT; + val->data.integer = OPAL_ERR_JOB_TERMINATED; + opal_list_append(&info, &val->super); + opal_pmix.register_evhandler(&info, NULL, evhandler, regcbfunc, &lock); + OPAL_PMIX_WAIT_THREAD(&lock); + OPAL_PMIX_DESTRUCT_LOCK(&lock); + OPAL_LIST_DESTRUCT(&info); + + /* we want to be notified upon job completion */ + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_NOTIFY_COMPLETION); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + + /* see if they specified the personality */ + if (NULL != orte_cmd_options.personality) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_PERSONALITY); + val->type = OPAL_STRING; + val->data.string = strdup(orte_cmd_options.personality); + opal_list_append(&job_info, &val->super); + } + + /* check for stdout/err directives */ + /* if we were asked to tag output, mark it so */ + if (orte_cmd_options.tag_output) { + val = OBJ_NEW(opal_value_t); + 
val->key = strdup(OPAL_PMIX_TAG_OUTPUT); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + /* if we were asked to timestamp output, mark it so */ + if (orte_cmd_options.timestamp_output) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_TIMESTAMP_OUTPUT); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + /* if we were asked to output to files, pass it along */ + if (NULL != orte_cmd_options.output_filename) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_OUTPUT_TO_FILE); + val->type = OPAL_STRING; + /* if the given filename isn't an absolute path, then + * convert it to one so the name will be relative to + * the directory where prun was given as that is what + * the user will have seen */ + if (!opal_path_is_absolute(orte_cmd_options.output_filename)) { + char cwd[OPAL_PATH_MAX]; + getcwd(cwd, sizeof(cwd)); + val->data.string = opal_os_path(false, cwd, orte_cmd_options.output_filename, NULL); + } else { + val->data.string = strdup(orte_cmd_options.output_filename); + } + opal_list_append(&job_info, &val->super); + } + /* if we were asked to merge stderr to stdout, mark it so */ + if (orte_cmd_options.merge) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_MERGE_STDERR_STDOUT); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + + /* check what user wants us to do with stdin */ + if (NULL != orte_cmd_options.stdin_target) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_STDIN_TGT); + val->type = OPAL_UINT32; + opal_list_append(&job_info, &val->super); + if (0 == strcmp(orte_cmd_options.stdin_target, "all")) { + val->data.uint32 = ORTE_VPID_WILDCARD; + } else if (0 == strcmp(orte_cmd_options.stdin_target, "none")) { + val->data.uint32 = ORTE_VPID_INVALID; + } else { + val->data.uint32 = strtoul(orte_cmd_options.stdin_target, NULL, 10); + } + } + + /* if we want the 
argv's indexed, indicate that */ + if (orte_cmd_options.index_argv) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_INDEX_ARGV); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + + if (NULL != orte_cmd_options.mapping_policy) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_MAPBY); + val->type = OPAL_STRING; + val->data.string = strdup(orte_cmd_options.mapping_policy); + opal_list_append(&job_info, &val->super); + } else if (orte_cmd_options.pernode) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_PPR); + val->type = OPAL_STRING; + val->data.string = strdup("1:node"); + opal_list_append(&job_info, &val->super); + } else if (0 < orte_cmd_options.npernode) { + /* define the ppr */ + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_PPR); + val->type = OPAL_STRING; + (void)asprintf(&val->data.string, "%d:node", orte_cmd_options.npernode); + opal_list_append(&job_info, &val->super); + } else if (0 < orte_cmd_options.npersocket) { + /* define the ppr */ + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_PPR); + val->type = OPAL_STRING; + (void)asprintf(&val->data.string, "%d:socket", orte_cmd_options.npersocket); + opal_list_append(&job_info, &val->super); + } + + /* if the user specified cpus/rank, set it */ + if (0 < orte_cmd_options.cpus_per_proc) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_CPUS_PER_PROC); + val->type = OPAL_UINT32; + val->data.uint32 = orte_cmd_options.cpus_per_proc; + opal_list_append(&job_info, &val->super); + } + + /* if the user specified a ranking policy, then set it */ + if (NULL != orte_cmd_options.ranking_policy) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_RANKBY); + val->type = OPAL_STRING; + val->data.string = strdup(orte_cmd_options.ranking_policy); + opal_list_append(&job_info, &val->super); + } + + /* if the user specified a binding policy, then set it */ + if (NULL !=
orte_cmd_options.binding_policy) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_BINDTO); + val->type = OPAL_STRING; + val->data.string = strdup(orte_cmd_options.binding_policy); + opal_list_append(&job_info, &val->super); + } + + /* if they asked for nolocal, mark it so */ + if (orte_cmd_options.nolocal) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_NO_PROCS_ON_HEAD); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + if (orte_cmd_options.no_oversubscribe) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_NO_OVERSUBSCRIBE); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + if (orte_cmd_options.oversubscribe) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_NO_OVERSUBSCRIBE); + val->type = OPAL_BOOL; + val->data.flag = false; + opal_list_append(&job_info, &val->super); + } + if (orte_cmd_options.report_bindings) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_REPORT_BINDINGS); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + if (NULL != orte_cmd_options.cpu_list) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_CPU_LIST); + val->type = OPAL_STRING; + val->data.string = strdup(orte_cmd_options.cpu_list); + opal_list_append(&job_info, &val->super); + } + + /* mark if recovery was enabled on the cmd line */ + if (orte_enable_recovery) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_JOB_RECOVERABLE); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + /* record the max restarts */ + if (0 < orte_max_restarts) { + OPAL_LIST_FOREACH(app, &apps, opal_pmix_app_t) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_MAX_RESTARTS); + val->type = OPAL_UINT32; + val->data.uint32 = orte_max_restarts; + opal_list_append(&app->info, &val->super); + } + } + /* if continuous 
operation was specified */ + if (orte_cmd_options.continuous) { + /* mark this job as continuously operating */ + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_JOB_CONTINUOUS); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&job_info, &val->super); + } + + /* pickup any relevant envars */ + if (NULL != opal_pmix.server_setup_application) { + OBJ_CONSTRUCT(&info, opal_list_t); + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_SETUP_APP_ENVARS); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&info, &val->super); + + OPAL_PMIX_CONSTRUCT_LOCK(&mylock.lock); + OBJ_CONSTRUCT(&mylock.list, opal_list_t); + rc = opal_pmix.server_setup_application(ORTE_PROC_MY_NAME->jobid, + &info, setupcbfunc, &mylock); + if (OPAL_SUCCESS != rc) { + OPAL_LIST_DESTRUCT(&info); + OPAL_PMIX_DESTRUCT_LOCK(&mylock.lock); + OBJ_DESTRUCT(&mylock.list); + goto DONE; + } + OPAL_PMIX_WAIT_THREAD(&mylock.lock); + OPAL_PMIX_DESTRUCT_LOCK(&mylock.lock); + /* transfer any returned ENVARS to the job_info */ + while (NULL != (val = (opal_value_t*)opal_list_remove_first(&mylock.list))) { + if (0 == strcmp(val->key, OPAL_PMIX_SET_ENVAR) || + 0 == strcmp(val->key, OPAL_PMIX_ADD_ENVAR) || + 0 == strcmp(val->key, OPAL_PMIX_UNSET_ENVAR) || + 0 == strcmp(val->key, OPAL_PMIX_PREPEND_ENVAR) || + 0 == strcmp(val->key, OPAL_PMIX_APPEND_ENVAR)) { + opal_list_append(&job_info, &val->super); + } else { + OBJ_RELEASE(val); + } + } + OPAL_LIST_DESTRUCT(&mylock.list); + } + + /* if we were launched by a tool wanting to direct our + * operation, then we need to pause here and give it + * a chance to tell us what we need to do */ + if (NULL != (param = getenv("PMIX_LAUNCHER_PAUSE_FOR_TOOL")) && + 0 == strcmp(param, "1")) { + /* register for the PMIX_LAUNCH_DIRECTIVE event */ + OPAL_PMIX_CONSTRUCT_LOCK(&lock); + OBJ_CONSTRUCT(&codes, opal_list_t); + val = OBJ_NEW(opal_value_t); + val->key = strdup("foo"); + val->type = OPAL_INT; + val->data.integer = 
OPAL_PMIX_LAUNCH_DIRECTIVE; + opal_list_append(&codes, &val->super); + /* setup the myinfo object to capture the returned + * values - must do so prior to registering in case + * the event has already arrived */ + OBJ_CONSTRUCT(&myinfo, myinfo_t); + /* go ahead and register */ + opal_pmix.register_evhandler(&codes, NULL, launchhandler, regcbfunc, &lock); + OPAL_PMIX_WAIT_THREAD(&lock); + OPAL_PMIX_DESTRUCT_LOCK(&lock); + OPAL_LIST_DESTRUCT(&codes); + /* now wait for the launch directives to arrive */ + OPAL_PMIX_WAIT_THREAD(&myinfo.lock); + /* process the returned directives */ + OPAL_LIST_FOREACH(val, &myinfo.info, opal_value_t) { + if (0 == strcmp(val->key, OPAL_PMIX_DEBUG_JOB_DIRECTIVES)) { + /* there will be a pointer to a list containing the directives */ + lt = (opal_list_t*)val->data.ptr; + while (NULL != (kv = (opal_value_t*)opal_list_remove_first(lt))) { + opal_output(0, "JOB DIRECTIVE: %s", kv->key); + opal_list_append(&job_info, &kv->super); + } + } else if (0 == strcmp(val->key, OPAL_PMIX_DEBUG_APP_DIRECTIVES)) { + /* there will be a pointer to a list containing the directives */ + lt = (opal_list_t*)val->data.ptr; + OPAL_LIST_FOREACH(kv, lt, opal_value_t) { + opal_output(0, "APP DIRECTIVE: %s", kv->key); + OPAL_LIST_FOREACH(app, &apps, opal_pmix_app_t) { + /* the value can only be on one list at a time, so replicate it */ + kv2 = OBJ_NEW(opal_value_t); + opal_value_xfer(kv2, kv); + opal_list_append(&app->info, &kv2->super); + } + } + } + } + } + + if (OPAL_SUCCESS != (rc = opal_pmix.spawn(&job_info, &apps, &myjobid))) { + opal_output(0, "Job failed to spawn: %s", opal_strerror(rc)); + goto DONE; + } + OPAL_LIST_DESTRUCT(&job_info); + OPAL_LIST_DESTRUCT(&apps); + + if (orte_cmd_options.verbose) { + opal_output(0, "JOB %s EXECUTING", OPAL_JOBID_PRINT(myjobid)); + } + + while (active) { + nanosleep(&tp, NULL); + } + OPAL_PMIX_CONSTRUCT_LOCK(&lock); + opal_pmix.deregister_evhandler(evid, opcbfunc, &lock); + OPAL_PMIX_WAIT_THREAD(&lock); + 
OPAL_PMIX_DESTRUCT_LOCK(&lock); + + DONE: + /* cleanup and leave */ + orte_finalize(); + return rc; +} + +static int parse_locals(opal_list_t *jdata, int argc, char* argv[]) +{ + int i, rc; + int temp_argc; + char **temp_argv, **env; + opal_pmix_app_t *app; + bool made_app; + + /* Make the apps */ + temp_argc = 0; + temp_argv = NULL; + opal_argv_append(&temp_argc, &temp_argv, argv[0]); + + /* NOTE: This bogus env variable is necessary in the calls to + create_app(), below. See comment immediately before the + create_app() function for an explanation. */ + + env = NULL; + for (i = 1; i < argc; ++i) { + if (0 == strcmp(argv[i], ":")) { + /* Make an app with this argv */ + if (opal_argv_count(temp_argv) > 1) { + if (NULL != env) { + opal_argv_free(env); + env = NULL; + } + app = NULL; + rc = create_app(temp_argc, temp_argv, jdata, &app, &made_app, &env); + if (OPAL_SUCCESS != rc) { + /* Assume that the error message has already been + printed; no need to cleanup -- we can just + exit */ + exit(1); + } + if (made_app) { + opal_list_append(jdata, &app->super); + } + + /* Reset the temps */ + + temp_argc = 0; + temp_argv = NULL; + opal_argv_append(&temp_argc, &temp_argv, argv[0]); + } + } else { + opal_argv_append(&temp_argc, &temp_argv, argv[i]); + } + } + + if (opal_argv_count(temp_argv) > 1) { + app = NULL; + rc = create_app(temp_argc, temp_argv, jdata, &app, &made_app, &env); + if (ORTE_SUCCESS != rc) { + /* Assume that the error message has already been printed; + no need to cleanup -- we can just exit */ + exit(1); + } + if (made_app) { + opal_list_append(jdata, &app->super); + } + } + if (NULL != env) { + opal_argv_free(env); + } + opal_argv_free(temp_argv); + + /* All done */ + + return ORTE_SUCCESS; +} + + +/* + * This function takes a "char ***app_env" parameter to handle the + * specific case: + * + * orterun --mca foo bar -app appfile + * + * That is, we'll need to keep foo=bar, but the presence of the app + * file will cause an invocation of parse_appfile(),
which will cause + * one or more recursive calls back to create_app(). Since the + * foo=bar value applies globally to all apps in the appfile, we need + * to pass in the "base" environment (that contains the foo=bar value) + * when we parse each line in the appfile. + * + * This is really just a special case -- when we have a simple case like: + * + * orterun --mca foo bar -np 4 hostname + * + * Then the upper-level function (parse_locals()) calls create_app() + * with a NULL value for app_env, meaning that there is no "base" + * environment that the app needs to be created from. + */ +static int create_app(int argc, char* argv[], + opal_list_t *jdata, + opal_pmix_app_t **app_ptr, + bool *made_app, char ***app_env) +{ + char cwd[OPAL_PATH_MAX]; + int i, j, count, rc; + char *param, *value; + opal_pmix_app_t *app = NULL; + bool found = false; + char *appname = NULL; + opal_value_t *val; + + *made_app = false; + + /* parse the cmd line - do this every time thru so we can + * repopulate the globals */ + if (OPAL_SUCCESS != (rc = opal_cmd_line_parse(orte_cmd_line, true, false, + argc, argv)) ) { + if (OPAL_ERR_SILENT != rc) { + fprintf(stderr, "%s: command line error (%s)\n", argv[0], + opal_strerror(rc)); + } + return rc; + } + + /* Setup application context */ + app = OBJ_NEW(opal_pmix_app_t); + opal_cmd_line_get_tail(orte_cmd_line, &count, &app->argv); + + /* See if we have anything left */ + if (0 == count) { + opal_show_help("help-orterun.txt", "orterun:executable-not-specified", + true, "prun", "prun"); + rc = OPAL_ERR_NOT_FOUND; + goto cleanup; + } + + /* Grab all MCA environment variables */ + app->env = opal_argv_copy(*app_env); + for (i=0; NULL != environ[i]; i++) { + if (0 == strncmp("PMIX_", environ[i], 5) || + 0 == strncmp("OMPI_", environ[i], 5)) { + /* check for duplicate in app->env - this + * would have been placed there by the + * cmd line processor. 
By convention, we + * always let the cmd line override the + * environment + */ + param = strdup(environ[i]); + value = strchr(param, '='); + *value = '\0'; + value++; + opal_setenv(param, value, false, &app->env); + free(param); + } + } + + /* set necessary env variables for external usage from tune conf file*/ + int set_from_file = 0; + char **vars = NULL; + if (OPAL_SUCCESS == mca_base_var_process_env_list_from_file(&vars) && + NULL != vars) { + for (i=0; NULL != vars[i]; i++) { + value = strchr(vars[i], '='); + /* terminate the name of the param */ + *value = '\0'; + /* step over the equals */ + value++; + /* overwrite any prior entry */ + opal_setenv(vars[i], value, true, &app->env); + /* save it for any comm_spawn'd apps */ + opal_setenv(vars[i], value, true, &orte_forwarded_envars); + } + set_from_file = 1; + opal_argv_free(vars); + } + /* Did the user request to export any environment variables on the cmd line? */ + char *env_set_flag; + env_set_flag = getenv("OMPI_MCA_mca_base_env_list"); + if (opal_cmd_line_is_taken(orte_cmd_line, "x")) { + if (NULL != env_set_flag) { + opal_show_help("help-orterun.txt", "orterun:conflict-env-set", false); + return ORTE_ERR_FATAL; + } + j = opal_cmd_line_get_ninsts(orte_cmd_line, "x"); + for (i = 0; i < j; ++i) { + param = opal_cmd_line_get_param(orte_cmd_line, "x", i, 0); + + if (NULL != (value = strchr(param, '='))) { + /* terminate the name of the param */ + *value = '\0'; + /* step over the equals */ + value++; + /* overwrite any prior entry */ + opal_setenv(param, value, true, &app->env); + /* save it for any comm_spawn'd apps */ + opal_setenv(param, value, true, &orte_forwarded_envars); + } else { + value = getenv(param); + if (NULL != value) { + /* overwrite any prior entry */ + opal_setenv(param, value, true, &app->env); + /* save it for any comm_spawn'd apps */ + opal_setenv(param, value, true, &orte_forwarded_envars); + } else { + opal_output(0, "Warning: could not find environment variable \"%s\"\n", param); + 
} + } + } + } else if (NULL != env_set_flag) { + /* if mca_base_env_list was set, check if some of env vars were set via -x from a conf file. + * If this is the case, error out. + */ + if (!set_from_file) { + /* set necessary env variables for external usage */ + vars = NULL; + if (OPAL_SUCCESS == mca_base_var_process_env_list(env_set_flag, &vars) && + NULL != vars) { + for (i=0; NULL != vars[i]; i++) { + value = strchr(vars[i], '='); + /* terminate the name of the param */ + *value = '\0'; + /* step over the equals */ + value++; + /* overwrite any prior entry */ + opal_setenv(vars[i], value, true, &app->env); + /* save it for any comm_spawn'd apps */ + opal_setenv(vars[i], value, true, &orte_forwarded_envars); + } + opal_argv_free(vars); + } + } else { + opal_show_help("help-orterun.txt", "orterun:conflict-env-set", false); + return ORTE_ERR_FATAL; + } + } + + /* Did the user request a specific wdir? */ + + if (NULL != orte_cmd_options.wdir) { + /* if this is a relative path, convert it to an absolute path */ + if (opal_path_is_absolute(orte_cmd_options.wdir)) { + app->cwd = strdup(orte_cmd_options.wdir); + } else { + /* get the cwd */ + if (OPAL_SUCCESS != (rc = opal_getcwd(cwd, sizeof(cwd)))) { + opal_show_help("help-orterun.txt", "orterun:init-failure", + true, "get the cwd", rc); + goto cleanup; + } + /* construct the absolute path */ + app->cwd = opal_os_path(false, cwd, orte_cmd_options.wdir, NULL); + } + } else if (orte_cmd_options.set_cwd_to_session_dir) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_SET_SESSION_CWD); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&app->info, &val->super); + } else { + if (OPAL_SUCCESS != (rc = opal_getcwd(cwd, sizeof(cwd)))) { + opal_show_help("help-orterun.txt", "orterun:init-failure", + true, "get the cwd", rc); + goto cleanup; + } + app->cwd = strdup(cwd); + } + + /* Did the user specify a hostfile. Need to check for both + * hostfile and machine file. 
+ * We can only deal with one hostfile per app context, otherwise give an error. + */ + found = false; + if (0 < (j = opal_cmd_line_get_ninsts(orte_cmd_line, "hostfile"))) { + if (1 < j) { + opal_show_help("help-orterun.txt", "orterun:multiple-hostfiles", + true, "prun", NULL); + return ORTE_ERR_FATAL; + } else { + value = opal_cmd_line_get_param(orte_cmd_line, "hostfile", 0, 0); + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_HOSTFILE); + val->type = OPAL_STRING; + val->data.string = strdup(value); + opal_list_append(&app->info, &val->super); + found = true; + } + } + if (0 < (j = opal_cmd_line_get_ninsts(orte_cmd_line, "machinefile"))) { + if (1 < j || found) { + opal_show_help("help-orterun.txt", "orterun:multiple-hostfiles", + true, "prun", NULL); + return ORTE_ERR_FATAL; + } else { + value = opal_cmd_line_get_param(orte_cmd_line, "machinefile", 0, 0); + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_HOSTFILE); + val->type = OPAL_STRING; + val->data.string = strdup(value); + opal_list_append(&app->info, &val->super); + } + } + + /* Did the user specify any hosts? */ + if (0 < (j = opal_cmd_line_get_ninsts(orte_cmd_line, "host"))) { + char **targ=NULL, *tval; + for (i = 0; i < j; ++i) { + value = opal_cmd_line_get_param(orte_cmd_line, "host", i, 0); + opal_argv_append_nosize(&targ, value); + } + tval = opal_argv_join(targ, ','); + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_HOST); + val->type = OPAL_STRING; + val->data.string = tval; + opal_list_append(&app->info, &val->super); + } + + /* check for bozo error */ + if (0 > orte_cmd_options.num_procs) { + opal_show_help("help-orterun.txt", "orterun:negative-nprocs", + true, "prun", app->argv[0], + orte_cmd_options.num_procs, NULL); + return ORTE_ERR_FATAL; + } + + app->maxprocs = orte_cmd_options.num_procs; + + /* see if we need to preload the binary to + * find the app - don't do this for java apps, however, as we + * can't easily find the class on the cmd line.
Java apps have to + * preload their binary via the preload_files option + */ + if (NULL == strstr(app->argv[0], "java")) { + if (orte_cmd_options.preload_binaries) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_SET_SESSION_CWD); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&app->info, &val->super); + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_PRELOAD_BIN); + val->type = OPAL_BOOL; + val->data.flag = true; + opal_list_append(&app->info, &val->super); + } + } + if (NULL != orte_cmd_options.preload_files) { + val = OBJ_NEW(opal_value_t); + val->key = strdup(OPAL_PMIX_PRELOAD_FILES); + val->type = OPAL_STRING; + val->data.string = strdup(orte_cmd_options.preload_files); + opal_list_append(&app->info, &val->super); + } + + /* Do not try to find argv[0] here -- the starter is responsible + for that because it may not be relevant to try to find it on + the node where orterun is executing. So just strdup() argv[0] + into app. */ + + app->cmd = strdup(app->argv[0]); + if (NULL == app->cmd) { + opal_show_help("help-orterun.txt", "orterun:call-failed", + true, "prun", "library", "strdup returned NULL", errno); + rc = ORTE_ERR_NOT_FOUND; + goto cleanup; + } + + /* if this is a Java application, we have a bit more work to do. Such + * applications actually need to be run under the Java virtual machine + * and the "java" command will start the "executable". So we need to ensure + * that all the proper java-specific paths are provided + */ + appname = opal_basename(app->cmd); + if (0 == strcmp(appname, "java")) { + /* see if we were given a library path */ + found = false; + for (i=1; NULL != app->argv[i]; i++) { + if (NULL != strstr(app->argv[i], "java.library.path")) { + char *dptr; + /* find the '=' that delineates the option from the path */ + if (NULL == (dptr = strchr(app->argv[i], '='))) { + /* that's just wrong */ + rc = ORTE_ERR_BAD_PARAM; + goto cleanup; + } + /* step over the '=' */ + ++dptr; + /* yep - but does it include the path to the mpi libs?
*/ + found = true; + if (NULL == strstr(app->argv[i], opal_install_dirs.libdir)) { + /* doesn't appear to - add it to be safe */ + if (':' == app->argv[i][strlen(app->argv[i])-1]) { + asprintf(&value, "-Djava.library.path=%s%s", dptr, opal_install_dirs.libdir); + } else { + asprintf(&value, "-Djava.library.path=%s:%s", dptr, opal_install_dirs.libdir); + } + free(app->argv[i]); + app->argv[i] = value; + } + break; + } + } + if (!found) { + /* need to add it right after the java command */ + asprintf(&value, "-Djava.library.path=%s", opal_install_dirs.libdir); + opal_argv_insert_element(&app->argv, 1, value); + free(value); + } + + /* see if we were given a class path */ + found = false; + for (i=1; NULL != app->argv[i]; i++) { + if (NULL != strstr(app->argv[i], "cp") || + NULL != strstr(app->argv[i], "classpath")) { + /* yep - but does it include the path to the mpi libs? */ + found = true; + /* check if mpi.jar exists - if so, add it */ + value = opal_os_path(false, opal_install_dirs.libdir, "mpi.jar", NULL); + if (access(value, F_OK ) != -1) { + set_classpath_jar_file(app, i+1, "mpi.jar"); + } + free(value); + /* check for oshmem support */ + value = opal_os_path(false, opal_install_dirs.libdir, "shmem.jar", NULL); + if (access(value, F_OK ) != -1) { + set_classpath_jar_file(app, i+1, "shmem.jar"); + } + free(value); + /* always add the local directory */ + asprintf(&value, "%s:%s", app->cwd, app->argv[i+1]); + free(app->argv[i+1]); + app->argv[i+1] = value; + break; + } + } + if (!found) { + /* check to see if CLASSPATH is in the environment */ + found = false; // just to be pedantic + for (i=0; NULL != environ[i]; i++) { + if (0 == strncmp(environ[i], "CLASSPATH", strlen("CLASSPATH"))) { + value = strchr(environ[i], '='); + ++value; /* step over the = */ + opal_argv_insert_element(&app->argv, 1, value); + /* check for mpi.jar */ + value = opal_os_path(false, opal_install_dirs.libdir, "mpi.jar", NULL); + if (access(value, F_OK ) != -1) {
set_classpath_jar_file(app, 1, "mpi.jar"); + } + free(value); + /* check for shmem.jar */ + value = opal_os_path(false, opal_install_dirs.libdir, "shmem.jar", NULL); + if (access(value, F_OK ) != -1) { + set_classpath_jar_file(app, 1, "shmem.jar"); + } + free(value); + /* always add the local directory */ + (void)asprintf(&value, "%s:%s", app->cwd, app->argv[1]); + free(app->argv[1]); + app->argv[1] = value; + opal_argv_insert_element(&app->argv, 1, "-cp"); + found = true; + break; + } + } + if (!found) { + /* need to add it right after the java command - have + * to include the working directory and trust that + * the user set cwd if necessary + */ + char *str, *str2; + /* always start with the working directory */ + str = strdup(app->cwd); + /* check for mpi.jar */ + value = opal_os_path(false, opal_install_dirs.libdir, "mpi.jar", NULL); + if (access(value, F_OK ) != -1) { + (void)asprintf(&str2, "%s:%s", str, value); + free(str); + str = str2; + } + free(value); + /* check for shmem.jar */ + value = opal_os_path(false, opal_install_dirs.libdir, "shmem.jar", NULL); + if (access(value, F_OK ) != -1) { + asprintf(&str2, "%s:%s", str, value); + free(str); + str = str2; + } + free(value); + opal_argv_insert_element(&app->argv, 1, str); + free(str); + opal_argv_insert_element(&app->argv, 1, "-cp"); + } + } + /* try to find the actual command - may not be perfect */ + for (i=1; i < opal_argv_count(app->argv); i++) { + if (NULL != strstr(app->argv[i], "java.library.path")) { + continue; + } else if (NULL != strstr(app->argv[i], "cp") || + NULL != strstr(app->argv[i], "classpath")) { + /* skip the next field */ + i++; + continue; + } + /* declare this the winner */ + opal_setenv("OMPI_COMMAND", app->argv[i], true, &app->env); + /* collect everything else as the cmd line */ + if ((i+1) < opal_argv_count(app->argv)) { + value = opal_argv_join(&app->argv[i+1], ' '); + opal_setenv("OMPI_ARGV", value, true, &app->env); + free(value); + } + break; + } + } else { + /* add the 
cmd to the environment for MPI_Info to pick up */ + opal_setenv("OMPI_COMMAND", appname, true, &app->env); + if (1 < opal_argv_count(app->argv)) { + value = opal_argv_join(&app->argv[1], ' '); + opal_setenv("OMPI_ARGV", value, true, &app->env); + free(value); + } + } + + *app_ptr = app; + app = NULL; + *made_app = true; + + /* All done */ + + cleanup: + if (NULL != app) { + OBJ_RELEASE(app); + } + if (NULL != appname) { + free(appname); + } + return rc; +} + +static void set_classpath_jar_file(opal_pmix_app_t *app, int index, char *jarfile) +{ + if (NULL == strstr(app->argv[index], jarfile)) { + /* nope - need to add it */ + char *fmt = ':' == app->argv[index][strlen(app->argv[index])-1] + ? "%s%s/%s" : "%s:%s/%s"; + char *str; + asprintf(&str, fmt, app->argv[index], opal_install_dirs.libdir, jarfile); + free(app->argv[index]); + app->argv[index] = str; + } +} diff --git a/orte/tools/prun/prun.h b/orte/tools/prun/prun.h new file mode 100644 index 00000000000..dad4f76049d --- /dev/null +++ b/orte/tools/prun/prun.h @@ -0,0 +1,37 @@ +/* + * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2005 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2007-2011 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2012-2013 Los Alamos National Security, LLC. + * All rights reserved + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved.
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef PRUN_H +#define PRUN_H + +#include "orte_config.h" + +BEGIN_C_DECLS + +/** + * Main body of prun functionality + */ +int prun(int argc, char *argv[]); + +END_C_DECLS + +#endif /* PRUN_H */ diff --git a/orte/util/Makefile.am b/orte/util/Makefile.am index 2eb7ef5e485..d54503b3bb0 100644 --- a/orte/util/Makefile.am +++ b/orte/util/Makefile.am @@ -11,7 +11,7 @@ # All rights reserved. # Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved. # Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. -# Copyright (c) 2014-2017 Intel, Inc. All rights reserved. +# Copyright (c) 2014-2018 Intel, Inc. All rights reserved. # Copyright (c) 2016 Research Organization for Information Science # and Technology (RIST). All rights reserved. # $COPYRIGHT$ @@ -43,24 +43,23 @@ AM_LFLAGS = -Porte_util_hostfile_ LEX_OUTPUT_ROOT = lex.orte_util_hostfile_ headers += \ - util/name_fns.h \ + util/name_fns.h \ util/proc_info.h \ util/session_dir.h \ util/show_help.h \ util/error_strings.h \ - util/context_fns.h \ - util/parse_options.h \ - util/pre_condition_transports.h \ + util/context_fns.h \ + util/parse_options.h \ + util/pre_condition_transports.h \ util/hnp_contact.h \ util/hostfile/hostfile.h \ util/hostfile/hostfile_lex.h \ util/dash_host/dash_host.h \ util/comm/comm.h \ - util/nidmap.h \ - util/regex.h \ util/attr.h \ util/listener.h \ - util/compress.h + util/compress.h \ + util/threads.h lib@ORTE_LIB_PREFIX@open_rte_la_SOURCES += \ util/error_strings.c \ @@ -68,16 +67,14 @@ lib@ORTE_LIB_PREFIX@open_rte_la_SOURCES += \ util/proc_info.c \ util/session_dir.c \ util/show_help.c \ - util/context_fns.c \ - util/parse_options.c \ - util/pre_condition_transports.c \ + util/context_fns.c \ + util/parse_options.c \ + util/pre_condition_transports.c \ util/hnp_contact.c \ util/hostfile/hostfile_lex.l \ util/hostfile/hostfile.c \ util/dash_host/dash_host.c \ util/comm/comm.c \ -
util/nidmap.c \ - util/regex.c \ util/attr.c \ util/listener.c \ util/compress.c diff --git a/orte/util/attr.c b/orte/util/attr.c index 1f447f4a87c..9e8716f0928 100644 --- a/orte/util/attr.c +++ b/orte/util/attr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ @@ -34,12 +34,6 @@ typedef struct { /* all default to NULL */ static orte_attr_converter_t converters[MAX_CONVERTERS]; -static int orte_attr_unload(orte_attribute_t *kv, - void **data, opal_data_type_t type); - -static int orte_attr_load(orte_attribute_t *kv, - void *data, opal_data_type_t type); - bool orte_get_attribute(opal_list_t *attributes, orte_attribute_key_t key, void **data, opal_data_type_t type) @@ -95,6 +89,81 @@ int orte_set_attribute(opal_list_t *attributes, return ORTE_SUCCESS; } +orte_attribute_t* orte_fetch_attribute(opal_list_t *attributes, + orte_attribute_t *prev, + orte_attribute_key_t key) +{ + orte_attribute_t *kv, *end, *next; + + /* if prev is NULL, then find the first attr on the list + * that matches the key */ + if (NULL == prev) { + OPAL_LIST_FOREACH(kv, attributes, orte_attribute_t) { + if (key == kv->key) { + return kv; + } + } + /* if we get here, then the key isn't on the list */ + return NULL; + } + + /* if we are at the end of the list, then nothing to do */ + end = (orte_attribute_t*)opal_list_get_end(attributes); + if (prev == end || end == (orte_attribute_t*)opal_list_get_next(&prev->super) || + NULL == opal_list_get_next(&prev->super)) { + return NULL; + } + + /* starting with the next item on the list, search + * for the next attr with the matching key */ + next = (orte_attribute_t*)opal_list_get_next(&prev->super); + while (end != next) { + if (next->key == key) { + return next; + } + next = (orte_attribute_t*)opal_list_get_next(&next->super); + }
+ /* if we get here, then no matching key was found */ + return NULL; +} + +int orte_add_attribute(opal_list_t *attributes, + orte_attribute_key_t key, bool local, + void *data, opal_data_type_t type) +{ + orte_attribute_t *kv; + int rc; + + kv = OBJ_NEW(orte_attribute_t); + kv->key = key; + kv->local = local; + if (OPAL_SUCCESS != (rc = orte_attr_load(kv, data, type))) { + OBJ_RELEASE(kv); + return rc; + } + opal_list_append(attributes, &kv->super); + return ORTE_SUCCESS; +} + +int orte_prepend_attribute(opal_list_t *attributes, + orte_attribute_key_t key, bool local, + void *data, opal_data_type_t type) +{ + orte_attribute_t *kv; + int rc; + + kv = OBJ_NEW(orte_attribute_t); + kv->key = key; + kv->local = local; + if (OPAL_SUCCESS != (rc = orte_attr_load(kv, data, type))) { + OBJ_RELEASE(kv); + return rc; + } + opal_list_prepend(attributes, &kv->super); + return ORTE_SUCCESS; +} + void orte_remove_attribute(opal_list_t *attributes, orte_attribute_key_t key) { orte_attribute_t *kv; @@ -170,6 +239,16 @@ const char *orte_attr_key_to_str(orte_attribute_key_t key) return "APP-PREFIX-DIR"; case ORTE_APP_NO_CACHEDIR: return "ORTE_APP_NO_CACHEDIR"; + case ORTE_APP_SET_ENVAR: + return "ORTE_APP_SET_ENVAR"; + case ORTE_APP_UNSET_ENVAR: + return "ORTE_APP_UNSET_ENVAR"; + case ORTE_APP_PREPEND_ENVAR: + return "ORTE_APP_PREPEND_ENVAR"; + case ORTE_APP_APPEND_ENVAR: + return "ORTE_APP_APPEND_ENVAR"; + case ORTE_APP_ADD_ENVAR: + return "ORTE_APP_ADD_ENVAR"; case ORTE_NODE_USERNAME: return "NODE-USERNAME"; @@ -286,6 +365,22 @@ const char *orte_attr_key_to_str(orte_attribute_key_t key) return "ORTE_JOB_TRANSPORT_KEY"; case ORTE_JOB_INFO_CACHE: return "ORTE_JOB_INFO_CACHE"; + case ORTE_JOB_FULLY_DESCRIBED: + return "ORTE_JOB_FULLY_DESCRIBED"; + case ORTE_JOB_SILENT_TERMINATION: + return "ORTE_JOB_SILENT_TERMINATION"; + case ORTE_JOB_SET_ENVAR: + return "ORTE_JOB_SET_ENVAR"; + case ORTE_JOB_UNSET_ENVAR: + return "ORTE_JOB_UNSET_ENVAR"; + case ORTE_JOB_PREPEND_ENVAR: + return
"ORTE_JOB_PREPEND_ENVAR"; + case ORTE_JOB_APPEND_ENVAR: + return "ORTE_JOB_APPEND_ENVAR"; + case ORTE_JOB_ADD_ENVAR: + return "ORTE_JOB_ADD_ENVAR"; + case ORTE_JOB_APP_SETUP_DATA: + return "ORTE_JOB_APP_SETUP_DATA"; case ORTE_PROC_NOBARRIER: return "PROC-NOBARRIER"; @@ -356,11 +451,12 @@ const char *orte_attr_key_to_str(orte_attribute_key_t key) } -static int orte_attr_load(orte_attribute_t *kv, - void *data, opal_data_type_t type) +int orte_attr_load(orte_attribute_t *kv, + void *data, opal_data_type_t type) { opal_byte_object_t *boptr; struct timeval *tv; + opal_envar_t *envar; kv->type = type; if (NULL == data) { @@ -481,6 +577,18 @@ static int orte_attr_load(orte_attribute_t *kv, kv->data.name = *(opal_process_name_t *)data; break; + case OPAL_ENVAR: + OBJ_CONSTRUCT(&kv->data.envar, opal_envar_t); + envar = (opal_envar_t*)data; + if (NULL != envar->envar) { + kv->data.envar.envar = strdup(envar->envar); + } + if (NULL != envar->value) { + kv->data.envar.value = strdup(envar->value); + } + kv->data.envar.separator = envar->separator; + break; + default: OPAL_ERROR_LOG(OPAL_ERR_NOT_SUPPORTED); return OPAL_ERR_NOT_SUPPORTED; @@ -488,10 +596,11 @@ static int orte_attr_load(orte_attribute_t *kv, return OPAL_SUCCESS; } -static int orte_attr_unload(orte_attribute_t *kv, - void **data, opal_data_type_t type) +int orte_attr_unload(orte_attribute_t *kv, + void **data, opal_data_type_t type) { opal_byte_object_t *boptr; + opal_envar_t *envar; if (type != kv->type) { return OPAL_ERR_TYPE_MISMATCH; @@ -599,6 +708,18 @@ static int orte_attr_unload(orte_attribute_t *kv, memcpy(*data, &kv->data.name, sizeof(orte_process_name_t)); break; + case OPAL_ENVAR: + envar = OBJ_NEW(opal_envar_t); + if (NULL != kv->data.envar.envar) { + envar->envar = strdup(kv->data.envar.envar); + } + if (NULL != kv->data.envar.value) { + envar->value = strdup(kv->data.envar.value); + } + envar->separator = kv->data.envar.separator; + *data = envar; + break; + default:
OPAL_ERROR_LOG(OPAL_ERR_NOT_SUPPORTED); return OPAL_ERR_NOT_SUPPORTED; diff --git a/orte/util/attr.h b/orte/util/attr.h index 1b961030091..8393dc9a2dd 100644 --- a/orte/util/attr.h +++ b/orte/util/attr.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. * $COPYRIGHT$ @@ -47,6 +47,11 @@ typedef uint8_t orte_app_context_flags_t; #define ORTE_APP_MAX_PPN 14 // uint32 - maximum number of procs/node for this app #define ORTE_APP_PREFIX_DIR 15 // string - prefix directory for this app, if override necessary #define ORTE_APP_NO_CACHEDIR 16 // bool - flag that a cache dir is not to be specified for a Singularity container +#define ORTE_APP_SET_ENVAR 17 // opal_envar_t - set the given envar to the specified value +#define ORTE_APP_UNSET_ENVAR 18 // string - name of envar to unset, if present +#define ORTE_APP_PREPEND_ENVAR 19 // opal_envar_t - prepend the specified value to the given envar +#define ORTE_APP_APPEND_ENVAR 20 // opal_envar_t - append the specified value to the given envar +#define ORTE_APP_ADD_ENVAR 21 // opal_envar_t - add envar, do not override pre-existing one #define ORTE_APP_MAX_KEY 100 @@ -143,6 +148,15 @@ typedef uint16_t orte_job_flags_t; #define ORTE_JOB_NOTIFY_COMPLETION (ORTE_JOB_START_KEY + 50) // bool - notify parent proc when spawned job terminates #define ORTE_JOB_TRANSPORT_KEY (ORTE_JOB_START_KEY + 51) // string - transport keys assigned to this job #define ORTE_JOB_INFO_CACHE (ORTE_JOB_START_KEY + 52) // opal_list_t - list of opal_value_t to be included in job_info +#define ORTE_JOB_FULLY_DESCRIBED (ORTE_JOB_START_KEY + 53) // bool - job is fully described in launch msg +#define ORTE_JOB_SILENT_TERMINATION (ORTE_JOB_START_KEY + 54) // bool - do not generate an event notification when job + // normally terminates +#define ORTE_JOB_SET_ENVAR 
(ORTE_JOB_START_KEY + 55) // opal_envar_t - set the given envar to the specified value +#define ORTE_JOB_UNSET_ENVAR (ORTE_JOB_START_KEY + 56) // string - name of envar to unset, if present +#define ORTE_JOB_PREPEND_ENVAR (ORTE_JOB_START_KEY + 57) // opal_envar_t - prepend the specified value to the given envar +#define ORTE_JOB_APPEND_ENVAR (ORTE_JOB_START_KEY + 58) // opal_envar_t - append the specified value to the given envar +#define ORTE_JOB_ADD_ENVAR (ORTE_JOB_START_KEY + 59) // opal_envar_t - add envar, do not override pre-existing one +#define ORTE_JOB_APP_SETUP_DATA (ORTE_JOB_START_KEY + 60) // opal_byte_object_t - blob containing app setup data #define ORTE_JOB_MAX_KEY 300 @@ -218,6 +232,24 @@ ORTE_DECLSPEC int orte_set_attribute(opal_list_t *attributes, orte_attribute_key /* Remove the named attribute from a list */ ORTE_DECLSPEC void orte_remove_attribute(opal_list_t *attributes, orte_attribute_key_t key); +ORTE_DECLSPEC orte_attribute_t* orte_fetch_attribute(opal_list_t *attributes, + orte_attribute_t *prev, + orte_attribute_key_t key); + +ORTE_DECLSPEC int orte_add_attribute(opal_list_t *attributes, + orte_attribute_key_t key, bool local, + void *data, opal_data_type_t type); + +ORTE_DECLSPEC int orte_prepend_attribute(opal_list_t *attributes, + orte_attribute_key_t key, bool local, + void *data, opal_data_type_t type); + +ORTE_DECLSPEC int orte_attr_load(orte_attribute_t *kv, + void *data, opal_data_type_t type); + +ORTE_DECLSPEC int orte_attr_unload(orte_attribute_t *kv, + void **data, opal_data_type_t type); + /* * Register a handler for converting attr keys to strings * diff --git a/orte/util/comm/comm.c b/orte/util/comm/comm.c index 426cbc4a69c..fdcbcc033e1 100644 --- a/orte/util/comm/comm.c +++ b/orte/util/comm/comm.c @@ -11,7 +11,7 @@ * All rights reserved. * Copyright (c) 2010-2012 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved. + * Copyright (c) 2014-2017 Intel, Inc. 
All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -28,6 +28,7 @@ #include "opal/util/output.h" #include "opal/threads/tsd.h" #include "opal/mca/event/event.h" +#include "opal/mca/pmix/pmix.h" #include "opal/runtime/opal_progress.h" #include "opal/dss/dss.h" @@ -38,6 +39,7 @@ #include "orte/mca/rml/base/rml_contact.h" #include "orte/mca/routed/routed.h" #include "orte/util/name_fns.h" +#include "orte/util/threads.h" #include "orte/runtime/orte_globals.h" #include "orte/runtime/orte_wait.h" @@ -109,9 +111,7 @@ static bool tool_connected = false; int orte_util_comm_connect_tool(char *uri) { int rc; - - /* set the contact info into the comm hash tables*/ - orte_rml.set_contact_info(uri); + opal_value_t val; /* extract the tool's name and store it */ if (ORTE_SUCCESS != (rc = orte_rml_base_parse_uris(uri, &tool, NULL))) { @@ -119,6 +119,22 @@ int orte_util_comm_connect_tool(char *uri) return rc; } + /* set the contact info into the comm hash tables*/ + OBJ_CONSTRUCT(&val, opal_value_t); + val.key = OPAL_PMIX_PROC_URI; + val.type = OPAL_STRING; + val.data.string = uri; + if (OPAL_SUCCESS != (rc = opal_pmix.store_local(&tool, &val))) { + ORTE_ERROR_LOG(rc); + val.key = NULL; + val.data.string = NULL; + OBJ_DESTRUCT(&val); + return rc; + } + val.key = NULL; + val.data.string = NULL; + OBJ_DESTRUCT(&val); + /* set the route to be direct */ if (ORTE_SUCCESS != (rc = orte_routed.update_route(NULL, &tool, &tool))) { ORTE_ERROR_LOG(rc); @@ -807,4 +823,3 @@ int orte_util_comm_halt_vm(const orte_process_name_t *hnp) CLEANUP: return rc; } - diff --git a/orte/util/error_strings.c b/orte/util/error_strings.c index 3e9c2239b57..a2acad8339d 100644 --- a/orte/util/error_strings.c +++ b/orte/util/error_strings.c @@ -12,7 +12,7 @@ * Copyright (c) 2010-2016 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2011-2013 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. 
+ * Copyright (c) 2014-2018 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -89,7 +89,7 @@ int orte_err2str(int errnum, const char **errmsg) if (orte_report_silent_errors) { retval = "Silent error"; } else { - retval = NULL; + retval = ""; } break; case ORTE_ERR_ADDRESSEE_UNKNOWN: @@ -174,7 +174,7 @@ int orte_err2str(int errnum, const char **errmsg) if (orte_report_silent_errors) { retval = "Next option"; } else { - retval = NULL; + retval = ""; } break; case ORTE_ERR_SENSOR_LIMIT_EXCEEDED: @@ -195,39 +195,12 @@ int orte_err2str(int errnum, const char **errmsg) case ORTE_ERR_OP_IN_PROGRESS: retval = "Operation in progress"; break; - case ORTE_ERR_OPEN_CHANNEL_PEER_FAIL: - retval = "Open channel to peer failed"; - break; - case ORTE_ERR_OPEN_CHANNEL_PEER_REJECT: - retval = "Open channel to peer was rejected"; - break; - case ORTE_ERR_QOS_TYPE_UNSUPPORTED: - retval = "QoS type unsupported"; - break; - case ORTE_ERR_QOS_ACK_WINDOW_FULL: - retval = "QoS ack window full"; - break; - case ORTE_ERR_ACK_TIMEOUT_SENDER: - retval = "Send ack timed out"; - break; - case ORTE_ERR_ACK_TIMEOUT_RECEIVER: - retval = "Recv ack timed out"; - break; - case ORTE_ERR_LOST_MSG_IN_WINDOW: - retval = "Msg lost in window"; - break; - case ORTE_ERR_CHANNEL_BUSY: - retval = "Channel busy"; - break; - case ORTE_ERR_DUPLICATE_MSG: - retval = "Duplicate message"; + case ORTE_ERR_OPEN_CONDUIT_FAIL: + retval = "Open messaging conduit failed"; break; case ORTE_ERR_OUT_OF_ORDER_MSG: retval = "Out of order message"; break; - case ORTE_ERR_OPEN_CHANNEL_DUPLICATE: - retval = "Duplicate channel open request"; - break; case ORTE_ERR_FORCE_SELECT: retval = "Force select"; break; @@ -244,11 +217,7 @@ int orte_err2str(int errnum, const char **errmsg) retval = "Partial success"; break; default: - if (orte_report_silent_errors) { - retval = "Unknown error"; - } else { - retval = NULL; - } + retval = "Unknown error"; } *errmsg = retval; @@ -284,6 +253,8 @@ const char 
*orte_job_state_to_str(orte_job_state_t state) return "VM READY"; case ORTE_JOB_STATE_LAUNCH_APPS: return "PENDING APP LAUNCH"; + case ORTE_JOB_STATE_SEND_LAUNCH_MSG: + return "SENDING LAUNCH MSG"; case ORTE_JOB_STATE_RUNNING: return "RUNNING"; case ORTE_JOB_STATE_SUSPENDED: diff --git a/orte/util/help-regex.txt b/orte/util/help-regex.txt index b9b00bc2170..ef24b52c5b5 100644 --- a/orte/util/help-regex.txt +++ b/orte/util/help-regex.txt @@ -12,6 +12,7 @@ # All rights reserved. # Copyright (c) 2014 Research Organization for Information Science # and Technology (RIST). All rights reserved. +# Copyright (c) 2017 Intel, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -68,3 +69,18 @@ digits in the names: regexp: %s Please contact the Open MPI help list for assistance. +# +[regex:invalid-name] +While trying to create a regular expression of the node names +used in this application, the regex parser has detected the +presence of an illegal character in the following node name: + + node: %s + +Node names must be composed of a combination of ascii letters, +digits, dots, and the hyphen ('-') character. See the following +for an explanation: + + https://en.wikipedia.org/wiki/Hostname + +Please correct the error and try again. diff --git a/orte/util/hnp_contact.c b/orte/util/hnp_contact.c index 3ec8e471b8a..f7cf36f8374 100644 --- a/orte/util/hnp_contact.c +++ b/orte/util/hnp_contact.c @@ -12,7 +12,7 @@ * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -41,8 +41,10 @@ #include "opal/util/os_path.h" #include "opal/util/output.h" #include "opal/util/os_dirpath.h" +#include "opal/mca/pmix/pmix.h" #include "orte/mca/errmgr/errmgr.h" +#include "orte/mca/oob/base/base.h" #include "orte/mca/rml/rml.h" #include "orte/mca/rml/base/rml_contact.h" #include "orte/mca/routed/routed.h" @@ -77,7 +79,7 @@ int orte_write_hnp_contact_file(char *filename) FILE *fp; char *my_uri; - my_uri = orte_rml.get_contact_info(); + orte_oob_base_get_addr(&my_uri); if (NULL == my_uri) { return ORTE_ERROR; } @@ -104,6 +106,7 @@ int orte_read_hnp_contact_file(char *filename, orte_hnp_contact_t *hnp, bool con char *hnp_uri, *pidstr; FILE *fp; int rc; + opal_value_t val; fp = fopen(filename, "r"); if (NULL == fp) { /* failed on first read - wait and try again */ @@ -133,9 +136,6 @@ int orte_read_hnp_contact_file(char *filename, orte_hnp_contact_t *hnp, bool con fclose(fp); if (connect) { - /* set the contact info into the comm hash tables*/ - orte_rml.set_contact_info(hnp_uri); - /* extract the HNP's name and store it */ if (ORTE_SUCCESS != (rc = orte_rml_base_parse_uris(hnp_uri, &hnp->name, NULL))) { ORTE_ERROR_LOG(rc); @@ -143,6 +143,23 @@ int orte_read_hnp_contact_file(char *filename, orte_hnp_contact_t *hnp, bool con return rc; } + /* set the contact info into the comm hash tables*/ + OBJ_CONSTRUCT(&val, opal_value_t); + val.key = OPAL_PMIX_PROC_URI; + val.type = OPAL_STRING; + val.data.string = hnp_uri; + if (OPAL_SUCCESS != (rc = opal_pmix.store_local(&hnp->name, &val))) { + ORTE_ERROR_LOG(rc); + val.key = NULL; + val.data.string = NULL; + OBJ_DESTRUCT(&val); + free(hnp_uri); + return rc; + } + val.key = NULL; + val.data.string = NULL; + OBJ_DESTRUCT(&val); + /* set the route to be direct */ if (ORTE_SUCCESS != (rc = orte_routed.update_route(NULL, &hnp->name, &hnp->name))) { ORTE_ERROR_LOG(rc); diff --git a/orte/util/hostfile/hostfile.c b/orte/util/hostfile/hostfile.c index 
78bf08a60f7..f502d3bfa06 100644 --- a/orte/util/hostfile/hostfile.c +++ b/orte/util/hostfile/hostfile.c @@ -12,7 +12,7 @@ * Copyright (c) 2007 Los Alamos National Security, LLC. All rights * reserved. * Copyright (c) 2011 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2013-2014 Intel, Inc. All rights reserved. + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 IBM Corporation. All rights reserved. @@ -661,7 +661,6 @@ int orte_util_filter_hostfile_nodes(opal_list_t *nodes, orte_node_t *node = (orte_node_t*)item2; if (0 == strcmp(node_from_file->name, node->name)) { /* match - remove it */ - opal_output(0, "HOST %s ON EXCLUDE LIST - REMOVING", node->name); opal_list_remove_item(&newnodes, item2); OBJ_RELEASE(item2); break; @@ -795,7 +794,8 @@ int orte_util_filter_hostfile_nodes(opal_list_t *nodes, * to the specified count - this allows people * to subdivide an allocation */ - if (node_from_file->slots < node_from_list->slots) { + if (ORTE_FLAG_TEST(node_from_file, ORTE_NODE_FLAG_SLOTS_GIVEN) && + node_from_file->slots < node_from_list->slots) { node_from_list->slots = node_from_file->slots; } if (remove) { diff --git a/orte/util/nidmap.c b/orte/util/nidmap.c deleted file mode 100644 index be0437bf209..00000000000 --- a/orte/util/nidmap.c +++ /dev/null @@ -1,1088 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. 
- * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2012-2014 Los Alamos National Security, LLC. - * All rights reserved. - * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. - * Copyright (c) 2014 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -#include "orte_config.h" -#include "orte/types.h" -#include "orte/constants.h" - -#include -#include -#include -#include -#include -#ifdef HAVE_UNISTD_H -#include -#endif -#ifdef HAVE_SYS_SOCKET_H -#include -#endif -#ifdef HAVE_NETINET_IN_H -#include -#endif -#ifdef HAVE_ARPA_INET_H -#include -#endif -#ifdef HAVE_NETDB_H -#include -#endif -#ifdef HAVE_IFADDRS_H -#include -#endif - -#include "opal/dss/dss.h" -#include "opal/runtime/opal.h" -#include "opal/class/opal_pointer_array.h" -#include "opal/mca/pmix/pmix.h" -#include "opal/mca/hwloc/base/base.h" -#include "opal/util/net.h" -#include "opal/util/output.h" -#include "opal/util/argv.h" -#include "opal/datatype/opal_datatype.h" - -#include "orte/mca/dfs/dfs.h" -#include "orte/mca/errmgr/errmgr.h" -#include "orte/mca/odls/base/odls_private.h" -#include "orte/mca/routed/routed.h" -#include "orte/util/show_help.h" -#include "orte/util/proc_info.h" -#include "orte/util/name_fns.h" -#include "orte/util/regex.h" -#include "orte/runtime/orte_globals.h" -#include "orte/mca/rml/base/rml_contact.h" -#include "orte/mca/state/state.h" - -#include "orte/util/nidmap.h" - -int orte_util_build_daemon_nidmap(char **nodes) -{ - int i, num_nodes; - int rc; - struct hostent *h; - opal_buffer_t buf; - opal_process_name_t proc; - char *uri, *addr; - char *proc_name; - opal_value_t kv; - - num_nodes = opal_argv_count(nodes); - - if (0 == num_nodes) { - /* nothing to do */ - return ORTE_SUCCESS; - } - - /* install the entry for the HNP */ - proc.jobid = ORTE_PROC_MY_NAME->jobid; - proc.vpid = 0; - 
OBJ_CONSTRUCT(&kv, opal_value_t); - kv.key = strdup(ORTE_DB_DAEMON_VPID); - kv.data.uint32 = proc.vpid; - kv.type = OPAL_UINT32; - if (OPAL_SUCCESS != (rc = opal_pmix.store_local(&proc, &kv))) { - ORTE_ERROR_LOG(rc); - OBJ_DESTRUCT(&kv); - return rc; - } - OBJ_DESTRUCT(&kv); - - /* the daemon vpids will be assigned in order, - * starting with vpid=0 for the HNP */ - OBJ_CONSTRUCT(&buf, opal_buffer_t); - for (i=0; i < num_nodes; i++) { - /* define the vpid for this daemon */ - proc.vpid = i; - /* store the hostname for the proc */ - OBJ_CONSTRUCT(&kv, opal_value_t); - kv.key = strdup(OPAL_PMIX_HOSTNAME); - kv.data.string = strdup(nodes[i]); - kv.type = OPAL_STRING; - if (OPAL_SUCCESS != (rc = opal_pmix.store_local(&proc, &kv))) { - ORTE_ERROR_LOG(rc); - OBJ_DESTRUCT(&kv); - return rc; - } - OBJ_DESTRUCT(&kv); - - /* the arch defaults to our arch so that non-hetero - * case will yield correct behavior - */ - OBJ_CONSTRUCT(&kv, opal_value_t); - kv.key = strdup(OPAL_PMIX_ARCH); - kv.data.uint32 = opal_local_arch; - kv.type = OPAL_UINT32; - if (OPAL_SUCCESS != (rc = opal_pmix.store_local(&proc, &kv))) { - ORTE_ERROR_LOG(rc); - OBJ_DESTRUCT(&kv); - return rc; - } - OBJ_DESTRUCT(&kv); - - /* lookup the address of this node */ - if (NULL == (h = gethostbyname(nodes[i]))) { - ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); - return ORTE_ERR_NOT_FOUND; - } - addr = inet_ntoa(*(struct in_addr*)h->h_addr_list[0]); - - /* since we are using static ports, all my fellow daemons will be on my - * port. Setup the contact info for each daemon in my hash tables. 
Note - * that this will -not- open a port to those daemons, but will only - * define the info necessary for opening such a port if/when I communicate - * to them - */ - - /* construct the URI */ - orte_util_convert_process_name_to_string(&proc_name, &proc); - asprintf(&uri, "%s;tcp://%s:%d", proc_name, addr, (int)orte_process_info.my_port); - OPAL_OUTPUT_VERBOSE((2, orte_debug_verbosity, - "%s orte:util:build:daemon:nidmap node %s daemon %d addr %s uri %s", - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - nodes[i], i+1, addr, uri)); - opal_dss.pack(&buf, &uri, 1, OPAL_STRING); - free(proc_name); - free(uri); - } - - /* load the hash tables */ - if (ORTE_SUCCESS != (rc = orte_rml_base_update_contact_info(&buf))) { - ORTE_ERROR_LOG(rc); - } - OBJ_DESTRUCT(&buf); - - return rc; -} - -int orte_util_encode_nodemap(opal_buffer_t *buffer) -{ - char *node; - char prefix[ORTE_MAX_NODE_PREFIX]; - int i, j, n, len, startnum, nodenum, numdigits; - bool found, fullname, test; - char *suffix, *sfx; - orte_regex_node_t *ndreg; - orte_regex_range_t *range, *rng, *slt, *tp, *flg; - opal_list_t nodenms, dvpids, slots, topos, flags; - opal_list_item_t *item, *itm2; - char **regexargs = NULL, *tmp, *tmp2; - orte_node_t *nptr; - int rc; - uint8_t ui8; - - /* setup the list of results */ - OBJ_CONSTRUCT(&nodenms, opal_list_t); - OBJ_CONSTRUCT(&dvpids, opal_list_t); - OBJ_CONSTRUCT(&slots, opal_list_t); - OBJ_CONSTRUCT(&topos, opal_list_t); - OBJ_CONSTRUCT(&flags, opal_list_t); - - rng = NULL; - slt = NULL; - tp = NULL; - flg = NULL; - for (n=0; n < orte_node_pool->size; n++) { - if (NULL == (nptr = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, n))) { - continue; - } - /* if no daemon has been assigned, then this node is not being used */ - if (NULL == nptr->daemon) { - continue; - } - /* deal with the daemon vpid - see if it is next in the - * current range */ - if (NULL == rng) { - /* just starting */ - rng = OBJ_NEW(orte_regex_range_t); - rng->start = nptr->daemon->name.vpid; - 
rng->cnt = 1; - opal_list_append(&dvpids, &rng->super); - } else { - /* is this the next in line */ - if (nptr->daemon->name.vpid == (orte_vpid_t)(rng->start + rng->cnt)) { - rng->cnt++; - } else { - /* need to start another range */ - rng = OBJ_NEW(orte_regex_range_t); - rng->start = nptr->daemon->name.vpid; - rng->cnt = 1; - opal_list_append(&dvpids, &rng->super); - } - } - /* check the #slots */ - if (NULL == slt) { - /* just starting */ - slt = OBJ_NEW(orte_regex_range_t); - slt->start = nptr->daemon->name.vpid; - slt->slots = nptr->slots; - slt->cnt = 1; - opal_list_append(&slots, &slt->super); - } else { - /* is this the next in line */ - if (nptr->slots == slt->slots) { - slt->cnt++; - } else { - /* need to start another range */ - slt = OBJ_NEW(orte_regex_range_t); - slt->start = nptr->daemon->name.vpid; - slt->slots = nptr->slots; - slt->cnt = 1; - opal_list_append(&slots, &slt->super); - } - } - /* check the topologies */ - if (NULL == tp) { - if (NULL != nptr->topology) { - /* just starting */ - tp = OBJ_NEW(orte_regex_range_t); - tp->start = nptr->daemon->name.vpid; - tp->t = nptr->topology; - tp->cnt = 1; - opal_list_append(&topos, &tp->super); - } - } else { - if (NULL != nptr->topology) { - /* is this the next in line */ - if (tp->t == nptr->topology) { - tp->cnt++; - } else { - /* need to start another range */ - tp = OBJ_NEW(orte_regex_range_t); - tp->start = nptr->daemon->name.vpid; - tp->t = nptr->topology; - tp->cnt = 1; - opal_list_append(&topos, &tp->super); - } - } - } - /* check the flags */ - test = ORTE_FLAG_TEST(nptr, ORTE_NODE_FLAG_SLOTS_GIVEN); - if (NULL == flg) { - /* just starting */ - flg = OBJ_NEW(orte_regex_range_t); - flg->start = nptr->daemon->name.vpid; - if (test) { - flg->slots = 1; - } else { - flg->slots = 0; - } - flg->cnt = 1; - opal_list_append(&flags, &flg->super); - } else { - /* is this the next in line */ - if ((test && 1 == flg->slots) || - (!test && 0 == flg->slots)) { - flg->cnt++; - } else { - /* need to start 
another range */ - flg = OBJ_NEW(orte_regex_range_t); - flg->start = nptr->daemon->name.vpid; - if (test) { - flg->slots = 1; - } else { - flg->slots = 0; - } - flg->cnt = 1; - opal_list_append(&flags, &flg->super); - } - } - node = nptr->name; - /* determine this node's prefix by looking for first non-alpha char */ - fullname = false; - len = strlen(node); - startnum = -1; - memset(prefix, 0, ORTE_MAX_NODE_PREFIX); - numdigits = 0; - for (i=0, j=0; i < len; i++) { - if (!isalpha(node[i])) { - /* found a non-alpha char */ - if (!isdigit(node[i])) { - /* if it is anything but a digit, we just use - * the entire name - */ - fullname = true; - break; - } - /* count the size of the numeric field - but don't - * add the digits to the prefix - */ - numdigits++; - if (startnum < 0) { - /* okay, this defines end of the prefix */ - startnum = i; - } - continue; - } - if (startnum < 0) { - prefix[j++] = node[i]; - } - } - if (fullname || startnum < 0) { - /* can't compress this name - just add it to the list */ - ndreg = OBJ_NEW(orte_regex_node_t); - ndreg->prefix = strdup(node); - opal_list_append(&nodenms, &ndreg->super); - continue; - } - /* convert the digits and get any suffix */ - nodenum = strtol(&node[startnum], &sfx, 10); - if (NULL != sfx) { - suffix = strdup(sfx); - } else { - suffix = NULL; - } - /* is this node name already on our list? 
*/ - found = false; - for (item = opal_list_get_first(&nodenms); - !found && item != opal_list_get_end(&nodenms); - item = opal_list_get_next(item)) { - ndreg = (orte_regex_node_t*)item; - if (0 < strlen(prefix) && NULL == ndreg->prefix) { - continue; - } - if (0 == strlen(prefix) && NULL != ndreg->prefix) { - continue; - } - if (0 < strlen(prefix) && NULL != ndreg->prefix - && 0 != strcmp(prefix, ndreg->prefix)) { - continue; - } - if (NULL == suffix && NULL != ndreg->suffix) { - continue; - } - if (NULL != suffix && NULL == ndreg->suffix) { - continue; - } - if (NULL != suffix && NULL != ndreg->suffix && - 0 != strcmp(suffix, ndreg->suffix)) { - continue; - } - if (numdigits != ndreg->num_digits) { - continue; - } - /* found a match - flag it */ - found = true; - /* get the last range on this nodeid - we do this - * to preserve order - */ - range = (orte_regex_range_t*)opal_list_get_last(&ndreg->ranges); - if (NULL == range) { - /* first range for this nodeid */ - range = OBJ_NEW(orte_regex_range_t); - range->start = nodenum; - range->cnt = 1; - opal_list_append(&ndreg->ranges, &range->super); - break; - } - /* see if the node number is out of sequence */ - if (nodenum != (range->start + range->cnt)) { - /* start a new range */ - range = OBJ_NEW(orte_regex_range_t); - range->start = nodenum; - range->cnt = 1; - opal_list_append(&ndreg->ranges, &range->super); - break; - } - /* everything matches - just increment the cnt */ - range->cnt++; - break; - } - if (!found) { - /* need to add it */ - ndreg = OBJ_NEW(orte_regex_node_t); - if (0 < strlen(prefix)) { - ndreg->prefix = strdup(prefix); - } - if (NULL != suffix) { - ndreg->suffix = strdup(suffix); - } - ndreg->num_digits = numdigits; - opal_list_append(&nodenms, &ndreg->super); - /* record the first range for this nodeid - we took - * care of names we can't compress above - */ - range = OBJ_NEW(orte_regex_range_t); - range->start = nodenum; - range->cnt = 1; - opal_list_append(&ndreg->ranges, &range->super); - } 
- if (NULL != suffix) { - free(suffix); - } - } - - /* begin constructing the regular expression */ - while (NULL != (item = opal_list_remove_first(&nodenms))) { - ndreg = (orte_regex_node_t*)item; - - /* if no ranges, then just add the name */ - if (0 == opal_list_get_size(&ndreg->ranges)) { - if (NULL != ndreg->prefix) { - /* solitary node */ - asprintf(&tmp, "%s", ndreg->prefix); - opal_argv_append_nosize(&regexargs, tmp); - free(tmp); - } - OBJ_RELEASE(ndreg); - continue; - } - /* start the regex for this nodeid with the prefix */ - if (NULL != ndreg->prefix) { - asprintf(&tmp, "%s[%d:", ndreg->prefix, ndreg->num_digits); - } else { - asprintf(&tmp, "[%d:", ndreg->num_digits); - } - /* add the ranges */ - while (NULL != (itm2 = opal_list_remove_first(&ndreg->ranges))) { - range = (orte_regex_range_t*)itm2; - if (1 == range->cnt) { - asprintf(&tmp2, "%s%d,", tmp, range->start); - } else { - asprintf(&tmp2, "%s%d-%d,", tmp, range->start, range->start + range->cnt - 1); - } - free(tmp); - tmp = tmp2; - OBJ_RELEASE(range); - } - /* replace the final comma */ - tmp[strlen(tmp)-1] = ']'; - if (NULL != ndreg->suffix) { - /* add in the suffix, if provided */ - asprintf(&tmp2, "%s%s", tmp, ndreg->suffix); - free(tmp); - tmp = tmp2; - } - opal_argv_append_nosize(&regexargs, tmp); - free(tmp); - OBJ_RELEASE(ndreg); - } - - /* assemble final result */ - tmp = opal_argv_join(regexargs, ','); - /* cleanup */ - opal_argv_free(regexargs); - OBJ_DESTRUCT(&nodenms); - - /* pack the string */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &tmp, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - OPAL_LIST_DESTRUCT(&dvpids); - OPAL_LIST_DESTRUCT(&slots); - return rc; - } - if (NULL != tmp) { - free(tmp); - } - - /* do the same for the vpids */ - tmp = NULL; - while (NULL != (item = opal_list_remove_first(&dvpids))) { - rng = (orte_regex_range_t*)item; - if (1 < rng->cnt) { - if (NULL == tmp) { - asprintf(&tmp, "%d-%d", rng->start, rng->start + rng->cnt - 1); - } else { - asprintf(&tmp2,
"%s,%d-%d", tmp, rng->start, rng->start + rng->cnt - 1); - free(tmp); - tmp = tmp2; - } - } else { - if (NULL == tmp) { - asprintf(&tmp, "%d", rng->start); - } else { - asprintf(&tmp2, "%s,%d", tmp, rng->start); - free(tmp); - tmp = tmp2; - } - } - OBJ_RELEASE(rng); - } - OPAL_LIST_DESTRUCT(&dvpids); - - /* pack the string */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &tmp, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - OPAL_LIST_DESTRUCT(&slots); - return rc; - } - if (NULL != tmp) { - free(tmp); - } - - /* do the same to pass #slots on each node */ - tmp = NULL; - while (NULL != (item = opal_list_remove_first(&slots))) { - rng = (orte_regex_range_t*)item; - if (1 < rng->cnt) { - if (NULL == tmp) { - asprintf(&tmp, "%d-%d[%d]", rng->start, rng->start + rng->cnt - 1, rng->slots); - } else { - asprintf(&tmp2, "%s,%d-%d[%d]", tmp, rng->start, rng->start + rng->cnt - 1, rng->slots); - free(tmp); - tmp = tmp2; - } - } else { - if (NULL == tmp) { - asprintf(&tmp, "%d[%d]", rng->start, rng->slots); - } else { - asprintf(&tmp2, "%s,%d[%d]", tmp, rng->start, rng->slots); - free(tmp); - tmp = tmp2; - } - } - OBJ_RELEASE(rng); - } - OPAL_LIST_DESTRUCT(&slots); - - /* pack the string */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &tmp, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - return rc; - } - if (NULL != tmp) { - free(tmp); - } - - /* do the same to pass the flags for each node */ - tmp = NULL; - while (NULL != (item = opal_list_remove_first(&flags))) { - rng = (orte_regex_range_t*)item; - if (1 < rng->cnt) { - if (NULL == tmp) { - asprintf(&tmp, "%d-%d[%x]", rng->start, rng->start + rng->cnt - 1, rng->slots); - } else { - asprintf(&tmp2, "%s,%d-%d[%x]", tmp, rng->start, rng->start + rng->cnt - 1, rng->slots); - free(tmp); - tmp = tmp2; - } - } else { - if (NULL == tmp) { - asprintf(&tmp, "%d[%x]", rng->start, rng->slots); - } else { - asprintf(&tmp2, "%s,%d[%x]", tmp, rng->start, rng->slots); - free(tmp); - tmp = tmp2; - } - } - OBJ_RELEASE(rng); - } - 
OPAL_LIST_DESTRUCT(&flags); - - /* pack the string */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &tmp, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - return rc; - } - if (NULL != tmp) { - free(tmp); - } - - /* pack a flag indicating if the HNP was included in the allocation */ - if (orte_hnp_is_allocated) { - ui8 = 1; - } else { - ui8 = 0; - } - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &ui8, 1, OPAL_UINT8))) { - ORTE_ERROR_LOG(rc); - return rc; - } - - /* pack a flag indicating if we are in a managed allocation */ - if (orte_managed_allocation) { - ui8 = 1; - } else { - ui8 = 0; - } - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &ui8, 1, OPAL_UINT8))) { - ORTE_ERROR_LOG(rc); - return rc; - } - - /* handle the topologies - as the most common case by far - * is to have homogeneous topologies, we only send them - * if something is different */ - tmp = NULL; - if (1 < opal_list_get_size(&topos)) { - opal_buffer_t bucket, *bptr; - OBJ_CONSTRUCT(&bucket, opal_buffer_t); - while (NULL != (item = opal_list_remove_first(&topos))) { - rng = (orte_regex_range_t*)item; - if (1 < rng->cnt) { - if (NULL == tmp) { - asprintf(&tmp, "%d-%d", rng->start, rng->start + rng->cnt - 1); - } else { - asprintf(&tmp2, "%s,%d-%d", tmp, rng->start, rng->start + rng->cnt - 1); - free(tmp); - tmp = tmp2; - } - } else { - if (NULL == tmp) { - asprintf(&tmp, "%d", rng->start); - } else { - asprintf(&tmp2, "%s,%d", tmp, rng->start); - free(tmp); - tmp = tmp2; - } - } - /* pack this topology string */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(&bucket, &rng->t->sig, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - OBJ_RELEASE(rng); - OPAL_LIST_DESTRUCT(&topos); - OBJ_DESTRUCT(&bucket); - free(tmp); - return rc; - } - /* pack the topology itself */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(&bucket, &rng->t->topo, 1, OPAL_HWLOC_TOPO))) { - ORTE_ERROR_LOG(rc); - OBJ_RELEASE(rng); - OPAL_LIST_DESTRUCT(&topos); - OBJ_DESTRUCT(&bucket); - free(tmp); - return rc; - } - OBJ_RELEASE(rng); - } - 
OPAL_LIST_DESTRUCT(&topos); - - /* pack the string */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &tmp, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - OBJ_DESTRUCT(&bucket); - free(tmp); - return rc; - } - free(tmp); - - /* now pack the topologies */ - bptr = &bucket; - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &bptr, 1, OPAL_BUFFER))) { - ORTE_ERROR_LOG(rc); - OBJ_DESTRUCT(&bucket); - return rc; - } - OBJ_DESTRUCT(&bucket); - } else { - /* need to pack the NULL just to terminate the region */ - if (ORTE_SUCCESS != (rc = opal_dss.pack(buffer, &tmp, 1, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - return rc; - } - } - - return ORTE_SUCCESS; -} - -/* decode a nodemap for a daemon */ -int orte_util_decode_daemon_nodemap(opal_buffer_t *buffer) -{ - int n, nn, rc; - orte_node_t *node; - size_t k, endpt, start; - orte_job_t *daemons; - orte_proc_t *dptr; - char **nodes=NULL, *dvpids=NULL, *slots=NULL, *topos=NULL, *flags=NULL; - char *ndnames, *rmndr, **tmp; - opal_list_t dids, slts, flgs;; - opal_buffer_t *bptr=NULL; - orte_topology_t *t; - orte_regex_range_t *rng, *drng, *srng, *frng; - uint8_t ui8; - - /* unpack the node regex */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &ndnames, &n, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - return rc; - } - /* it is okay for this to be NULL */ - if (NULL == ndnames) { - return ORTE_SUCCESS; - } - - OBJ_CONSTRUCT(&dids, opal_list_t); - OBJ_CONSTRUCT(&slts, opal_list_t); - OBJ_CONSTRUCT(&flgs, opal_list_t); - - /* unpack the daemon vpid regex */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &dvpids, &n, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - /* this is not allowed to be NULL */ - if (NULL == dvpids) { - ORTE_ERROR_LOG(ORTE_ERR_BAD_PARAM); - rc = ORTE_ERR_BAD_PARAM; - goto cleanup; - } - - /* unpack the slots regex */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &slots, &n, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - /* this is not allowed to be 
NULL */ - if (NULL == slots) { - ORTE_ERROR_LOG(ORTE_ERR_BAD_PARAM); - rc = ORTE_ERR_BAD_PARAM; - goto cleanup; - } - - /* unpack the flags regex */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &flags, &n, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - /* this is not allowed to be NULL */ - if (NULL == flags) { - ORTE_ERROR_LOG(ORTE_ERR_BAD_PARAM); - rc = ORTE_ERR_BAD_PARAM; - goto cleanup; - } - - /* unpack the flag indicating if the HNP was allocated */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &ui8, &n, OPAL_UINT8))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - if (0 == ui8) { - orte_hnp_is_allocated = false; - } else { - orte_hnp_is_allocated = true; - } - - /* unpack the flag indicating we are in a managed allocation */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &ui8, &n, OPAL_UINT8))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - if (0 == ui8) { - orte_managed_allocation = false; - } else { - orte_managed_allocation = true; - } - - /* unpack the topos regex - this may not have been - * provided (e.g., for a homogeneous machine) */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &topos, &n, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - if (NULL != topos) { - /* need to unpack the topologies */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &bptr, &n, OPAL_BUFFER))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - } - - /* if we are the HNP, then we just discard these strings as we already - * have a complete picture - but we needed to unpack them in order to - * maintain sync in the unpacking order */ - if (ORTE_PROC_IS_HNP) { - rc = ORTE_SUCCESS; - goto cleanup; - } - - /* decompress the regex */ - nodes = NULL; - if (ORTE_SUCCESS != (rc = orte_regex_extract_node_names(ndnames, &nodes))) { - ORTE_ERROR_LOG(rc); - goto cleanup; - } - - if (NULL == nodes) { - /* should not happen */ - ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); - rc = ORTE_ERR_NOT_FOUND; - goto 
cleanup; - } - - /* decompress the vpids */ - tmp = opal_argv_split(dvpids, ','); - for (n=0; NULL != tmp[n]; n++) { - rng = OBJ_NEW(orte_regex_range_t); - opal_list_append(&dids, &rng->super); - /* convert the number - since it might be a range, - * save the remainder pointer */ - rng->start = strtoul(tmp[n], &rmndr, 10); - if (NULL == rmndr || 0 == strlen(rmndr)) { - rng->endpt = rng->start; - } else { - /* it must be a range - find the endpoint */ - ++rmndr; - rng->endpt = strtoul(rmndr, NULL, 10); - } - } - opal_argv_free(tmp); - - /* decompress the slots */ - tmp = opal_argv_split(slots, ','); - for (n=0; NULL != tmp[n]; n++) { - rng = OBJ_NEW(orte_regex_range_t); - opal_list_append(&slts, &rng->super); - /* find the '[' as that delimits the value */ - rmndr = strchr(tmp[n], '['); - if (NULL == rmndr) { - ORTE_ERROR_LOG(ORTE_ERR_BAD_PARAM); - rc = ORTE_ERR_BAD_PARAM; - opal_argv_free(tmp); - goto cleanup; - } - *rmndr = '\0'; - ++rmndr; - /* convert that number as this is the number of - * slots for this range */ - rng->slots = strtoul(rmndr, NULL, 10); - /* convert the starting pt - since it might be a range, - * save the remainder pointer */ - rng->start = strtoul(tmp[n], &rmndr, 10); - if (NULL == rmndr || 0 == strlen(rmndr)) { - rng->endpt = rng->start; - } else { - /* it must be a range - find the endpoint */ - ++rmndr; - rng->endpt = strtoul(rmndr, NULL, 10); - } - } - opal_argv_free(tmp); - - /* decompress the flags */ - tmp = opal_argv_split(flags, ','); - for (n=0; NULL != tmp[n]; n++) { - rng = OBJ_NEW(orte_regex_range_t); - opal_list_append(&dids, &rng->super); - /* find the '[' as that delimits the value */ - rmndr = strchr(tmp[n], '['); - if (NULL == rmndr) { - ORTE_ERROR_LOG(ORTE_ERR_BAD_PARAM); - opal_argv_free(tmp); - rc = ORTE_ERR_BAD_PARAM; - goto cleanup; - } - *rmndr = '\0'; - ++rmndr; - /* check the value - it is just one character */ - if ('1' == *rmndr) { - rng->slots = 1; - } else { - rng->slots = 0; - } - /* convert the starting pt - 
since it might be a range, - * save the remainder pointer */ - rng->start = strtoul(tmp[n], &rmndr, 10); - if (NULL == rmndr || 0 == strlen(rmndr)) { - rng->endpt = rng->start; - } else { - /* it must be a range - find the endpoint */ - ++rmndr; - rng->endpt = strtoul(rmndr, NULL, 10); - } - } - opal_argv_free(tmp); - free(flags); - - /* get the daemon job object */ - daemons = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid); - - /* update the node array */ - drng = (orte_regex_range_t*)opal_list_get_first(&dids); - srng = (orte_regex_range_t*)opal_list_get_first(&slts); - frng = (orte_regex_range_t*)opal_list_get_first(&flgs); - for (n=0; NULL != nodes[n]; n++) { - /* the daemon vpids for these nodes will be in the dids array, so - * use those to lookup the nodes */ - nn = drng->start + n; - if (nn == drng->endpt) { - drng = (orte_regex_range_t*)opal_list_get_next(&drng->super); - } - if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, nn))) { - node = OBJ_NEW(orte_node_t); - node->name = nodes[n]; - node->index = nn; - opal_pointer_array_set_item(orte_node_pool, nn, node); - } - /* set the number of slots */ - node->slots = srng->slots; - if (srng->endpt == nn) { - srng = (orte_regex_range_t*)opal_list_get_next(&srng->super); - } - /* set the flags */ - if (0 == frng->slots) { - ORTE_FLAG_UNSET(node, ORTE_NODE_FLAG_SLOTS_GIVEN); - } else { - ORTE_FLAG_SET(node, ORTE_NODE_FLAG_SLOTS_GIVEN); - } - if (frng->endpt == nn) { - frng = (orte_regex_range_t*)opal_list_get_next(&frng->super); - } - ++orte_process_info.num_nodes; - /* if this is me, just ignore the rest as we are all setup */ - if (nn == (int)ORTE_PROC_MY_NAME->vpid) { - continue; - } - if (NULL != node->daemon) { - OBJ_RELEASE(node->daemon); - node->daemon = NULL; - } - if (NULL == (dptr = (orte_proc_t*)opal_pointer_array_get_item(daemons->procs, nn))) { - /* create a daemon object for this node */ - dptr = OBJ_NEW(orte_proc_t); - dptr->name.jobid = ORTE_PROC_MY_NAME->jobid; - 
dptr->name.vpid = nn; - ORTE_FLAG_SET(dptr, ORTE_PROC_FLAG_ALIVE); // assume the daemon is alive until discovered otherwise - opal_pointer_array_set_item(daemons->procs, nn, dptr); - ++daemons->num_procs; - } else if (NULL != dptr->node) { - OBJ_RELEASE(dptr->node); - dptr->node = NULL; - } - /* link the node to the daemon */ - OBJ_RETAIN(dptr); - node->daemon = dptr; - /* link the node to the daemon */ - OBJ_RETAIN(node); - dptr->node = node; - } - /* we cannot use opal_argv_free here as this would release - * all the node names themselves. Instead, we just free the - * array of string pointers, leaving the strings alone */ - free(nodes); - - /* if no topology info was passed, then everyone shares our topology */ - if (NULL == bptr) { - orte_topology_t *t; - /* our topology is first in the array */ - t = (orte_topology_t*)opal_pointer_array_get_item(orte_node_topologies, 0); - for (n=0; n < orte_node_pool->size; n++) { - if (NULL != (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, n))) { - if (NULL == node->topology) { - OBJ_RETAIN(t); - node->topology = t; - } - } - } - } else { - char *sig; - hwloc_topology_t topo; - /* decompress the topology regex */ - tmp = opal_argv_split(topos, ','); - /* there must be a topology definition for each range */ - for (nn=0; NULL != tmp[nn]; nn++) { - /* unpack the signature */ - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(bptr, &sig, &n, OPAL_STRING))) { - ORTE_ERROR_LOG(rc); - opal_argv_free(tmp); - OBJ_RELEASE(bptr); - goto cleanup; - } - n = 1; - if (ORTE_SUCCESS != (rc = opal_dss.unpack(bptr, &topo, &n, OPAL_HWLOC_TOPO))) { - ORTE_ERROR_LOG(rc); - opal_argv_free(tmp); - OBJ_RELEASE(bptr); - free(sig); - goto cleanup; - } - /* see if we already have this topology - could be an update */ - for (n=0; n < orte_node_topologies->size; n++) { - if (NULL == (t = (orte_topology_t*)opal_pointer_array_get_item(orte_node_topologies, n))) { - continue; - } - if (0 == strcmp(t->sig, sig)) { - /* found a match */ 
- free(sig); - opal_hwloc_base_free_topology(topo); - sig = NULL; - break; - } - } - if (NULL != sig) { - /* new topology - record it */ - t = OBJ_NEW(orte_topology_t); - t->sig = sig; - t->topo = topo; - } - /* point each of the nodes in the regex to this topology */ - start = strtoul(tmp[nn], &rmndr, 10); - if (NULL != rmndr) { - /* it must be a range - find the endpoint */ - ++rmndr; - endpt = strtoul(rmndr, NULL, 10); - } else { - endpt = start; - } - for (k=start; k <= endpt; k++) { - if (NULL != (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, k))) { - if (NULL == node->topology) { - OBJ_RETAIN(t); - node->topology = t; - } - } - } - } - OBJ_RELEASE(bptr); - opal_argv_free(tmp); - } - - /* unpdate num procs */ - if (orte_process_info.num_procs != daemons->num_procs) { - orte_process_info.num_procs = daemons->num_procs; - /* need to update the routing plan */ - orte_routed.update_routing_plan(NULL); - } - - if (orte_process_info.max_procs < orte_process_info.num_procs) { - orte_process_info.max_procs = orte_process_info.num_procs; - } - - /* update num_daemons */ - orte_process_info.num_daemons = daemons->num_procs; - - if (0 < opal_output_get_verbosity(orte_debug_verbosity)) { - int i; - for (i=0; i < orte_node_pool->size; i++) { - if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) { - continue; - } - opal_output(0, "%s node[%d].name %s daemon %s", - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), i, - (NULL == node->name) ? "NULL" : node->name, - (NULL == node->daemon) ? 
"NONE" : ORTE_VPID_PRINT(node->daemon->name.vpid)); - } - } - - cleanup: - OPAL_LIST_DESTRUCT(&dids); - OPAL_LIST_DESTRUCT(&slts); - OPAL_LIST_DESTRUCT(&flgs); - return rc; -} diff --git a/orte/util/nidmap.h b/orte/util/nidmap.h deleted file mode 100644 index e91be60e001..00000000000 --- a/orte/util/nidmap.h +++ /dev/null @@ -1,60 +0,0 @@ -/* - * Copyright (c) 2004-2008 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -/** @file: - * - * Populates global structure with system-specific information. 
- * - * Notes: add limits.h, compute size of integer and other types via sizeof(type)*CHAR_BIT - * - */ - -#ifndef _ORTE_NIDMAP_H_ -#define _ORTE_NIDMAP_H_ - -#include "orte_config.h" -#include "orte/types.h" - -#include "opal/dss/dss_types.h" - -#include "orte/runtime/orte_globals.h" - -BEGIN_C_DECLS - -#define ORTE_MAX_NODE_PREFIX 50 -#define ORTE_CONTIG_NODE_CMD 0x01 -#define ORTE_NON_CONTIG_NODE_CMD 0x02 - -/* create a regular expression describing the nodes in the - * allocation */ -ORTE_DECLSPEC int orte_util_encode_nodemap(opal_buffer_t *buffer); - -/* decode a regular expression created by the encode function - * into the orte_node_pool array */ -ORTE_DECLSPEC int orte_util_decode_daemon_nodemap(opal_buffer_t *buffer); - -ORTE_DECLSPEC int orte_util_build_daemon_nidmap(char **nodes); - -ORTE_DECLSPEC int orte_util_encode_topologies(opal_buffer_t *buffer); - -ORTE_DECLSPEC int orte_util_decode_topologies(opal_buffer_t *buffer); - -END_C_DECLS - -#endif diff --git a/orte/util/pre_condition_transports.c b/orte/util/pre_condition_transports.c index 7ff55f78bbf..ec514ea4967 100644 --- a/orte/util/pre_condition_transports.c +++ b/orte/util/pre_condition_transports.c @@ -12,7 +12,7 @@ * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. - * Copyright (c) 2016 Intel, Inc. All rights reserved. + * Copyright (c) 2016-2017 Intel, Inc. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -130,7 +130,7 @@ char* orte_pre_condition_transports_print(uint64_t *unique_key) } -int orte_pre_condition_transports(orte_job_t *jdata) +int orte_pre_condition_transports(orte_job_t *jdata, char **key) { uint64_t unique_key[2]; int n; @@ -164,23 +164,28 @@ int orte_pre_condition_transports(orte_job_t *jdata) } /* record it in case this job executes a dynamic spawn */ - orte_set_attribute(&jdata->attributes, ORTE_JOB_TRANSPORT_KEY, ORTE_ATTR_LOCAL, string_key, OPAL_STRING); + if (NULL != jdata) { + orte_set_attribute(&jdata->attributes, ORTE_JOB_TRANSPORT_KEY, ORTE_ATTR_LOCAL, string_key, OPAL_STRING); - if (OPAL_SUCCESS != mca_base_var_env_name ("orte_precondition_transports", &cs_env)) { - ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE); - free(string_key); - return ORTE_ERR_OUT_OF_RESOURCE; - } + if (OPAL_SUCCESS != mca_base_var_env_name ("orte_precondition_transports", &cs_env)) { + ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE); + free(string_key); + return ORTE_ERR_OUT_OF_RESOURCE; + } - for (n=0; n < jdata->apps->size; n++) { - if (NULL == (app = (orte_app_context_t*)opal_pointer_array_get_item(jdata->apps, n))) { - continue; + for (n=0; n < jdata->apps->size; n++) { + if (NULL == (app = (orte_app_context_t*)opal_pointer_array_get_item(jdata->apps, n))) { + continue; + } + opal_setenv(cs_env, string_key, true, &app->env); } - opal_setenv(cs_env, string_key, true, &app->env); + free(cs_env); + free(string_key); + } else if (NULL != key) { + *key = string_key; + } else { + free(string_key); } - free(cs_env); - free(string_key); - return ORTE_SUCCESS; } diff --git a/orte/util/pre_condition_transports.h b/orte/util/pre_condition_transports.h index 1e1ed17a3a7..dadca24a780 100644 --- a/orte/util/pre_condition_transports.h +++ b/orte/util/pre_condition_transports.h @@ -9,6 +9,7 @@ * University of Stuttgart. All rights reserved. * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. 
+ * Copyright (c) 2017 Intel, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -32,7 +33,7 @@ BEGIN_C_DECLS -ORTE_DECLSPEC int orte_pre_condition_transports(orte_job_t *jdata); +ORTE_DECLSPEC int orte_pre_condition_transports(orte_job_t *jdata, char **key); ORTE_DECLSPEC char* orte_pre_condition_transports_print(uint64_t *unique_key); diff --git a/orte/util/proc_info.c b/orte/util/proc_info.c index 277afa2bc49..4e0db3db890 100644 --- a/orte/util/proc_info.c +++ b/orte/util/proc_info.c @@ -12,7 +12,7 @@ * Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2012 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2014-2016 Intel, Inc. All rights reserved + * Copyright (c) 2014-2017 Intel, Inc. All rights reserved. * Copyright (c) 2016 IBM Corporation. All rights reserved. * $COPYRIGHT$ * @@ -69,7 +69,6 @@ ORTE_DECLSPEC orte_proc_info_t orte_process_info = { .aliases = NULL, .pid = 0, .proc_type = ORTE_PROC_TYPE_NONE, - .sync_buf = NULL, .my_port = 0, .num_restarts = 0, .my_node_rank = ORTE_NODE_RANK_INVALID, @@ -265,9 +264,6 @@ int orte_proc_info(void) &orte_ess_node_rank); orte_process_info.my_node_rank = (orte_node_rank_t) orte_ess_node_rank; - /* setup the sync buffer */ - orte_process_info.sync_buf = OBJ_NEW(opal_buffer_t); - return ORTE_SUCCESS; } @@ -330,11 +326,6 @@ int orte_proc_info_finalize(void) orte_process_info.proc_type = ORTE_PROC_TYPE_NONE; - OBJ_RELEASE(orte_process_info.sync_buf); - orte_process_info.sync_buf = NULL; - - OBJ_DESTRUCT(&orte_process_info.super); - opal_argv_free(orte_process_info.aliases); init = false; diff --git a/orte/util/proc_info.h b/orte/util/proc_info.h index 810f31cf84d..75d11c2d92c 100644 --- a/orte/util/proc_info.h +++ b/orte/util/proc_info.h @@ -11,7 +11,7 @@ * All rights reserved. * Copyright (c) 2011-2012 Los Alamos National Security, LLC. * All rights reserved. - * Copyright (c) 2013-2016 Intel, Inc. 
All rights reserved + * Copyright (c) 2013-2017 Intel, Inc. All rights reserved. * Copyright (c) 2017 Cisco Systems, Inc. All rights reserved * $COPYRIGHT$ * @@ -99,7 +99,6 @@ struct orte_proc_info_t { char **aliases; /**< aliases for this node */ pid_t pid; /**< Local process ID for this process */ orte_proc_type_t proc_type; /**< Type of process */ - opal_buffer_t *sync_buf; /**< buffer to store sync response */ uint16_t my_port; /**< TCP port for out-of-band comm */ int num_restarts; /**< number of times this proc has restarted */ orte_node_rank_t my_node_rank; /**< node rank */ diff --git a/orte/util/regex.c b/orte/util/regex.c deleted file mode 100644 index a723c877dbd..00000000000 --- a/orte/util/regex.c +++ /dev/null @@ -1,628 +0,0 @@ -/* - * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2011 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2017 Intel, Inc. All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -#include "orte_config.h" -#include "orte/types.h" -#include "orte/constants.h" - -#include -#include -#include -#include -#ifdef HAVE_UNISTD_H -#include -#endif -#ifdef HAVE_SYS_SOCKET_H -#include -#endif -#ifdef HAVE_NETINET_IN_H -#include -#endif -#ifdef HAVE_ARPA_INET_H -#include -#endif -#ifdef HAVE_NETDB_H -#include -#endif -#ifdef HAVE_IFADDRS_H -#include -#endif - -#include "opal/util/argv.h" - -#include "orte/mca/errmgr/errmgr.h" -#include "orte/mca/odls/odls_types.h" -#include "orte/mca/rml/base/rml_contact.h" -#include "orte/mca/rmaps/rmaps_types.h" -#include "orte/util/show_help.h" -#include "orte/util/name_fns.h" -#include "orte/util/nidmap.h" -#include "orte/runtime/orte_globals.h" -#include "orte/mca/ess/ess.h" - -#include "orte/util/regex.h" - -#define ORTE_MAX_NODE_PREFIX 50 - -static int regex_parse_node_ranges(char *base, char *ranges, int num_digits, char *suffix, char ***names); -static int regex_parse_node_range(char *base, char *range, int num_digits, char *suffix, char ***names); - -int orte_regex_create(char *nodelist, char **regexp) -{ - char *node; - char prefix[ORTE_MAX_NODE_PREFIX]; - int i, j, len, startnum, nodenum, numdigits; - bool found, fullname; - char *suffix, *sfx; - orte_regex_node_t *ndreg; - orte_regex_range_t *range; - opal_list_t nodeids; - opal_list_item_t *item, *itm2; - char **regexargs = NULL, *tmp, *tmp2; - char *cptr; - - /* define the default */ - *regexp = NULL; - - cptr = strchr(nodelist, ','); - if (NULL == cptr) { - /* if there is only one node, don't bother */ - *regexp = strdup(nodelist); - return ORTE_SUCCESS; - } - - /* setup the list of results */ - OBJ_CONSTRUCT(&nodeids, opal_list_t); - - /* cycle thru the array of nodenames */ - node = nodelist; - while (NULL != (cptr = strchr(node, ',')) || 0 < strlen(node)) { - if (NULL != cptr) { - *cptr = '\0'; - } - /* determine this node's prefix by looking for first non-alpha char */ - 
fullname = false; - len = strlen(node); - startnum = -1; - memset(prefix, 0, ORTE_MAX_NODE_PREFIX); - numdigits = 0; - for (i=0, j=0; i < len; i++) { - if (!isalpha(node[i])) { - /* found a non-alpha char */ - if (!isdigit(node[i])) { - /* if it is anything but a digit, we just use - * the entire name - */ - fullname = true; - break; - } - /* count the size of the numeric field - but don't - * add the digits to the prefix - */ - numdigits++; - if (startnum < 0) { - /* okay, this defines end of the prefix */ - startnum = i; - } - continue; - } - if (startnum < 0) { - prefix[j++] = node[i]; - } - } - if (fullname || startnum < 0) { - /* can't compress this name - just add it to the list */ - ndreg = OBJ_NEW(orte_regex_node_t); - ndreg->prefix = strdup(node); - opal_list_append(&nodeids, &ndreg->super); - /* move to the next posn */ - if (NULL == cptr) { - break; - } - node = cptr + 1; - continue; - } - /* convert the digits and get any suffix */ - nodenum = strtol(&node[startnum], &sfx, 10); - if (NULL != sfx) { - suffix = strdup(sfx); - } else { - suffix = NULL; - } - /* is this nodeid already on our list? 
*/ - found = false; - for (item = opal_list_get_first(&nodeids); - !found && item != opal_list_get_end(&nodeids); - item = opal_list_get_next(item)) { - ndreg = (orte_regex_node_t*)item; - if (0 < strlen(prefix) && NULL == ndreg->prefix) { - continue; - } - if (0 == strlen(prefix) && NULL != ndreg->prefix) { - continue; - } - if (0 < strlen(prefix) && NULL != ndreg->prefix - && 0 != strcmp(prefix, ndreg->prefix)) { - continue; - } - if (NULL == suffix && NULL != ndreg->suffix) { - continue; - } - if (NULL != suffix && NULL == ndreg->suffix) { - continue; - } - if (NULL != suffix && NULL != ndreg->suffix && - 0 != strcmp(suffix, ndreg->suffix)) { - continue; - } - if (numdigits != ndreg->num_digits) { - continue; - } - /* found a match - flag it */ - found = true; - /* get the last range on this nodeid - we do this - * to preserve order - */ - range = (orte_regex_range_t*)opal_list_get_last(&ndreg->ranges); - if (NULL == range) { - /* first range for this nodeid */ - range = OBJ_NEW(orte_regex_range_t); - range->start = nodenum; - range->cnt = 1; - opal_list_append(&ndreg->ranges, &range->super); - break; - } - /* see if the node number is out of sequence */ - if (nodenum != (range->start + range->cnt)) { - /* start a new range */ - range = OBJ_NEW(orte_regex_range_t); - range->start = nodenum; - range->cnt = 1; - opal_list_append(&ndreg->ranges, &range->super); - break; - } - /* everything matches - just increment the cnt */ - range->cnt++; - break; - } - if (!found) { - /* need to add it */ - ndreg = OBJ_NEW(orte_regex_node_t); - if (0 < strlen(prefix)) { - ndreg->prefix = strdup(prefix); - } - if (NULL != suffix) { - ndreg->suffix = strdup(suffix); - } - ndreg->num_digits = numdigits; - opal_list_append(&nodeids, &ndreg->super); - /* record the first range for this nodeid - we took - * care of names we can't compress above - */ - range = OBJ_NEW(orte_regex_range_t); - range->start = nodenum; - range->cnt = 1; - opal_list_append(&ndreg->ranges, &range->super); - } 
- if (NULL != suffix) { - free(suffix); - } - /* move to the next posn */ - if (NULL == cptr) { - break; - } - node = cptr + 1; - } - - /* begin constructing the regular expression */ - while (NULL != (item = opal_list_remove_first(&nodeids))) { - ndreg = (orte_regex_node_t*)item; - - /* if no ranges, then just add the name */ - if (0 == opal_list_get_size(&ndreg->ranges)) { - if (NULL != ndreg->prefix) { - /* solitary node */ - asprintf(&tmp, "%s", ndreg->prefix); - opal_argv_append_nosize(®exargs, tmp); - free(tmp); - } - OBJ_RELEASE(ndreg); - continue; - } - /* start the regex for this nodeid with the prefix */ - if (NULL != ndreg->prefix) { - asprintf(&tmp, "%s[%d:", ndreg->prefix, ndreg->num_digits); - } else { - asprintf(&tmp, "[%d:", ndreg->num_digits); - } - /* add the ranges */ - while (NULL != (itm2 = opal_list_remove_first(&ndreg->ranges))) { - range = (orte_regex_range_t*)itm2; - if (1 == range->cnt) { - asprintf(&tmp2, "%s%d,", tmp, range->start); - } else { - asprintf(&tmp2, "%s%d-%d,", tmp, range->start, range->start + range->cnt - 1); - } - free(tmp); - tmp = tmp2; - OBJ_RELEASE(range); - } - /* replace the final comma */ - tmp[strlen(tmp)-1] = ']'; - if (NULL != ndreg->suffix) { - /* add in the suffix, if provided */ - asprintf(&tmp2, "%s%s", tmp, ndreg->suffix); - free(tmp); - tmp = tmp2; - } - opal_argv_append_nosize(®exargs, tmp); - free(tmp); - OBJ_RELEASE(ndreg); - } - - /* assemble final result */ - *regexp = opal_argv_join(regexargs, ','); - /* cleanup */ - opal_argv_free(regexargs); - - OBJ_DESTRUCT(&nodeids); - - - return ORTE_SUCCESS; -} - -int orte_regex_extract_node_names(char *regexp, char ***names) -{ - int i, j, k, len, ret; - char *base; - char *orig, *suffix; - bool found_range = false; - bool more_to_come = false; - int num_digits; - - if (NULL == regexp) { - *names = NULL; - return ORTE_SUCCESS; - } - - orig = base = strdup(regexp); - if (NULL == base) { - ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE); - return 
ORTE_ERR_OUT_OF_RESOURCE; - } - - OPAL_OUTPUT_VERBOSE((1, orte_debug_output, - "%s regex:extract:nodenames: checking nodelist: %s", - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - regexp)); - - do { - /* Find the base */ - len = strlen(base); - for (i = 0; i <= len; ++i) { - if (base[i] == '[') { - /* we found a range. this gets dealt with below */ - base[i] = '\0'; - found_range = true; - break; - } - if (base[i] == ',') { - /* we found a singleton node, and there are more to come */ - base[i] = '\0'; - found_range = false; - more_to_come = true; - break; - } - if (base[i] == '\0') { - /* we found a singleton node */ - found_range = false; - more_to_come = false; - break; - } - } - if (i == 0 && !found_range) { - /* we found a special character at the beginning of the string */ - orte_show_help("help-regex.txt", "regex:special-char", true, regexp); - free(orig); - return ORTE_ERR_BAD_PARAM; - } - - if (found_range) { - /* If we found a range, get the number of digits in the numbers */ - i++; /* step over the [ */ - for (j=i; j < len; j++) { - if (base[j] == ':') { - base[j] = '\0'; - break; - } - } - if (j >= len) { - /* we didn't find the number of digits */ - orte_show_help("help-regex.txt", "regex:num-digits-missing", true, regexp); - free(orig); - return ORTE_ERR_BAD_PARAM; - } - num_digits = strtol(&base[i], NULL, 10); - i = j + 1; /* step over the : */ - /* now find the end of the range */ - for (j = i; j < len; ++j) { - if (base[j] == ']') { - base[j] = '\0'; - break; - } - } - if (j >= len) { - /* we didn't find the end of the range */ - orte_show_help("help-regex.txt", "regex:end-range-missing", true, regexp); - free(orig); - return ORTE_ERR_BAD_PARAM; - } - /* check for a suffix */ - if (j+1 < len && base[j+1] != ',') { - /* find the next comma, if present */ - for (k=j+1; k < len && base[k] != ','; k++); - if (k < len) { - base[k] = '\0'; - } - suffix = strdup(&base[j+1]); - if (k < len) { - base[k] = ','; - } - j = k-1; - } else { - suffix = NULL; - } - 
OPAL_OUTPUT_VERBOSE((1, orte_debug_output, - "%s regex:extract:nodenames: parsing range %s %s %s", - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), - base, base + i, suffix)); - - ret = regex_parse_node_ranges(base, base + i, num_digits, suffix, names); - if (NULL != suffix) { - free(suffix); - } - if (ORTE_SUCCESS != ret) { - orte_show_help("help-regex.txt", "regex:bad-value", true, regexp); - free(orig); - return ret; - } - if (j+1 < len && base[j + 1] == ',') { - more_to_come = true; - base = &base[j + 2]; - } else { - more_to_come = false; - } - } else { - /* If we didn't find a range, just add the node */ - if(ORTE_SUCCESS != (ret = opal_argv_append_nosize(names, base))) { - ORTE_ERROR_LOG(ret); - free(orig); - return ret; - } - /* step over the comma */ - i++; - /* set base equal to the (possible) next base to look at */ - base = &base[i]; - } - } while(more_to_come); - - free(orig); - - /* All done */ - return ret; -} - -/* - * Parse one or more ranges in a set - * - * @param base The base text of the node name - * @param *ranges A pointer to a range. This can contain multiple ranges - * (i.e. 
"1-3,10" or "5" or "9,0100-0130,250") - * @param ***names An argv array to add the newly discovered nodes to - */ -static int regex_parse_node_ranges(char *base, char *ranges, int num_digits, char *suffix, char ***names) -{ - int i, len, ret; - char *start, *orig; - - /* Look for commas, the separator between ranges */ - - len = strlen(ranges); - for (orig = start = ranges, i = 0; i < len; ++i) { - if (',' == ranges[i]) { - ranges[i] = '\0'; - ret = regex_parse_node_range(base, start, num_digits, suffix, names); - if (ORTE_SUCCESS != ret) { - ORTE_ERROR_LOG(ret); - return ret; - } - start = ranges + i + 1; - } - } - - /* Pick up the last range, if it exists */ - - if (start < orig + len) { - - OPAL_OUTPUT_VERBOSE((1, orte_debug_output, - "%s regex:parse:ranges: parse range %s (2)", - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), start)); - - ret = regex_parse_node_range(base, start, num_digits, suffix, names); - if (ORTE_SUCCESS != ret) { - ORTE_ERROR_LOG(ret); - return ret; - } - } - - /* All done */ - return ORTE_SUCCESS; -} - - -/* - * Parse a single range in a set and add the full names of the nodes - * found to the names argv - * - * @param base The base text of the node name - * @param *ranges A pointer to a single range. (i.e. 
"1-3" or "5") - * @param ***names An argv array to add the newly discovered nodes to - */ -static int regex_parse_node_range(char *base, char *range, int num_digits, char *suffix, char ***names) -{ - char *str, tmp[132]; - size_t i, k, start, end; - size_t base_len, len; - bool found; - int ret; - - if (NULL == base || NULL == range) { - return ORTE_ERROR; - } - - len = strlen(range); - base_len = strlen(base); - /* Silence compiler warnings; start and end are always assigned - properly, below */ - start = end = 0; - - /* Look for the beginning of the first number */ - - for (found = false, i = 0; i < len; ++i) { - if (isdigit((int) range[i])) { - if (!found) { - start = atoi(range + i); - found = true; - break; - } - } - } - if (!found) { - ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); - return ORTE_ERR_NOT_FOUND; - } - - /* Look for the end of the first number */ - - for (found = false; i < len; ++i) { - if (!isdigit(range[i])) { - break; - } - } - - /* Was there no range, just a single number? */ - - if (i >= len) { - end = start; - found = true; - } else { - /* Nope, there was a range. 
Look for the beginning of the second - * number - */ - for (; i < len; ++i) { - if (isdigit(range[i])) { - end = strtol(range + i, NULL, 10); - found = true; - break; - } - } - } - if (!found) { - ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); - return ORTE_ERR_NOT_FOUND; - } - - /* Make strings for all values in the range */ - - len = base_len + num_digits + 32; - if (NULL != suffix) { - len += strlen(suffix); - } - str = (char *) malloc(len); - if (NULL == str) { - ORTE_ERROR_LOG(ORTE_ERR_OUT_OF_RESOURCE); - return ORTE_ERR_OUT_OF_RESOURCE; - } - for (i = start; i <= end; ++i) { - memset(str, 0, len); - strcpy(str, base); - /* we need to zero-pad the digits */ - for (k=0; k < (size_t)num_digits; k++) { - str[k+base_len] = '0'; - } - memset(tmp, 0, 132); - snprintf(tmp, 132, "%lu", (unsigned long)i); - for (k=0; k < strlen(tmp); k++) { - str[base_len + num_digits - k - 1] = tmp[strlen(tmp)-k-1]; - } - /* if there is a suffix, add it */ - if (NULL != suffix) { - strcat(str, suffix); - } - ret = opal_argv_append_nosize(names, str); - if(ORTE_SUCCESS != ret) { - ORTE_ERROR_LOG(ret); - free(str); - return ret; - } - } - free(str); - - /* All done */ - return ORTE_SUCCESS; -} - -/***** CLASS INSTANTIATIONS ****/ - -static void range_construct(orte_regex_range_t *ptr) -{ - ptr->start = 0; - ptr->cnt = 0; -} -OBJ_CLASS_INSTANCE(orte_regex_range_t, - opal_list_item_t, - range_construct, NULL); - -static void orte_regex_node_construct(orte_regex_node_t *ptr) -{ - ptr->prefix = NULL; - ptr->suffix = NULL; - ptr->num_digits = 0; - OBJ_CONSTRUCT(&ptr->ranges, opal_list_t); -} -static void orte_regex_node_destruct(orte_regex_node_t *ptr) -{ - opal_list_item_t *item; - - if (NULL != ptr->prefix) { - free(ptr->prefix); - } - if (NULL != ptr->suffix) { - free(ptr->suffix); - } - - while (NULL != (item = opal_list_remove_first(&ptr->ranges))) { - OBJ_RELEASE(item); - } - OBJ_DESTRUCT(&ptr->ranges); -} -OBJ_CLASS_INSTANCE(orte_regex_node_t, - opal_list_item_t, - orte_regex_node_construct, - 
orte_regex_node_destruct); diff --git a/orte/util/regex.h b/orte/util/regex.h deleted file mode 100644 index 1e8ab8bc859..00000000000 --- a/orte/util/regex.h +++ /dev/null @@ -1,65 +0,0 @@ -/* - * Copyright (c) 2004-2008 The Trustees of Indiana University and Indiana - * University Research and Technology - * Corporation. All rights reserved. - * Copyright (c) 2004-2006 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, - * University of Stuttgart. All rights reserved. - * Copyright (c) 2004-2005 The Regents of the University of California. - * All rights reserved. - * Copyright (c) 2017 Intel, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -/** @file: - * - */ - -#ifndef _ORTE_REGEX_H_ -#define _ORTE_REGEX_H_ - -#include "orte_config.h" - -#include "opal/class/opal_value_array.h" -#include "opal/class/opal_list.h" - -#include "orte/mca/odls/odls_types.h" -#include "orte/runtime/orte_globals.h" - -BEGIN_C_DECLS - -typedef struct { - opal_list_item_t super; - int start; - int endpt; - int cnt; - int slots; - orte_topology_t *t; -} orte_regex_range_t; -ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_regex_range_t); - -typedef struct { - /* list object */ - opal_list_item_t super; - char *prefix; - char *suffix; - int num_digits; - opal_list_t ranges; -} orte_regex_node_t; -ORTE_DECLSPEC OBJ_CLASS_DECLARATION(orte_regex_node_t); - -/* NOTE: this is a destructive call for the nodes param - the - * function will search and replace all commas with '\0' - */ -ORTE_DECLSPEC int orte_regex_create(char *nodes, char **regexp); - -ORTE_DECLSPEC int orte_regex_extract_node_names(char *regexp, char ***names); - -END_C_DECLS -#endif diff --git a/orte/util/session_dir.c b/orte/util/session_dir.c index bdd73f48be6..90f464fefbb 100644 --- a/orte/util/session_dir.c +++ b/orte/util/session_dir.c @@ 
-370,14 +370,12 @@ int orte_session_dir(bool create, orte_process_name_t *proc) int orte_session_dir_cleanup(orte_jobid_t jobid) { - int rc = ORTE_SUCCESS; - if (!orte_create_session_dirs || orte_process_info.rm_session_dirs ) { /* we haven't created them or RM will clean them up for us*/ return ORTE_SUCCESS; } - if (NULL == orte_process_info.job_session_dir || + if (NULL == orte_process_info.jobfam_session_dir || NULL == orte_process_info.proc_session_dir) { /* this should never happen - it means we are calling * cleanup *before* properly setting up the session @@ -385,30 +383,20 @@ orte_session_dir_cleanup(orte_jobid_t jobid) * accidentally removing directories we shouldn't * touch */ - rc = ORTE_ERR_NOT_INITIALIZED; - goto CLEANUP; + return ORTE_ERR_NOT_INITIALIZED; } /* recursively blow the whole session away for our job family, * saving only output files */ - opal_os_dirpath_destroy(orte_process_info.job_session_dir, + opal_os_dirpath_destroy(orte_process_info.jobfam_session_dir, true, orte_dir_check_file); - /* now attempt to eliminate the top level directory itself - this - * will fail if anything is present, but ensures we cleanup if - * we are the last one out - */ - if( NULL != orte_process_info.top_session_dir ){ - opal_os_dirpath_destroy(orte_process_info.top_session_dir, - false, orte_dir_check_file); - } - - if (opal_os_dirpath_is_empty(orte_process_info.job_session_dir)) { + if (opal_os_dirpath_is_empty(orte_process_info.jobfam_session_dir)) { if (orte_debug_flag) { - opal_output(0, "sess_dir_cleanup: found job session dir empty - deleting"); + opal_output(0, "sess_dir_cleanup: found jobfam session dir empty - deleting"); } - rmdir(orte_process_info.job_session_dir); + rmdir(orte_process_info.jobfam_session_dir); } else { if (orte_debug_flag) { if (OPAL_ERR_NOT_FOUND == @@ -418,12 +406,10 @@ orte_session_dir_cleanup(orte_jobid_t jobid) opal_output(0, "sess_dir_cleanup: job session dir not empty - leaving"); } } - goto CLEANUP; } - if ( NULL != 
orte_process_info.top_session_dir ){ - - if( opal_os_dirpath_is_empty(orte_process_info.top_session_dir) ) { + if (NULL != orte_process_info.top_session_dir) { + if (opal_os_dirpath_is_empty(orte_process_info.top_session_dir)) { if (orte_debug_flag) { opal_output(0, "sess_dir_cleanup: found top session dir empty - deleting"); } @@ -440,9 +426,17 @@ orte_session_dir_cleanup(orte_jobid_t jobid) } } -CLEANUP: + /* now attempt to eliminate the top level directory itself - this + * will fail if anything is present, but ensures we cleanup if + * we are the last one out + */ + if( NULL != orte_process_info.top_session_dir ){ + opal_os_dirpath_destroy(orte_process_info.top_session_dir, + false, orte_dir_check_file); + } + - return rc; + return ORTE_SUCCESS; } diff --git a/orte/util/show_help.c b/orte/util/show_help.c index fe3ed50a33f..1b68c94580c 100644 --- a/orte/util/show_help.c +++ b/orte/util/show_help.c @@ -13,6 +13,7 @@ * Copyright (c) 2012-2013 Los Alamos National Security, LLC. * All rights reserved. * Copyright (c) 2016-2017 Intel, Inc. All rights reserved. + * Copyright (c) 2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -83,7 +84,7 @@ typedef struct { static void tuple_list_item_constructor(tuple_list_item_t *obj); static void tuple_list_item_destructor(tuple_list_item_t *obj); -OBJ_CLASS_INSTANCE(tuple_list_item_t, opal_list_item_t, +static OBJ_CLASS_INSTANCE(tuple_list_item_t, opal_list_item_t, tuple_list_item_constructor, tuple_list_item_destructor); diff --git a/orte/util/threads.h b/orte/util/threads.h new file mode 100644 index 00000000000..5bd1be82b5b --- /dev/null +++ b/orte/util/threads.h @@ -0,0 +1,159 @@ +/* + * Copyright (c) 2017 Intel, Inc. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#ifndef ORTE_THREADS_H +#define ORTE_THREADS_H + +#include "orte_config.h" + +#include "opal/sys/atomic.h" +#include "opal/threads/threads.h" + +/* provide macros for forward-proofing the shifting + * of objects between threads - at some point, we + * may revamp our threading model */ + +/* post an object to another thread - for now, we + * only have a memory barrier */ +#define ORTE_POST_OBJECT(o) opal_atomic_wmb() + +/* acquire an object from another thread - for now, + * we only have a memory barrier */ +#define ORTE_ACQUIRE_OBJECT(o) opal_atomic_rmb() + +#define orte_condition_wait(a,b) pthread_cond_wait(a, &(b)->m_lock_pthread) +typedef pthread_cond_t orte_condition_t; +#define orte_condition_broadcast(a) pthread_cond_broadcast(a) +#define orte_condition_signal(a) pthread_cond_signal(a) +#define ORTE_CONDITION_STATIC_INIT PTHREAD_COND_INITIALIZER + +/* define a threadshift macro */ +#define ORTE_THREADSHIFT(x, eb, f, p) \ + do { \ + opal_event_set((eb), &((x)->ev), -1, OPAL_EV_WRITE, (f), (x)); \ + opal_event_set_priority(&((x)->ev), (p)); \ + ORTE_POST_OBJECT((x)); \ + opal_event_active(&((x)->ev), OPAL_EV_WRITE, 1); \ + } while(0) + +typedef struct { + opal_mutex_t mutex; + orte_condition_t cond; + volatile bool active; +} orte_lock_t; + +#define ORTE_CONSTRUCT_LOCK(l) \ + do { \ + OBJ_CONSTRUCT(&(l)->mutex, opal_mutex_t); \ + pthread_cond_init(&(l)->cond, NULL); \ + (l)->active = true; \ + } while(0) + +#define ORTE_DESTRUCT_LOCK(l) \ + do { \ + OBJ_DESTRUCT(&(l)->mutex); \ + pthread_cond_destroy(&(l)->cond); \ + } while(0) + + +#if OPAL_ENABLE_DEBUG +#define ORTE_ACQUIRE_THREAD(lck) \ + do { \ + opal_mutex_lock(&(lck)->mutex); \ + if (opal_debug_threads) { \ + opal_output(0, "Waiting for thread %s:%d", \ + __FILE__, __LINE__); \ + } \ + while ((lck)->active) { \ + orte_condition_wait(&(lck)->cond, &(lck)->mutex); \ + } \ + if (opal_debug_threads) { \ + opal_output(0, "Thread 
obtained %s:%d", \ + __FILE__, __LINE__); \ + } \ + (lck)->active = true; \ + OPAL_ACQUIRE_OBJECT(lck); \ + } while(0) +#else +#define ORTE_ACQUIRE_THREAD(lck) \ + do { \ + opal_mutex_lock(&(lck)->mutex); \ + while ((lck)->active) { \ + orte_condition_wait(&(lck)->cond, &(lck)->mutex); \ + } \ + (lck)->active = true; \ + OPAL_ACQUIRE_OBJECT(lck); \ + } while(0) +#endif + + +#if OPAL_ENABLE_DEBUG +#define ORTE_WAIT_THREAD(lck) \ + do { \ + opal_mutex_lock(&(lck)->mutex); \ + if (opal_debug_threads) { \ + opal_output(0, "Waiting for thread %s:%d", \ + __FILE__, __LINE__); \ + } \ + while ((lck)->active) { \ + orte_condition_wait(&(lck)->cond, &(lck)->mutex); \ + } \ + if (opal_debug_threads) { \ + opal_output(0, "Thread obtained %s:%d", \ + __FILE__, __LINE__); \ + } \ + OPAL_ACQUIRE_OBJECT(&lck); \ + opal_mutex_unlock(&(lck)->mutex); \ + } while(0) +#else +#define ORTE_WAIT_THREAD(lck) \ + do { \ + opal_mutex_lock(&(lck)->mutex); \ + while ((lck)->active) { \ + orte_condition_wait(&(lck)->cond, &(lck)->mutex); \ + } \ + OPAL_ACQUIRE_OBJECT(lck); \ + opal_mutex_unlock(&(lck)->mutex); \ + } while(0) +#endif + + +#if OPAL_ENABLE_DEBUG +#define ORTE_RELEASE_THREAD(lck) \ + do { \ + if (opal_debug_threads) { \ + opal_output(0, "Releasing thread %s:%d", \ + __FILE__, __LINE__); \ + } \ + (lck)->active = false; \ + OPAL_POST_OBJECT(lck); \ + orte_condition_broadcast(&(lck)->cond); \ + opal_mutex_unlock(&(lck)->mutex); \ + } while(0) +#else +#define ORTE_RELEASE_THREAD(lck) \ + do { \ + (lck)->active = false; \ + OPAL_POST_OBJECT(lck); \ + orte_condition_broadcast(&(lck)->cond); \ + opal_mutex_unlock(&(lck)->mutex); \ + } while(0) +#endif + + +#define ORTE_WAKEUP_THREAD(lck) \ + do { \ + opal_mutex_lock(&(lck)->mutex); \ + (lck)->active = false; \ + OPAL_POST_OBJECT(lck); \ + orte_condition_broadcast(&(lck)->cond); \ + opal_mutex_unlock(&(lck)->mutex); \ + } while(0) + +#endif /* ORTE_THREADS_H */ diff --git a/oshmem/include/pshmem.h b/oshmem/include/pshmem.h index 
a48231fefa1..8ab2cda8183 100644 --- a/oshmem/include/pshmem.h +++ b/oshmem/include/pshmem.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2014-2016 Mellanox Technologies, Inc. + * Copyright (c) 2014-2017 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2014 Intel, Inc. All rights reserved * Copyright (c) 2016 Research Organization for Information Science @@ -464,7 +464,6 @@ OSHMEM_DECLSPEC void pshmem_quiet(void); */ OSHMEM_DECLSPEC void pshmem_broadcast32(void *target, const void *source, size_t nlong, int PE_root, int PE_start, int logPE_stride, int PE_size, long *pSync); OSHMEM_DECLSPEC void pshmem_broadcast64(void *target, const void *source, size_t nlong, int PE_root, int PE_start, int logPE_stride, int PE_size, long *pSync); -OSHMEM_DECLSPEC void pshmem_broadcast(void *target, const void *source, size_t nlong, int PE_root, int PE_start, int logPE_stride, int PE_size, long *pSync); OSHMEM_DECLSPEC void pshmem_collect32(void *target, const void *source, size_t nlong, int PE_start, int logPE_stride, int PE_size, long *pSync); OSHMEM_DECLSPEC void pshmem_collect64(void *target, const void *source, size_t nlong, int PE_start, int logPE_stride, int PE_size, long *pSync); OSHMEM_DECLSPEC void pshmem_fcollect32(void *target, const void *source, size_t nlong, int PE_start, int logPE_stride, int PE_size, long *pSync); diff --git a/oshmem/include/shmem-compat.h b/oshmem/include/shmem-compat.h index e24781224a4..19a873f7903 100644 --- a/oshmem/include/shmem-compat.h +++ b/oshmem/include/shmem-compat.h @@ -1,6 +1,6 @@ /* oshmem/include/shmem-compat.h. This file contains OpenSHMEM lagacy API */ /* - * Copyright (c) 2014-2015 Mellanox Technologies, Inc. + * Copyright (c) 2014-2017 Mellanox Technologies, Inc. * All rights reserved. 
* $COPYRIGHT$ * @@ -33,11 +33,6 @@ OSHMEM_DECLSPEC void* shmemalign(size_t align, size_t size); OSHMEM_DECLSPEC void* shrealloc(void *ptr, size_t size); OSHMEM_DECLSPEC void shfree(void* ptr); -OSHMEM_DECLSPEC void shmem_char_put(char *target, const char *source, size_t len, int pe); -OSHMEM_DECLSPEC void shmem_char_get(char *target, const char *source, size_t len, int pe); - -OSHMEM_DECLSPEC void shmem_put(void *target, const void *source, size_t len, int pe); -OSHMEM_DECLSPEC void shmem_get(void *target, const void *source, size_t len, int pe); OSHMEM_DECLSPEC void globalexit(int status); #if defined(c_plusplus) || defined(__cplusplus) diff --git a/oshmem/include/shmem.fh b/oshmem/include/shmem.fh index d98faba9e79..bb7fc281d6c 100644 --- a/oshmem/include/shmem.fh +++ b/oshmem/include/shmem.fh @@ -19,11 +19,12 @@ integer SHMEM_MINOR_VERSION parameter ( SHMEM_MINOR_VERSION = 3 ) - CHARACTER(LEN = 256), PARAMETER :: SHMEM_VENDOR_STRING = "http://www.open-mpi.org/" - integer SHMEM_MAX_NAME_LEN parameter ( SHMEM_MAX_NAME_LEN = 256-1 ) + character(LEN = SHMEM_MAX_NAME_LEN) SHMEM_VENDOR_STRING + parameter ( SHMEM_VENDOR_STRING = "http://www.open-mpi.org/" ) + integer SHMEM_BARRIER_SYNC_SIZE parameter ( SHMEM_BARRIER_SYNC_SIZE = 4 ) diff --git a/oshmem/include/shmem.h.in b/oshmem/include/shmem.h.in index ba1f88d063c..a81e890cdc6 100644 --- a/oshmem/include/shmem.h.in +++ b/oshmem/include/shmem.h.in @@ -117,7 +117,8 @@ enum shmem_wait_ops { #define _SHMEM_BCAST_SYNC_SIZE (1 + _SHMEM_BARRIER_SYNC_SIZE) #define _SHMEM_COLLECT_SYNC_SIZE (1 + _SHMEM_BCAST_SYNC_SIZE) #define _SHMEM_REDUCE_SYNC_SIZE (1 + _SHMEM_BCAST_SYNC_SIZE) -#define _SHMEM_ALLTOALL_SYNC_SIZE (1) +#define _SHMEM_ALLTOALL_SYNC_SIZE (_SHMEM_BARRIER_SYNC_SIZE) +#define _SHMEM_ALLTOALLS_SYNC_SIZE (_SHMEM_BARRIER_SYNC_SIZE) #define _SHMEM_REDUCE_MIN_WRKDATA_SIZE (1) #define _SHMEM_SYNC_VALUE (-1) @@ -126,6 +127,7 @@ enum shmem_wait_ops { #define SHMEM_COLLECT_SYNC_SIZE _SHMEM_COLLECT_SYNC_SIZE #define 
SHMEM_REDUCE_SYNC_SIZE _SHMEM_REDUCE_SYNC_SIZE #define SHMEM_ALLTOALL_SYNC_SIZE _SHMEM_ALLTOALL_SYNC_SIZE +#define SHMEM_ALLTOALLS_SYNC_SIZE _SHMEM_ALLTOALLS_SYNC_SIZE #define SHMEM_REDUCE_MIN_WRKDATA_SIZE _SHMEM_REDUCE_MIN_WRKDATA_SIZE #define SHMEM_SYNC_VALUE _SHMEM_SYNC_VALUE @@ -298,7 +300,7 @@ OSHMEM_DECLSPEC long long shmem_longlong_g(const long long* addr, int pe); OSHMEM_DECLSPEC long double shmem_longdouble_g(const long double* addr, int pe); #if OSHMEM_HAVE_C11 #define shmem_g(addr, pe) \ - _Generic(&*(dst), \ + _Generic(&*(addr), \ char*: shmem_char_g, \ short*: shmem_short_g, \ int*: shmem_int_g, \ diff --git a/oshmem/mca/atomic/basic/Makefile.am b/oshmem/mca/atomic/basic/Makefile.am index 11f64fa9594..2719514e678 100644 --- a/oshmem/mca/atomic/basic/Makefile.am +++ b/oshmem/mca/atomic/basic/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -32,6 +33,7 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_atomic_basic_la_SOURCES = $(sources) mca_atomic_basic_la_LDFLAGS = -module -avoid-version +mca_atomic_basic_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la noinst_LTLIBRARIES = $(component_noinst) libmca_atomic_basic_la_SOURCES =$(sources) diff --git a/oshmem/mca/atomic/mxm/Makefile.am b/oshmem/mca/atomic/mxm/Makefile.am index 6457c6d5ad9..87a54b7e14a 100644 --- a/oshmem/mca/atomic/mxm/Makefile.am +++ b/oshmem/mca/atomic/mxm/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -33,7 +34,8 @@ endif mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_atomic_mxm_la_SOURCES = $(mxm_sources) -mca_atomic_mxm_la_LIBADD = $(atomic_mxm_LIBS) +mca_atomic_mxm_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(atomic_mxm_LIBS) mca_atomic_mxm_la_LDFLAGS = -module -avoid-version $(atomic_mxm_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/atomic/mxm/atomic_mxm_cswap.c b/oshmem/mca/atomic/mxm/atomic_mxm_cswap.c index bb6c675a03c..9aba82bff0a 100644 --- a/oshmem/mca/atomic/mxm/atomic_mxm_cswap.c +++ b/oshmem/mca/atomic/mxm/atomic_mxm_cswap.c @@ -34,8 +34,9 @@ int mca_atomic_mxm_cswap(void *target, mxm_send_req_t sreq; mca_atomic_mxm_req_init(&sreq, pe, target, nlong); + memcpy(prev, value, nlong); - sreq.base.data.buffer.ptr = (void *) value; + sreq.base.data.buffer.ptr = prev; if (NULL == cond) { sreq.opcode = MXM_REQ_OP_ATOMIC_SWAP; } else { @@ -45,8 +46,6 @@ int mca_atomic_mxm_cswap(void *target, mca_atomic_mxm_post(&sreq); - memcpy(prev, value, nlong); - return OSHMEM_SUCCESS; } diff --git a/oshmem/mca/atomic/ucx/Makefile.am b/oshmem/mca/atomic/ucx/Makefile.am index 57f8b552b77..d922456e726 100644 --- a/oshmem/mca/atomic/ucx/Makefile.am +++ b/oshmem/mca/atomic/ucx/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -33,7 +34,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_atomic_ucx_la_SOURCES = $(ucx_sources) -mca_atomic_ucx_la_LIBADD = $(atomic_ucx_LIBS) +mca_atomic_ucx_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(atomic_ucx_LIBS) mca_atomic_ucx_la_LDFLAGS = -module -avoid-version $(atomic_ucx_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/atomic/ucx/atomic_ucx_cswap.c b/oshmem/mca/atomic/ucx/atomic_ucx_cswap.c index fc4c7a33f50..7d84f9e3dc7 100644 --- a/oshmem/mca/atomic/ucx/atomic_ucx_cswap.c +++ b/oshmem/mca/atomic/ucx/atomic_ucx_cswap.c @@ -63,7 +63,7 @@ int mca_atomic_ucx_cswap(void *target, return ucx_status_to_oshmem(status); err_size: - ATOMIC_ERROR("[#%d] Type size must be 1/2/4 or 8 bytes.", my_pe); + ATOMIC_ERROR("[#%d] Type size must be 4 or 8 bytes.", my_pe); return OSHMEM_ERROR; } diff --git a/oshmem/mca/atomic/ucx/atomic_ucx_fadd.c b/oshmem/mca/atomic/ucx/atomic_ucx_fadd.c index a1b88c95deb..b9ce9dee0dc 100644 --- a/oshmem/mca/atomic/ucx/atomic_ucx_fadd.c +++ b/oshmem/mca/atomic/ucx/atomic_ucx_fadd.c @@ -63,6 +63,6 @@ int mca_atomic_ucx_fadd(void *target, return ucx_status_to_oshmem(status); err_size: - ATOMIC_ERROR("[#%d] Type size must be 1/2/4 or 8 bytes.", my_pe); + ATOMIC_ERROR("[#%d] Type size must be 4 or 8 bytes.", my_pe); return OSHMEM_ERROR; } diff --git a/oshmem/mca/memheap/buddy/Makefile.am b/oshmem/mca/memheap/buddy/Makefile.am index 5746cc2be92..7d08f28b9da 100644 --- a/oshmem/mca/memheap/buddy/Makefile.am +++ b/oshmem/mca/memheap/buddy/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -29,6 +30,7 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_memheap_buddy_la_SOURCES = $(buddy_sources) mca_memheap_buddy_la_LDFLAGS = -module -avoid-version +mca_memheap_buddy_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la #noinst_LTLIBRARIES = $(lib) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/memheap/memheap.h b/oshmem/mca/memheap/memheap.h index 3492812a328..7cad1e9e3f3 100644 --- a/oshmem/mca/memheap/memheap.h +++ b/oshmem/mca/memheap/memheap.h @@ -138,13 +138,18 @@ typedef struct mca_memheap_base_module_t mca_memheap_base_module_t; OSHMEM_DECLSPEC extern mca_memheap_base_module_t mca_memheap; +static inline int mca_memheap_base_mkey_is_shm(sshmem_mkey_t *mkey) +{ + return (0 == mkey->len) && (MAP_SEGMENT_SHM_INVALID != (int)mkey->u.key); +} + /** * check if memcpy() can be used to copy data to dst_addr * must be memheap address and segment must be mapped */ static inline int mca_memheap_base_can_local_copy(sshmem_mkey_t *mkey, void *dst_addr) { return mca_memheap.memheap_is_symmetric_addr(dst_addr) && - (0 == mkey->len) && (MAP_SEGMENT_SHM_INVALID != (int)mkey->u.key); + mca_memheap_base_mkey_is_shm(mkey); } diff --git a/oshmem/mca/memheap/ptmalloc/Makefile.am b/oshmem/mca/memheap/ptmalloc/Makefile.am index 9a92bef5f9d..aaaec6a88d0 100644 --- a/oshmem/mca/memheap/ptmalloc/Makefile.am +++ b/oshmem/mca/memheap/ptmalloc/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -31,6 +32,7 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_memheap_ptmalloc_la_SOURCES = $(ptmalloc_sources) mca_memheap_ptmalloc_la_LDFLAGS = -module -avoid-version +mca_memheap_ptmalloc_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la #noinst_LTLIBRARIES = $(lib) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/scoll/base/scoll_base_frame.c b/oshmem/mca/scoll/base/scoll_base_frame.c index e8db9b35c35..592b3dd2b8a 100644 --- a/oshmem/mca/scoll/base/scoll_base_frame.c +++ b/oshmem/mca/scoll/base/scoll_base_frame.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * $COPYRIGHT$ * @@ -17,6 +17,7 @@ #include "oshmem/mca/mca.h" #include "opal/util/output.h" #include "opal/mca/base/base.h" +#include "ompi/util/timings.h" #include "oshmem/util/oshmem_util.h" #include "oshmem/mca/scoll/scoll.h" @@ -57,6 +58,8 @@ int mca_scoll_enable(void) { int ret = OSHMEM_SUCCESS; + OPAL_TIMING_ENV_INIT(mca_scoll_enable); + if (!mca_scoll_sync_array) { void* ptr = (void*) mca_scoll_sync_array; int i = 0; @@ -69,16 +72,23 @@ int mca_scoll_enable(void) } } + OPAL_TIMING_ENV_NEXT(mca_scoll_enable, "memheap"); + /* Note: it is done to support FCA only and we need to consider possibility to * find a way w/o this ugly hack */ if (OSHMEM_SUCCESS != (ret = mca_scoll_base_select(oshmem_group_all))) { return ret; } + + OPAL_TIMING_ENV_NEXT(mca_scoll_enable, "group_all"); + if (OSHMEM_SUCCESS != (ret = mca_scoll_base_select(oshmem_group_self))) { return ret; } + OPAL_TIMING_ENV_NEXT(mca_scoll_enable, "group_self"); + return OSHMEM_SUCCESS; } diff --git a/oshmem/mca/scoll/base/scoll_base_select.c b/oshmem/mca/scoll/base/scoll_base_select.c index 600fdc4ec68..fdaddfe1699 100644 --- a/oshmem/mca/scoll/base/scoll_base_select.c +++ b/oshmem/mca/scoll/base/scoll_base_select.c @@ -1,7 +1,7 @@ /* - 
* Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. - * Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -21,6 +21,7 @@ #include "oshmem/mca/mca.h" #include "opal/mca/base/base.h" #include "opal/mca/base/mca_base_component_repository.h" +#include "ompi/util/timings.h" #include "oshmem/util/oshmem_util.h" #include "oshmem/mca/scoll/scoll.h" @@ -194,6 +195,8 @@ int mca_scoll_base_select(struct oshmem_group_t *group) opal_list_item_t *item; int ret; + OPAL_TIMING_ENV_INIT(mca_scoll_base_select); + /* Announce */ SCOLL_VERBOSE(10, "scoll:base:group_select: new group: %d", group->id); mca_scoll_base_group_unselect(group); @@ -206,6 +209,9 @@ int mca_scoll_base_select(struct oshmem_group_t *group) group->g_scoll.scoll_alltoall = scoll_null_alltoall; return OSHMEM_SUCCESS; } + + OPAL_TIMING_ENV_NEXT(mca_scoll_base_select, "setup"); + SCOLL_VERBOSE(10, "scoll:base:group_select: Checking all available modules"); selectable = check_components(&oshmem_scoll_base_framework.framework_components, group); @@ -218,6 +224,8 @@ int mca_scoll_base_select(struct oshmem_group_t *group) return OSHMEM_ERROR; } + OPAL_TIMING_ENV_NEXT(mca_scoll_base_select, "check_components"); + /* do the selection loop */ for (item = opal_list_remove_first(selectable); NULL != item; item = opal_list_remove_first(selectable)) { @@ -236,6 +244,8 @@ int mca_scoll_base_select(struct oshmem_group_t *group) OBJ_RELEASE(avail); } + OPAL_TIMING_ENV_NEXT(mca_scoll_base_select, "select_loop"); + /* Done with the list from the check_components() call so release it. 
*/ OBJ_RELEASE(selectable); if ((NULL == group->g_scoll.scoll_barrier) @@ -247,6 +257,8 @@ int mca_scoll_base_select(struct oshmem_group_t *group) return OSHMEM_ERR_NOT_FOUND; } + OPAL_TIMING_ENV_NEXT(mca_scoll_base_select, "release"); + return OSHMEM_SUCCESS; } diff --git a/oshmem/mca/scoll/basic/Makefile.am b/oshmem/mca/scoll/basic/Makefile.am index 689b563c80a..708a45aae7a 100644 --- a/oshmem/mca/scoll/basic/Makefile.am +++ b/oshmem/mca/scoll/basic/Makefile.am @@ -1,6 +1,7 @@ # # Copyright (c) 2013-2016 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -35,6 +36,7 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_scoll_basic_la_SOURCES = $(sources) mca_scoll_basic_la_LDFLAGS = -module -avoid-version +mca_scoll_basic_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la noinst_LTLIBRARIES = $(component_noinst) libmca_scoll_basic_la_SOURCES =$(sources) diff --git a/oshmem/mca/scoll/basic/scoll_basic_alltoall.c b/oshmem/mca/scoll/basic/scoll_basic_alltoall.c index cc97a05f21b..479e118ec44 100644 --- a/oshmem/mca/scoll/basic/scoll_basic_alltoall.c +++ b/oshmem/mca/scoll/basic/scoll_basic_alltoall.c @@ -19,13 +19,19 @@ #include "oshmem/mca/scoll/base/base.h" #include "scoll_basic.h" -static int _algorithm_simple(struct oshmem_group_t *group, - void *target, - const void *source, - ptrdiff_t dst, ptrdiff_t sst, - size_t nelems, - size_t element_size, - long *pSync); +static int a2a_alg_simple(struct oshmem_group_t *group, + void *target, + const void *source, + size_t nelems, + size_t element_size); + +static int a2as_alg_simple(struct oshmem_group_t *group, + void *target, + const void *source, + ptrdiff_t dst, ptrdiff_t sst, + size_t nelems, + size_t element_size); + int mca_scoll_basic_alltoall(struct oshmem_group_t *group, void *target, @@ -36,88 +42,150 @@ int mca_scoll_basic_alltoall(struct oshmem_group_t *group, 
long *pSync, int alg) { - int rc = OSHMEM_SUCCESS; + int rc; + int i; /* Arguments validation */ if (!group) { SCOLL_ERROR("Active set (group) of PE is not defined"); - rc = OSHMEM_ERR_BAD_PARAM; + return OSHMEM_ERR_BAD_PARAM; } /* Check if this PE is part of the group */ - if ((rc == OSHMEM_SUCCESS) && oshmem_proc_group_is_member(group)) { - int i = 0; - - if (pSync) { - rc = _algorithm_simple(group, - target, - source, - dst, - sst, - nelems, - element_size, - pSync); - } else { - SCOLL_ERROR("Incorrect argument pSync"); - rc = OSHMEM_ERR_BAD_PARAM; - } + if (!oshmem_proc_group_is_member(group)) { + return OSHMEM_SUCCESS; + } - /* Restore initial values */ - SCOLL_VERBOSE(12, - "PE#%d Restore special synchronization array", - group->my_pe); - for (i = 0; pSync && (i < _SHMEM_ALLTOALL_SYNC_SIZE); i++) { - pSync[i] = _SHMEM_SYNC_VALUE; - } + if (!pSync) { + SCOLL_ERROR("Incorrect argument pSync"); + return OSHMEM_ERR_BAD_PARAM; + } + + if ((sst == 1) && (dst == 1)) { + rc = a2a_alg_simple(group, target, source, nelems, element_size); + } else { + rc = a2as_alg_simple(group, target, source, dst, sst, nelems, + element_size); + } + + if (rc != OSHMEM_SUCCESS) { + return rc; + } + + /* quiet is needed because scoll level barrier does not + * guarantee put completion + */ + MCA_SPML_CALL(quiet()); + + /* Wait for operation completion */ + SCOLL_VERBOSE(14, "[#%d] Wait for operation completion", group->my_pe); + rc = BARRIER_FUNC(group, pSync + 1, SCOLL_DEFAULT_ALG); + + /* Restore initial values */ + SCOLL_VERBOSE(12, "PE#%d Restore special synchronization array", + group->my_pe); + + for (i = 0; pSync && (i < _SHMEM_ALLTOALL_SYNC_SIZE); i++) { + pSync[i] = _SHMEM_SYNC_VALUE; } return rc; } -static int _algorithm_simple(struct oshmem_group_t *group, - void *target, - const void *source, - ptrdiff_t tst, ptrdiff_t sst, - size_t nelems, - size_t element_size, - long *pSync) + +static inline void * +get_stride_elem(const void *base, ptrdiff_t sst, size_t nelems, size_t 
elem_size, + int block_idx, int elem_idx) { - int rc = OSHMEM_SUCCESS; - int pe_cur; - int i; - int j; - int k; + /* + * j th block starts at: nelems * element_size * sst * j + * offset of the l th element in the block is: element_size * sst * l + */ + return (char *)base + elem_size * sst * (nelems * block_idx + elem_idx); +} + +static inline int +get_dst_pe(struct oshmem_group_t *group, int src_blk_idx, int dst_blk_idx, int *dst_pe_idx) +{ + /* index permutation for better distribution of traffic */ + (*dst_pe_idx) = (dst_blk_idx + src_blk_idx) % group->proc_count; + + /* convert to the global pe */ + return oshmem_proc_pe(group->proc_array[*dst_pe_idx]); +} + +static int a2as_alg_simple(struct oshmem_group_t *group, + void *target, + const void *source, + ptrdiff_t tst, ptrdiff_t sst, + size_t nelems, + size_t element_size) +{ + int rc; + int dst_pe; + int src_blk_idx; + int dst_blk_idx; + int dst_pe_idx; + size_t elem_idx; SCOLL_VERBOSE(14, "[#%d] send data to all PE in the group", group->my_pe); - j = oshmem_proc_group_find_id(group, group->my_pe); - for (i = 0; i < group->proc_count; i++) { - /* index permutation for better distribution of traffic */ - k = (((j)+(i))%(group->proc_count)); - pe_cur = oshmem_proc_pe(group->proc_array[k]); - rc = MCA_SPML_CALL(put( - (void *)((char *)target + j * tst * nelems * element_size), - nelems * element_size, - (void *)((char *)source + i * sst * nelems * element_size), - pe_cur)); - if (OSHMEM_SUCCESS != rc) { - break; + + dst_blk_idx = oshmem_proc_group_find_id(group, group->my_pe); + + for (src_blk_idx = 0; src_blk_idx < group->proc_count; src_blk_idx++) { + + dst_pe = get_dst_pe(group, src_blk_idx, dst_blk_idx, &dst_pe_idx); + for (elem_idx = 0; elem_idx < nelems; elem_idx++) { + rc = MCA_SPML_CALL(put( + get_stride_elem(target, tst, nelems, element_size, + dst_blk_idx, elem_idx), + element_size, + get_stride_elem(source, sst, nelems, element_size, + dst_pe_idx, elem_idx), + dst_pe)); + if (OSHMEM_SUCCESS != rc) { + 
return rc; + } } } - /* fence (which currently acts as quiet) is needed - * because scoll level barrier does not guarantee put completion - */ - MCA_SPML_CALL(fence()); + return OSHMEM_SUCCESS; +} - /* Wait for operation completion */ - if (rc == OSHMEM_SUCCESS) { - SCOLL_VERBOSE(14, "[#%d] Wait for operation completion", group->my_pe); - rc = BARRIER_FUNC(group, - (pSync + 1), - SCOLL_DEFAULT_ALG); - } +static int a2a_alg_simple(struct oshmem_group_t *group, + void *target, + const void *source, + size_t nelems, + size_t element_size) +{ + int rc; + int dst_pe; + int src_blk_idx; + int dst_blk_idx; + int dst_pe_idx; + void *dst_blk; - return rc; -} + SCOLL_VERBOSE(14, + "[#%d] send data to all PE in the group", + group->my_pe); + dst_blk_idx = oshmem_proc_group_find_id(group, group->my_pe); + + /* block start at stride 1 first elem */ + dst_blk = get_stride_elem(target, 1, nelems, element_size, dst_blk_idx, 0); + + for (src_blk_idx = 0; src_blk_idx < group->proc_count; src_blk_idx++) { + + dst_pe = get_dst_pe(group, src_blk_idx, dst_blk_idx, &dst_pe_idx); + rc = MCA_SPML_CALL(put(dst_blk, + nelems * element_size, + get_stride_elem(source, 1, nelems, + element_size, dst_pe_idx, 0), + dst_pe)); + if (OSHMEM_SUCCESS != rc) { + return rc; + } + } + return OSHMEM_SUCCESS; +} diff --git a/oshmem/mca/scoll/basic/scoll_basic_barrier.c b/oshmem/mca/scoll/basic/scoll_basic_barrier.c index 8f2c5970b6f..e7f87a544f4 100644 --- a/oshmem/mca/scoll/basic/scoll_basic_barrier.c +++ b/oshmem/mca/scoll/basic/scoll_basic_barrier.c @@ -167,8 +167,7 @@ static int _algorithm_central_counter(struct oshmem_group_t *group, The root could leave the first barrier and in the second barrier it could get SHMEM_SYNC_WAIT value on remote node before the remote node receives its SHMEM_SYNC_RUN value in the first barrier */ - /* TODO: actually it must be quiet */ - MCA_SPML_CALL(fence()); + MCA_SPML_CALL(quiet()); } /* Wait for RUN signal */ else { diff --git 
a/oshmem/mca/scoll/basic/scoll_basic_broadcast.c b/oshmem/mca/scoll/basic/scoll_basic_broadcast.c index f184c110bca..bccf48c9c9d 100644 --- a/oshmem/mca/scoll/basic/scoll_basic_broadcast.c +++ b/oshmem/mca/scoll/basic/scoll_basic_broadcast.c @@ -146,10 +146,10 @@ static int _algorithm_central_counter(struct oshmem_group_t *group, rc = MCA_SPML_CALL(put(target, nlong, (void *)source, pe_cur)); } } - /* fence (which currently acts as quiet) is needed - * because scoll level barrier does not guarantee put completion + /* quiet is needed because scoll level barrier does not + * guarantee put completion */ - MCA_SPML_CALL(fence()); + MCA_SPML_CALL(quiet()); } if (rc == OSHMEM_SUCCESS) { diff --git a/oshmem/mca/scoll/fca/Makefile.am b/oshmem/mca/scoll/fca/Makefile.am index b434c455b4c..85164775e5b 100644 --- a/oshmem/mca/scoll/fca/Makefile.am +++ b/oshmem/mca/scoll/fca/Makefile.am @@ -3,6 +3,7 @@ # # Copyright (c) 2013-2015 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -33,7 +34,8 @@ endif mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_scoll_fca_la_SOURCES = $(scoll_fca_sources) -mca_scoll_fca_la_LIBADD = $(scoll_fca_LIBS) +mca_scoll_fca_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(scoll_fca_LIBS) mca_scoll_fca_la_LDFLAGS = -module -avoid-version $(scoll_fca_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/scoll/mpi/Makefile.am b/oshmem/mca/scoll/mpi/Makefile.am index 3e114d4606c..f8782fccffa 100644 --- a/oshmem/mca/scoll/mpi/Makefile.am +++ b/oshmem/mca/scoll/mpi/Makefile.am @@ -1,5 +1,6 @@ # Copyright (c) 2013-2015 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -29,7 +30,8 @@ endif mcacomponentdir = $(pkglibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_scoll_mpi_la_SOURCES = $(scoll_mpi_sources) -mca_scoll_mpi_la_LIBADD = $(scoll_mpi_LIBS) +mca_scoll_mpi_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(scoll_mpi_LIBS) mca_scoll_mpi_la_LDFLAGS = -module -avoid-version $(scoll_mpi_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/scoll/mpi/scoll_mpi_module.c b/oshmem/mca/scoll/mpi/scoll_mpi_module.c index adc1b4a826f..1228cf8a3a2 100644 --- a/oshmem/mca/scoll/mpi/scoll_mpi_module.c +++ b/oshmem/mca/scoll/mpi/scoll_mpi_module.c @@ -1,6 +1,6 @@ /* - * Copyright (c) 2011 Mellanox Technologies. All rights reserved. - * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2011-2018 Mellanox Technologies. All rights reserved. + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2014 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ @@ -17,6 +17,7 @@ #include "oshmem/proc/proc.h" #include "oshmem/runtime/runtime.h" #include "ompi/mca/coll/base/base.h" +#include "opal/util/timings.h" int mca_scoll_mpi_init_query(bool enable_progress_threads, bool enable_mpi_threads) { @@ -121,20 +122,27 @@ mca_scoll_mpi_comm_query(oshmem_group_t *osh_group, int *priority) if ((osh_group->proc_count < 2) || (osh_group->proc_count < cm->mpi_np)) { return NULL; } + OPAL_TIMING_ENV_INIT(comm_query); + /* Create OMPI_Comm object and store ptr to it in group obj*/ if (NULL == oshmem_group_all) { osh_group->ompi_comm = &(ompi_mpi_comm_world.comm); + OPAL_TIMING_ENV_NEXT(comm_query, "ompi_mpi_comm_world"); } else { err = ompi_comm_group(&(ompi_mpi_comm_world.comm), &parent_group); if (OPAL_UNLIKELY(OMPI_SUCCESS != err)) { return NULL; } + OPAL_TIMING_ENV_NEXT(comm_query, "ompi_comm_group"); + ranks = (int*) malloc(osh_group->proc_count * sizeof(int)); if (OPAL_UNLIKELY(NULL == ranks)) { return NULL; } tag = 1; + OPAL_TIMING_ENV_NEXT(comm_query, "malloc"); + for (i = 0; i < osh_group->proc_count; i++) { ompi_proc_t* ompi_proc; for( int j = 0; j < ompi_group_size(parent_group); j++ ) { @@ -146,24 +154,32 @@ mca_scoll_mpi_comm_query(oshmem_group_t *osh_group, int *priority) } } + OPAL_TIMING_ENV_NEXT(comm_query, "build_ranks"); + err = ompi_group_incl(parent_group, osh_group->proc_count, ranks, &new_group); if (OPAL_UNLIKELY(OMPI_SUCCESS != err)) { free(ranks); return NULL; } + OPAL_TIMING_ENV_NEXT(comm_query, "ompi_group_incl"); + err = ompi_comm_create_group(&(ompi_mpi_comm_world.comm), new_group, tag, &newcomm); if (OPAL_UNLIKELY(OMPI_SUCCESS != err)) { free(ranks); return NULL; } + OPAL_TIMING_ENV_NEXT(comm_query, "ompi_comm_create_group"); + err = ompi_group_free(&new_group); if (OPAL_UNLIKELY(OMPI_SUCCESS != err)) { free(ranks); return NULL; } + OPAL_TIMING_ENV_NEXT(comm_query, "ompi_group_free"); free(ranks); osh_group->ompi_comm = newcomm; + OPAL_TIMING_ENV_NEXT(comm_query, "set_group_comm"); } 
mpi_module = OBJ_NEW(mca_scoll_mpi_module_t); if (!mpi_module){ diff --git a/oshmem/mca/spml/base/base.h b/oshmem/mca/spml/base/base.h index 4a0eb3e7350..58025561ca5 100644 --- a/oshmem/mca/spml/base/base.h +++ b/oshmem/mca/spml/base/base.h @@ -73,6 +73,8 @@ OSHMEM_DECLSPEC int mca_spml_base_oob_get_mkeys(int pe, OSHMEM_DECLSPEC void mca_spml_base_rmkey_unpack(sshmem_mkey_t *mkey, uint32_t seg, int pe, int tr_id); OSHMEM_DECLSPEC void mca_spml_base_rmkey_free(sshmem_mkey_t *mkey); +OSHMEM_DECLSPEC void *mca_spml_base_rmkey_ptr(const void *dst_addr, sshmem_mkey_t *mkey, int pe); + OSHMEM_DECLSPEC int mca_spml_base_put_nb(void *dst_addr, size_t size, void *src_addr, diff --git a/oshmem/mca/spml/base/spml_base.c b/oshmem/mca/spml/base/spml_base.c index 75ccda73936..3b950988fa5 100644 --- a/oshmem/mca/spml/base/spml_base.c +++ b/oshmem/mca/spml/base/spml_base.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. + * Copyright (c) 2017 ARM, Inc. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -13,40 +14,39 @@ #include "opal/datatype/opal_convertor.h" #include "orte/include/orte/types.h" #include "orte/runtime/orte_globals.h" -#include "oshmem/mca/spml/yoda/spml_yoda.h" #include "oshmem/proc/proc.h" #include "oshmem/mca/spml/base/base.h" -#include "oshmem/mca/spml/yoda/spml_yoda_putreq.h" -#include "oshmem/mca/spml/yoda/spml_yoda_getreq.h" #include "opal/mca/btl/btl.h" -#define SPML_BASE_DO_CMP(res, addr, op, val) \ - switch((op)) { \ +#define SPML_BASE_DO_CMP(_res, _addr, _op, _val) \ + switch((_op)) { \ case SHMEM_CMP_EQ: \ - res = *(addr) == (val) ? 1 : 0; \ + _res = *(_addr) == (_val) ? 1 : 0; \ break; \ case SHMEM_CMP_NE: \ - res = *(addr) != (val) ? 1 : 0; \ + _res = *(_addr) != (_val) ? 1 : 0; \ break; \ case SHMEM_CMP_GT: \ - res = *(addr) > (val) ? 1 : 0; \ + _res = *(_addr) > (_val) ? 1 : 0; \ break; \ case SHMEM_CMP_LE: \ - res = *(addr) <= (val) ? 1 : 0; \ + _res = *(_addr) <= (_val) ? 
1 : 0; \ break; \ case SHMEM_CMP_LT: \ - res = *(addr) < (val) ? 1: 0; \ + _res = *(_addr) < (_val) ? 1 : 0; \ break; \ case SHMEM_CMP_GE: \ - res = *(addr) >= (val) ? 1 : 0; \ + _res = *(_addr) >= (_val) ? 1 : 0; \ break; \ } -#define SPML_BASE_DO_WAIT(cond, val, addr, op) \ - do { \ - SPML_BASE_DO_CMP(cond, val,addr,op); \ - opal_progress(); \ - } while (cond == 0) ; +#define SPML_BASE_DO_WAIT(_res, _addr, _op, _val) \ + do { \ + SPML_BASE_DO_CMP(_res, _addr, _op, _val); \ + if (_res == 0) { \ + opal_progress(); \ + } \ + } while (_res == 0); /** * Wait for data delivery. @@ -54,15 +54,24 @@ */ int mca_spml_base_wait(void* addr, int cmp, void* value, int datatype) { - int *int_addr, int_value; - long *long_addr, long_value; - short *short_addr, short_value; - long long *longlong_addr, longlong_value; - int32_t *int32_addr, int32_value; - int64_t *int64_addr, int64_value; + volatile int *int_addr; + volatile long *long_addr; + volatile short *short_addr; + volatile long long *longlong_addr; + volatile int32_t *int32_addr; + volatile int64_t *int64_addr; + + int int_value; + long long_value; + short short_value; + long long longlong_value; + int32_t int32_value; + int64_t int64_value; + ompi_fortran_integer_t *fint_addr, fint_value; ompi_fortran_integer4_t *fint4_addr, fint4_value; ompi_fortran_integer8_t *fint8_addr, fint8_value; + int res = 0; switch (datatype) { @@ -144,11 +153,7 @@ int mca_spml_base_wait(void* addr, int cmp, void* value, int datatype) */ int mca_spml_base_wait_nb(void* handle) { - /* TODO fence is a gag for more accurate code - * Use shmem_quiet() (or a function calling shmem_quiet()) or - * shmem_wait_nb() to force completion of transfers for non-blocking operations. 
- */ - MCA_SPML_CALL(fence()); + MCA_SPML_CALL(quiet()); return OSHMEM_SUCCESS; } @@ -166,6 +171,11 @@ void mca_spml_base_rmkey_free(sshmem_mkey_t *mkey) { } +void *mca_spml_base_rmkey_ptr(const void *dst_addr, sshmem_mkey_t *mkey, int pe) +{ + return NULL; +} + int mca_spml_base_put_nb(void *dst_addr, size_t size, void *src_addr, int dst, void **handle) { diff --git a/oshmem/mca/spml/base/spml_base_frame.c b/oshmem/mca/spml/base/spml_base_frame.c index 2c230bf1825..d732f6b2476 100644 --- a/oshmem/mca/spml/base/spml_base_frame.c +++ b/oshmem/mca/spml/base/spml_base_frame.c @@ -144,7 +144,7 @@ static int mca_spml_base_open(mca_base_open_flag_t flags) if( (NULL == default_spml || NULL == default_spml[0] || 0 == strlen(default_spml[0])) || (default_spml[0][0] == '^') ) { opal_pointer_array_add(&mca_spml_base_spml, strdup("ikrit")); - opal_pointer_array_add(&mca_spml_base_spml, strdup("yoda")); + opal_pointer_array_add(&mca_spml_base_spml, strdup("ucx")); } else { opal_pointer_array_add(&mca_spml_base_spml, strdup(default_spml[0])); } diff --git a/oshmem/mca/spml/base/spml_base_select.c b/oshmem/mca/spml/base/spml_base_select.c index 5fdd773a4d1..fd46f796aa8 100644 --- a/oshmem/mca/spml/base/spml_base_select.c +++ b/oshmem/mca/spml/base/spml_base_select.c @@ -147,12 +147,7 @@ int mca_spml_base_select(bool enable_progress_threads, bool enable_mpi_threads) if (NULL == tmp_val) { continue; } - if (0 == strncmp(tmp_val, "yoda", 4) && !mca_bml_base_inited()) { - orte_errmgr.abort(1, "SPML %s cannot be selected becasue no btls are available. 
Please make sure that ob1 pml is selected by ompi (-mca pml ob1)", tmp_val); - } - else { - orte_errmgr.abort(1, "SPML %s cannot be selected", tmp_val); - } + orte_errmgr.abort(1, "SPML %s cannot be selected", tmp_val); } if (0 == i) { orte_errmgr.abort(2, diff --git a/oshmem/mca/spml/configure.m4 b/oshmem/mca/spml/configure.m4 index 4113cb92a32..6f6ceca5fc2 100644 --- a/oshmem/mca/spml/configure.m4 +++ b/oshmem/mca/spml/configure.m4 @@ -16,4 +16,10 @@ AC_DEFUN([MCA_oshmem_spml_CONFIG],[ # this is a direct callable component, so set that up. MCA_SETUP_DIRECT_CALL($1, $2) + + if test -z "$MCA_$1_$2_DSO_COMPONENTS" && test -z "$MCA_$1_$2_STATIC_COMPONENTS"; then + OSHMEM_FOUND_WORKING_SPML=0 + else + OSHMEM_FOUND_WORKING_SPML=1 + fi ]) diff --git a/oshmem/mca/spml/ikrit/Makefile.am b/oshmem/mca/spml/ikrit/Makefile.am index 1ce3b96063c..ffd785beab9 100644 --- a/oshmem/mca/spml/ikrit/Makefile.am +++ b/oshmem/mca/spml/ikrit/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2013 Mellanox Technologies, Inc. # All rights reserved. # +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -31,7 +32,8 @@ endif mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_spml_ikrit_la_SOURCES = $(ikrit_sources) -mca_spml_ikrit_la_LIBADD = $(spml_ikrit_LIBS) +mca_spml_ikrit_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(spml_ikrit_LIBS) mca_spml_ikrit_la_LDFLAGS = -module -avoid-version $(spml_ikrit_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/spml/ikrit/spml_ikrit.c b/oshmem/mca/spml/ikrit/spml_ikrit.c index e9f89ce90be..2afe4705b89 100644 --- a/oshmem/mca/spml/ikrit/spml_ikrit.c +++ b/oshmem/mca/spml/ikrit/spml_ikrit.c @@ -168,9 +168,11 @@ mca_spml_ikrit_t mca_spml_ikrit = { mca_spml_ikrit_send, mca_spml_base_wait, mca_spml_base_wait_nb, + mca_spml_ikrit_fence, /* fence is implemented as quiet */ mca_spml_ikrit_fence, mca_spml_ikrit_cache_mkeys, mca_spml_base_rmkey_free, + mca_spml_base_rmkey_ptr, mca_spml_base_memuse_hook, (void*)&mca_spml_ikrit @@ -672,7 +674,7 @@ static inline void get_completion_cb(void *ctx) { mca_spml_ikrit_get_request_t *get_req = (mca_spml_ikrit_get_request_t *) ctx; - OPAL_THREAD_ADD32(&mca_spml_ikrit.n_active_gets, -1); + OPAL_THREAD_ADD_FETCH32(&mca_spml_ikrit.n_active_gets, -1); free_get_req(get_req); } @@ -700,7 +702,7 @@ static inline int mca_spml_ikrit_get_async(void *src_addr, get_req->mxm_req.flags = 0; get_req->mxm_req.base.completed_cb = get_completion_cb; get_req->mxm_req.base.context = get_req; - OPAL_THREAD_ADD32(&mca_spml_ikrit.n_active_gets, 1); + OPAL_THREAD_ADD_FETCH32(&mca_spml_ikrit.n_active_gets, 1); SPML_IKRIT_MXM_POST_SEND(get_req->mxm_req); @@ -712,7 +714,7 @@ static inline void fence_completion_cb(void *ctx) mca_spml_ikrit_get_request_t *fence_req = (mca_spml_ikrit_get_request_t *) ctx; - OPAL_THREAD_ADD32(&mca_spml_ikrit.n_mxm_fences, -1); + OPAL_THREAD_ADD_FETCH32(&mca_spml_ikrit.n_mxm_fences, -1); free_get_req(fence_req); } @@ -734,7 +736,7 @@ static int mca_spml_ikrit_mxm_fence(int 
dst) fence_req->mxm_req.base.state = MXM_REQ_NEW; fence_req->mxm_req.base.completed_cb = fence_completion_cb; fence_req->mxm_req.base.context = fence_req; - OPAL_THREAD_ADD32(&mca_spml_ikrit.n_mxm_fences, 1); + OPAL_THREAD_ADD_FETCH32(&mca_spml_ikrit.n_mxm_fences, 1); SPML_IKRIT_MXM_POST_SEND(fence_req->mxm_req); return OSHMEM_SUCCESS; @@ -745,7 +747,7 @@ static inline void put_completion_cb(void *ctx) mca_spml_ikrit_put_request_t *put_req = (mca_spml_ikrit_put_request_t *) ctx; mxm_peer_t *peer; - OPAL_THREAD_ADD32(&mca_spml_ikrit.n_active_puts, -1); + OPAL_THREAD_ADD_FETCH32(&mca_spml_ikrit.n_active_puts, -1); /* TODO: keep pointer to peer in the request */ peer = &mca_spml_ikrit.mxm_peers[put_req->pe]; @@ -847,7 +849,7 @@ static inline int mca_spml_ikrit_put_internal(void* dst_addr, put_req->mxm_req.op.mem.remote_mkey = mkey; - OPAL_THREAD_ADD32(&mca_spml_ikrit.n_active_puts, 1); + OPAL_THREAD_ADD_FETCH32(&mca_spml_ikrit.n_active_puts, 1); if (mca_spml_ikrit.mxm_peers[dst].need_fence == 0) { opal_list_append(&mca_spml_ikrit.active_peers, &mca_spml_ikrit.mxm_peers[dst].link); diff --git a/oshmem/mca/spml/spml.h b/oshmem/mca/spml/spml.h index 16b372b2dcb..f320e83e474 100644 --- a/oshmem/mca/spml/spml.h +++ b/oshmem/mca/spml/spml.h @@ -120,6 +120,17 @@ typedef int (*mca_spml_base_module_wait_fn_t)(void* addr, */ typedef void (*mca_spml_base_module_mkey_unpack_fn_t)(sshmem_mkey_t *, uint32_t segno, int remote_pe, int tr_id); +/** + * If possible, get a pointer to the remote memory described by the mkey + * + * @param dst_addr address of the symmetric variable + * @param mkey remote memory key + * @param pe remote PE + * + * @return pointer to remote memory or NULL + */ +typedef void * (*mca_spml_base_module_mkey_ptr_fn_t)(const void *dst_addr, sshmem_mkey_t *mkey, int pe); + /** * free resources used by deserialized remote mkey * @@ -264,12 +275,19 @@ typedef int (*mca_spml_base_module_send_fn_t)(void *buf, mca_spml_base_put_mode_t mode); /** - * Wait for completion 
of all outstanding put() requests + * Assures ordering of delivery of put() requests * * @return - OSHMEM_SUCCESS or failure status. */ typedef int (*mca_spml_base_module_fence_fn_t)(void); +/** + * Wait for completion of all outstanding put() requests + * + * @return - OSHMEM_SUCCESS or failure status. + */ +typedef int (*mca_spml_base_module_quiet_fn_t)(void); + /** * Waits for completion of a non-blocking put or get issued by the calling PE. * @@ -310,9 +328,11 @@ struct mca_spml_base_module_1_0_0_t { mca_spml_base_module_wait_fn_t spml_wait; mca_spml_base_module_wait_nb_fn_t spml_wait_nb; mca_spml_base_module_fence_fn_t spml_fence; + mca_spml_base_module_quiet_fn_t spml_quiet; mca_spml_base_module_mkey_unpack_fn_t spml_rmkey_unpack; mca_spml_base_module_mkey_free_fn_t spml_rmkey_free; + mca_spml_base_module_mkey_ptr_fn_t spml_rmkey_ptr; mca_spml_base_module_memuse_hook_fn_t spml_memuse_hook; void *self; diff --git a/oshmem/mca/spml/ucx/Makefile.am b/oshmem/mca/spml/ucx/Makefile.am index fdb86de492e..84d8a749250 100644 --- a/oshmem/mca/spml/ucx/Makefile.am +++ b/oshmem/mca/spml/ucx/Makefile.am @@ -2,6 +2,7 @@ # Copyright (c) 2015 Mellanox Technologies, Inc. # All rights reserved. # +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -32,7 +33,8 @@ endif mcacomponentdir = $(ompilibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_spml_ucx_la_SOURCES = $(ucx_sources) -mca_spml_ucx_la_LIBADD = $(spml_ucx_LIBS) +mca_spml_ucx_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(spml_ucx_LIBS) mca_spml_ucx_la_LDFLAGS = -module -avoid-version $(spml_ucx_LDFLAGS) noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/spml/ucx/spml_ucx.c b/oshmem/mca/spml/ucx/spml_ucx.c index 1e7ac7f5374..463b0f0a005 100644 --- a/oshmem/mca/spml/ucx/spml_ucx.c +++ b/oshmem/mca/spml/ucx/spml_ucx.c @@ -1,7 +1,7 @@ /* * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. 
- * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2018 Research Organization for Information Science * and Technology (RIST). All rights reserved. * Copyright (c) 2016 ARM, Inc. All rights reserved. * $COPYRIGHT$ @@ -60,10 +60,11 @@ mca_spml_ucx_t mca_spml_ucx = { mca_spml_ucx_send, mca_spml_base_wait, mca_spml_base_wait_nb, - mca_spml_ucx_quiet, /* At the moment fence is the same as quite for - every spml */ + mca_spml_ucx_fence, + mca_spml_ucx_quiet, mca_spml_ucx_rmkey_unpack, mca_spml_ucx_rmkey_free, + mca_spml_ucx_rmkey_ptr, mca_spml_ucx_memuse_hook, (void*)&mca_spml_ucx }, @@ -270,7 +271,7 @@ int mca_spml_ucx_add_procs(ompi_proc_t** procs, size_t nprocs) dump_address(my_rank, (char *)wk_local_addr, wk_addr_len); rc = oshmem_shmem_xchng(wk_local_addr, wk_addr_len, nprocs, - (void **)&wk_raddrs, &wk_roffs, &wk_rsizes); + (void **)&wk_raddrs, &wk_roffs, &wk_rsizes); if (rc != OSHMEM_SUCCESS) { goto error; } @@ -285,13 +286,14 @@ int mca_spml_ucx_add_procs(ompi_proc_t** procs, size_t nprocs) ep_params.field_mask = UCP_EP_PARAM_FIELD_REMOTE_ADDRESS; ep_params.address = (ucp_address_t *)(wk_raddrs + wk_roffs[i]); - err = ucp_ep_create(mca_spml_ucx.ucp_worker, - &ep_params, + err = ucp_ep_create(mca_spml_ucx.ucp_worker, &ep_params, &mca_spml_ucx.ucp_peers[i].ucp_conn); if (UCS_OK != err) { - SPML_ERROR("ucp_ep_create failed: %s", ucs_status_string(err)); + SPML_ERROR("ucp_ep_create(proc=%d/%d) failed: %s", n, nprocs, + ucs_status_string(err)); goto error2; } + OSHMEM_PROC_DATA(procs[i])->num_transports = 1; OSHMEM_PROC_DATA(procs[i])->transport_ids = spml_ucx_transport_ids; } @@ -318,8 +320,6 @@ int mca_spml_ucx_add_procs(ompi_proc_t** procs, size_t nprocs) free(wk_rsizes); if (wk_roffs) free(wk_roffs); - if (mca_spml_ucx.ucp_peers) - free(mca_spml_ucx.ucp_peers); error: rc = OSHMEM_ERR_OUT_OF_RESOURCE; SPML_ERROR("add procs FAILED rc=%d", rc); @@ -353,6 +353,23 @@ void mca_spml_ucx_rmkey_free(sshmem_mkey_t *mkey) 
ucp_rkey_destroy(ucx_mkey->rkey); } +void *mca_spml_ucx_rmkey_ptr(const void *dst_addr, sshmem_mkey_t *mkey, int pe) +{ +#if (((UCP_API_MAJOR >= 1) && (UCP_API_MINOR >= 3)) || (UCP_API_MAJOR >= 2)) + void *rva; + ucs_status_t err; + spml_ucx_mkey_t *ucx_mkey = (spml_ucx_mkey_t *)(mkey->spml_context); + + err = ucp_rkey_ptr(ucx_mkey->rkey, (uint64_t)dst_addr, &rva); + if (UCS_OK != err) { + return NULL; + } + return rva; +#else + return NULL; +#endif +} + static void mca_spml_ucx_cache_mkey(sshmem_mkey_t *mkey, uint32_t segno, int dst_pe) { ucp_peer_t *peer; @@ -503,19 +520,20 @@ int mca_spml_ucx_deregister(sshmem_mkey_t *mkeys) spml_ucx_mkey_t *ucx_mkey; map_segment_t *mem_seg; - MCA_SPML_CALL(fence()); + MCA_SPML_CALL(quiet()); if (!mkeys) return OSHMEM_SUCCESS; if (!mkeys[0].spml_context) return OSHMEM_SUCCESS; - mem_seg = memheap_find_va(mkeys[0].va_base); + mem_seg = memheap_find_va(mkeys[0].va_base); + ucx_mkey = (spml_ucx_mkey_t*)mkeys[0].spml_context; if (MAP_SEGMENT_ALLOC_UCX != mem_seg->type) { - ucx_mkey = (spml_ucx_mkey_t *)mkeys[0].spml_context; ucp_mem_unmap(mca_spml_ucx.ucp_context, ucx_mkey->mem_h); } + ucp_rkey_destroy(ucx_mkey->rkey); if (0 < mkeys[0].len) { ucp_rkey_buffer_release(mkeys[0].u.data); @@ -580,7 +598,7 @@ int mca_spml_ucx_fence(void) { ucs_status_t err; - err = ucp_worker_flush(mca_spml_ucx.ucp_worker); + err = ucp_worker_fence(mca_spml_ucx.ucp_worker); if (UCS_OK != err) { SPML_ERROR("fence failed: %s", ucs_status_string(err)); oshmem_shmem_abort(-1); @@ -595,7 +613,7 @@ int mca_spml_ucx_quiet(void) err = ucp_worker_flush(mca_spml_ucx.ucp_worker); if (UCS_OK != err) { - SPML_ERROR("fence failed: %s", ucs_status_string(err)); + SPML_ERROR("quiet failed: %s", ucs_status_string(err)); oshmem_shmem_abort(-1); return OSHMEM_ERROR; } diff --git a/oshmem/mca/spml/ucx/spml_ucx.h b/oshmem/mca/spml/ucx/spml_ucx.h index b524031d3f2..b57850414bb 100644 --- a/oshmem/mca/spml/ucx/spml_ucx.h +++ b/oshmem/mca/spml/ucx/spml_ucx.h @@ -112,6 +112,7 @@ 
extern void mca_spml_ucx_memuse_hook(void *addr, size_t length); extern void mca_spml_ucx_rmkey_unpack(sshmem_mkey_t *mkey, uint32_t segno, int pe, int tr_id); extern void mca_spml_ucx_rmkey_free(sshmem_mkey_t *mkey); +extern void *mca_spml_ucx_rmkey_ptr(const void *dst_addr, sshmem_mkey_t *, int pe); extern int mca_spml_ucx_add_procs(ompi_proc_t** procs, size_t nprocs); extern int mca_spml_ucx_del_procs(ompi_proc_t** procs, size_t nprocs); diff --git a/oshmem/mca/spml/ucx/spml_ucx_component.c b/oshmem/mca/spml/ucx/spml_ucx_component.c index 42567c3add8..6562184ae63 100644 --- a/oshmem/mca/spml/ucx/spml_ucx_component.c +++ b/oshmem/mca/spml/ucx/spml_ucx_component.c @@ -93,7 +93,7 @@ static inline void mca_spml_ucx_param_register_string(const char* param_name, static int mca_spml_ucx_component_register(void) { - mca_spml_ucx_param_register_int("priority", 5, + mca_spml_ucx_param_register_int("priority", 21, "[integer] ucx priority", &mca_spml_ucx.priority); @@ -126,8 +126,9 @@ static int mca_spml_ucx_component_open(void) } memset(&params, 0, sizeof(params)); - params.field_mask = UCP_PARAM_FIELD_FEATURES; + params.field_mask = UCP_PARAM_FIELD_FEATURES|UCP_PARAM_FIELD_ESTIMATED_NUM_EPS; params.features = UCP_FEATURE_RMA|UCP_FEATURE_AMO32|UCP_FEATURE_AMO64; + params.estimated_num_eps = ompi_proc_world_size(); err = ucp_init(&params, ucp_config, &mca_spml_ucx.ucp_context); ucp_config_release(ucp_config); diff --git a/oshmem/mca/spml/yoda/Makefile.am b/oshmem/mca/spml/yoda/Makefile.am deleted file mode 100644 index e0d48bfdb2f..00000000000 --- a/oshmem/mca/spml/yoda/Makefile.am +++ /dev/null @@ -1,45 +0,0 @@ -# -# Copyright (c) 2013 Mellanox Technologies, Inc. -# All rights reserved.
-# -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# - -dist_oshmemdata_DATA = \ - help-oshmem-spml-yoda.txt - -EXTRA_DIST = post_configure.sh - -AM_CFLAGS = $(btl_sm_CPPFLAGS) - -yoda_sources = \ - spml_yoda.c \ - spml_yoda.h \ - spml_yoda_component.c \ - spml_yoda_component.h \ - spml_yoda_rdmafrag.h \ - spml_yoda_putreq.c \ - spml_yoda_putreq.h \ - spml_yoda_getreq.c \ - spml_yoda_getreq.h - -if MCA_BUILD_ompi_pml_ob1_DSO -component_noinst = -component_install = mca_spml_yoda.la -else -component_noinst = libmca_spml_yoda.la -component_install = -endif - -mcacomponentdir = $(oshmemlibdir) -mcacomponent_LTLIBRARIES = $(component_install) -mca_spml_yoda_la_SOURCES = $(yoda_sources) -mca_spml_yoda_la_LDFLAGS = -module -avoid-version - -noinst_LTLIBRARIES = $(component_noinst) -libmca_spml_yoda_la_SOURCES = $(yoda_sources) -libmca_spml_yoda_la_LDFLAGS = -module -avoid-version diff --git a/oshmem/mca/spml/yoda/help-oshmem-spml-yoda.txt b/oshmem/mca/spml/yoda/help-oshmem-spml-yoda.txt deleted file mode 100644 index ac185cdd3f5..00000000000 --- a/oshmem/mca/spml/yoda/help-oshmem-spml-yoda.txt +++ /dev/null @@ -1,17 +0,0 @@ -# -*- text -*- -# -# Copyright (c) 2013 Mellanox Technologies, Inc. -# All rights reserved. -# $COPYRIGHT$ -# -# Additional copyrights may follow -# -# $HEADER$ -# -[internal_oom_error] -'%s' operation failed. Unable to allocate buffer, need %d bytes. -Try increasing 'spml_yoda_bml_alloc_threshold' value or setting it to '0' to -force waiting for all puts completion. - - spml_yoda_bml_alloc_threshold: %d - diff --git a/oshmem/mca/spml/yoda/post_configure.sh b/oshmem/mca/spml/yoda/post_configure.sh deleted file mode 100644 index d7d3db8278e..00000000000 --- a/oshmem/mca/spml/yoda/post_configure.sh +++ /dev/null @@ -1,4 +0,0 @@ -# Copyright (c) 2013 Mellanox Technologies, Inc. 
-# All rights reserved -# $COPYRIGHT$ -DIRECT_CALL_HEADER="oshmem/mca/spml/yoda/spml_yoda.h" diff --git a/oshmem/mca/spml/yoda/spml_yoda.c b/oshmem/mca/spml/yoda/spml_yoda.c deleted file mode 100644 index ebdceab8c96..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda.c +++ /dev/null @@ -1,1269 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013-2015 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "oshmem_config.h" - -#include "opal/util/show_help.h" -#include "orte/include/orte/types.h" -#include "orte/runtime/orte_globals.h" - -#include "opal/datatype/opal_convertor.h" - -#include "ompi/datatype/ompi_datatype.h" -#include "ompi/mca/pml/pml.h" -#include "opal/mca/btl/btl.h" -#include "opal/mca/btl/base/base.h" -#include "opal/mca/btl/sm/btl_sm_frag.h" - -#include "oshmem/proc/proc.h" -#include "oshmem/mca/memheap/memheap.h" -#include "oshmem/mca/memheap/base/base.h" -#include "oshmem/mca/spml/spml.h" - -#include "spml_yoda.h" -#include "spml_yoda_putreq.h" -#include "spml_yoda_getreq.h" -#ifdef HAVE_UNISTD_H -#include <unistd.h> -#endif -#include "oshmem/runtime/runtime.h" - -/* Turn ON/OFF debug output from build (default 0) */ -#ifndef SPML_YODA_DEBUG -#define SPML_YODA_DEBUG 0 -#endif - -mca_spml_yoda_module_t mca_spml_yoda = { - { - /* Init mca_spml_base_module_t */ - mca_spml_yoda_add_procs, - mca_spml_yoda_del_procs, - mca_spml_yoda_enable, - mca_spml_yoda_register, - mca_spml_yoda_deregister, - mca_spml_base_oob_get_mkeys, - mca_spml_yoda_put, - mca_spml_yoda_put_nb, - mca_spml_yoda_get, - mca_spml_yoda_get_nb, - mca_spml_yoda_recv, - mca_spml_yoda_send, - mca_spml_base_wait, - mca_spml_base_wait_nb, - mca_spml_yoda_fence, -
mca_spml_base_rmkey_unpack, - mca_spml_base_rmkey_free, - mca_spml_base_memuse_hook, - - (void *)&mca_spml_yoda - } -}; - -static inline mca_bml_base_btl_t *get_next_btl(int dst, int *btl_id); - -static inline void spml_yoda_prepare_for_get(void* buffer, size_t size, void* p_src, int dst, void* p_dst, void* p_getreq); - -static int btl_name_to_id(char *btl_name) -{ - if (0 == strcmp(btl_name, "sm")) { - return YODA_BTL_SM; - } else if (0 == strcmp(btl_name, "openib")) { - return YODA_BTL_OPENIB; - } else if (0 == strcmp(btl_name, "self")) { - return YODA_BTL_SELF; - } else if (0 == strcmp(btl_name, "vader")) { - return YODA_BTL_VADER; - } else if (0 == strcmp(btl_name, "ugni")) { - return YODA_BTL_UGNI; - } - return YODA_BTL_UNKNOWN; -} - -static char *btl_type2str(int btl_type) -{ - switch (btl_type) { - case YODA_BTL_UNKNOWN: - return "unknown btl"; - case YODA_BTL_SELF: - return "self"; - case YODA_BTL_OPENIB: - return "openib"; - case YODA_BTL_SM: - return "sm"; - case YODA_BTL_VADER: - return "vader"; - case YODA_BTL_UGNI: - return "ugni"; - } - return "bad_btl_type"; -} - -static inline void calc_nfrags_put (mca_bml_base_btl_t* bml_btl, - size_t size, - unsigned int *frag_size, - int *nfrags, - int use_send) -{ - if (use_send) { - *frag_size = bml_btl->btl->btl_max_send_size - SPML_YODA_SEND_CONTEXT_SIZE; - } - else { - *frag_size = bml_btl->btl->btl_max_send_size; - } - *nfrags = 1 + (size - 1) / (*frag_size); -} - -static inline void calc_nfrags_get (mca_bml_base_btl_t* bml_btl, - size_t size, - unsigned int *frag_size, - int *nfrags, - int use_send) -{ - if (use_send) { - *frag_size = bml_btl->btl->btl_max_send_size - SPML_YODA_SEND_CONTEXT_SIZE; - } - else { - *frag_size = bml_btl->btl->btl_max_send_size; - } - *nfrags = 1 + (size - 1) / (*frag_size); -} - -static int mca_spml_yoda_fence_internal(int puts_wait) -{ - int n_puts_wait; - - /* Waiting for certain number of puts : 'puts_wait' - * if 'puts_wait' == 0 waiting for all puts ('n_active_puts') - * 
if 'puts_wait' > 'n_active_puts' waiting for 'n_active_puts' */ - - n_puts_wait = puts_wait > 0 ? mca_spml_yoda.n_active_puts - puts_wait : 0; - - if (n_puts_wait < 0) { - n_puts_wait = 0; - } - - while (n_puts_wait < mca_spml_yoda.n_active_puts) { - oshmem_request_wait_any_completion(); - } - return OSHMEM_SUCCESS; -} - -static inline void mca_spml_yoda_bml_alloc( mca_bml_base_btl_t* bml_btl, - mca_btl_base_descriptor_t** des, - uint8_t order, size_t size, uint32_t flags, - int use_send) -{ - bool is_done; - bool is_fence_complete; - - is_done = false; - is_fence_complete = false; - - if (use_send) { - size = (0 == size ? size : size + SPML_YODA_SEND_CONTEXT_SIZE); - } - - do { - mca_bml_base_alloc(bml_btl, - des, - MCA_BTL_NO_ORDER, - size, - flags); - - if (OPAL_UNLIKELY(!(*des) || !(*des)->des_segments ) && !is_fence_complete) { - mca_spml_yoda_fence_internal(mca_spml_yoda.bml_alloc_threshold); - - is_fence_complete = true; - } else { - is_done = true; - } - - } while (!is_done); -} - -static inline void spml_yoda_prepare_for_put(void* buffer, size_t size, void* p_src, void* p_dst, int use_send) -{ - if (use_send) { - memcpy((void*) buffer, &size, sizeof(size)); - memcpy((void*) (((char*) buffer) + sizeof(size)), &p_dst, sizeof(void *)); - memcpy((void*) (((char*) buffer) + sizeof(size) + sizeof(void *)), p_src, size); - } - else { - memcpy((void*) ((unsigned char*) buffer), p_src, size); - } -} - -static inline void spml_yoda_prepare_for_get_response(void* buffer, size_t size, void* p_src, void* p_dst, void* p_getreq, int use_send) -{ - if (use_send) { - memcpy((void*) buffer, &size, sizeof(size)); - memcpy((void*) (((char*) buffer) + sizeof(size)), &p_dst, sizeof(void *)); - memcpy((void*) (((char*) buffer) + sizeof(size) + sizeof(void *)), p_src, size); - memcpy((void*) (((char*) buffer) + sizeof(size) + sizeof(void *) + size), &p_getreq, sizeof(void *)); - } - else { - memcpy((void*) ( (unsigned char*) buffer), p_src, size); - } -} - -static inline void 
spml_yoda_prepare_for_get(void* buffer, size_t size, void* p_src, int dst, void* p_dst, void* p_getreq) -{ - memcpy((void*) buffer, &p_src, sizeof(void *)); - memcpy((void*) (((unsigned char*) buffer) + sizeof(void *)), &size, sizeof(size)); - memcpy((void*) (((unsigned char*) buffer) + sizeof(void *) + sizeof(size) ), &dst, sizeof(dst)); - memcpy((void*) (((unsigned char*) buffer) + sizeof(void *) + sizeof(size) + sizeof(dst)), &p_dst, sizeof(void *)); - memcpy((void*) (((unsigned char*) buffer) + sizeof(void *) + sizeof(size) + sizeof(dst) + sizeof(void *)), &p_getreq, sizeof(void *)); -} - -static void mca_yoda_put_callback(mca_btl_base_module_t* btl, - mca_btl_base_tag_t tag, - mca_btl_base_descriptor_t* des, - void* cbdata ) -{ - size_t* size; - void** l_addr; - - size = (size_t *) des->des_segments->seg_addr.pval; - l_addr = (void**) ( ((char*)size) + sizeof(*size)); - memcpy(*l_addr, ((char*)l_addr) + sizeof(*l_addr), *size); -} - -static void mca_yoda_get_callback(mca_btl_base_module_t* btl, - mca_btl_base_tag_t tag, - mca_btl_base_descriptor_t* des, - void* cbdata ) -{ - void** p, ** p_src, **p_dst; - size_t* size; - int* dst; - void** p_getreq; - mca_btl_base_descriptor_t* des_loc; - int rc; - mca_bml_base_btl_t* bml_btl; - mca_spml_yoda_rdma_frag_t* frag; - int btl_id; - mca_spml_yoda_put_request_t *putreq; - - rc = OSHMEM_SUCCESS; - btl_id = 0; - putreq = NULL; - - /* Unpack data */ - p = (void **)des->des_segments->seg_addr.pval; - p_src = (void*) p; - - size = (size_t*)((char*)p_src + sizeof(*p_src) ); - dst = (int*)( (char*)size + sizeof(*size)); - p_dst = (void*) ((char*)dst + sizeof(*dst)); - p_getreq =(void**) ( (char*)p_dst + sizeof(*p_dst)); - - /* Prepare put via send*/ - bml_btl = get_next_btl(*dst, &btl_id); - - putreq = mca_spml_yoda_putreq_alloc(*dst); - frag = &putreq->put_frag; - - mca_spml_yoda_bml_alloc(bml_btl, - &des_loc, - MCA_BTL_NO_ORDER, - *size, - MCA_BTL_DES_SEND_ALWAYS_CALLBACK, - 1); - - if (OPAL_UNLIKELY(!des_loc || 
!des_loc->des_segments)) { - SPML_ERROR("shmem OOM error need %d bytes", (int)*size); - oshmem_shmem_abort(-1); - } - spml_yoda_prepare_for_get_response((void*)des_loc->des_segments->seg_addr.pval, *size, (void*)*p_src, (void*) *p_dst,(void*)*p_getreq,1); - - frag->rdma_req = putreq; - - /* Initialize callback data for put*/ - des_loc->des_cbdata = frag; - des_loc->des_cbfunc = mca_spml_yoda_put_completion; - des_loc->des_segment_count = 1; - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_puts, 1); - - /* Put via send*/ - rc = mca_bml_base_send(bml_btl, des_loc, MCA_SPML_YODA_GET_RESPONSE); - if (1 == rc) { - rc = OSHMEM_SUCCESS; - } - - if (OPAL_UNLIKELY(OSHMEM_SUCCESS != rc)) { - if (OSHMEM_ERR_OUT_OF_RESOURCE == rc) { - /* No free resources, Block on completion here */ - SPML_ERROR("shmem error: OSHMEM_ERR_OUT_OF_RESOURCE"); - oshmem_request_wait_completion(&putreq->req_put.req_base.req_oshmem); - } else { - SPML_ERROR("shmem error"); - } - /* exit with errro */ - SPML_ERROR("shmem error: ret = %i, send_pe = %i, dest_pe = %i", - rc, oshmem_my_proc_id(), *dst); - oshmem_shmem_abort(-1); - rc = OSHMEM_ERROR; - } -} - -static void mca_yoda_get_response_callback(mca_btl_base_module_t* btl, - mca_btl_base_tag_t tag, - mca_btl_base_descriptor_t* des, - void* cbdata ) -{ - size_t* size; - void** l_addr; - mca_spml_yoda_get_request_t* getreq; - - /* unpacking data*/ - size = (size_t *) ( ((char*)des->des_segments->seg_addr.pval) ); - l_addr = (void**)( ((char*)size) + sizeof(*size)); - getreq = (mca_spml_yoda_get_request_t*)*(void**)((char*)l_addr + sizeof(*l_addr) + *size); - - /* Complete get request*/ - OPAL_THREAD_ADD32(&getreq->parent->active_count, -1); - getreq->req_get.req_base.req_spml_complete = true; - oshmem_request_complete(&getreq->req_get.req_base.req_oshmem, 1); - oshmem_request_free((oshmem_request_t**) &getreq); - - memcpy(*l_addr, (char*)l_addr + sizeof(*l_addr), *size); -} - -/** - * note: we have to reg memory directly with btl because no proc will 
have a full btl list in proc_bml - */ -int mca_spml_yoda_deregister(sshmem_mkey_t *mkeys) -{ - int i; - struct yoda_btl *ybtl; - mca_spml_yoda_context_t* yoda_context; - - MCA_SPML_CALL(fence()); - mca_spml_yoda_wait_gets(); - - if (!mkeys) { - return OSHMEM_SUCCESS; - } - - for (i = 0; i < mca_spml_yoda.n_btls; i++) { - ybtl = &mca_spml_yoda.btl_type_map[i]; - yoda_context = (mca_spml_yoda_context_t*) mkeys[i].spml_context; - if (NULL == yoda_context) { - continue; - } - if (yoda_context->btl_src_descriptor) { - ybtl->btl->btl_free(ybtl->btl, yoda_context->btl_src_descriptor); - yoda_context->btl_src_descriptor = NULL; - } - if (yoda_context->registration) { - ybtl->btl->btl_deregister_mem (ybtl->btl, yoda_context->registration); - } - - } - free(mkeys); - - return OSHMEM_SUCCESS; -} - -sshmem_mkey_t *mca_spml_yoda_register(void* addr, - size_t size, - uint64_t shmid, - int *count) -{ - int i; - sshmem_mkey_t *mkeys; - struct yoda_btl *ybtl; - mca_spml_yoda_context_t* yoda_context; - - SPML_VERBOSE(10, "address %p len %llu", addr, (unsigned long long)size); - *count = 0; - /* make sure everything is initialized to 0 */ - mkeys = (sshmem_mkey_t *) calloc(1, - mca_spml_yoda.n_btls * sizeof(*mkeys)); - if (!mkeys) { - return NULL ; - } - - mca_bml.bml_register( MCA_SPML_YODA_PUT, - mca_yoda_put_callback, - NULL ); - mca_bml.bml_register( MCA_SPML_YODA_GET, - mca_yoda_get_callback, - NULL ); - mca_bml.bml_register( MCA_SPML_YODA_GET_RESPONSE, - mca_yoda_get_response_callback, - NULL ); - /* Register proc memory in every rdma BTL. */ - for (i = 0; i < mca_spml_yoda.n_btls; i++) { - - ybtl = &mca_spml_yoda.btl_type_map[i]; - mkeys[i].va_base = addr; - mkeys[i].u.key = MAP_SEGMENT_SHM_INVALID; - - if (!ybtl->use_cnt) { - SPML_VERBOSE(10, - "%s: present but not in use. 
SKIP registration", - btl_type2str(ybtl->btl_type)); - continue; - } - - /* If we have shared memory just save its id */ - if ((YODA_BTL_SM == ybtl->btl_type || YODA_BTL_VADER == ybtl->btl_type) - && MAP_SEGMENT_SHM_INVALID != (int)shmid) { - mkeys[i].u.key = shmid; - mkeys[i].va_base = 0; - continue; - } - - yoda_context = calloc(1, sizeof(*yoda_context)); - mkeys[i].spml_context = yoda_context; - - yoda_context->registration = NULL; - if (ybtl->btl->btl_flags & MCA_BTL_FLAGS_RDMA) { - if (NULL != ybtl->btl->btl_register_mem) { - yoda_context->registration = ybtl->btl->btl_register_mem (ybtl->btl, MCA_BTL_ENDPOINT_ANY, - addr, size, MCA_BTL_REG_FLAG_ACCESS_ANY); - if (NULL == yoda_context->registration) { - SPML_ERROR("%s: failed to register source memory: addr: %p, size: %u", - btl_type2str(ybtl->btl_type), addr, size); - /* FIXME some cleanup might be needed here - * yoda_context->btl_src_descriptor = NULL; - * *count = ???; - * free(spml_context); - */ - free(mkeys); - return NULL; - } - } - - yoda_context->btl_src_descriptor = NULL; - mkeys[i].u.data = yoda_context->registration; - mkeys[i].len = yoda_context->registration ? ybtl->btl->btl_registration_handle_size : 0; - } - - SPML_VERBOSE(5, - "rank %d btl %s va_base: 0x%p len: %d key %llx size %llu", - oshmem_proc_pe(oshmem_proc_local()), btl_type2str(ybtl->btl_type), - mkeys[i].va_base, mkeys[i].len, (unsigned long long)mkeys[i].u.key, (unsigned long long)size); - } - *count = mca_spml_yoda.n_btls; - return mkeys; -} - -/* - * For each proc setup a datastructure that indicates the BTLs - * that can be used to reach the destination. 
- */ -static void mca_spml_yoda_error_handler(struct mca_btl_base_module_t* btl, - int32_t flags, - opal_proc_t* errproc, - char* btlinfo) -{ - oshmem_shmem_abort(-1); -} - -/* make global btl list&map */ -static int create_btl_list(void) -{ - int btl_type; - char *btl_name; - int size; - opal_list_item_t *item; - mca_btl_base_selected_module_t *btl_sm; - int i; - - size = opal_list_get_size(&mca_btl_base_modules_initialized); - if (0 >= size) { - SPML_ERROR("no btl(s) available"); - return OSHMEM_ERROR; - } - SPML_VERBOSE(50, "found %d capable btls", size); - - mca_spml_yoda.btl_type_map = - (struct yoda_btl *) calloc(size, sizeof(struct yoda_btl)); - if (!mca_spml_yoda.btl_type_map) - return OSHMEM_ERROR; - - mca_spml_yoda.n_btls = 0; - for (i = 0, item = opal_list_get_first(&mca_btl_base_modules_initialized); - item != opal_list_get_end(&mca_btl_base_modules_initialized); - item = opal_list_get_next(item), i++) { - - btl_sm = (mca_btl_base_selected_module_t *) item; - btl_name = btl_sm->btl_component->btl_version.mca_component_name; - btl_type = btl_name_to_id(btl_name); - - SPML_VERBOSE(50, "found btl (%s) btl_type=%s", btl_name, btl_type2str(btl_type)); - - /* Note: we setup bml_btl in create_btl_idx() */ - mca_spml_yoda.btl_type_map[mca_spml_yoda.n_btls].bml_btl = NULL; - mca_spml_yoda.btl_type_map[mca_spml_yoda.n_btls].btl = - btl_sm->btl_module; - mca_spml_yoda.btl_type_map[mca_spml_yoda.n_btls].btl_type = btl_type; - mca_spml_yoda.n_btls++; - } - - if (0 == mca_spml_yoda.n_btls) { - SPML_ERROR("can not find any suitable btl"); - return OSHMEM_ERROR; - } - - return OSHMEM_SUCCESS; -} - -static int _find_btl_id(mca_bml_base_btl_t *bml_btl) -{ - int i; - - for (i = 0; i < mca_spml_yoda.n_btls; i++) { - if (mca_spml_yoda.btl_type_map[i].btl == bml_btl->btl) - return i; - } - return -1; -} - -/* for each proc create transport ids which are indexes into global - * btl list&map - */ -static int create_btl_idx(int dst_pe) -{ - ompi_proc_t *proc; - int btl_id; - 
mca_bml_base_endpoint_t* endpoint; - mca_bml_base_btl_t* bml_btl = 0; - int i, size; - mca_bml_base_btl_array_t *btl_array; - int shmem_index = -1; - - proc = oshmem_proc_group_find(oshmem_group_all, dst_pe); - endpoint = (mca_bml_base_endpoint_t*) proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]; - assert(endpoint); - size = mca_bml_base_btl_array_get_size(btl_array = &endpoint->btl_rdma); - - if (0 >= size) { - /* Possibly this is SM BTL with KNEM disabled? Then we should use send based get/put */ - /* - This hack is necessary for the case when KNEM is not available. - In this case we still want to use send/recv of SM BTL for put and get - but SM BTL is not in the rdma list anymore - */ - size = mca_bml_base_btl_array_get_size(btl_array = - &endpoint->btl_eager); - if (0 < size) { - /*Chose SHMEM capable btl from eager array. Not filter now: take the first - (but could appear on demand).*/ - shmem_index = 0; - size = 1; - } - else { - SPML_ERROR("no SHMEM capable transport for dest pe=%d", dst_pe); - return OSHMEM_ERROR; - } - } - - OSHMEM_PROC_DATA(proc)->transport_ids = (char *) malloc(size * sizeof(char)); - if (NULL == OSHMEM_PROC_DATA(proc)->transport_ids) - return OSHMEM_ERROR; - - OSHMEM_PROC_DATA(proc)->num_transports = size; - - for (i = 0; i < size; i++) { - bml_btl = mca_bml_base_btl_array_get_index(btl_array, - (shmem_index >= 0) ? 
- (shmem_index) : (i)); - btl_id = _find_btl_id(bml_btl); - SPML_VERBOSE(50, - "dst_pe(%d) use btl (%s) btl_id=%d", - dst_pe, bml_btl->btl->btl_component->btl_version.mca_component_name, btl_id); - if (0 > btl_id) { - SPML_ERROR("unknown btl: dst_pe(%d) use btl (%s) btl_id=%d", - dst_pe, bml_btl->btl->btl_component->btl_version.mca_component_name, btl_id); - return OSHMEM_ERROR; - } - OSHMEM_PROC_DATA(proc)->transport_ids[i] = btl_id; - mca_spml_yoda.btl_type_map[btl_id].bml_btl = bml_btl; - mca_spml_yoda.btl_type_map[btl_id].use_cnt++; - } - return OSHMEM_SUCCESS; -} - -static int destroy_btl_list(void) -{ - if (mca_spml_yoda.btl_type_map) { - free(mca_spml_yoda.btl_type_map); - } - - return OSHMEM_SUCCESS; -} - -static int destroy_btl_idx(int dst_pe) -{ - ompi_proc_t *proc; - - proc = oshmem_proc_group_find(oshmem_group_all, dst_pe); - if (NULL != OSHMEM_PROC_DATA(proc)->transport_ids) { - free(OSHMEM_PROC_DATA(proc)->transport_ids); - } - - return OSHMEM_SUCCESS; -} - -int mca_spml_yoda_add_procs(ompi_proc_t** procs, size_t nprocs) -{ - opal_bitmap_t reachable; - int rc; - size_t i; - - if (0 == nprocs) { - return OSHMEM_SUCCESS; - } - - OBJ_CONSTRUCT(&reachable, opal_bitmap_t); - rc = opal_bitmap_init(&reachable, (int) nprocs); - if (OSHMEM_SUCCESS != rc) { - return rc; - } - - rc = mca_bml.bml_register_error(mca_spml_yoda_error_handler); - if (OMPI_SUCCESS != rc) { - goto cleanup_and_return; - } - - /* create_btl_idx requires the proc was add_proc'ed, so do it now */ - rc = MCA_PML_CALL(add_procs(procs, nprocs)); - if (OMPI_SUCCESS != rc) { - goto cleanup_and_return; - } - - /* create btl index and map */ - rc = create_btl_list(); - if (OSHMEM_SUCCESS != rc) { - goto cleanup_and_return; - } - - for (i = 0; i < nprocs; i++) { - rc = create_btl_idx(i); - if (OSHMEM_SUCCESS != rc) { - goto cleanup_and_return; - } - } - -cleanup_and_return: - OBJ_DESTRUCT(&reachable); - - return rc; -} - -int mca_spml_yoda_del_procs(ompi_proc_t** procs, size_t nprocs) -{ - size_t 
i; - - for (i = 0; i < nprocs; i++) { - destroy_btl_idx(i); - } - destroy_btl_list(); - - return OSHMEM_SUCCESS; -} - -static inline mca_bml_base_btl_t *get_next_btl(int dst, int *btl_id) -{ - mca_bml_base_endpoint_t* endpoint; - mca_bml_base_btl_t* bml_btl = NULL; - ompi_proc_t *proc; - mca_bml_base_btl_array_t *btl_array = 0; - int shmem_index = -1; - int size = 0; - - /* get endpoint and btl */ - proc = oshmem_proc_group_all(dst); - if (!proc) { - SPML_ERROR("Can not find destination proc for pe=%d", dst); - return NULL ; - } - - endpoint = (mca_bml_base_endpoint_t*) proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]; - if (!endpoint) { - SPML_ERROR("pe=%d proc has no endpoint", dst); - return NULL ; - } - - /* At the moment always return first transport */ - size = mca_bml_base_btl_array_get_size(btl_array = &endpoint->btl_rdma); - - if (0 >= size) { - /* Possibly this is SM BTL with KNEM disabled? Then we should use send based get/put */ - /* - This hack is necessary for the case when KNEM is not available. 
- In this case we still want to use send/recv of SM BTL for put and get - but SM BTL is not in the rdma list anymore - */ - size = mca_bml_base_btl_array_get_size(btl_array = - &endpoint->btl_eager); - } - if (0 < size) { - shmem_index = 0; - bml_btl = mca_bml_base_btl_array_get_index(btl_array, shmem_index); - } - - *btl_id = OSHMEM_PROC_DATA(proc)->transport_ids[0]; - -#if SPML_YODA_DEBUG == 1 - assert(*btl_id >= 0 && *btl_id < YODA_BTL_MAX); - SPML_VERBOSE(100, "pe=%d reachable via btl %s %d", dst, - bml_btl->btl->btl_component->btl_version.mca_component_name, *btl_id); -#endif - return bml_btl; -} - -static inline int mca_spml_yoda_put_internal(void *dst_addr, - size_t size, - void *src_addr, - int dst, - int is_nb) -{ - int rc = OSHMEM_SUCCESS; - mca_spml_yoda_put_request_t *putreq = NULL; - mca_bml_base_btl_t* bml_btl; - mca_btl_base_descriptor_t* des = NULL; - mca_btl_base_segment_t* segment; - mca_spml_yoda_rdma_frag_t* frag; - int nfrags; - int i; - unsigned ncopied = 0; - unsigned int frag_size = 0; - char *p_src, *p_dst; - void* rva; - sshmem_mkey_t *r_mkey; - int btl_id = 0; - struct yoda_btl *ybtl; - int put_via_send; - mca_btl_base_registration_handle_t *local_handle = NULL, *remote_handle = NULL; - - /* If nothing to put its OK.*/ - if (0 >= size) { - return OSHMEM_SUCCESS; - } - - /* Find bml_btl and its global btl_id */ - bml_btl = get_next_btl(dst, &btl_id); - if (!bml_btl) { - SPML_ERROR("cannot reach %d pe: no appropriate btl found", oshmem_my_proc_id()); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - /* Check if btl has PUT method. 
If it doesn't - use SEND*/ - put_via_send = !(bml_btl->btl->btl_flags & MCA_BTL_FLAGS_PUT); - - /* Get rkey of remote PE (dst proc) which must be on memheap*/ - r_mkey = mca_memheap_base_get_cached_mkey(dst, dst_addr, btl_id, &rva); - if (!r_mkey) { - SPML_ERROR("pe=%d: %p is not address of shared variable", - dst, dst_addr); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - -#if SPML_YODA_DEBUG == 1 - SPML_VERBOSE(100, "put: pe:%d dst=%p <- src: %p sz=%d. dst_rva=%p, %s", - dst, dst_addr, src_addr, (int)size, (void *)rva, mca_spml_base_mkey2str(r_mkey)); -#endif - - ybtl = &mca_spml_yoda.btl_type_map[btl_id]; - - if (ybtl->btl->btl_register_mem) { - assert (r_mkey->len == ybtl->btl->btl_registration_handle_size); - remote_handle = (mca_btl_base_registration_handle_t *) r_mkey->u.data; - } - - /* check if we doing put into shm attached segment and if so - * just do memcpy - */ - if ((YODA_BTL_SM == ybtl->btl_type || YODA_BTL_VADER == ybtl->btl_type) - && mca_memheap_base_can_local_copy(r_mkey, dst_addr)) { - memcpy((void *) (unsigned long) rva, src_addr, size); - return OSHMEM_SUCCESS; - } - - /* We support only blocking PUT now => we always need copy for src buffer*/ - calc_nfrags_put (bml_btl, size, &frag_size, &nfrags, put_via_send); - - p_src = (char*) src_addr; - p_dst = (char*) (unsigned long) rva; - for (i = 0; i < nfrags; i++) { - /* Allocating send request from free list */ - putreq = mca_spml_yoda_putreq_alloc(dst); - frag = &putreq->put_frag; - ncopied = i < nfrags - 1 ? 
frag_size :(unsigned) ((char *) src_addr + size - p_src); - - /* Preparing source buffer */ - - /* allocate buffer */ - mca_spml_yoda_bml_alloc(bml_btl, - &des, - MCA_BTL_NO_ORDER, - ncopied, - MCA_BTL_DES_SEND_ALWAYS_CALLBACK, - put_via_send); - - if (OPAL_UNLIKELY(!des || !des->des_segments )) { - SPML_ERROR("src=%p nfrags = %d frag_size=%d", - src_addr, nfrags, frag_size); - SPML_ERROR("shmem OOM error need %d bytes", ncopied); - opal_show_help("help-oshmem-spml-yoda.txt", - "internal_oom_error", - true, - "Put", ncopied, mca_spml_yoda.bml_alloc_threshold); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - - /* copy data to allocated buffer*/ - segment = des->des_segments; - spml_yoda_prepare_for_put((void*)segment->seg_addr.pval, ncopied, - (void*)p_src, (void*)p_dst, put_via_send); - - if (!put_via_send && ybtl->btl->btl_register_mem) { - local_handle = ybtl->btl->btl_register_mem (ybtl->btl, bml_btl->btl_endpoint, - segment->seg_addr.pval, ncopied, 0); - if (NULL == local_handle) { - /* No free resources, Block on completion here */ - SPML_ERROR("shmem error: OSHMEM_ERR_OUT_OF_RESOURCE"); - oshmem_request_wait_completion(&putreq->req_put.req_base.req_oshmem); - } - } - - frag->rdma_segs[0].base_seg.seg_addr.lval = (uintptr_t) p_dst; - frag->rdma_segs[0].base_seg.seg_len = (put_via_send ? 
- ncopied + SPML_YODA_SEND_CONTEXT_SIZE : - ncopied); - frag->rdma_req = putreq; - - /* initialize callback data for put*/ - des->des_cbdata = frag; - des->des_cbfunc = mca_spml_yoda_put_completion; - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_puts, 1); - /* put the data to remote side */ - if (!put_via_send) { - rc = mca_bml_base_put (bml_btl, segment->seg_addr.pval, (uint64_t) (intptr_t) p_dst, - local_handle, remote_handle, ncopied, 0, 0, mca_spml_yoda_put_completion_rdma, - des); - } else { - rc = mca_bml_base_send(bml_btl, des, MCA_SPML_YODA_PUT); - if (1 == rc) - rc = OSHMEM_SUCCESS; - } - - if (OPAL_UNLIKELY(OSHMEM_SUCCESS != rc)) { - if (OSHMEM_ERR_OUT_OF_RESOURCE == rc) { - /* No free resources, Block on completion here */ - SPML_ERROR("shmem error: OSHMEM_ERR_OUT_OF_RESOURCE"); - oshmem_request_wait_completion(&putreq->req_put.req_base.req_oshmem); - } else { - SPML_ERROR("shmem error"); - } - /* exit with errro */ - SPML_ERROR("shmem error: ret = %i, send_pe = %i, dest_pe = %i", - rc, oshmem_my_proc_id(), dst); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - p_src += ncopied; - p_dst += ncopied; - } - - return rc; - -exit_fatal: - if (OSHMEM_SUCCESS != rc) { - oshmem_shmem_abort(rc); - } - return rc; -} - -int mca_spml_yoda_put(void *dst_addr, size_t size, void *src_addr, int dst) -{ - return mca_spml_yoda_put_internal(dst_addr, size, src_addr, dst, 0); -} - -int mca_spml_yoda_put_nb(void* dst_addr, - size_t size, - void* src_addr, - int dst, - void **handle) -{ - UNREFERENCED_PARAMETER(handle); - - /* TODO: real nonblocking operation is needed - */ - return mca_spml_yoda_put_internal(dst_addr, size, src_addr, dst, 1); -} - -int mca_spml_yoda_fence(void) -{ - return mca_spml_yoda_fence_internal(0); -} - -int mca_spml_yoda_wait_gets(void) -{ - - while (0 < mca_spml_yoda.n_active_gets) { - opal_progress(); - } - return OSHMEM_SUCCESS; -} - - -int mca_spml_yoda_enable(bool enable) -{ - SPML_VERBOSE(50, "*** yoda ENABLED ****"); - if (false == enable) { - 
return OSHMEM_SUCCESS; - } - - OBJ_CONSTRUCT(&mca_spml_yoda.lock, opal_mutex_t); - - /** - *If we get here this is the SPML who get selected for the run. We - * should get ownership for the put and get requests list, and - * initialize them with the size of our own requests. - */ - - opal_free_list_init (&mca_spml_base_put_requests, - sizeof(mca_spml_yoda_put_request_t), - opal_cache_line_size, - OBJ_CLASS(mca_spml_yoda_put_request_t), - 0, - opal_cache_line_size, - mca_spml_yoda.free_list_num, - mca_spml_yoda.free_list_max, - mca_spml_yoda.free_list_inc, - NULL, 0, NULL, NULL, NULL); - - opal_free_list_init (&mca_spml_base_get_requests, - sizeof(mca_spml_yoda_get_request_t), - opal_cache_line_size, - OBJ_CLASS(mca_spml_yoda_get_request_t), - 0, - opal_cache_line_size, - mca_spml_yoda.free_list_num, - mca_spml_yoda.free_list_max, - mca_spml_yoda.free_list_inc, - NULL, 0, NULL, NULL, NULL); - - mca_spml_yoda.enabled = true; - - /* The following line resolves the issue with BTL tcp and SPML yoda. In this case the - * atomic_basic_lock(root_rank) function may behave as DoS attack on root_rank, since - * all the procceses will do shmem_int_get from root_rank. These calls would go through - * bml active messaging and will trigger replays in libevent on root rank. If the flag - * OPAL_ENVLOOP_ONCE is not set then libevent will continously progress constantly - * incoming events thus causing root_rank to stuck in libevent loop. - */ - opal_progress_set_event_flag(OPAL_EVLOOP_NONBLOCK | OPAL_EVLOOP_ONCE); - -#if OSHMEM_WAIT_COMPLETION_DEBUG == 1 - condition_dbg_init(); -#endif - - return OSHMEM_SUCCESS; -} - -int mca_spml_yoda_get_nb(void* src_addr, - size_t size, - void* dst_addr, - int src, - void **handle) -{ - /* TODO: real nonblocking operation is needed - */ - return mca_spml_yoda_get(src_addr, size, dst_addr, src); -} - -/** - * shmem_get reads data from a remote address - * in the symmetric heap via RDMA READ. - * Get operation: - * 1. 
Get the rkey to the remote address. - * 2. Allocate a get request. - * 3. Allocated a temporary pre-registered buffer - * to copy the data to. - * 4. Init the request descriptor with remote side - * data and local side data. - * 5. Read the remote buffer to a pre-registered - * buffer on the local PE using RDMA READ. - * 6. Copy the received data to dst_addr if an - * intermediate pre-register buffer was used. - * 7. Clear the request and return. - * - * src_addr - address on remote pe. - * size - the amount on bytes to be read. - * dst_addr - address on the local pe. - * src - the pe of remote process. - */ -int mca_spml_yoda_get(void* src_addr, size_t size, void* dst_addr, int src) -{ - int rc = OSHMEM_SUCCESS; - sshmem_mkey_t *r_mkey, *l_mkey; - void* rva; - unsigned ncopied = 0; - unsigned int frag_size = 0; - char *p_src, *p_dst; - int i; - int nfrags; - mca_bml_base_btl_t* bml_btl = NULL; - mca_btl_base_segment_t* segment; - mca_btl_base_descriptor_t* des = NULL; - mca_spml_yoda_rdma_frag_t* frag = NULL; - struct mca_spml_yoda_getreq_parent get_holder; - struct yoda_btl *ybtl; - int btl_id = 0; - int get_via_send; - mca_btl_base_registration_handle_t *local_handle, *remote_handle = NULL; - mca_spml_yoda_get_request_t* getreq = NULL; - - /*If nothing to get its OK.*/ - if (0 >= size) { - return rc; - } - - /* Find bml_btl and its global btl_id */ - bml_btl = get_next_btl(src, &btl_id); - if (!bml_btl) { - SPML_ERROR("cannot reach %d pe: no appropriate btl found", oshmem_my_proc_id()); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - /* Check if btl has GET method. If it doesn't - use SEND*/ - get_via_send = ! 
( (bml_btl->btl->btl_flags & (MCA_BTL_FLAGS_GET)) && - (bml_btl->btl->btl_flags & (MCA_BTL_FLAGS_PUT)) ); - - /* Get rkey of remote PE (src proc) which must be on memheap*/ - r_mkey = mca_memheap_base_get_cached_mkey(src, src_addr, btl_id, &rva); - if (!r_mkey) { - SPML_ERROR("pe=%d: %p is not address of shared variable", - src, src_addr); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - -#if SPML_YODA_DEBUG == 1 - SPML_VERBOSE(100, "get: pe:%d src=%p -> dst: %p sz=%d. src_rva=%p, %s", - src, src_addr, dst_addr, (int)size, (void *)rva, mca_spml_base_mkey2str(r_mkey)); -#endif - - ybtl = &mca_spml_yoda.btl_type_map[btl_id]; - - if (ybtl->btl->btl_register_mem) { - assert(ybtl->btl->btl_registration_handle_size == r_mkey->len); - remote_handle = (mca_btl_base_registration_handle_t *) r_mkey->u.data; - } - - nfrags = 1; - - /* check if we doing get into shm attached segment and if so - * just do memcpy - */ - if ((YODA_BTL_SM == ybtl->btl_type || YODA_BTL_VADER == ybtl->btl_type) - && mca_memheap_base_can_local_copy(r_mkey, src_addr)) { - memcpy(dst_addr, (void *) rva, size); - /* must call progress here to avoid deadlock. Scenarion: - * pe1 pols pe2 via shm get. pe2 tries to get static variable from node one, which goes to sm btl - * In this case pe2 is stuck forever because pe1 never calls opal_progress. - * May be we do not need to call progress on every get() here but rather once in a while. - */ - opal_progress(); - return OSHMEM_SUCCESS; - } - - l_mkey = mca_memheap.memheap_get_local_mkey(dst_addr, - btl_id); - /* - * Need a copy if local memory has not been registered or - * we make GET via SEND - */ - frag_size = ncopied; - if ((NULL == l_mkey) || get_via_send) { - calc_nfrags_get (bml_btl, size, &frag_size, &nfrags, get_via_send); - } - - p_src = (char*) (unsigned long) rva; - p_dst = (char*) dst_addr; - get_holder.active_count = 0; - - for (i = 0; i < nfrags; i++) { - /** - * Allocating a get request from a pre-allocated - * and pre-registered free list. 
- */ - getreq = mca_spml_yoda_getreq_alloc(src); - assert(getreq); - getreq->p_dst = NULL; - frag = &getreq->get_frag; - getreq->parent = &get_holder; - - ncopied = i < nfrags - 1 ? frag_size :(unsigned) ((char *) dst_addr + size - p_dst); - frag->allocated = 0; - /* Prepare destination descriptor*/ - memcpy(&frag->rdma_segs[0].base_seg, - r_mkey->u.data, - r_mkey->len); - - frag->rdma_segs[0].base_seg.seg_len = (get_via_send ? ncopied + SPML_YODA_SEND_CONTEXT_SIZE : ncopied); - if (get_via_send) { - frag->use_send = 1; - frag->allocated = 1; - /** - * Allocate a temporary buffer on the local PE. - * The local buffer will store the data read - * from the remote address. - */ - mca_spml_yoda_bml_alloc(bml_btl, - &des, - MCA_BTL_NO_ORDER, - (int)frag_size, - MCA_BTL_DES_SEND_ALWAYS_CALLBACK, - get_via_send); - if (OPAL_UNLIKELY(!des || !des->des_segments)) { - SPML_ERROR("shmem OOM error need %d bytes", ncopied); - SPML_ERROR("src=%p nfrags = %d frag_size=%d", - src_addr, nfrags, frag_size); - rc = OSHMEM_ERR_FATAL; - goto exit_fatal; - } - - segment = des->des_segments; - spml_yoda_prepare_for_get((void*)segment->seg_addr.pval, ncopied, (void*)p_src, oshmem_my_proc_id(), (void*)p_dst, (void*) getreq); - des->des_cbfunc = mca_spml_yoda_get_response_completion; - des->des_cbdata = frag; - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_gets, 1); - } - else { - /* - * Register src memory if do GET via GET - */ - if (NULL == l_mkey && ybtl->btl->btl_register_mem) { - local_handle = ybtl->btl->btl_register_mem (ybtl->btl, bml_btl->btl_endpoint, p_dst, ncopied, - MCA_BTL_REG_FLAG_LOCAL_WRITE); - - if (NULL == local_handle) { - SPML_ERROR("%s: failed to register destination memory %p.", - btl_type2str(ybtl->btl_type), p_dst); - } - - frag->local_handle = local_handle; - } else if (NULL == l_mkey) { - local_handle = NULL; - frag->local_handle = NULL; - } else { - local_handle = ((mca_spml_yoda_context_t*)l_mkey->spml_context)->registration; - frag->local_handle = NULL; - } - - 
frag->rdma_segs[0].base_seg.seg_addr.lval = (uintptr_t) p_src; - getreq->p_dst = (uint64_t*) p_dst; - frag->size = ncopied; - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_gets, 1); - } - - /** - * Initialize the remote data fragment - * with remote address data required for - * executing RDMA READ from a remote buffer. - */ - - frag->rdma_req = getreq; - - /** - * Do GET operation - */ - if (get_via_send) { - rc = mca_bml_base_send(bml_btl, des, MCA_SPML_YODA_GET); - if (1 == rc) - rc = OSHMEM_SUCCESS; - } else { - rc = mca_bml_base_get(bml_btl, p_dst, (uint64_t) (intptr_t) p_src, local_handle, - remote_handle, ncopied, 0, 0, mca_spml_yoda_get_completion, frag); - } - - if (OPAL_UNLIKELY(OSHMEM_SUCCESS != rc)) { - if (OSHMEM_ERR_OUT_OF_RESOURCE == rc) { - /* No free resources, Block on completion here */ - oshmem_request_wait_completion(&getreq->req_get.req_base.req_oshmem); - return OSHMEM_SUCCESS; - } else { - SPML_ERROR("oshmem_get: error %d", rc); - goto exit_fatal; - } - } - p_dst += ncopied; - p_src += ncopied; - OPAL_THREAD_ADD32(&get_holder.active_count, 1); - } - - /* revisit if we really need this for self and sm */ - /* if (YODA_BTL_SELF == ybtl->btl_type) */ - opal_progress(); - - /* Wait for completion on request */ - while (get_holder.active_count > 0) - oshmem_request_wait_completion(&getreq->req_get.req_base.req_oshmem); - - return rc; - -exit_fatal: - if (OSHMEM_SUCCESS != rc) { - oshmem_shmem_abort(rc); - } - return rc; -} - -int mca_spml_yoda_send(void* buf, - size_t size, - int dst, - mca_spml_base_put_mode_t sendmode) -{ - int rc = OSHMEM_SUCCESS; - - rc = MCA_PML_CALL(send(buf, - size, - &(ompi_mpi_unsigned_char.dt), - dst, - 0, - (mca_pml_base_send_mode_t)sendmode, - &(ompi_mpi_comm_world.comm))); - - return rc; -} - -int mca_spml_yoda_recv(void* buf, size_t size, int src) -{ - int rc = OSHMEM_SUCCESS; - - rc = MCA_PML_CALL(recv(buf, - size, - &(ompi_mpi_unsigned_char.dt), - src, - 0, - &(ompi_mpi_comm_world.comm), - NULL)); - - return rc; -} 
- diff --git a/oshmem/mca/spml/yoda/spml_yoda.h b/oshmem/mca/spml/yoda/spml_yoda.h deleted file mode 100644 index 13c6cac4e56..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda.h +++ /dev/null @@ -1,150 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * Copyright (c) 2016 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -/** - * @file - */ - -#ifndef MCA_SPML_YODA_H -#define MCA_SPML_YODA_H - -#include "oshmem_config.h" -#include "oshmem/request/request.h" -#include "oshmem/mca/spml/spml.h" -#include "oshmem/util/oshmem_util.h" -#include "oshmem/mca/spml/base/spml_base_putreq.h" -#include "oshmem/proc/proc.h" -#include "oshmem/mca/spml/base/spml_base_request.h" -#include "oshmem/mca/spml/base/spml_base_getreq.h" - -#include "orte/runtime/orte_globals.h" - -#include "ompi/mca/bml/base/base.h" -#include "opal/mca/btl/btl.h" -#include "opal/class/opal_free_list.h" - -/* Turn ON/OFF debug output from build (default 0) */ -#ifndef OSHMEM_WAIT_COMPLETION_DEBUG -#define OSHMEM_WAIT_COMPLETION_DEBUG 0 -#endif - -#define MCA_SPML_YODA_PUT (MCA_BTL_TAG_USR + 0x0A) -#define MCA_SPML_YODA_GET (MCA_BTL_TAG_USR + 0x0B) -#define MCA_SPML_YODA_GET_RESPONSE (MCA_BTL_TAG_USR + 0x0C) - -#define SPML_YODA_SEND_CONTEXT_SIZE (sizeof(size_t) + 3*sizeof(void*) + sizeof(int)) -BEGIN_C_DECLS - -/** - * YODA SPML module - */ - -enum { - YODA_BTL_UNKNOWN = -1, - YODA_BTL_SELF = 0, - YODA_BTL_SM, - YODA_BTL_OPENIB, - YODA_BTL_VADER, - YODA_BTL_UGNI, - YODA_BTL_MAX -}; - -struct yoda_btl { - mca_btl_base_module_t *btl; - mca_bml_base_btl_t *bml_btl; - int btl_type; - int use_cnt; -}; - -struct mca_spml_yoda_t { - mca_spml_base_module_t super; - - int priority; - int free_list_num; /* 
initial size of free list */ - int free_list_max; /* maximum size of free list */ - int free_list_inc; /* number of elements to grow free list */ - int bml_alloc_threshold; /* number of puts to wait - in case of put/get temporary buffer allocation failture */ - - /* lock queue access */ - opal_mutex_t lock; - - /* free lists */ - opal_free_list_t rdma_frags; - /* number of outstanding put requests */ - int32_t n_active_puts; - int32_t n_active_gets; - bool enabled; - struct yoda_btl *btl_type_map; - int n_btls; -}; -typedef struct mca_spml_yoda_t mca_spml_yoda_module_t; - -struct mca_spml_yoda_context_t { - mca_btl_base_descriptor_t* btl_src_descriptor; - mca_btl_base_registration_handle_t *registration; -}; -typedef struct mca_spml_yoda_context_t mca_spml_yoda_context_t; - -extern mca_spml_yoda_module_t mca_spml_yoda; - -extern int mca_spml_yoda_enable(bool enable); -extern int mca_spml_yoda_get(void* dst_addr, - size_t size, - void* src_addr, - int src); -extern int mca_spml_yoda_get_nb(void* dst_addr, - size_t size, - void* src_addr, - int dst, - void **handle); -extern int mca_spml_yoda_put(void* dst_addr, - size_t size, - void* src_addr, - int dst); -extern int mca_spml_yoda_put_nb(void* dst_addr, - size_t size, - void* src_addr, - int dst, - void **handle); -extern int mca_spml_yoda_recv(void* buf, size_t size, int src); -extern int mca_spml_yoda_send(void* buf, - size_t size, - int dst, - mca_spml_base_put_mode_t mode); -extern sshmem_mkey_t *mca_spml_yoda_register(void* addr, - size_t size, - uint64_t shmid, - int *count); -extern int mca_spml_yoda_deregister(sshmem_mkey_t *mkeys); -extern int mca_spml_yoda_add_procs(ompi_proc_t** procs, - size_t nprocs); -extern int mca_spml_yoda_del_procs(ompi_proc_t** procs, - size_t nprocs); -extern int mca_spml_yoda_fence(void); -extern void* mca_spml_yoda_get_remote_context(void*); -extern void mca_spml_yoda_set_remote_context(void**, void*); -extern int mca_spml_yoda_get_remote_context_size(void*); -extern void 
mca_spml_yoda_set_remote_context_size(void**, int); -extern int mca_spml_yoda_wait_gets(void); - -#if OSHMEM_WAIT_COMPLETION_DEBUG == 1 -extern void condition_dbg_init(void); -extern void condition_dbg_finalize(void); -#endif - -END_C_DECLS - -#endif - diff --git a/oshmem/mca/spml/yoda/spml_yoda_component.c b/oshmem/mca/spml/yoda/spml_yoda_component.c deleted file mode 100644 index 26f67fbc391..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_component.c +++ /dev/null @@ -1,140 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "oshmem_config.h" -#include "oshmem/runtime/params.h" -#include "oshmem/mca/spml/spml.h" -#include "spml_yoda_component.h" -#include "oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h" -#include "oshmem/mca/spml/yoda/spml_yoda_putreq.h" -#include "oshmem/mca/spml/yoda/spml_yoda.h" - -static int mca_spml_yoda_component_register(void); -static int mca_spml_yoda_component_open(void); -static int mca_spml_yoda_component_close(void); -static mca_spml_base_module_t* -mca_spml_yoda_component_init(int* priority, - bool enable_progress_threads, - bool enable_mpi_threads); -static int mca_spml_yoda_component_fini(void); - -mca_spml_base_component_2_0_0_t mca_spml_yoda_component = { - - /* First, the mca_base_component_t struct containing meta - information about the component itself */ - - .spmlm_version = { - MCA_SPML_BASE_VERSION_2_0_0, - - .mca_component_name = "yoda", - MCA_BASE_MAKE_VERSION(component, OSHMEM_MAJOR_VERSION, OSHMEM_MINOR_VERSION, - OSHMEM_RELEASE_VERSION), - .mca_open_component = mca_spml_yoda_component_open, - .mca_close_component = mca_spml_yoda_component_close, - .mca_register_component_params = mca_spml_yoda_component_register, - }, - .spmlm_data = { - /* The 
component is checkpoint ready */ - MCA_BASE_METADATA_PARAM_CHECKPOINT - }, - - .spmlm_init = mca_spml_yoda_component_init, - .spmlm_finalize = mca_spml_yoda_component_fini, -}; - -static inline void mca_spml_yoda_param_register_int(const char *param_name, - int default_value, - const char *help_msg, - int *storage) -{ - *storage = default_value; - (void) mca_base_component_var_register(&mca_spml_yoda_component.spmlm_version, - param_name, - help_msg, - MCA_BASE_VAR_TYPE_INT, NULL, 0, MCA_BASE_VAR_FLAG_SETTABLE, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - storage); -} - -static int mca_spml_yoda_component_register(void) -{ - mca_spml_yoda_param_register_int("free_list_num", 1024, - 0, - &mca_spml_yoda.free_list_num); - mca_spml_yoda_param_register_int("free_list_max", 1024, - 0, - &mca_spml_yoda.free_list_max); - mca_spml_yoda_param_register_int("free_list_inc", 16, - 0, - &mca_spml_yoda.free_list_inc); - mca_spml_yoda_param_register_int("bml_alloc_threshold", 3, - "number of puts to wait \ - in case of put/get temporary buffer \ - allocation failture", - &mca_spml_yoda.bml_alloc_threshold); - mca_spml_yoda_param_register_int("priority", 10, - "[integer] yoda priority", - &mca_spml_yoda.priority); - return OSHMEM_SUCCESS; -} - -static int mca_spml_yoda_component_open(void) -{ - return OSHMEM_SUCCESS; -} - -static int mca_spml_yoda_component_close(void) -{ - return OSHMEM_SUCCESS; -} - -static mca_spml_base_module_t* -mca_spml_yoda_component_init(int* priority, - bool enable_progress_threads, - bool enable_mpi_threads) -{ - SPML_VERBOSE( 10, "in yoda, my priority is %d\n", mca_spml_yoda.priority); - - *priority = mca_spml_yoda.priority; - if ((*priority) > mca_spml_yoda.priority) { - return NULL ; - } - - /* We use BML/BTL and need to start it */ - if (!mca_bml_base_inited()) { - SPML_VERBOSE(10, "can not select yoda because ompi has no bml component"); - return NULL; - } - - mca_spml_yoda.n_active_puts = 0; - mca_spml_yoda.n_active_gets = 0; - - return 
&mca_spml_yoda.super; -} - -int mca_spml_yoda_component_fini(void) -{ - if (!mca_spml_yoda.enabled) { - return OSHMEM_SUCCESS; /* never selected.. return success.. */ - } - mca_spml_yoda.enabled = false; /* not anymore */ - - OBJ_DESTRUCT(&mca_spml_yoda.lock); -#if OSHMEM_WAIT_COMPLETION_DEBUG == 1 - condition_dbg_finalize(); -#endif - - return OSHMEM_SUCCESS; -} - diff --git a/oshmem/mca/spml/yoda/spml_yoda_component.h b/oshmem/mca/spml/yoda/spml_yoda_component.h deleted file mode 100644 index 01c3c089526..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_component.h +++ /dev/null @@ -1,25 +0,0 @@ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -/** - * @file - */ - -#ifndef MCA_SPML_YODA_COMPONENT_H -#define MCA_SPML_YODA_COMPONENT_H - -BEGIN_C_DECLS - -/* - * SPML module functions. - */ -OSHMEM_MODULE_DECLSPEC extern mca_spml_base_component_2_0_0_t mca_spml_yoda_component; -END_C_DECLS - -#endif diff --git a/oshmem/mca/spml/yoda/spml_yoda_getreq.c b/oshmem/mca/spml/yoda/spml_yoda_getreq.c deleted file mode 100644 index 657beb15a62..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_getreq.c +++ /dev/null @@ -1,128 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2014 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "oshmem_config.h" -#include "opal/prefetch.h" -#include "oshmem/constants.h" -#include "oshmem/mca/spml/spml.h" -#include "opal/mca/btl/btl.h" -#include "orte/mca/errmgr/errmgr.h" -#include "opal/mca/mpool/mpool.h" -#include "ompi/mca/bml/base/base.h" -#include "oshmem/mca/spml/yoda/spml_yoda.h" -#include "oshmem/mca/spml/yoda/spml_yoda_putreq.h" -#include "oshmem/mca/spml/yoda/spml_yoda_getreq.h" -#include "oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h" - -/* - * The free call mark the final stage in a request life-cycle. Starting from this - * point the request is completed at both SPML and user level, and can be used - * for others one sided communications. Therefore, in the case of the YODA SPML it should - * be added to the free request list. - */ -static int mca_spml_yoda_get_request_free(struct oshmem_request_t** request) -{ - mca_spml_yoda_get_request_t* getreq = - *(mca_spml_yoda_get_request_t**) request; - - assert( false == getreq->req_get.req_base.req_free_called); - - OPAL_THREAD_LOCK(&oshmem_request_lock); - getreq->req_get.req_base.req_free_called = true; - - opal_free_list_return (&mca_spml_base_get_requests, - (opal_free_list_item_t*)getreq); - - OPAL_THREAD_UNLOCK(&oshmem_request_lock); - - *request = SHMEM_REQUEST_NULL; /*MPI_REQUEST_NULL;*/ - return OSHMEM_SUCCESS; -} - -static int mca_spml_yoda_get_request_cancel(struct oshmem_request_t* request, - int complete) -{ - /* we dont cancel get requests by now */ - return OSHMEM_SUCCESS; -} - -static void mca_spml_yoda_get_request_construct(mca_spml_yoda_get_request_t* req) -{ - req->req_get.req_base.req_type = MCA_SPML_REQUEST_GET; - req->req_get.req_base.req_oshmem.req_free = mca_spml_yoda_get_request_free; - req->req_get.req_base.req_oshmem.req_cancel = - mca_spml_yoda_get_request_cancel; -} - -static void mca_spml_yoda_get_request_destruct(mca_spml_yoda_get_request_t* req) -{ -} - -OBJ_CLASS_INSTANCE( 
mca_spml_yoda_get_request_t, - mca_spml_base_get_request_t, - mca_spml_yoda_get_request_construct, - mca_spml_yoda_get_request_destruct); - -void mca_spml_yoda_get_completion (struct mca_btl_base_module_t* module, - struct mca_btl_base_endpoint_t* endpoint, - void *local_address, - struct mca_btl_base_registration_handle_t *local_handle, - void *context, void *cbdata, int status) -{ - mca_spml_yoda_rdma_frag_t* frag = - (mca_spml_yoda_rdma_frag_t*) cbdata; - mca_spml_yoda_get_request_t* getreq = - (mca_spml_yoda_get_request_t*) frag->rdma_req; - mca_bml_base_btl_t* bml_btl = (mca_bml_base_btl_t*) context; - - /* check completion status */ - if (OPAL_UNLIKELY(OPAL_SUCCESS != status)) { - /* shmem has no way to propagate errors. cry&die */ - SPML_ERROR("FATAL get completion error"); - abort(); - } - - if (getreq->parent) { - OPAL_THREAD_ADD32(&getreq->parent->active_count, -1); - } - getreq->req_get.req_base.req_spml_complete = true; - oshmem_request_complete(&getreq->req_get.req_base.req_oshmem, 1); - oshmem_request_free((oshmem_request_t**) &getreq); - - if (bml_btl->btl->btl_register_mem && frag->local_handle) { - bml_btl->btl->btl_deregister_mem (bml_btl->btl, frag->local_handle); - } - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_gets, -1); -} - -void mca_spml_yoda_get_response_completion(mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* ep, - struct mca_btl_base_descriptor_t* des, - int status) -{ - mca_bml_base_btl_t* bml_btl = (mca_bml_base_btl_t*) des->des_context; - - /* check completion status */ - if (OPAL_UNLIKELY(OSHMEM_SUCCESS != status)) { - /* shmem has no way to propagate errors. 
cry&die */ - SPML_ERROR("FATAL get completion error"); - abort(); - } - - mca_bml_base_free(bml_btl, des); - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_gets, -1); -} diff --git a/oshmem/mca/spml/yoda/spml_yoda_getreq.h b/oshmem/mca/spml/yoda/spml_yoda_getreq.h deleted file mode 100644 index 765f2e3df95..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_getreq.h +++ /dev/null @@ -1,70 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * Copyright (c) 2015 Research Organization for Information Science - * and Technology (RIST). All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OSHMEM_SPML_YODA_GET_REQUEST_H -#define OSHMEM_SPML_YODA_GET_REQUEST_H - -#include "opal/mca/btl/btl.h" -#include "oshmem/mca/spml/base/spml_base_putreq.h" -#include "opal/mca/mpool/base/base.h" -#include "ompi/mca/bml/bml.h" -#include "oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h" -#include "oshmem/mca/spml/yoda/spml_yoda.h" -#include "orte/runtime/orte_globals.h" -#include "oshmem/mca/spml/base/spml_base_getreq.h" - -BEGIN_C_DECLS - -struct mca_spml_yoda_getreq_parent { - int32_t active_count; -}; - -struct mca_spml_yoda_get_request_t { - mca_spml_base_get_request_t req_get; - uint64_t *p_dst; - struct mca_spml_yoda_getreq_parent *parent; - mca_spml_yoda_rdma_frag_t get_frag; -}; - -typedef struct mca_spml_yoda_get_request_t mca_spml_yoda_get_request_t; -OBJ_CLASS_DECLARATION(mca_spml_yoda_get_request_t); - -static inline mca_spml_yoda_get_request_t *mca_spml_yoda_getreq_alloc(int dst) -{ - opal_free_list_item_t *item; - mca_spml_yoda_get_request_t *getreq; - - item = opal_free_list_wait (&mca_spml_base_get_requests); - getreq = (mca_spml_yoda_get_request_t*) item; - assert(getreq); - getreq->req_get.req_base.req_free_called = false; - 
getreq->req_get.req_base.req_oshmem.req_complete = false; - - return getreq; -} - -void mca_spml_yoda_get_completion (struct mca_btl_base_module_t* module, - struct mca_btl_base_endpoint_t* endpoint, - void *local_address, - struct mca_btl_base_registration_handle_t *local_handle, - void *context, void *cbdata, int status); - -void mca_spml_yoda_get_response_completion(mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* ep, - struct mca_btl_base_descriptor_t* des, - int status); - -END_C_DECLS -#endif /* OSHMEM_SPML_YODA_GET_REQUEST_H */ diff --git a/oshmem/mca/spml/yoda/spml_yoda_putreq.c b/oshmem/mca/spml/yoda/spml_yoda_putreq.c deleted file mode 100644 index c1dca770898..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_putreq.c +++ /dev/null @@ -1,113 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#include "oshmem_config.h" -#include "opal/prefetch.h" -#include "oshmem/constants.h" -#include "oshmem/mca/spml/spml.h" -#include "opal/mca/btl/btl.h" -#include "orte/mca/errmgr/errmgr.h" -#include "opal/mca/mpool/mpool.h" -#include "ompi/mca/bml/base/base.h" -#include "oshmem/mca/spml/yoda/spml_yoda.h" -#include "oshmem/mca/spml/yoda/spml_yoda_putreq.h" -#include "oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h" -#include "oshmem/runtime/runtime.h" -/* - * The free call mark the final stage in a request life-cycle. Starting from this - * point the request is completed at both SPML and user level, and can be used - * for others p2p communications. Therefore, in the case of the YODA SPML it should - * be added to the free request list. 
- */ -static int mca_spml_yoda_put_request_free(struct oshmem_request_t** request) -{ - mca_spml_yoda_put_request_t* putreq = - *(mca_spml_yoda_put_request_t**) request; - - assert( false == putreq->req_put.req_base.req_free_called); - - OPAL_THREAD_LOCK(&oshmem_request_lock); - putreq->req_put.req_base.req_free_called = true; - opal_free_list_return (&mca_spml_base_put_requests, - (opal_free_list_item_t*)putreq); - OPAL_THREAD_UNLOCK(&oshmem_request_lock); - - *request = SHMEM_REQUEST_NULL; - return OSHMEM_SUCCESS; -} - -static int mca_spml_yoda_put_request_cancel(struct oshmem_request_t* request, - int complete) -{ - /* we dont cancel put requests by now */ - return OSHMEM_SUCCESS; -} - -static void mca_spml_yoda_put_request_construct(mca_spml_yoda_put_request_t* req) -{ - req->req_put.req_base.req_type = MCA_SPML_REQUEST_PUT; - req->req_put.req_base.req_oshmem.req_free = mca_spml_yoda_put_request_free; - req->req_put.req_base.req_oshmem.req_cancel = - mca_spml_yoda_put_request_cancel; -} - -static void mca_spml_yoda_put_request_destruct(mca_spml_yoda_put_request_t* req) -{ -} - -OBJ_CLASS_INSTANCE( mca_spml_yoda_put_request_t, - mca_spml_base_put_request_t, - mca_spml_yoda_put_request_construct, - mca_spml_yoda_put_request_destruct); - -void mca_spml_yoda_put_completion(mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* ep, - struct mca_btl_base_descriptor_t* des, - int status) -{ - mca_spml_yoda_rdma_frag_t* frag = - (mca_spml_yoda_rdma_frag_t*) des->des_cbdata; - mca_spml_yoda_put_request_t* putreq = - (mca_spml_yoda_put_request_t*) frag->rdma_req; - mca_bml_base_btl_t* bml_btl = (mca_bml_base_btl_t*) des->des_context; - - OPAL_THREAD_ADD32(&mca_spml_yoda.n_active_puts, -1); - /* check completion status */ - if (OPAL_UNLIKELY(OSHMEM_SUCCESS != status)) { - /* no way to propagete errors. 
die */ - SPML_ERROR("FATAL put completion error"); - oshmem_shmem_abort(-1); - } - - putreq->req_put.req_base.req_spml_complete = true; - oshmem_request_complete(&putreq->req_put.req_base.req_oshmem, 1); - oshmem_request_free((oshmem_request_t**) &putreq); - mca_bml_base_free(bml_btl, des); -} - -void mca_spml_yoda_put_completion_rdma (struct mca_btl_base_module_t* module, - struct mca_btl_base_endpoint_t* endpoint, - void *local_address, - struct mca_btl_base_registration_handle_t *local_handle, - void *context, void *cbdata, int status) -{ - mca_btl_base_descriptor_t *des = (mca_btl_base_descriptor_t *) cbdata; - mca_bml_base_btl_t *bml_btl = (mca_bml_base_btl_t *) context; - des->des_context = context; - - if (bml_btl->btl->btl_register_mem) { - bml_btl->btl->btl_deregister_mem (bml_btl->btl, local_handle); - } - - des->des_cbfunc (module, endpoint, des, status); -} diff --git a/oshmem/mca/spml/yoda/spml_yoda_putreq.h b/oshmem/mca/spml/yoda/spml_yoda_putreq.h deleted file mode 100644 index 9bdb1b86511..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_putreq.h +++ /dev/null @@ -1,63 +0,0 @@ -/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. - * Copyright (c) 2015 Los Alamos National Security, LLC. All rights - * reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -#ifndef OSHMEM_SPML_YODA_PUT_REQUEST_H -#define OSHMEM_SPML_YODA_PUT_REQUEST_H - -#include "opal/mca/btl/btl.h" -#include "oshmem/mca/spml/base/base.h" -#include "oshmem/mca/spml/base/spml_base_putreq.h" -#include "opal/mca/mpool/base/base.h" -#include "ompi/mca/bml/bml.h" -#include "oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h" -#include "oshmem/mca/spml/yoda/spml_yoda.h" -#include "orte/runtime/orte_globals.h" - -BEGIN_C_DECLS - -struct mca_spml_yoda_put_request_t { - mca_spml_base_put_request_t req_put; - mca_spml_yoda_rdma_frag_t put_frag; -}; - -typedef struct mca_spml_yoda_put_request_t mca_spml_yoda_put_request_t; - -OBJ_CLASS_DECLARATION(mca_spml_yoda_put_request_t); - -static inline mca_spml_yoda_put_request_t *mca_spml_yoda_putreq_alloc(int dst) { - opal_free_list_item_t *item; - mca_spml_yoda_put_request_t *putreq; - - item = opal_free_list_wait (&mca_spml_base_put_requests); - putreq = (mca_spml_yoda_put_request_t*) item; - assert(putreq); - putreq->req_put.req_base.req_free_called = false; - putreq->req_put.req_base.req_oshmem.req_complete = false; - - return putreq; -} - -void mca_spml_yoda_put_completion(mca_btl_base_module_t* btl, - struct mca_btl_base_endpoint_t* ep, - struct mca_btl_base_descriptor_t* des, - int status); - -void mca_spml_yoda_put_completion_rdma (struct mca_btl_base_module_t* module, - struct mca_btl_base_endpoint_t* endpoint, - void *local_address, - struct mca_btl_base_registration_handle_t *local_handle, - void *context, void *cbdata, int status); - -END_C_DECLS - -#endif /* OSHMEM_SPML_YODA_PUT_REQUEST_H */ diff --git a/oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h b/oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h deleted file mode 100644 index d04067521ce..00000000000 --- a/oshmem/mca/spml/yoda/spml_yoda_rdmafrag.h +++ /dev/null @@ -1,45 +0,0 @@ -/* - * Copyright (c) 2013 Mellanox Technologies, Inc. - * All rights reserved. 
- * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ -/** - * @file - */ - -#ifndef MCA_SPML_YODA_RDMAFRAG_H -#define MCA_SPML_YODA_RDMAFRAG_H - -#include "opal/mca/btl/btl.h" -#include "opal/types.h" -#include "opal/util/arch.h" -#include "oshmem/proc/proc.h" - -BEGIN_C_DECLS - -typedef enum { - MCA_SPML_YODA_RDMA_PUT, - MCA_SPML_YODA_RDMA_GET -} mca_spml_yoda_rdma_state_t; - -typedef union mca_spml_yoda_segment_t { - mca_btl_base_segment_t base_seg; -} mca_spml_yoda_segment_t; - -struct mca_spml_yoda_rdma_frag_t { - mca_spml_yoda_segment_t rdma_segs[2]; - mca_btl_base_registration_handle_t *local_handle; - void *rdma_req; - int allocated; - int use_send; - int size; -}; - -typedef struct mca_spml_yoda_rdma_frag_t mca_spml_yoda_rdma_frag_t; -END_C_DECLS -#endif - diff --git a/oshmem/mca/sshmem/mmap/Makefile.am b/oshmem/mca/sshmem/mmap/Makefile.am index b9550da435d..5529cdd7f08 100644 --- a/oshmem/mca/sshmem/mmap/Makefile.am +++ b/oshmem/mca/sshmem/mmap/Makefile.am @@ -1,5 +1,6 @@ # Copyright (c) 2014 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -30,6 +31,7 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sshmem_mmap_la_SOURCES = $(sources) mca_sshmem_mmap_la_LDFLAGS = -module -avoid-version +mca_sshmem_mmap_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la noinst_LTLIBRARIES = $(component_noinst) libmca_sshmem_mmap_la_SOURCES =$(sources) diff --git a/oshmem/mca/sshmem/sysv/Makefile.am b/oshmem/mca/sshmem/sysv/Makefile.am index c458b5a9c6b..dd677a8d3c6 100644 --- a/oshmem/mca/sshmem/sysv/Makefile.am +++ b/oshmem/mca/sshmem/sysv/Makefile.am @@ -1,6 +1,7 @@ # Copyright (c) 2014 Mellanox Technologies, Inc. # All rights reserved. # Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -31,6 +32,7 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sshmem_sysv_la_SOURCES = $(sources) mca_sshmem_sysv_la_LDFLAGS = -module -avoid-version +mca_sshmem_sysv_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la noinst_LTLIBRARIES = $(component_noinst) libmca_sshmem_sysv_la_SOURCES = $(sources) diff --git a/oshmem/mca/sshmem/sysv/sshmem_sysv_module.c b/oshmem/mca/sshmem/sysv/sshmem_sysv_module.c index dec0bee0bee..a1d112da7d9 100644 --- a/oshmem/mca/sshmem/sysv/sshmem_sysv_module.c +++ b/oshmem/mca/sshmem/sysv/sshmem_sysv_module.c @@ -115,6 +115,7 @@ segment_create(map_segment_t *ds_buf, void *addr = NULL; int shmid = MAP_SEGMENT_SHM_INVALID; int flags; + int try_hp; assert(ds_buf); @@ -129,14 +130,28 @@ segment_create(map_segment_t *ds_buf, * real_size here */ flags = IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR; + try_hp = mca_sshmem_sysv_component.use_hp; #if defined (SHM_HUGETLB) - flags |= ((0 != mca_sshmem_sysv_component.use_hp) ? SHM_HUGETLB : 0); + flags |= ((0 != try_hp) ? SHM_HUGETLB : 0); size = ((size + sshmem_sysv_gethugepagesize() - 1) / sshmem_sysv_gethugepagesize()) * sshmem_sysv_gethugepagesize(); #endif /* Create a new shared memory segment and save the shmid. */ +retry_alloc: shmid = shmget(IPC_PRIVATE, size, flags); if (shmid == MAP_SEGMENT_SHM_INVALID) { + /* hugepage alloc was set to auto. Hopefully it failed because there are no + * enough hugepages on the system. Turn it off and retry. + */ + if (-1 == try_hp) { + OPAL_OUTPUT_VERBOSE( + (10, oshmem_sshmem_base_framework.framework_output, + "failed to allocate %llu bytes with huge pages. 
" + "Using regular pages", (unsigned long long)size)); + flags = IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR; + try_hp = 0; + goto retry_alloc; + } opal_show_help("help-oshmem-sshmem.txt", "create segment failure", true, diff --git a/oshmem/mca/sshmem/ucx/Makefile.am b/oshmem/mca/sshmem/ucx/Makefile.am index 2bc2205679f..bf3a08b547a 100644 --- a/oshmem/mca/sshmem/ucx/Makefile.am +++ b/oshmem/mca/sshmem/ucx/Makefile.am @@ -1,5 +1,6 @@ # Copyright (c) 2014 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -32,7 +33,8 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sshmem_ucx_la_SOURCES = $(sources) mca_sshmem_ucx_la_LDFLAGS = -module -avoid-version $(sshmem_ucx_LDFLAGS) -mca_sshmem_ucx_la_LIBADD = $(sshmem_ucx_LIBS) +mca_sshmem_ucx_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(sshmem_ucx_LIBS) noinst_LTLIBRARIES = $(component_noinst) libmca_sshmem_ucx_la_SOURCES =$(sources) diff --git a/oshmem/mca/sshmem/verbs/Makefile.am b/oshmem/mca/sshmem/verbs/Makefile.am index 7ebde4d9a1d..cdbbf02e6ca 100644 --- a/oshmem/mca/sshmem/verbs/Makefile.am +++ b/oshmem/mca/sshmem/verbs/Makefile.am @@ -1,5 +1,6 @@ # Copyright (c) 2014 Mellanox Technologies, Inc. # All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -30,7 +31,8 @@ mcacomponentdir = $(oshmemlibdir) mcacomponent_LTLIBRARIES = $(component_install) mca_sshmem_verbs_la_SOURCES = $(sources) mca_sshmem_verbs_la_LDFLAGS = -module -avoid-version $(oshmem_verbs_LDFLAGS) -mca_sshmem_verbs_la_LIBADD = $(oshmem_verbs_LIBS) \ +mca_sshmem_verbs_la_LIBADD = $(top_builddir)/oshmem/liboshmem.la \ + $(oshmem_verbs_LIBS) \ $(OPAL_TOP_BUILDDIR)/opal/mca/common/verbs/lib@OPAL_LIB_PREFIX@mca_common_verbs.la noinst_LTLIBRARIES = $(component_noinst) diff --git a/oshmem/mca/sshmem/verbs/configure.m4 b/oshmem/mca/sshmem/verbs/configure.m4 index dc31a3d38fb..1f8820386ec 100644 --- a/oshmem/mca/sshmem/verbs/configure.m4 +++ b/oshmem/mca/sshmem/verbs/configure.m4 @@ -76,6 +76,26 @@ AC_DEFUN([MCA_oshmem_sshmem_verbs_CONFIG],[ exp_reg_mr_happy=0 AS_IF([test "$oshmem_have_mpage" = "3"], [ + oshmem_verbs_save_CFLAGS="$CFLAGS" + CFLAGS="$CFLAGS -Wno-strict-prototypes -Werror" + + AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM([[#include ]], + [[ + struct ibv_exp_reg_shared_mr_in in_smr; + uint64_t access_flags = IBV_EXP_ACCESS_SHARED_MR_USER_READ | + IBV_EXP_ACCESS_SHARED_MR_USER_WRITE | + IBV_EXP_ACCESS_SHARED_MR_GROUP_READ | + IBV_EXP_ACCESS_SHARED_MR_GROUP_WRITE | + IBV_EXP_ACCESS_SHARED_MR_OTHER_READ | + IBV_EXP_ACCESS_SHARED_MR_OTHER_WRITE; + in_smr.exp_access = access_flags; + ibv_exp_reg_shared_mr(&in_smr); + ]])], [], + [oshmem_verbs_sm_build_verbs=0]) + + CFLAGS="$oshmem_verbs_save_CFLAGS" + AC_CHECK_MEMBER([struct ibv_exp_reg_shared_mr_in.exp_access], [exp_access_happy=1], [], diff --git a/oshmem/proc/proc.c b/oshmem/proc/proc.c index 665dd10caed..8aa67726f9b 100644 --- a/oshmem/proc/proc.c +++ b/oshmem/proc/proc.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2014-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -17,6 +17,7 @@ #include "oshmem/constants.h" #include "oshmem/runtime/runtime.h" #include "oshmem/mca/scoll/base/base.h" +#include "oshmem/proc/proc_group_cache.h" #ifdef HAVE_STRINGS_H #include @@ -65,40 +66,67 @@ oshmem_group_t* oshmem_group_null = NULL; OBJ_CLASS_INSTANCE(oshmem_group_t, opal_object_t, NULL, NULL); +static void oshmem_proc_group_destroy_internal(oshmem_group_t* group, + int scoll_unselect); + int oshmem_proc_group_init(void) { + int rc; + + rc = oshmem_group_cache_init(); + if (OSHMEM_SUCCESS != rc) { + return rc; + } + /* Setup communicator array */ OBJ_CONSTRUCT(&oshmem_group_array, opal_pointer_array_t); - if (OPAL_SUCCESS - != opal_pointer_array_init(&oshmem_group_array, - 0, - ORTE_GLOBAL_ARRAY_MAX_SIZE, - 1)) { - return OSHMEM_ERROR; + + rc = opal_pointer_array_init(&oshmem_group_array, 0, + ORTE_GLOBAL_ARRAY_MAX_SIZE, 1); + if (OPAL_SUCCESS != rc) { + goto err1; } /* Setup SHMEM_GROUP_ALL */ - if (NULL - == (oshmem_group_all = - oshmem_proc_group_create(0, - 1, - ompi_comm_size(oshmem_comm_world)))) { - return OSHMEM_ERROR; + oshmem_group_all = oshmem_proc_group_create(0, 1, ompi_comm_size(oshmem_comm_world)); + if (NULL == oshmem_group_all) { + goto err2; } /* Setup SHMEM_GROUP_SELF */ - if (NULL - == (oshmem_group_self = oshmem_proc_group_create(oshmem_proc_pe(oshmem_proc_local()), - 0, - 1))) { - oshmem_proc_group_destroy(oshmem_group_self); - return OSHMEM_ERROR; + oshmem_group_self = oshmem_proc_group_create(oshmem_proc_pe(oshmem_proc_local()), 0, 1); + if (NULL == oshmem_group_self) { + goto err3; } /* Setup SHMEM_GROUP_NULL */ oshmem_group_null = NULL; return OSHMEM_SUCCESS; + +err3: + oshmem_proc_group_destroy_internal(oshmem_group_all, 1); +err2: + OBJ_DESTRUCT(&oshmem_group_array); +err1: + oshmem_group_cache_destroy(); + return OSHMEM_ERROR; +} + +void oshmem_proc_group_finalize_scoll(void) +{ + int max, i; + oshmem_group_t *group; + + /* Check whether we have some left */ + max = 
opal_pointer_array_get_size(&oshmem_group_array); + for (i = 0; i < max; i++) { + group = (oshmem_group_t *) opal_pointer_array_get_item(&oshmem_group_array, + i); + if (NULL != group) { + mca_scoll_base_group_unselect(group); + } + } } int oshmem_proc_group_finalize(void) @@ -114,18 +142,17 @@ int oshmem_proc_group_finalize(void) i); if (NULL != group) { /* Group has not been freed before finalize */ - oshmem_proc_group_destroy(group); + oshmem_proc_group_destroy_internal(group, 0); } } OBJ_DESTRUCT(&oshmem_group_array); + oshmem_group_cache_destroy(); return OSHMEM_SUCCESS; } -oshmem_group_t* oshmem_proc_group_create(int pe_start, - int pe_stride, - size_t pe_size) +oshmem_group_t* oshmem_proc_group_create(int pe_start, int pe_stride, int pe_size) { int cur_pe, count_pe; int i; @@ -135,107 +162,133 @@ oshmem_group_t* oshmem_proc_group_create(int pe_start, assert(oshmem_proc_local()); + group = oshmem_group_cache_find(pe_start, pe_stride, pe_size); + if (NULL != group) { + return group; + } + group = OBJ_NEW(oshmem_group_t); + if (NULL == group) { + return NULL; + } - if (group) { - cur_pe = 0; - count_pe = 0; + cur_pe = 0; + count_pe = 0; - OPAL_THREAD_LOCK(&oshmem_proc_lock); + OPAL_THREAD_LOCK(&oshmem_proc_lock); + + /* allocate an array */ + proc_array = (ompi_proc_t**) malloc(pe_size * sizeof(ompi_proc_t*)); + if (NULL == proc_array) { + OBJ_RELEASE(group); + OPAL_THREAD_UNLOCK(&oshmem_proc_lock); + return NULL ; + } - /* allocate an array */ - proc_array = (ompi_proc_t**) malloc(pe_size * sizeof(ompi_proc_t*)); - if (NULL == proc_array) { + group->my_pe = oshmem_proc_pe(oshmem_proc_local()); + group->is_member = 0; + for (i = 0 ; i < ompi_comm_size(oshmem_comm_world) ; i++) { + proc = oshmem_proc_find(i); + if (NULL == proc) { + opal_output(0, + "Error: Can not find proc object for pe = %d", i); + free(proc_array); OBJ_RELEASE(group); OPAL_THREAD_UNLOCK(&oshmem_proc_lock); - return NULL ; + return NULL; } - - group->my_pe = 
oshmem_proc_pe(oshmem_proc_local()); - group->is_member = 0; - for (i = 0 ; i < ompi_comm_size(oshmem_comm_world) ; i++) { - proc = oshmem_proc_find(i); - if (NULL == proc) { - opal_output(0, - "Error: Can not find proc object for pe = %d", i); - free(proc_array); - OBJ_RELEASE(group); - OPAL_THREAD_UNLOCK(&oshmem_proc_lock); - return NULL; - } - if (count_pe >= (int) pe_size) { - break; - } else if ((cur_pe >= pe_start) - && ((pe_stride == 0) - || (((cur_pe - pe_start) % pe_stride) == 0))) { - proc_array[count_pe++] = proc; - if (oshmem_proc_pe(proc) == group->my_pe) - group->is_member = 1; - } - cur_pe++; + if (count_pe >= (int) pe_size) { + break; + } else if ((cur_pe >= pe_start) + && ((pe_stride == 0) + || (((cur_pe - pe_start) % pe_stride) == 0))) { + proc_array[count_pe++] = proc; + if (oshmem_proc_pe(proc) == group->my_pe) + group->is_member = 1; } - group->proc_array = proc_array; - group->proc_count = (int) count_pe; - group->ompi_comm = NULL; - - /* Prepare peers list */ - OBJ_CONSTRUCT(&(group->peer_list), opal_list_t); - { - orte_namelist_t *peer = NULL; - - for (i = 0; i < group->proc_count; i++) { - peer = OBJ_NEW(orte_namelist_t); - peer->name.jobid = OSHMEM_PROC_JOBID(group->proc_array[i]); - peer->name.vpid = OSHMEM_PROC_VPID(group->proc_array[i]); - opal_list_append(&(group->peer_list), &peer->super); - } + cur_pe++; + } + group->proc_array = proc_array; + group->proc_count = (int) count_pe; + group->ompi_comm = NULL; + + /* Prepare peers list */ + OBJ_CONSTRUCT(&(group->peer_list), opal_list_t); + { + orte_namelist_t *peer = NULL; + + for (i = 0; i < group->proc_count; i++) { + peer = OBJ_NEW(orte_namelist_t); + peer->name.jobid = OSHMEM_PROC_JOBID(group->proc_array[i]); + peer->name.vpid = OSHMEM_PROC_VPID(group->proc_array[i]); + opal_list_append(&(group->peer_list), &peer->super); } - group->id = opal_pointer_array_add(&oshmem_group_array, group); + } + group->id = opal_pointer_array_add(&oshmem_group_array, group); - memset(&group->g_scoll, 
0, sizeof(mca_scoll_base_group_scoll_t)); + memset(&group->g_scoll, 0, sizeof(mca_scoll_base_group_scoll_t)); - if (OSHMEM_SUCCESS != mca_scoll_base_select(group)) { - opal_output(0, - "Error: No collective modules are available: group is not created, returning NULL"); - oshmem_proc_group_destroy(group); - OPAL_THREAD_UNLOCK(&oshmem_proc_lock); - return NULL; - } + if (OSHMEM_SUCCESS != mca_scoll_base_select(group)) { + opal_output(0, + "Error: No collective modules are available: group is not created, returning NULL"); + oshmem_proc_group_destroy_internal(group, 0); OPAL_THREAD_UNLOCK(&oshmem_proc_lock); + return NULL; } + if (OSHMEM_SUCCESS != oshmem_group_cache_insert(group, pe_start, + pe_stride, pe_size)) { + oshmem_proc_group_destroy_internal(group, 1); + OPAL_THREAD_UNLOCK(&oshmem_proc_lock); + return NULL; + } + + OPAL_THREAD_UNLOCK(&oshmem_proc_lock); return group; } -void oshmem_proc_group_destroy(oshmem_group_t* group) +static void +oshmem_proc_group_destroy_internal(oshmem_group_t* group, int scoll_unselect) { - if (group) { + if (NULL == group) { + return; + } + + if (scoll_unselect) { mca_scoll_base_group_unselect(group); + } - /* Destroy proc array */ - if (group->proc_array) { - free(group->proc_array); - } + /* Destroy proc array */ + if (group->proc_array) { + free(group->proc_array); + } - /* Destroy peer list */ - { - opal_list_item_t *item; + /* Destroy peer list */ + { + opal_list_item_t *item; - while (NULL != (item = opal_list_remove_first(&(group->peer_list)))) { - /* destruct the item (we constructed it), then free the memory chunk */ - OBJ_RELEASE(item); - } - OBJ_DESTRUCT(&(group->peer_list)); + while (NULL != (item = opal_list_remove_first(&(group->peer_list)))) { + /* destruct the item (we constructed it), then free the memory chunk */ + OBJ_RELEASE(item); } + OBJ_DESTRUCT(&(group->peer_list)); + } - /* reset the oshmem_group_array entry - make sure that the - * entry is in the table */ - if (NULL - != 
opal_pointer_array_get_item(&oshmem_group_array, - group->id)) { - opal_pointer_array_set_item(&oshmem_group_array, group->id, NULL ); - } + /* reset the oshmem_group_array entry - make sure that the + * entry is in the table */ + if (NULL + != opal_pointer_array_get_item(&oshmem_group_array, + group->id)) { + opal_pointer_array_set_item(&oshmem_group_array, group->id, NULL ); + } - OBJ_RELEASE(group); + OBJ_RELEASE(group); +} + +void oshmem_proc_group_destroy(oshmem_group_t* group) +{ + if (oshmem_group_cache_enabled()) { + return; } + oshmem_proc_group_destroy_internal(group, 1); } diff --git a/oshmem/proc/proc.h b/oshmem/proc/proc.h index 11ab5e75ec0..4d4f9b005f8 100644 --- a/oshmem/proc/proc.h +++ b/oshmem/proc/proc.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. @@ -17,8 +17,6 @@ #include "oshmem/types.h" #include "oshmem/constants.h" -#include "oshmem/mca/scoll/scoll.h" - #include "opal/class/opal_list.h" #include "opal/util/proc.h" #include "opal/dss/dss_types.h" @@ -30,6 +28,10 @@ #include "ompi/proc/proc.h" #include "ompi/communicator/communicator.h" +#include "oshmem/mca/scoll/scoll.h" +#include "oshmem/runtime/runtime.h" +#include "oshmem/shmem/shmem_api_logger.h" + BEGIN_C_DECLS /* ******************************************************************** */ @@ -177,19 +179,17 @@ OSHMEM_DECLSPEC int oshmem_proc_group_init(void); /** * Finalize the OSHMEM process predefined groups * - * Initialize the Open SHMEM process predefined groups. This function will - * query the run-time environment and build a list of the proc - * instances in the current pe set. The local information not - * easily determined by the run-time ahead of time (architecture and - * hostname) will be published during this call. 
- * - * @note This is primarily used once during SHMEM setup. - * * @retval OSHMEM_SUCESS System successfully initialized * @retval OSHMEM_ERROR Initialization failed due to unspecified error */ OSHMEM_DECLSPEC int oshmem_proc_group_finalize(void); +/** + * Release collectives used by the groups. The function + * must be called prior to the oshmem_proc_group_finalize() + */ +OSHMEM_DECLSPEC void oshmem_proc_group_finalize_scoll(void); + /** * Create processes group. * @@ -205,7 +205,29 @@ OSHMEM_DECLSPEC int oshmem_proc_group_finalize(void); */ OSHMEM_DECLSPEC oshmem_group_t *oshmem_proc_group_create(int pe_start, int pe_stride, - size_t pe_size); + int pe_size); + +/** + * same as above but abort on failure + */ +static inline oshmem_group_t * +oshmem_proc_group_create_nofail(int pe_start, int pe_stride, int pe_size) +{ + oshmem_group_t *group; + + group = oshmem_proc_group_create(pe_start, pe_stride, pe_size); + if (NULL == group) { + goto fatal; + } + return group; + +fatal: + SHMEM_API_ERROR("Failed to create group (%d,%d,%d)", + pe_start, pe_stride, pe_size); + oshmem_shmem_abort(-1); + return NULL; +} + /** * Destroy processes group. diff --git a/oshmem/proc/proc_group_cache.c b/oshmem/proc/proc_group_cache.c index daa09680ce6..975a3671e18 100644 --- a/oshmem/proc/proc_group_cache.c +++ b/oshmem/proc/proc_group_cache.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
@@ -13,104 +13,73 @@ #include "oshmem/constants.h" #include "oshmem/runtime/runtime.h" -OBJ_CLASS_INSTANCE(oshmem_group_cache_t, opal_object_t, NULL, NULL); -opal_list_t oshmem_group_cache_list = {{0}}; -unsigned int oshmem_group_cache_size = 0; -oshmem_group_t* find_group_in_cache(int PE_start, int logPE_stride, int PE_size) -{ - int cache_look_up_id[3] = { PE_start, logPE_stride, PE_size }; - opal_list_item_t *item; - if (opal_list_is_empty(&oshmem_group_cache_list)) { - return NULL ; - } +#define OSHMEM_GROUP_CACHE_SIZE 1024 - for (item = opal_list_get_first(&oshmem_group_cache_list); - item && (item != opal_list_get_end(&oshmem_group_cache_list)); - item = opal_list_get_next(item)) { - if (!memcmp(((oshmem_group_cache_t *) item)->cache_id, - cache_look_up_id, - 3 * sizeof(int))) { - return ((oshmem_group_cache_t *) item)->group; - } - } - return NULL ; -} +static opal_hash_table_t group_cache; + +typedef struct { + int pe_start; + int pe_size; + int pe_stride; +} oshmem_group_key_t; -int cache_group(oshmem_group_t *group, - int PE_start, - int logPE_stride, - int PE_size) +static int group_cache_n_hits; +static int group_cache_n_lookups; + +int oshmem_group_cache_init(void) { - oshmem_group_cache_t *cached_group = NULL; - cached_group = OBJ_NEW(oshmem_group_cache_t); -#if OPAL_ENABLE_DEBUG - cached_group->item.opal_list_item_belong_to = NULL; - cached_group->item.opal_list_item_refcount = 0; -#endif - cached_group->group = group; - cached_group->cache_id[0] = PE_start; - cached_group->cache_id[1] = logPE_stride; - cached_group->cache_id[2] = PE_size; - if (opal_list_get_size(&oshmem_group_cache_list) - < oshmem_group_cache_size) { - opal_list_append(&oshmem_group_cache_list, - (opal_list_item_t *)cached_group); - } else { -#if ABORT_ON_CACHE_OVERFLOW - opal_output(0, - "error: group cache overflow on rank %i: cache_size = %u: try encreasing oshmem_group_cache_size mca parameter", - group->my_pe, - oshmem_group_cache_size); - oshmem_shmem_abort(-1); -#else - 
/*This part of code makes FIFO group cache management. Define ABORT_ON_CACHE_OVERFLOW as 0 to enable this.*/ - oshmem_group_cache_t *cached_group_to_remove = (oshmem_group_cache_t *)opal_list_remove_first(&oshmem_group_cache_list); - oshmem_proc_group_destroy(cached_group_to_remove->group); - OBJ_RELEASE(cached_group_to_remove); - opal_list_append(&oshmem_group_cache_list,(opal_list_item_t *)cached_group); -#endif + OBJ_CONSTRUCT(&group_cache, opal_hash_table_t); + if (OPAL_SUCCESS != opal_hash_table_init(&group_cache, OSHMEM_GROUP_CACHE_SIZE)) { + return OSHMEM_ERROR; } return OSHMEM_SUCCESS; } -int oshmem_group_cache_list_init(void) +void oshmem_group_cache_destroy(void) { - int mca_value; - int cache_size_default = 100; - OBJ_CONSTRUCT(&oshmem_group_cache_list, opal_list_t); - - mca_value = cache_size_default; - (void) mca_base_var_register("oshmem", - "proc", - NULL, - "group_cache_size", - "The depth of the oshmem_group cache list used to speed up collective operations", - MCA_BASE_VAR_TYPE_INT, - NULL, - 0, - MCA_BASE_VAR_FLAG_SETTABLE, - OPAL_INFO_LVL_9, - MCA_BASE_VAR_SCOPE_READONLY, - &mca_value); - if (mca_value < 0) { - opal_output(0, - "error: oshmem_group_cache_size mca parameter was set to %i while it has to be positive value. 
Default value %i will be used.", - mca_value, - cache_size_default); - mca_value = cache_size_default; + OBJ_DESTRUCT(&group_cache); +} + +oshmem_group_t *oshmem_group_cache_find(int pe_start, int pe_stride, int pe_size) +{ + oshmem_group_key_t key; + oshmem_group_t *group; + + if (!oshmem_group_cache_enabled()) { + return NULL; } - oshmem_group_cache_size = (unsigned int) mca_value; - return OSHMEM_SUCCESS; + + key.pe_start = pe_start; + key.pe_size = pe_size; + key.pe_stride = pe_stride; + + group_cache_n_lookups++; + + if (OPAL_SUCCESS != opal_hash_table_get_value_ptr(&group_cache, &key, + sizeof(key), (void **)&group)) { + return NULL; + } + + group_cache_n_hits++; + return group; } -int oshmem_group_cache_list_free(void) +int oshmem_group_cache_insert(oshmem_group_t *group, int pe_start, + int pe_stride, int pe_size) { - oshmem_group_cache_t *cached_group = NULL; - opal_list_item_t *item; - while (NULL != (item = opal_list_remove_first(&oshmem_group_cache_list))) { - cached_group = (oshmem_group_cache_t *) item; - oshmem_proc_group_destroy(cached_group->group); - OBJ_RELEASE(cached_group); + oshmem_group_key_t key; + + if (!oshmem_group_cache_enabled()) { + return OSHMEM_SUCCESS; + } + + key.pe_start = pe_start; + key.pe_size = pe_size; + key.pe_stride = pe_stride; + + if (OPAL_SUCCESS != opal_hash_table_set_value_ptr(&group_cache, &key, + sizeof(key), group)) { + return OSHMEM_ERROR; } return OSHMEM_SUCCESS; } diff --git a/oshmem/proc/proc_group_cache.h b/oshmem/proc/proc_group_cache.h index d97cfa8380d..6befbaaeb92 100644 --- a/oshmem/proc/proc_group_cache.h +++ b/oshmem/proc/proc_group_cache.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. 
* $COPYRIGHT$ * @@ -13,29 +13,33 @@ #include "oshmem_config.h" #include "proc.h" -#define OSHMEM_GROUP_CACHE_ENABLED 1 -#define ABORT_ON_CACHE_OVERFLOW 1 +#define OSHMEM_GROUP_CACHE_ENABLED 1 + BEGIN_C_DECLS -struct oshmem_group_cache_t { - opal_list_item_t item; - oshmem_group_t *group; - int cache_id[3]; -}; - -typedef struct oshmem_group_cache_t oshmem_group_cache_t; -OSHMEM_DECLSPEC OBJ_CLASS_DECLARATION(oshmem_group_cache_t); -OSHMEM_DECLSPEC extern opal_list_t oshmem_group_cache_list; - -oshmem_group_t* find_group_in_cache(int PE_start, int logPE_stride, int PE_size); - -int cache_group(oshmem_group_t *group, - int PE_start, - int logPE_stride, - int PE_size); -int oshmem_group_cache_list_init(void); -int oshmem_group_cache_list_free(void); - -extern unsigned int oshmem_group_cache_size; + +/** + * A group cache. + * + * Deletion of a group is not implemented because it + * requires a synchronization between PEs + * + * If cache enabled every group is kept until the + * shmem_finalize() is called + */ + +int oshmem_group_cache_init(void); +void oshmem_group_cache_destroy(void); + +oshmem_group_t* oshmem_group_cache_find(int pe_start, int pe_stride, int pe_size); + +int oshmem_group_cache_insert(oshmem_group_t *group, int pe_start, + int pe_stride, int pe_size); + +static inline int oshmem_group_cache_enabled(void) +{ + return OSHMEM_GROUP_CACHE_ENABLED; +} + END_C_DECLS #endif diff --git a/oshmem/request/request.h b/oshmem/request/request.h index 946d55ae024..8d90bd922cf 100644 --- a/oshmem/request/request.h +++ b/oshmem/request/request.h @@ -1,5 +1,4 @@ /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ -/* -*- Mode: C; c-basic-offset:4 ; -*- */ /* * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. @@ -138,7 +137,7 @@ typedef struct oshmem_request_t oshmem_request_t; * See oshmem/communicator/communicator.h comments with struct oshmem_group_t * for full explanation why we chose the following padding construct for predefines. 
*/ -#define PREDEFINED_REQUEST_PAD (sizeof(void*) * 32) +#define PREDEFINED_REQUEST_PAD 256 struct oshmem_predefined_request_t { struct oshmem_request_t request; diff --git a/oshmem/runtime/oshmem_info_support.c b/oshmem/runtime/oshmem_info_support.c index 3a80690e37e..5c2ddddc3e4 100644 --- a/oshmem/runtime/oshmem_info_support.c +++ b/oshmem/runtime/oshmem_info_support.c @@ -2,6 +2,7 @@ * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -57,7 +58,7 @@ int oshmem_info_register_framework_params(opal_pointer_array_t *component_map) } /* Do OMPI interface call */ - rc = ompi_info_register_framework_params(component_map); + rc = opal_info_register_framework_params(component_map); if (OMPI_SUCCESS != rc) { return rc; } @@ -74,7 +75,7 @@ void oshmem_info_close_components(void) } /* Do OMPI interface call */ - ompi_info_close_components(); + opal_info_close_components(); } void oshmem_info_show_oshmem_version(const char *scope) diff --git a/oshmem/runtime/oshmem_shmem_abort.c b/oshmem/runtime/oshmem_shmem_abort.c index aba775a15ea..a299330b0a4 100644 --- a/oshmem/runtime/oshmem_shmem_abort.c +++ b/oshmem/runtime/oshmem_shmem_abort.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. + * Copyright (c) 2017 FUJITSU LIMITED. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -24,6 +25,7 @@ #endif #include "opal/mca/backtrace/backtrace.h" +#include "opal/util/error.h" #include "opal/runtime/opal_params.h" #include "orte/util/proc_info.h" @@ -79,7 +81,7 @@ int oshmem_shmem_abort(int errcode) if (OPAL_SUCCESS == opal_backtrace_buffer(&messages, &len)) { for (i = 0; i < len; ++i) { fprintf(stderr, - "[%s:%d] [%d] func:%s\n", + "[%s:%05d] [%d] func:%s\n", host, (int) pid, i, @@ -95,24 +97,8 @@ int oshmem_shmem_abort(int errcode) } } - /* Should we wait for a while before aborting? */ - - if (0 != opal_abort_delay) { - if (opal_abort_delay < 0) { - fprintf(stderr ,"[%s:%d] Looping forever (MCA parameter opal_abort_delay is < 0)\n", - host, (int) pid); - fflush(stderr); - while (1) { - sleep(5); - } - } else { - fprintf(stderr, "[%s:%d] Delaying for %d seconds before aborting\n", - host, (int) pid, opal_abort_delay); - do { - sleep(1); - } while (--opal_abort_delay > 0); - } - } + /* Wait for a while before aborting */ + opal_delay_abort(); if (!orte_initialized || !oshmem_shmem_initialized) { if (orte_show_help_is_available()) { @@ -124,7 +110,7 @@ int oshmem_shmem_abort(int errcode) (int) pid); } else { fprintf(stderr, - "[%s:%d] Local abort completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!\n", + "[%s:%05d] Local abort completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!\n", host, (int) pid); } diff --git a/oshmem/runtime/oshmem_shmem_finalize.c b/oshmem/runtime/oshmem_shmem_finalize.c index b3282e17945..d88c83259ce 100644 --- a/oshmem/runtime/oshmem_shmem_finalize.c +++ b/oshmem/runtime/oshmem_shmem_finalize.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. 
* Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. @@ -58,29 +58,29 @@ #include "oshmem/shmem/shmem_lock.h" #include "oshmem/runtime/oshmem_shmem_preconnect.h" +extern int oshmem_shmem_globalexit_status; + static int _shmem_finalize(void); int oshmem_shmem_finalize(void) { int ret = OSHMEM_SUCCESS; - static int32_t finalize_has_already_started = 0; - if (opal_atomic_cmpset_32(&finalize_has_already_started, 0, 1) - && oshmem_shmem_initialized && !oshmem_shmem_aborted) { + if (oshmem_shmem_initialized && !oshmem_shmem_aborted) { /* Should be called first because ompi_mpi_finalize makes orte and opal finalization */ ret = _shmem_finalize(); - if ((OSHMEM_SUCCESS == ret) && ompi_mpi_initialized - && !ompi_mpi_finalized) { - PMPI_Comm_free(&oshmem_comm_world); - ret = ompi_mpi_finalize(); - } - if (OSHMEM_SUCCESS == ret) { oshmem_shmem_initialized = false; } } + if ((OSHMEM_SUCCESS == ret) && ompi_mpi_initialized + && !ompi_mpi_finalized && oshmem_shmem_globalexit_status == 0) { + PMPI_Comm_free(&oshmem_comm_world); + ret = ompi_mpi_finalize(); + } + return ret; } @@ -101,17 +101,10 @@ static int _shmem_finalize(void) if (OSHMEM_SUCCESS != (ret = oshmem_request_finalize())) { return ret; } - /* must free cached groups before we kill collectives */ - if (OSHMEM_SUCCESS != (ret = oshmem_group_cache_list_free())) { - return ret; - } - /* We need to call mca_scoll_base_group_unselect explicitly for each group - * that are not freed by oshmem_group_cache_list_free. 
We can only release its collectives at this point */ - mca_scoll_base_group_unselect(oshmem_group_all); - mca_scoll_base_group_unselect(oshmem_group_self); - /* Close down MCA modules */ + oshmem_proc_group_finalize_scoll(); + /* Close down MCA modules */ if (OSHMEM_SUCCESS != (ret = mca_base_framework_close(&oshmem_atomic_base_framework) ) ) { return ret; } diff --git a/oshmem/runtime/oshmem_shmem_init.c b/oshmem/runtime/oshmem_shmem_init.c index ae58e837693..2a52b4550cd 100644 --- a/oshmem/runtime/oshmem_shmem_init.c +++ b/oshmem/runtime/oshmem_shmem_init.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2015-2016 Research Organization for Information Science * and Technology (RIST). All rights reserved. @@ -51,6 +51,7 @@ #include "opal/mca/allocator/base/base.h" #include "ompi/proc/proc.h" #include "ompi/runtime/mpiruntime.h" +#include "ompi/util/timings.h" #include "oshmem/constants.h" #include "oshmem/runtime/runtime.h" @@ -143,17 +144,26 @@ int oshmem_shmem_init(int argc, char **argv, int requested, int *provided) { int ret = OSHMEM_SUCCESS; + OMPI_TIMING_INIT(32); + if (!oshmem_shmem_initialized) { if (!ompi_mpi_initialized && !ompi_mpi_finalized) { ret = ompi_mpi_init(argc, argv, requested, provided); } + OMPI_TIMING_NEXT("ompi_mpi_init"); if (OSHMEM_SUCCESS != ret) { return ret; } PMPI_Comm_dup(MPI_COMM_WORLD, &oshmem_comm_world); + OMPI_TIMING_NEXT("PMPI_Comm_dup"); + ret = _shmem_init(argc, argv, requested, provided); + OMPI_TIMING_NEXT("_shmem_init"); + OMPI_TIMING_IMPORT_OPAL("mca_scoll_mpi_comm_query"); + OMPI_TIMING_IMPORT_OPAL("mca_scoll_enable"); + OMPI_TIMING_IMPORT_OPAL("mca_scoll_base_select"); if (OSHMEM_SUCCESS != ret) { return ret; @@ -164,11 +174,15 @@ int oshmem_shmem_init(int argc, char **argv, int requested, int *provided) SHMEM_API_ERROR( "shmem_lock_init() failed"); return OSHMEM_ERROR; } + 
OMPI_TIMING_NEXT("shmem_lock_init"); /* this is a collective op, implies barrier */ MCA_MEMHEAP_CALL(get_all_mkeys()); + OMPI_TIMING_NEXT("get_all_mkeys()"); oshmem_shmem_preconnect_all(); + OMPI_TIMING_NEXT("shmem_preconnect_all"); + #if OSHMEM_OPAL_THREAD_ENABLE pthread_t thread_id; int perr; @@ -178,11 +192,14 @@ int oshmem_shmem_init(int argc, char **argv, int requested, int *provided) return OSHMEM_ERROR; } #endif + OMPI_TIMING_NEXT("THREAD_ENABLE"); } #ifdef SIGUSR1 signal(SIGUSR1,sighandler__SIGUSR1); signal(SIGTERM,sighandler__SIGTERM); #endif + OMPI_TIMING_OUT; + OMPI_TIMING_FINALIZE; return ret; } @@ -259,11 +276,6 @@ static int _shmem_init(int argc, char **argv, int requested, int *provided) goto error; } - if (OSHMEM_SUCCESS != (ret = oshmem_group_cache_list_init())) { - error = "oshmem_group_cache_list_init() failed"; - goto error; - } - if (OSHMEM_SUCCESS != (ret = oshmem_op_init())) { error = "oshmem_op_init() failed"; goto error; diff --git a/oshmem/shmem/c/profile/defines.h b/oshmem/shmem/c/profile/defines.h index 9a376a941ff..7f61bc2738f 100644 --- a/oshmem/shmem/c/profile/defines.h +++ b/oshmem/shmem/c/profile/defines.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013-2016 Mellanox Technologies, Inc. + * Copyright (c) 2013-2017 Mellanox Technologies, Inc. * All rights reserved. * $COPYRIGHT$ * @@ -305,7 +305,6 @@ */ #define shmem_broadcast32 pshmem_broadcast32 #define shmem_broadcast64 pshmem_broadcast64 -#define shmem_broadcast pshmem_broadcast #define shmem_collect32 pshmem_collect32 #define shmem_collect64 pshmem_collect64 #define shmem_fcollect32 pshmem_fcollect32 diff --git a/oshmem/shmem/c/shmem_alltoall.c b/oshmem/shmem/c/shmem_alltoall.c index c75aa3ae292..57f40f67bd8 100644 --- a/oshmem/shmem/c/shmem_alltoall.c +++ b/oshmem/shmem/c/shmem_alltoall.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2016 Mellanox Technologies, Inc. + * Copyright (c) 2016-2018 Mellanox Technologies, Inc. * All rights reserved. 
* $COPYRIGHT$ * @@ -19,7 +19,6 @@ #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" static void _shmem_alltoall(void *target, const void *source, @@ -78,48 +77,23 @@ static void _shmem_alltoall(void *target, int PE_size, long *pSync) { - int rc = OSHMEM_SUCCESS; - oshmem_group_t* group = NULL; + int rc; + oshmem_group_t* group; - if ((0 <= PE_start) && (0 <= logPE_stride)) { - /* Create group basing PE_start, logPE_stride and PE_size */ -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - group = oshmem_proc_group_create(PE_start, (1 << logPE_stride), PE_size); - if (!group) - rc = OSHMEM_ERROR; -#else - group = find_group_in_cache(PE_start, logPE_stride, PE_size); - if (!group) { - group = oshmem_proc_group_create(PE_start, - (1 << logPE_stride), - PE_size); - if (!group) { - rc = OSHMEM_ERROR; - } else { - cache_group(group, PE_start, logPE_stride, PE_size); - } - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - - /* Collective operation call */ - if (rc == OSHMEM_SUCCESS) { - /* Call collective alltoall operation */ - rc = group->g_scoll.scoll_alltoall(group, - target, - source, - dst, - sst, - nelems, - element_size, - pSync, - SCOLL_DEFAULT_ALG); - } -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - if ( rc == OSHMEM_SUCCESS ) { - oshmem_proc_group_destroy(group); - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - } + /* Create group basing PE_start, logPE_stride and PE_size */ + group = oshmem_proc_group_create_nofail(PE_start, 1<g_scoll.scoll_alltoall(group, + target, + source, + dst, + sst, + nelems, + element_size, + pSync, + SCOLL_DEFAULT_ALG); + oshmem_proc_group_destroy(group); + RUNTIME_CHECK_RC(rc); } #if OSHMEM_PROFILING diff --git a/oshmem/shmem/c/shmem_barrier.c b/oshmem/shmem/c/shmem_barrier.c index eba6fe68331..7ce0ddc96f7 100644 --- a/oshmem/shmem/c/shmem_barrier.c +++ b/oshmem/shmem/c/shmem_barrier.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. 
+ * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * $COPYRIGHT$ * @@ -18,7 +18,6 @@ #include "oshmem/mca/scoll/base/base.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #if OSHMEM_PROFILING @@ -30,48 +29,22 @@ void shmem_barrier(int PE_start, int logPE_stride, int PE_size, long *pSync) { - int rc = OSHMEM_SUCCESS; - oshmem_group_t* group = NULL; + int rc; + oshmem_group_t* group; RUNTIME_CHECK_INIT(); #if OSHMEM_SPEC_COMPAT == 1 /* all outstanding puts must be completed */ - shmem_fence(); + shmem_quiet(); #endif - if ((0 <= PE_start) && (0 <= logPE_stride)) { - /* Create group basing PE_start, logPE_stride and PE_size */ -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - group = oshmem_proc_group_create(PE_start, (1 << logPE_stride), PE_size); - if (!group) - rc = OSHMEM_ERROR; -#else - group = find_group_in_cache(PE_start, logPE_stride, PE_size); - if (!group) { - group = oshmem_proc_group_create(PE_start, - (1 << logPE_stride), - PE_size); - if (!group) { - rc = OSHMEM_ERROR; - } else { - cache_group(group, PE_start, logPE_stride, PE_size); - } - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - /* Collective operation call */ - if (rc == OSHMEM_SUCCESS) { - /* Call barrier operation */ - rc = group->g_scoll.scoll_barrier(group, pSync, SCOLL_DEFAULT_ALG); - } + /* Create group basing PE_start, logPE_stride and PE_size */ + group = oshmem_proc_group_create_nofail(PE_start, 1<g_scoll.scoll_barrier(group, pSync, SCOLL_DEFAULT_ALG); -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - if ( rc == OSHMEM_SUCCESS ) - { - oshmem_proc_group_destroy(group); - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - } + oshmem_proc_group_destroy(group); RUNTIME_CHECK_RC(rc); } @@ -81,7 +54,7 @@ void shmem_barrier_all(void) #if OSHMEM_SPEC_COMPAT == 1 /* all outstanding puts must be completed */ - shmem_fence(); + shmem_quiet(); #endif if (mca_scoll_sync_array) { diff --git a/oshmem/shmem/c/shmem_broadcast.c b/oshmem/shmem/c/shmem_broadcast.c index 
dc7334aec1a..a618df733ca 100644 --- a/oshmem/shmem/c/shmem_broadcast.c +++ b/oshmem/shmem/c/shmem_broadcast.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * $COPYRIGHT$ * @@ -19,7 +19,6 @@ #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" static void _shmem_broadcast(void *target, const void *source, @@ -58,57 +57,36 @@ static void _shmem_broadcast(void *target, int PE_size, long *pSync) { - int rc = OSHMEM_SUCCESS; - oshmem_group_t* group = NULL; + int rc; + oshmem_group_t *group; if ((0 <= PE_root) && (PE_root < PE_size)) { /* Create group basing PE_start, logPE_stride and PE_size */ -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - group = oshmem_proc_group_create(PE_start, (1 << logPE_stride), PE_size); - if (!group || (PE_root >= group->proc_count)) - { + group = oshmem_proc_group_create_nofail(PE_start, 1 << logPE_stride, PE_size); + if (PE_root >= group->proc_count) { rc = OSHMEM_ERROR; + goto out; } -#else - group = find_group_in_cache(PE_start, logPE_stride, PE_size); - if (!group) { - group = oshmem_proc_group_create(PE_start, - (1 << logPE_stride), - PE_size); - if (!group || (PE_root >= group->proc_count)) { - rc = OSHMEM_ERROR; - } else { - cache_group(group, PE_start, logPE_stride, PE_size); - } - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - /* Collective operation call */ - if (rc == OSHMEM_SUCCESS) { - /* Define actual PE using relative in active set */ - PE_root = oshmem_proc_pe(group->proc_array[PE_root]); + /* Define actual PE using relative in active set */ + PE_root = oshmem_proc_pe(group->proc_array[PE_root]); - /* Call collective broadcast operation */ - rc = group->g_scoll.scoll_broadcast(group, - PE_root, - target, - source, - nbytes, - pSync, - SCOLL_DEFAULT_ALG); - } -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - if ( rc == OSHMEM_SUCCESS ) - { - oshmem_proc_group_destroy(group); - } -#endif /* 
OSHMEM_GROUP_CACHE_ENABLED */ + /* Call collective broadcast operation */ + rc = group->g_scoll.scoll_broadcast(group, + PE_root, + target, + source, + nbytes, + pSync, + SCOLL_DEFAULT_ALG); +out: + oshmem_proc_group_destroy(group); + RUNTIME_CHECK_RC(rc); } } #if OSHMEM_PROFILING #include "oshmem/include/pshmem.h" -#pragma weak shmem_broadcast = pshmem_broadcast #pragma weak shmem_broadcast32 = pshmem_broadcast32 #pragma weak shmem_broadcast64 = pshmem_broadcast64 #include "oshmem/shmem/c/profile/defines.h" @@ -116,4 +94,3 @@ static void _shmem_broadcast(void *target, SHMEM_TYPE_BROADCAST(_broadcast32, sizeof(uint32_t)) SHMEM_TYPE_BROADCAST(_broadcast64, sizeof(uint64_t)) -SHMEM_TYPE_BROADCAST(_broadcast, sizeof(uint64_t)) diff --git a/oshmem/shmem/c/shmem_collect.c b/oshmem/shmem/c/shmem_collect.c index 9e58c7a20ab..91502035fcc 100644 --- a/oshmem/shmem/c/shmem_collect.c +++ b/oshmem/shmem/c/shmem_collect.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. 
* $COPYRIGHT$ * @@ -19,7 +19,6 @@ #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" static void _shmem_collect(void *target, const void *source, @@ -58,47 +57,21 @@ static void _shmem_collect(void *target, long *pSync, bool array_type) { - int rc = OSHMEM_SUCCESS; - oshmem_group_t* group = NULL; + int rc; + oshmem_group_t *group; - { - /* Create group basing PE_start, logPE_stride and PE_size */ -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - group = oshmem_proc_group_create(PE_start, (1 << logPE_stride), PE_size); - if (!group) - rc = OSHMEM_ERROR; -#else - group = find_group_in_cache(PE_start, logPE_stride, PE_size); - if (!group) { - group = oshmem_proc_group_create(PE_start, - (1 << logPE_stride), - PE_size); - if (!group) { - rc = OSHMEM_ERROR; - } else { - cache_group(group, PE_start, logPE_stride, PE_size); - } - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - - /* Collective operation call */ - if (rc == OSHMEM_SUCCESS) { - /* Call collective broadcast operation */ - rc = group->g_scoll.scoll_collect(group, - target, - source, - nbytes, - pSync, - array_type, - SCOLL_DEFAULT_ALG); - } -#if OSHMEM_GROUP_CACHE_ENABLED == 0 - if ( rc == OSHMEM_SUCCESS ) - { - oshmem_proc_group_destroy(group); - } -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - } + /* Create group basing PE_start, logPE_stride and PE_size */ + group = oshmem_proc_group_create_nofail(PE_start, 1<g_scoll.scoll_collect(group, + target, + source, + nbytes, + pSync, + array_type, + SCOLL_DEFAULT_ALG); + oshmem_proc_group_destroy(group); + RUNTIME_CHECK_RC(rc); } #if OSHMEM_PROFILING diff --git a/oshmem/shmem/c/shmem_finalize.c b/oshmem/shmem/c/shmem_finalize.c index dd98dc8d40f..dca792179ea 100644 --- a/oshmem/shmem/c/shmem_finalize.c +++ b/oshmem/shmem/c/shmem_finalize.c @@ -22,15 +22,9 @@ #include "oshmem/shmem/c/profile/defines.h" #endif -extern int oshmem_shmem_globalexit_status; - void shmem_finalize(void) { OPAL_CR_FINALIZE_LIBRARY(); - if 
(oshmem_shmem_globalexit_status != 0) - { - return; - } oshmem_shmem_finalize(); } diff --git a/oshmem/shmem/c/shmem_free.c b/oshmem/shmem/c/shmem_free.c index 71801a297ee..b0e706b0094 100644 --- a/oshmem/shmem/c/shmem_free.c +++ b/oshmem/shmem/c/shmem_free.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2013-2015 Mellanox Technologies, Inc. * All rights reserved. + * Copyright (c) 2018 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -41,7 +42,12 @@ static inline void _shfree(void* ptr) { int rc; - RUNTIME_CHECK_INIT(); RUNTIME_CHECK_ADDR(ptr); + RUNTIME_CHECK_INIT(); + if (NULL == ptr) { + return; + } + + RUNTIME_CHECK_ADDR(ptr); #if OSHMEM_SPEC_COMPAT == 1 shmem_barrier_all(); diff --git a/oshmem/shmem/c/shmem_ptr.c b/oshmem/shmem/c/shmem_ptr.c index 12413b29b9f..35a324c2212 100644 --- a/oshmem/shmem/c/shmem_ptr.c +++ b/oshmem/shmem/c/shmem_ptr.c @@ -19,6 +19,9 @@ #include "oshmem/shmem/shmem_api_logger.h" #include "oshmem/runtime/runtime.h" +#include "oshmem/mca/memheap/memheap.h" +#include "oshmem/mca/memheap/base/base.h" + #if OSHMEM_PROFILING #include "oshmem/include/pshmem.h" @@ -26,11 +29,43 @@ #include "oshmem/shmem/c/profile/defines.h" #endif -void *shmem_ptr(const void *ptr, int pe) +void *shmem_ptr(const void *dst_addr, int pe) { - SHMEM_API_VERBOSE(10, - "*************** WARNING!!! 
NOT SUPPORTED FUNCTION **********************\n" - "shmem_ptr() function is available only on systems where ordinary memory loads\n" - "and stores are used to implement OpenSHMEM put and get operations."); - return 0; + ompi_proc_t *proc; + sshmem_mkey_t *mkey; + int i; + void *rva; + + RUNTIME_CHECK_INIT(); + RUNTIME_CHECK_PE(pe); + RUNTIME_CHECK_ADDR(dst_addr); + + /* process can access its own memory */ + if (pe == oshmem_my_proc_id()) { + return (void *)dst_addr; + } + + /* The memory must be on the local node */ + proc = oshmem_proc_group_find(oshmem_group_all, pe); + if (!OPAL_PROC_ON_LOCAL_NODE(proc->super.proc_flags)) { + return NULL; + } + + for (i = 0; i < mca_memheap_base_num_transports(); i++) { + mkey = mca_memheap_base_get_cached_mkey(pe, (void *)dst_addr, i, &rva); + if (!mkey) { + continue; + } + + if (mca_memheap_base_mkey_is_shm(mkey)) { + return rva; + } + + rva = MCA_SPML_CALL(rmkey_ptr(dst_addr, mkey, pe)); + if (rva != NULL) { + return rva; + } + } + + return NULL; } diff --git a/oshmem/shmem/c/shmem_quiet.c b/oshmem/shmem/c/shmem_quiet.c index c0019dfaaf0..834e6c7fe12 100644 --- a/oshmem/shmem/c/shmem_quiet.c +++ b/oshmem/shmem/c/shmem_quiet.c @@ -23,5 +23,5 @@ void shmem_quiet(void) { - MCA_SPML_CALL(fence()); + MCA_SPML_CALL(quiet()); } diff --git a/oshmem/shmem/c/shmem_reduce.c b/oshmem/shmem/c/shmem_reduce.c index 72c8ea9abb9..57b4cf60f87 100644 --- a/oshmem/shmem/c/shmem_reduce.c +++ b/oshmem/shmem/c/shmem_reduce.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. 
* $COPYRIGHT$ * @@ -16,15 +16,8 @@ #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" -#if OSHMEM_GROUP_CACHE_ENABLED == 0 -static bool __group_cache_enabled = false; -#else -static bool __group_cache_enabled = true; -#endif /* OSHMEM_GROUP_CACHE_ENABLED */ - /* * The shared memory (SHMEM) reduction routines perform an associative binary operation * across symmetric arrays on multiple virtual PEs. @@ -50,47 +43,22 @@ static bool __group_cache_enabled = true; RUNTIME_CHECK_ADDR(source); \ \ { \ - /* Create group basing PE_start, logPE_stride and PE_size */ \ - if (!__group_cache_enabled) \ - { \ - group = oshmem_proc_group_create(PE_start, (1 << logPE_stride), PE_size); \ - if (!group) \ - rc = OSHMEM_ERROR; \ - } \ - else \ - { \ - group = find_group_in_cache(PE_start,logPE_stride,PE_size); \ - if (!group) \ - { \ - group = oshmem_proc_group_create(PE_start, (1 << logPE_stride), PE_size); \ - if (!group) \ - rc = OSHMEM_ERROR; \ - cache_group(group,PE_start,logPE_stride,PE_size); \ - } \ - } \ - \ - /* Collective operation call */ \ - if ( rc == OSHMEM_SUCCESS ) \ - { \ + group = oshmem_proc_group_create_nofail(PE_start, 1<dt_size; \ + size_t size = nreduce * op->dt_size; \ \ - /* Call collective reduce operation */ \ - rc = group->g_scoll.scoll_reduce( \ - group, \ - op, \ - (void*)target, \ - (const void*)source, \ - size, \ - pSync, \ - (void*)pWrk, \ - SCOLL_DEFAULT_ALG ); \ - } \ + /* Call collective reduce operation */ \ + rc = group->g_scoll.scoll_reduce( \ + group, \ + op, \ + (void*)target, \ + (const void*)source, \ + size, \ + pSync, \ + (void*)pWrk, \ + SCOLL_DEFAULT_ALG ); \ \ - if ( !__group_cache_enabled && (rc == OSHMEM_SUCCESS ) ) \ - { \ - oshmem_proc_group_destroy(group); \ - } \ + oshmem_proc_group_destroy(group); \ } \ RUNTIME_CHECK_RC(rc); \ } diff --git a/oshmem/shmem/fortran/shmem_alltoall_f.c b/oshmem/shmem/fortran/shmem_alltoall_f.c index 
58fd866792c..6845edcf3f0 100644 --- a/oshmem/shmem/fortran/shmem_alltoall_f.c +++ b/oshmem/shmem/fortran/shmem_alltoall_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013-2016 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -59,7 +58,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *dst, MPI_Fint *sst, MPI_Fint *nlong, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T pSync), (target, source, dst, sst, nlong, PE_start, logPE_stride, PE_size, pSync)) -#define SHMEM_ALLTOALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_ALLTOALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nlong,\ MPI_Fint *PE_start, \ @@ -67,61 +66,28 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, MPI_Fint *PE_size, \ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - 
OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ -\ - /* Call collective broadcast operation */\ - rc = group->g_scoll.scoll_alltoall( group, \ - FPTR_2_VOID_PTR(target), \ - FPTR_2_VOID_PTR(source), \ - 1, \ - 1, \ - OMPI_FINT_2_INT(*nlong), \ - op->dt_size, \ - FPTR_2_VOID_PTR(pSync), SCOLL_DEFAULT_ALG );\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) \ - {\ - if ( group )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start), \ + (1 << OMPI_FINT_2_INT(*logPE_stride)), \ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + \ + /* Call collective broadcast operation */\ + rc = group->g_scoll.scoll_alltoall( group, \ + FPTR_2_VOID_PTR(target), \ + FPTR_2_VOID_PTR(source), \ + 1, \ + 1, \ + OMPI_FINT_2_INT(*nlong), \ + op->dt_size, \ + FPTR_2_VOID_PTR(pSync), SCOLL_DEFAULT_ALG );\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -#define SHMEM_ALLTOALLS(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_ALLTOALLS(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *dst,\ MPI_Fint *sst,\ @@ -131,61 +97,28 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, MPI_Fint *PE_size, \ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = 
find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ -\ - /* Call collective broadcast operation */\ - rc = group->g_scoll.scoll_alltoall( group, \ - FPTR_2_VOID_PTR(target), \ - FPTR_2_VOID_PTR(source), \ - OMPI_FINT_2_INT(*dst), \ - OMPI_FINT_2_INT(*sst), \ - OMPI_FINT_2_INT(*nlong), \ - op->dt_size, \ - FPTR_2_VOID_PTR(pSync), SCOLL_DEFAULT_ALG );\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) \ - {\ - if ( group )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start), \ + (1 << OMPI_FINT_2_INT(*logPE_stride)), \ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + \ + /* Call collective broadcast operation */\ + rc = group->g_scoll.scoll_alltoall( group, \ + FPTR_2_VOID_PTR(target), \ + FPTR_2_VOID_PTR(source), \ + OMPI_FINT_2_INT(*dst), \ + OMPI_FINT_2_INT(*sst), \ + OMPI_FINT_2_INT(*nlong), \ + op->dt_size, \ + FPTR_2_VOID_PTR(pSync), SCOLL_DEFAULT_ALG );\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -SHMEM_ALLTOALL(shmem_alltoall32_f, oshmem_op_prod_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_ALLTOALL(shmem_alltoall64_f, oshmem_op_prod_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_ALLTOALLS(shmem_alltoalls32_f, oshmem_op_prod_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_ALLTOALLS(shmem_alltoalls64_f, oshmem_op_prod_fint8, 
OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_ALLTOALL(shmem_alltoall32_f, oshmem_op_prod_fint4) +SHMEM_ALLTOALL(shmem_alltoall64_f, oshmem_op_prod_fint8) +SHMEM_ALLTOALLS(shmem_alltoalls32_f, oshmem_op_prod_fint4) +SHMEM_ALLTOALLS(shmem_alltoalls64_f, oshmem_op_prod_fint8) diff --git a/oshmem/shmem/fortran/shmem_and_to_all_f.c b/oshmem/shmem/fortran/shmem_and_to_all_f.c index 868f7e844c1..d653360dbf8 100644 --- a/oshmem/shmem/fortran/shmem_and_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_and_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -50,7 +49,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nreduce, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T *pWrk, FORTRAN_POINTER_T pSync), (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_AND_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_AND_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce, \ MPI_Fint *PE_start, \ @@ -59,61 +58,27 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk, \ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (0 == OSHMEM_GROUP_CACHE_ENABLED)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group) \ - rc = OSHMEM_ERROR;\ - }\ - else \ - {\ - 
group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,\ - OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ -\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group, \ - op, \ - FPTR_2_VOID_PTR(target), \ - FPTR_2_VOID_PTR(source), \ - size, \ - FPTR_2_VOID_PTR(pSync), \ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (0 == OSHMEM_GROUP_CACHE_ENABLED)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start), \ + (1 << OMPI_FINT_2_INT(*logPE_stride)), \ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + \ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce(group, \ + op, \ + FPTR_2_VOID_PTR(target), \ + FPTR_2_VOID_PTR(source), \ + size, \ + FPTR_2_VOID_PTR(pSync), \ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc); \ } -SHMEM_AND_TO_ALL(shmem_int2_and_to_all_f, oshmem_op_and_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_AND_TO_ALL(shmem_int4_and_to_all_f, oshmem_op_and_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_AND_TO_ALL(shmem_int8_and_to_all_f, oshmem_op_and_fint8, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_AND_TO_ALL(shmem_int2_and_to_all_f, oshmem_op_and_fint2) 
+SHMEM_AND_TO_ALL(shmem_int4_and_to_all_f, oshmem_op_and_fint4) +SHMEM_AND_TO_ALL(shmem_int8_and_to_all_f, oshmem_op_and_fint8) diff --git a/oshmem/shmem/fortran/shmem_broadcast_f.c b/oshmem/shmem/fortran/shmem_broadcast_f.c index b603194da08..d3d737de96a 100644 --- a/oshmem/shmem/fortran/shmem_broadcast_f.c +++ b/oshmem/shmem/fortran/shmem_broadcast_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -59,7 +58,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nlong, MPI_Fint *PE_root, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T pSync), (target, source, nlong, PE_root, PE_start, logPE_stride, PE_size, pSync)) -#define SHMEM_BROADCAST(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_BROADCAST(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nlong,\ MPI_Fint *PE_root, \ @@ -68,70 +67,40 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, MPI_Fint *PE_size, \ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ + int rc;\ + oshmem_group_t *group;\ + int rel_PE_root = 0;\ + oshmem_op_t* op = T_NAME;\ \ if ((0 <= OMPI_FINT_2_INT(*PE_root)) && \ - (OMPI_FINT_2_INT(*PE_root) < OMPI_FINT_2_INT(*PE_size)))\ + (OMPI_FINT_2_INT(*PE_root) < OMPI_FINT_2_INT(*PE_size)))\ {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ + group = 
oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start), \ (1 << OMPI_FINT_2_INT(*logPE_stride)), \ OMPI_FINT_2_INT(*PE_size));\ - if (!group || (OMPI_FINT_2_INT(*PE_root) >= group->proc_count))\ - {\ - rc = OSHMEM_ERROR;\ - }\ - }\ - else\ + if (OMPI_FINT_2_INT(*PE_root) >= group->proc_count)\ {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group || (OMPI_FINT_2_INT(*PE_root) >= group->proc_count))\ - {\ - rc = OSHMEM_ERROR;\ - }\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - int rel_PE_root = 0;\ - oshmem_op_t* op = T_NAME;\ -\ - /* Define actual PE using relative in active set */\ - rel_PE_root = oshmem_proc_pe(group->proc_array[OMPI_FINT_2_INT(*PE_root)]);\ -\ - /* Call collective broadcast operation */\ - rc = group->g_scoll.scoll_broadcast( group, \ + rc = OSHMEM_ERROR;\ + goto out;\ + }\ + \ + /* Define actual PE using relative in active set */\ + rel_PE_root = oshmem_proc_pe(group->proc_array[OMPI_FINT_2_INT(*PE_root)]);\ + \ + /* Call collective broadcast operation */\ + rc = group->g_scoll.scoll_broadcast( group, \ rel_PE_root, \ FPTR_2_VOID_PTR(target), \ FPTR_2_VOID_PTR(source), \ OMPI_FINT_2_INT(*nlong) * op->dt_size, \ FPTR_2_VOID_PTR(pSync), SCOLL_DEFAULT_ALG );\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) \ - {\ - if ( group )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - }\ + out: \ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc); \ + }\ } -SHMEM_BROADCAST(shmem_broadcast4_f, oshmem_op_prod_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_BROADCAST(shmem_broadcast8_f, 
oshmem_op_prod_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_BROADCAST(shmem_broadcast32_f, oshmem_op_prod_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_BROADCAST(shmem_broadcast64_f, oshmem_op_prod_fint8, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_BROADCAST(shmem_broadcast4_f, oshmem_op_prod_fint4) +SHMEM_BROADCAST(shmem_broadcast8_f, oshmem_op_prod_fint8) +SHMEM_BROADCAST(shmem_broadcast32_f, oshmem_op_prod_fint4) +SHMEM_BROADCAST(shmem_broadcast64_f, oshmem_op_prod_fint8) diff --git a/oshmem/shmem/fortran/shmem_collect_f.c b/oshmem/shmem/fortran/shmem_collect_f.c index 79ec3ad7e48..d990a6c902f 100644 --- a/oshmem/shmem/fortran/shmem_collect_f.c +++ b/oshmem/shmem/fortran/shmem_collect_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -95,7 +94,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nlong, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T pSync), (target,source,nlong,PE_start,logPE_stride,PE_size,pSync) ) -#define SHMEM_COLLECT(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_COLLECT(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nlong, \ MPI_Fint *PE_start, \ @@ -103,62 +102,29 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, MPI_Fint *PE_size, \ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - group = 
oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - } /* OSHMEM_GROUP_CACHE_ENABLED */\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - /* Call collective broadcast operation */\ - rc = group->g_scoll.scoll_collect( group, \ - FPTR_2_VOID_PTR(target), \ - FPTR_2_VOID_PTR(source), \ - OMPI_FINT_2_INT(*nlong) * op->dt_size, \ - FPTR_2_VOID_PTR(pSync), \ - false, SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }/* OSHMEM_GROUP_CACHE_ENABLED */\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start), \ + (1 << OMPI_FINT_2_INT(*logPE_stride)), \ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + /* Call collective broadcast operation */\ + rc = group->g_scoll.scoll_collect( group, \ + FPTR_2_VOID_PTR(target), \ + FPTR_2_VOID_PTR(source), \ + OMPI_FINT_2_INT(*nlong) * op->dt_size, \ + FPTR_2_VOID_PTR(pSync), \ + false, SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -SHMEM_COLLECT(shmem_collect4_f, oshmem_op_prod_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_collect8_f, oshmem_op_prod_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_collect32_f, oshmem_op_prod_fint4, 
OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_collect64_f, oshmem_op_prod_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_fcollect4_f, oshmem_op_prod_freal4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_fcollect8_f, oshmem_op_prod_freal8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_fcollect32_f, oshmem_op_prod_freal4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_COLLECT(shmem_fcollect64_f, oshmem_op_prod_freal8, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_COLLECT(shmem_collect4_f, oshmem_op_prod_fint4) +SHMEM_COLLECT(shmem_collect8_f, oshmem_op_prod_fint8) +SHMEM_COLLECT(shmem_collect32_f, oshmem_op_prod_fint4) +SHMEM_COLLECT(shmem_collect64_f, oshmem_op_prod_fint8) +SHMEM_COLLECT(shmem_fcollect4_f, oshmem_op_prod_freal4) +SHMEM_COLLECT(shmem_fcollect8_f, oshmem_op_prod_freal8) +SHMEM_COLLECT(shmem_fcollect32_f, oshmem_op_prod_freal4) +SHMEM_COLLECT(shmem_fcollect64_f, oshmem_op_prod_freal8) diff --git a/oshmem/shmem/fortran/shmem_info_f.c b/oshmem/shmem/fortran/shmem_info_f.c index d87c54b895a..fc02870a412 100644 --- a/oshmem/shmem/fortran/shmem_info_f.c +++ b/oshmem/shmem/fortran/shmem_info_f.c @@ -1,7 +1,9 @@ /* * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. - * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -13,7 +15,7 @@ #include -#include "ompi/mpi/fortran/base/strings.h" +#include "ompi/mpi/fortran/base/fortran_base_strings.h" #include "oshmem/shmem/fortran/bindings.h" #include "oshmem/include/shmem.h" diff --git a/oshmem/shmem/fortran/shmem_max_to_all_f.c b/oshmem/shmem/fortran/shmem_max_to_all_f.c index 0ed5e27a4b3..8b3b4465317 100644 --- a/oshmem/shmem/fortran/shmem_max_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_max_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -77,7 +76,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nreduce, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T *pWrk, FORTRAN_POINTER_T pSync), (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_MAX_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_MAX_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce,\ MPI_Fint *PE_start,\ @@ -86,61 +85,29 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk,\ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = 
find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group,\ - op,\ - FPTR_2_VOID_PTR(target),\ - FPTR_2_VOID_PTR(source),\ - size,\ - FPTR_2_VOID_PTR(pSync),\ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start),\ + (1 << OMPI_FINT_2_INT(*logPE_stride)),\ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce( group,\ + op,\ + FPTR_2_VOID_PTR(target),\ + FPTR_2_VOID_PTR(source),\ + size,\ + FPTR_2_VOID_PTR(pSync),\ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -SHMEM_MAX_TO_ALL(shmem_int2_max_to_all_f, oshmem_op_max_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MAX_TO_ALL(shmem_int4_max_to_all_f, oshmem_op_max_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MAX_TO_ALL(shmem_int8_max_to_all_f, oshmem_op_max_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MAX_TO_ALL(shmem_real4_max_to_all_f, oshmem_op_max_freal4, OSHMEM_GROUP_CACHE_ENABLED) 
-SHMEM_MAX_TO_ALL(shmem_real8_max_to_all_f, oshmem_op_max_freal8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MAX_TO_ALL(shmem_real16_max_to_all_f, oshmem_op_max_freal16, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_MAX_TO_ALL(shmem_int2_max_to_all_f, oshmem_op_max_fint2) +SHMEM_MAX_TO_ALL(shmem_int4_max_to_all_f, oshmem_op_max_fint4) +SHMEM_MAX_TO_ALL(shmem_int8_max_to_all_f, oshmem_op_max_fint8) +SHMEM_MAX_TO_ALL(shmem_real4_max_to_all_f, oshmem_op_max_freal4) +SHMEM_MAX_TO_ALL(shmem_real8_max_to_all_f, oshmem_op_max_freal8) +SHMEM_MAX_TO_ALL(shmem_real16_max_to_all_f, oshmem_op_max_freal16) diff --git a/oshmem/shmem/fortran/shmem_min_to_all_f.c b/oshmem/shmem/fortran/shmem_min_to_all_f.c index 15003744440..22201286f8e 100644 --- a/oshmem/shmem/fortran/shmem_min_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_min_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. 
* $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -78,7 +77,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_MIN_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_MIN_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce,\ MPI_Fint *PE_start,\ @@ -87,61 +86,30 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk,\ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group,\ - op,\ - FPTR_2_VOID_PTR(target),\ - FPTR_2_VOID_PTR(source),\ - size,\ - FPTR_2_VOID_PTR(pSync),\ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == 
OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start),\ + (1 << OMPI_FINT_2_INT(*logPE_stride)),\ + OMPI_FINT_2_INT(*PE_size));\ + /* Collective operation call */\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce( group,\ + op,\ + FPTR_2_VOID_PTR(target),\ + FPTR_2_VOID_PTR(source),\ + size,\ + FPTR_2_VOID_PTR(pSync),\ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -SHMEM_MIN_TO_ALL(shmem_int2_min_to_all_f, oshmem_op_min_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MIN_TO_ALL(shmem_int4_min_to_all_f, oshmem_op_min_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MIN_TO_ALL(shmem_int8_min_to_all_f, oshmem_op_min_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MIN_TO_ALL(shmem_real4_min_to_all_f, oshmem_op_min_freal4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MIN_TO_ALL(shmem_real8_min_to_all_f, oshmem_op_min_freal8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_MIN_TO_ALL(shmem_real16_min_to_all_f, oshmem_op_min_freal16, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_MIN_TO_ALL(shmem_int2_min_to_all_f, oshmem_op_min_fint2) +SHMEM_MIN_TO_ALL(shmem_int4_min_to_all_f, oshmem_op_min_fint4) +SHMEM_MIN_TO_ALL(shmem_int8_min_to_all_f, oshmem_op_min_fint8) +SHMEM_MIN_TO_ALL(shmem_real4_min_to_all_f, oshmem_op_min_freal4) +SHMEM_MIN_TO_ALL(shmem_real8_min_to_all_f, oshmem_op_min_freal8) +SHMEM_MIN_TO_ALL(shmem_real16_min_to_all_f, oshmem_op_min_freal16) diff --git a/oshmem/shmem/fortran/shmem_or_to_all_f.c b/oshmem/shmem/fortran/shmem_or_to_all_f.c index 10a19953d26..7bca154b606 100644 --- a/oshmem/shmem/fortran/shmem_or_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_or_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. 
+ * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -50,7 +49,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nreduce, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T *pWrk, FORTRAN_POINTER_T pSync), (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_OR_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_OR_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce,\ MPI_Fint *PE_start,\ @@ -59,58 +58,26 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk,\ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = 
OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group,\ - op,\ - FPTR_2_VOID_PTR(target),\ - FPTR_2_VOID_PTR(source),\ - size,\ - FPTR_2_VOID_PTR(pSync),\ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start),\ + (1 << OMPI_FINT_2_INT(*logPE_stride)),\ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce( group,\ + op,\ + FPTR_2_VOID_PTR(target),\ + FPTR_2_VOID_PTR(source),\ + size,\ + FPTR_2_VOID_PTR(pSync),\ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc); \ } -SHMEM_OR_TO_ALL(shmem_int2_or_to_all_f, oshmem_op_or_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_OR_TO_ALL(shmem_int4_or_to_all_f, oshmem_op_or_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_OR_TO_ALL(shmem_int8_or_to_all_f, oshmem_op_or_fint8, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_OR_TO_ALL(shmem_int2_or_to_all_f, oshmem_op_or_fint2) +SHMEM_OR_TO_ALL(shmem_int4_or_to_all_f, oshmem_op_or_fint4) +SHMEM_OR_TO_ALL(shmem_int8_or_to_all_f, oshmem_op_or_fint8) diff --git a/oshmem/shmem/fortran/shmem_prod_to_all_f.c b/oshmem/shmem/fortran/shmem_prod_to_all_f.c index eb4f777d07e..c093bb00573 100644 --- a/oshmem/shmem/fortran/shmem_prod_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_prod_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. 
* $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -96,7 +95,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nreduce, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T *pWrk, FORTRAN_POINTER_T pSync), (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_PROD_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_PROD_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce,\ MPI_Fint *PE_start,\ @@ -105,63 +104,31 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk,\ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group,\ - op,\ - 
FPTR_2_VOID_PTR(target),\ - FPTR_2_VOID_PTR(source),\ - size,\ - FPTR_2_VOID_PTR(pSync),\ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start),\ + (1 << OMPI_FINT_2_INT(*logPE_stride)),\ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce( group,\ + op,\ + FPTR_2_VOID_PTR(target),\ + FPTR_2_VOID_PTR(source),\ + size,\ + FPTR_2_VOID_PTR(pSync),\ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -SHMEM_PROD_TO_ALL(shmem_int2_prod_to_all_f, oshmem_op_prod_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_int4_prod_to_all_f, oshmem_op_prod_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_int8_prod_to_all_f, oshmem_op_prod_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_comp4_prod_to_all_f, oshmem_op_prod_complexf, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_comp8_prod_to_all_f, oshmem_op_prod_complexd, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_real4_prod_to_all_f, oshmem_op_prod_freal4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_real8_prod_to_all_f, oshmem_op_prod_freal8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_PROD_TO_ALL(shmem_real16_prod_to_all_f, oshmem_op_prod_freal16, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_PROD_TO_ALL(shmem_int2_prod_to_all_f, oshmem_op_prod_fint2) +SHMEM_PROD_TO_ALL(shmem_int4_prod_to_all_f, oshmem_op_prod_fint4) +SHMEM_PROD_TO_ALL(shmem_int8_prod_to_all_f, oshmem_op_prod_fint8) +SHMEM_PROD_TO_ALL(shmem_comp4_prod_to_all_f, oshmem_op_prod_complexf) +SHMEM_PROD_TO_ALL(shmem_comp8_prod_to_all_f, 
oshmem_op_prod_complexd) +SHMEM_PROD_TO_ALL(shmem_real4_prod_to_all_f, oshmem_op_prod_freal4) +SHMEM_PROD_TO_ALL(shmem_real8_prod_to_all_f, oshmem_op_prod_freal8) +SHMEM_PROD_TO_ALL(shmem_real16_prod_to_all_f, oshmem_op_prod_freal16) diff --git a/oshmem/shmem/fortran/shmem_quiet_f.c b/oshmem/shmem/fortran/shmem_quiet_f.c index f7a8a5bf8a2..aa4af87563b 100644 --- a/oshmem/shmem/fortran/shmem_quiet_f.c +++ b/oshmem/shmem/fortran/shmem_quiet_f.c @@ -30,5 +30,5 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, void shmem_quiet_f(void) { - MCA_SPML_CALL(fence()); + MCA_SPML_CALL(quiet()); } diff --git a/oshmem/shmem/fortran/shmem_sum_to_all_f.c b/oshmem/shmem/fortran/shmem_sum_to_all_f.c index d3874ec25a0..ca48f484407 100644 --- a/oshmem/shmem/fortran/shmem_sum_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_sum_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. 
* $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -95,7 +94,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nreduce, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T *pWrk, FORTRAN_POINTER_T pSync), (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_SUM_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_SUM_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce,\ MPI_Fint *PE_start,\ @@ -104,63 +103,31 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk,\ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0) {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start),\ - (1 << OMPI_FINT_2_INT(*logPE_stride)),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group,\ - op,\ - 
FPTR_2_VOID_PTR(target),\ - FPTR_2_VOID_PTR(source),\ - size,\ - FPTR_2_VOID_PTR(pSync),\ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start),\ + (1 << OMPI_FINT_2_INT(*logPE_stride)),\ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce( group,\ + op,\ + FPTR_2_VOID_PTR(target),\ + FPTR_2_VOID_PTR(source),\ + size,\ + FPTR_2_VOID_PTR(pSync),\ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc);\ } -SHMEM_SUM_TO_ALL(shmem_int2_sum_to_all_f, oshmem_op_sum_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_int4_sum_to_all_f, oshmem_op_sum_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_int8_sum_to_all_f, oshmem_op_sum_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_comp4_sum_to_all_f, oshmem_op_sum_complexf, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_comp8_sum_to_all_f, oshmem_op_sum_complexd, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_real4_sum_to_all_f, oshmem_op_sum_freal4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_real8_sum_to_all_f, oshmem_op_sum_freal8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_SUM_TO_ALL(shmem_real16_sum_to_all_f, oshmem_op_sum_freal16, OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_SUM_TO_ALL(shmem_int2_sum_to_all_f, oshmem_op_sum_fint2) +SHMEM_SUM_TO_ALL(shmem_int4_sum_to_all_f, oshmem_op_sum_fint4) +SHMEM_SUM_TO_ALL(shmem_int8_sum_to_all_f, oshmem_op_sum_fint8) +SHMEM_SUM_TO_ALL(shmem_comp4_sum_to_all_f, oshmem_op_sum_complexf) +SHMEM_SUM_TO_ALL(shmem_comp8_sum_to_all_f, oshmem_op_sum_complexd) 
+SHMEM_SUM_TO_ALL(shmem_real4_sum_to_all_f, oshmem_op_sum_freal4) +SHMEM_SUM_TO_ALL(shmem_real8_sum_to_all_f, oshmem_op_sum_freal8) +SHMEM_SUM_TO_ALL(shmem_real16_sum_to_all_f, oshmem_op_sum_freal16) diff --git a/oshmem/shmem/fortran/shmem_xor_to_all_f.c b/oshmem/shmem/fortran/shmem_xor_to_all_f.c index 7d24ecbbc09..f85d62b92b0 100644 --- a/oshmem/shmem/fortran/shmem_xor_to_all_f.c +++ b/oshmem/shmem/fortran/shmem_xor_to_all_f.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2013 Mellanox Technologies, Inc. + * Copyright (c) 2013-2018 Mellanox Technologies, Inc. * All rights reserved. * Copyright (c) 2013 Cisco Systems, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,7 +15,6 @@ #include "oshmem/constants.h" #include "oshmem/mca/scoll/scoll.h" #include "oshmem/proc/proc.h" -#include "oshmem/proc/proc_group_cache.h" #include "oshmem/op/op.h" #if OSHMEM_PROFILING @@ -68,7 +67,7 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, (FORTRAN_POINTER_T target, FORTRAN_POINTER_T source, MPI_Fint *nreduce, MPI_Fint *PE_start, MPI_Fint * logPE_stride, MPI_Fint *PE_size, FORTRAN_POINTER_T *pWrk, FORTRAN_POINTER_T pSync), (target,source,nreduce,PE_start,logPE_stride,PE_size,pWrk,pSync) ) -#define SHMEM_XOR_TO_ALL(F_NAME, T_NAME, OSHMEM_GROUP_CACHE_ENABLED) void F_NAME(FORTRAN_POINTER_T target, \ +#define SHMEM_XOR_TO_ALL(F_NAME, T_NAME) void F_NAME(FORTRAN_POINTER_T target, \ FORTRAN_POINTER_T source, \ MPI_Fint *nreduce,\ MPI_Fint *PE_start,\ @@ -77,62 +76,29 @@ SHMEM_GENERATE_FORTRAN_BINDINGS_SUB (void, FORTRAN_POINTER_T *pWrk,\ FORTRAN_POINTER_T pSync)\ {\ - int rc = OSHMEM_SUCCESS;\ - oshmem_group_t* group = NULL;\ - {\ - /* Create group basing PE_start, logPE_stride and PE_size */\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - }\ - else\ - {\ - group = find_group_in_cache(OMPI_FINT_2_INT(*PE_start),\ - 
OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - {\ - group = oshmem_proc_group_create(OMPI_FINT_2_INT(*PE_start), \ - (1 << OMPI_FINT_2_INT(*logPE_stride)), \ - OMPI_FINT_2_INT(*PE_size));\ - if (!group)\ - rc = OSHMEM_ERROR;\ - cache_group(group,OMPI_FINT_2_INT(*PE_start),\ - OMPI_FINT_2_INT(*logPE_stride),\ - OMPI_FINT_2_INT(*PE_size));\ - }\ - }\ - /* Collective operation call */\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_op_t* op = T_NAME;\ - size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ -\ - /* Call collective reduce operation */\ - rc = group->g_scoll.scoll_reduce( group,\ - op,\ - FPTR_2_VOID_PTR(target),\ - FPTR_2_VOID_PTR(source),\ - size,\ - FPTR_2_VOID_PTR(pSync),\ - FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ - }\ - if (OSHMEM_GROUP_CACHE_ENABLED == 0)\ - {\ - if ( rc == OSHMEM_SUCCESS )\ - {\ - oshmem_proc_group_destroy(group);\ - }\ - }\ - }\ + int rc;\ + oshmem_group_t *group;\ + /* Create group basing PE_start, logPE_stride and PE_size */\ + group = oshmem_proc_group_create_nofail(OMPI_FINT_2_INT(*PE_start), \ + (1 << OMPI_FINT_2_INT(*logPE_stride)), \ + OMPI_FINT_2_INT(*PE_size));\ + oshmem_op_t* op = T_NAME;\ + size_t size = OMPI_FINT_2_INT(*nreduce) * op->dt_size;\ + \ + /* Call collective reduce operation */\ + rc = group->g_scoll.scoll_reduce( group,\ + op,\ + FPTR_2_VOID_PTR(target),\ + FPTR_2_VOID_PTR(source),\ + size,\ + FPTR_2_VOID_PTR(pSync),\ + FPTR_2_VOID_PTR(*pWrk), SCOLL_DEFAULT_ALG);\ + oshmem_proc_group_destroy(group);\ + RUNTIME_CHECK_RC(rc); \ } -SHMEM_XOR_TO_ALL(shmem_int2_xor_to_all_f, oshmem_op_xor_fint2, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_XOR_TO_ALL(shmem_int4_xor_to_all_f, oshmem_op_xor_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_XOR_TO_ALL(shmem_int8_xor_to_all_f, oshmem_op_xor_fint8, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_XOR_TO_ALL(shmem_comp4_xor_to_all_f, oshmem_op_xor_fint4, OSHMEM_GROUP_CACHE_ENABLED) -SHMEM_XOR_TO_ALL(shmem_comp8_xor_to_all_f, oshmem_op_xor_fint8, 
OSHMEM_GROUP_CACHE_ENABLED) +SHMEM_XOR_TO_ALL(shmem_int2_xor_to_all_f, oshmem_op_xor_fint2) +SHMEM_XOR_TO_ALL(shmem_int4_xor_to_all_f, oshmem_op_xor_fint4) +SHMEM_XOR_TO_ALL(shmem_int8_xor_to_all_f, oshmem_op_xor_fint8) +SHMEM_XOR_TO_ALL(shmem_comp4_xor_to_all_f, oshmem_op_xor_fint4) +SHMEM_XOR_TO_ALL(shmem_comp8_xor_to_all_f, oshmem_op_xor_fint8) diff --git a/oshmem/tools/oshmem_info/Makefile.am b/oshmem/tools/oshmem_info/Makefile.am index c4ddc2d6e9a..a474eaf51d0 100644 --- a/oshmem/tools/oshmem_info/Makefile.am +++ b/oshmem/tools/oshmem_info/Makefile.am @@ -1,8 +1,10 @@ # # Copyright (c) 2014 Mellanox Technologies, Inc. # All rights reserved. -# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2017 Research Organization for Information Science +# and Technology (RIST). All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -15,7 +17,7 @@ AM_CPPFLAGS = \ -DOPAL_CONFIGURE_HOST="\"@OPAL_CONFIGURE_HOST@\"" \ -DOPAL_CONFIGURE_DATE="\"@OPAL_CONFIGURE_DATE@\"" \ -DOMPI_BUILD_USER="\"$$USER\"" \ - -DOMPI_BUILD_HOST="\"`hostname`\"" \ + -DOMPI_BUILD_HOST="\"`(hostname || uname -n) 2> /dev/null | sed 1q`\"" \ -DOMPI_BUILD_DATE="\"`date`\"" \ -DOMPI_BUILD_CFLAGS="\"@CFLAGS@\"" \ -DOMPI_BUILD_CPPFLAGS="\"@CPPFLAGS@\"" \ diff --git a/oshmem/tools/oshmem_info/oshmem_info.c b/oshmem/tools/oshmem_info/oshmem_info.c index d51658db4d3..d925f1b6853 100644 --- a/oshmem/tools/oshmem_info/oshmem_info.c +++ b/oshmem/tools/oshmem_info/oshmem_info.c @@ -3,6 +3,7 @@ * All rights reserved. * Copyright (c) 2014 Intel, Inc. All rights reserved. * + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -111,7 +112,7 @@ int main(int argc, char *argv[]) #endif /* add in the ompi frameworks */ - ompi_info_register_types(&mca_types); + opal_info_register_types(&mca_types); /* add in the oshmem frameworks */ oshmem_info_register_types(&mca_types); diff --git a/oshmem/tools/oshmem_info/param.c b/oshmem/tools/oshmem_info/param.c index b3c802276fe..502c4f52edb 100644 --- a/oshmem/tools/oshmem_info/param.c +++ b/oshmem/tools/oshmem_info/param.c @@ -2,9 +2,10 @@ * Copyright (c) 2013 Mellanox Technologies, Inc. * All rights reserved. * - * Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2014-2016 Research Organization for Information Science + * Copyright (c) 2014-2018 Cisco Systems, Inc. All rights reserved + * Copyright (c) 2014-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2016-2017 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -46,25 +47,7 @@ #include "oshmem/tools/oshmem_info/oshmem_info.h" -const char *ompi_info_deprecated_value = "deprecated-ompi-info-value"; - -static void append(char *dest, size_t max, int *first, char *src) -{ - size_t len; - - if (NULL == src) { - return; - } - - len = max - strlen(dest); - if (!(*first)) { - strncat(dest, ", ", len); - len = max - strlen(dest); - } - strncat(dest, src, len); - *first = 0; -} - +const char *opal_info_deprecated_value = "deprecated-ompi-info-value"; /* * do_config @@ -80,23 +63,7 @@ static void append(char *dest, size_t max, int *first, char *src) */ void oshmem_info_do_config(bool want_all) { - char *cxx; - char *fortran_mpifh; - char *fortran_usempi; - char *fortran_usempif08; - char *fortran_usempif08_compliance; - char *fortran_have_ignore_tkr; - char *fortran_have_f08_assumed_rank; - char *fortran_build_f08_subarrays; - char *fortran_have_optional_args; - char *fortran_have_bind_c; - char 
*fortran_have_private; - char *fortran_have_abstract; - char *fortran_have_asynchronous; - char *fortran_have_procedure; - char *fortran_have_c_funloc; - char *fortran_08_using_wrappers_for_choice_buffer_functions; - char *java; + char *fortran; char *heterogeneous; char *memprofile; char *memdebug; @@ -104,9 +71,7 @@ void oshmem_info_do_config(bool want_all) char *mpi_interface_warning; char *cprofiling; char *cxxprofiling; - char *fortran_mpifh_profiling; - char *fortran_usempi_profiling; - char *fortran_usempif08_profiling; + char *fortran_profiling; char *cxxexceptions; char *threads; char *have_dl; @@ -114,7 +79,6 @@ void oshmem_info_do_config(bool want_all) char *mpirun_prefix_by_default; #endif char *sparse_groups; - char *have_mpi_io; char *wtime_support; char *symbol_visibility; char *ft_support; @@ -145,84 +109,6 @@ void oshmem_info_do_config(bool want_all) paramcheck = "runtime"; #endif - /* setup the strings that don't require allocations*/ - cxx = OMPI_BUILD_CXX_BINDINGS ? "yes" : "no"; - if (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_USEMPI_BINDINGS) { - if (OMPI_FORTRAN_HAVE_IGNORE_TKR) { - fortran_usempi = "yes (full: ignore TKR)"; - } else { - fortran_usempi = "yes (limited: overloading)"; - } - } else { - fortran_usempi = "no"; - } - fortran_usempif08 = OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_USEMPIF08_BINDINGS ? "yes" : "no"; - fortran_have_f08_assumed_rank = OMPI_FORTRAN_HAVE_F08_ASSUMED_RANK ? - "yes" : "no"; - fortran_build_f08_subarrays = OMPI_BUILD_FORTRAN_F08_SUBARRAYS ? - "yes" : "no"; - fortran_have_optional_args = OMPI_FORTRAN_HAVE_OPTIONAL_ARGS ? - "yes" : "no"; - fortran_have_bind_c = OMPI_FORTRAN_HAVE_BIND_C ? "yes" : "no"; - fortran_have_private = OMPI_FORTRAN_HAVE_PRIVATE ? "yes" : "no"; - fortran_have_abstract = OMPI_FORTRAN_HAVE_ABSTRACT ? "yes" : "no"; - fortran_have_asynchronous = OMPI_FORTRAN_HAVE_ASYNCHRONOUS ? "yes" : "no"; - fortran_have_procedure = OMPI_FORTRAN_HAVE_PROCEDURE ? 
"yes" : "no"; - fortran_have_c_funloc = OMPI_FORTRAN_HAVE_C_FUNLOC ? "yes" : "no"; - fortran_08_using_wrappers_for_choice_buffer_functions = - OMPI_FORTRAN_NEED_WRAPPER_ROUTINES ? "yes" : "no"; - - /* Build a string describing what level of compliance the mpi_f08 - module has */ - char f08_msg[1024]; - if (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_USEMPIF08_BINDINGS) { - - /* Do we have everything? */ - if (OMPI_BUILD_FORTRAN_F08_SUBARRAYS && - OMPI_FORTRAN_HAVE_PRIVATE && - OMPI_FORTRAN_HAVE_ABSTRACT && - OMPI_FORTRAN_HAVE_ASYNCHRONOUS && - OMPI_FORTRAN_HAVE_PROCEDURE && - OMPI_FORTRAN_HAVE_C_FUNLOC && - OMPI_FORTRAN_NEED_WRAPPER_ROUTINES) { - fortran_usempif08_compliance = "The mpi_f08 module is available, and is fully compliant. w00t!"; - } else { - int first = 1; - snprintf(f08_msg, sizeof(f08_msg), - "The mpi_f08 module is available, but due to limitations in the %s compiler, does not support the following: ", - OMPI_FC); - if (!OMPI_BUILD_FORTRAN_F08_SUBARRAYS) { - append(f08_msg, sizeof(f08_msg), &first, "array subsections"); - } - if (!OMPI_FORTRAN_HAVE_PRIVATE) { - append(f08_msg, sizeof(f08_msg), &first, - "private MPI_Status members"); - } - if (!OMPI_FORTRAN_HAVE_ABSTRACT) { - append(f08_msg, sizeof(f08_msg), &first, - "ABSTRACT INTERFACE function pointers"); - } - if (!OMPI_FORTRAN_HAVE_ASYNCHRONOUS) { - append(f08_msg, sizeof(f08_msg), &first, - "Fortran '08-specified ASYNCHRONOUS behavior"); - } - if (!OMPI_FORTRAN_HAVE_PROCEDURE) { - append(f08_msg, sizeof(f08_msg), &first, "PROCEDUREs"); - } - if (!OMPI_FORTRAN_HAVE_C_FUNLOC) { - append(f08_msg, sizeof(f08_msg), &first, "C_FUNLOCs"); - } - if (OMPI_FORTRAN_NEED_WRAPPER_ROUTINES) { - append(f08_msg, sizeof(f08_msg), &first, - "direct passthru (where possible) to underlying Open MPI's C functionality"); - } - fortran_usempif08_compliance = f08_msg; - } - } else { - fortran_usempif08_compliance = "The mpi_f08 module was not built"; - } - - java = OMPI_WANT_JAVA_BINDINGS ? 
"yes" : "no"; heterogeneous = OPAL_ENABLE_HETEROGENEOUS_SUPPORT ? "yes" : "no"; memprofile = OPAL_ENABLE_MEM_PROFILE ? "yes" : "no"; memdebug = OPAL_ENABLE_MEM_DEBUG ? "yes" : "no"; @@ -231,39 +117,25 @@ void oshmem_info_do_config(bool want_all) cprofiling = "yes"; cxxprofiling = OMPI_BUILD_CXX_BINDINGS ? "yes" : "no"; cxxexceptions = (OMPI_BUILD_CXX_BINDINGS && OMPI_HAVE_CXX_EXCEPTION_SUPPORT) ? "yes" : "no"; - fortran_mpifh_profiling = (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_MPIFH_BINDINGS) ? "yes" : "no"; - fortran_usempi_profiling = (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_USEMPI_BINDINGS) ? "yes" : "no"; - fortran_usempif08_profiling = (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_USEMPIF08_BINDINGS) ? "yes" : "no"; + fortran_profiling = (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_MPIFH_BINDINGS) ? "yes" : "no"; have_dl = OPAL_HAVE_DL_SUPPORT ? "yes" : "no"; #if OMPI_RTE_ORTE mpirun_prefix_by_default = ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT ? "yes" : "no"; #endif sparse_groups = OMPI_GROUP_SPARSE ? "yes" : "no"; - have_mpi_io = OMPI_PROVIDE_MPI_FILE_INTERFACE ? "yes" : "no"; wtime_support = OPAL_TIMER_USEC_NATIVE ? "native" : "gettimeofday"; symbol_visibility = OPAL_C_HAVE_VISIBILITY ? "yes" : "no"; topology_support = "yes"; /* setup strings that require allocation */ if (OMPI_BUILD_FORTRAN_BINDINGS >= OMPI_FORTRAN_MPIFH_BINDINGS) { - (void)asprintf(&fortran_mpifh, "yes (%s)", + (void)asprintf(&fortran, "yes (%s)", (OPAL_HAVE_WEAK_SYMBOLS ? "all" : (OMPI_FORTRAN_CAPS ? "caps" : (OMPI_FORTRAN_PLAIN ? "lower case" : (OMPI_FORTRAN_SINGLE_UNDERSCORE ? "single underscore" : "double underscore"))))); } else { - fortran_mpifh = strdup("no"); - } - - if (OMPI_FORTRAN_HAVE_IGNORE_TKR) { - /* OMPI_FORTRAN_IGNORE_TKR_PREDECL is already in quotes; it - didn't work consistently to put it in _STRINGIFY because - sometimes the compiler would actually interpret the pragma - in there before stringify-ing it. 
*/ - (void)asprintf(&fortran_have_ignore_tkr, "yes (%s)", - OMPI_FORTRAN_IGNORE_TKR_PREDECL); - } else { - fortran_have_ignore_tkr = strdup("no"); + fortran = strdup("no"); } #if OMPI_RTE_ORTE @@ -291,20 +163,8 @@ void oshmem_info_do_config(bool want_all) opal_info_out("Built host", "build:host", OMPI_BUILD_HOST); opal_info_out("C bindings", "bindings:c", "yes"); - opal_info_out("C++ bindings", "bindings:cxx", cxx); - opal_info_out("Fort mpif.h", "bindings:mpif.h", fortran_mpifh); - free(fortran_mpifh); - opal_info_out("Fort use mpi", "bindings:use_mpi", - fortran_usempi); - opal_info_out("Fort use mpi size", "bindings:use_mpi:size", - ompi_info_deprecated_value); - opal_info_out("Fort use mpi_f08", "bindings:use_mpi_f08", - fortran_usempif08); - opal_info_out("Fort mpi_f08 compliance", "bindings:use_mpi_f08:compliance", - fortran_usempif08_compliance); - opal_info_out("Fort mpi_f08 subarrays", "bindings:use_mpi_f08:subarrays-supported", - fortran_build_f08_subarrays); - opal_info_out("Java bindings", "bindings:java", java); + opal_info_out("Fort shmem.fh", "bindings:fortran", fortran); + free(fortran); opal_info_out("Wrapper compiler rpath", "compiler:all:rpath", WRAPPER_RPATH_SUPPORT); @@ -351,36 +211,6 @@ void oshmem_info_do_config(bool want_all) opal_info_out("Fort compiler", "compiler:fortran:command", OMPI_FC); opal_info_out("Fort compiler abs", "compiler:fortran:absolute", OMPI_FC_ABSOLUTE); - opal_info_out("Fort ignore TKR", "compiler:fortran:ignore_tkr", - fortran_have_ignore_tkr); - free(fortran_have_ignore_tkr); - opal_info_out("Fort 08 assumed shape", - "compiler:fortran:f08_assumed_rank", - fortran_have_f08_assumed_rank); - opal_info_out("Fort optional args", - "compiler:fortran:optional_arguments", - fortran_have_optional_args); - opal_info_out("Fort BIND(C)", - "compiler:fortran:bind_c", - fortran_have_bind_c); - opal_info_out("Fort PRIVATE", - "compiler:fortran:private", - fortran_have_private); - opal_info_out("Fort ABSTRACT", - 
"compiler:fortran:abstract", - fortran_have_abstract); - opal_info_out("Fort ASYNCHRONOUS", - "compiler:fortran:asynchronous", - fortran_have_asynchronous); - opal_info_out("Fort PROCEDURE", - "compiler:fortran:procedure", - fortran_have_procedure); - opal_info_out("Fort C_FUNLOC", - "compiler:fortran:c_funloc", - fortran_have_c_funloc); - opal_info_out("Fort f08 using wrappers", - "compiler:fortran:08_wrappers", - fortran_08_using_wrappers_for_choice_buffer_functions); if (want_all) { @@ -517,13 +347,8 @@ void oshmem_info_do_config(bool want_all) opal_info_out("C profiling", "option:profiling:c", cprofiling); opal_info_out("C++ profiling", "option:profiling:cxx", cxxprofiling); - opal_info_out("Fort mpif.h profiling", "option:profiling:mpif.h", - fortran_mpifh_profiling); - opal_info_out("Fort use mpi profiling", "option:profiling:use_mpi", - fortran_usempi_profiling); - opal_info_out("Fort use mpi_f08 prof", - "option:profiling:use_mpi_f08", - fortran_usempif08_profiling); + opal_info_out("Fort shmem.fh profiling", "option:profiling:shmem.fh", + fortran_profiling); opal_info_out("C++ exceptions", "option:cxx_exceptions", cxxexceptions); opal_info_out("Thread support", "option:threads", threads); @@ -566,7 +391,6 @@ void oshmem_info_do_config(bool want_all) opal_info_out("mpirun default --prefix", "mpirun:prefix_by_default", mpirun_prefix_by_default); #endif - opal_info_out("MPI I/O support", "options:mpi-io", have_mpi_io); opal_info_out("MPI_WTIME support", "options:mpi-wtime", wtime_support); opal_info_out("Symbol vis. 
support", "options:visibility", symbol_visibility); opal_info_out("Host topology support", "options:host-topology", @@ -592,13 +416,8 @@ void oshmem_info_do_config(bool want_all) MPI_MAX_INFO_VAL); opal_info_out_int("MPI_MAX_PORT_NAME", "options:mpi-max-port-name", MPI_MAX_PORT_NAME); -#if OMPI_PROVIDE_MPI_FILE_INTERFACE opal_info_out_int("MPI_MAX_DATAREP_STRING", "options:mpi-max-datarep-string", MPI_MAX_DATAREP_STRING); -#else - opal_info_out("MPI_MAX_DATAREP_STRING", "options:mpi-max-datarep-string", - "IO interface not provided"); -#endif /* This block displays all the options with which the current * installation of oshmem was configured. */ diff --git a/oshmem/tools/wrappers/Makefile.am b/oshmem/tools/wrappers/Makefile.am index 9f1ddcef368..e8f1e48484a 100644 --- a/oshmem/tools/wrappers/Makefile.am +++ b/oshmem/tools/wrappers/Makefile.am @@ -2,7 +2,7 @@ # All rights reserved. # Copyright (c) 2013-2014 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2014 Intel, Inc. All rights reserved. -# Copyright (c) 2014 Research Organization for Information Science +# Copyright (c) 2014-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. 
# $COPYRIGHT$ # @@ -12,49 +12,95 @@ include $(top_srcdir)/Makefile.ompi-rules -man_pages = oshcc.1 shmemcc.1 oshfort.1 shmemfort.1 oshrun.1 shmemrun.1 +man_pages = oshcc.1 shmemcc.1 oshc++.1 shmemc++.1 oshcxx.1 shmemcxx.1 oshfort.1 shmemfort.1 oshrun.1 shmemrun.1 if PROJECT_OSHMEM man_MANS = $(man_pages) nodist_oshmemdata_DATA = \ shmemcc-wrapper-data.txt \ + shmemc++-wrapper-data.txt \ shmemfort-wrapper-data.txt # Only install / uninstall if we're building oshmem -install-exec-hook: +install-exec-hook-always: test -z "$(bindir)" || $(mkdir_p) "$(DESTDIR)$(bindir)" (cd $(DESTDIR)$(bindir); rm -f shmemrun$(EXEEXT); $(LN_S) mpirun$(EXEEXT) shmemrun$(EXEEXT)) (cd $(DESTDIR)$(bindir); rm -f oshrun$(EXEEXT); $(LN_S) mpirun$(EXEEXT) oshrun$(EXEEXT)) - (cd $(DESTDIR)$(bindir); rm -f shmemcc$(EXEEXT); $(LN_S) mpicc$(EXEEXT) shmemcc$(EXEEXT)) - (cd $(DESTDIR)$(bindir); rm -f oshcc$(EXEEXT); $(LN_S) mpicc$(EXEEXT) oshcc$(EXEEXT)) - (cd $(DESTDIR)$(bindir); rm -f shmemfort$(EXEEXT); $(LN_S) mpifort$(EXEEXT) shmemfort$(EXEEXT)) - (cd $(DESTDIR)$(bindir); rm -f oshfort$(EXEEXT); $(LN_S) mpifort$(EXEEXT) oshfort$(EXEEXT)) - -install-data-hook: + (cd $(DESTDIR)$(bindir); rm -f shmemcc$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) shmemcc$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f oshcc$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) oshcc$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f shmemc++$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) shmemc++$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f shmemcxx$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) shmemcxx$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f oshc++$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) oshc++$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f oshcxx$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) oshcxx$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f shmemfort$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) shmemfort$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f oshfort$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) oshfort$(EXEEXT)) + +install-data-hook-always: (cd $(DESTDIR)$(pkgdatadir); 
rm -f oshcc-wrapper-data.txt; $(LN_S) shmemcc-wrapper-data.txt oshcc-wrapper-data.txt) + (cd $(DESTDIR)$(pkgdatadir); rm -f shmemcxx-wrapper-data.txt; $(LN_S) shmemc++-wrapper-data.txt shmemcxx-wrapper-data.txt) + (cd $(DESTDIR)$(pkgdatadir); rm -f oshc++-wrapper-data.txt; $(LN_S) shmemc++-wrapper-data.txt oshc++-wrapper-data.txt) + (cd $(DESTDIR)$(pkgdatadir); rm -f oshcxx-wrapper-data.txt; $(LN_S) shmemc++-wrapper-data.txt oshcxx-wrapper-data.txt) (cd $(DESTDIR)$(pkgdatadir); rm -f oshfort-wrapper-data.txt; $(LN_S) shmemfort-wrapper-data.txt oshfort-wrapper-data.txt) -uninstall-local: +uninstall-local-always: rm -f $(DESTDIR)$(bindir)/shmemrun$(EXEEXT) \ $(DESTDIR)$(bindir)/oshrun$(EXEEXT) \ $(DESTDIR)$(bindir)/shmemcc$(EXEEXT) \ $(DESTDIR)$(bindir)/oshcc$(EXEEXT) \ + $(DESTDIR)$(bindir)/shmemcxx$(EXEEXT) \ + $(DESTDIR)$(bindir)/oshcxx$(EXEEXT) \ $(DESTDIR)$(bindir)/shmemfort$(EXEEXT) \ $(DESTDIR)$(bindir)/oshfort$(EXEEXT) \ $(DESTDIR)$(pkgdatadir)/shmemcc-wrapper-data.txt \ $(DESTDIR)$(pkgdatadir)/oshcc-wrapper-data.txt \ + $(DESTDIR)$(pkgdatadir)/shmemcxx-wrapper-data.txt \ + $(DESTDIR)$(pkgdatadir)/oshcxx-wrapper-data.txt \ $(DESTDIR)$(pkgdatadir)/shmemfort-wrapper-data.txt \ $(DESTDIR)$(pkgdatadir)/oshfort-wrapper-data.txt +if CASE_SENSITIVE_FS +man_MANS += oshCC.1 shmemCC.1 + +install-exec-hook: install-exec-hook-always + (cd $(DESTDIR)$(bindir); rm -f shmemCC$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) shmemCC$(EXEEXT)) + (cd $(DESTDIR)$(bindir); rm -f oshCC$(EXEEXT); $(LN_S) opal_wrapper$(EXEEXT) oshCC$(EXEEXT)) + +install-data-hook: install-data-hook-always + (cd $(DESTDIR)$(pkgdatadir); rm -f shmemCC-wrapper-data.txt; $(LN_S) shmemcxx-wrapper-data.txt shmemCC-wrapper-data.txt) + (cd $(DESTDIR)$(pkgdatadir); rm -f oshCC-wrapper-data.txt; $(LN_S) oshcxx-wrapper-data.txt oshCC-wrapper-data.txt) + +uninstall-local: uninstall-local-always + rm -f $(DESTDIR)$(bindir)/shmemCC$(EXEEXT) \ + $(DESTDIR)$(mandir)/man1/shmemCC.1 \ + 
$(DESTDIR)$(pkgdatadir)/shmemCC-wrapper-data.txt + rm -f $(DESTDIR)$(bindir)/oshCC$(EXEEXT) \ + $(DESTDIR)$(mandir)/man1/oshCC.1 \ + $(DESTDIR)$(pkgdatadir)/oshCC-wrapper-data.txt + +oshCC.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 + rm -f oshCC.1 + sed -e 's/#COMMAND#/oshCC/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/C++/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > oshCC.1 + +shmemCC.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 + rm -f shmemCC.1 + sed -e 's/#COMMAND#/shmemCC/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/C++/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > shmemCC.1 + +else # CASE_SENSITIVE_FS +install-exec-hook: install-exec-hook-always +install-data-hook: install-data-hook-always +uninstall-local: uninstall-local-always + +endif # CASE_SENSITIVE_FS + ######################################################## # # Man page generation / handling # ######################################################## distclean-local: - rm -f $(man_pages) + rm -f $(man_MANS) $(top_builddir)/opal/tools/wrappers/generic_wrapper.1: (cd $(top_builddir)/opal/tools/wrappers && $(MAKE) $(AM_MAKEFLAGS) generic_wrapper.1) @@ -67,6 +113,22 @@ shmemcc.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 rm -f shmemcc.1 sed -e 's/#COMMAND#/shmemcc/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/C/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > shmemcc.1 +oshc++.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 + rm -f oshc++.1 + sed -e 's/#COMMAND#/oshc++/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/C++/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > oshc++.1 + +shmemc++.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 + rm -f shmemc++.1 + sed -e 's/#COMMAND#/shmemc++/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 
's/#LANGUAGE#/C++/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > shmemc++.1 + +oshcxx.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 + rm -f oshcxx.1 + sed -e 's/#COMMAND#/oshcxx/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/C++/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > oshcxx.1 + +shmemcxx.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 + rm -f shmemcxx.1 + sed -e 's/#COMMAND#/shmemcxx/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/C++/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > shmemcxx.1 + oshfort.1: $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 rm -f oshfort.1 sed -e 's/#COMMAND#/oshfort/g' -e 's/#PROJECT#/Open SHMEM/g' -e 's/#PROJECT_SHORT#/OSHMEM/g' -e 's/#LANGUAGE#/Fortran/g' < $(top_builddir)/opal/tools/wrappers/generic_wrapper.1 > oshfort.1 diff --git a/oshmem/tools/wrappers/shmemc++-wrapper-data.txt.in b/oshmem/tools/wrappers/shmemc++-wrapper-data.txt.in new file mode 100644 index 00000000000..ebd1d963192 --- /dev/null +++ b/oshmem/tools/wrappers/shmemc++-wrapper-data.txt.in @@ -0,0 +1,37 @@ +# Copyright (c) 2013 Mellanox Technologies, Inc. +# All rights reserved. +# Copyright (c) 2014-2015 Cisco Systems, Inc. All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# +# There can be multiple blocks of configuration data, chosen by +# compiler flags (using the compiler_args key to choose which block +# should be activated). This can be useful for multilib builds. See the +# multilib page at: +# https://github.com/open-mpi/ompi/wiki/compilerwrapper3264 +# for more information.
+ +project=Open SHMEM +project_short=OSHMEM +version=@OSHMEM_VERSION@ +language=C++ +compiler_env=CXX +compiler_flags_env=CXXFLAGS +compiler=@CXX@ +preprocessor_flags=@OMPI_WRAPPER_EXTRA_CPPFLAGS@ +compiler_flags=@OMPI_WRAPPER_EXTRA_CXXFLAGS@ +linker_flags=@OMPI_WRAPPER_EXTRA_LDFLAGS@ +# Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we +# intentionally only link in the SHMEM and MPI libraries (ORTE, OPAL, +# etc. are pulled in implicitly) because we intend SHMEM/MPI +# applications to only use the SHMEM and MPI APIs. +libs=-loshmem -lmpi +libs_static=-loshmem -lmpi -l@ORTE_LIB_PREFIX@open-rte -l@OPAL_LIB_PREFIX@open-pal @OMPI_WRAPPER_EXTRA_LIBS@ +dyn_lib_file=liboshmem.@OPAL_DYN_LIB_SUFFIX@ +static_lib_file=liboshmem.a +required_file= +includedir=${includedir} +libdir=${libdir} diff --git a/test/Makefile.am b/test/Makefile.am index 7eee672d46e..9982b94530a 100644 --- a/test/Makefile.am +++ b/test/Makefile.am @@ -13,6 +13,7 @@ # Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. # Copyright (c) 2015-2016 Research Organization for Information Science # and Technology (RIST). All rights reserved. +# Copyright (c) 2017 IBM Corporation. All rights reserved. 
# $COPYRIGHT$ # # Additional copyrights may follow @@ -21,7 +22,7 @@ # # support needs to be first for dependencies -SUBDIRS = support asm class threads datatype util dss +SUBDIRS = support asm class threads datatype util dss mpool if PROJECT_OMPI SUBDIRS += monitoring endif diff --git a/test/asm/Makefile.am b/test/asm/Makefile.am index 31cbe61d742..17f222bbd9d 100644 --- a/test/asm/Makefile.am +++ b/test/asm/Makefile.am @@ -38,46 +38,44 @@ EXTRA_DIST = run_tests ###################################################################### atomic_barrier_SOURCES = atomic_barrier.c -atomic_barrier_LDADD = $(top_builddir)/opal/asm/libasm.la atomic_barrier_noinline.c: ln -s $(top_srcdir)/test/asm/atomic_barrier.c atomic_barrier_noinline.c atomic_barrier_noinline_SOURCES = atomic_barrier_noinline.c -atomic_barrier_noinline_LDADD = $(top_builddir)/opal/asm/libasm.la atomic_barrier_noinline_CFLAGS = $(AM_CFLAGS) -DOMPI_DISABLE_INLINE_ASM ###################################################################### atomic_spinlock_SOURCES = atomic_spinlock.c -atomic_spinlock_LDADD = $(top_builddir)/opal/asm/libasm.la $(libs) +atomic_spinlock_LDADD = $(libs) atomic_spinlock_noinline.c: ln -s $(top_srcdir)/test/asm/atomic_spinlock.c atomic_spinlock_noinline.c atomic_spinlock_noinline_SOURCES = atomic_spinlock_noinline.c atomic_spinlock_noinline_CFLAGS = $(AM_CFLAGS) -DOMPI_DISABLE_INLINE_ASM -atomic_spinlock_noinline_LDADD = $(top_builddir)/opal/asm/libasm.la $(libs) +atomic_spinlock_noinline_LDADD = $(libs) ###################################################################### atomic_math_SOURCES = atomic_math.c -atomic_math_LDADD = $(top_builddir)/opal/asm/libasm.la $(libs) +atomic_math_LDADD = $(libs) atomic_math_noinline.c: ln -s $(top_srcdir)/test/asm/atomic_math.c atomic_math_noinline.c atomic_math_noinline_SOURCES = atomic_math_noinline.c atomic_math_noinline_CFLAGS = $(AM_CFLAGS) -DOMPI_DISABLE_INLINE_ASM -atomic_math_noinline_LDADD = $(top_builddir)/opal/asm/libasm.la $(libs) 
+atomic_math_noinline_LDADD = $(libs) ###################################################################### atomic_cmpset_SOURCES = atomic_cmpset.c -atomic_cmpset_LDADD = $(top_builddir)/opal/asm/libasm.la $(libs) +atomic_cmpset_LDADD = $(libs) atomic_cmpset_noinline.c: ln -s $(top_srcdir)/test/asm/atomic_cmpset.c atomic_cmpset_noinline.c atomic_cmpset_noinline_SOURCES = atomic_cmpset_noinline.c atomic_cmpset_noinline_CFLAGS = $(AM_CFLAGS) -DOMPI_DISABLE_INLINE_ASM -atomic_cmpset_noinline_LDADD = $(top_builddir)/opal/asm/libasm.la $(libs) +atomic_cmpset_noinline_LDADD = $(libs) ###################################################################### diff --git a/test/asm/atomic_cmpset.c b/test/asm/atomic_cmpset.c index cfe08a7bbbf..4a06847703f 100644 --- a/test/asm/atomic_cmpset.c +++ b/test/asm/atomic_cmpset.c @@ -1,3 +1,4 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology @@ -12,6 +13,8 @@ * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2015 Research Organization for Information Science * and Technology (RIST). All rights reserved. + * Copyright (c) 2017 Los Alamos National Security, LLC. All rights + * reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -54,6 +57,13 @@ int64_t old64 = 0; int64_t new64 = 0; #endif +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 +volatile opal_int128_t vol128; +opal_int128_t val128; +opal_int128_t old128; +opal_int128_t new128; +#endif + volatile int volint = 0; int valint = 0; int oldint = 0; @@ -72,11 +82,11 @@ static void *thread_main(void *arg) /* thread tests */ for (i = 0; i < nreps; i++) { - opal_atomic_add_32(&val32, 5); + opal_atomic_add_fetch_32(&val32, 5); #if OPAL_HAVE_ATOMIC_MATH_64 - opal_atomic_add_64(&val64, 5); + opal_atomic_add_fetch_64(&val64, 5); #endif - opal_atomic_add(&valint, 5); + opal_atomic_add (&valint, 5); } return (void *) (unsigned long) (rank + 1000); @@ -99,143 +109,184 @@ int main(int argc, char *argv[]) /* -- cmpset 32-bit tests -- */ vol32 = 42, old32 = 42, new32 = 50; - assert(opal_atomic_cmpset_32(&vol32, old32, new32) == 1); + assert(opal_atomic_compare_exchange_strong_32 (&vol32, &old32, new32) == true); opal_atomic_rmb(); assert(vol32 == new32); + assert(old32 == 42); vol32 = 42, old32 = 420, new32 = 50; - assert(opal_atomic_cmpset_32(&vol32, old32, new32) == 0); + assert(opal_atomic_compare_exchange_strong_32 (&vol32, &old32, new32) == false); opal_atomic_rmb(); assert(vol32 == 42); + assert(old32 == 42); vol32 = 42, old32 = 42, new32 = 50; - assert(opal_atomic_cmpset_acq_32(&vol32, old32, new32) == 1); + assert(opal_atomic_compare_exchange_strong_acq_32 (&vol32, &old32, new32) == true); assert(vol32 == new32); + assert(old32 == 42); vol32 = 42, old32 = 420, new32 = 50; - assert(opal_atomic_cmpset_acq_32(&vol32, old32, new32) == 0); + assert(opal_atomic_compare_exchange_strong_acq_32 (&vol32, &old32, new32) == false); assert(vol32 == 42); + assert(old32 == 42); vol32 = 42, old32 = 42, new32 = 50; - assert(opal_atomic_cmpset_rel_32(&vol32, old32, new32) == 1); + assert(opal_atomic_compare_exchange_strong_rel_32 (&vol32, &old32, new32) == true); opal_atomic_rmb(); assert(vol32 == new32); +
assert(old32 == 42); vol32 = 42, old32 = 420, new32 = 50; - assert(opal_atomic_cmpset_rel_32(&vol32, old32, new32) == 0); + assert(opal_atomic_compare_exchange_strong_rel_32 (&vol32, &old32, new32) == false); opal_atomic_rmb(); assert(vol32 == 42); + assert(old32 == 42); /* -- cmpset 64-bit tests -- */ #if OPAL_HAVE_ATOMIC_MATH_64 vol64 = 42, old64 = 42, new64 = 50; - assert(1 == opal_atomic_cmpset_64(&vol64, old64, new64)); + assert(opal_atomic_compare_exchange_strong_64 (&vol64, &old64, new64) == true); opal_atomic_rmb(); assert(new64 == vol64); + assert(old64 == 42); vol64 = 42, old64 = 420, new64 = 50; - assert(opal_atomic_cmpset_64(&vol64, old64, new64) == 0); + assert(opal_atomic_compare_exchange_strong_64 (&vol64, &old64, new64) == false); opal_atomic_rmb(); assert(vol64 == 42); + assert(old64 == 42); vol64 = 42, old64 = 42, new64 = 50; - assert(opal_atomic_cmpset_acq_64(&vol64, old64, new64) == 1); + assert(opal_atomic_compare_exchange_strong_acq_64 (&vol64, &old64, new64) == true); assert(vol64 == new64); + assert(old64 == 42); vol64 = 42, old64 = 420, new64 = 50; - assert(opal_atomic_cmpset_acq_64(&vol64, old64, new64) == 0); + assert(opal_atomic_compare_exchange_strong_acq_64 (&vol64, &old64, new64) == false); assert(vol64 == 42); + assert(old64 == 42); vol64 = 42, old64 = 42, new64 = 50; - assert(opal_atomic_cmpset_rel_64(&vol64, old64, new64) == 1); + assert(opal_atomic_compare_exchange_strong_rel_64 (&vol64, &old64, new64) == true); opal_atomic_rmb(); assert(vol64 == new64); + assert(old64 == 42); vol64 = 42, old64 = 420, new64 = 50; - assert(opal_atomic_cmpset_rel_64(&vol64, old64, new64) == 0); + assert(opal_atomic_compare_exchange_strong_rel_64 (&vol64, &old64, new64) == false); opal_atomic_rmb(); assert(vol64 == 42); + assert(old64 == 42); #endif + + /* -- cmpset 128-bit tests -- */ + +#if OPAL_HAVE_ATOMIC_COMPARE_EXCHANGE_128 + vol128 = 42, old128 = 42, new128 = 50; + assert(opal_atomic_compare_exchange_strong_128 (&vol128, &old128, new128) == 
true); + opal_atomic_rmb(); + assert(new128 == vol128); + assert(old128 == 42); + + vol128 = 42, old128 = 420, new128 = 50; + assert(opal_atomic_compare_exchange_strong_128 (&vol128, &old128, new128) == false); + opal_atomic_rmb(); + assert(vol128 == 42); + assert(old128 == 42); +#endif + /* -- cmpset int tests -- */ volint = 42, oldint = 42, newint = 50; - assert(opal_atomic_cmpset(&volint, oldint, newint) == 1); + assert(opal_atomic_compare_exchange_strong (&volint, &oldint, newint) == true); opal_atomic_rmb(); - assert(volint ==newint); + assert(volint == newint); + assert(oldint == 42); volint = 42, oldint = 420, newint = 50; - assert(opal_atomic_cmpset(&volint, oldint, newint) == 0); + assert(opal_atomic_compare_exchange_strong (&volint, &oldint, newint) == false); opal_atomic_rmb(); assert(volint == 42); + assert(oldint == 42); volint = 42, oldint = 42, newint = 50; - assert(opal_atomic_cmpset_acq(&volint, oldint, newint) == 1); + assert(opal_atomic_compare_exchange_strong_acq (&volint, &oldint, newint) == true); assert(volint == newint); + assert(oldint == 42); volint = 42, oldint = 420, newint = 50; - assert(opal_atomic_cmpset_acq(&volint, oldint, newint) == 0); + assert(opal_atomic_compare_exchange_strong_acq (&volint, &oldint, newint) == false); assert(volint == 42); + assert(oldint == 42); volint = 42, oldint = 42, newint = 50; - assert(opal_atomic_cmpset_rel(&volint, oldint, newint) == 1); + assert(opal_atomic_compare_exchange_strong_rel (&volint, &oldint, newint) == true); opal_atomic_rmb(); assert(volint == newint); + assert(oldint == 42); volint = 42, oldint = 420, newint = 50; - assert(opal_atomic_cmpset_rel(&volint, oldint, newint) == 0); + assert(opal_atomic_compare_exchange_strong_rel (&volint, &oldint, newint) == false); opal_atomic_rmb(); assert(volint == 42); + assert(oldint == 42); /* -- cmpset ptr tests -- */ volptr = (void *) 42, oldptr = (void *) 42, newptr = (void *) 50; - assert(opal_atomic_cmpset_ptr(&volptr, oldptr, newptr) == 1); + 
assert(opal_atomic_compare_exchange_strong_ptr (&volptr, &oldptr, newptr) == true); opal_atomic_rmb(); assert(volptr == newptr); + assert(oldptr == (void *) 42); volptr = (void *) 42, oldptr = (void *) 420, newptr = (void *) 50; - assert(opal_atomic_cmpset_ptr(&volptr, oldptr, newptr) == 0); + assert(opal_atomic_compare_exchange_strong_ptr (&volptr, &oldptr, newptr) == false); opal_atomic_rmb(); assert(volptr == (void *) 42); + assert(oldptr == (void *) 42); volptr = (void *) 42, oldptr = (void *) 42, newptr = (void *) 50; - assert(opal_atomic_cmpset_acq_ptr(&volptr, oldptr, newptr) == 1); + assert(opal_atomic_compare_exchange_strong_acq_ptr (&volptr, &oldptr, newptr) == true); assert(volptr == newptr); + assert(oldptr == (void *) 42); volptr = (void *) 42, oldptr = (void *) 420, newptr = (void *) 50; - assert(opal_atomic_cmpset_acq_ptr(&volptr, oldptr, newptr) == 0); + assert(opal_atomic_compare_exchange_strong_acq_ptr (&volptr, &oldptr, newptr) == false); assert(volptr == (void *) 42); + assert(oldptr == (void *) 42); volptr = (void *) 42, oldptr = (void *) 42, newptr = (void *) 50; - assert(opal_atomic_cmpset_rel_ptr(&volptr, oldptr, newptr) == 1); + assert(opal_atomic_compare_exchange_strong_rel_ptr (&volptr, &oldptr, newptr) == true); opal_atomic_rmb(); assert(volptr == newptr); + assert(oldptr == (void *) 42); volptr = (void *) 42, oldptr = (void *) 420, newptr = (void *) 50; - assert(opal_atomic_cmpset_rel_ptr(&volptr, oldptr, newptr) == 0); + assert(opal_atomic_compare_exchange_strong_rel_ptr (&volptr, &oldptr, newptr) == false); opal_atomic_rmb(); assert(volptr == (void *) 42); + assert(oldptr == (void *) 42); /* -- add_32 tests -- */ val32 = 42; - assert(opal_atomic_add_32(&val32, 5) == (42 + 5)); + assert(opal_atomic_add_fetch_32(&val32, 5) == (42 + 5)); opal_atomic_rmb(); assert((42 + 5) == val32); /* -- add_64 tests -- */ #if OPAL_HAVE_ATOMIC_MATH_64 val64 = 42; - assert(opal_atomic_add_64(&val64, 5) == (42 + 5)); + 
assert(opal_atomic_add_fetch_64(&val64, 5) == (42 + 5)); opal_atomic_rmb(); assert((42 + 5) == val64); #endif /* -- add_int tests -- */ valint = 42; - opal_atomic_add(&valint, 5); + opal_atomic_add (&valint, 5); opal_atomic_rmb(); assert((42 + 5) == valint); diff --git a/test/asm/atomic_math.c b/test/asm/atomic_math.c index f94299e8185..54f771cc26b 100644 --- a/test/asm/atomic_math.c +++ b/test/asm/atomic_math.c @@ -44,11 +44,11 @@ static void* atomic_math_test(void* arg) int i; for (i = 0 ; i < count ; ++i) { - (void)opal_atomic_add_32(&val32, 5); + (void)opal_atomic_add_fetch_32(&val32, 5); #if OPAL_HAVE_ATOMIC_MATH_64 - (void)opal_atomic_add_64(&val64, 6); + (void)opal_atomic_add_fetch_64(&val64, 6); #endif - (void)opal_atomic_add(&valint, 4); + opal_atomic_add (&valint, 4); } return NULL; @@ -100,6 +100,10 @@ atomic_math_test_th(int count, int thr_count) int main(int argc, char *argv[]) { + int32_t test32; +#if OPAL_HAVE_ATOMIC_MATH_64 + int64_t test64; +#endif int ret = 77; int num_threads = 1; @@ -109,11 +113,147 @@ main(int argc, char *argv[]) } num_threads = atoi(argv[1]); + test32 = opal_atomic_add_fetch_32 (&val32, 17); + if (test32 != 17 || val32 != 17) { + fprintf (stderr, "error in opal_atomic_add_fetch_32. expected (17, 17), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + test32 = opal_atomic_fetch_add_32 (&val32, 13); + if (test32 != 17 || val32 != 30) { + fprintf (stderr, "error in opal_atomic_fetch_add_32. expected (17, 30), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + + + test32 = opal_atomic_and_fetch_32 (&val32, 0x18); + if (test32 != 24 || val32 != 24) { + fprintf (stderr, "error in opal_atomic_and_fetch_32. expected (24, 24), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + test32 = opal_atomic_fetch_and_32 (&val32, 0x10); + if (test32 != 24 || val32 != 16) { + fprintf (stderr, "error in opal_atomic_fetch_and_32. 
expected (24, 16), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + + + test32 = opal_atomic_or_fetch_32 (&val32, 0x03); + if (test32 != 19 || val32 != 19) { + fprintf (stderr, "error in opal_atomic_or_fetch_32. expected (19, 19), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + test32 = opal_atomic_fetch_or_32 (&val32, 0x04); + if (test32 != 19 || val32 != 23) { + fprintf (stderr, "error in opal_atomic_fetch_or_32. expected (19, 23), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + + test32 = opal_atomic_xor_fetch_32 (&val32, 0x03); + if (test32 != 20 || val32 != 20) { + fprintf (stderr, "error in opal_atomic_xor_fetch_32. expected (20, 20), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + test32 = opal_atomic_fetch_xor_32 (&val32, 0x05); + if (test32 != 20 || val32 != 17) { + fprintf (stderr, "error in opal_atomic_fetch_xor_32. expected (20, 17), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + + + test32 = opal_atomic_sub_fetch_32 (&val32, 14); + if (test32 != 3 || val32 != 3) { + fprintf (stderr, "error in opal_atomic_sub_fetch_32. expected (3, 3), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + + test32 = opal_atomic_fetch_sub_32 (&val32, 3); + if (test32 != 3 || val32 != 0) { + fprintf (stderr, "error in opal_atomic_fetch_sub_32. expected (3, 0), got (%d, %d)\n", test32, val32); + exit(EXIT_FAILURE); + } + +#if OPAL_HAVE_ATOMIC_MATH_64 + test64 = opal_atomic_add_fetch_64 (&val64, 17); + if (test64 != 17 || val64 != 17) { + fprintf (stderr, "error in opal_atomic_add_fetch_64. expected (17, 17), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + test64 = opal_atomic_fetch_add_64 (&val64, 13); + if (test64 != 17 || val64 != 30) { + fprintf (stderr, "error in opal_atomic_fetch_add_64.
expected (17, 30), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + + + test64 = opal_atomic_and_fetch_64 (&val64, 0x18); + if (test64 != 24 || val64 != 24) { + fprintf (stderr, "error in opal_atomic_and_fetch_64. expected (24, 24), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + test64 = opal_atomic_fetch_and_64 (&val64, 0x10); + if (test64 != 24 || val64 != 16) { + fprintf (stderr, "error in opal_atomic_fetch_and_64. expected (24, 16), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + + + test64 = opal_atomic_or_fetch_64 (&val64, 0x03); + if (test64 != 19 || val64 != 19) { + fprintf (stderr, "error in opal_atomic_or_fetch_64. expected (19, 19), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + test64 = opal_atomic_fetch_or_64 (&val64, 0x04); + if (test64 != 19 || val64 != 23) { + fprintf (stderr, "error in opal_atomic_fetch_or_64. expected (19, 23), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + + test64 = opal_atomic_xor_fetch_64 (&val64, 0x03); + if (test64 != 20 || val64 != 20) { + fprintf (stderr, "error in opal_atomic_xor_fetch_64. expected (20, 20), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + test64 = opal_atomic_fetch_xor_64 (&val64, 0x05); + if (test64 != 20 || val64 != 17) { + fprintf (stderr, "error in opal_atomic_fetch_xor_64. expected (20, 17), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + + + test64 = opal_atomic_sub_fetch_64 (&val64, 14); + if (test64 != 3 || val64 != 3) { + fprintf (stderr, "error in opal_atomic_sub_fetch_64. expected (3, 3), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } + + test64 = opal_atomic_fetch_sub_64 (&val64, 3); + if (test64 != 3 || val64 != 0) { + fprintf (stderr, "error in opal_atomic_fetch_sub_64.
expected (3, 0), got (%" PRId64 ", %" PRId64 ")\n", test64, val64); + exit(EXIT_FAILURE); + } +#endif + ret = atomic_math_test_th(TEST_REPS, num_threads); if (ret == 77) return ret; opal_atomic_mb(); if (val32 != TEST_REPS * num_threads * 5) { - printf("opal_atomic_add32 failed. Expected %d, got %d.\n", + printf("opal_atomic_add_fetch32 failed. Expected %d, got %d.\n", TEST_REPS * num_threads * 5, val32); ret = 1; } @@ -121,7 +261,7 @@ main(int argc, char *argv[]) if (val64 != TEST_REPS * num_threads * 6) { /* Safe to case to (int) here because we know it's going to be a small value */ - printf("opal_atomic_add32 failed. Expected %d, got %d.\n", + printf("opal_atomic_add_fetch32 failed. Expected %d, got %d.\n", TEST_REPS * num_threads * 6, (int) val64); ret = 1; } @@ -129,7 +269,7 @@ main(int argc, char *argv[]) printf(" * skipping 64 bit tests\n"); #endif if (valint != TEST_REPS * num_threads * 4) { - printf("opal_atomic_add32 failed. Expected %d, got %d.\n", + printf("opal_atomic_add_fetch32 failed. Expected %d, got %d.\n", TEST_REPS * num_threads * 4, valint); ret = 1; } diff --git a/test/asm/atomic_spinlock.c b/test/asm/atomic_spinlock.c index ac7941581fd..a6c68b134d0 100644 --- a/test/asm/atomic_spinlock.c +++ b/test/asm/atomic_spinlock.c @@ -123,7 +123,7 @@ main(int argc, char *argv[]) } num_threads = atoi(argv[1]); - opal_atomic_init(&lock, OPAL_ATOMIC_UNLOCKED); + opal_atomic_lock_init(&lock, OPAL_ATOMIC_LOCK_UNLOCKED); ret = atomic_spinlock_test_th(&lock, TEST_REPS, 0, num_threads); return ret; diff --git a/test/carto/carto-file b/test/carto/carto-file index 3253b2590fa..46e6c8ef323 100644 --- a/test/carto/carto-file +++ b/test/carto/carto-file @@ -20,7 +20,7 @@ NODE Ethernet eth1 # # -# Connection decleration From node To node:weight To node:weight ...... +# Connection declaration From node To node:weight To node:weight ...... 
# (Reserve word) (declered (declered (declered # above) above) above) #=============================================================================================== diff --git a/test/carto/carto_test.c b/test/carto/carto_test.c index 05be6f8f68e..2ff81d6d86c 100644 --- a/test/carto/carto_test.c +++ b/test/carto/carto_test.c @@ -56,13 +56,13 @@ main(int argc, char* argv[]) opal_graph_print(graph); slot0 = opal_carto_base_find_node(graph, "slot0"); if (NULL == slot0) { - opal_output(0,"couldnt find slot0 in the graph exiting\n"); + opal_output(0,"couldn't find slot0 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } end_node = opal_carto_base_find_node(graph, "slot3"); if (NULL == end_node) { - opal_output(0,"couldnt find mthca1 in the graph exiting\n"); + opal_output(0,"couldn't find mthca1 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } @@ -87,13 +87,13 @@ main(int argc, char* argv[]) opal_graph_print(graph); slot0 = opal_carto_base_find_node(graph, "slot0"); if (NULL == slot0) { - opal_output(0,"couldnt find slot0 in the graph exiting\n"); + opal_output(0,"couldn't find slot0 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } end_node = opal_carto_base_find_node(graph, "mthca1"); if (NULL == end_node) { - opal_output(0,"couldnt find mthca1 in the graph exiting\n"); + opal_output(0,"couldn't find mthca1 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } @@ -118,13 +118,13 @@ main(int argc, char* argv[]) opal_graph_print(graph); slot0 = opal_carto_base_find_node(graph, "slot0"); if (NULL == slot0) { - opal_output(0,"couldnt find slot0 in the graph exiting\n"); + opal_output(0,"couldn't find slot0 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } end_node = opal_carto_base_find_node(graph, "eth1"); if (NULL == end_node) { - opal_output(0,"couldnt find mthca1 in the graph exiting\n"); + opal_output(0,"couldn't find mthca1 in the graph exiting\n"); 
opal_carto_base_free_graph(graph); return -1; } @@ -149,13 +149,13 @@ main(int argc, char* argv[]) opal_graph_print(graph); slot0 = opal_carto_base_find_node(graph, "slot0"); if (NULL == slot0) { - opal_output(0,"couldnt find slot0 in the graph exiting\n"); + opal_output(0,"couldn't find slot0 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } end_node = opal_carto_base_find_node(graph, "mem3"); if (NULL == end_node) { - opal_output(0,"couldnt find mthca1 in the graph exiting\n"); + opal_output(0,"couldn't find mthca1 in the graph exiting\n"); opal_carto_base_free_graph(graph); return -1; } diff --git a/test/class/opal_fifo.c b/test/class/opal_fifo.c index 03f9fad4dc6..122524a8d9f 100644 --- a/test/class/opal_fifo.c +++ b/test/class/opal_fifo.c @@ -2,6 +2,7 @@ /* * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. + * Copyright (c) 2018 IBM Corporation. All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -103,7 +104,7 @@ static void *thread_test_exhaust (void *arg) { static bool check_fifo_consistency (opal_fifo_t *fifo, int expected_count) { - opal_list_item_t *item; + volatile opal_list_item_t *volatile item; int count; for (count = 0, item = fifo->opal_fifo_head.data.item ; item != &fifo->opal_fifo_ghost ; diff --git a/test/class/opal_pointer_array.c b/test/class/opal_pointer_array.c index ed05963e16d..5c7bec18607 100644 --- a/test/class/opal_pointer_array.c +++ b/test/class/opal_pointer_array.c @@ -3,7 +3,7 @@ * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. 
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, @@ -71,7 +71,7 @@ static void test(bool thread_usage){ if( (array->size - array->number_free) == test_len_in_array) { test_success(); } else { - test_failure("check on number of elments in array"); + test_failure("check on number of elements in array"); } /* check order of data */ @@ -109,11 +109,7 @@ static void test(bool thread_usage){ } /* test opal_pointer_array_get_item */ - array->number_free=array->size; - array->lowest_free=0; - for(i=0 ; i < array->size ; i++ ) { - array->addr[i] = NULL; - } + opal_pointer_array_remove_all(array); error_cnt=0; for(i=0 ; i < array->size ; i++ ) { value.ivalue = i + 2; @@ -141,7 +137,35 @@ static void test(bool thread_usage){ test_failure(" data check - 2nd "); } - free (array); + OBJ_RELEASE(array); + assert(NULL == array); + + array=OBJ_NEW(opal_pointer_array_t); + assert(array); + opal_pointer_array_init(array, 0, 4, 2); + for( i = 0; i < 4; i++ ) { + value.ivalue = i + 1; + if( 0 > opal_pointer_array_add( array, value.cvalue ) ) { + test_failure("Add/Remove: failure during initial data_add "); + } + } + for( i = i-1; i >= 0; i-- ) { + if( i % 2 ) + if( 0 != opal_pointer_array_set_item(array, i, NULL) ) + test_failure("Add/Remove: failure during item removal "); + } + for( i = 0; i < 4; i++ ) { + if( !opal_pointer_array_add( array, (void*)(uintptr_t)(i+1) ) ) { + if( i != 2 ) { + test_failure("Add/Remove: failure during the readd "); + break; + } + } + } + opal_pointer_array_remove_all(array); + OBJ_RELEASE(array); + assert(NULL == array); + free(test_data); } diff --git a/test/datatype/Makefile.am b/test/datatype/Makefile.am index 9c9aaa4a1a0..cd867134a4f 100644 --- a/test/datatype/Makefile.am +++ b/test/datatype/Makefile.am @@ -18,7 +18,7 @@ if PROJECT_OMPI MPI_TESTS = checksum position position_noncontig ddt_test ddt_raw unpack_ooo ddt_pack external32 MPI_CHECKS = to_self endif -TESTS = opal_datatype_test $(MPI_TESTS) +TESTS = opal_datatype_test 
unpack_hetero $(MPI_TESTS) check_PROGRAMS = $(TESTS) $(MPI_CHECKS) @@ -79,5 +79,10 @@ external32_LDADD = \ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la +unpack_hetero_SOURCES = unpack_hetero.c +unpack_hetero_LDFLAGS = $(OMPI_PKG_CONFIG_LDFLAGS) +unpack_hetero_LDADD = \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + distclean: rm -rf *.dSYM .deps .libs *.log *.o *.trs $(check_PROGRAMS) Makefile diff --git a/test/datatype/checksum.c b/test/datatype/checksum.c index 666f49869a2..34d958ef746 100644 --- a/test/datatype/checksum.c +++ b/test/datatype/checksum.c @@ -115,7 +115,7 @@ int main( int argc, char* argv[] ) OBJ_RELEASE(convertor); /** - * The datatype is not usefull anymore + * The datatype is not useful anymore */ OBJ_RELEASE(sparse); diff --git a/test/datatype/ddt_lib.c b/test/datatype/ddt_lib.c index 9170da0914a..49a5264034b 100644 --- a/test/datatype/ddt_lib.c +++ b/test/datatype/ddt_lib.c @@ -418,7 +418,7 @@ ompi_datatype_t* test_contiguous( void ) { ompi_datatype_t *pdt, *pdt1, *pdt2; - printf( "test contiguous (alignement)\n" ); + printf( "test contiguous (alignment)\n" ); ompi_datatype_create_contiguous(0, &ompi_mpi_datatype_null.dt, &pdt1); ompi_datatype_add( pdt1, &ompi_mpi_double.dt, 1, 0, -1 ); if( outputFlags & DUMP_DATA_AFTER_COMMIT ) { diff --git a/test/datatype/ddt_pack.c b/test/datatype/ddt_pack.c index 3439e16c409..1164e6feca8 100644 --- a/test/datatype/ddt_pack.c +++ b/test/datatype/ddt_pack.c @@ -11,7 +11,7 @@ * Copyright (c) 2004-2006 The Regents of the University of California. * All rights reserved. * Copyright (c) 2006 Sun Microsystems Inc. All rights reserved. - * Copyright (c) 2015 Research Organization for Information Science + * Copyright (c) 2015-2017 Research Organization for Information Science * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * @@ -30,7 +30,7 @@ #include -static int get_extents(ompi_datatype_t * type, OPAL_PTRDIFF_TYPE *lb, OPAL_PTRDIFF_TYPE *extent, OPAL_PTRDIFF_TYPE *true_lb, OPAL_PTRDIFF_TYPE *true_extent) { +static int get_extents(ompi_datatype_t * type, ptrdiff_t *lb, ptrdiff_t *extent, ptrdiff_t *true_lb, ptrdiff_t *true_extent) { int ret; ret = ompi_datatype_get_extent(type, lb, extent); @@ -50,10 +50,10 @@ main(int argc, char* argv[]) struct ompi_datatype_t *unpacked_dt; int ret = 0; int blen[4]; - OPAL_PTRDIFF_TYPE disp[4]; + ptrdiff_t disp[4]; ompi_datatype_t *newType, *types[4], *struct_type, *vec_type; - OPAL_PTRDIFF_TYPE old_lb, old_extent, old_true_lb, old_true_extent; - OPAL_PTRDIFF_TYPE lb, extent, true_lb, true_extent; + ptrdiff_t old_lb, old_extent, old_true_lb, old_true_extent; + ptrdiff_t lb, extent, true_lb, true_extent; /* make ompi_proc_local () work ... */ struct ompi_proc_t dummy_proc; diff --git a/test/datatype/opal_datatype_test.c b/test/datatype/opal_datatype_test.c index fcb8164faf5..9c30a9e36f9 100644 --- a/test/datatype/opal_datatype_test.c +++ b/test/datatype/opal_datatype_test.c @@ -12,6 +12,8 @@ * All rights reserved. * Copyright (c) 2006 Sun Microsystems Inc. All rights reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. 
* $COPYRIGHT$ * * Additional copyrights may follow @@ -49,7 +51,7 @@ uint32_t remote_arch = 0xffffffff; */ static size_t compute_memory_size( opal_datatype_t const * const pdt, int count ) { - OPAL_PTRDIFF_TYPE extent, true_lb, true_extent; + ptrdiff_t extent, true_lb, true_extent; opal_datatype_type_extent( pdt, &extent ); opal_datatype_get_true_extent( pdt, &true_lb, &true_extent ); @@ -140,7 +142,7 @@ static int test_upper( unsigned int length ) */ static int local_copy_ddt_count( opal_datatype_t const * const pdt, int count ) { - OPAL_PTRDIFF_TYPE lb, extent; + ptrdiff_t lb, extent; size_t malloced_size; char *odst, *osrc; void *pdst, *psrc; @@ -185,7 +187,7 @@ static int local_copy_ddt_count( opal_datatype_t const * const pdt, int count ) } } if( 0 == errors ) { - printf("Validation check succesfully passed\n"); + printf("Validation check successfully passed\n"); } else { printf("Found %d errors. Giving up!\n", errors); exit(-1); @@ -202,7 +204,7 @@ local_copy_with_convertor_2datatypes( opal_datatype_t const * const send_type, i opal_datatype_t const * const recv_type, int recv_count, int chunk ) { - OPAL_PTRDIFF_TYPE send_lb, send_extent, recv_lb, recv_extent; + ptrdiff_t send_lb, send_extent, recv_lb, recv_extent; void *pdst = NULL, *psrc = NULL, *ptemp = NULL; char *odst, *osrc; opal_convertor_t *send_convertor = NULL, *recv_convertor = NULL; @@ -306,7 +308,7 @@ local_copy_with_convertor_2datatypes( opal_datatype_t const * const send_type, i static int local_copy_with_convertor( opal_datatype_t const * const pdt, int count, int chunk ) { - OPAL_PTRDIFF_TYPE lb, extent; + ptrdiff_t lb, extent; void *pdst = NULL, *psrc = NULL, *ptemp = NULL; char *odst, *osrc; opal_convertor_t *send_convertor = NULL, *recv_convertor = NULL; @@ -471,7 +473,7 @@ static int local_copy_with_convertor( opal_datatype_t const * const pdt, int cou } } if( 0 == errors ) { - printf("Validation check succesfully passed\n"); + printf("Validation check successfully passed\n"); } else { 
printf("Found %d errors. Giving up!\n", errors); exit(-1); diff --git a/test/datatype/opal_ddt_lib.c b/test/datatype/opal_ddt_lib.c index 5fabd90a2b7..3fa5592bd5c 100644 --- a/test/datatype/opal_ddt_lib.c +++ b/test/datatype/opal_ddt_lib.c @@ -4,6 +4,8 @@ * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2009 Oak Ridge National Labs. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. * $COPYRIGHT$ * * Additional copyrights may follow @@ -27,14 +29,14 @@ uint32_t outputFlags = VALIDATE_DATA | CHECK_PACK_UNPACK | RESET_CONVERTORS | QU static int32_t opal_datatype_create_indexed( int count, const int* pBlockLength, const int* pDisp, const opal_datatype_t* oldType, opal_datatype_t** newType ); -static int32_t opal_datatype_create_hindexed( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +static int32_t opal_datatype_create_hindexed( int count, const int* pBlockLength, const ptrdiff_t* pDisp, const opal_datatype_t* oldType, opal_datatype_t** newType ); static int32_t opal_datatype_create_struct( int count, const int* pBlockLength, - const OPAL_PTRDIFF_TYPE* pDisp, + const ptrdiff_t* pDisp, opal_datatype_t** pTypes, opal_datatype_t** newType ); static int32_t opal_datatype_create_vector( int count, int bLength, int stride, const opal_datatype_t* oldType, opal_datatype_t** newType ); -static int32_t opal_datatype_create_hvector( int count, int bLength, OPAL_PTRDIFF_TYPE stride, +static int32_t opal_datatype_create_hvector( int count, int bLength, ptrdiff_t stride, const opal_datatype_t* oldType, opal_datatype_t** newType ); @@ -136,7 +138,7 @@ opal_datatype_t* test_struct( void ) NULL, (opal_datatype_t*)&opal_datatype_int1 }; int lengths[] = { 2, 1, 3 }; - OPAL_PTRDIFF_TYPE disp[] = { 0, 16, 26 }; + ptrdiff_t disp[] = { 0, 16, 26 }; opal_datatype_t* pdt, *pdt1; printf( "test struct\n" ); @@ -166,7 +168,7 @@ opal_datatype_t* 
test_struct_char_double( void ) { char_double_t data; int lengths[] = {1, 1}; - OPAL_PTRDIFF_TYPE displ[] = {0, 0}; + ptrdiff_t displ[] = {0, 0}; opal_datatype_t *pdt; opal_datatype_t* types[] = { (opal_datatype_t*)&opal_datatype_int1, (opal_datatype_t*)&opal_datatype_float8}; @@ -200,7 +202,7 @@ typedef struct { opal_datatype_t* create_strange_dt( void ) { sdata_intern v[2]; - OPAL_PTRDIFF_TYPE displ[3]; + ptrdiff_t displ[3]; opal_datatype_t *pdt, *pdt1; opal_datatype_create_contiguous(0, &opal_datatype_empty, &pdt1); @@ -280,7 +282,7 @@ static int32_t opal_datatype_create_indexed( int count, const int* pBlockLength, { opal_datatype_t* pdt; int i, dLength, endat, disp; - OPAL_PTRDIFF_TYPE extent; + ptrdiff_t extent; if( 0 == count ) { *newType = opal_datatype_create( 0 ); @@ -317,12 +319,12 @@ static int32_t opal_datatype_create_indexed( int count, const int* pBlockLength, return OPAL_SUCCESS; } -static int32_t opal_datatype_create_hindexed( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +static int32_t opal_datatype_create_hindexed( int count, const int* pBlockLength, const ptrdiff_t* pDisp, const opal_datatype_t* oldType, opal_datatype_t** newType ) { opal_datatype_t* pdt; int i, dLength; - OPAL_PTRDIFF_TYPE extent, disp, endat; + ptrdiff_t extent, disp, endat; if( 0 == count ) { *newType = opal_datatype_create( 0 ); @@ -360,11 +362,11 @@ static int32_t opal_datatype_create_hindexed( int count, const int* pBlockLength } -static int32_t opal_datatype_create_struct( int count, const int* pBlockLength, const OPAL_PTRDIFF_TYPE* pDisp, +static int32_t opal_datatype_create_struct( int count, const int* pBlockLength, const ptrdiff_t* pDisp, opal_datatype_t** pTypes, opal_datatype_t** newType ) { int i; - OPAL_PTRDIFF_TYPE disp = 0, endto, lastExtent, lastDisp; + ptrdiff_t disp = 0, endto, lastExtent, lastDisp; int lastBlock; opal_datatype_t *pdt, *lastType; @@ -433,7 +435,7 @@ static int32_t opal_datatype_create_vector( int count, int bLength, int 
stride, const opal_datatype_t* oldType, opal_datatype_t** newType ) { opal_datatype_t *pTempData, *pData; - OPAL_PTRDIFF_TYPE extent = oldType->ub - oldType->lb; + ptrdiff_t extent = oldType->ub - oldType->lb; if( 0 == count ) { @@ -461,11 +463,11 @@ static int32_t opal_datatype_create_vector( int count, int bLength, int stride, } -static int32_t opal_datatype_create_hvector( int count, int bLength, OPAL_PTRDIFF_TYPE stride, +static int32_t opal_datatype_create_hvector( int count, int bLength, ptrdiff_t stride, const opal_datatype_t* oldType, opal_datatype_t** newType ) { opal_datatype_t *pTempData, *pData; - OPAL_PTRDIFF_TYPE extent = oldType->ub - oldType->lb; + ptrdiff_t extent = oldType->ub - oldType->lb; if( 0 == count ) { *newType = opal_datatype_create( 0 ); @@ -609,7 +611,7 @@ opal_datatype_t* test_contiguous( void ) { opal_datatype_t *pdt, *pdt1, *pdt2; - printf( "test contiguous (alignement)\n" ); + printf( "test contiguous (alignment)\n" ); opal_datatype_create_contiguous(0, &opal_datatype_empty, &pdt1); opal_datatype_add( pdt1, &opal_datatype_float8, 1, 0, -1 ); if( outputFlags & DUMP_DATA_AFTER_COMMIT ) { @@ -637,8 +639,8 @@ opal_datatype_t* test_contiguous( void ) int mpich_typeub( void ) { int errs = 0; - OPAL_PTRDIFF_TYPE extent, lb, extent1, extent2, extent3; - OPAL_PTRDIFF_TYPE displ[2]; + ptrdiff_t extent, lb, extent1, extent2, extent3; + ptrdiff_t displ[2]; int blens[2]; opal_datatype_t *type1, *type2, *type3, *types[2]; @@ -700,7 +702,7 @@ int mpich_typeub2( void ) { int blocklen[3], err = 0; size_t sz1, sz2, sz3; - OPAL_PTRDIFF_TYPE disp[3], lb, ub, ex1, ex2, ex3; + ptrdiff_t disp[3], lb, ub, ex1, ex2, ex3; opal_datatype_t *types[3], *dt1, *dt2, *dt3; blocklen[0] = 1; @@ -779,7 +781,7 @@ int mpich_typeub3( void ) { int blocklen[3], err = 0, idisp[3]; size_t sz; - OPAL_PTRDIFF_TYPE disp[3], lb, ub, ex; + ptrdiff_t disp[3], lb, ub, ex; opal_datatype_t *types[3], *dt1, *dt2, *dt3, *dt4, *dt5; /* Create a datatype with explicit LB and UB */ diff 
--git a/test/datatype/position_noncontig.c b/test/datatype/position_noncontig.c index 12a15fa47a7..0fb94c224ab 100644 --- a/test/datatype/position_noncontig.c +++ b/test/datatype/position_noncontig.c @@ -1,6 +1,6 @@ /* -*- Mode: C; c-basic-offset:4 ; -*- */ /* - * Copyright (c) 2004-2007 The University of Tennessee and The University + * Copyright (c) 2004-2017 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2011-2013 Cisco Systems, Inc. All rights reserved. @@ -23,7 +23,7 @@ /** * The purpose of this test is to simulate the multi-network packing and * unpacking process. The pack operation will happens in-order while the - * will be done randomly. Therefore, before each unpack the correct + * unpack will be done randomly. Therefore, before each unpack the correct * position in the user buffer has to be set. */ diff --git a/test/datatype/unpack_hetero.c b/test/datatype/unpack_hetero.c new file mode 100644 index 00000000000..48c9c1c2746 --- /dev/null +++ b/test/datatype/unpack_hetero.c @@ -0,0 +1,99 @@ +/* -*- Mode: C; c-basic-offset:4 ; -*- */ +/* + * Copyright (c) 2014-2016 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "opal_config.h" +#include "opal/runtime/opal.h" +#include "opal/datatype/opal_datatype.h" +#include "opal/datatype/opal_datatype_internal.h" +#include "opal/datatype/opal_convertor.h" +#include "opal/datatype/opal_datatype_prototypes.h" +#include "opal/util/arch.h" +#include +#include +#ifdef HAVE_SYS_TIME_H +#include +#endif +#include +#include + +/* Compile with: +gcc -DHAVE_CONFIG_H -I. -I../../include -I../.. -I../../include -I../../../ompi-trunk/opal -I../../../ompi-trunk/orte -g opal_datatype_test.c -o opal_datatype_test +*/ + +uint32_t remote_arch = 0xffffffff; + +/** + * Main function. Call several tests and print-out the results. 
It tries to stress the convertor + * using difficult data-type constructions as well as strange segment sizes for the conversion. + * Usually, it is able to detect most of the data-type and convertor problems. Any modifications + * on the data-type engine should first pass all the tests from this file, before going into other + * tests. + */ +int main( int argc, char* argv[] ) +{ + opal_datatype_init(); + + /** + * Simulate a heterogeneous remote architecture by flipping the endianness bit. + */ + remote_arch = opal_local_arch ^ OPAL_ARCH_ISBIGENDIAN; + + opal_convertor_t * pConv; + int sbuf[2], rbuf[2]; + size_t max_data; + struct iovec a; + uint32_t iov_count; + + sbuf[0] = 0x01000000; sbuf[1] = 0x02000000; + + printf( "\n\n#\n * TEST UNPACKING 1 int out of 1\n#\n\n" ); + + pConv = opal_convertor_create( remote_arch, 0 ); + rbuf[0] = -1; rbuf[1] = -1; + if( OPAL_SUCCESS != opal_convertor_prepare_for_recv( pConv, &opal_datatype_int4, 1, rbuf ) ) { + printf( "Cannot attach the datatype to a convertor\n" ); + return OPAL_ERROR; + } + + a.iov_base = sbuf; + a.iov_len = 4; + iov_count = 1; + max_data = 4; + opal_unpack_general( pConv, &a, &iov_count, &max_data ); + + assert(1 == rbuf[0]); + assert(-1 == rbuf[1]); + OBJ_RELEASE(pConv); + + printf( "\n\n#\n * TEST UNPACKING 1 int out of 2\n#\n\n" ); + pConv = opal_convertor_create( remote_arch, 0 ); + rbuf[0] = -1; rbuf[1] = -1; + if( OPAL_SUCCESS != opal_convertor_prepare_for_recv( pConv, &opal_datatype_int4, 2, rbuf ) ) { + printf( "Cannot attach the datatype to a convertor\n" ); + return OPAL_ERROR; + } + + + a.iov_base = sbuf; + a.iov_len = 4; + iov_count = 1; + max_data = 4; + opal_unpack_general( pConv, &a, &iov_count, &max_data ); + + assert(1 == rbuf[0]); + assert(-1 == rbuf[1]); + OBJ_RELEASE(pConv); + + /* clean up all data allocations */ + opal_datatype_finalize(); + opal_finalize(); + return OPAL_SUCCESS; +} diff --git a/test/event/event-test.c b/test/event/event-test.c index 1b90b0b9732..244c929cc74 100644 --- a/test/event/event-test.c
+++ b/test/event/event-test.c @@ -123,10 +123,10 @@ main (int argc, char **argv) fprintf(stderr, "Write data to %s\n", fifo); #endif - /* Initalize the event library */ + /* Initialize the event library */ opal_init(); - /* Initalize one event */ + /* Initialize one event */ #ifdef WIN32 opal_event.set(opal_event_base, &evfifo, (int)socket, OPAL_EV_READ, fifo_read, &evfifo); #else diff --git a/test/event/signal-test.c b/test/event/signal-test.c index 79c189a14c1..261892b315d 100644 --- a/test/event/signal-test.c +++ b/test/event/signal-test.c @@ -50,10 +50,10 @@ main (int argc, char **argv) { opal_event_t signal_int, signal_term; - /* Initalize the event library */ + /* Initialize the event library */ opal_init(); - /* Initalize one event */ + /* Initialize one event */ opal_event.set(opal_event_base, &signal_term, SIGUSR1, OPAL_EV_SIGNAL|OPAL_EV_PERSIST, signal_cb, &signal_term); opal_event.set(opal_event_base, &signal_int, SIGUSR2, OPAL_EV_SIGNAL|OPAL_EV_PERSIST, signal_cb, diff --git a/test/event/time-test.c b/test/event/time-test.c index e4e38c46680..cec0e04715b 100644 --- a/test/event/time-test.c +++ b/test/event/time-test.c @@ -52,10 +52,10 @@ main (int argc, char **argv) opal_event_t timeout; struct timeval tv; - /* Initalize the event library */ + /* Initialize the event library */ opal_event_init(); - /* Initalize one event */ + /* Initialize one event */ opal_evtimer_set(&timeout, timeout_cb, &timeout); timerclear(&tv); diff --git a/test/monitoring/Makefile.am b/test/monitoring/Makefile.am index 54538cf9c5f..b95d59ed4dd 100644 --- a/test/monitoring/Makefile.am +++ b/test/monitoring/Makefile.am @@ -1,11 +1,12 @@ # -# Copyright (c) 2013-2015 The University of Tennessee and The University +# Copyright (c) 2013-2017 The University of Tennessee and The University # of Tennessee Research Foundation. All rights # reserved. -# Copyright (c) 2013-2015 Inria. All rights reserved. 
-# Copyright (c) 2015 Research Organization for Information Science +# Copyright (c) 2013-2017 Inria. All rights reserved. +# Copyright (c) 2015-2017 Research Organization for Information Science # and Technology (RIST). All rights reserved. # Copyright (c) 2016 IBM Corporation. All rights reserved. +# Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. # $COPYRIGHT$ # # Additional copyrights may follow @@ -16,19 +17,33 @@ # This test requires multiple processes to run. Don't run it as part # of 'make check' if PROJECT_OMPI - noinst_PROGRAMS = monitoring_test + noinst_PROGRAMS = monitoring_test test_pvar_access test_overhead check_monitoring example_reduce_count monitoring_test_SOURCES = monitoring_test.c - monitoring_test_LDFLAGS = $(WRAPPER_EXTRA_LDFLAGS) - monitoring_test_LDADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la $(top_builddir)/opal/libopen-pal.la - -if MCA_BUILD_ompi_pml_monitoring_DSO - lib_LTLIBRARIES = monitoring_prof.la - monitoring_prof_la_SOURCES = monitoring_prof.c - monitoring_prof_la_LDFLAGS=-module -avoid-version -shared $(WRAPPER_EXTRA_LDFLAGS) - monitoring_prof_la_LIBADD = $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la $(top_builddir)/opal/libopen-pal.la -endif - -endif + monitoring_test_LDFLAGS = $(OMPI_PKG_CONFIG_LDFLAGS) + monitoring_test_LDADD = \ + $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + test_pvar_access_SOURCES = test_pvar_access.c + test_pvar_access_LDFLAGS = $(OMPI_PKG_CONFIG_LDFLAGS) + test_pvar_access_LDADD = \ + $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + test_overhead_SOURCES = test_overhead.c + test_overhead_LDFLAGS = $(OMPI_PKG_CONFIG_LDFLAGS) + test_overhead_LDADD = \ + $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + check_monitoring_SOURCES = check_monitoring.c + check_monitoring_LDFLAGS = $(OMPI_PKG_CONFIG_LDFLAGS) + 
check_monitoring_LDADD = \ + $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + example_reduce_count_SOURCES = example_reduce_count.c + example_reduce_count_LDFLAGS = $(OMPI_PKG_CONFIG_LDFLAGS) + example_reduce_count_LDADD = \ + $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la +endif # PROJECT_OMPI distclean: - rm -rf *.dSYM .deps .libs *.la *.lo monitoring_test *.log *.o *.trs Makefile + rm -rf *.dSYM .deps .libs *.la *.lo monitoring_test test_pvar_access test_overhead check_monitoring example_reduce_count prof *.log *.o *.trs Makefile diff --git a/test/monitoring/check_monitoring.c b/test/monitoring/check_monitoring.c new file mode 100644 index 00000000000..50c00769228 --- /dev/null +++ b/test/monitoring/check_monitoring.c @@ -0,0 +1,516 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * Copyright (c) 2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +/* + Check that the monitoring component of Open MPI works correctly.
+ + To be run as: + + mpirun -np 4 --mca pml_monitoring_enable 2 ./check_monitoring +*/ + +#include +#include +#include +#include + +#define PVAR_GENERATE_VARIABLES(pvar_prefix, pvar_name, pvar_class) \ + /* Variables */ \ + static MPI_T_pvar_handle pvar_prefix ## _handle; \ + static const char pvar_prefix ## _pvar_name[] = pvar_name; \ + static int pvar_prefix ## _pvar_idx; \ + /* Functions */ \ + static inline int pvar_prefix ## _start(MPI_T_pvar_session session) \ + { \ + int MPIT_result; \ + MPIT_result = MPI_T_pvar_start(session, pvar_prefix ## _handle); \ + if( MPI_SUCCESS != MPIT_result ) { \ + fprintf(stderr, "Failed to start handle on \"%s\" pvar, check that you have " \ + "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \ + MPI_Abort(MPI_COMM_WORLD, MPIT_result); \ + } \ + return MPIT_result; \ + } \ + static inline int pvar_prefix ## _init(MPI_T_pvar_session session) \ + { \ + int MPIT_result; \ + /* Get index */ \ + MPIT_result = MPI_T_pvar_get_index(pvar_prefix ## _pvar_name, \ + pvar_class, \ + &(pvar_prefix ## _pvar_idx)); \ + if( MPI_SUCCESS != MPIT_result ) { \ + fprintf(stderr, "Cannot find monitoring MPI_Tool \"%s\" pvar, check that you have " \ + "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \ + MPI_Abort(MPI_COMM_WORLD, MPIT_result); \ + return MPIT_result; \ + } \ + /* Allocate handle */ \ + /* Allocating a new PVAR in a session will reset the counters */ \ + int count; \ + MPIT_result = MPI_T_pvar_handle_alloc(session, pvar_prefix ## _pvar_idx, \ + MPI_COMM_WORLD, &(pvar_prefix ## _handle), \ + &count); \ + if( MPI_SUCCESS != MPIT_result ) { \ + fprintf(stderr, "Failed to allocate handle on \"%s\" pvar, check that you have " \ + "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \ + MPI_Abort(MPI_COMM_WORLD, MPIT_result); \ + return MPIT_result; \ + } \ + /* Start PVAR */ \ + return pvar_prefix ## _start(session); \ + } \ + static inline int pvar_prefix ## _stop(MPI_T_pvar_session 
session) \ + { \ + int MPIT_result; \ + MPIT_result = MPI_T_pvar_stop(session, pvar_prefix ## _handle); \ + if( MPI_SUCCESS != MPIT_result ) { \ + fprintf(stderr, "Failed to stop handle on \"%s\" pvar, check that you have " \ + "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \ + MPI_Abort(MPI_COMM_WORLD, MPIT_result); \ + } \ + return MPIT_result; \ + } \ + static inline int pvar_prefix ## _finalize(MPI_T_pvar_session session) \ + { \ + int MPIT_result; \ + /* Stop PVAR */ \ + MPIT_result = pvar_prefix ## _stop(session); \ + /* Free handle */ \ + MPIT_result = MPI_T_pvar_handle_free(session, &(pvar_prefix ## _handle)); \ + if( MPI_SUCCESS != MPIT_result ) { \ + fprintf(stderr, "Failed to free handle on \"%s\" pvar, check that you have " \ + "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \ + MPI_Abort(MPI_COMM_WORLD, MPIT_result); \ + return MPIT_result; \ + } \ + return MPIT_result; \ + } \ + static inline int pvar_prefix ## _read(MPI_T_pvar_session session, void*values) \ + { \ + int MPIT_result; \ + /* Stop pvar */ \ + MPIT_result = pvar_prefix ## _stop(session); \ + /* Read values */ \ + MPIT_result = MPI_T_pvar_read(session, pvar_prefix ## _handle, values); \ + if( MPI_SUCCESS != MPIT_result ) { \ + fprintf(stderr, "Failed to read handle on \"%s\" pvar, check that you have " \ + "enabled the monitoring component.\n", pvar_prefix ## _pvar_name); \ + MPI_Abort(MPI_COMM_WORLD, MPIT_result); \ + } \ + /* Start and return */ \ + return pvar_prefix ## _start(session); \ + } + +#define GENERATE_CS(prefix, pvar_name_prefix, pvar_class_c, pvar_class_s) \ + PVAR_GENERATE_VARIABLES(prefix ## _count, pvar_name_prefix "_count", pvar_class_c) \ + PVAR_GENERATE_VARIABLES(prefix ## _size, pvar_name_prefix "_size", pvar_class_s) \ + static inline int pvar_ ## prefix ## _init(MPI_T_pvar_session session) \ + { \ + prefix ## _count_init(session); \ + return prefix ## _size_init(session); \ + } \ + static inline int pvar_ ## prefix ##
_finalize(MPI_T_pvar_session session) \ + { \ + prefix ## _count_finalize(session); \ + return prefix ## _size_finalize(session); \ + } \ + static inline void pvar_ ## prefix ## _read(MPI_T_pvar_session session, \ + size_t*cvalues, size_t*svalues) \ + { \ + /* Read count values */ \ + prefix ## _count_read(session, cvalues); \ + /* Read size values */ \ + prefix ## _size_read(session, svalues); \ + } + +GENERATE_CS(pml, "pml_monitoring_messages", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE) +GENERATE_CS(osc_s, "osc_monitoring_messages_sent", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE) +GENERATE_CS(osc_r, "osc_monitoring_messages_recv", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE) +GENERATE_CS(coll, "coll_monitoring_messages", MPI_T_PVAR_CLASS_SIZE, MPI_T_PVAR_CLASS_SIZE) +GENERATE_CS(o2a, "coll_monitoring_o2a", MPI_T_PVAR_CLASS_COUNTER, MPI_T_PVAR_CLASS_AGGREGATE) +GENERATE_CS(a2o, "coll_monitoring_a2o", MPI_T_PVAR_CLASS_COUNTER, MPI_T_PVAR_CLASS_AGGREGATE) +GENERATE_CS(a2a, "coll_monitoring_a2a", MPI_T_PVAR_CLASS_COUNTER, MPI_T_PVAR_CLASS_AGGREGATE) + +static size_t *old_cvalues, *old_svalues; + +static inline void pvar_all_init(MPI_T_pvar_session*session, int world_size) +{ + int MPIT_result, provided; + MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); + if (MPIT_result != MPI_SUCCESS) { + fprintf(stderr, "Failed to initialize MPI_Tools sub-system.\n"); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_session_create(session); + if (MPIT_result != MPI_SUCCESS) { + printf("Failed to create a session for PVARs.\n"); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + old_cvalues = malloc(2 * world_size * sizeof(size_t)); + old_svalues = old_cvalues + world_size; + pvar_pml_init(*session); + pvar_osc_s_init(*session); + pvar_osc_r_init(*session); + pvar_coll_init(*session); + pvar_o2a_init(*session); + pvar_a2o_init(*session); + pvar_a2a_init(*session); +} + +static inline void pvar_all_finalize(MPI_T_pvar_session*session) +{ +
int MPIT_result; + pvar_pml_finalize(*session); + pvar_osc_s_finalize(*session); + pvar_osc_r_finalize(*session); + pvar_coll_finalize(*session); + pvar_o2a_finalize(*session); + pvar_a2o_finalize(*session); + pvar_a2a_finalize(*session); + free(old_cvalues); + MPIT_result = MPI_T_pvar_session_free(session); + if (MPIT_result != MPI_SUCCESS) { + printf("Failed to close a session for PVARs.\n"); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + (void)MPI_T_finalize(); +} + +static inline int pvar_pml_check(MPI_T_pvar_session session, int world_size, int world_rank) +{ + int i, ret = MPI_SUCCESS; + size_t *cvalues, *svalues; + cvalues = malloc(2 * world_size * sizeof(size_t)); + svalues = cvalues + world_size; + /* Get values */ + pvar_pml_read(session, cvalues, svalues); + for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) { + /* Check count values */ + if( i == world_rank && (cvalues[i] - old_cvalues[i]) != (size_t) 0 ) { + fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be equal to %zu.\n", + __func__, i, cvalues[i] - old_cvalues[i], (size_t) 0); + ret = -1; + } else if ( i != world_rank && (cvalues[i] - old_cvalues[i]) < (size_t) world_size ) { + fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, cvalues[i] - old_cvalues[i], (size_t) world_size); + ret = -1; + } + /* Check size values */ + if( i == world_rank && (svalues[i] - old_svalues[i]) != (size_t) 0 ) { + fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be equal to %zu.\n", + __func__, i, svalues[i] - old_svalues[i], (size_t) 0); + ret = -1; + } else if ( i != world_rank && (svalues[i] - old_svalues[i]) < (size_t) (world_size * 13 * sizeof(char)) ) { + fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, svalues[i] - old_svalues[i], (size_t) (world_size * 13 * sizeof(char))); + ret = -1; + } + } + if( MPI_SUCCESS == ret ) { + fprintf(stdout, "Check PML...[ OK ]\n"); + } else { + fprintf(stdout, 
"Check PML...[FAIL]\n"); + } + /* Keep old PML values */ + memcpy(old_cvalues, cvalues, 2 * world_size * sizeof(size_t)); + /* Free arrays */ + free(cvalues); + return ret; +} + +static inline int pvar_osc_check(MPI_T_pvar_session session, int world_size, int world_rank) +{ + int i, ret = MPI_SUCCESS; + size_t *cvalues, *svalues; + cvalues = malloc(2 * world_size * sizeof(size_t)); + svalues = cvalues + world_size; + /* Get OSC values */ + memset(cvalues, 0, 2 * world_size * sizeof(size_t)); + /* Check OSC sent values */ + pvar_osc_s_read(session, cvalues, svalues); + for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) { + /* Check count values */ + if( cvalues[i] < (size_t) world_size ) { + fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, cvalues[i], (size_t) world_size); + ret = -1; + } + /* Check size values */ + if( svalues[i] < (size_t) (world_size * 13 * sizeof(char)) ) { + fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, svalues[i], (size_t) (world_size * 13 * sizeof(char))); + ret = -1; + } + } + /* Check OSC received values */ + pvar_osc_r_read(session, cvalues, svalues); + for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) { + /* Check count values */ + if( cvalues[i] < (size_t) world_size ) { + fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, cvalues[i], (size_t) world_size); + ret = -1; + } + /* Check size values */ + if( svalues[i] < (size_t) (world_size * 13 * sizeof(char)) ) { + fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, svalues[i], (size_t) (world_size * 13 * sizeof(char))); + ret = -1; + } + } + if( MPI_SUCCESS == ret ) { + fprintf(stdout, "Check OSC...[ OK ]\n"); + } else { + fprintf(stdout, "Check OSC...[FAIL]\n"); + } + /* Keep old PML values */ + memcpy(old_cvalues, cvalues, 2 * world_size * sizeof(size_t)); + /* Free arrays */ + free(cvalues); + return ret; 
+} + +static inline int pvar_coll_check(MPI_T_pvar_session session, int world_size, int world_rank) { + int i, ret = MPI_SUCCESS; + size_t count, size; + size_t *cvalues, *svalues; + cvalues = malloc(2 * world_size * sizeof(size_t)); + svalues = cvalues + world_size; + /* Get COLL values */ + pvar_coll_read(session, cvalues, svalues); + for( i = 0; i < world_size && MPI_SUCCESS == ret; ++i ) { + /* Check count values */ + if( i == world_rank && cvalues[i] != (size_t) 0 ) { + fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be equal to %zu.\n", + __func__, i, cvalues[i], (size_t) 0); + ret = -1; + } else if ( i != world_rank && cvalues[i] < (size_t) (world_size + 1) * 4 ) { + fprintf(stderr, "Error in %s: count_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, cvalues[i], (size_t) (world_size + 1) * 4); + ret = -1; + } + /* Check size values */ + if( i == world_rank && svalues[i] != (size_t) 0 ) { + fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be equal to %zu.\n", + __func__, i, svalues[i], (size_t) 0); + ret = -1; + } else if ( i != world_rank && svalues[i] < (size_t) (world_size * (2 * 13 * sizeof(char) + sizeof(int)) + 13 * 3 * sizeof(char) + sizeof(int)) ) { + fprintf(stderr, "Error in %s: size_values[%d]=%zu, and should be >= %zu.\n", + __func__, i, svalues[i], (size_t) (world_size * (2 * 13 * sizeof(char) + sizeof(int)) + 13 * 3 * sizeof(char) + sizeof(int))); + ret = -1; + } + } + /* Check One-to-all COLL values */ + pvar_o2a_read(session, &count, &size); + if( count < (size_t) 2 ) { + fprintf(stderr, "Error in %s: count_o2a=%zu, and should be >= %zu.\n", + __func__, count, (size_t) 2); + ret = -1; + } + if( size < (size_t) ((world_size - 1) * 13 * 2 * sizeof(char)) ) { + fprintf(stderr, "Error in %s: size_o2a=%zu, and should be >= %zu.\n", + __func__, size, (size_t) ((world_size - 1) * 13 * 2 * sizeof(char))); + ret = -1; + } + /* Check All-to-one COLL values */ + pvar_a2o_read(session, &count, &size); + if( count < 
(size_t) 2 ) { + fprintf(stderr, "Error in %s: count_a2o=%zu, and should be >= %zu.\n", + __func__, count, (size_t) 2); + ret = -1; + } + if( size < (size_t) ((world_size - 1) * (13 * sizeof(char) + sizeof(int))) ) { + fprintf(stderr, "Error in %s: size_a2o=%zu, and should be >= %zu.\n", + __func__, size, + (size_t) ((world_size - 1) * (13 * sizeof(char) + sizeof(int)))); + ret = -1; + } + /* Check All-to-all COLL values */ + pvar_a2a_read(session, &count, &size); + if( count < (size_t) (world_size * 4) ) { + fprintf(stderr, "Error in %s: count_a2a=%zu, and should be >= %zu.\n", + __func__, count, (size_t) (world_size * 4)); + ret = -1; + } + if( size < (size_t) (world_size * (world_size - 1) * (2 * 13 * sizeof(char) + sizeof(int))) ) { + fprintf(stderr, "Error in %s: size_a2a=%zu, and should be >= %zu.\n", + __func__, size, + (size_t) (world_size * (world_size - 1) * (2 * 13 * sizeof(char) + sizeof(int)))); + ret = -1; + } + if( MPI_SUCCESS == ret ) { + fprintf(stdout, "Check COLL...[ OK ]\n"); + } else { + fprintf(stdout, "Check COLL...[FAIL]\n"); + } + /* Keep old PML values */ + pvar_pml_read(session, old_cvalues, old_svalues); + /* Free arrays */ + free(cvalues); + return ret; +} + +int main(int argc, char* argv[]) +{ + int size, i, n, to, from, world_rank; + MPI_T_pvar_session session; + MPI_Status status; + char s1[20], s2[20]; + strncpy(s1, "hello world!", 13); + + MPI_Init(NULL, NULL); + MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); + MPI_Comm_size(MPI_COMM_WORLD, &size); + + pvar_all_init(&session, size); + + /* first phase: exchange size times data with everyone in + MPI_COMM_WORLD with collective operations. This phase comes + first in order to ease the prediction of messages exchanged of + each kind. 
+ */ + char*coll_buff = malloc(2 * size * 13 * sizeof(char)); + char*coll_recv_buff = coll_buff + size * 13; + int sum_ranks; + for( n = 0; n < size; ++n ) { + /* Allgather */ + memset(coll_buff, 0, size * 13 * sizeof(char)); + MPI_Allgather(s1, 13, MPI_CHAR, coll_buff, 13, MPI_CHAR, MPI_COMM_WORLD); + for( i = 0; i < size; ++i ) { + if( strncmp(s1, &coll_buff[i * 13], 13) ) { + fprintf(stderr, "Error in Allgather check: received \"%s\" instead of " + "\"hello world!\" from %d.\n", &coll_buff[i * 13], i); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + /* Scatter */ + MPI_Scatter(coll_buff, 13, MPI_CHAR, s2, 13, MPI_CHAR, n, MPI_COMM_WORLD); + if( strncmp(s1, s2, 13) ) { + fprintf(stderr, "Error in Scatter check: received \"%s\" instead of " + "\"hello world!\" from %d.\n", s2, n); + MPI_Abort(MPI_COMM_WORLD, -1); + } + /* Allreduce */ + MPI_Allreduce(&world_rank, &sum_ranks, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); + if( sum_ranks != ((size - 1) * size / 2) ) { + fprintf(stderr, "Error in Allreduce check: sum_ranks=%d instead of %d.\n", + sum_ranks, (size - 1) * size / 2); + MPI_Abort(MPI_COMM_WORLD, -1); + } + /* Alltoall */ + memset(coll_recv_buff, 0, size * 13 * sizeof(char)); + MPI_Alltoall(coll_buff, 13, MPI_CHAR, coll_recv_buff, 13, MPI_CHAR, MPI_COMM_WORLD); + for( i = 0; i < size; ++i ) { + if( strncmp(s1, &coll_recv_buff[i * 13], 13) ) { + fprintf(stderr, "Error in Alltoall check: received \"%s\" instead of " + "\"hello world!\" from %d.\n", &coll_recv_buff[i * 13], i); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + /* Bcast */ + if( n == world_rank ) { + MPI_Bcast(s1, 13, MPI_CHAR, n, MPI_COMM_WORLD); + } else { + MPI_Bcast(s2, 13, MPI_CHAR, n, MPI_COMM_WORLD); + if( strncmp(s1, s2, 13) ) { + fprintf(stderr, "Error in Bcast check: received \"%s\" instead of " + "\"hello world!\" from %d.\n", s2, n); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + /* Barrier */ + MPI_Barrier(MPI_COMM_WORLD); + /* Gather */ + memset(coll_buff, 0, size * 13 * sizeof(char)); + MPI_Gather(s1, 
13, MPI_CHAR, coll_buff, 13, MPI_CHAR, n, MPI_COMM_WORLD); + if( n == world_rank ) { + for( i = 0; i < size; ++i ) { + if( strncmp(s1, &coll_buff[i * 13], 13) ) { + fprintf(stderr, "Error in Gather check: received \"%s\" instead of " + "\"hello world!\" from %d.\n", &coll_buff[i * 13], i); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + } + /* Reduce */ + MPI_Reduce(&world_rank, &sum_ranks, 1, MPI_INT, MPI_SUM, n, MPI_COMM_WORLD); + if( n == world_rank ) { + if( sum_ranks != ((size - 1) * size / 2) ) { + fprintf(stderr, "Error in Reduce check: sum_ranks=%d instead of %d.\n", + sum_ranks, (size - 1) * size / 2); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + } + free(coll_buff); + if( -1 == pvar_coll_check(session, size, world_rank) ) MPI_Abort(MPI_COMM_WORLD, -1); + + /* second phase: exchange size times data with everyone except self + in MPI_COMM_WORLD with Send/Recv */ + for( n = 0; n < size; ++n ) { + for( i = 0; i < size - 1; ++i ) { + to = (world_rank+1+i)%size; + from = (world_rank+size-1-i)%size; + if(world_rank < to){ + MPI_Send(s1, 13, MPI_CHAR, to, world_rank, MPI_COMM_WORLD); + MPI_Recv(s2, 13, MPI_CHAR, from, from, MPI_COMM_WORLD, &status); + } else { + MPI_Recv(s2, 13, MPI_CHAR, from, from, MPI_COMM_WORLD, &status); + MPI_Send(s1, 13, MPI_CHAR, to, world_rank, MPI_COMM_WORLD); + } + if( strncmp(s2, "hello world!", 13) ) { + fprintf(stderr, "Error in PML check: s2=\"%s\" instead of \"hello world!\".\n", + s2); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + } + if( -1 == pvar_pml_check(session, size, world_rank) ) MPI_Abort(MPI_COMM_WORLD, -1); + + /* third phase: exchange size times data with everyone, including self, in + MPI_COMM_WORLD with RMA operations */ + char win_buff[20]; + MPI_Win win; + MPI_Win_create(win_buff, 20, sizeof(char), MPI_INFO_NULL, MPI_COMM_WORLD, &win); + for( n = 0; n < size; ++n ) { + for( i = 0; i < size; ++i ) { + MPI_Win_lock(MPI_LOCK_EXCLUSIVE, i, 0, win); + MPI_Put(s1, 13, MPI_CHAR, i, 0, 13, MPI_CHAR, win); + MPI_Win_unlock(i, win);
+ } + MPI_Win_lock(MPI_LOCK_EXCLUSIVE, world_rank, 0, win); + if( strncmp(win_buff, "hello world!", 13) ) { + fprintf(stderr, "Error in OSC check: win_buff=\"%s\" instead of \"hello world!\".\n", + win_buff); + MPI_Abort(MPI_COMM_WORLD, -1); + } + MPI_Win_unlock(world_rank, win); + for( i = 0; i < size; ++i ) { + MPI_Win_lock(MPI_LOCK_EXCLUSIVE, i, 0, win); + MPI_Get(s2, 13, MPI_CHAR, i, 0, 13, MPI_CHAR, win); + MPI_Win_unlock(i, win); + if( strncmp(s2, "hello world!", 13) ) { + fprintf(stderr, "Error in OSC check: s2=\"%s\" instead of \"hello world!\".\n", + s2); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } + } + MPI_Win_free(&win); + if( -1 == pvar_osc_check(session, size, world_rank) ) MPI_Abort(MPI_COMM_WORLD, -1); + + pvar_all_finalize(&session); + + MPI_Finalize(); + + return EXIT_SUCCESS; +} diff --git a/test/monitoring/example_reduce_count.c b/test/monitoring/example_reduce_count.c new file mode 100644 index 00000000000..d7811d2bf08 --- /dev/null +++ b/test/monitoring/example_reduce_count.c @@ -0,0 +1,127 @@ +/* + * Copyright (c) 2017 Inria. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include +#include +#include + +static MPI_T_pvar_handle count_handle; +static const char count_pvar_name[] = "pml_monitoring_messages_count"; +static int count_pvar_idx; + +int main(int argc, char**argv) +{ + int rank, size, n, to, from, tagno, MPIT_result, provided, count; + MPI_T_pvar_session session; + MPI_Status status; + MPI_Request request; + size_t*counts; + + n = -1; + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Comm_size(MPI_COMM_WORLD, &size); + to = (rank + 1) % size; + from = (rank + size - 1) % size; + tagno = 201; + + MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); + if (MPIT_result != MPI_SUCCESS) + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + + MPIT_result = MPI_T_pvar_get_index(count_pvar_name, MPI_T_PVAR_CLASS_SIZE, &count_pvar_idx); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_session_create(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot create a session for \"%s\" pvar\n", count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Allocating a new PVAR in a session will reset the counters */ + MPIT_result = MPI_T_pvar_handle_alloc(session, count_pvar_idx, + MPI_COMM_WORLD, &count_handle, &count); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + counts = (size_t*)malloc(count * sizeof(size_t)); + + MPIT_result = MPI_T_pvar_start(session, count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Token Ring communications */ + if 
(rank == 0) { + n = 25; + MPI_Isend(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD,&request); + } + while (1) { + MPI_Irecv(&n, 1, MPI_INT, from, tagno, MPI_COMM_WORLD, &request); + MPI_Wait(&request, &status); + if (rank == 0) {n--;tagno++;} + MPI_Isend(&n, 1, MPI_INT, to, tagno, MPI_COMM_WORLD, &request); + if (rank != 0) {n--;tagno++;} + if (n<0){ + break; + } + } + + MPIT_result = MPI_T_pvar_read(session, count_handle, counts); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to read handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /*** REDUCE ***/ + MPI_Allreduce(MPI_IN_PLACE, counts, count, MPI_UNSIGNED_LONG, MPI_MAX, MPI_COMM_WORLD); + + if(0 == rank) { + for(n = 0; n < count; ++n) + printf("%zu%s", counts[n], n < count - 1 ? ", " : "\n"); + } + + free(counts); + + MPIT_result = MPI_T_pvar_stop(session, count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_handle_free(session, &count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_session_free(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot close a session for \"%s\" pvar\n", count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + (void)MPI_T_finalize(); + + MPI_Finalize(); + + return EXIT_SUCCESS; +} diff --git a/test/monitoring/monitoring_prof.c b/test/monitoring/monitoring_prof.c deleted file mode 100644 index 946f690a3a7..00000000000 --- a/test/monitoring/monitoring_prof.c +++ /dev/null @@ -1,255 +0,0 @@ -/* - * Copyright (c) 2013-2016 The University of Tennessee and The University - * of Tennessee Research Foundation. All rights - * reserved. 
- * Copyright (c) 2013-2015 Inria. All rights reserved. - * Copyright (c) 2013-2015 Bull SAS. All rights reserved. - * Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. - * $COPYRIGHT$ - * - * Additional copyrights may follow - * - * $HEADER$ - */ - -/* -pml monitoring PMPI profiler - -Designed by George Bosilca , Emmanuel Jeannot and Guillaume Papauré -Contact the authors for questions. - -To be run as: - -mpirun -np 4 -x LD_PRELOAD=ompi_install_dir/lib/monitoring_prof.so --mca pml_monitoring_enable 1 ./my_app - -... -... -... - -writing 4x4 matrix to monitoring_msg.mat -writing 4x4 matrix to monitoring_size.mat -writing 4x4 matrix to monitoring_avg.mat - -*/ - -#include -#include -#include -#include -#include -#include - -static MPI_T_pvar_session session; -static int comm_world_size; -static int comm_world_rank; - -struct monitoring_result -{ - char * pvar_name; - int pvar_idx; - MPI_T_pvar_handle pvar_handle; - uint64_t * vector; -}; -typedef struct monitoring_result monitoring_result; - -static monitoring_result counts; -static monitoring_result sizes; - -static int write_mat(char *, uint64_t *, unsigned int); -static void init_monitoring_result(const char *, monitoring_result *); -static void start_monitoring_result(monitoring_result *); -static void stop_monitoring_result(monitoring_result *); -static void get_monitoring_result(monitoring_result *); -static void destroy_monitoring_result(monitoring_result *); - -int MPI_Init(int* argc, char*** argv) -{ - int result, MPIT_result; - int provided; - - result = PMPI_Init(argc, argv); - - PMPI_Comm_size(MPI_COMM_WORLD, &comm_world_size); - PMPI_Comm_rank(MPI_COMM_WORLD, &comm_world_rank); - - MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : failed to intialize MPI_T interface, preventing to get monitoring results: check your OpenMPI installation\n"); - PMPI_Abort(MPI_COMM_WORLD, MPIT_result); - } - - MPIT_result = 
MPI_T_pvar_session_create(&session); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : failed to create MPI_T session, preventing to get monitoring results: check your OpenMPI installation\n"); - PMPI_Abort(MPI_COMM_WORLD, MPIT_result); - } - - init_monitoring_result("pml_monitoring_messages_count", &counts); - init_monitoring_result("pml_monitoring_messages_size", &sizes); - - start_monitoring_result(&counts); - start_monitoring_result(&sizes); - - return result; -} - -int MPI_Finalize(void) -{ - int result, MPIT_result; - uint64_t * exchange_count_matrix = NULL; - uint64_t * exchange_size_matrix = NULL; - uint64_t * exchange_avg_size_matrix = NULL; - - if (0 == comm_world_rank) { - exchange_count_matrix = (uint64_t *) malloc(comm_world_size * comm_world_size * sizeof(uint64_t)); - exchange_size_matrix = (uint64_t *) malloc(comm_world_size * comm_world_size * sizeof(uint64_t)); - exchange_avg_size_matrix = (uint64_t *) malloc(comm_world_size * comm_world_size * sizeof(uint64_t)); - } - - stop_monitoring_result(&counts); - stop_monitoring_result(&sizes); - - get_monitoring_result(&counts); - get_monitoring_result(&sizes); - - PMPI_Gather(counts.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_count_matrix, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); - PMPI_Gather(sizes.vector, comm_world_size, MPI_UNSIGNED_LONG, exchange_size_matrix, comm_world_size, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD); - - if (0 == comm_world_rank) { - int i, j; - - //Get the same matrix than profile2mat.pl - for (i = 0; i < comm_world_size; ++i) { - for (j = i + 1; j < comm_world_size; ++j) { - exchange_count_matrix[i * comm_world_size + j] = exchange_count_matrix[j * comm_world_size + i] = (exchange_count_matrix[i * comm_world_size + j] + exchange_count_matrix[j * comm_world_size + i]) / 2; - exchange_size_matrix[i * comm_world_size + j] = exchange_size_matrix[j * comm_world_size + i] = (exchange_size_matrix[i * comm_world_size + j] + exchange_size_matrix[j * 
comm_world_size + i]) / 2; - if (exchange_count_matrix[i * comm_world_size + j] != 0) - exchange_avg_size_matrix[i * comm_world_size + j] = exchange_avg_size_matrix[j * comm_world_size + i] = exchange_size_matrix[i * comm_world_size + j] / exchange_count_matrix[i * comm_world_size + j]; - } - } - - write_mat("monitoring_msg.mat", exchange_count_matrix, comm_world_size); - write_mat("monitoring_size.mat", exchange_size_matrix, comm_world_size); - write_mat("monitoring_avg.mat", exchange_avg_size_matrix, comm_world_size); - } - - free(exchange_count_matrix); - free(exchange_size_matrix); - free(exchange_avg_size_matrix); - destroy_monitoring_result(&counts); - destroy_monitoring_result(&sizes); - - MPIT_result = MPI_T_pvar_session_free(&session); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "WARNING : failed to free MPI_T session, monitoring results may be impacted : check your OpenMPI installation\n"); - } - - MPIT_result = MPI_T_finalize(); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "WARNING : failed to finalize MPI_T interface, monitoring results may be impacted : check your OpenMPI installation\n"); - } - - result = PMPI_Finalize(); - - return result; -} - -void init_monitoring_result(const char * pvar_name, monitoring_result * res) -{ - int count; - int MPIT_result; - MPI_Comm comm_world = MPI_COMM_WORLD; - - res->pvar_name = strdup(pvar_name); - - MPIT_result = MPI_T_pvar_get_index(res->pvar_name, MPI_T_PVAR_CLASS_SIZE, &(res->pvar_idx)); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", pvar_name); - PMPI_Abort(MPI_COMM_WORLD, MPIT_result); - } - - MPIT_result = MPI_T_pvar_handle_alloc(session, res->pvar_idx, comm_world, &(res->pvar_handle), &count); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", pvar_name); - PMPI_Abort(MPI_COMM_WORLD, MPIT_result); - 
} - - if (count != comm_world_size) { - fprintf(stderr, "ERROR : COMM_WORLD has %d ranks \"%s\" pvar contains %d values, check that you have monitoring pml\n", comm_world_size, pvar_name, count); - PMPI_Abort(MPI_COMM_WORLD, count); - } - - res->vector = (uint64_t *) malloc(comm_world_size * sizeof(uint64_t)); -} - -void start_monitoring_result(monitoring_result * res) -{ - int MPIT_result; - - MPIT_result = MPI_T_pvar_start(session, res->pvar_handle); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : failed to start handle on \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); - PMPI_Abort(MPI_COMM_WORLD, MPIT_result); - } -} - -void stop_monitoring_result(monitoring_result * res) -{ - int MPIT_result; - - MPIT_result = MPI_T_pvar_stop(session, res->pvar_handle); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : failed to stop handle on \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } -} - -void get_monitoring_result(monitoring_result * res) -{ - int MPIT_result; - - MPIT_result = MPI_T_pvar_read(session, res->pvar_handle, res->vector); - if (MPIT_result != MPI_SUCCESS) { - fprintf(stderr, "ERROR : failed to read \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); - PMPI_Abort(MPI_COMM_WORLD, MPIT_result); - } -} - -void destroy_monitoring_result(monitoring_result * res) -{ - int MPIT_result; - - MPIT_result = MPI_T_pvar_handle_free(session, &(res->pvar_handle)); - if (MPIT_result != MPI_SUCCESS) { - printf("ERROR : failed to free handle on \"%s\" pvar, check that you have enabled the monitoring pml\n", res->pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } - - free(res->pvar_name); - free(res->vector); -} - -int write_mat(char * filename, uint64_t * mat, unsigned int dim) -{ - FILE *matrix_file; - int i, j; - - matrix_file = fopen(filename, "w"); - if (!matrix_file) { - fprintf(stderr, "ERROR : 
failed to open \"%s\" file in write mode, check your permissions\n", filename); - return -1; - } - - printf("writing %ux%u matrix to %s\n", dim, dim, filename); - - for (i = 0; i < comm_world_size; ++i) { - for (j = 0; j < comm_world_size; ++j) { - fprintf(matrix_file, "%" PRIu64 " ", mat[i * comm_world_size + j]); - } - fprintf(matrix_file, "\n"); - } - fflush(matrix_file); - fclose(matrix_file); - - return 0; -} diff --git a/test/monitoring/monitoring_test.c b/test/monitoring/monitoring_test.c index 70d51d17c29..ad1a00e4253 100644 --- a/test/monitoring/monitoring_test.c +++ b/test/monitoring/monitoring_test.c @@ -2,7 +2,7 @@ * Copyright (c) 2013-2015 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. - * Copyright (c) 2013-2015 Inria. All rights reserved. + * Copyright (c) 2013-2017 Inria. All rights reserved. * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2016 Intel, Inc. All rights reserved. * $COPYRIGHT$ @@ -15,243 +15,362 @@ /* pml monitoring tester. -Designed by George Bosilca and Emmanuel Jeannot +Designed by George Bosilca Emmanuel Jeannot and Clément Foyer Contact the authors for questions. 
-To be run as: - -mpirun -np 4 --mca pml_monitoring_enable 2 ./monitoring_test -pm -Then, the output should be: - -flushing to ./prof/phase_1_2.prof -flushing to ./prof/phase_1_0.prof -flushing to ./prof/phase_1_3.prof -flushing to ./prof/phase_2_1.prof -flushing to ./prof/phase_2_3.prof -flushing to ./prof/phase_2_0.prof -flushing to ./prof/phase_2_2.prof -I 0 1 108 bytes 27 msgs sent -E 0 1 1012 bytes 30 msgs sent -E 0 2 23052 bytes 61 msgs sent -I 1 2 104 bytes 26 msgs sent -I 1 3 208 bytes 52 msgs sent -E 1 0 860 bytes 24 msgs sent -E 1 3 2552 bytes 56 msgs sent -I 2 3 104 bytes 26 msgs sent -E 2 0 22804 bytes 49 msgs sent -E 2 3 860 bytes 24 msgs sent -I 3 0 104 bytes 26 msgs sent -I 3 1 204 bytes 51 msgs sent -E 3 1 2304 bytes 44 msgs sent -E 3 2 860 bytes 24 msgs sent - -or as - -mpirun -np 4 --mca pml_monitoring_enable 1 ./monitoring_test - -for an output as: - -flushing to ./prof/phase_1_1.prof -flushing to ./prof/phase_1_0.prof -flushing to ./prof/phase_1_2.prof -flushing to ./prof/phase_1_3.prof -flushing to ./prof/phase_2_1.prof -flushing to ./prof/phase_2_3.prof -flushing to ./prof/phase_2_2.prof -flushing to ./prof/phase_2_0.prof -I 0 1 1120 bytes 57 msgs sent -I 0 2 23052 bytes 61 msgs sent -I 1 0 860 bytes 24 msgs sent -I 1 2 104 bytes 26 msgs sent -I 1 3 2760 bytes 108 msgs sent -I 2 0 22804 bytes 49 msgs sent -I 2 3 964 bytes 50 msgs sent -I 3 0 104 bytes 26 msgs sent -I 3 1 2508 bytes 95 msgs sent -I 3 2 860 bytes 24 msgs sent -*/ +Two options are available for this test: with/without MPI_Tools, and with/without RMA operations. The default mode is without MPI_Tools and with RMA operations. +To enable MPI_Tools, add "--with-mpit" as an application parameter. +To disable RMA operation testing, add "--without-rma" as an application parameter.
+ +To be run as (without using MPI_Tool): + +mpirun -np 4 --mca pml_monitoring_enable 2 --mca pml_monitoring_enable_output 3 --mca pml_monitoring_filename prof/output ./monitoring_test + +with the results being, as an example: +output.1.prof +# POINT TO POINT +E 1 2 104 bytes 26 msgs sent 0,0,0,26,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +E 1 3 208 bytes 52 msgs sent 8,0,0,65,1,5,2,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +I 1 0 140 bytes 27 msgs sent +I 1 2 2068 bytes 1 msgs sent +I 1 3 2256 bytes 31 msgs sent +# OSC +S 1 0 0 bytes 1 msgs sent +R 1 0 40960 bytes 1 msgs sent +S 1 2 40960 bytes 1 msgs sent +# COLLECTIVES +C 1 0 140 bytes 27 msgs sent +C 1 2 140 bytes 27 msgs sent +C 1 3 140 bytes 27 msgs sent +D MPI COMMUNICATOR 4 DUP FROM 0 procs: 0,1,2,3 +O2A 1 0 bytes 0 msgs sent +A2O 1 0 bytes 0 msgs sent +A2A 1 276 bytes 15 msgs sent +D MPI_COMM_WORLD procs: 0,1,2,3 +O2A 1 0 bytes 0 msgs sent +A2O 1 0 bytes 0 msgs sent +A2A 1 96 bytes 9 msgs sent +D MPI COMMUNICATOR 5 SPLIT_TYPE FROM 4 procs: 0,1,2,3 +O2A 1 0 bytes 0 msgs sent +A2O 1 0 bytes 0 msgs sent +A2A 1 48 bytes 3 msgs sent +D MPI COMMUNICATOR 3 SPLIT FROM 0 procs: 1,3 +O2A 1 0 bytes 0 msgs sent +A2O 1 0 bytes 0 msgs sent +A2A 1 0 bytes 0 msgs sent +*/ -#include #include "mpi.h" +#include +#include static MPI_T_pvar_handle flush_handle; static const char flush_pvar_name[] = "pml_monitoring_flush"; +static const void*nullbuf = NULL; static int flush_pvar_idx; +static int with_mpit = 0; +static int with_rma = 1; int main(int argc, char* argv[]) { - int rank, size, n, to, from, tagno, MPIT_result, provided, count; + int rank, size, n, to, from, tagno, MPIT_result, provided, count, world_rank; MPI_T_pvar_session session; - MPI_Status status; MPI_Comm newcomm; - MPI_Request request; char filename[1024]; - + + for ( int arg_it = 1; argc > 1 && arg_it < 
argc; ++arg_it ) { + if( 0 == strcmp(argv[arg_it], "--with-mpit") ) { + with_mpit = 1; + printf("enable MPIT support\n"); + } else if( 0 == strcmp(argv[arg_it], "--without-rma") ) { + with_rma = 0; + printf("disable RMA testing\n"); + } + } /* first phase : make a token circulated in MPI_COMM_WORLD */ n = -1; - MPI_Init(&argc, &argv); - MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Init(NULL, NULL); + MPI_Comm_rank(MPI_COMM_WORLD, &world_rank); MPI_Comm_size(MPI_COMM_WORLD, &size); + rank = world_rank; to = (rank + 1) % size; from = (rank - 1) % size; tagno = 201; - MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); - if (MPIT_result != MPI_SUCCESS) - MPI_Abort(MPI_COMM_WORLD, MPIT_result); + if( with_mpit ) { + MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); + if (MPIT_result != MPI_SUCCESS) + MPI_Abort(MPI_COMM_WORLD, MPIT_result); - MPIT_result = MPI_T_pvar_get_index(flush_pvar_name, MPI_T_PVAR_CLASS_GENERIC, &flush_pvar_idx); - if (MPIT_result != MPI_SUCCESS) { - printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } + MPIT_result = MPI_T_pvar_get_index(flush_pvar_name, MPI_T_PVAR_CLASS_GENERIC, &flush_pvar_idx); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } - MPIT_result = MPI_T_pvar_session_create(&session); - if (MPIT_result != MPI_SUCCESS) { - printf("cannot create a session for \"%s\" pvar\n", flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } + MPIT_result = MPI_T_pvar_session_create(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot create a session for \"%s\" pvar\n", flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } - /* Allocating a new PVAR in a session will reset the counters */ - MPIT_result = MPI_T_pvar_handle_alloc(session, flush_pvar_idx, 
- MPI_COMM_WORLD, &flush_handle, &count); - if (MPIT_result != MPI_SUCCESS) { - printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } + /* Allocating a new PVAR in a session will reset the counters */ + MPIT_result = MPI_T_pvar_handle_alloc(session, flush_pvar_idx, + MPI_COMM_WORLD, &flush_handle, &count); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } - MPIT_result = MPI_T_pvar_start(session, flush_handle); - if (MPIT_result != MPI_SUCCESS) { - printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); + MPIT_result = MPI_T_pvar_start(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } } if (rank == 0) { n = 25; - MPI_Isend(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD,&request); + MPI_Send(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD); } while (1) { - MPI_Irecv(&n,1,MPI_INT,from,tagno,MPI_COMM_WORLD, &request); - MPI_Wait(&request,&status); + MPI_Recv(&n, 1, MPI_INT, from, tagno, MPI_COMM_WORLD, MPI_STATUS_IGNORE); if (rank == 0) {n--;tagno++;} - MPI_Isend(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD, &request); + MPI_Send(&n, 1, MPI_INT, to, tagno, MPI_COMM_WORLD); if (rank != 0) {n--;tagno++;} if (n<0){ break; } } - /* Build one file per processes - Every thing that has been monitored by each - process since the last flush will be output in filename */ + if( with_mpit ) { + /* Build one file per process. + Everything that has been monitored by each + process since the last flush will be written to filename */ + /* + Requires directory prof to be created.
+ Filename format should display the phase number + and the process rank for ease of parsing with + aggregate_profile.pl script + */ + sprintf(filename, "prof/phase_1"); - /* - Requires directory prof to be created. - Filename format should display the phase number - and the process rank for ease of parsing with - aggregate_profile.pl script - */ - sprintf(filename,"prof/phase_1_%d.prof",rank); - if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) { - fprintf(stderr, "Process %d cannot save monitoring in %s\n", rank, filename); - } - /* Force the writing of the monitoring data */ - MPIT_result = MPI_T_pvar_stop(session, flush_handle); - if (MPIT_result != MPI_SUCCESS) { - printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) { + fprintf(stderr, "Process %d cannot save monitoring in %s.%d.prof\n", + world_rank, filename, world_rank); + } + /* Force the writing of the monitoring data */ + MPIT_result = MPI_T_pvar_stop(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } - MPIT_result = MPI_T_pvar_start(session, flush_handle); - if (MPIT_result != MPI_SUCCESS) { - printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } - /* Don't set a filename. If we stop the session before setting it, then no output ile - * will be generated. 
- */ - if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, NULL) ) { - fprintf(stderr, "Process %d cannot save monitoring in %s\n", rank, filename); + MPIT_result = MPI_T_pvar_start(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + /* Don't set a filename. If we stop the session before setting it, then no output file + * will be generated. + */ + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, (void*)&nullbuf) ) { + fprintf(stderr, "Process %d cannot save monitoring in %s\n", world_rank, filename); + } } /* Second phase. Work with different communicators. - even ranls will circulate a token - while odd ranks wil perform a all_to_all + odd ranks will circulate a token + while even ranks will perform an all_to_all */ MPI_Comm_split(MPI_COMM_WORLD, rank%2, rank, &newcomm); - /* the filename for flushing monitoring now uses 2 as phase number!
*/ - sprintf(filename, "prof/phase_2_%d.prof", rank); - - if(rank%2){ /*even ranks (in COMM_WORD) circulate a token*/ + if(rank%2){ /*odd ranks (in COMM_WORLD) circulate a token*/ MPI_Comm_rank(newcomm, &rank); MPI_Comm_size(newcomm, &size); if( size > 1 ) { - to = (rank + 1) % size;; - from = (rank - 1) % size ; + to = (rank + 1) % size; + from = (rank - 1) % size; tagno = 201; if (rank == 0){ n = 50; MPI_Send(&n, 1, MPI_INT, to, tagno, newcomm); } while (1){ - MPI_Recv(&n, 1, MPI_INT, from, tagno, newcomm, &status); + MPI_Recv(&n, 1, MPI_INT, from, tagno, newcomm, MPI_STATUS_IGNORE); if (rank == 0) {n--; tagno++;} MPI_Send(&n, 1, MPI_INT, to, tagno, newcomm); if (rank != 0) {n--; tagno++;} if (n<0){ - if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) { - fprintf(stderr, "Process %d cannot save monitoring in %s\n", rank, filename); - } break; } } } - } else { /*odd ranks (in COMM_WORD) will perform a all_to_all and a barrier*/ + } else { /*even ranks (in COMM_WORLD) will perform an all_to_all and a barrier*/ int send_buff[10240]; int recv_buff[10240]; + MPI_Comm newcomm2; MPI_Comm_rank(newcomm, &rank); MPI_Comm_size(newcomm, &size); MPI_Alltoall(send_buff, 10240/size, MPI_INT, recv_buff, 10240/size, MPI_INT, newcomm); - MPI_Comm_split(newcomm, rank%2, rank, &newcomm); - MPI_Barrier(newcomm); + MPI_Comm_split(newcomm, rank%2, rank, &newcomm2); + MPI_Barrier(newcomm2); + MPI_Comm_free(&newcomm2); + } + + if( with_mpit ) { + /* Build one file per process. + Everything that has been monitored by each + process since the last flush will be written to filename */ + /* + Requires directory prof to be created.
+ Filename format should display the phase number + and the process rank for ease of parsing with + aggregate_profile.pl script + */ + sprintf(filename, "prof/phase_2"); + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) { - fprintf(stderr, "Process %d cannot save monitoring in %s\n", rank, filename); + fprintf(stderr, "Process %d cannot save monitoring in %s.%d.prof\n", + world_rank, filename, world_rank); } - } - MPIT_result = MPI_T_pvar_stop(session, flush_handle); - if (MPIT_result != MPI_SUCCESS) { - printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); - } + /* Force the writing of the monitoring data */ + MPIT_result = MPI_T_pvar_stop(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } - MPIT_result = MPI_T_pvar_handle_free(session, &flush_handle); - if (MPIT_result != MPI_SUCCESS) { - printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n", - flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); + MPIT_result = MPI_T_pvar_start(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + /* Don't set a filename. If we stop the session before setting it, then no output + * will be generated. 
+ */ + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, (void*)&nullbuf ) ) { + fprintf(stderr, "Process %d cannot save monitoring in %s\n", world_rank, filename); + } } - MPIT_result = MPI_T_pvar_session_free(&session); - if (MPIT_result != MPI_SUCCESS) { - printf("cannot close a session for \"%s\" pvar\n", flush_pvar_name); - MPI_Abort(MPI_COMM_WORLD, MPIT_result); + if( with_rma ) { + MPI_Win win; + int rs_buff[10240]; + int win_buff[10240]; + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Comm_size(MPI_COMM_WORLD, &size); + to = (rank + 1) % size; + from = (rank + size - 1) % size; + for( int v = 0; v < 10240; ++v ) + rs_buff[v] = win_buff[v] = rank; + + MPI_Win_create(win_buff, 10240 * sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win); + MPI_Win_fence(MPI_MODE_NOPRECEDE, win); + if( rank%2 ) { + MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT, win); + MPI_Get(rs_buff, 10240, MPI_INT, from, 0, 10240, MPI_INT, win); + } else { + MPI_Put(rs_buff, 10240, MPI_INT, to, 0, 10240, MPI_INT, win); + MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT, win); + } + MPI_Win_fence(MPI_MODE_NOSUCCEED, win); + + for( int v = 0; v < 10240; ++v ) + if( rs_buff[v] != win_buff[v] && ((rank%2 && rs_buff[v] != from) || (!(rank%2) && rs_buff[v] != rank)) ) { + printf("Error on checking exchanged values: %s_buff[%d] == %d instead of %d\n", + rank%2 ? "rs" : "win", v, rs_buff[v], rank%2 ? 
from : rank); + MPI_Abort(MPI_COMM_WORLD, -1); + } + + MPI_Group world_group, newcomm_group, distant_group; + MPI_Comm_group(MPI_COMM_WORLD, &world_group); + MPI_Comm_group(newcomm, &newcomm_group); + MPI_Group_difference(world_group, newcomm_group, &distant_group); + if( rank%2 ) { + MPI_Win_post(distant_group, 0, win); + MPI_Win_wait(win); + /* Check received values */ + for( int v = 0; v < 10240; ++v ) + if( from != win_buff[v] ) { + printf("Error on checking exchanged values: win_buff[%d] == %d instead of %d\n", + v, win_buff[v], from); + MPI_Abort(MPI_COMM_WORLD, -1); + } + } else { + MPI_Win_start(distant_group, 0, win); + MPI_Put(rs_buff, 10240, MPI_INT, to, 0, 10240, MPI_INT, win); + MPI_Win_complete(win); + } + MPI_Group_free(&world_group); + MPI_Group_free(&newcomm_group); + MPI_Group_free(&distant_group); + MPI_Barrier(MPI_COMM_WORLD); + + for( int v = 0; v < 10240; ++v ) rs_buff[v] = rank; + + MPI_Win_lock(MPI_LOCK_EXCLUSIVE, to, 0, win); + MPI_Put(rs_buff, 10240, MPI_INT, to, 0, 10240, MPI_INT, win); + MPI_Win_unlock(to, win); + + MPI_Barrier(MPI_COMM_WORLD); + + /* Check received values */ + for( int v = 0; v < 10240; ++v ) + if( from != win_buff[v] ) { + printf("Error on checking exchanged values: win_buff[%d] == %d instead of %d\n", + v, win_buff[v], from); + MPI_Abort(MPI_COMM_WORLD, -1); + } + + MPI_Win_free(&win); } - (void)PMPI_T_finalize(); + if( with_mpit ) { + /* the filename for flushing monitoring now uses 3 as phase number! 
*/ + sprintf(filename, "prof/phase_3"); + + if( MPI_SUCCESS != MPI_T_pvar_write(session, flush_handle, filename) ) { + fprintf(stderr, "Process %d cannot save monitoring in %s.%d.prof\n", + world_rank, filename, world_rank); + } + + MPIT_result = MPI_T_pvar_stop(session, flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_handle_free(session, &flush_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n", + flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_session_free(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot close a session for \"%s\" pvar\n", flush_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + (void)MPI_T_finalize(); + } + MPI_Comm_free(&newcomm); /* Now, in MPI_Finalize(), the pml_monitoring library outputs, in STDERR, the aggregated recorded monitoring of all the phases*/ MPI_Finalize(); diff --git a/test/monitoring/test_overhead.c b/test/monitoring/test_overhead.c new file mode 100644 index 00000000000..5356761334a --- /dev/null +++ b/test/monitoring/test_overhead.c @@ -0,0 +1,294 @@ +/* + * Copyright (c) 2016-2017 Inria. All rights reserved. + * Copyright (c) 2017 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +/* + Measurement for the pml_monitoring component overhead + + Designed by Clement Foyer + Contact the authors for questions. 
+ + To be run as: + + +*/ + +#include +#include +#include +#include +#include +#include "mpi.h" + +#define NB_ITER 1000 +#define FULL_NB_ITER (size_world * NB_ITER) +#define MAX_SIZE (1024 * 1024 * 1.4) +#define NB_OPS 6 + +static int rank_world = -1; +static int size_world = 0; +static int to = -1; +static int from = -1; +static MPI_Win win = MPI_WIN_NULL; + +/* Sorting results */ +static int comp_double(const void*_a, const void*_b) +{ + const double*a = _a; + const double*b = _b; + if(*a < *b) + return -1; + else if(*a > *b) + return 1; + else + return 0; +} + +/* Timing */ +static inline void get_tick(struct timespec*t) +{ +#if defined(__bg__) +# define CLOCK_TYPE CLOCK_REALTIME +#elif defined(CLOCK_MONOTONIC_RAW) +# define CLOCK_TYPE CLOCK_MONOTONIC_RAW +#elif defined(CLOCK_MONOTONIC) +# define CLOCK_TYPE CLOCK_MONOTONIC +#endif +#if defined(CLOCK_TYPE) + clock_gettime(CLOCK_TYPE, t); +#else + struct timeval tv; + gettimeofday(&tv, NULL); + t->tv_sec = tv.tv_sec; + t->tv_nsec = tv.tv_usec * 1000; +#endif +} +static inline double timing_delay(const struct timespec*const t1, const struct timespec*const t2) +{ + const double delay = 1000000.0 * (t2->tv_sec - t1->tv_sec) + (t2->tv_nsec - t1->tv_nsec) / 1000.0; + return delay; +} + +/* Operations */ +static inline void op_send(double*res, void*sbuf, int size, int tagno, void*rbuf) { + MPI_Request request; + struct timespec start, end; + + /* Pre-post the receive so that no unexpected message is generated */ + MPI_Irecv(rbuf, size, MPI_BYTE, from, tagno, MPI_COMM_WORLD, &request); + + /* Token ring to synchronize */ + /* Notify the sender that we are ready to + receive (even when the message is not sent eagerly) */ + if( 0 == rank_world ) { + MPI_Send(NULL, 0, MPI_BYTE, from, 100, MPI_COMM_WORLD); + MPI_Recv(NULL, 0, MPI_BYTE, to, 100, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + } else { + MPI_Recv(NULL, 0, MPI_BYTE, to, 100, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + MPI_Send(NULL, 0, MPI_BYTE, from, 100, MPI_COMM_WORLD); + } +
+ /* do monitored operation */ + get_tick(&start); + MPI_Send(sbuf, size, MPI_BYTE, to, tagno, MPI_COMM_WORLD); + get_tick(&end); + + MPI_Wait(&request, MPI_STATUS_IGNORE); + *res = timing_delay(&start, &end); +} + +static inline void op_send_pingpong(double*res, void*sbuf, int size, int tagno, void*rbuf) { + struct timespec start, end; + + MPI_Barrier(MPI_COMM_WORLD); + + /* do monitored operation */ + if(rank_world % 2) { /* Odd ranks : Recv - Send */ + MPI_Recv(rbuf, size, MPI_BYTE, from, tagno, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + MPI_Send(sbuf, size, MPI_BYTE, from, tagno, MPI_COMM_WORLD); + MPI_Barrier(MPI_COMM_WORLD); + get_tick(&start); + MPI_Send(sbuf, size, MPI_BYTE, from, tagno, MPI_COMM_WORLD); + MPI_Recv(rbuf, size, MPI_BYTE, from, tagno, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + get_tick(&end); + } else { /* Even ranks : Send - Recv */ + get_tick(&start); + MPI_Send(sbuf, size, MPI_BYTE, to, tagno, MPI_COMM_WORLD); + MPI_Recv(rbuf, size, MPI_BYTE, to, tagno, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + get_tick(&end); + MPI_Barrier(MPI_COMM_WORLD); + MPI_Recv(rbuf, size, MPI_BYTE, to, tagno, MPI_COMM_WORLD, MPI_STATUS_IGNORE); + MPI_Send(sbuf, size, MPI_BYTE, to, tagno, MPI_COMM_WORLD); + } + + *res = timing_delay(&start, &end) / 2; +} + +static inline void op_coll(double*res, void*buff, int size, int tagno, void*rbuf) { + struct timespec start, end; + MPI_Barrier(MPI_COMM_WORLD); + + /* do monitored operation */ + get_tick(&start); + MPI_Bcast(buff, size, MPI_BYTE, 0, MPI_COMM_WORLD); + get_tick(&end); + + *res = timing_delay(&start, &end); +} + +static inline void op_a2a(double*res, void*sbuf, int size, int tagno, void*rbuf) { + struct timespec start, end; + MPI_Barrier(MPI_COMM_WORLD); + + /* do monitored operation */ + get_tick(&start); + MPI_Alltoall(sbuf, size, MPI_BYTE, rbuf, size, MPI_BYTE, MPI_COMM_WORLD); + get_tick(&end); + + *res = timing_delay(&start, &end); +} + +static inline void op_put(double*res, void*sbuf, int size, int tagno, void*rbuf) { + 
struct timespec start, end; + + MPI_Win_lock(MPI_LOCK_EXCLUSIVE, to, 0, win); + + /* do monitored operation */ + get_tick(&start); + MPI_Put(sbuf, size, MPI_BYTE, to, 0, size, MPI_BYTE, win); + MPI_Win_unlock(to, win); + get_tick(&end); + + *res = timing_delay(&start, &end); +} + +static inline void op_get(double*res, void*rbuf, int size, int tagno, void*sbuf) { + struct timespec start, end; + + MPI_Win_lock(MPI_LOCK_SHARED, to, 0, win); + + /* do monitored operation */ + get_tick(&start); + MPI_Get(rbuf, size, MPI_BYTE, to, 0, size, MPI_BYTE, win); + MPI_Win_unlock(to, win); + get_tick(&end); + + *res = timing_delay(&start, &end); +} + +static inline void do_bench(int size, char*sbuf, double*results, + void(*op)(double*, void*, int, int, void*)) { + int iter; + int tagno = 201; + char*rbuf = sbuf ? sbuf + size : NULL; + + if(op == op_put || op == op_get){ + win = MPI_WIN_NULL; + MPI_Win_create(rbuf, size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win); + } + + for( iter = 0; iter < NB_ITER; ++iter ) { + op(&results[iter], sbuf, size, tagno, rbuf); + MPI_Barrier(MPI_COMM_WORLD); + } + + if(op == op_put || op == op_get){ + MPI_Win_free(&win); + win = MPI_WIN_NULL; + } +} + +int main(int argc, char* argv[]) +{ + int size, iter, nop; + char*sbuf = NULL; + double results[NB_ITER]; + void(*op)(double*, void*, int, int, void*); + char name[255]; + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD, &rank_world); + MPI_Comm_size(MPI_COMM_WORLD, &size_world); + to = (rank_world + 1) % size_world; + from = (rank_world + size_world - 1) % size_world; + + double full_res[FULL_NB_ITER]; + + for( nop = 0; nop < NB_OPS; ++nop ) { + switch(nop) { + case 0: + op = op_send; + sprintf(name, "MPI_Send"); + break; + case 1: + op = op_coll; + sprintf(name, "MPI_Bcast"); + break; + case 2: + op = op_a2a; + sprintf(name, "MPI_Alltoall"); + break; + case 3: + op = op_send_pingpong; + sprintf(name, "MPI_Send_pp"); + break; + case 4: + op = op_put; + sprintf(name, "MPI_Put"); + break; + case 
5: + op = op_get; + sprintf(name, "MPI_Get"); + break; + } + + if( 0 == rank_world ) + printf("# %s%%%d\n# size \t| latency \t| 10^6 B/s \t| MB/s \t| median \t| q1 \t| q3 \t| d1 \t| d9 \t| avg \t| max\n", name, size_world); + + for(size = 0; size < MAX_SIZE; size = ((int)(size * 1.4) > size) ? (size * 1.4) : (size + 1)) { + /* Init buffers */ + if( 0 != size ) { + sbuf = (char *)realloc(sbuf, (size_world + 1) * size); /* sbuf + alltoall recv buf */ + } + + do_bench(size, sbuf, results, op); + + MPI_Gather(results, NB_ITER, MPI_DOUBLE, full_res, NB_ITER, MPI_DOUBLE, 0, MPI_COMM_WORLD); + + if( 0 == rank_world ) { + qsort(full_res, FULL_NB_ITER, sizeof(double), &comp_double); + const double min_lat = full_res[0]; + const double max_lat = full_res[FULL_NB_ITER - 1]; + const double med_lat = full_res[(FULL_NB_ITER - 1) / 2]; + const double q1_lat = full_res[(FULL_NB_ITER - 1) / 4]; + const double q3_lat = full_res[ 3 * (FULL_NB_ITER - 1) / 4]; + const double d1_lat = full_res[(FULL_NB_ITER - 1) / 10]; + const double d9_lat = full_res[ 9 * (FULL_NB_ITER - 1) / 10]; + double avg_lat = 0.0; + for( iter = 0; iter < FULL_NB_ITER; iter++ ){ + avg_lat += full_res[iter]; + } + avg_lat /= FULL_NB_ITER; + const double bw_million_byte = size / min_lat; + const double bw_mbyte = bw_million_byte / 1.048576; + + printf("%9lld\t%9.3lf\t%9.3f\t%9.3f\t%9.3lf\t%9.3lf\t%9.3lf\t%9.3lf\t%9.3lf\t%9.3lf\t%9.3lf", + (long long)size, min_lat, bw_million_byte, bw_mbyte, + med_lat, q1_lat, q3_lat, d1_lat, d9_lat, + avg_lat, max_lat); + printf("\n"); + } + } + free(sbuf); + sbuf = NULL; + } + + MPI_Finalize(); + return EXIT_SUCCESS; +} diff --git a/test/monitoring/test_overhead.sh b/test/monitoring/test_overhead.sh new file mode 100755 index 00000000000..96814ba0bad --- /dev/null +++ b/test/monitoring/test_overhead.sh @@ -0,0 +1,218 @@ +#!/bin/bash + +# +# Copyright (c) 2016-2017 Inria. All rights reserved. 
+# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# +# Author Clément Foyer +# +# This script launches the test_overhead test case for 2, 4, 8, 12, +# 16, 20 and 24 processes, once with the monitoring component enabled, +# and once without any monitoring. It then parses and aggregates the +# results in order to create heatmaps. To work properly, this script +# needs sqlite3, sed, awk and gnuplot. It also needs permission to +# create and write directories in the working path. Temporary files can be +# found in $resdir/.tmp. They are cleaned between two executions of +# this script. +# +# This script outputs one heatmap per tested +# operation. Currently, the tested operations are: +# - MPI_Send (software overhead) +# - MPI_Send (ping-pong, to measure the overhead with the communication time) +# - MPI_Bcast +# - MPI_Alltoall +# - MPI_Put +# - MPI_Get +# + +exe=test_overhead + +# add common options +if [ $# -ge 1 ] +then + mfile="-machinefile $1" +fi +common_opt="$mfile --bind-to core" + +# dir +resdir=res +tmpdir=$resdir/.tmp +# files +base_nomon=$resdir/unmonitored +base_mon=$resdir/monitored +dbfile=$tmpdir/base.db +dbscript=$tmpdir/overhead.sql +plotfile=$tmpdir/plot.gp +# operations +ops=(send a2a bcast put get sendpp) + +# no_monitoring(nb_nodes, exe_name, output_filename, error_filename) +function no_monitoring() { + mpiexec -n $1 $common_opt --mca pml ^monitoring --mca osc ^monitoring --mca coll ^monitoring $2 2> $4 > $3 +} + +# monitoring(nb_nodes, exe_name, output_filename, error_filename) +function monitoring() { + mpiexec -n $1 $common_opt --mca pml_monitoring_enable 1 --mca pml_monitoring_enable_output 3 --mca pml_monitoring_filename "prof/toto" $2 2> $4 > $3 +} + +# filter_output(filenames_list) +function filter_output() { + for filename in "$@" + do + # remove extra text from the output + sed -i
'/--------------------------------------------------------------------------/,/--------------------------------------------------------------------------/d' $filename + # create all sub files as $tmpdir/$filename + file=$(sed -e "s|$resdir/|$tmpdir/|" -e "s/\.dat/.csv/" <<< $filename) + # split in file, one per kind of operation monitored + awk "/^# MPI_Send/ {out=\"$(sed "s/\.$nbprocs/.send&/" <<< $file)\"}; \ + /^# MPI_Bcast/ {out=\"$(sed "s/\.$nbprocs/.bcast&/" <<< $file)\"}; \ + /^# MPI_Alltoall/ {out=\"$(sed "s/\.$nbprocs/.a2a&/" <<< $file)\"}; \ + /^# MPI_Put/ {out=\"$(sed "s/\.$nbprocs/.put&/" <<< $file)\"}; \ + /^# MPI_Get/ {out=\"$(sed "s/\.$nbprocs/.get&/" <<< $file)\"}; \ + /^# MPI_Send_pp/ {out=\"$(sed "s/\.$nbprocs/.sendpp&/" <<< $file)\"}; \ + /^#/ { } ; !/^#/ {\$0=\"$nbprocs \"\$0; print > out};" \ + out=$tmpdir/tmp $filename + done + # trim spaces and replace them with comma in each file generated with awk + for file in `ls $tmpdir/*.*.$nbprocs.csv` + do + sed -i 's/[[:space:]]\{1,\}/,/g' $file + done +} + +# clean previous execution if any +if [ -d $tmpdir ] +then + rm -fr $tmpdir +fi +mkdir -p $tmpdir + +# start creating the sql file for data post-processing +cat > $dbscript <> $dbscript + echo -e "create table if not exists ${op}_mon (nbprocs integer, datasize integer, lat float, speed float, MBspeed float, media float, q1 float, q3 float, d1 float, d9 float, average float, maximum float, primary key (nbprocs, datasize) on conflict abort);\ncreate table if not exists ${op}_nomon (nbprocs integer, datasize integer, lat float, speed float, MBspeed float, media float, q1 float, q3 float, d1 float, d9 float, average float, maximum float, primary key (nbprocs, datasize) on conflict abort);" >> $dbscript +done + +# main loop to launch benchmarks +for nbprocs in 2 4 8 12 16 20 24 +do + echo "$nbprocs procs..." 
+ output_nomon="$base_nomon.$nbprocs.dat" + error_nomon="$base_nomon.$nbprocs.err" + output_mon="$base_mon.$nbprocs.dat" + error_mon="$base_mon.$nbprocs.err" + # actually do the benchmarks + no_monitoring $nbprocs $exe $output_nomon $error_nomon + monitoring $nbprocs $exe $output_mon $error_mon + # prepare data to insert them more easily into database + filter_output $output_nomon $output_mon + # insert into database + echo -e "\n-- Import each CSV file in its corresponding table" >> $dbscript + for op in ${ops[*]} + do + echo -e ".import $(sed "s|$resdir/|$tmpdir/|" <<<$base_mon).${op}.${nbprocs}.csv ${op}_mon\n.import $(sed "s|$resdir/|$tmpdir/|" <<<$base_nomon).${op}.${nbprocs}.csv ${op}_nomon" >> $dbscript + done +done + +echo "Fetch data..." +echo -e "\n-- Perform some select query" >> $dbscript +for op in ${ops[*]} +do + cat >> $dbscript <> $dbscript <> $dbscript <> $dbscript < $plotfile < out ; print $0 > out } else { print $0 > out } }' out=$tmpdir/${op}.dat $tmpdir/${op}.dat + echo -e "set output '$resdir/${op}.png'\nsplot '$tmpdir/${op}.dat' using (\$1):(\$2):(\$3) with pm3d" +done) +EOF + +echo "Generating graphs..." + +gnuplot < $plotfile + +echo "Done." diff --git a/test/monitoring/test_pvar_access.c b/test/monitoring/test_pvar_access.c new file mode 100644 index 00000000000..3c0d5c04eb2 --- /dev/null +++ b/test/monitoring/test_pvar_access.c @@ -0,0 +1,323 @@ +/* + * Copyright (c) 2013-2017 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2013-2016 Inria. All rights reserved. + * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +/* +pml monitoring tester. + +Designed by George Bosilca , Emmanuel Jeannot and +Clement Foyer +Contact the authors for questions. 
+ +To be run as: + +mpirun -np 4 --mca pml_monitoring_enable 2 ./test_pvar_access + +Then, the output should be: +Flushing phase 1: +I 0 1 108 bytes 27 msgs sent +I 1 2 104 bytes 26 msgs sent +I 2 3 104 bytes 26 msgs sent +I 3 0 104 bytes 26 msgs sent +Flushing phase 2: +I 0 1 20 bytes 4 msgs sent +I 0 2 20528 bytes 9 msgs sent +I 1 0 20 bytes 4 msgs sent +I 1 2 104 bytes 26 msgs sent +I 1 3 236 bytes 56 msgs sent +I 2 0 20528 bytes 9 msgs sent +I 2 3 112 bytes 27 msgs sent +I 3 1 220 bytes 52 msgs sent +I 3 2 20 bytes 4 msgs sent + +*/ + +#include +#include +#include + +static MPI_T_pvar_handle count_handle; +static MPI_T_pvar_handle msize_handle; +static const char count_pvar_name[] = "pml_monitoring_messages_count"; +static const char msize_pvar_name[] = "pml_monitoring_messages_size"; +static int count_pvar_idx, msize_pvar_idx; +static int world_rank, world_size; + +static void print_vars(int rank, int size, size_t* msg_count, size_t*msg_size) +{ + int i; + for(i = 0; i < size; ++i) { + if(0 != msg_size[i]) + printf("I\t%d\t%d\t%zu bytes\t%zu msgs sent\n", rank, i, msg_size[i], msg_count[i]); + } +} + +int main(int argc, char* argv[]) +{ + int rank, size, n, to, from, tagno, MPIT_result, provided, count; + MPI_T_pvar_session session; + MPI_Status status; + MPI_Comm newcomm; + MPI_Request request; + size_t*msg_count_p1, *msg_size_p1; + size_t*msg_count_p2, *msg_size_p2; + + /* first phase : make a token circulated in MPI_COMM_WORLD */ + n = -1; + MPI_Init(&argc, &argv); + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Comm_size(MPI_COMM_WORLD, &size); + world_size = size; + world_rank = rank; + to = (rank + 1) % size; + from = (rank - 1) % size; + tagno = 201; + + MPIT_result = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); + if (MPIT_result != MPI_SUCCESS) + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + + /* Retrieve the pvar indices */ + MPIT_result = MPI_T_pvar_get_index(count_pvar_name, MPI_T_PVAR_CLASS_SIZE, &count_pvar_idx); + if (MPIT_result != MPI_SUCCESS) { 
+ printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_get_index(msize_pvar_name, MPI_T_PVAR_CLASS_SIZE, &msize_pvar_idx); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot find monitoring MPI_T \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Get session for pvar binding */ + MPIT_result = MPI_T_pvar_session_create(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot create a session for \"%s\" and \"%s\" pvars\n", + count_pvar_name, msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Allocating a new PVAR in a session will reset the counters */ + MPIT_result = MPI_T_pvar_handle_alloc(session, count_pvar_idx, + MPI_COMM_WORLD, &count_handle, &count); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_handle_alloc(session, msize_pvar_idx, + MPI_COMM_WORLD, &msize_handle, &count); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to allocate handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Allocate arrays to retrieve results */ + msg_count_p1 = calloc(count * 4, sizeof(size_t)); + msg_size_p1 = &msg_count_p1[count]; + msg_count_p2 = &msg_count_p1[2*count]; + msg_size_p2 = &msg_count_p1[3*count]; + + /* Start pvar */ + MPIT_result = MPI_T_pvar_start(session, count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_start(session, msize_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start 
handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + if (rank == 0) { + n = 25; + MPI_Isend(&n,1,MPI_INT,to,tagno,MPI_COMM_WORLD,&request); + } + while (1) { + MPI_Irecv(&n, 1, MPI_INT, from, tagno, MPI_COMM_WORLD, &request); + MPI_Wait(&request, &status); + if (rank == 0) {n--;tagno++;} + MPI_Isend(&n, 1, MPI_INT, to, tagno, MPI_COMM_WORLD, &request); + if (rank != 0) {n--;tagno++;} + if (n<0){ + break; + } + } + + /* Stop the pvars, then read their values */ + MPIT_result = MPI_T_pvar_stop(session, count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_stop(session, msize_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to stop handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_read(session, count_handle, msg_count_p1); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to fetch handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_read(session, msize_handle, msg_size_p1); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to fetch handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Circulate a token so that ranks print their results in order */ + if(0 == world_rank) { + printf("Flushing phase 1:\n"); + print_vars(world_rank, world_size, msg_count_p1, msg_size_p1); + MPI_Send(NULL, 0, MPI_BYTE, (world_rank + 1) % world_size, 300, MPI_COMM_WORLD); + MPI_Recv(NULL, 0, MPI_BYTE, (world_rank - 1) % world_size, 300, MPI_COMM_WORLD, &status); + } else { + MPI_Recv(NULL, 0, MPI_BYTE, (world_rank - 1) % world_size, 300,
MPI_COMM_WORLD, &status); + print_vars(world_rank, world_size, msg_count_p1, msg_size_p1); + MPI_Send(NULL, 0, MPI_BYTE, (world_rank + 1) % world_size, 300, MPI_COMM_WORLD); + } + + /* Add the display token-ring messages to the phase 1 counts */ + MPIT_result = MPI_T_pvar_read(session, count_handle, msg_count_p1); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to fetch handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_read(session, msize_handle, msg_size_p1); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to fetch handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* + Second phase. Work with different communicators. + odd ranks will circulate a token + while even ranks will perform an all_to_all + */ + MPIT_result = MPI_T_pvar_start(session, count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_start(session, msize_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to start handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPI_Comm_split(MPI_COMM_WORLD, rank%2, rank, &newcomm); + + if(rank%2){ /*odd ranks (in MPI_COMM_WORLD) circulate a token*/ + MPI_Comm_rank(newcomm, &rank); + MPI_Comm_size(newcomm, &size); + if( size > 1 ) { + to = (rank + 1) % size; + from = (rank - 1 + size) % size; + tagno = 201; + if (rank == 0){ + n = 50; + MPI_Send(&n, 1, MPI_INT, to, tagno, newcomm); + } + while (1){ + MPI_Recv(&n, 1, MPI_INT, from, tagno, newcomm, &status); + if (rank == 0) {n--; tagno++;} + MPI_Send(&n, 1, MPI_INT, to, tagno, newcomm); + if (rank != 0) {n--; tagno++;} + if (n<0){ + break; + } + } + } + } else { /*even ranks
(in MPI_COMM_WORLD) will perform an all_to_all and a barrier*/ + int send_buff[10240]; + int recv_buff[10240]; + MPI_Comm_rank(newcomm, &rank); + MPI_Comm_size(newcomm, &size); + MPI_Alltoall(send_buff, 10240/size, MPI_INT, recv_buff, 10240/size, MPI_INT, newcomm); + MPI_Comm_split(newcomm, rank%2, rank, &newcomm); + MPI_Barrier(newcomm); + } + + MPIT_result = MPI_T_pvar_read(session, count_handle, msg_count_p2); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to fetch handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_read(session, msize_handle, msg_size_p2); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to fetch handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + /* Keep only the second phase: subtract the phase 1 counts (one entry per MPI_COMM_WORLD rank) */ + for(int i = 0; i < world_size; ++i) { + msg_count_p2[i] -= msg_count_p1[i]; + msg_size_p2[i] -= msg_size_p1[i]; + } + + /* Circulate a token to properly display the results */ + if(0 == world_rank) { + printf("Flushing phase 2:\n"); + print_vars(world_rank, world_size, msg_count_p2, msg_size_p2); + MPI_Send(NULL, 0, MPI_BYTE, (world_rank + 1) % world_size, 300, MPI_COMM_WORLD); + MPI_Recv(NULL, 0, MPI_BYTE, (world_rank - 1 + world_size) % world_size, 300, MPI_COMM_WORLD, &status); + } else { + MPI_Recv(NULL, 0, MPI_BYTE, (world_rank - 1 + world_size) % world_size, 300, MPI_COMM_WORLD, &status); + print_vars(world_rank, world_size, msg_count_p2, msg_size_p2); + MPI_Send(NULL, 0, MPI_BYTE, (world_rank + 1) % world_size, 300, MPI_COMM_WORLD); + } + + MPIT_result = MPI_T_pvar_handle_free(session, &count_handle); + if (MPIT_result != MPI_SUCCESS) { + printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n", + count_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + MPIT_result = MPI_T_pvar_handle_free(session, &msize_handle); + if (MPIT_result != MPI_SUCCESS) { +
printf("failed to free handle on \"%s\" pvar, check that you have monitoring pml\n", + msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + MPIT_result = MPI_T_pvar_session_free(&session); + if (MPIT_result != MPI_SUCCESS) { + printf("cannot close a session for \"%s\" and \"%s\" pvars\n", + count_pvar_name, msize_pvar_name); + MPI_Abort(MPI_COMM_WORLD, MPIT_result); + } + + (void)MPI_T_finalize(); + + free(msg_count_p1); + + MPI_Finalize(); + return EXIT_SUCCESS; +} diff --git a/test/mpi/environment/run_tests b/test/mpi/environment/run_tests index 8f6890a21ef..25d8818ba79 100755 --- a/test/mpi/environment/run_tests +++ b/test/mpi/environment/run_tests @@ -3,7 +3,7 @@ # ==== # run script for mpi environment tests -# Arguements +# Arguments # ========== # # test_list = no args or all runs all tests diff --git a/test/mpi/run_tests b/test/mpi/run_tests index cfcd3d9f33b..73b5c80516f 100755 --- a/test/mpi/run_tests +++ b/test/mpi/run_tests @@ -3,7 +3,7 @@ # ==== # run script for mpi tests -# Arguements +# Arguments # ========== # # test_list = no args or all runs all tests diff --git a/test/mpool/Makefile.am b/test/mpool/Makefile.am new file mode 100644 index 00000000000..620ed8c9a56 --- /dev/null +++ b/test/mpool/Makefile.am @@ -0,0 +1,21 @@ +# Copyright (c) 2018 Los Alamos National Security, LLC. All rights reserved. 
+# +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +TESTS = mpool_memkind + +check_PROGRAMS = $(TESTS) $(MPI_CHECKS) + +mpool_memkind_SOURCES = mpool_memkind.c + +LDFLAGS = $(OPAL_PKG_CONFIG_LDFLAGS) +LDADD = $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la + +distclean: + rm -rf *.dSYM .deps .libs *.log *.o *.trs $(check_PROGRAMS) Makefile + diff --git a/test/mpool/mpool_memkind.c b/test/mpool/mpool_memkind.c new file mode 100644 index 00000000000..bae81fd3b5b --- /dev/null +++ b/test/mpool/mpool_memkind.c @@ -0,0 +1,160 @@ +/* + * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana + * University Research and Technology + * Corporation. All rights reserved. + * Copyright (c) 2004-2005 The University of Tennessee and The University + * of Tennessee Research Foundation. All rights + * reserved. + * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, + * University of Stuttgart. All rights reserved. + * Copyright (c) 2004-2005 The Regents of the University of California. + * All rights reserved. + * Copyright (c) 2016 Research Organization for Information Science + * and Technology (RIST). All rights reserved. + * Copyright (c) 2018 Los Alamos National Security, LLC. All rights reserved. 
+ * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +/* + * only do this test if we have built with memkind support + */ + +#include +#include +#include +#include "opal_config.h" +#ifdef HAVE_MEMKIND_H +#include "opal/constants.h" +#include "opal/mca/mpool/mpool.h" +#include "opal/include/opal/frameworks.h" +#include "opal/runtime/opal.h" + +#define SIZE (2 * 1024 * 1024) + +const char *memory_types[] = { + "memkind_default", + "memkind_hbw", + NULL +}; + +const char *memory_policy[] = { + "mempolicy_bind_local", + "mempolicy_bind_all", + "mempolicy_perferred_local", + "mempolicy_interleave_local", + "mempolicy_interleave_all", + NULL +}; + +const char *memory_kind_bits[] = { + "memkind_mask_page_size_4KB", + "memkind_mask_page_size_2MB", + NULL +}; + +int main (int argc, char* argv[]) +{ + int ret = 0; + void *ptr = NULL; + char *error = NULL; + char **mp_ptr = NULL; + char **mt_ptr = NULL; + char **mk_ptr = NULL; + const char mpool_hints[] = "mpool=memkind"; + char hints[1024]; + + opal_init_util(&argc, &argv); + + if (opal_frameworks == NULL){ + error = "opal frameworks is NULL"; + goto error; + } + + if (OPAL_SUCCESS != (ret = mca_base_framework_open(&opal_allocator_base_framework, 0))) { + error = "mca_allocator_base_open() failed"; + goto error; + } + + if (OPAL_SUCCESS != (ret = mca_base_framework_open(&opal_mpool_base_framework, 0))) { + error = "mca_mpool_base_open() failed"; + goto error; + } + + /* + * first try basic allocation + */ + + ptr = mca_mpool_base_alloc(SIZE, NULL, mpool_hints); + if (NULL == ptr) { + error = "mca_mpool_base_alloc() failed"; + goto error; + } + + if (OPAL_SUCCESS != mca_mpool_base_free(ptr)) { + error = "mca_mpool_base_free() failed"; + goto error; + } + + /* + * now try policies + */ + + mp_ptr = (char **)memory_policy; + while (NULL != *mp_ptr) { + + mt_ptr = (char **)memory_types; + while (NULL != *mt_ptr) { + + mk_ptr = (char **)memory_kind_bits; + while (NULL != *mk_ptr) { + snprintf(hints, 
sizeof(hints), "%s,policy=%s,type=%s,kind=%s", + mpool_hints, *mp_ptr, *mt_ptr, *mk_ptr); + ptr = mca_mpool_base_alloc(SIZE, NULL, hints); + if (NULL == ptr) { + error = "mca_mpool_base_alloc() failed"; + goto error; + } + + if (OPAL_SUCCESS != mca_mpool_base_free(ptr)) { + error = "mca_mpool_base_free() failed"; + goto error; + } + mk_ptr++; + } + mt_ptr++; + } + mp_ptr++; + } + + if (OPAL_SUCCESS != (ret = mca_base_framework_close(&opal_mpool_base_framework))) { + error = "mca_mpool_base_close() failed"; + goto error; + } + + if (OPAL_SUCCESS != (ret = mca_base_framework_close(&opal_allocator_base_framework))) { + error = "mca_allocator_base_close() failed"; + goto error; + } + + opal_finalize(); + +error: + if (NULL != error) { + fprintf(stderr, "mpool/memkind test failed: %s\n", error); + ret = -1; + } else { + fprintf(stderr, "mpool/memkind test passed\n"); + } + + return ret; +} +#else +int main (int argc, char* argv[]) +{ + return 77; +} +#endif /* HAVE_MEMKIND_H */ diff --git a/test/threads/opal_thread.c b/test/threads/opal_thread.c index 7fb11c6f880..169c8b5984c 100644 --- a/test/threads/opal_thread.c +++ b/test/threads/opal_thread.c @@ -36,13 +36,13 @@ static volatile int count = 0; static void* thr1_run(opal_object_t* obj) { - (void)opal_atomic_add(&count, 1); + opal_atomic_add (&count, 1); return NULL; } static void* thr2_run(opal_object_t* obj) { - (void)opal_atomic_add(&count, 2); + opal_atomic_add (&count, 2); return NULL; } diff --git a/test/util/Makefile.am b/test/util/Makefile.am index 73e12fa8f18..1b3757a27ee 100644 --- a/test/util/Makefile.am +++ b/test/util/Makefile.am @@ -12,6 +12,9 @@ # Copyright (c) 2012 Los Alamos National Security, LLC. All rights # reserved. # Copyright (c) 2016 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2018 Research Organization for Information Science +# and Technology (RIST). All rights reserved. +# Copyright (c) 2018 Intel, Inc. All rights reserved.
# $COPYRIGHT$ # # Additional copyrights may follow @@ -34,7 +37,9 @@ AM_CPPFLAGS = -I$(top_srcdir)/test/support check_PROGRAMS = \ - opal_bit_ops opal_path_nfs + opal_bit_ops \ + opal_path_nfs \ + bipartite_graph TESTS = \ $(check_PROGRAMS) @@ -74,7 +79,6 @@ opal_bit_ops_LDADD = \ $(top_builddir)/test/support/libsupport.a opal_bit_ops_DEPENDENCIES = $(opal_path_nfs_LDADD) - opal_path_nfs_SOURCES = opal_path_nfs.c opal_path_nfs_LDADD = \ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ @@ -118,6 +122,12 @@ opal_path_nfs_DEPENDENCIES = $(opal_path_nfs_LDADD) # $(top_builddir)/test/support/libsupport.a #orte_universe_setup_file_io_DEPENDENCIES = $(orte_universe_setup_file_io_LDADD) +bipartite_graph_SOURCES = bipartite_graph.c +bipartite_graph_LDADD = \ + $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la \ + $(top_builddir)/test/support/libsupport.a +bipartite_graph_DEPENDENCIES = $(bipartite_graph_LDADD) + clean-local: rm -f test_session_dir_out test-file opal_path_nfs.out diff --git a/test/util/bipartite_graph.c b/test/util/bipartite_graph.c new file mode 100644 index 00000000000..e5dd8710ead --- /dev/null +++ b/test/util/bipartite_graph.c @@ -0,0 +1,1112 @@ +/* + * Copyright (c) 2014 Cisco Systems, Inc. All rights reserved. + * $COPYRIGHT$ + * + * Additional copyrights may follow + * + * $HEADER$ + */ + +#include "opal_config.h" + +#include +#include + +#include "opal/constants.h" +#include "opal/class/opal_list.h" +#include "opal/class/opal_pointer_array.h" +#include "opal/util/bipartite_graph.h" +#include "opal/util/bipartite_graph_internal.h" + +# define test_out(...) 
fprintf(stderr, __VA_ARGS__) +# define check(a) \ + do { \ + if (!(a)) { \ + test_out("%s:%d: check failed, '%s'\n", __func__, __LINE__, #a); \ + return 1; \ + } \ + } while (0) +# define check_str_eq(a,b) \ + do { \ + const char *a_ = (a); \ + const char *b_ = (b); \ + if (0 != strcmp(a_,b_)) { \ + test_out("%s:%d: check failed, \"%s\" != \"%s\"\n", \ + __func__, __LINE__, a_, b_); \ + return 1; \ + } \ + } while (0) +# define check_int_eq(got, expected) \ + do { \ + if ((got) != (expected)) { \ + test_out("%s:%d: check failed, \"%s\" != \"%s\", got %d\n", \ + __func__, __LINE__, #got, #expected, (got)); \ + return 1; \ + } \ + } while (0) +/* just use check_int_eq for now, no public error code to string routine + * exists (opal_err2str is static) */ +# define check_err_code(got, expected) \ + check_int_eq(got, expected) +# define check_msg(a, msg) \ + do { \ + if (!(a)) { \ + test_out("%s:%d: check failed, \"%s\" (%s)\n", \ + __func__, __LINE__, #a, (msg)); \ + return 1; \ + } \ + } while (0) + +#define check_graph_is_consistent(g) \ + do { \ + check(opal_bp_graph_order(g) <= opal_pointer_array_get_size(&g->vertices)); \ + check(g->source_idx >= -1 && g->source_idx < opal_bp_graph_order(g)); \ + check(g->sink_idx >= -1 && g->sink_idx < opal_bp_graph_order(g)); \ + } while (0) + +#define check_has_in_out_degree(g, u, expected_indegree, expected_outdegree) \ + do { \ + check_int_eq(opal_bp_graph_indegree(g, (u)), expected_indegree); \ + check_int_eq(opal_bp_graph_outdegree(g, (u)), expected_outdegree); \ + } while (0) + +/* Check the given path for sanity and that it does not have a cycle. Uses + * the "racing pointers" (Floyd's tortoise-and-hare) approach for cycle checking.
*/ +#define check_path_cycle(n, source, sink, pred) \ + do { \ + int i_, j_; \ + check_int_eq(pred[source], -1); \ + for (i_ = 0; i_ < n; ++i_) { \ + check(pred[i_] >= -1); \ + check(pred[i_] < n); \ + } \ + i_ = (sink); \ + j_ = pred[(sink)]; \ + while (i_ != -1 && j_ != -1) { \ + check_msg(i_ != j_, "CYCLE DETECTED"); \ + i_ = pred[i_]; \ + j_ = pred[j_]; \ + if (j_ != -1) { \ + j_ = pred[j_]; \ + } \ + } \ + } while (0) + +static int v_cleanup_count = 0; +static int e_cleanup_count = 0; + +static void v_cleanup(void *v_data) +{ + ++v_cleanup_count; +} + +static void e_cleanup(void *e_data) +{ + ++e_cleanup_count; +} + +/* a utility function for comparing integer pairs, useful for sorting the edge + * list returned by opal_bp_graph_solve_bipartite_assignment */ +static int cmp_int_pair(const void *a, const void *b) +{ + int *ia = (int *)a; + int *ib = (int *)b; + + if (ia[0] < ib[0]) { + return -1; + } + else if (ia[0] > ib[0]) { + return 1; + } + else { /* ia[0] == ib[0] */ + if (ia[1] < ib[1]) { + return -1; + } + else if (ia[1] > ib[1]) { + return 1; + } + else { + return 0; + } + } +} + +/* Simple time function so that we don't have to deal with the + complexity of finding mpi.h to use MPI_Wtime */ +static double gettime(void) +{ + double wtime; + struct timeval tv; + gettimeofday(&tv, NULL); + wtime = tv.tv_sec; + wtime += (double)tv.tv_usec / 1000000.0; + + return wtime; +} + +static int test_graph_create(void *ctx) +{ + opal_bp_graph_t *g; + int i; + int err; + int user_data; + int index; + + /* TEST CASE: check zero-vertex case */ + g = NULL; + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + check(opal_bp_graph_order(g) == 0); + check_graph_is_consistent(g); + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* TEST CASE: check nonzero-vertex case with no cleanup routines */ + g = NULL; + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != 
NULL); + check_graph_is_consistent(g); + for (i = 0; i < 4; ++i) { + index = -1; + err = opal_bp_graph_add_vertex(g, &user_data, &index); + check_err_code(err, OPAL_SUCCESS); + check(index == i); + } + check(opal_bp_graph_order(g) == 4); + check_graph_is_consistent(g); + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* TEST CASE: make sure cleanup routines are invoked properly */ + g = NULL; + v_cleanup_count = 0; + e_cleanup_count = 0; + err = opal_bp_graph_create(&v_cleanup, &e_cleanup, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + check_graph_is_consistent(g); + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, &user_data, &index); + check_err_code(err, OPAL_SUCCESS); + check(index == i); + } + check(opal_bp_graph_order(g) == 5); + check_graph_is_consistent(g); + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/1, + /*capacity=*/2, &user_data); + check_err_code(err, OPAL_SUCCESS); + check_graph_is_consistent(g); + check(v_cleanup_count == 0); + check(e_cleanup_count == 0); + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + check(v_cleanup_count == 5); + check(e_cleanup_count == 1); + + return 0; +} + +static int test_graph_clone(void *ctx) +{ + opal_bp_graph_t *g, *gx; + int i; + int err; + int user_data; + int index; + + /* TEST CASE: make sure that simple cloning works fine */ + g = NULL; + v_cleanup_count = 0; + e_cleanup_count = 0; + err = opal_bp_graph_create(&v_cleanup, &e_cleanup, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + check_graph_is_consistent(g); + + /* add 5 vertices */ + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, &user_data, &index); + check_err_code(err, OPAL_SUCCESS); + } + check(opal_bp_graph_order(g) == 5); + check_graph_is_consistent(g); + + /* and two edges */ + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/1, + /*capacity=*/2, &user_data); + check_err_code(err, OPAL_SUCCESS); + check_graph_is_consistent(g); + err = opal_bp_graph_add_edge(g,
/*u=*/3, /*v=*/1, /*cost=*/2, + /*capacity=*/100, &user_data); + check_err_code(err, OPAL_SUCCESS); + check_graph_is_consistent(g); + + /* now clone it and ensure that we get the same kind of graph */ + gx = NULL; + err = opal_bp_graph_clone(g, /*copy_user_data=*/false, &gx); + check_err_code(err, OPAL_SUCCESS); + check(gx != NULL); + + /* double check that cleanups still happen as expected after cloning */ + err = opal_bp_graph_free(gx); + check_err_code(err, OPAL_SUCCESS); + check(v_cleanup_count == 0); + check(e_cleanup_count == 0); + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + check(v_cleanup_count == 5); + check(e_cleanup_count == 2); + + return 0; +} + +static int test_graph_accessors(void *ctx) +{ + opal_bp_graph_t *g; + int i; + int err; + + /* TEST CASE: check _indegree/_outdegree/_order work correctly */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 4; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + + check(opal_bp_graph_indegree(g, i) == 0); + check(opal_bp_graph_outdegree(g, i) == 0); + } + + check(opal_bp_graph_order(g) == 4); + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/2, + /*capacity=*/1, NULL); + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/1, /*cost=*/2, + /*capacity=*/1, NULL); + + check(opal_bp_graph_indegree(g, 0) == 0); + check(opal_bp_graph_outdegree(g, 0) == 2); + check(opal_bp_graph_indegree(g, 1) == 1); + check(opal_bp_graph_outdegree(g, 1) == 0); + check(opal_bp_graph_indegree(g, 2) == 1); + check(opal_bp_graph_outdegree(g, 2) == 0); + check(opal_bp_graph_indegree(g, 3) == 0); + check(opal_bp_graph_outdegree(g, 3) == 0); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + return 0; +} + +static int test_graph_assignment_solver(void *ctx) +{ + opal_bp_graph_t *g; + int i; + int err; + int nme; + int *me; + int iter; + double start, end; + + /* TEST 
CASE: check that simple cases are solved correctly + * + * 0 --> 2 + * 1 --> 3 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 4; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 2); + check(me[2] == 1 && me[3] == 3); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: left side has more vertices than the right side + * + * 0 --> 3 + * 1 --> 4 + * 2 --> 4 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 3); + check(me[2] == 2 && me[3] == 4); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: test Christian's case: + * 0 --> 2 + * 0 --> 3 + * 1 --> 3 + * +
make sure that 0-->2 & 1-->3 get chosen. + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 4; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/5, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 2); + check(me[2] == 1 && me[3] == 3); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* Also need to do this version of it to be safe: + * 0 --> 2 + * 1 --> 2 + * 1 --> 3 + * + * Should choose 0-->2 & 1-->3 here too. 
+ */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 4; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/2, /*cost=*/1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/5, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 2); + check(me[2] == 1 && me[3] == 3); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* TEST CASE: test Christian's case with negative weights: + * 0 --> 2 + * 0 --> 3 + * 1 --> 3 + * + * make sure that 0-->2 & 1-->3 get chosen. 
+ */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 4; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/-5, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 2); + check(me[2] == 1 && me[3] == 3); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: add some disconnected vertices + * 0 --> 2 + * 0 --> 3 + * 1 --> 3 + * x --> 4 + * + * make sure that 0-->2 & 1-->3 get chosen. 
+ */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/-5, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 2); + check(me[2] == 1 && me[3] == 3); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* TEST CASE: sample UDP graph from bldsb005 + bldsb007 + * 0 --> 2 (cost -4294967296) + * 1 --> 2 (cost -4294967296) + * 0 --> 3 (cost -4294967296) + * 1 --> 3 (cost -4294967296) + * + * Make sure that either (0-->2 && 1-->3) or (0-->3 && 1-->2) get chosen. 
+ */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 4; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-4294967296, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/2, /*cost=*/-4294967296, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-4294967296, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/-4294967296, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + if (me[1] == 2) { + check(me[0] == 0 && me[1] == 2); + check(me[2] == 1 && me[3] == 3); + } else { + check(me[0] == 0 && me[1] == 3); + check(me[2] == 1 && me[3] == 2); + } + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: check that simple cases are solved correctly + * + * 0 --> 2 + * 1 --> 2 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 3; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/-100, + /*capacity=*/1, NULL); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/2, /*cost=*/-100, + /*capacity=*/1, NULL); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 1); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check((me[0] == 0 || me[0] == 1) 
&& me[1] == 2); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: performance sanity check + * + * Solve this graph NUM_ITER times and require an average of under 1 ms per + * solve, so that one solve per node of a large cluster (1000 nodes) stays + * under one second. + * 0 --> 3 + * 1 --> 4 + * 2 --> 4 + */ +#define NUM_ITER (10000) + start = gettime(); + for (iter = 0; iter < NUM_ITER; ++iter) { + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + me = NULL; + err = opal_bp_graph_solve_bipartite_assignment(g, + &nme, + &me); + check_err_code(err, OPAL_SUCCESS); + check_int_eq(nme, 2); + check(me != NULL); + qsort(me, nme, 2*sizeof(int), &cmp_int_pair); + check(me[0] == 0 && me[1] == 3); + check(me[2] == 2 && me[3] == 4); + free(me); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + } + end = gettime(); + /* average under 1 ms per solve, i.e. under one second for 1000 solves (one per node of a 1000-node cluster) */ + check(((end - start) / NUM_ITER) < 0.001); +#if 0 + fprintf(stderr, "timing for %d iterations is %f seconds (%f s/iter)\n", + NUM_ITER, end - start, (end - start) / NUM_ITER); +#endif + + return 0; +} + +static int test_graph_bellman_ford(void *ctx) +{ + opal_bp_graph_t *g; + int i; + int err; + bool path_found; + int *pred; + + /* TEST CASE: check that simple cases are solved correctly + * + * 4 --> 0 --> 2 --> 5 + * 4 --> 1 --> 3 --> 5 + * + * should yield the path 4,1,3,5 (see costs in code below) + */ + err =
opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 6; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/2, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/3, /*cost=*/2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/4, /*v=*/0, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/4, /*v=*/1, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/5, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/3, /*v=*/5, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + pred = malloc(6*sizeof(*pred)); + check(pred != NULL); + path_found = opal_bp_graph_bellman_ford(g, /*source=*/4, /*target=*/5, pred); + check(path_found); + check_path_cycle(6, /*source=*/4, /*target=*/5, pred); + check_int_eq(pred[5], 3); + check_int_eq(pred[3], 1); + check_int_eq(pred[1], 4); + free(pred); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: left side has more vertices than the right side, then + * convert to a flow network + * + * 0 --> 3 + * 1 --> 4 + * 2 --> 4 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = 
opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + err = opal_bp_graph_bipartite_to_flow(g); + check_err_code(err, OPAL_SUCCESS); + + pred = malloc(7*sizeof(*pred)); + check(pred != NULL); + path_found = opal_bp_graph_bellman_ford(g, /*source=*/5, /*target=*/6, pred); + check(path_found); + check_int_eq(g->source_idx, 5); + check_int_eq(g->sink_idx, 6); + check_path_cycle(7, /*source=*/5, /*target=*/6, pred); + check_int_eq(pred[6], 4); + check_int_eq(pred[4], 2); + check_int_eq(pred[2], 5); + free(pred); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* TEST CASE: same as previous, but with very large cost values (try to + * catch incorrect integer conversions) + * + * 0 --> 3 + * 1 --> 4 + * 2 --> 4 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/INT32_MAX+10LL, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/INT32_MAX+2LL, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/INT32_MAX+1LL, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + err = opal_bp_graph_bipartite_to_flow(g); + check_err_code(err, OPAL_SUCCESS); + + pred = malloc(7*sizeof(*pred)); + check(pred != NULL); + path_found = opal_bp_graph_bellman_ford(g, /*source=*/5, /*target=*/6, pred); + check(path_found); + check_int_eq(g->source_idx, 5); + check_int_eq(g->sink_idx, 6); + check_path_cycle(7, /*source=*/5, /*target=*/6, pred); + check_int_eq(pred[6], 4); + check_int_eq(pred[4], 2); + check_int_eq(pred[2], 5); + free(pred); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + /* TEST 
CASE: left side has more vertices than the right side, then + * convert to a flow network. Negative costs are used, but should not + * result in a negative cycle. + * + * 0 --> 3 + * 1 --> 4 + * 2 --> 4 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/-1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/1, /*v=*/4, /*cost=*/-2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/-10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + err = opal_bp_graph_bipartite_to_flow(g); + check_err_code(err, OPAL_SUCCESS); + + pred = malloc(7*sizeof(*pred)); + check(pred != NULL); + path_found = opal_bp_graph_bellman_ford(g, /*source=*/5, /*target=*/6, pred); + check(path_found); + check_int_eq(g->source_idx, 5); + check_int_eq(g->sink_idx, 6); + check_path_cycle(7, /*source=*/5, /*target=*/6, pred); + check_int_eq(pred[6], 4); + check_int_eq(pred[4], 2); + check_int_eq(pred[2], 5); + free(pred); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + return 0; +} + +static int test_graph_flow_conversion(void *ctx) +{ + opal_bp_graph_t *g; + int i; + int err; + + /* TEST CASE: left side has more vertices than the right side, then + * convert to a flow network + * + * 0 --> 3 + * 1 --> 4 + * 2 --> 4 + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + for (i = 0; i < 5; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + err = opal_bp_graph_add_edge(g, /*u=*/0, /*v=*/3, /*cost=*/10, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, 
/*u=*/1, /*v=*/4, /*cost=*/2, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + check_int_eq(opal_bp_graph_order(g), 5); + check_has_in_out_degree(g, 0, /*exp_indeg=*/0, /*exp_outdeg=*/1); + check_has_in_out_degree(g, 1, /*exp_indeg=*/0, /*exp_outdeg=*/1); + check_has_in_out_degree(g, 2, /*exp_indeg=*/0, /*exp_outdeg=*/1); + check_has_in_out_degree(g, 3, /*exp_indeg=*/1, /*exp_outdeg=*/0); + check_has_in_out_degree(g, 4, /*exp_indeg=*/2, /*exp_outdeg=*/0); + + /* this should add two nodes and a bunch of edges */ + err = opal_bp_graph_bipartite_to_flow(g); + check_err_code(err, OPAL_SUCCESS); + + check_int_eq(opal_bp_graph_order(g), 7); + check_has_in_out_degree(g, 0, /*exp_indeg=*/2, /*exp_outdeg=*/2); + check_has_in_out_degree(g, 1, /*exp_indeg=*/2, /*exp_outdeg=*/2); + check_has_in_out_degree(g, 2, /*exp_indeg=*/2, /*exp_outdeg=*/2); + check_has_in_out_degree(g, 3, /*exp_indeg=*/2, /*exp_outdeg=*/2); + check_has_in_out_degree(g, 4, /*exp_indeg=*/3, /*exp_outdeg=*/3); + check_has_in_out_degree(g, 5, /*exp_indeg=*/3, /*exp_outdeg=*/3); + check_has_in_out_degree(g, 6, /*exp_indeg=*/2, /*exp_outdeg=*/2); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + + /* TEST CASE: empty graph + * + * there's no reason that the code should bother to support this; it's not + * useful + */ + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + check_int_eq(opal_bp_graph_order(g), 0); + err = opal_bp_graph_bipartite_to_flow(g); + check_err_code(err, OPAL_ERR_BAD_PARAM); + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + return 0; +} + +static int test_graph_param_checking(void *ctx) +{ + opal_bp_graph_t *g; + int i; + int err; + + err = opal_bp_graph_create(NULL, NULL, &g); + check_err_code(err, OPAL_SUCCESS); + check(g != NULL); + + /* try with no
vertices */ + err = opal_bp_graph_add_edge(g, /*u=*/3, /*v=*/5, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_ERR_BAD_PARAM); + + for (i = 0; i < 6; ++i) { + err = opal_bp_graph_add_vertex(g, NULL, NULL); + check_err_code(err, OPAL_SUCCESS); + } + + /* try u out of range */ + err = opal_bp_graph_add_edge(g, /*u=*/9, /*v=*/5, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_ERR_BAD_PARAM); + err = opal_bp_graph_add_edge(g, /*u=*/6, /*v=*/5, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_ERR_BAD_PARAM); + + /* try v out of range */ + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/8, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_ERR_BAD_PARAM); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/6, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_ERR_BAD_PARAM); + + /* try adding an edge that already exists */ + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/4, /*cost=*/0, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_EXISTS); + + /* try an edge with an out of range cost */ + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/3, /*cost=*/INT64_MAX, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_ERR_BAD_PARAM); + err = opal_bp_graph_add_edge(g, /*u=*/2, /*v=*/3, /*cost=*/INT64_MAX-1, + /*capacity=*/1, NULL); + check_err_code(err, OPAL_SUCCESS); + + err = opal_bp_graph_free(g); + check_err_code(err, OPAL_SUCCESS); + + return 0; +} + +static int test_graph_helper_macros(void *ctx) +{ + int u, v; + int pred[6]; + bool visited[6][6]; + int pair1[2]; + int pair2[2]; + +#define RESET_ARRAYS(n, pred, visited) \ + do { \ + for (u = 0; u < 6; ++u) { \ + pred[u] = -1; \ + for (v = 0; v < 6; ++v) { \ + visited[u][v] = false; \ + } \ + } \ + } while (0) + + /* TEST CASE: make sure that an empty path does not cause any edges to be + * visited */ + RESET_ARRAYS(6, pred, 
visited); + FOREACH_UV_ON_PATH(pred, 3, 5, u, v) { + visited[u][v] = true; + } + for (u = 0; u < 6; ++u) { + for (v = 0; v < 6; ++v) { + check(visited[u][v] == false); + } + } + + /* TEST CASE: make sure that every edge in the given path gets visited */ + RESET_ARRAYS(6, pred, visited); + pred[5] = 2; + pred[2] = 1; + pred[1] = 3; + FOREACH_UV_ON_PATH(pred, 3, 5, u, v) { + visited[u][v] = true; + } + for (u = 0; u < 6; ++u) { + for (v = 0; v < 6; ++v) { + if ((u == 2 && v == 5) || + (u == 1 && v == 2) || + (u == 3 && v == 1)) { + check(visited[u][v] == true); + } + else { + check(visited[u][v] == false); + } + } + } + +#undef RESET_ARRAYS + + /* not technically a macro, but make sure that the pair comparison function + * isn't broken (because it was in an earlier revision...) */ + pair1[0] = 0; pair1[1] = 1; + pair2[0] = 0; pair2[1] = 1; + check(cmp_int_pair(&pair1[0], &pair2[0]) == 0); + + pair1[0] = 1; pair1[1] = 1; + pair2[0] = 0; pair2[1] = 1; + check(cmp_int_pair(pair1, pair2) > 0); + + pair1[0] = 0; pair1[1] = 1; + pair2[0] = 1; pair2[1] = 1; + check(cmp_int_pair(pair1, pair2) < 0); + + pair1[0] = 1; pair1[1] = 0; + pair2[0] = 1; pair2[1] = 1; + check(cmp_int_pair(pair1, pair2) < 0); + + pair1[0] = 1; pair1[1] = 1; + pair2[0] = 1; pair2[1] = 0; + check(cmp_int_pair(pair1, pair2) > 0); + + return 0; +} + +int main(int argc, char *argv[]) +{ + check(test_graph_create(NULL) == 0); + check(test_graph_clone(NULL) == 0); + check(test_graph_accessors(NULL) == 0); + check(test_graph_assignment_solver(NULL) == 0); + check(test_graph_bellman_ford(NULL) == 0); + check(test_graph_flow_conversion(NULL) == 0); + check(test_graph_param_checking(NULL) == 0); + check(test_graph_helper_macros(NULL) == 0); + + return 0; +} diff --git a/test/util/opal_path_nfs.c b/test/util/opal_path_nfs.c index e2405bdefe4..b5fad7ae3dd 100644 --- a/test/util/opal_path_nfs.c +++ b/test/util/opal_path_nfs.c @@ -12,7 +12,7 @@ * All rights reserved. 
* Copyright (c) 2010 Oak Ridge National Laboratory. * All rights reserved. - * Copyright (c) 2010-2014 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2010-2017 Cisco Systems, Inc. All rights reserved * Copyright (c) 2010 IBM Corporation. All rights reserved. * Copyright (c) 2014 Los Alamos National Security, LLC. All rights * reserved. @@ -132,7 +132,6 @@ void test(char* file, bool expect) void get_mounts (int * num_dirs, char ** dirs[], bool * nfs[]) { -#define MAX_DIR 256 #define SIZE 1024 char * cmd = "mount | cut -f3,5 -d' ' > opal_path_nfs.out"; int rc; @@ -150,13 +149,31 @@ void get_mounts (int * num_dirs, char ** dirs[], bool * nfs[]) **dirs = NULL; *nfs = NULL; } - dirs_tmp = (char**) calloc (MAX_DIR, sizeof(char**)); - nfs_tmp = (bool*) malloc (MAX_DIR * sizeof(bool)); + /* First, count how many mount points there are. Previous + versions of this test tried to have a (large) constant-sized + array for the mount points, but periodically it would break + because we would run this test on a system with a larger number + of mount points than the array. So just count and make sure to + have an array large enough. */ file = fopen("opal_path_nfs.out", "r"); + int count = 0; + while (NULL != fgets (buffer, SIZE, file)) { + ++count; + } + printf("Found %d mounts\n", count); + + // Add one more so we can have a NULL entry at the end + ++count; + + dirs_tmp = (char**) calloc (count, sizeof(char*)); + nfs_tmp = (bool*) calloc (count, sizeof(bool)); + i = 0; rc = 4711; - while (NULL != fgets (buffer, SIZE, file)) { + rewind(file); + // i should never be more than count, but be safe anyway. + while (i < count && NULL != fgets (buffer, SIZE, file)) { int mount_known; char fs[MAXNAMLEN];