pythonizing

Using Python for MPI Standard bindings

As of MPI-4.0, the language-independent specification ("LIS"), C, Fortran '90 ("F90"), and Fortran '08 ("F08") bindings are no longer being hard-coded in LaTeX. Instead, a procedural method is used to define what the bindings are (e.g., the procedure name, the parameter names/types/directions/descriptions, etc.), and then the C, F90, and F08 bindings are rendered in LaTeX automatically.

Simple Example

This is best shown through an example. Here is a snippet of the point-to-point .tex file, with a little surrounding LaTeX for context:

The syntax of the blocking send operation is given below.

\cdeclmainindex{MPI\_Comm}
\begin{mpi-binding}
    functionname('MPI_Send')

    parameter('buf', 'BUFFER', desc='initial address of send buffer', constant=True)
    parameter('count', 'POLYXFER_NUM_ELEM', desc='number of elements in send buffer')
    parameter('datatype', 'DATATYPE', desc='datatype of each send buffer element')
    parameter('dest', 'RANK', desc='rank of destination')
    parameter('tag', 'TAG', desc='message tag')
    parameter('comm', 'COMMUNICATOR')
\end{mpi-binding}

The blocking semantics of this call are described in Section~\ref{sec:pt2pt-modes}.

Notice the new {mpi-binding} section. This section wholly replaces the hard-coded LIS/C/F90/F08 LaTeX bindings.

This section is actually Python code (i.e., when you invoke make, a Python interpreter runs the contents of each {mpi-binding} section to render the bindings in LaTeX). The intent is that you call Python functions to define an MPI routine, such as:

functionname(NAME): define the name of this function. This function is straightforward.
parameter(NAME, TYPE, ...): define a parameter and attributes about this parameter. Several examples of calling parameter(...) are shown above; the parameters that this function accepts will be described in detail below.

With just these two Python functions, the LaTeX for the LIS, C, F90, and F08 bindings can be generated for the vast majority of MPI procedures. The rendered output is equivalent to the bindings used in the MPI-3.1 version of the standard.
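To make the mechanism above concrete, here is a minimal sketch of how a block like the MPI_Send example could be executed and rendered. It assumes the block text is dedented and passed to exec() along with recording versions of functionname() and parameter(). The Binding class, run_block(), render_c(), C_TYPES, and POINTER_KINDS are hypothetical names invented for this illustration; the real logic in binding-tool/binding_tool.py is far more complete.

```python
import textwrap

# Hypothetical kind -> C base type table covering only the kinds used by MPI_Send
# (MPI-3.1-style rendering, so POLYXFER_NUM_ELEM becomes a plain int).
C_TYPES = {
    'BUFFER': 'void',
    'POLYXFER_NUM_ELEM': 'int',
    'DATATYPE': 'MPI_Datatype',
    'RANK': 'int',
    'TAG': 'int',
    'COMMUNICATOR': 'MPI_Comm',
}
POINTER_KINDS = {'BUFFER'}  # choice buffers are always passed as pointers in C


class Binding:
    """Records the functionname()/parameter() calls made by one block."""

    def __init__(self):
        self.name = None
        self.params = []

    def functionname(self, name):
        self.name = name

    def parameter(self, name, kind, desc='', constant=False, **other_options):
        # Other options (direction, length, ...) are accepted but ignored here.
        self.params.append({'name': name, 'kind': kind, 'desc': desc,
                            'constant': constant})


def run_block(text):
    """Execute one block's Python text against a fresh Binding recorder."""
    binding = Binding()
    namespace = {'functionname': binding.functionname,
                 'parameter': binding.parameter}
    exec(textwrap.dedent(text), namespace)
    return binding


def render_c(binding):
    """Render the recorded binding as an MPI-3.1-style C prototype."""
    args = []
    for p in binding.params:
        ctype = ('const ' if p['constant'] else '') + C_TYPES[p['kind']]
        star = '*' if p['kind'] in POINTER_KINDS else ''
        args.append(f"{ctype} {star}{p['name']}")
    return f"int {binding.name}({', '.join(args)})"


block = """
    functionname('MPI_Send')

    parameter('buf', 'BUFFER', desc='initial address of send buffer', constant=True)
    parameter('count', 'POLYXFER_NUM_ELEM', desc='number of elements in send buffer')
    parameter('datatype', 'DATATYPE', desc='datatype of each send buffer element')
    parameter('dest', 'RANK', desc='rank of destination')
    parameter('tag', 'TAG', desc='message tag')
    parameter('comm', 'COMMUNICATOR')
"""

print(render_c(run_block(block)))
# int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
```

The output matches the MPI-3.1 C binding for MPI_Send. In a scheme like this, the LIS, F90, and F08 renderers would all consume the same recorded parameter list, which is what keeps the four bindings in agreement.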

Although the above definition of MPI_Send is a relatively straightforward example, it is actually a good representation of how the vast majority of MPI functions are now written.

Of course, there are more complicated cases that require a few more Python functions and several more parameters to the parameter() function; those are described below (an easy example to cite is MPI_WTICK, which has to return a double precision value, not an integer). But for the most part, the Pythonized version of the MPI routines is as simple as shown above.

Benefits

Using this scheme provides the following benefits:

Only define an MPI procedure once (vs. effectively defining it four times in LaTeX: LIS, C, F90, F08). This means significantly less typing and fewer chances for error for chapter authors (yay!).
Ensure that the LIS, C, F90, and F08 bindings agree in function name, parameter names and types, etc.
Ensure more consistent style of language bindings throughout the entire document.
Allow the possibility of easily making global changes to how bindings are rendered throughout the entire document.
Nearly all Fortran ierror parameter handling is automatic.
Enable the generation and publication of:
- Large portions of mpi.h, mpif.h, the mpi module, and the mpi_f08 module (i.e., all the procedures) that can be used as reference.
- Machine-readable files containing the contents of all the {mpi-binding} blocks (i.e., all procedures, their parameters, etc.).
Decrease the time needed for humans to verify bindings in the text.
Incorporate continuous integration-style checking of the bindings. For example:
- Note any changes to bindings in pull requests for human review before merging.
- Compare any two versions of the .tex source to deterministically and repeatably show the differences between them.

Instructions for Chapter Authors

So how do you go about writing / editing / maintaining MPI bindings in this Python style?

Delete the old LaTeX bindings

First thing to note: we are replacing all the LaTeX bindings with Pythonized bindings. Meaning: delete the old LaTeX bindings when you create new Pythonized bindings (don't just comment them out).

Leaving the old LaTeX bindings in the text just clutters up the document and makes it harder to maintain over time.
The old LaTeX bindings are available in git history if we ever need them.

Let me be clear: the goal is to delete the old LaTeX bindings. Do not just comment them out.

General notes

Let's start with a few general notes about the {mpi-binding} sections:

When you run make (either for the full document or for an individual chapter), this section will automatically be replaced by generated LaTeX for you.
The generated LaTeX will include appropriate index references, etc. (just like the old hard-coded LaTeX bindings).
{mpi-binding} blocks are only for MPI bindings. They are not for constants, typedefs, examples, or any other type of code blocks. All of those must still be hard-coded in LaTeX with the appropriate macros.
Everything between \begin{mpi-binding} and \end{mpi-binding} is Python.
1. The contents of this section are actually given to a Python interpreter to execute. As a direct consequence, you must obey Python syntax inside {mpi-binding} sections! This includes whitespace, line breaks, etc. For example:
  - Blank lines are fine.
  - You must consistently whitespace-indent all your code lines to the same level.
  - You may actually use any valid Python code in this block (but this probably isn't too useful).
    - No Python state is shared between different {mpi-binding} sections; each {mpi-binding} section is interpreted in its own, unique Python interpreter.
    - You may only define one MPI routine per {mpi-binding} block.
  - Comments can begin with # (you can even use """-style Python "comments", if desired). Just like in LaTeX, there are sometimes complicated situations where leaving comments for future authors is helpful.
  - Do not use LaTeX escaping in {mpi-binding} sections; use Python escaping. In particular, do not escape underscores (_); the Python code will escape all of those for you when rendering the final LaTeX.
2. If you do not use correct Python syntax, you will get a Python syntax error during make. Unfortunately, there is no good debugging output to indicate which {mpi-binding} block caused the syntax error; you'll have to rely on context from the Python error output to find it.
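The rules above follow from ordinary Python behavior. The sketch below uses a hypothetical run_blocks() helper (not the actual binding tool) that dedents each block, compiles it under a per-block filename, and executes it in a fresh namespace. It illustrates why inconsistent indentation still fails after dedenting, why a name defined in one block is invisible in the next, and how a driver could report the index of the failing block.

```python
import textwrap


def run_blocks(blocks):
    """Run each block in its own namespace; report which block (if any) failed."""
    results = []
    for index, text in enumerate(blocks):
        namespace = {}  # fresh for every block, so no state carries over
        try:
            code = compile(textwrap.dedent(text), f'<mpi-binding block {index}>', 'exec')
            exec(code, namespace)
            results.append((index, 'ok'))
        except Exception as err:  # SyntaxError (incl. IndentationError) or runtime errors
            results.append((index, type(err).__name__))
    return results


blocks = [
    "    x = 1\n    y = x + 1\n",  # consistently indented: runs fine
    "    z = 2\n      w = 3\n",    # inconsistent indentation
    "    y = x\n",                 # x from the first block is not visible here
]

print(run_blocks(blocks))
# [(0, 'ok'), (1, 'IndentationError'), (2, 'NameError')]
```

textwrap.dedent() only strips the whitespace common to every line, so it rescues a uniformly indented block but not a mixed one.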

Available Python functions

As stated above, the intent is that you invoke a few Python functions to define the MPI routine. The two main functions that you will use are functionname() and parameter(), but there are a few other functions that are necessary in some cases.

The definitive listing of these functions, their parameters, and other information you may need is in the binding-tool/binding_tool.py script in the git repository. This is only mentioned in case this wiki documentation gets stale (gasp!).

`functionname()`

This function is straightforward: pass in the mixed-case name of the MPI routine in question. E.g., MPI_Send (not mpi_send or MPI_send). Incorrect casing will not be fixed for you.

It is assumed that this function will be invoked in every single {mpi-binding} block.

`parameter()`

A single invocation of this function describes a single parameter in an MPI routine.

Order of invocation

The order in which parameters are defined via the parameter() function is maintained when the final LaTeX is rendered. Meaning:

functionname('MPI_Foo')
parameter('foo', 'COMMUNICATOR', desc='the communicator')
parameter('bar', 'DATATYPE', desc='the datatype')

will render MPI_Foo(foo, bar), while:

functionname('MPI_Foo')
parameter('bar', 'DATATYPE', desc='the datatype')
parameter('foo', 'COMMUNICATOR', desc='the communicator')

will render MPI_Foo(bar, foo).

Parameters

The parameter() function can take many parameters. The first two are positional and mandatory. The remaining ones are either optional or only required in certain cases.

It is highly recommended that you first try to write your bindings with just the name, kind, direction, and desc parameters to parameter(), and use the documentation in this section as a reference for when those four parameters are not sufficient.

`name`

The string name of the parameter. This parameter is always the first parameter, and is required.

`kind`

A string representing the type/kind of the parameter. This parameter is always the second parameter, and is required. The allowable kinds are:

BUFFER: a choice buffer
C_BUFFER: a C choice buffer (e.g., for MPI_ALLOC_MEM, which specifically takes a C buffer argument)
EXTRA_STATE: extra state (e.g., for MPI attribute functions)
FUNCTION: a function pointer. When this type is used, the func_type parameter must be specified to indicate the type of the function pointer.
STRING: a string
STRING_ARRAY: an array of strings (e.g., for MPI_COMM_SPAWN)
STRING_2DARRAY: an array of arrays of strings (e.g., for MPI_COMM_SPAWN_MULTIPLE)
ARRAY_LENGTH: the integer length of an array
- JMS: MAY NEED TO REVISIT THIS?
ATTRIBUTE_VAL_10: the type of MPI attribute values in MPI-1.0. This type only exists because of some deprecated functions that are still listed in the standard. It should not be used for any new bindings.
ATTRIBUTE_VAL: the type of MPI attribute values starting with MPI-2.0.
BLOCKLENGTH: integer length of blocks
COLOR: color for algebraic operations (e.g., for MPI_COMM_SPLIT)
ENUM: an arbitrary enum-like integer
FILE_DESCRIPTOR: an integer file descriptor (e.g., for MPI_COMM_JOIN)
KEY: an integer key (e.g., for MPI_COMM_SPLIT)
KEYVAL: integer keyvals for MPI attribute functions
INDEX: integer index into an array
LOGICAL: Boolean true/false value
NUM_DIMS: integer number of dimensions
RANK: integer rank in a communicator or group
COMM_SIZE: the integer number of processes in a communicator or group
STRING_LENGTH: the integer length of a string
STRIDE_BYTES: an integer stride expressed as a number of bytes
STRIDE_ELEM: an integer stride expressed as a number of elements
TAG: an integer tag
VERSION: an integer version
DEFERRED_INT: an integer that we may need to revisit to figure out if it needs to be embiggened for "big count" purposes in MPI-4.0
- JMS the intent is that this type will disappear before MPI-4.0 is published
ALLOC_MEM_NUM_BYTES: this should probably be DEFERRED_INT. See MPI_ALLOC_MEM.
PACK_EXTERNAL_SIZE: this should probably be DEFERRED_INT. See the external pack routines.
DISPLACEMENT_BIG: MPI-3.1 "big" parameters in the _X functions.
XFER_NUM_ELEM_BIG: MPI-3.1 "big" parameters in the _X functions.
NUM_BYTES_BIG: MPI-3.1 "big" parameters in the _X functions.
ERROR_CODE: an integer MPI error code
ERROR_CLASS: an integer MPI error class
ORDER: an integer MPI enum-like value
THREAD_LEVEL: an integer MPI enum-like value
COMBINER: an integer MPI enum-like value
POLYDISPLACEMENT: a displacement that is currently set up to render as plain integer in MPI-3.1 style and "big" integer in MPI-4.0 style.
- JMS: MAY NEED TO REVISIT THIS?
POLYDTYPE_NUM_ELEM: a datatype number of elements that is currently set up to render as plain integer in MPI-3.1 style and "big" integer in MPI-4.0 style.
- JMS: MAY NEED TO REVISIT THIS?
POLYNUM_BYTES: a datatype number of bytes that is currently set up to render as plain integer in MPI-3.1 style and "big" integer in MPI-4.0 style.
- JMS: MAY NEED TO REVISIT THIS?
POLYXFER_NUM_ELEM: a number of elements that is currently set up to render as plain integer in MPI-3.1 style and "big" integer in MPI-4.0 style.
- JMS: MAY NEED TO REVISIT THIS?
COMMUNICATOR: an MPI communicator handle
DATATYPE: an MPI datatype handle
ERRHANDLER: an MPI errhandler handle
FILE: an MPI file handle
GROUP: an MPI group handle
INFO: an MPI info handle
MESSAGE: an MPI message handle
REQUEST: an MPI request handle
STATUS: an MPI status
WINDOW: an MPI window handle
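Several of the POLY* kinds above render differently depending on the target style. A minimal sketch of that idea, using a hypothetical c_type() helper and only a handful of kinds; rendering the MPI-4.0-style "big" integer as MPI_Count is an assumption of this sketch, not a statement of what the tool emits:

```python
# Hypothetical subset of a kind table; the real one lives in binding_tool.py.
FIXED_C_TYPES = {
    'RANK': 'int',
    'TAG': 'int',
    'DATATYPE': 'MPI_Datatype',
    'COMMUNICATOR': 'MPI_Comm',
}
# POLY* kinds whose C type depends on the rendering style.
POLY_KINDS = {'POLYXFER_NUM_ELEM', 'POLYNUM_BYTES'}


def c_type(kind, style='3.1'):
    """Return the C type for a kind in MPI-3.1 or MPI-4.0 rendering style."""
    if kind in FIXED_C_TYPES:
        return FIXED_C_TYPES[kind]
    if kind in POLY_KINDS:
        # Plain int in MPI-3.1 style; a "big" integer (assumed MPI_Count) in MPI-4.0 style.
        return 'int' if style == '3.1' else 'MPI_Count'
    raise ValueError(f'kind not covered by this sketch: {kind}')


print(c_type('POLYXFER_NUM_ELEM'))         # int
print(c_type('POLYXFER_NUM_ELEM', '4.0'))  # MPI_Count
print(c_type('RANK', '4.0'))               # int
```

Keeping the style switch inside the kind lookup is what lets a single {mpi-binding} block produce both renderings without any change by the chapter author.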

`desc`

This string parameter is passed by name (i.e., desc="blah").

It is technically not required, but it is strongly recommended.

The string value is rendered as part of the LIS.

`direction`

Indicate the direction intent of this parameter. This parameter affects the rendering in most language bindings:

LIS: determines the IN, INOUT, or OUT label
C: generally determines whether the parameter is passed by value or reference
F90: does not affect the rendering
F08: generally determines the INTENT clause

The allowable values of the direction parameter are:

in: since the majority of MPI parameters are intent IN, in is the default value for this parameter. Parameters marked as in will be rendered as being passed by value.
out: the OUT intent. Parameters marked as out will be rendered as being passed by reference.
inout: the INOUT intent. Parameters marked as inout will be rendered as being passed by reference -- except for one special case. See below.

There is a special case: MPI's definition of INOUT has a peculiar meaning with regard to MPI handles. Specifically, if an MPI handle parameter is marked as INOUT, it may be passed by value or by reference, depending on the situation.

By default, inout-marked parameters are passed by reference. But for cases where the MPI binding actually requires the MPI handle to be passed by value, you can pass a special value to the direction indicating the disparity. For example:

functionname('MPI_Comm_set_info')

parameter('comm', 'COMMUNICATOR', direction='lis:inout,param:in',
          desc='communicator')
parameter('info', 'INFO', desc='info object')

For MPI_COMM_SET_INFO, the comm argument is marked INOUT in the LIS, but it is passed by value in the C/F08 bindings. Hence, we pass direction='lis:inout,param:in': the lis:inout part indicates that the LIS should render the parameter as INOUT, and the param:in part indicates that the parameter in the C/Fortran bindings should be rendered as IN.
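A minimal sketch of how the two halves of such a direction string could be split and applied to a C handle parameter. parse_direction() and c_handle_param() are hypothetical helpers; the sketch assumes the exact lis:...,param:... form shown above and only covers MPI handles, not buffers or other kinds.

```python
def parse_direction(direction='in'):
    """Split a direction into (LIS direction, C/Fortran parameter direction)."""
    if ':' not in direction:
        return direction, direction  # 'in', 'out', or 'inout' applies to both
    parts = dict(item.split(':') for item in direction.split(','))
    return parts['lis'], parts['param']


def c_handle_param(ctype, name, direction='in'):
    """Render an MPI handle parameter in C: by value for in, by reference otherwise."""
    _, param_direction = parse_direction(direction)
    star = '' if param_direction == 'in' else '*'
    return f'{ctype} {star}{name}'


print(parse_direction('lis:inout,param:in'))                     # ('inout', 'in')
print(c_handle_param('MPI_Comm', 'comm', 'lis:inout,param:in'))  # MPI_Comm comm
print(c_handle_param('MPI_Comm', 'comm', 'inout'))               # MPI_Comm *comm
```

The second call matches the comm argument of the MPI-3.1 C binding int MPI_Comm_set_info(MPI_Comm comm, MPI_Info info); a plain direction='inout' would instead render comm as a pointer.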

Thanks, MPI! 😉

`length`

For single-dimension array parameters, this value is set to a string representing the length of the array (when that length is known). For example:

functionname('MPI_Waitall')
parameter('count', 'ARRAY_LENGTH', desc='lists length')
parameter('array_of_requests', 'REQUEST',  desc='array of requests',
          direction='inout', length='count')
parameter('array_of_statuses', 'STATUS',
          desc='array of status objects',
          direction='out', length='*')

Note the array_of_requests parameter lists count as its length, because that array is defined to be the length specified by the count parameter.

Note, too, the length for the array_of_statuses parameter is *. This is because the length of the array is not known (specifically, because it could be MPI_STATUS_IGNORE), and therefore must be rendered as (*) in Fortran (array lengths are not rendered in C; arrays are rendered as [] in C).

There are a small number of two-dimensional arrays in MPI. 2D string arrays are a special beast: they have their own type (STRING_2DARRAY) and do not use the length parameter. But functions like MPI_GROUP_RANGE_INCL require a fixed 2D array of integers. Consider:

functionname('MPI_Group_range_incl')

parameter('group', 'GROUP', desc='group')
parameter('n', 'DEFERRED_INT',
          desc='number of triplets in array \mpiarg{ranges}')
parameter('ranges', 'RANK', length=['n', '3'],
          desc='a one-dimensional array of integer triplets, of the form (first rank, last rank, stride) indicating ranks in \mpiarg{group} of processes to be included in \mpiarg{newgroup}')
parameter('newgroup', 'GROUP', direction='out',
          desc='new group derived from above, in the order defined by \mpiarg{ranges}')

Notice that, for a multi-dimensional array, length is a list containing the length of each dimension.
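One plausible way to turn length into array suffixes is sketched below. The helper names are hypothetical, and the sketch assumes that a string means one dimension and a list means several. It reverses the dimensions for Fortran so that ['n', '3'] reproduces the MPI-3.1 bindings for MPI_GROUP_RANGE_INCL (int ranges[][3] in C, ranges(3,n) in Fortran); whether the binding tool does it this way is an assumption.

```python
def _dims(length):
    """Normalize a length value: a string is one dimension, a list is several."""
    return [length] if isinstance(length, str) else list(length)


def c_array_suffix(length):
    """C: the first dimension is unsized ([]); remaining dimensions are explicit."""
    dims = _dims(length)
    return '[]' + ''.join(f'[{d}]' for d in dims[1:])


def fortran_array_suffix(length):
    """Fortran: dimensions in parentheses, reversed because Fortran is column-major."""
    return '(' + ','.join(reversed(_dims(length))) + ')'


print(c_array_suffix('count'), fortran_array_suffix('count'))        # [] (count)
print(c_array_suffix('*'), fortran_array_suffix('*'))                # [] (*)
print(c_array_suffix(['n', '3']), fortran_array_suffix(['n', '3']))  # [][3] (3,n)
```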

`func_type`

If the kind parameter is FUNCTION, this parameter must be specified.

The string value is the type of the function parameter. For example:

functionname('MPI_Comm_create_keyval')

parameter('comm_copy_attr_fn', 'FUNCTION',
          func_type='MPI_Comm_copy_attr_function',
          desc='copy callback function for \mpiarg{comm_keyval}')
parameter('comm_delete_attr_fn', 'FUNCTION',
          func_type='MPI_Comm_delete_attr_function',
          desc='delete callback function for \mpiarg{comm_keyval}')
# ...etc.

Specifying the function pointer type allows the C/Fortran bindings to render the correct type.
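For illustration, here is a hypothetical function_param() helper that enforces the func_type requirement and renders MPI-3.1-style C and F08 declarations for a FUNCTION parameter. It is a sketch, not the binding tool's code:

```python
def function_param(name, kind, func_type=None):
    """Return (C, F08) declarations for a FUNCTION-kind parameter."""
    if kind != 'FUNCTION':
        raise ValueError('this sketch only handles FUNCTION parameters')
    if func_type is None:
        raise ValueError(f"parameter '{name}': kind FUNCTION requires func_type")
    c_decl = f'{func_type} *{name}'                  # C: pointer to the callback type
    f08_decl = f'PROCEDURE({func_type}) :: {name}'   # F08: procedure with that interface
    return c_decl, f08_decl


c_decl, f08_decl = function_param('comm_copy_attr_fn', 'FUNCTION',
                                  func_type='MPI_Comm_copy_attr_function')
print(c_decl)    # MPI_Comm_copy_attr_function *comm_copy_attr_fn
print(f08_decl)  # PROCEDURE(MPI_Comm_copy_attr_function) :: comm_copy_attr_fn
```

Failing early when func_type is missing mirrors the rule above that FUNCTION parameters must always specify it.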

`pointer`

This is a Boolean value (that defaults to False) that allows you to override the default rendering so that the parameter is passed by value.

`constant`

This is a Boolean value (that defaults to False) that indicates whether the parameter is constant or not. In C, this translates to prefixing the type with const.

`root_only`

This is a Boolean value (that defaults to False). If set to True, the string ", significant only at root" is added to the description. It is meant as a shortcut / syntactic sugar for the many rooted MPI routines.
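A tiny sketch of the root_only shortcut, using a hypothetical lis_entry() helper that returns the pieces of one LIS line (it assumes a plain in/out/inout direction, not the lis:...,param:... form):

```python
def lis_entry(name, desc, direction='in', root_only=False):
    """Return (label, name, description) for one LIS parameter line."""
    if root_only:
        desc += ', significant only at root'  # the root_only shortcut
    return direction.upper(), name, desc


print(lis_entry('recvbuf', 'address of receive buffer', direction='out', root_only=True))
# ('OUT', 'recvbuf', 'address of receive buffer, significant only at root')
```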

`mpi_owned`

This is a Boolean value (that defaults to False). When set to True, it indicates that MPI retains ownership of this value after the routine returns. This causes the ASYNCHRONOUS keyword to be rendered in the F08 bindings for this parameter.

`suppress`

This parameter is used to suppress the rendering of certain properties. They are generally very special cases, and are only needed infrequently. This parameter can take the following values:

f08_intent: do not emit the F08 INTENT clause. The INTENT clause is rendered for most F08 parameters, and the few cases where it should be omitted are usually detected automatically by the rendering engine. However, there are a few cases where the INTENT clause is deliberately omitted from the F08 bindings for reasons that are obscure and/or do not fit a general rule that the rendering engine knows. In those cases, pass suppress='f08_intent' so that the F08 bindings do not emit an INTENT clause for this parameter.

functionname('MPI_Buffer_attach')
parameter('buffer', 'BUFFER', desc='initial buffer address',
          mpi_owned=True, suppress='f08_intent')
parameter('size', 'POLYNUM_BYTES', desc='buffer size, in bytes')
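A minimal sketch of how mpi_owned and suppress='f08_intent' could combine when rendering an F08 declaration. f08_decl() is a hypothetical helper and the F08 type strings are passed in by hand; the output reproduces the MPI-3.1 F08 declarations for the arguments of MPI_Buffer_attach.

```python
def f08_decl(ftype, name, direction='in', mpi_owned=False, suppress=()):
    """Render one F08 declaration line from a few parameter() options."""
    if isinstance(suppress, str):
        suppress = {suppress}  # allow suppress='f08_intent' as well as a collection
    attrs = [ftype]
    if mpi_owned:
        attrs.append('ASYNCHRONOUS')  # MPI keeps using this memory after the call returns
    if 'f08_intent' not in suppress:
        attrs.append(f'INTENT({direction.upper()})')
    return ', '.join(attrs) + f' :: {name}'


print(f08_decl('TYPE(*), DIMENSION(..)', 'buffer', mpi_owned=True, suppress='f08_intent'))
# TYPE(*), DIMENSION(..), ASYNCHRONOUS :: buffer
print(f08_decl('INTEGER', 'size'))
# INTEGER, INTENT(IN) :: size
```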

`optional`

This is a Boolean value (that defaults to False). It is used to indicate optional parameters. The only notable parameter that meets this description is the F08 ierror, which is automatically included in all parameter lists unless the no_ierror() function is invoked.

`parameters_c_only()`

This function takes the same arguments as parameter(), but these parameters are only used in the LIS+C bindings (not the Fortran bindings). MPI_INIT and MPI_INIT_THREAD are good examples where this is needed.

JMS NOT SURE THIS HAS BEEN TESTED / NOT EXACTLY SURE WHAT THE PARAMS ARE TO THIS FUNCTION

`returntype()`

Nearly all MPI routines return an int in C and render a SUBROUTINE in Fortran (i.e., no return value). However, there are a small number of routines that return something else. The returntype() function accepts the following values:

INT: if not invoked, INT is assumed. Return an int in C and render a SUBROUTINE in Fortran.
DOUBLE: return a double precision value.
ADDRESS: return an address type.

functionname('MPI_Wtime')
returntype('DOUBLE')
no_ierror()

`no_ierror()`

A small number of MPI routines do not have an ierror parameter to the Fortran bindings. Invoking this function suppresses the ierror parameter in Fortran bindings. For example:

functionname('MPI_Wtick')
returntype('DOUBLE')
no_ierror()
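To tie returntype(), no_ierror(), and the optional F08 ierror together, here is a sketch with hypothetical c_prototype() and f08_interface() helpers. It only covers the INT and DOUBLE return types (ADDRESS is omitted) and assumes that every non-INT routine renders as a Fortran function rather than a SUBROUTINE:

```python
C_RETURN = {'INT': 'int', 'DOUBLE': 'double'}
F08_RETURN = {'DOUBLE': 'DOUBLE PRECISION'}  # INT renders as a SUBROUTINE instead


def c_prototype(name, args, returntype='INT'):
    """Render a C prototype; an empty argument list becomes (void)."""
    return f"{C_RETURN[returntype]} {name}({', '.join(args) or 'void'})"


def f08_interface(name, args, returntype='INT', ierror=True):
    """Return the F08 header line plus the ierror declaration, if any."""
    if ierror:
        args = args + ['ierror']  # appended unless no_ierror() was invoked
    header = f"{name}({', '.join(args)})"
    if returntype != 'INT':
        header = f'{F08_RETURN[returntype]} {header}'
    lines = [header]
    if ierror:
        lines.append('INTEGER, OPTIONAL, INTENT(OUT) :: ierror')
    return lines


print(c_prototype('MPI_Wtime', [], returntype='DOUBLE'))
# double MPI_Wtime(void)
print(f08_interface('MPI_Wtime', [], returntype='DOUBLE', ierror=False))
# ['DOUBLE PRECISION MPI_Wtime()']
print(f08_interface('MPI_Comm_set_info', ['comm', 'info']))
# ['MPI_Comm_set_info(comm, info, ierror)', 'INTEGER, OPTIONAL, INTENT(OUT) :: ierror']
```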

`no_x_variant()`

Only relevant for MPI-4.0-style rendering, which isn't done yet.

JMS TBD

`no_f08_binding()`

When this function is invoked, the F08 binding is suppressed.

This function is really only necessary for several MPI-1.0 functions that are still listed in the deprecated chapter but have no F08 bindings. It should probably not be used for new MPI routines.
