
Potential FPE bug (divide-by-zero) in pstrord() and pdtrord() #146

@cparrott73


We have a new Fortran compiler under development, and we have included building and testing ScaLAPACK in our nightly regression testing.

Two of the tests, xshseqr and xdhseqr, fail with the development compiler due to a divide by zero FPE. The failure happens at line 1087 in both the pstrord.f and pdtrord.f files:

               IF( FLOPS.NE.0 .AND.
     $              ( FLOPS*100 ) / ( 2*NWIN*NWIN ) .GE. MMULT ) THEN

In the case where NWIN is 0, the divisor in this expression also becomes 0, and hence we get the divide by zero FPE.

Many compilers short-circuit the second operand of the .AND. when FLOPS is 0, but the Fortran standard does not require this. According to the Fortran 2003 Handbook, p. 222:

The rules for equivalent evaluation schemes allow the compiler to elide evaluating any part of an expression that has no effect on the resulting value of the expression. Consider the expression X ∗ F(Y), where F is a function and X has the value 0. The result will be the same regardless of the value of F(Y); therefore, it need not be evaluated. This shortened evaluation is allowed in all cases, even if F(Y) has side effects. In this case every data object that F could affect is undefined after the expression is evaluated—that is, it does not have a predictable value.
This normally applies to functions in logical expressions where expression evaluation is often “short-circuited”. Some processors evaluate every term in a logical expression, others use run-time tests and skip further evaluation once the result is clear.

Consider

PRESENT( A ) .AND. A > 0 .AND. LOG( A ) < 3.5

where A is an optional argument. If A is not present, the processor is allowed to evaluate the A > 0 term, and the program is invalid. Similarly, if A is present and has a negative value, the processor is allowed to evaluate LOG(A) and the program is again invalid.

The conclusion to be drawn from all of this is that the result of a program using a function with side effects is not predictable and hence not portable. To be completely safe and portable, a subroutine should be used in place of a function when a procedure is needed with a side effect. However, in practice, the side effect will occur as expected in most cases.

Hence, a compiler that chooses to evaluate every part of the expression is still standard-conforming.

We propose that this code should be fixed to guard against compilers choosing to evaluate all parts of the expression, as follows:

               IF( FLOPS.NE.0 .AND.
     $              ( FLOPS*100 ) / MAX(1, 2*NWIN*NWIN ) .GE. MMULT )
     $              THEN

With this change, the divisor is guaranteed to be at least 1, so no divide by zero can occur, whether or not the compiler short-circuits.
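To illustrate the guard outside of ScaLAPACK, here is a minimal standalone sketch in free-form Fortran. The program name, the value chosen for MMULT, and the INTEGER declarations are assumptions made for illustration only and are not taken from pstrord.f or pdtrord.f. It also shows a nested IF as one possible alternative that does not rely on how .AND. is evaluated.

```fortran
! Hypothetical sketch: names mirror the snippet above, but the values
! and types are assumptions, not the ones used in pstrord.f/pdtrord.f.
program guard_sketch
  implicit none
  integer :: flops, nwin, mmult

  flops = 0
  nwin  = 0
  mmult = 10   ! assumed threshold, for illustration only

  ! Guarded form from this issue: the divisor is at least 1, so the
  ! expression is safe even if both operands of .AND. are evaluated.
  if (flops /= 0 .and. (flops*100) / max(1, 2*nwin*nwin) >= mmult) then
    print *, 'guarded form: branch taken'
  else
    print *, 'guarded form: branch skipped'
  end if

  ! Alternative: nested IFs make the evaluation order explicit, so the
  ! division is reached only when FLOPS and NWIN are both nonzero.
  ! Note that this skips the branch when NWIN is 0, whereas the guarded
  ! form above divides by 1 in that case.
  if (flops /= 0 .and. nwin /= 0) then
    if ((flops*100) / (2*nwin*nwin) >= mmult) then
      print *, 'nested form: branch taken'
    end if
  end if
end program guard_sketch
```

With FLOPS and NWIN both 0, as above, neither form takes the branch and neither divides by zero. The two forms differ only when NWIN is 0 and FLOPS is nonzero, so which one is appropriate depends on the behavior intended for that case.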

This change eliminates the FPE and allows both tests to pass with our development compiler.

Thanks in advance.
