[flang] surprising performance loss with nested type operator overloading #129779

@ivan-pi

I've attempted to create a Fortran benchmark that sums an array of numbers in different ways, to measure the overhead of operator overloading for simple value types:

abstraction_penalty.F90.txt (alternatively, on gist)

When I run the program, I see the output:

$ flang-new -O2 abstraction_penalty.F90 
$ ./a.out
[info] compiler: Homebrew flang version 19.1.4 (https://github.com/Homebrew/homebrew-core/issues)
[info] compiler options: flang-new -O2 abstraction_penalty.F90
[info] using naive sum
[info] number of iterations: 25000

        test    absolute   additions  ratio with
      number  time (sec)  per second       test0

           0      0.0532   9.400E+02       1.000
           1      0.0498   1.003E+03       0.937
           2      0.0493   1.015E+03       0.926
           3      0.0526   9.511E+02       0.988
           4      0.0595   8.410E+02       1.118
           5      0.0515   9.700E+02       0.969
           6      0.0486   1.029E+03       0.913
           7      0.0485   1.031E+03       0.912
           8      0.0490   1.020E+03       0.922
           9      0.0472   1.059E+03       0.888
          10      0.0483   1.036E+03       0.907
          11      0.0485   1.031E+03       0.912
          12      0.0479   1.044E+03       0.901
          13      0.0481   1.039E+03       0.905
          14      6.7735   7.382E+00     127.336
          15      6.7167   7.444E+00     126.267
          16      0.0467   1.071E+03       0.878
          17      0.0452   1.105E+03       0.850
          18      0.0451   1.108E+03       0.849
          19      0.0452   1.105E+03       0.850
          20      0.0476   1.050E+03       0.895
          21      0.0469   1.066E+03       0.882
          22      0.0467   1.071E+03       0.877
          23      0.0461   1.086E+03       0.866
          24      0.0454   1.101E+03       0.853
          25      0.0452   1.105E+03       0.851
          26      0.0456   1.097E+03       0.857
          27      0.0454   1.102E+03       0.853
          28      6.6540   7.514E+00     125.089
          29      6.5274   7.660E+00     122.709
------------------------------------------------
        mean      0.0928   5.386E+02        1.75

A mean ratio <= 1.0 means that using derived types carries no penalty compared to an intrinsic type.

The slow cases (14, 15, 28, 29) call the procedure test_ddd, which in turn calls dsum for type(ddd). This type is really just a double value, but defined in an obscure way:

    integer, parameter :: dp = c_double

    ! Double wrapper
    type :: dd
        real(dp) :: val
    end type

    ! Double wrapper child with TBP
    type, extends(dd) :: ddi
    contains
        procedure :: get => get_ddi_val
    end type

    ! Double wrapper wrapper
    type :: ddd
        type(dd) :: val
    end type

The sum procedure looks as follows:

    pure function ddd_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        real(dp), pointer :: t(:)
#if USE_INTRINSIC_SUM
        res%val%val = sum(a%val%val)
#else
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = res + a(i)
        end do
#endif
    end function

where + is the overloaded operator(+), defined as:

    pure function ddd_add(a,b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function

If the intrinsic sum (-DUSE_INTRINSIC_SUM) is used instead, there is no observable penalty. There are other switches too: -DUSE_INTRINSIC_REDUCE also performs well, while -DUSE_STRUCTURE_CONSTRUCTOR makes the performance even worse (300x slower than the baseline).
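For anyone who wants to poke at this without the full benchmark, the pieces above can be put together into a small standalone sketch. This is not the attached abstraction_penalty.F90: the module name ddd_sketch, the program name, and the function names naive_sum and intrinsic_sum are made up for illustration. The interface operator(+) block is also an assumption about how the overload is bound, since the excerpts above do not show it. The sketch only checks that the loop variant and the intrinsic variant agree; it does no timing.

```fortran
! Minimal sketch, not the original benchmark. All names except
! dp, dd, ddd and ddd_add are hypothetical.
module ddd_sketch
    use, intrinsic :: iso_c_binding, only: c_double
    implicit none
    private
    public :: dp, dd, ddd, operator(+), naive_sum, intrinsic_sum

    integer, parameter :: dp = c_double

    ! Double wrapper
    type :: dd
        real(dp) :: val
    end type

    ! Double wrapper wrapper
    type :: ddd
        type(dd) :: val
    end type

    ! Assumed binding of + to ddd_add (not shown in the issue text)
    interface operator(+)
        module procedure ddd_add
    end interface

contains

    pure function ddd_add(a, b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function

    ! Loop variant: one overloaded addition per element (the slow path)
    pure function naive_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = res + a(i)
        end do
    end function

    ! Intrinsic variant: sum over the innermost component
    pure function intrinsic_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        res%val%val = sum(a%val%val)
    end function

end module

program ddd_sketch_demo
    use ddd_sketch
    implicit none
    integer, parameter :: n = 100
    type(ddd) :: a(n), r1, r2
    integer :: i

    do i = 1, n
        a(i) = ddd(dd(real(i, dp)))
    end do

    r1 = naive_sum(a)
    r2 = intrinsic_sum(a)

    ! 1 + 2 + ... + 100 = 5050, exactly representable in double precision
    if (r1%val%val /= 5050.0_dp .or. r2%val%val /= 5050.0_dp) error stop "sum mismatch"
    print '(a)', 'ok'
end program
```

Wrapping the two calls in a timing loop with a much larger array should show whether the gap between the loop and the intrinsic variant reproduces outside the full benchmark, though I have not verified that this reduced form triggers the same slowdown.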
