[flang] [openmp] performance issue due to code generation for private variables #153374

@shivaramaarao

Consider the following program:

program parallel_do_example
  implicit none
  integer :: i, n, k
  real, dimension(4096) :: a, b, c , x

  n = 4096

  ! Initialize arrays
  do i = 1, n
    a(i) = real(i)
    b(i) = 2.0 * real(i)
    x(i) = 0.0
  end do

  !$OMP PARALLEL DO PRIVATE(x)
  do i = 1, n
    do k = 1, n
      x(k) =  x(k) + a(k) + b(k)
    enddo
    c(i) = a(i) + b(i) * x(i)
  end do
  !$OMP END PARALLEL DO

  ! Print a few results to verify
  print *, 'c(1) = ', c(1)
  print *, 'c(50) = ', c(50)
  print *, 'c(100) = ', c(100)

end program parallel_do_example

$ flang -O3 -march=znver5 -fopenmp -S mytest.f90

The generated assembly shows that memory for the private array x (16384 bytes, i.e. 4096 4-byte reals) is allocated with malloc and freed at the end of the function:

    pushq   %rbp
    pushq   %r15
    pushq   %r14
    pushq   %r13
    pushq   %r12
    pushq   %rbx
    subq    $232, %rsp
    movq    (%rdx), %r15
    movl    $16384, %edi
    movl    (%r15), %ebp
    callq   malloc@PLT
    ...
    vzeroupper
    callq   __kmpc_for_static_fini@PLT
.LBB1_30:
    movq    %rbx, %rdi
    callq   free@PLT

This causes significant performance degradation compared to classic flang and the ifx compiler. This pattern appears in the 350.md benchmark of SPEC OMP2012, where a private array of size 3 is allocated with malloc and freed in the same way.

A possible solution is to allocate such private variables on the stack instead of calling malloc and free, which should improve the benchmark's performance.
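To make the difference concrete, here is a rough C analogue of the two allocation strategies for a small private scratch array. It is only a sketch of the idea, not flang's actual lowering: the function names `region_heap` and `region_stack` and the constant `SCRATCH_LEN` are hypothetical, and the length of 3 is taken from the array size mentioned for 350.md.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 3 mirrors the size of the private array mentioned for 350.md. */
#define SCRATCH_LEN 3

/* Heap variant: models the malloc/free pair seen in the generated assembly.
 * Every entry into the region pays for one allocation and one deallocation. */
float region_heap(const float *a, const float *b, size_t iters)
{
    float *x = malloc(SCRATCH_LEN * sizeof *x);
    if (x == NULL)
        return 0.0f; /* allocation failed; a real runtime would report this */

    for (size_t k = 0; k < SCRATCH_LEN; ++k)
        x[k] = 0.0f;

    /* Accumulate into the private scratch array, as in the inner loop above. */
    for (size_t i = 0; i < iters; ++i)
        for (size_t k = 0; k < SCRATCH_LEN; ++k)
            x[k] += a[k] + b[k];

    float result = x[0] + x[1] + x[2];
    free(x);
    return result;
}

/* Stack variant: the private copy lives in the function's own frame,
 * so no allocator call is needed on entry or exit. */
float region_stack(const float *a, const float *b, size_t iters)
{
    float x[SCRATCH_LEN] = {0.0f};

    for (size_t i = 0; i < iters; ++i)
        for (size_t k = 0; k < SCRATCH_LEN; ++k)
            x[k] += a[k] + b[k];

    return x[0] + x[1] + x[2];
}
```

Both functions compute the same value; the only difference is where the scratch array lives. For an array this small, the allocator calls in the heap variant can plausibly cost more than the work done on the array itself, which is the concern raised here. A large array such as the 4096-element one in the reproducer is a different trade-off, since big stack allocations risk overflowing a thread's stack.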
