Skip to content

Conversation

@ergawy
Copy link
Member

@ergawy ergawy commented Feb 17, 2025

This extends support for generic loop rewriting by:

  1. Preventing nesting multiple worksharing loops inside each other. This is checked by walking the teams loop region searching for any loop directive whose bind modifier is parallel.
  2. Preventing convert to worksharing loop if calls to unknow functions are found in the loop directive's body.

We walk the teams loop body to identify either of the above 2 conditions, if either of them is found to be true, we map the loop directive to distribute.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Feb 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 17, 2025

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

This extends support for generic loop rewriting by:

  1. Preventing nesting multiple worksharing loops inside each other. This is checked by walking the teams loop region searching for any loop directive whose bind modifier is parallel.
  2. Preventing convert to worksharing loop if calls to unknow functions are found in the loop directive's body.

We walk the teams loop body to identify either of the above 2 conditions, if either of them is found to be true, we map the loop directive to distribute.


Full diff: https://github.com/llvm/llvm-project/pull/127489.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp (+49-1)
  • (modified) flang/test/Lower/OpenMP/loop-directive.f90 (+59)
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index 3512a537d38c3..5df2408bd7806 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -56,7 +56,10 @@ class GenericLoopConversionPattern
           "not yet implemented: Combined `parallel loop` directive");
       break;
     case GenericLoopCombinedInfo::TeamsLoop:
-      rewriteToDistributeParallelDo(loopOp, rewriter);
+      if (teamsLoopCanBeParallelFor(loopOp))
+        rewriteToDistributeParallelDo(loopOp, rewriter);
+      else
+        rewriteToDistrbute(loopOp, rewriter);
       break;
     }
 
@@ -117,6 +120,51 @@ class GenericLoopConversionPattern
     return result;
   }
 
+  static bool teamsLoopCanBeParallelFor(mlir::omp::LoopOp loopOp) {
+    bool canBeParallelFor = true;
+    loopOp.walk([&](mlir::omp::LoopOp nestedLoopOp) {
+      if (nestedLoopOp == loopOp)
+        mlir::WalkResult::advance();
+
+      GenericLoopCombinedInfo combinedInfo =
+          findGenericLoopCombineInfo(nestedLoopOp);
+
+      // Worksharing loops cannot be nested inside each other. Therefore, if the
+      // current `loop` directive nests another `loop` whose `bind` modifier is
+      // `parallel`, this `loop` directive cannot be mapped to `distribute
+      // parallel for` but rather only to `distribute`.
+      if (combinedInfo == GenericLoopCombinedInfo::Standalone &&
+          nestedLoopOp.getBindKind() &&
+          *nestedLoopOp.getBindKind() == mlir::omp::ClauseBindKind::Parallel)
+        canBeParallelFor = false;
+
+      // TODO check for combined `parallel loop` when we support it.
+
+      return canBeParallelFor ? mlir::WalkResult::advance()
+                              : mlir::WalkResult::interrupt();
+    });
+
+    loopOp.walk([&](mlir::CallOpInterface callOp) {
+      // Calls to non-OpenMP API runtime functions inhibits transformation to
+      // `teams distribute parallel do` since the called functions might have
+      // nested parallelism themselves.
+      bool isOpenMPAPI = false;
+      mlir::CallInterfaceCallable callable = callOp.getCallableForCallee();
+
+      if (auto callableSymRef = mlir::dyn_cast<mlir::SymbolRefAttr>(callable))
+        isOpenMPAPI =
+            callableSymRef.getRootReference().strref().find("omp_") == 0;
+
+      if (!isOpenMPAPI)
+        canBeParallelFor = false;
+
+      return canBeParallelFor ? mlir::WalkResult::advance()
+                              : mlir::WalkResult::interrupt();
+    });
+
+    return canBeParallelFor;
+  }
+
   void rewriteStandaloneLoop(mlir::omp::LoopOp loopOp,
                              mlir::ConversionPatternRewriter &rewriter) const {
     using namespace mlir::omp;
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 785f732e1b4f5..4cc136fac198a 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -179,3 +179,62 @@ subroutine test_standalone_bind_parallel
     c(i) = a(i) * b(i)
   end do
 end subroutine
+
+! CHECK-LABEL: func.func @_QPteams_loop_cannot_be_parallel_for
+subroutine teams_loop_cannot_be_parallel_for
+  implicit none
+  integer :: iter, iter2, val(20)
+  val = 0
+  ! CHECK: omp.teams {
+
+  ! Verify the outer `loop` directive was mapped to only `distribute`.
+  ! CHECK-NOT: omp.parallel {{.*}}
+  ! CHECK:     omp.distribute {{.*}} {
+  ! CHECK-NOT:   omp.wsloop
+  ! CHECK:       omp.loop_nest {{.*}} {
+
+  ! Verify the inner `loop` directive was mapped to a worksharing loop.
+  ! CHECK:         omp.wsloop {{.*}} {
+  ! CHECK:           omp.loop_nest {{.*}} {
+  ! CHECK:           }
+  ! CHECK:         }
+
+  ! CHECK:       }
+  ! CHECK:     }
+
+  ! CHECK: }
+  !$omp target teams loop map(tofrom:val)
+  DO iter = 1, 5
+    !$omp loop bind(parallel)
+    DO iter2 = 1, 5
+      val(iter+iter2) = iter+iter2
+    END DO
+  END DO
+end subroutine
+
+subroutine foo()
+end subroutine
+
+! CHECK-LABEL: func.func @_QPteams_loop_cannot_be_parallel_for_2
+subroutine teams_loop_cannot_be_parallel_for_2
+  implicit none
+  integer :: iter, iter2, val(20)
+  val = 0
+
+  ! CHECK: omp.teams {
+
+  ! Verify the `loop` directive was mapped to only `distribute`.
+  ! CHECK-NOT: omp.parallel {{.*}}
+  ! CHECK:     omp.distribute {{.*}} {
+  ! CHECK-NOT:   omp.wsloop
+  ! CHECK:       omp.loop_nest {{.*}} {
+  ! CHECK:         fir.call @_QPfoo
+  ! CHECK:       }
+  ! CHECK:     }
+
+  ! CHECK: }
+  !$omp target teams loop map(tofrom:val)
+  DO iter = 1, 5
+    call foo()
+  END DO
+end subroutine

@ergawy ergawy force-pushed the target_teams_can_be_parallel_for branch 2 times, most recently from 1011020 to f9369bd Compare February 18, 2025 05:27
ergawy added a commit that referenced this pull request Feb 18, 2025
Extends support for the `loop` directive by adding support for `parallel
loop` combined directive.

Parent PR: #127489. Only the latest commit is relevant.
@ergawy
Copy link
Member Author

ergawy commented Feb 19, 2025

Ping! Please take a look when you have time.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This LGTM, I just have some comments to hopefully improve the implementation.

@ergawy ergawy force-pushed the target_teams_can_be_parallel_for branch from f9369bd to e196418 Compare February 20, 2025 15:18
Copy link
Member Author

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Sergio. Handled your comments.

ergawy added a commit that referenced this pull request Feb 20, 2025
Extends support for the `loop` directive by adding support for `parallel
loop` combined directive.

Parent PR: #127489. Only the latest commit is relevant.
Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@ergawy ergawy force-pushed the target_teams_can_be_parallel_for branch from e196418 to 2301cf6 Compare February 21, 2025 13:41
This extends support for generic `loop` rewriting by:
1. Preventing nesting multiple worksharing loops inside each other. This
   is checked by walking the `teams loop` region searching for any
   `loop` directive whose `bind` modifier is `parallel`.
2. Preventing convert to worksharing loop if calls to unknow functions
   are found in the `loop` directive's body.

We walk the `teams loop` body to identify either of the above 2
conditions, if either of them is found to be true, we map the `loop`
directive to `distribute`.
@ergawy ergawy force-pushed the target_teams_can_be_parallel_for branch from 2301cf6 to b069501 Compare February 21, 2025 13:41
@ergawy ergawy merged commit bff6b92 into llvm:main Feb 21, 2025
11 checks passed
ergawy added a commit that referenced this pull request Feb 21, 2025
Extends support for the `loop` directive by adding support for `parallel
loop` combined directive.

Parent PR: #127489. Only the latest commit is relevant.
ergawy added a commit that referenced this pull request Feb 21, 2025
Extends support for the `loop` directive by adding support for `parallel
loop` combined directive.

Parent PR: #127489. Only the latest commit is relevant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants