[flang][Multi-Image] Moving Multi-image lowering to PRIF into the MIF dialect #161179
| @llvm/pr-subscribers-flang-driver Author: Jean-Didier PAILLEUX (JDPailleux) Changes: Support for multi-image features has begun to be integrated into LLVM. This PR proposes a new dialect that simplifies lowering to PRIF. Features such as TEAMs and the use of coarrays will follow in later PRs. Patch is 231.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161179.diff 59 Files Affected: 
 diff --git a/flang/include/flang/Optimizer/Builder/Runtime/Coarray.h b/flang/include/flang/Optimizer/Builder/Runtime/Coarray.h
deleted file mode 100644
index 20bfb7c124af2..0000000000000
--- a/flang/include/flang/Optimizer/Builder/Runtime/Coarray.h
+++ /dev/null
@@ -1,85 +0,0 @@
-//===-- Coarray.h -- generate Coarray intrinsics runtime calls --*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef FORTRAN_OPTIMIZER_BUILDER_RUNTIME_COARRAY_H
-#define FORTRAN_OPTIMIZER_BUILDER_RUNTIME_COARRAY_H
-
-#include "flang/Lower/AbstractConverter.h"
-#include "flang/Optimizer/Support/InternalNames.h"
-#include "mlir/Dialect/Func/IR/FuncOps.h"
-
-namespace fir {
-class ExtendedValue;
-class FirOpBuilder;
-} // namespace fir
-
-namespace fir::runtime {
-
-// Get the function type for a prif subroutine with a variable number of
-// arguments
-#define PRIF_FUNCTYPE(...)                                                     \
-  mlir::FunctionType::get(builder.getContext(), /*inputs*/ {__VA_ARGS__},      \
-                          /*result*/ {})
-
-// Default prefix for subroutines of PRIF compiled with LLVM
-#define PRIFNAME_SUB(fmt)                                                      \
-  []() {                                                                       \
-    std::ostringstream oss;                                                    \
-    oss << "prif_" << fmt;                                                     \
-    return fir::NameUniquer::doProcedure({"prif"}, {}, oss.str());             \
-  }()
-
-#define PRIF_STAT_TYPE builder.getRefType(builder.getI32Type())
-#define PRIF_ERRMSG_TYPE                                                       \
-  fir::BoxType::get(fir::CharacterType::get(builder.getContext(), 1,           \
-                                            fir::CharacterType::unknownLen()))
-
-/// Generate Call to runtime prif_init
-mlir::Value genInitCoarray(fir::FirOpBuilder &builder, mlir::Location loc);
-
-/// Generate Call to runtime prif_num_images
-mlir::Value getNumImages(fir::FirOpBuilder &builder, mlir::Location loc);
-
-/// Generate Call to runtime prif_num_images_with_team or
-/// prif_num_images_with_team_number
-mlir::Value getNumImagesWithTeam(fir::FirOpBuilder &builder, mlir::Location loc,
-                                 mlir::Value team);
-
-/// Generate Call to runtime prif_this_image_no_coarray
-mlir::Value getThisImage(fir::FirOpBuilder &builder, mlir::Location loc,
-                         mlir::Value team = {});
-
-/// Generate call to runtime subroutine prif_co_broadcast
-void genCoBroadcast(fir::FirOpBuilder &builder, mlir::Location loc,
-                    mlir::Value A, mlir::Value sourceImage, mlir::Value stat,
-                    mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_co_max and prif_co_max_character
-void genCoMax(fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value A,
-              mlir::Value resultImage, mlir::Value stat, mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_co_min or prif_co_min_character
-void genCoMin(fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value A,
-              mlir::Value resultImage, mlir::Value stat, mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_co_sum
-void genCoSum(fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value A,
-              mlir::Value resultImage, mlir::Value stat, mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_sync_all
-void genSyncAllStatement(fir::FirOpBuilder &builder, mlir::Location loc,
-                         mlir::Value stat, mlir::Value errmsg);
-/// Generate call to runtime subroutine prif_sync_memory
-void genSyncMemoryStatement(fir::FirOpBuilder &builder, mlir::Location loc,
-                            mlir::Value stat, mlir::Value errmsg);
-/// Generate call to runtime subroutine prif_sync_images
-void genSyncImagesStatement(fir::FirOpBuilder &builder, mlir::Location loc,
-                            mlir::Value imageSet, mlir::Value stat,
-                            mlir::Value errmsg);
-} // namespace fir::runtime
-#endif // FORTRAN_OPTIMIZER_BUILDER_RUNTIME_COARRAY_H
diff --git a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
index adefcfea0b5dc..7b1a12932276a 100644
--- a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
@@ -1,5 +1,6 @@
 add_subdirectory(CUF)
 add_subdirectory(FIRCG)
+add_subdirectory(MIF)
 
 # This replicates part of the add_mlir_dialect cmake function from MLIR that
 # cannot be used her because it expects to be run inside MLIR directory which
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/CMakeLists.txt b/flang/include/flang/Optimizer/Dialect/MIF/CMakeLists.txt
new file mode 100644
index 0000000000000..27ba3889c8045
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/CMakeLists.txt
@@ -0,0 +1,9 @@
+set(LLVM_TARGET_DEFINITIONS MIFDialect.td)
+mlir_tablegen(MIFDialect.h.inc -gen-dialect-decls -dialect=mif)
+mlir_tablegen(MIFDialect.cpp.inc -gen-dialect-defs -dialect=mif)
+
+# Add Multi Image Fortran operations
+set(LLVM_TARGET_DEFINITIONS MIFOps.td)
+mlir_tablegen(MIFOps.h.inc -gen-op-decls)
+mlir_tablegen(MIFOps.cpp.inc -gen-op-defs)
+add_public_tablegen_target(MIFOpsIncGen)
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.h b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.h
new file mode 100644
index 0000000000000..f862d9175a6ae
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.h
@@ -0,0 +1,31 @@
+//===-- MIFDialect.h - MIF dialect ------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_OPTIMIZER_DIALECT_MIF_MIF_H
+#define FORTRAN_OPTIMIZER_DIALECT_MIF_MIF_H
+
+#include "mlir/Bytecode/BytecodeOpInterface.h"
+#include "mlir/IR/Dialect.h"
+#include "mlir/IR/OpDefinition.h"
+#include "mlir/IR/OpImplementation.h"
+#include "mlir/IR/SymbolTable.h"
+#include "mlir/Interfaces/CallInterfaces.h"
+#include "mlir/Interfaces/InferTypeOpInterface.h"
+#include "mlir/Interfaces/SideEffectInterfaces.h"
+#include "mlir/Interfaces/VectorInterfaces.h"
+
+// #include "flang/Optimizer/Dialect/FIRAttr.h"
+// #include "flang/Optimizer/Dialect/FIRType.h"
+
+//===----------------------------------------------------------------------===//
+// MIFDialect
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/MIF/MIFDialect.h.inc"
+
+#endif // FORTRAN_OPTIMIZER_DIALECT_MIF_MIF_H
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.td b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.td
new file mode 100644
index 0000000000000..f8be6a86a79fe
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.td
@@ -0,0 +1,38 @@
+//===-- MIFDialect.td - MIF dialect base definitions - tablegen ---------*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// Definition of the Multi-Image Fortran (MIF)  dialect
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_DIALECT_MIF_MIFDIALECT
+#define FORTRAN_DIALECT_MIF_MIFDIALECT
+
+include "mlir/IR/AttrTypeBase.td"
+include "mlir/IR/EnumAttr.td"
+include "mlir/IR/OpBase.td"
+
+def MIFDialect : Dialect {
+  let name = "mif";
+
+  let summary = "Multi-Image Fortran dialect";
+
+  let description = [{
+    The "MIF" dialect is designed to contain the basic coarray operations
+    in Fortran and all multi-image operations as described in the standard.
+    This includes synchronization operations, atomic operations,
+    image queries, teams, criticals, etc. The MIF dialect operations use 
+    the FIR types and are tightly coupled with FIR and HLFIR.
+  }];
+
+  let cppNamespace = "::mif";
+  let dependentDialects = ["fir::FIROpsDialect"];
+}
+
+#endif // FORTRAN_DIALECT_MIF_MIFDIALECT
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.h b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.h
new file mode 100644
index 0000000000000..a9e53c21171c4
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.h
@@ -0,0 +1,20 @@
+//===-- Optimizer/Dialect/MIF/MIFOps.h - MIF operations ---------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_OPTIMIZER_DIALECT_MIF_MIFOPS_H
+#define FORTRAN_OPTIMIZER_DIALECT_MIF_MIFOPS_H
+
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/MIF/MIFDialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/IR/OpDefinition.h"
+
+#define GET_OP_CLASSES
+#include "flang/Optimizer/Dialect/MIF/MIFOps.h.inc"
+
+#endif // FORTRAN_OPTIMIZER_DIALECT_MIF_MIFOPS_H
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td
new file mode 100644
index 0000000000000..115ef0c65a76f
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td
@@ -0,0 +1,274 @@
+//===-- MIFOps.td - MIF operation definitions ------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// Definition of the MIF dialect operations
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_DIALECT_MIF_MIF_OPS
+#define FORTRAN_DIALECT_MIF_MIF_OPS
+
+include "flang/Optimizer/Dialect/MIF/MIFDialect.td"
+include "flang/Optimizer/Dialect/FIRTypes.td"
+include "flang/Optimizer/Dialect/FIRAttr.td"
+include "mlir/Dialect/LLVMIR/LLVMAttrDefs.td"
+include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
+include "mlir/Interfaces/LoopLikeInterface.td"
+include "mlir/IR/BuiltinAttributes.td"
+
+class mif_Op<string mnemonic, list<Trait> traits>
+    : Op<MIFDialect, mnemonic, traits>;
+
+//===----------------------------------------------------------------------===//
+// Initialization and Finalization
+//===----------------------------------------------------------------------===//
+
+def mif_InitOp : mif_Op<"init", []> {
+  let summary = "Initialize the parallel environment";
+  let description = [{This operation will initialize the parallel environment}];
+
+  let results = (outs I32:$stat);
+  let assemblyFormat = "`->` type($stat) attr-dict";
+}
+
+//===----------------------------------------------------------------------===//
+// Image Queries
+//===----------------------------------------------------------------------===//
+
+def mif_NumImagesOp
+    : mif_Op<"num_images", [NoMemoryEffect, AttrSizedOperandSegments]> {
+  let summary = "Query the number of images in the specified or current team";
+  let description = [{
+    This operation queries the number of images in the specified or current
+    team and can be called in three different ways:
+    - `num_images()`
+    - `num_images(team)`
+    - `num_images(team_number)`
+
+    Arguments:
+    - `team` : Shall be a scalar of type `team_type` from the `ISO_FORTRAN_ENV`
+            module with a value that identifies the current or ancestor team.
+    - `team_number` :  Shall be an integer scalar. It shall identify the
+            initial team or a sibling team of the current team.
+
+    Result Value: The number of images in the specified team, or in the current
+    team if no team is specified.
+  }];
+
+  let arguments = (ins Optional<AnyInteger>:$team_number,
+                       Optional<AnyRefOrBoxType>:$team);
+  let results = (outs I32:$res);
+
+  let builders = [OpBuilder<(ins CArg<"mlir::Value", "{}">:$teamArg)>];
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    ( `team_number` `(` $team_number^ `:` type($team_number) `)` )? 
+    ( `team` `(` $team^ `:` type($team) `)` )? 
+    attr-dict `->` type($res)
+  }];
+}
+
+def mif_ThisImageOp
+    : mif_Op<"this_image", [NoMemoryEffect, AttrSizedOperandSegments]> {
+  let summary = "Determine the image index of the current image";
+  let description = [{
+    Arguments:
+    - `coarray` :  Shall be a coarray of any type.
+    - `dim` : Shall be an integer scalar. Its value shall be in the range of
+          1 <= DIM <= N, where N is the corank of the coarray.
+    - `team`(optional) : Shall be a scalar of type `team_type` from
+          ISO_FORTRAN_ENV. If the `coarray` is present, it shall be
+          established in that team.
+
+    Results:
+    - Case(1) : The result of `this_image([team])` is a scalar with a value
+          equal to the index of the image in the current or specified team.
+    - Case(2) : The result of `this_image(coarray [,team])` is the sequence of
+          cosubscript values for `coarray`.
+    - Case(3) : The result of `this_image(coarray, dim [,team])` is the value of
+          cosubscript `dim` in the sequence of cosubscript values for `coarray`.
+
+    Example:
+    ```fortran
+      REAL :: A[10, 0:9, 0:*]
+    ```
+    In this example, on image 5, `this_image()` has the value 5 and
+    `this_image(A)` has the value [5, 0, 0].
+  }];
+
+  let arguments = (ins Optional<fir_BoxType>:$coarray,
+                       Optional<AnyInteger>:$dim, Optional<fir_BoxType>:$team);
+  let results = (outs I32:$res);
+
+  let builders = [OpBuilder<(ins "mlir::Value":$coarray, "mlir::Value":$team)>,
+                  OpBuilder<(ins "mlir::Value":$team)>];
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    ( `coarray` `(` $coarray^ `:` type($coarray) `)` )? 
+    ( `team` `(` $team^ `:` type($team) `)` )? 
+    ( `dim` `(` $dim^ `:` type($dim) `)` )? 
+    attr-dict `->` type($res)
+  }];
+}
+
+//===----------------------------------------------------------------------===//
+// Synchronization
+//===----------------------------------------------------------------------===//
+
+def mif_SyncAllOp : mif_Op<"sync_all", [AttrSizedOperandSegments,
+                                        MemoryEffects<[MemWrite]>]> {
+  let summary =
+      "Performs a collective synchronization of all images in the current team";
+
+  let arguments = (ins Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+  let assemblyFormat = [{
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_SyncImagesOp : mif_Op<"sync_images", [AttrSizedOperandSegments,
+                                              MemoryEffects<[MemWrite]>]> {
+  let summary = "Synchronizes the current image with each of the other "
+                "images in the `image_set`";
+  let description = [{
+    This operation takes an optional argument `image_set`, which must be an
+    integer expression that is scalar or of rank one. If `image_set` is omitted,
+    this operation behaves like the Fortran statement `SYNC IMAGES(*)`.
+  }];
+
+  let arguments = (ins Optional<AnyRefOrBoxType>:$image_set,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    (`image_set` `(` $image_set^ `:` type($image_set) `)` )? 
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_SyncMemoryOp : mif_Op<"sync_memory", [AttrSizedOperandSegments,
+                                              MemoryEffects<[MemWrite]>]> {
+  let summary = "Operation that ends one segment and begins another.";
+  let description = [{
+    Operation that ends one segment and begins another; those two segments can
+    be ordered in a user-defined way with respect to segments on other images.
+  }];
+
+  let arguments = (ins Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+  let assemblyFormat = [{
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+//===----------------------------------------------------------------------===//
+// Collective Operations
+//===----------------------------------------------------------------------===//
+
+def mif_CoBroadcastOp : mif_Op<"co_broadcast", [AttrSizedOperandSegments,
+                                                MemoryEffects<[MemWrite]>]> {
+  let summary = "Broadcast value to images.";
+  let description = [{
+    The co_broadcast operation broadcasts the value of `a` from the image
+    `source_image` to the other images of the current team.
+  }];
+
+  let arguments = (ins fir_BoxType:$a, 
+                       AnyIntegerType:$source_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+
+  let assemblyFormat = [{
+    $a `:` qualified(type($a)) 
+    `source` `(` $source_image `:` type($source_image) `)`
+    (`stat`  `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_CoMaxOp
+    : mif_Op<"co_max", [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>]> {
+  let summary = "Compute maximum value across images.";
+  let description = [{
+    The co_max operation performs the computation of the maximum 
+    across images.
+  }];
+
+  let arguments = (ins fir_BoxType:$a, 
+                       Optional<AnyIntegerType>:$result_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    $a `:`  qualified(type($a))
+    (`result` `(` $result_image^ `:` type($result_image) `)` )? 
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_CoMinOp
+    : mif_Op<"co_min", [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>]> {
+  let summary = "Compute minimum value across images.";
+  let description = [{
+    The co_min operation performs the computation of the minimum
+    across images.
+  }];
+
+  let arguments = (ins fir_BoxType:$a,
+                       Optional<AnyIntegerType>:$result_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    $a `:`  qualified(type($a))
+    (`result` `(` $result_image^ `:` type($result_image) `)` )? 
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_CoSumOp
+    : mif_Op<"co_sum", [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>]> {
+  let summary = "Compute sum across images.";
+  let description = [{
+    The co_sum operation performs the computation of the sum
+    across images.
+  }];
+
+  let arguments = (ins fir_BoxType:$a, 
+                       Optional<AnyIntegerType>:$result_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<Any...
[truncated]
 | 
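The operand rules stated in the `mif.num_images` and `mif.this_image` descriptions can be sketched as plain C++ checks. This is only an illustration under assumptions: the verifier bodies are not visible in the truncated patch, so the structs `NumImagesOperands` and `ThisImageOperands`, the functions `checkNumImages` and `checkThisImage`, and the error strings are all hypothetical names, not the code from this PR. The sketch encodes just what the descriptions say: `num_images` accepts no operand, `team`, or `team_number`, and `this_image` accepts `dim` only together with `coarray`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the optional operands of mif.num_images.
struct NumImagesOperands {
  bool hasTeamNumber;
  bool hasTeam;
};

// Hypothetical stand-in for the optional operands of mif.this_image.
struct ThisImageOperands {
  bool hasCoarray;
  bool hasDim;
  bool hasTeam;
};

// Returns an error message when the operand combination is not one of the
// three documented forms: num_images(), num_images(team),
// num_images(team_number). Returns std::nullopt when the form is accepted.
std::optional<std::string> checkNumImages(const NumImagesOperands &ops) {
  if (ops.hasTeam && ops.hasTeamNumber)
    return std::string("team and team_number must not both be present");
  return std::nullopt;
}

// Returns an error message when `dim` is given without `coarray`; the
// documented cases are this_image([team]), this_image(coarray [,team]) and
// this_image(coarray, dim [,team]).
std::optional<std::string> checkThisImage(const ThisImageOperands &ops) {
  if (ops.hasDim && !ops.hasCoarray)
    return std::string("dim requires coarray to be present");
  return std::nullopt;
}

// Small usage example exercising accepted and rejected forms.
inline void runOperandRuleExamples() {
  assert(!checkNumImages(NumImagesOperands{false, false}).has_value());
  assert(checkNumImages(NumImagesOperands{true, true}).has_value());
  assert(!checkThisImage(ThisImageOperands{true, true, false}).has_value());
  assert(checkThisImage(ThisImageOperands{false, true, false}).has_value());
}
```

In the dialect itself such checks would presumably live in the ops' `verify()` methods (both ops set `hasVerifier = 1`) and report through MLIR diagnostics rather than returning strings.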
| @llvm/pr-subscribers-flang-fir-hlfir Author: Jean-Didier PAILLEUX (JDPailleux) ChangesSupport for multi-image features has begun to be integrated into LLVM. A new dialect which simplifies lowering to PRIF wil be proposed in this PR. Features like TEAMs and the use of coarrays will follow later in other PRs. Patch is 231.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161179.diff 59 Files Affected: 
 diff --git a/flang/include/flang/Optimizer/Builder/Runtime/Coarray.h b/flang/include/flang/Optimizer/Builder/Runtime/Coarray.h
deleted file mode 100644
index 20bfb7c124af2..0000000000000
--- a/flang/include/flang/Optimizer/Builder/Runtime/Coarray.h
+++ /dev/null
@@ -1,85 +0,0 @@
-//===-- Coarray.h -- generate Coarray intrinsics runtime calls --*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef FORTRAN_OPTIMIZER_BUILDER_RUNTIME_COARRAY_H
-#define FORTRAN_OPTIMIZER_BUILDER_RUNTIME_COARRAY_H
-
-#include "flang/Lower/AbstractConverter.h"
-#include "flang/Optimizer/Support/InternalNames.h"
-#include "mlir/Dialect/Func/IR/FuncOps.h"
-
-namespace fir {
-class ExtendedValue;
-class FirOpBuilder;
-} // namespace fir
-
-namespace fir::runtime {
-
-// Get the function type for a prif subroutine with a variable number of
-// arguments
-#define PRIF_FUNCTYPE(...)                                                     \
-  mlir::FunctionType::get(builder.getContext(), /*inputs*/ {__VA_ARGS__},      \
-                          /*result*/ {})
-
-// Default prefix for subroutines of PRIF compiled with LLVM
-#define PRIFNAME_SUB(fmt)                                                      \
-  []() {                                                                       \
-    std::ostringstream oss;                                                    \
-    oss << "prif_" << fmt;                                                     \
-    return fir::NameUniquer::doProcedure({"prif"}, {}, oss.str());             \
-  }()
-
-#define PRIF_STAT_TYPE builder.getRefType(builder.getI32Type())
-#define PRIF_ERRMSG_TYPE                                                       \
-  fir::BoxType::get(fir::CharacterType::get(builder.getContext(), 1,           \
-                                            fir::CharacterType::unknownLen()))
-
-/// Generate Call to runtime prif_init
-mlir::Value genInitCoarray(fir::FirOpBuilder &builder, mlir::Location loc);
-
-/// Generate Call to runtime prif_num_images
-mlir::Value getNumImages(fir::FirOpBuilder &builder, mlir::Location loc);
-
-/// Generate Call to runtime prif_num_images_with_team or
-/// prif_num_images_with_team_number
-mlir::Value getNumImagesWithTeam(fir::FirOpBuilder &builder, mlir::Location loc,
-                                 mlir::Value team);
-
-/// Generate Call to runtime prif_this_image_no_coarray
-mlir::Value getThisImage(fir::FirOpBuilder &builder, mlir::Location loc,
-                         mlir::Value team = {});
-
-/// Generate call to runtime subroutine prif_co_broadcast
-void genCoBroadcast(fir::FirOpBuilder &builder, mlir::Location loc,
-                    mlir::Value A, mlir::Value sourceImage, mlir::Value stat,
-                    mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_co_max and prif_co_max_character
-void genCoMax(fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value A,
-              mlir::Value resultImage, mlir::Value stat, mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_co_min or prif_co_min_character
-void genCoMin(fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value A,
-              mlir::Value resultImage, mlir::Value stat, mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_co_sum
-void genCoSum(fir::FirOpBuilder &builder, mlir::Location loc, mlir::Value A,
-              mlir::Value resultImage, mlir::Value stat, mlir::Value errmsg);
-
-/// Generate call to runtime subroutine prif_sync_all
-void genSyncAllStatement(fir::FirOpBuilder &builder, mlir::Location loc,
-                         mlir::Value stat, mlir::Value errmsg);
-/// Generate call to runtime subroutine prif_sync_memory
-void genSyncMemoryStatement(fir::FirOpBuilder &builder, mlir::Location loc,
-                            mlir::Value stat, mlir::Value errmsg);
-/// Generate call to runtime subroutine prif_sync_images
-void genSyncImagesStatement(fir::FirOpBuilder &builder, mlir::Location loc,
-                            mlir::Value imageSet, mlir::Value stat,
-                            mlir::Value errmsg);
-} // namespace fir::runtime
-#endif // FORTRAN_OPTIMIZER_BUILDER_RUNTIME_COARRAY_H
diff --git a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
index adefcfea0b5dc..7b1a12932276a 100644
--- a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
@@ -1,5 +1,6 @@
 add_subdirectory(CUF)
 add_subdirectory(FIRCG)
+add_subdirectory(MIF)
 
 # This replicates part of the add_mlir_dialect cmake function from MLIR that
 # cannot be used her because it expects to be run inside MLIR directory which
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/CMakeLists.txt b/flang/include/flang/Optimizer/Dialect/MIF/CMakeLists.txt
new file mode 100644
index 0000000000000..27ba3889c8045
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/CMakeLists.txt
@@ -0,0 +1,9 @@
+set(LLVM_TARGET_DEFINITIONS MIFDialect.td)
+mlir_tablegen(MIFDialect.h.inc -gen-dialect-decls -dialect=mif)
+mlir_tablegen(MIFDialect.cpp.inc -gen-dialect-defs -dialect=mif)
+
+# Add Multi Image Fortran operations
+set(LLVM_TARGET_DEFINITIONS MIFOps.td)
+mlir_tablegen(MIFOps.h.inc -gen-op-decls)
+mlir_tablegen(MIFOps.cpp.inc -gen-op-defs)
+add_public_tablegen_target(MIFOpsIncGen)
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.h b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.h
new file mode 100644
index 0000000000000..f862d9175a6ae
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.h
@@ -0,0 +1,31 @@
+//===- MIF.h - MIF dialect --------------------------------*- C++-*-==//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_OPTIMIZER_DIALECT_MIF_MIF_H
+#define FORTRAN_OPTIMIZER_DIALECT_MIF_MIF_H
+
+#include "mlir/Bytecode/BytecodeOpInterface.h"
+#include "mlir/IR/Dialect.h"
+#include "mlir/IR/OpDefinition.h"
+#include "mlir/IR/OpImplementation.h"
+#include "mlir/IR/SymbolTable.h"
+#include "mlir/Interfaces/CallInterfaces.h"
+#include "mlir/Interfaces/InferTypeOpInterface.h"
+#include "mlir/Interfaces/SideEffectInterfaces.h"
+#include "mlir/Interfaces/VectorInterfaces.h"
+
+// #include "flang/Optimizer/Dialect/FIRAttr.h"
+// #include "flang/Optimizer/Dialect/FIRType.h"
+
+//===----------------------------------------------------------------------===//
+// MIFDialect
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/MIF/MIFDialect.h.inc"
+
+#endif // FORTRAN_OPTIMIZER_DIALECT_MIF_MIF_H
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.td b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.td
new file mode 100644
index 0000000000000..f8be6a86a79fe
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFDialect.td
@@ -0,0 +1,38 @@
+//===-- MIFDialect.td - MIF dialect base definitions - tablegen ---------*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// Definition of the Multi-Image Fortran (MIF)  dialect
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_DIALECT_MIF_MIFDIALECT
+#define FORTRAN_DIALECT_MIF_MIFDIALECT
+
+include "mlir/IR/AttrTypeBase.td"
+include "mlir/IR/EnumAttr.td"
+include "mlir/IR/OpBase.td"
+
+def MIFDialect : Dialect {
+  let name = "mif";
+
+  let summary = "Multi-Image Fortran dialect";
+
+  let description = [{
+    The "MIF" dialect is designed to contain the basic coarray operations
+    in Fortran and all multi image operations as descibed in the standard.
+    This includes synchronization operations, atomic operations,
+    image queries, teams, criticals, etc. The MIF dialect operations use 
+    the FIR types and are tightly coupled with FIR and HLFIR.
+  }];
+
+  let cppNamespace = "::mif";
+  let dependentDialects = ["fir::FIROpsDialect"];
+}
+
+#endif // FORTRAN_DIALECT_MIF_MIFDIALECT
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.h b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.h
new file mode 100644
index 0000000000000..a9e53c21171c4
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.h
@@ -0,0 +1,20 @@
+//===-- Optimizer/Dialect/MIF/MIFOps.h - MIF operations ---------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_OPTIMIZER_DIALECT_MIF_MIFOPS_H
+#define FORTRAN_OPTIMIZER_DIALECT_MIF_MIFOPS_H
+
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/MIF/MIFDialect.h"
+#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
+#include "mlir/IR/OpDefinition.h"
+
+#define GET_OP_CLASSES
+#include "flang/Optimizer/Dialect/MIF/MIFOps.h.inc"
+
+#endif // FORTRAN_OPTIMIZER_DIALECT_MIF_MIFOPS_H
diff --git a/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td
new file mode 100644
index 0000000000000..115ef0c65a76f
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/MIF/MIFOps.td
@@ -0,0 +1,274 @@
+//===-- MIFOps.td - MIF operation definitions ------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// Definition of the MIF dialect operations
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef FORTRAN_DIALECT_MIF_MIF_OPS
+#define FORTRAN_DIALECT_MIF_MIF_OPS
+
+include "flang/Optimizer/Dialect/MIF/MIFDialect.td"
+include "flang/Optimizer/Dialect/FIRTypes.td"
+include "flang/Optimizer/Dialect/FIRAttr.td"
+include "mlir/Dialect/LLVMIR/LLVMAttrDefs.td"
+include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
+include "mlir/Interfaces/LoopLikeInterface.td"
+include "mlir/IR/BuiltinAttributes.td"
+
+class mif_Op<string mnemonic, list<Trait> traits>
+    : Op<MIFDialect, mnemonic, traits>;
+
+//===----------------------------------------------------------------------===//
+// Initialization and Finalization
+//===----------------------------------------------------------------------===//
+
+def mif_InitOp : mif_Op<"init", []> {
+  let summary = "Initialize the parallel environment";
+  let description = [{This operation will initialize the parallel environment}];
+
+  let results = (outs I32:$stat);
+  let assemblyFormat = "`->` type($stat) attr-dict";
+}
+
+//===----------------------------------------------------------------------===//
+// Image Queries
+//===----------------------------------------------------------------------===//
+
+def mif_NumImagesOp
+    : mif_Op<"num_images", [NoMemoryEffect, AttrSizedOperandSegments]> {
+  let summary = "Query the number of images in the specified or current team";
+  let description = [{
+    This operation queries the number of images in the specified or current
+    team and can be called in three different ways:
+    - `num_images()`
+    - `num_images(team)`
+    - `num_images(team_number)`
+
+    Arguments:
+    - `team` : Shall be a scalar of type `team_type` from the `ISO_FORTRAN_ENV`
+            module with a value that identifies the current or ancestor team.
+    - `team_number` :  Shall be an integer scalar. It shall identify the
+            initial team or a sibling team of the current team.
+
+    Result Value: The number of images in the specified team, or in the current
+    team if no team is specified.
+  }];
+
+  let arguments = (ins Optional<AnyInteger>:$team_number,
+                       Optional<AnyRefOrBoxType>:$team);
+  let results = (outs I32:$res);
+
+  let builders = [OpBuilder<(ins CArg<"mlir::Value", "{}">:$teamArg)>];
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    ( `team_number` `(` $team_number^ `:` type($team_number) `)` )? 
+    ( `team` `(` $team^ `:` type($team) `)` )? 
+    attr-dict `->` type($res)
+  }];
+}
+
+def mif_ThisImageOp
+    : mif_Op<"this_image", [NoMemoryEffect, AttrSizedOperandSegments]> {
+  let summary = "Determine the image index of the current image";
+  let description = [{
+    Arguments:
+    - `coarray` :  Shall be a coarray of any type.
+    - `dim` : Shall be an integer scalar. Its value shall be in the range of
+          1 <= DIM <= N, where N is the corank of the coarray.
+    - `team`(optional) : Shall be a scalar of type `team_type` from
+          ISO_FORTRAN_ENV. If the `coarray` is present, it shall be
+          established in that team.
+
+    Results:
+    - Case(1) : The result of `this_image([team])` is a scalar with a value
+          equal to the index of the image in the current or specified team.
+    - Case(2) : The result of `this_image(coarray [,team])` is the sequence of
+          cosubscript values for `coarray`.
+    - Case(3) : The result of `this_image(coarray, dim [,team])` is the value of
+          cosubscript `dim` in the sequence of cosubscript values for `coarray`.
+
+    Example:
+    ```fortran
+      REAL :: A[10, 0:9, 0:*]
+    ```
+    In this example, on image 5, `this_image()` has the value 5 and
+    `this_image(A)` has the value [5, 0, 0].
+  }];
+
+  let arguments = (ins Optional<fir_BoxType>:$coarray,
+                       Optional<AnyInteger>:$dim, Optional<fir_BoxType>:$team);
+  let results = (outs I32:$res);
+
+  let builders = [OpBuilder<(ins "mlir::Value":$coarray, "mlir::Value":$team)>,
+                  OpBuilder<(ins "mlir::Value":$team)>];
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    ( `coarray` `(` $coarray^ `:` type($coarray) `)` )? 
+    ( `team` `(` $team^ `:` type($team) `)` )? 
+    ( `dim` `(` $dim^ `:` type($dim) `)` )? 
+    attr-dict `->` type($res)
+  }];
+}
+
+//===----------------------------------------------------------------------===//
+// Synchronization
+//===----------------------------------------------------------------------===//
+
+def mif_SyncAllOp : mif_Op<"sync_all", [AttrSizedOperandSegments,
+                                        MemoryEffects<[MemWrite]>]> {
+  let summary =
+      "Performs a collective synchronization of all images in the current team";
+
+  let arguments = (ins Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+  let assemblyFormat = [{
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_SyncImagesOp : mif_Op<"sync_images", [AttrSizedOperandSegments,
+                                              MemoryEffects<[MemWrite]>]> {
+  let summary = "Performs a synchronization of the image with each of the other "
+                "images in the `image_set`";
+  let description = [{
+    This operation can take an optional argument `image_set`, which must be an integer expression
+    and must be scalar or rank one. If `image_set` is omitted from the call, this operation will
+    adopt the behavior of the Fortran statement `SYNC IMAGES(*)`.
+  }];
+
+  let arguments = (ins Optional<AnyRefOrBoxType>:$image_set,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    (`image_set` `(` $image_set^ `:` type($image_set) `)` )? 
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_SyncMemoryOp : mif_Op<"sync_memory", [AttrSizedOperandSegments,
+                                              MemoryEffects<[MemWrite]>]> {
+  let summary = "Operation that ends one segment and begins another.";
+  let description = [{
+    Operation that ends one segment and begins another; these two segments can
+    be ordered in a user-defined way with respect to segments on other images.
+  }];
+
+  let arguments = (ins Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+  let assemblyFormat = [{
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+//===----------------------------------------------------------------------===//
+// Collective Operations
+//===----------------------------------------------------------------------===//
+
+def mif_CoBroadcastOp : mif_Op<"co_broadcast", [AttrSizedOperandSegments,
+                                                MemoryEffects<[MemWrite]>]> {
+  let summary = "Broadcast value to images.";
+  let description = [{
+    The co_broadcast operation broadcasts the value of `a` on the image
+    `source_image` to all other images of the current team.
+  }];
+
+  let arguments = (ins fir_BoxType:$a, 
+                       AnyIntegerType:$source_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+
+  let assemblyFormat = [{
+    $a `:` qualified(type($a)) 
+    `source` `(` $source_image `:` type($source_image) `)`
+    (`stat`  `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_CoMaxOp
+    : mif_Op<"co_max", [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>]> {
+  let summary = "Compute maximum value across images.";
+  let description = [{
+    The co_max operation performs the computation of the maximum 
+    across images.
+  }];
+
+  let arguments = (ins fir_BoxType:$a, 
+                       Optional<AnyIntegerType>:$result_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    $a `:`  qualified(type($a))
+    (`result` `(` $result_image^ `:` type($result_image) `)` )? 
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_CoMinOp
+    : mif_Op<"co_min", [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>]> {
+  let summary = "Compute minimum value across images.";
+  let description = [{
+    The co_min operation performs the computation of the minimum
+    across images.
+  }];
+
+  let arguments = (ins fir_BoxType:$a,
+                       Optional<AnyIntegerType>:$result_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<AnyRefOrBoxType>, "", [MemWrite]>:$errmsg);
+
+  let hasVerifier = 1;
+  let assemblyFormat = [{
+    $a `:`  qualified(type($a))
+    (`result` `(` $result_image^ `:` type($result_image) `)` )? 
+    (`stat` `(` $stat^ `:` type($stat) `)` )?
+    (`errmsg` `(` $errmsg^ `:` type($errmsg) `)` )? 
+    attr-dict
+  }];
+}
+
+def mif_CoSumOp
+    : mif_Op<"co_sum", [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>]> {
+  let summary = "Compute sum across images.";
+  let description = [{
+    The co_sum operation performs the computation of the sum
+    across images.
+  }];
+
+  let arguments = (ins fir_BoxType:$a, 
+                       Optional<AnyIntegerType>:$result_image,
+                       Arg<Optional<AnyReferenceLike>, "", [MemWrite]>:$stat,
+                       Arg<Optional<Any...
[truncated]
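The `this_image` description in the patch gives the example `REAL :: A[10, 0:9, 0:*]` and states that `this_image(A)` is `[5, 0, 0]` on image 5. The sketch below shows one way to reproduce that mapping from an image index to cosubscripts, assuming the usual column-major ordering of codimensions and a coarray established in the current team. `Codim` and `cosubscripts` are hypothetical names introduced here for illustration; this is not how flang or PRIF computes the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one codimension, given by its lower cobound and
// its extent. The extent of the last codimension is ignored because it is
// unbounded (`*`) in the declaration.
struct Codim {
  std::int64_t lower;
  std::int64_t extent;
};

// Map a 1-based image index to the sequence of cosubscripts, assuming
// column-major ordering (the first codimension varies fastest).
std::vector<std::int64_t> cosubscripts(std::int64_t image,
                                       const std::vector<Codim> &codims) {
  assert(image >= 1 && !codims.empty());
  std::int64_t offset = image - 1; // 0-based linear position of the image
  std::vector<std::int64_t> result;
  for (std::size_t k = 0; k < codims.size(); ++k) {
    const bool last = (k + 1 == codims.size());
    if (last) {
      // The final codimension absorbs whatever offset remains.
      result.push_back(codims[k].lower + offset);
    } else {
      result.push_back(codims[k].lower + offset % codims[k].extent);
      offset /= codims[k].extent;
    }
  }
  return result;
}
```

For `A[10, 0:9, 0:*]` the codimensions are `{1, 10}`, `{0, 10}` and `{0, unused}`. Image 5 has linear offset 4, which stays within the first codimension, so the result is `[5, 0, 0]`.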
Thanks @JDPailleux for the progress with the Multi Image Dialect. Does this currently cover only the intrinsics? If you could write a high-level description of the MIF dialect as a PR to flang/docs, or write a summary on Discourse (https://discourse.llvm.org/c/subprojects/flang/33), that would make it easier for others to follow and help review. Alternatively, you could consider presenting at the Flang community call next Wednesday. Are there any general principles that you are following, for example that all coarray or multi-image intrinsics will be represented as operations? Are there any new FIR/MLIR types to represent coarrays/multi-images? Is the dialect only for ease of lowering, or will it include some transformations as well? What kind of transformations do you plan to have in the dialect?
Thanks @JDPailleux for adding a dialect for multi image Fortran!
It is nice to centralize the interface with PRIF in an MLIR pass and to have operations for MIF/Coarrays that will make experimenting easier.
I added comments inline, overall this looks great to me.
@JDPailleux Thanks for all the great work on this new MIF dialect!
I don't follow all the LLVM details in this PR, but I've added some questions/comments on parts I think I understand.
I see that you have some slides in https://flang-compiler.slack.com/archives/C06QTP07SCU/p1758900973869609. This addresses my points above. Thanks.
Hello @kiranchandramohan, in this PR the dialect covers only the intrinsics and statements that are already upstreamed in LLVM. Later we will add more statements and intrinsics, coarray allocation and deallocation, coarray memory access operations, and so on. Of course, I will write something on Discourse. As for how a coarray would be represented, I haven't fully thought about it yet, but there are some ideas in the slides linked on Slack. Unfortunately, I won't be available next Wednesday, but I can attend a later Flang community call.
Co-authored-by: Dan Bonachea <[email protected]>
- Add decoration MemWrite and MemRead
- Remove useless --split-file flag
- Remove useless Start/End/Main in the conversion tests
- Replace PRIF macro type by static functions
- Remove unnecessary things
@JDPailleux Thanks for all the improvements!
I'm very excited to see this effort progressing.
Co-authored-by: Valentin Clement (バレンタイン クレメン) <[email protected]>
Please take care of the missing side effect, LGTM otherwise. Thanks for adding this dialect!
All the recent improvements look great!
I added a few more observations to consider as possible improvements.
//===----------------------------------------------------------------------===//

def mif_SyncAllOp : mif_Op<"sync_all", [AttrSizedOperandSegments,
                                        MemoryEffects<[MemWrite]>]> {
All three SYNC operations in this section (sync_all, sync_images and sync_memory) are Fortran "image control statements" (see F23 11.7.1). This means they end a Fortran "segment" and start another (see F23 11.7.2), which affects the memory consistency semantics of the multi-image execution.
One practical impact of this is to restrict the types of optimizations that can safely be performed on surrounding operations that might access memory reachable from a coarray (which generally includes any variable directly in a coarray or any variable with a TARGET attribute). So for example in the absence of very sophisticated analysis, the LLVM optimizer should not be permitted to effectively hoist any access operations touching such locations across an image control statement like sync all.
The parallel runtime library will only ensure the image synchronization semantics of the operation and inhibit hardware-level reordering. It's the compiler's responsibility to ensure that optimizations don't move surrounding memory access operations across the image control statement in ways that could break the Fortran memory model. Such violations probably wouldn't affect visible behavior today, but could become a visible problem once we add coindexed access.
I found only limited documentation on mlir::MemoryEffects, so I'm not sure if the dialect is a sufficient and proper place to represent this category of constraint, or if it belongs elsewhere. However one conservative way to enforce the required constraint might be to set the MemoryEffects of dialect operations corresponding to image control statements such that compiler analysis assumes the operation effectively has a side-effect of reading and writing all memory locations (not just those directly referenced by the arguments).
@jeanPerier Can this be expressed by setting attributes like MemoryEffects<[MemRead,MemWrite]> on dialect operations corresponding to image control statements? Or do we need something even more aggressive like MemoryEffects<[MemRead<DefaultResource, 0, FullEffect>,MemWrite<DefaultResource, 1, FullEffect>]>, or something else entirely?
It is a good point that a sync operation will not only write but also read something from the runtime state.
FullEffect seems to imply all the memory resource is being read/written. It is not the case here, and if anything, I would imagine this is an indication that allows doing more optimizations (like deleting the first write in write after write). It is not used on any of the FIR operations.
The safest approach is to not implement the side effect interface at all; this will prevent any dialect-independent optimizations involving memory. My understanding is that this is semantically equivalent to specifying MemoryEffects<[MemRead,MemWrite,MemAlloc,MemFree]>, with the exception that the latter implies that the operation has no memory conflict with an operation that would have MemoryEffects on something other than the DefaultResource.
For now I would advise starting by not implementing the side effect interface at all without some deeper design and testing phase.
I checked the MPI dialect and it does not implement side effect interfaces outside of a few invariant operations that they consider Pure.
Later, there may be a case for ensuring that some optimizations, like load-to-store forwarding of local scalars that are not involved in any multi-image statements, are still possible (my understanding is that an operation that either does not implement the side effect interface or has a generic write effect will prevent such an optimization if it sits between the load and the store, even if the operation does not take the address, or an alias of the address, as an argument). We should likely have a discussion with the MPI dialect contributors when trying to specify side effects in a finer way.
FullEffect seems to imply all the memory resource is being read/written. It is not the case here, and if anything, I would imagine this is an indication that allows doing more optimizations (like deleting the first write in write after write).
The idea was that by including MemRead<DefaultResource, 0, FullEffect>, analysis must assume this operation starts by reading every single memory location and then writes it (MemWrite<DefaultResource, 1, FullEffect>). I suspect that first step automatically avoids the WAW hazard you mention by ensuring this operation starts by effectively reading all of memory. It's not literally how a sync all behaves, but we're looking for an annotation that inhibits all motion of memory access across this segment boundary. Conservatively saying that it consumes all of memory and then writes all of memory is more than necessary but should be sufficient.
All that being said, I yield to @jeanPerier 's expertise regarding the safest near-term approach being to remove the side effect interface from sync {all,images,memory}, which hopefully has the same fencing effect on problematic optimization across segment boundaries.
That's definitely a good point! Perhaps later on, a specific memory effect will need to be added for this type of operation, if I understand correctly, because there isn't really one for this type of use?
The current proposal to remove the memory effect isn't the right solution, because if we omit passing errmsg and stat, the operation is removed after optimization. As discussed with @jeanPerier, should we use FullEffect in this case?
The current proposal to remove the memory effect isn't the right solution, because if we omit passing errmsg and stat, the operation is removed after optimization.
Are you sure you removed all the memory effect specifications from the operation, including the ones on the operands ([MemWrite]>:$stat and [MemWrite]>:$errmsg)?
Specifying memory effects on the operands gives the MemoryEffectsOpInterface to the operation and implies that no memory other than the operands is read or written, which is not what we want here.
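The point about operand-only effects can be illustrated with a small toy model. This is a sketch under stated assumptions, not MLIR code: `ToyOp`, `WriteEffect`, `mayWrite` and `canForwardStoreToLoad` are hypothetical names, and the model only captures the rule described in this thread, namely that an operation declaring no effects at all is treated as touching anything, while an operation declaring effects only on its operands is assumed to touch nothing else.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; these types are hypothetical and are not MLIR classes.
struct WriteEffect {
  std::string target; // name of the operand written; empty means "any location"
};

struct ToyOp {
  bool declaresEffects;            // false: no side effect interface at all
  std::vector<WriteEffect> writes; // meaningful only when declaresEffects is true
};

// Conservative query an optimizer could ask before moving an access to
// `variable` across `op`.
bool mayWrite(const ToyOp &op, const std::string &variable) {
  if (!op.declaresEffects)
    return true; // unknown effects: assume the op may touch anything
  for (const WriteEffect &w : op.writes)
    if (w.target.empty() || w.target == variable)
      return true;
  return false;
}

// Forwarding `store x; op; load x` is only legal if `op` cannot write `x`.
bool canForwardStoreToLoad(const ToyOp &between, const std::string &variable) {
  assert(!variable.empty());
  return !mayWrite(between, variable);
}
```

In this model, a `sync_all` that only declares a write on `stat` lets the optimizer forward a store to `x` across it, which is the unwanted behavior for an image control statement. A `sync_all` with `declaresEffects == false` blocks the forwarding.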
My bad, sorry, when I tested it, I forgot to remove one of the arguments!
a specific memory effect will need to be added for this type of operation, if I understand correctly, because there isn't really one for this type of use?
No, it is likely a matter of expressing it with what exists. What I am saying is that there is no point adding memory effects to say that it can read/write to any memory. That is already implied by not implementing the interface (interfaces are here to enable optimizations in a controlled fashion which by default are disabled). I think the key will likely be to define some CoarrayMemoryRessource, and have the MIF operations do read/write to that resource. This is similar to how we ensure that VOLATILE load/stores are not re-ordered without impacting the load/stores of non VOLATILE memory around that.
The point is: only implement MemoryEffectsOpInterface if it is obvious for the operation, or, for operations like this one, if you have a clear design and related tests to verify that memory-related optimizations will only kick in when allowed.
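The dedicated-resource idea mentioned above can be sketched with a toy conflict check. The patch defines no such resource: `Resource::CoarrayMemory`, `Effect`, `readWrite` and `mustStayOrdered` are hypothetical names. The sketch also assumes a simplified rule in which two operations conflict only when they touch the same resource and at least one writes; real alias analysis also looks at the values involved.

```cpp
#include <cassert>
#include <vector>

// Toy model only. `CoarrayMemory` stands in for the resource suggested in the
// discussion; it is an assumption, not something defined by the patch.
enum class Resource { Default, CoarrayMemory };

struct Effect {
  bool isWrite;
  Resource resource;
};

// Convenience helper: an operation that both reads and writes one resource,
// as an image control statement might be modeled.
std::vector<Effect> readWrite(Resource r) {
  return {Effect{false, r}, Effect{true, r}};
}

// Two operations must stay ordered if they touch the same resource and at
// least one of them writes to it.
bool mustStayOrdered(const std::vector<Effect> &a,
                     const std::vector<Effect> &b) {
  for (const Effect &ea : a)
    for (const Effect &eb : b)
      if (ea.resource == eb.resource && (ea.isWrite || eb.isWrite))
        return true;
  return false;
}
```

Under these assumptions, a synchronization modeled as `readWrite(Resource::CoarrayMemory)` stays ordered with any write to coarray memory and with other synchronizations. A read of a purely local scalar on `Resource::Default` remains free to move, which matches the VOLATILE analogy made in the comment.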
This thread is now resolved.
Sorry for my absence during the last Flang community call; as mentioned at the beginning of the PR, I couldn't attend it. However, I will be present at the next one on October 22.
Made one minor cosmetic request, otherwise LGTM!
I have verified through testing with Caffeine that this PR doesn't contain any regressions for the features switched from the experimental support to support using the new MIF dialect.
Co-authored-by: Dan Bonachea <[email protected]>
LGTM! Thanks for all the work and improvements!
| LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/198/builds/8929 Here is the relevant piece of the build log for the reference | 
@JDPailleux this seems to be breaking AArch64 buildbots. Could you please take a look? https://lab.llvm.org/buildbot/#/builders/41/builds/9750
@ceseo Hmm, weird, but yes, I'll take a look!
Fix incorrect linking and dependencies introduced in #161179 that break standalone builds of Flang. Signed-off-by: Michał Górny <[email protected]>
* [flang] Fix standalone build regression from llvm#161179 (llvm#164309) Fix incorrect linking and dependencies introduced in llvm#161179 that break standalone builds of Flang. Signed-off-by: Michał Górny <[email protected]> * [AMDGPU] Remove magic constants from V_PK_ADD_F32 pattern. NFC (llvm#164335) * [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (llvm#161638) They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+. This breaks transitivity however, for example if we have the following sequence of events in one thread: - some stores - store atomic release syncscope("workgroup") - barrier then another thread follows with - barrier - load atomic acquire - store atomic release syncscope("agent") It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves. We also cannot strengthen our release fences any further to allow for releasing other wave's stores because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It'd also add yet another level of complexity to code sequences, with both acquire/release having CU-mode only alternatives. Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq, or the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance. So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical (vL0 inv is not needed in CU mode) code for release (or stronger) atomic ordering. This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed. 
Supersedes llvm#160501 Solves SC1-6454 * [InstSimplify] Support ptrtoaddr in simplifyGEPInst() (llvm#164262) This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) - ptrtoaddr(p) -> p2` fold. This fold requires that p and p2 have the same underlying object (otherwise the provenance may not be the same). The argument I would like to make here is that because the underlying objects are the same (and the pointers in the same address space), the non-address bits of the pointer must be the same. Looking at some specific cases of underlying object relationship: * phi/select: Trivially true. * getelementptr: Only modifies address bits, non-address bits must remain the same. * addrspacecast round-trip cast: Must preserve all bits because we optimize such round-trip casts away. * non-interposable global alias: I'm a bit unsure about this one, but I guess the alias and the aliasee must have the same non-address bits? * various intrinsics like launder.invariant.group, ptrmask. I think these all either preserve all pointer bits (like the invariant.group ones) or at least the non-address bits (like ptrmask). There are some interesting cases like amdgcn.make.buffer.rsrc, but those are cross address-space. ----- There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform in this function, which I am not extending to handle ptrtoaddr, adding negative tests instead. This transform is overall dubious for provenance reasons, but especially dubious with ptrtoaddr, as then we don't have the guarantee that provenance of `p` has been exposed. * [Hexagon] Add REQUIRES: asserts to test This test uses -debug-only, so needs an assertion-enabled build. * [AArch64] Combing scalar_to_reg into DUP if the DUP already exists (llvm#160499) If we already have a dup(x) as part of the DAG along with a scalar_to_vec(x), we can re-use the result of the dup to the scalar_to_vec(x). * [CAS] OnDiskGraphDB - fix MSVC "not all control paths return a value" warnings. NFC. 
(llvm#164369) * Reapply "[libc++] Optimize __hash_table::erase(iterator, iterator)" (llvm#162850) This reapplication fixes the use after free caused by not properly updating the bucket list in one case. Original commit message: Instead of just calling the single element `erase` on every element of the range, we can combine some of the operations in a custom implementation. Specifically, we don't need to search for the previous node or re-link the list every iteration. Removing this unnecessary work results in some nice performance improvements: ``` ----------------------------------------------------------------------------------------------------------------------- Benchmark old new ----------------------------------------------------------------------------------------------------------------------- std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0 457 ns 459 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32 995 ns 626 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024 18196 ns 7995 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192 124722 ns 70125 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0 456 ns 461 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32 1183 ns 769 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024 27827 ns 18614 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192 266681 ns 226107 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0 455 ns 462 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32 996 ns 659 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024 15963 ns 8108 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase 
half the container)/8192 136493 ns 71848 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0 454 ns 455 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32 985 ns 703 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024 16277 ns 9085 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192 125736 ns 82710 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0 457 ns 454 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32 1091 ns 646 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024 17784 ns 7664 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192 127098 ns 72806 ns ``` This reverts commit acc3a62. * [TableGen] List the indices of sub-operands (llvm#163723) Some instances of the `Operand` class used in Tablegen instruction definitions expand to a cluster of multiple operands at the MC layer, such as complex addressing modes involving base + offset + shift, or clusters of operands describing conditional Arm instructions or predicated MVE instructions. There's currently no convenient way for C++ code to know the offset of one of those sub-operands from the start of the cluster: instead it just hard-codes magic numbers like `index+2`, which is hard to read and fragile. This patch adds an extra piece of output to `InstrInfoEmitter` to define those instruction offsets, based on the name of the `Operand` class instance in Tablegen, and the names assigned to the sub-operands in the `MIOperandInfo` field. For example, if target Foo were to define def Bar : Operand { let MIOperandInfo = (ops GPR:$first, i32imm:$second); // ... } then the new constants would be `Foo::SUBOP_Bar_first` and `Foo::SUBOP_Bar_second`, defined as 0 and 1 respectively. 
As an example, I've converted some magic numbers related to the MVE predication operand types (`vpred_n` and its superset `vpred_r`) to use the new named constants in place of the integer literals they previously used. This is more verbose, but also clearer, because it explains why the integer is chosen instead of what its value is. * [lldb] Add bidirectional packetLog to gdbclientutils.py (llvm#162176) While debugging the tests for llvm#155000 I found it helpful to have both sides of the simulated gdb-rsp traffic rather than just the responses so I've extended the packetLog in MockGDBServerResponder to record traffic in both directions. Tests have been updated accordingly * [MLIR] [Vector] Added canonicalizer for folding from_elements + transpose (llvm#161841) ## Description Adds a new canonicalizer that folds `vector.from_elements(vector.transpose))` => `vector.from_elements`. This canonicalization reorders the input elements for `vector.from_elements`, adjusts the output shape to match the effect of the transpose op and eliminating its need. ## Testing Added a 2D vector lit test that verifies the working of the rewrite. --------- Signed-off-by: Keshav Vinayak Jha <[email protected]> * [DA] Add initial support for monotonicity check (llvm#162280) The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of *monotonicity* and a validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories: - Unknown: Nothing is known about the monotonicity of the SCEV. - Invariant: The SCEV is loop-invariant. 
- MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity.

The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient.

Split off from llvm#154527.

* [VPlan] Use VPlan::getRegion to shorten code (NFC) (llvm#164287)

* [VPlan] Improve code using m_APInt (NFC) (llvm#161683)

* [SystemZ] Avoid trunc(add(X,X)) patterns (llvm#164378)

Replace with trunc(add(X,Y)) to avoid premature folding in upcoming patch llvm#164227

* [clang][CodeGen] Emit `llvm.tbaa.errno` metadata during module creation

Let Clang emit `llvm.tbaa.errno` metadata so that LLVM can carry out optimizations around errno-writing libcalls, as long as it is proved that the involved memory location does not alias `errno`.

Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.

* [LV][NFC] Remove undef from phi incoming values (llvm#163762)

Split off from PR llvm#163525, this standalone patch replaces use of undef as incoming PHI values with zero, in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github.

* [DA] Add option to enable specific dependence test only (llvm#164245)

PR llvm#157084 added an option `da-run-siv-routines-only` to run only SIV routines in DA. This PR replaces that option with a more fine-grained one that also allows selecting routines other than SIV. This option is useful for regression testing of individual DA routines. This patch also reorganizes regression tests that use `da-run-siv-routines-only`.

* [libcxx] Optimize `std::generate_n` for segmented iterators (llvm#164266)

Part of llvm#102817.
This is a natural follow-up to llvm#163006. We are forwarding `std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs C++17), resulting in improved performance for segmented iterators.

before:

```
std::generate_n(deque<int>)/32 17.5 ns 17.3 ns 40727273
std::generate_n(deque<int>)/50 25.7 ns 25.5 ns 26352941
std::generate_n(deque<int>)/1024 490 ns 487 ns 1445161
std::generate_n(deque<int>)/8192 3908 ns 3924 ns 179200
```

after:

```
std::generate_n(deque<int>)/32 11.1 ns 11.0 ns 64000000
std::generate_n(deque<int>)/50 16.1 ns 16.0 ns 44800000
std::generate_n(deque<int>)/1024 291 ns 292 ns 2357895
std::generate_n(deque<int>)/8192 2269 ns 2250 ns 298667
```

* [BOLT] Check entry point address is not in constant island (llvm#163418)

There are cases where `addEntryPointAtOffset` is called with a given `Offset` that points to an address within a constant island. This triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT to crash. This patch adds a check which ignores functions that would add such entry points and warns the user.

* [llvm][dwarfdump] Pretty-print DW_AT_language_version (llvm#164222)

In both verbose and non-verbose mode we will now use the `llvm::dwarf::LanguageDescription` to turn the version into a human readable string. In verbose mode we also display the raw version code (similar to how we display addresses in verbose mode). To make the version code and prettified easier to distinguish, we print the prettified name in colour (if available), which is consistent with how `DW_AT_language` is printed in colour.
Before:

```
0x0000000c: DW_TAG_compile_unit
              DW_AT_language_name (DW_LNAME_C)
              DW_AT_language_version (201112)
```

After:

```
0x0000000c: DW_TAG_compile_unit
              DW_AT_language_name (DW_LNAME_C)
              DW_AT_language_version (201112 C11)
```

---------

Signed-off-by: Michał Górny <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Co-authored-by: Michał Górny <[email protected]>
Co-authored-by: Stanislav Mekhanoshin <[email protected]>
Co-authored-by: Pierre van Houtryve <[email protected]>
Co-authored-by: Nikita Popov <[email protected]>
Co-authored-by: David Green <[email protected]>
Co-authored-by: Simon Pilgrim <[email protected]>
Co-authored-by: Nikolas Klauser <[email protected]>
Co-authored-by: Simon Tatham <[email protected]>
Co-authored-by: Daniel Sanders <[email protected]>
Co-authored-by: Keshav Vinayak Jha <[email protected]>
Co-authored-by: Ryotaro Kasuga <[email protected]>
Co-authored-by: Ramkumar Ramachandra <[email protected]>
Co-authored-by: Antonio Frighetto <[email protected]>
Co-authored-by: David Sherwood <[email protected]>
Co-authored-by: Connector Switch <[email protected]>
Co-authored-by: Asher Dobrescu <[email protected]>
Co-authored-by: Michael Buch <[email protected]>
Support for multi-image features has begun to be integrated into LLVM. A new dialect which simplifies lowering to PRIF will be proposed in this PR.
The initial definition of this dialect (MIF) is based only on operations already upstreamed in LLVM, and the current lowering will be moved to this dialect.
Features like TEAMs and the use of coarrays will follow later in other PRs.
Any feedback regarding what is proposed is welcome.