Closed
2 changes: 2 additions & 0 deletions flang-rt/lib/runtime/CMakeLists.txt
@@ -21,6 +21,7 @@ set(supported_sources
allocatable.cpp
array-constructor.cpp
assign.cpp
assign_omp.cpp
Contributor

I'm not sure if it's a good idea to make the Fortran runtime depend on the OpenMP runtime library. I think it makes more sense to have this routine live in the OpenMP offload runtime as a potentially generic API entry point.

Member

Fortran runtime function implementations already exist in the OpenMP runtime library (see https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_ftn_entry.h). If the routine is offload-only, it could also live in libomptarget.

Contributor Author

Context:
Assign_omp just performs an omp_target_memcpy between two device pointers.
void RTDEF(Assign_omp)(Descriptor &to, const Descriptor &from, const char *sourceFile, int sourceLine, omp::OMPDeviceTy omp_device)

This API is required when hoisting "fir.call @_FortranAAssign(...)" from omp.target to the host in the "lower-workdistribute" pass (#140523).

The Descriptor struct is defined in flang-rt/runtime/descriptor.h.

Issue:
If I move this implementation to the OpenMP runtime or libomptarget, I won't have access to the flang-rt Descriptor structure there. Is there a way to deal with this?

Probable solution:
Instead of creating a new runtime API, maybe call "omp_target_memcpy" directly by extracting the pointers from the Descriptor structure at the MLIR stage? I need to check whether this is possible.

Contributor

I see two options:

  • Use the CFI of Fortran to access the descriptor. It will sort of tie the OpenMP runtime to the Fortran runtime, but it will do so in an ISO-conforming way.
  • Unpack the descriptor in the code-gen for the assign operation. It might be prudent to provide an accessor function for getting the pointers and size information from the descriptor.
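
To make the second option concrete, here is a minimal sketch of what such accessors could look like. It is not the flang-rt API: `DescriptorSketch`, `DimSketch`, `sketchBaseAddr`, `sketchElementCount`, and `sketchByteSize` are hypothetical names, and the struct only borrows its field names from `CFI_cdesc_t` in ISO_Fortran_binding.h. It assumes non-negative extents (no assumed-size arrays).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a CFI-style dimension triple.
struct DimSketch {
  std::ptrdiff_t lower_bound;
  std::ptrdiff_t extent; // number of elements along this dimension
  std::ptrdiff_t sm;     // byte stride between successive elements
};

// Hypothetical stand-in for a descriptor. The real CFI_cdesc_t has more
// fields and a variable-length dim[]; a fixed array keeps the sketch simple.
struct DescriptorSketch {
  void *base_addr;
  std::size_t elem_len;
  int rank;
  DimSketch dim[15];
};

// Accessor: base address of the described data.
void *sketchBaseAddr(const DescriptorSketch &d) { return d.base_addr; }

// Accessor: total number of elements (1 for a rank-0 scalar).
std::size_t sketchElementCount(const DescriptorSketch &d) {
  assert(d.rank >= 0 && d.rank <= 15);
  std::size_t count = 1;
  for (int i = 0; i < d.rank; ++i) {
    assert(d.dim[i].extent >= 0); // assumption: no assumed-size extents
    count *= static_cast<std::size_t>(d.dim[i].extent);
  }
  return count;
}

// Accessor: number of bytes a contiguous copy of the data would need.
std::size_t sketchByteSize(const DescriptorSketch &d) {
  return sketchElementCount(d) * d.elem_len;
}
```

With accessors of this shape, the generated code would only need a pointer and a byte count per operand, so the copy itself would not need to know the descriptor layout.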

Contributor Author

Thanks for the suggestions, @mjklemm.
I have tried the second approach, adding Descriptor accessors in flang-rt. Draft PR #152756 is in progress.

I am also working on adding an API to the OpenMP runtime that performs omp_target_memcpy between two device pointers.
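
As a rough illustration of such an entry point (not the API from either PR), the sketch below wraps a same-device copy. `copyOnDeviceSketch`, `TargetMemcpyFn`, and `hostMemcpyStub` are hypothetical names. The function-pointer type mirrors the parameter list of `omp_target_memcpy` from `<omp.h>`, so the sketch builds without an offload toolchain; in a real build one would presumably pass `omp_target_memcpy` itself.

```cpp
#include <cstddef>
#include <cstring>

// Same parameter list as omp_target_memcpy: destination, source, length,
// destination offset, source offset, destination device, source device.
using TargetMemcpyFn = int (*)(void *dst, const void *src, std::size_t length,
    std::size_t dstOffset, std::size_t srcOffset, int dstDevice,
    int srcDevice);

// Hypothetical helper: copy `bytes` bytes between two pointers that both
// live on `device`. Returns true on success; omp_target_memcpy reports
// success by returning 0.
bool copyOnDeviceSketch(TargetMemcpyFn memcpyFn, void *dst, const void *src,
    std::size_t bytes, int device) {
  if (bytes == 0)
    return true; // nothing to copy
  if (dst == nullptr || src == nullptr)
    return false;
  return memcpyFn(dst, src, bytes, /*dstOffset=*/0, /*srcOffset=*/0, device,
             device) == 0;
}

// Host-only stand-in so the helper can be exercised without a device.
int hostMemcpyStub(void *dst, const void *src, std::size_t length,
    std::size_t dstOffset, std::size_t srcOffset, int /*dstDevice*/,
    int /*srcDevice*/) {
  std::memcpy(static_cast<char *>(dst) + dstOffset,
      static_cast<const char *>(src) + srcOffset, length);
  return 0;
}
```

Because the helper takes only raw pointers and a byte count, it has no dependency on the Fortran descriptor type, which is the separation the review thread asks for.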

buffer.cpp
character.cpp
connection.cpp
@@ -99,6 +100,7 @@ set(gpu_sources
allocatable.cpp
array-constructor.cpp
assign.cpp
assign_omp.cpp
buffer.cpp
character.cpp
connection.cpp
77 changes: 77 additions & 0 deletions flang-rt/lib/runtime/assign_omp.cpp
@@ -0,0 +1,77 @@
//===-- lib/runtime/assign_omp.cpp ----------------------------------*- C++
//-*-===//
Comment on lines +1 to +2
Contributor

Formatting: this header comment should fit on a single line.

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "flang-rt/runtime/assign-impl.h"
#include "flang-rt/runtime/derived.h"
#include "flang-rt/runtime/descriptor.h"
#include "flang-rt/runtime/stat.h"
#include "flang-rt/runtime/terminator.h"
#include "flang-rt/runtime/tools.h"
#include "flang-rt/runtime/type-info.h"
#include "flang/Runtime/assign.h"

#include <omp.h>

namespace Fortran::runtime {
namespace omp {

typedef int32_t OMPDeviceTy;

template <typename T> static T *getDevicePtr(T *anyPtr, OMPDeviceTy ompDevice) {
Member

nit: for easier readability:

Suggested change
- template <typename T> static T *getDevicePtr(T *anyPtr, OMPDeviceTy ompDevice) {
+ template <typename T> static T *getDevicePtr(T *hostPtr, OMPDeviceTy ompDevice) {

auto voidAnyPtr = reinterpret_cast<void *>(anyPtr);
// If not present on the device it should already be a device ptr
if (!omp_target_is_present(voidAnyPtr, ompDevice))
return anyPtr;
T *device_ptr = omp_get_mapped_ptr(anyPtr, ompDevice);
Member

nit: use the same naming style for all variables: devicePtr.

return device_ptr;
}

RT_API_ATTRS static void Assign(Descriptor &to, const Descriptor &from,
Terminator &terminator, int flags, OMPDeviceTy omp_device) {
Member

flags is not used. Do we need it here?

std::size_t toElementBytes{to.ElementBytes()};
std::size_t fromElementBytes{from.ElementBytes()};
std::size_t toElements{to.Elements()};
std::size_t fromElements{from.Elements()};
Contributor

You also want to check that the descriptors are contiguous. Two descriptors can have the same number of elements but different strides.

Contributor

If you also want it to work on non-contiguous descriptors, the Assign function has a mechanism for passing in the memmove function to use.
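
For illustration, a contiguity check over extents and byte strides could look like the sketch below. `DimSketch` and `isContiguousSketch` are hypothetical names, and the code is not taken from flang-rt, whose `Descriptor` class has its own `IsContiguous()` query. The idea is that each dimension's byte stride must equal the total size of all faster-varying dimensions, with extent-1 dimensions and empty arrays treated as contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-dimension view: element count and byte stride.
struct DimSketch {
  std::ptrdiff_t extent;
  std::ptrdiff_t byteStride;
};

// Returns true when the elements occupy one dense block of memory, so a
// single flat copy of (element count * elementBytes) bytes is valid.
bool isContiguousSketch(
    std::size_t elementBytes, const std::vector<DimSketch> &dims) {
  std::ptrdiff_t expectedStride = static_cast<std::ptrdiff_t>(elementBytes);
  bool stridesMatch = true;
  for (const DimSketch &dim : dims) {
    assert(dim.extent >= 0);
    // A dimension of extent 1 never steps, so its stride is irrelevant.
    if (dim.extent != 1 && dim.byteStride != expectedStride)
      stridesMatch = false;
    expectedStride *= dim.extent;
  }
  // An array with no elements is contiguous whatever its strides are.
  return stridesMatch || expectedStride == 0;
}
```

A 2x3 array of 4-byte elements with byte strides 4 and 8 passes this check, while a 2x3 section with strides 8 and 16 has the same element count but fails it, which is exactly the case a flat memcpy would get wrong.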


if (toElementBytes != fromElementBytes)
terminator.Crash("Assign: toElementBytes != fromElementBytes");
Member

Logging the number of element bytes (and elements below) in the crash might be helpful when debugging.
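
A small sketch of what a more informative message could contain is shown below. `mismatchMessageSketch` is a hypothetical helper, not part of flang-rt. Since `Terminator::Crash` accepts printf-style format arguments elsewhere in the runtime, the same format string could probably be passed to it directly instead of being built first.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: build a diagnostic that names the mismatching
// quantity and reports both values.
std::string mismatchMessageSketch(
    const char *what, std::size_t toValue, std::size_t fromValue) {
  char buffer[128];
  std::snprintf(buffer, sizeof buffer,
      "Assign_omp: %s mismatch: to=%zu, from=%zu", what, toValue, fromValue);
  return std::string(buffer);
}
```

For example, `mismatchMessageSketch("element bytes", 4, 8)` produces "Assign_omp: element bytes mismatch: to=4, from=8".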

if (toElements != fromElements)
terminator.Crash("Assign: toElements != fromElements");

// Get base addresses and calculate length
void *to_base = to.raw().base_addr;
Member

nit: Naming style is different in this function as well.

void *from_base = from.raw().base_addr;
size_t length = toElements * toElementBytes;

// Get device pointers after ensuring data is on device
void *to_ptr = getDevicePtr(to_base, omp_device);
void *from_ptr = getDevicePtr(from_base, omp_device);

// Perform copy between device pointers
int result = omp_target_memcpy(to_ptr, from_ptr, length,
/*dst_offset*/ 0, /*src_offset*/ 0, omp_device, omp_device);

if (result != 0)
terminator.Crash("Assign: omp_target_memcpy failed");
return;
}

extern "C" {
RT_EXT_API_GROUP_BEGIN
void RTDEF(Assign_omp)(Descriptor &to, const Descriptor &from,
const char *sourceFile, int sourceLine, omp::OMPDeviceTy omp_device) {
Terminator terminator{sourceFile, sourceLine};
Fortran::runtime::omp::Assign(to, from, terminator,
MaybeReallocate | NeedFinalization | ComponentCanBeDefinedAssignment,
omp_device);
}

} // extern "C"
} // namespace omp
} // namespace Fortran::runtime
3 changes: 3 additions & 0 deletions flang/include/flang/Runtime/assign.h
@@ -56,6 +56,9 @@ extern "C" {
// API for lowering assignment
void RTDECL(Assign)(Descriptor &to, const Descriptor &from,
const char *sourceFile = nullptr, int sourceLine = 0);
void RTDECL(Assign_omp)(Descriptor &to, const Descriptor &from,
const char *sourceFile = nullptr, int sourceLine = 0,
int32_t omp_device = 0);
// This variant has no finalization, defined assignment, or allocatable
// reallocation.
void RTDECL(AssignTemporary)(Descriptor &to, const Descriptor &from,