Commit 645776b

Merge branch 'main' into bundle-break-phyreg-liveness
2 parents 77a7133 + 4903c11 commit 645776b

49 files changed: +3441 -798 lines changed

clang/docs/HIPSupport.rst

Lines changed: 7 additions & 7 deletions
@@ -17,7 +17,7 @@
 HIP Support
 =============

-HIP (Heterogeneous-Compute Interface for Portability) `<https://github.com/ROCm-Developer-Tools/HIP>`_ is
+HIP (Heterogeneous-Compute Interface for Portability) `<https://github.com/ROCm/HIP>`_ is
 a C++ Runtime API and Kernel Language. It enables developers to create portable applications for
 offloading computation to different hardware platforms from a single source code.

@@ -41,9 +41,9 @@ backend or the out-of-tree LLVM-SPIRV translator. The SPIR-V is then bundled and
 .. note::
 While Clang does not directly provide HIP support for NVIDIA GPUs and CPUs, these platforms are supported via other means:

-- NVIDIA GPUs: HIP support is offered through the HIP project `<https://github.com/ROCm-Developer-Tools/HIP>`_, which provides a header-only library for translating HIP runtime APIs into CUDA runtime APIs. The code is subsequently compiled using NVIDIA's `nvcc`.
+- NVIDIA GPUs: HIP support is offered through the HIP project `<https://github.com/ROCm/HIP>`_, which provides a header-only library for translating HIP runtime APIs into CUDA runtime APIs. The code is subsequently compiled using NVIDIA's `nvcc`.

-- CPUs: HIP support is available through the HIP-CPU runtime library `<https://github.com/ROCm-Developer-Tools/HIP-CPU>`_. This header-only library enables CPUs to execute unmodified HIP code.
+- CPUs: HIP support is available through the HIP-CPU runtime library `<https://github.com/ROCm/HIP-CPU>`_. This header-only library enables CPUs to execute unmodified HIP code.


 Example Usage
@@ -328,7 +328,7 @@ The `parallel_unsequenced_policy <https://en.cppreference.com/w/cpp/algorithm/ex
 maps relatively well to the execution model of AMD GPUs. This, coupled with the
 the availability and maturity of GPU accelerated algorithm libraries that
 implement most / all corresponding algorithms in the standard library
-(e.g. `rocThrust <https://github.com/ROCmSoftwarePlatform/rocThrust>`__), makes
+(e.g. `rocThrust <https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocthrust>`__), makes
 it feasible to provide seamless accelerator offload for supported algorithms,
 when an accelerated version exists. Thus, it becomes possible to easily access
 the computational resources of an AMD accelerator, via a well specified,
@@ -483,7 +483,7 @@ such as GPUs, work.
 allocation / deallocation functions with accelerator-aware equivalents,
 based on a pre-established table; the list of functions that can be
 interposed is available
-`here <https://github.com/ROCmSoftwarePlatform/roc-stdpar#allocation--deallocation-interposition-status>`__;
+`here <https://github.com/ROCm/roc-stdpar#allocation--deallocation-interposition-status>`__;
 - This is only run when compiling for the host.

 The second pass is optional.
@@ -627,7 +627,7 @@ Linux operating system. Support is synthesised in the following table:
 The minimum Linux kernel version for running in HMM mode is 6.4.

 The forwarding header can be obtained from
-`its GitHub repository <https://github.com/ROCmSoftwarePlatform/roc-stdpar>`_.
+`its GitHub repository <https://github.com/ROCm/roc-stdpar>`_.
 It will be packaged with a future `ROCm <https://rocm.docs.amd.com/en/latest/>`_
 release. Because accelerated algorithms are provided via
 `rocThrust <https://rocm.docs.amd.com/projects/rocThrust/en/latest/>`_, a
@@ -636,7 +636,7 @@ transitive dependency on
 can be obtained either by installing their associated components of the
 `ROCm <https://rocm.docs.amd.com/en/latest/>`_ stack, or from their respective
 repositories. The list algorithms that can be offloaded is available
-`here <https://github.com/ROCmSoftwarePlatform/roc-stdpar#algorithm-support-status>`_.
+`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_.

 HIP Specific Elements
 ---------------------

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

Lines changed: 1 addition & 1 deletion
@@ -311,7 +311,7 @@ Error relocateOffloadSection(const ArgList &Args, StringRef Output) {
   ObjcopyArgs.emplace_back("--remove-section");
   ObjcopyArgs.emplace_back(".llvm.offloading");
   StringRef Prefix = "llvm";
-  auto Section = (Prefix + "llvm_offload_entries").str();
+  auto Section = (Prefix + "_offload_entries").str();
   // Rename the offloading entires to make them private to this link unit.
   ObjcopyArgs.emplace_back("--rename-section");
   ObjcopyArgs.emplace_back(
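
To make the effect of this one-line change concrete, here is a minimal sketch of the concatenation using plain `std::string` instead of LLVM's `StringRef`/`Twine`. The helper `sectionName` is hypothetical and exists only for illustration; it is not part of the linker wrapper.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the (Prefix + Suffix).str() expression in the
// hunk above; plain std::string is used instead of llvm::Twine.
std::string sectionName(const std::string &prefix, const std::string &suffix) {
  return prefix + suffix;
}

// Before the change the prefix was effectively applied twice.
inline std::string oldSectionName() {
  return sectionName("llvm", "llvm_offload_entries");
}

// After the change the prefix and suffix join into a single name.
inline std::string newSectionName() {
  return sectionName("llvm", "_offload_entries");
}
```

With the prefix set to `"llvm"`, the old expression produces `llvmllvm_offload_entries`, while the new one produces `llvm_offload_entries`.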

compiler-rt/lib/fuzzer/FuzzerDataFlowTrace.cpp

Lines changed: 0 additions & 2 deletions
@@ -265,8 +265,6 @@ int CollectDataFlow(const std::string &DFTBinary, const std::string &DirPath,
     // we then request tags in [0,Size/2) and [Size/2, Size), and so on.
     // Function number => DFT.
     auto OutPath = DirPlusFile(DirPath, Hash(FileToVector(F.File)));
-    std::unordered_map<size_t, std::vector<uint8_t>> DFTMap;
-    std::unordered_set<std::string> Cov;
     Command Cmd;
     Cmd.addArgument(DFTBinary);
     Cmd.addArgument(F.File);

flang-rt/lib/cuda/descriptor.cpp

Lines changed: 8 additions & 0 deletions
@@ -54,6 +54,14 @@ void RTDEF(CUFSyncGlobalDescriptor)(
       ((Descriptor *)devAddr, (Descriptor *)hostPtr, sourceFile, sourceLine);
 }

+void RTDEF(CUFDescriptorCheckSection)(
+    const Descriptor *desc, const char *sourceFile, int sourceLine) {
+  if (desc && !desc->IsContiguous()) {
+    Terminator terminator{sourceFile, sourceLine};
+    terminator.Crash("device array section argument is not contiguous");
+  }
+}
+
 RT_EXT_API_GROUP_END
 }
 } // namespace Fortran::runtime::cuda
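
The new runtime entry point relies on `Descriptor::IsContiguous()`, whose implementation is not part of this diff. As a rough illustration of what such a check usually means for a column-major array, the sketch below uses a hypothetical `ToyDescriptor` with per-dimension extents and byte strides. The types and the function `isContiguous` are assumptions made for this example and are not the flang runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one array dimension.
struct Dim {
  std::size_t extent;     // number of elements along this dimension
  std::size_t byteStride; // distance in bytes between consecutive elements
};

// Hypothetical, simplified stand-in for a Fortran array descriptor.
struct ToyDescriptor {
  std::size_t elemBytes; // size of one element in bytes
  std::vector<Dim> dims; // dimensions, leading (fastest-varying) first
};

// Returns true when the elements occupy one gap-free block of memory in
// column-major order. This is an assumed definition for illustration only.
bool isContiguous(const ToyDescriptor &d) {
  // An array with a zero extent holds no elements, so treat it as contiguous.
  for (const Dim &dim : d.dims)
    if (dim.extent == 0)
      return true;
  std::size_t expected = d.elemBytes;
  for (const Dim &dim : d.dims) {
    // The stride is irrelevant when there is only one element to step over.
    if (dim.extent > 1 && dim.byteStride != expected)
      return false;
    expected *= dim.extent;
  }
  return true;
}

// Whole 10x10 array of 4-byte reals: strides 4 and 40 bytes.
inline ToyDescriptor wholeArray() { return ToyDescriptor{4, {{10, 4}, {10, 40}}}; }

// Section a(1:10,1:10:2): 10x5 elements, second stride doubled to 80 bytes.
inline ToyDescriptor stridedSection() { return ToyDescriptor{4, {{10, 4}, {5, 80}}}; }
```

Under these assumptions the whole array is reported as contiguous, while the section `a(1:10,1:10:2)` used by the test in this commit is not, which is the situation in which `CUFDescriptorCheckSection` terminates the program.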

flang/include/flang/Lower/LoweringOptions.def

Lines changed: 3 additions & 0 deletions
@@ -63,5 +63,8 @@ ENUM_LOWERINGOPT(StackRepackArrays, unsigned, 1, 0)
 /// in the leading dimension.
 ENUM_LOWERINGOPT(RepackArraysWhole, unsigned, 1, 0)

+/// If true, CUDA Fortran runtime check is inserted.
+ENUM_LOWERINGOPT(CUDARuntimeCheck, unsigned, 1, 0)
+
 #undef LOWERINGOPT
 #undef ENUM_LOWERINGOPT

flang/include/flang/Optimizer/Builder/Runtime/CUDA/Descriptor.h

Lines changed: 5 additions & 0 deletions
@@ -26,6 +26,11 @@ namespace fir::runtime::cuda {
 void genSyncGlobalDescriptor(fir::FirOpBuilder &builder, mlir::Location loc,
                              mlir::Value hostPtr);

+/// Generate runtime call to check the section of a descriptor and raise an
+/// error if it is not contiguous.
+void genDescriptorCheckSection(fir::FirOpBuilder &builder, mlir::Location loc,
+                               mlir::Value desc);
+
 } // namespace fir::runtime::cuda

 #endif // FORTRAN_OPTIMIZER_BUILDER_RUNTIME_CUDA_DESCRIPTOR_H_

flang/include/flang/Runtime/CUDA/descriptor.h

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,10 @@ void RTDECL(CUFDescriptorSync)(Descriptor *dst, const Descriptor *src,
 void RTDECL(CUFSyncGlobalDescriptor)(
     void *hostPtr, const char *sourceFile = nullptr, int sourceLine = 0);

+/// Check descriptor passed to a kernel.
+void RTDECL(CUFDescriptorCheckSection)(
+    const Descriptor *, const char *sourceFile = nullptr, int sourceLine = 0);
+
 } // extern "C"

 } // namespace Fortran::runtime::cuda

flang/lib/Lower/ConvertCall.cpp

Lines changed: 14 additions & 0 deletions
@@ -26,6 +26,7 @@
 #include "flang/Optimizer/Builder/IntrinsicCall.h"
 #include "flang/Optimizer/Builder/LowLevelIntrinsics.h"
 #include "flang/Optimizer/Builder/MutableBox.h"
+#include "flang/Optimizer/Builder/Runtime/CUDA/Descriptor.h"
 #include "flang/Optimizer/Builder/Runtime/Derived.h"
 #include "flang/Optimizer/Builder/Todo.h"
 #include "flang/Optimizer/Dialect/CUF/CUFOps.h"
@@ -543,6 +544,19 @@ Fortran::lower::genCallOpAndResult(
   fir::FortranProcedureFlagsEnumAttr procAttrs =
       caller.getProcedureAttrs(builder.getContext());

+  if (converter.getLoweringOptions().getCUDARuntimeCheck()) {
+    if (caller.getCallDescription().chevrons().empty()) {
+      for (auto [oper, arg] :
+           llvm::zip(operands, caller.getPassedArguments())) {
+        if (auto boxTy = mlir::dyn_cast<fir::BaseBoxType>(oper.getType())) {
+          const Fortran::semantics::Symbol *sym = caller.getDummySymbol(arg);
+          if (sym && Fortran::evaluate::IsCUDADeviceSymbol(*sym))
+            fir::runtime::cuda::genDescriptorCheckSection(builder, loc, oper);
+        }
+      }
+    }
+  }
+
   if (!caller.getCallDescription().chevrons().empty()) {
     // A call to a CUDA kernel with the chevron syntax.


flang/lib/Optimizer/Builder/Runtime/CUDA/Descriptor.cpp

Lines changed: 15 additions & 0 deletions
@@ -32,3 +32,18 @@ void fir::runtime::cuda::genSyncGlobalDescriptor(fir::FirOpBuilder &builder,
       builder, loc, fTy, hostPtr, sourceFile, sourceLine)};
   builder.create<fir::CallOp>(loc, callee, args);
 }
+
+void fir::runtime::cuda::genDescriptorCheckSection(fir::FirOpBuilder &builder,
+                                                   mlir::Location loc,
+                                                   mlir::Value desc) {
+  mlir::func::FuncOp func =
+      fir::runtime::getRuntimeFunc<mkRTKey(CUFDescriptorCheckSection)>(loc,
+                                                                       builder);
+  auto fTy = func.getFunctionType();
+  mlir::Value sourceFile = fir::factory::locationToFilename(builder, loc);
+  mlir::Value sourceLine =
+      fir::factory::locationToLineNo(builder, loc, fTy.getInput(2));
+  llvm::SmallVector<mlir::Value> args{fir::runtime::createArguments(
+      builder, loc, fTy, desc, sourceFile, sourceLine)};
+  builder.create<fir::CallOp>(loc, func, args);
+}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+! RUN: bbc -emit-hlfir -fcuda %s -o - | FileCheck %s
+
+! Check insertion of runtime checks
+
+interface
+  subroutine foo(a)
+    real, device, dimension(:,:) :: a
+  end subroutine
+end interface
+
+  real, device, allocatable, dimension(:,:) :: a
+  allocate(a(10,10))
+  call foo(a(1:10,1:10:2))
+end
+
+subroutine foo(a)
+  real, device, dimension(:,:) :: a
+end subroutine
+
+! CHECK-LABEL: func.func @_QQmain()
+! CHECK: fir.call @_FortranACUFDescriptorCheckSection
+! CHECK: fir.call @_QPfoo
