Skip to content

Commit a44f116

Browse files
[SYCL RTC] Introduce --auto-pch support (#20226)
Compilation of `#include <sycl/sycl.hpp>` is slow and that's especially problematic for SYCL RTC (run-time compilation). One way to overcome this is fine-grained includes that are being pursued separately. Another way is to employ clang's precompiled headers support which this PR is doing. Those two approaches can be combined, and this PR adds `test-e2e/PerformanceTests/KernelCompiler/auto-pch.cpp` that gives some idea of the PCH impact. The test shows PCH benefits when compiling some of the fine-grained includes on top of absolute minimum required to compiled SYCL RTC's "Hello world". From one of the CI runs: | Extra Headers | Without PCH | With auto-PCH | -|-|- | <none> | 176ms 137ms 136ms 136ms 136ms | 226ms 64ms 64ms 64ms 64ms | sycl/half_type.hpp | 165ms 165ms 165ms 165ms 165ms | 267ms 71ms 72ms 72ms 72ms | sycl/ext/oneapi/bfloat16.hpp | 174ms 173ms 173ms 173ms 173ms | 279ms 76ms 73ms 73ms 74ms | sycl/marray.hpp | 142ms 143ms 142ms 142ms 143ms | 235ms 66ms 66ms 66ms 66ms | sycl/vector.hpp | 296ms 290ms 290ms 290ms 290ms | 487ms 124ms 125ms 125ms 125ms | sycl/multi_ptr.hpp | 278ms 278ms 276ms 275ms 274ms | 441ms 125ms 125ms 125ms 125ms | sycl/builtins.hpp | 537ms 533ms 531ms 531ms 531ms | 883ms 218ms 218ms 219ms 218ms It misses `sycl/sycl.hpp` line because that currently crashes FE when reading the generated PCH, the crash is being investigated/fixed separately. Implementation-wise I'm reusing existing upstream `clang::PrecompiledPreamble` with one minor modification. It seems that `PrecompiledPreamble`'s main usage is for things like `clangd` so it ignores errors in the code. I've modified it so that those errors would break pch-generation the same way normal compilation would break. I'm also not sure if we'd want that long-term, because it seems that making such "auto-pch" persistent would deviate from the upstream version of `PrecompiledPreamble` even more. I can imagine that in some near future we'd need to "fork" it into a separate utility. Still, seems to be fine for the first step. Driver modifications are for the `--auto-pch` option support that should only be present on the SYCL RTC path and not for the regular `clang` invocations from the command line. I'm relatively confident those will stay in future.
1 parent 82f43fa commit a44f116

File tree

13 files changed

+649
-17
lines changed

13 files changed

+649
-17
lines changed

clang/include/clang/Driver/Options.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@ enum ClangVisibility {
4141
FlangOption = (1 << 4),
4242
FC1Option = (1 << 5),
4343
DXCOption = (1 << 6),
44+
SYCLRTCOnlyOption = (1 << 7),
4445
};
4546

4647
enum ID {

clang/include/clang/Driver/Options.td

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,9 @@ def FC1Option : OptionVisibility;
107107
// are made available when the driver is running in DXC compatibility mode.
108108
def DXCOption : OptionVisibility;
109109

110+
// SYCLRTCOnlyOption - only acceptable for the SYCL RTC (Run Time Compilation).
111+
def SYCLRTCOnlyOption : OptionVisibility;
112+
110113
/////////
111114
// Docs
112115

@@ -195,6 +198,11 @@ def sycl_Group : OptionGroup<"<SYCL group>">, Group<f_Group>,
195198
DocName<"SYCL options">,
196199
Visibility<[ClangOption, CLOption]>;
197200

201+
def sycl_rtc_only_Group : OptionGroup<"<SYCL RTC only group">,
202+
Group<f_Group>,
203+
DocName<"SYCL RTC specific options">,
204+
Visibility<[SYCLRTCOnlyOption]>;
205+
198206
def cuda_Group : OptionGroup<"<CUDA group>">, Group<f_Group>,
199207
DocName<"CUDA options">,
200208
Visibility<[ClangOption, CLOption]>;
@@ -7543,6 +7551,15 @@ def fsyclbin : Flag<["-"], "fsyclbin">, Alias<fsyclbin_EQ>,
75437551
AliasArgs<["executable"]>;
75447552
} // let Group = sycl_Group
75457553

7554+
// Options specific to the SYCL RTC and only available for JIT compilation (not
7555+
// through regular `clang++ -fsycl` in command line):
7556+
let Visibility = [SYCLRTCOnlyOption] in {
7557+
let Group = sycl_rtc_only_Group in {
7558+
def auto_pch : Flag<["--"], "auto-pch">,
7559+
HelpText<"Enable Auto-PCH for SYCL RTC Compilation">;
7560+
} // let Group = sycl_rtc_only_Group
7561+
} // let Visibility = [SYCLRTCOnlyOption]
7562+
75467563
// FIXME: -fsycl-explicit-simd is deprecated. remove it when support is dropped.
75477564
def : Flag<["-"], "fsycl-explicit-simd">, Flags<[Deprecated]>,
75487565
Group<clang_ignored_legacy_options_Group>,

clang/include/clang/Frontend/PrecompiledPreamble.h

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -87,8 +87,8 @@ class PrecompiledPreamble {
8787
IntrusiveRefCntPtr<DiagnosticsEngine> Diagnostics,
8888
IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
8989
std::shared_ptr<PCHContainerOperations> PCHContainerOps,
90-
bool StoreInMemory, StringRef StoragePath,
91-
PreambleCallbacks &Callbacks);
90+
bool StoreInMemory, StringRef StoragePath, PreambleCallbacks &Callbacks,
91+
bool AllowASTWithErrors = true);
9292

9393
PrecompiledPreamble(PrecompiledPreamble &&);
9494
PrecompiledPreamble &operator=(PrecompiledPreamble &&);

clang/lib/Frontend/PrecompiledPreamble.cpp

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -247,9 +247,10 @@ class TempPCHFile {
247247
class PrecompilePreambleAction : public ASTFrontendAction {
248248
public:
249249
PrecompilePreambleAction(std::shared_ptr<PCHBuffer> Buffer, bool WritePCHFile,
250-
PreambleCallbacks &Callbacks)
250+
PreambleCallbacks &Callbacks,
251+
bool AllowASTWithErrors = true)
251252
: Buffer(std::move(Buffer)), WritePCHFile(WritePCHFile),
252-
Callbacks(Callbacks) {}
253+
Callbacks(Callbacks), AllowASTWithErrors(AllowASTWithErrors) {}
253254

254255
std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
255256
StringRef InFile) override;
@@ -285,17 +286,19 @@ class PrecompilePreambleAction : public ASTFrontendAction {
285286
bool WritePCHFile; // otherwise the PCH is written into the PCHBuffer only.
286287
std::unique_ptr<llvm::raw_pwrite_stream> FileOS; // null if in-memory
287288
PreambleCallbacks &Callbacks;
289+
bool AllowASTWithErrors;
288290
};
289291

290292
class PrecompilePreambleConsumer : public PCHGenerator {
291293
public:
292294
PrecompilePreambleConsumer(PrecompilePreambleAction &Action, Preprocessor &PP,
293295
ModuleCache &ModCache, StringRef isysroot,
294296
std::shared_ptr<PCHBuffer> Buffer,
295-
const CodeGenOptions &CodeGenOpts)
297+
const CodeGenOptions &CodeGenOpts,
298+
bool AllowASTWithErrors = true)
296299
: PCHGenerator(PP, ModCache, "", isysroot, std::move(Buffer), CodeGenOpts,
297300
ArrayRef<std::shared_ptr<ModuleFileExtension>>(),
298-
/*AllowASTWithErrors=*/true),
301+
AllowASTWithErrors),
299302
Action(Action) {}
300303

301304
bool HandleTopLevelDecl(DeclGroupRef DG) override {
@@ -337,7 +340,7 @@ PrecompilePreambleAction::CreateASTConsumer(CompilerInstance &CI,
337340

338341
return std::make_unique<PrecompilePreambleConsumer>(
339342
*this, CI.getPreprocessor(), CI.getModuleCache(), Sysroot, Buffer,
340-
CI.getCodeGenOpts());
343+
CI.getCodeGenOpts(), AllowASTWithErrors);
341344
}
342345

343346
template <class T> bool moveOnNoError(llvm::ErrorOr<T> Val, T &Output) {
@@ -415,7 +418,8 @@ llvm::ErrorOr<PrecompiledPreamble> PrecompiledPreamble::Build(
415418
IntrusiveRefCntPtr<DiagnosticsEngine> Diagnostics,
416419
IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS,
417420
std::shared_ptr<PCHContainerOperations> PCHContainerOps, bool StoreInMemory,
418-
StringRef StoragePath, PreambleCallbacks &Callbacks) {
421+
StringRef StoragePath, PreambleCallbacks &Callbacks,
422+
bool AllowASTWithErrors) {
419423
assert(VFS && "VFS is null");
420424

421425
auto PreambleInvocation = std::make_shared<CompilerInvocation>(Invocation);
@@ -512,7 +516,7 @@ llvm::ErrorOr<PrecompiledPreamble> PrecompiledPreamble::Build(
512516
auto Act = std::make_unique<PrecompilePreambleAction>(
513517
std::move(Buffer),
514518
/*WritePCHFile=*/Storage->getKind() == PCHStorage::Kind::TempFile,
515-
Callbacks);
519+
Callbacks, AllowASTWithErrors);
516520
if (!Act->BeginSourceFile(*Clang, Clang->getFrontendOpts().Inputs[0]))
517521
return BuildPreambleError::BeginSourceFileFailed;
518522

clang/test/Driver/sycl-unsupported.cpp

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,15 @@
6464
// UNSUPPORTED_OPT-NOT: clang{{.*}} "-fsycl-is-device"{{.*}} "[[OPT_CC1]]{{.*}}"
6565
// UNSUPPORTED_OPT: clang{{.*}} "-fsycl-is-host"{{.*}} "[[OPT_CC1]]{{.*}}"
6666

67+
// "--auto-pch" should only be enabled for SYCL RTC compilations, regular driver
68+
// shouldn't know about it:
69+
//
70+
// RUN: not %clangxx -### %s --auto-pch 2>&1 | FileCheck %s --check-prefix AUTO_PCH
71+
// RUN: not %clangxx -fsycl-device-only -### %s --auto-pch 2>&1 | FileCheck %s --check-prefix AUTO_PCH
72+
// RUN: not %clangxx -fsycl -### %s --auto-pch 2>&1 | FileCheck %s --check-prefix AUTO_PCH
73+
//
74+
// AUTO_PCH: error: unknown argument: '--auto-pch'
75+
6776
// FPGA support has been removed, usage of any FPGA specific options and any
6877
// options that have FPGA specific arguments should emit a specific error
6978
// diagnostic.

sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp

Lines changed: 116 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
#include "Resource.h"
1313
#include "translation/Translation.h"
1414

15+
#include "clang/Lex/PreprocessorOptions.h"
1516
#include <clang/Basic/DiagnosticDriver.h>
1617
#include <clang/Basic/Version.h>
1718
#include <clang/CodeGen/CodeGenAction.h>
@@ -25,6 +26,7 @@
2526
#include <clang/Frontend/ChainedDiagnosticConsumer.h>
2627
#include <clang/Frontend/CompilerInstance.h>
2728
#include <clang/Frontend/FrontendActions.h>
29+
#include <clang/Frontend/PrecompiledPreamble.h>
2830
#include <clang/Frontend/TextDiagnosticBuffer.h>
2931
#include <clang/Frontend/TextDiagnosticPrinter.h>
3032
#include <clang/Frontend/Utils.h>
@@ -78,6 +80,12 @@ class SYCLToolchain {
7880
}
7981
}
8082

83+
struct PrecompiledPreambles {
84+
using key = std::pair<std::string /*Opts*/, std::string /*Preamble*/>;
85+
std::mutex Mutex;
86+
std::map<key, std::shared_ptr<PrecompiledPreamble>> PreamblesMap;
87+
};
88+
8189
// Similar to FrontendActionFactory, but we don't take ownership of
8290
// `FrontendAction`, nor do we create copies of it as we only perform a single
8391
// `ToolInvocation`.
@@ -140,9 +148,15 @@ class SYCLToolchain {
140148
}
141149

142150
ArgStringList ASL;
143-
for_each(DAL, [&DAL, &ASL](Arg *A) { A->render(DAL, ASL); });
144-
for_each(UserArgList,
145-
[&UserArgList, &ASL](Arg *A) { A->render(UserArgList, ASL); });
151+
for (Arg *A : DAL)
152+
A->render(DAL, ASL);
153+
for (Arg *A : UserArgList) {
154+
Option Group = A->getOption().getGroup();
155+
if (Group.isValid() && Group.getID() == OPT_sycl_rtc_only_Group)
156+
continue;
157+
158+
A->render(UserArgList, ASL);
159+
}
146160

147161
std::vector<std::string> CommandLine;
148162
CommandLine.reserve(ASL.size() + 2);
@@ -153,6 +167,83 @@ class SYCLToolchain {
153167
return CommandLine;
154168
}
155169

170+
class ActionWithPCHPreamble : public Action {
171+
std::string CmdLineOpts;
172+
173+
public:
174+
ActionWithPCHPreamble(FrontendAction &FEAction, std::string &&CmdLineOpts)
175+
: Action(FEAction), CmdLineOpts(std::move(CmdLineOpts)) {}
176+
177+
bool runInvocation(std::shared_ptr<CompilerInvocation> Invocation,
178+
FileManager *Files,
179+
std::shared_ptr<PCHContainerOperations> PCHContainerOps,
180+
DiagnosticConsumer *DiagConsumer) override {
181+
auto MainFilePath = Invocation->getFrontendOpts().Inputs[0].getFile();
182+
auto MainFileBuffer = Files->getBufferForFile(MainFilePath);
183+
assert(MainFileBuffer && "Can't get memory buffer for in-memory source?");
184+
185+
PreambleBounds Bounds = ComputePreambleBounds(
186+
Invocation->getLangOpts(), **MainFileBuffer, 100 /* MaxLines */);
187+
188+
PrecompiledPreambles::key key{
189+
std::move(CmdLineOpts),
190+
(*MainFileBuffer)->getBuffer().substr(0, Bounds.Size).str()};
191+
192+
std::shared_ptr<PrecompiledPreamble> Preamble;
193+
{
194+
PrecompiledPreambles &Preambles = SYCLToolchain::instance().Preambles;
195+
std::lock_guard<std::mutex> Lock{Preambles.Mutex};
196+
auto [It, Inserted] = Preambles.PreamblesMap.try_emplace(key);
197+
198+
if (Inserted) {
199+
PreambleCallbacks Callbacks;
200+
auto DiagIds = llvm::makeIntrusiveRefCnt<DiagnosticIDs>();
201+
auto DiagOpts = Invocation->getDiagnosticOpts();
202+
auto Diags = llvm::makeIntrusiveRefCnt<DiagnosticsEngine>(
203+
DiagIds, DiagOpts, DiagConsumer, false);
204+
205+
static std::string StoragePath =
206+
(SYCLToolchain::instance().getPrefix() + "/preambles").str();
207+
llvm::ErrorOr<PrecompiledPreamble> NewPreamble =
208+
PrecompiledPreamble::Build(
209+
*Invocation, MainFileBuffer->get(), Bounds, Diags,
210+
Files->getVirtualFileSystemPtr(), PCHContainerOps,
211+
/*StorePreamblesInMemory*/ true, StoragePath, Callbacks,
212+
/*AllowASTWithErrors=*/false);
213+
214+
if (!NewPreamble)
215+
return false;
216+
217+
It->second = std::make_shared<PrecompiledPreamble>(
218+
std::move(NewPreamble.get()));
219+
}
220+
221+
Preamble = It->second;
222+
} // End lock
223+
224+
assert(Preamble);
225+
assert(Preamble->CanReuse(*Invocation, **MainFileBuffer, Bounds,
226+
Files->getVirtualFileSystem()));
227+
228+
assert(Invocation->getPreprocessorOpts().RetainRemappedFileBuffers ==
229+
false);
230+
// `PreprocessorOptions::RetainRemappedFileBuffers` defaults to false, so
231+
// MemoryBuffer will be cleaned up by the CompilerInstance, thus
232+
// `std::unique_ptr::release`.
233+
auto Buf = llvm::MemoryBuffer::getMemBufferCopy(
234+
(*MainFileBuffer)->getBuffer(), MainFilePath)
235+
.release();
236+
237+
auto VFS = Files->getVirtualFileSystemPtr();
238+
Preamble->AddImplicitPreamble(*Invocation, VFS, Buf);
239+
auto NewFiles = makeIntrusiveRefCnt<FileManager>(
240+
Files->getFileSystemOpts(), std::move(VFS));
241+
242+
return Action::runInvocation(std::move(Invocation), NewFiles.get(),
243+
std::move(PCHContainerOps), DiagConsumer);
244+
}
245+
};
246+
156247
public:
157248
static SYCLToolchain &instance() {
158249
static SYCLToolchain Instance;
@@ -162,7 +253,8 @@ class SYCLToolchain {
162253
bool run(const InputArgList &UserArgList, BinaryFormat Format,
163254
const char *SourceFilePath, FrontendAction &FEAction,
164255
IntrusiveRefCntPtr<FileSystem> FSOverlay = nullptr,
165-
DiagnosticConsumer *DiagConsumer = nullptr) {
256+
DiagnosticConsumer *DiagConsumer = nullptr,
257+
bool UseAutoPCH = false) {
166258
std::vector<std::string> CommandLine =
167259
createCommandLine(UserArgList, Format, SourceFilePath);
168260

@@ -175,9 +267,21 @@ class SYCLToolchain {
175267
auto Files = llvm::makeIntrusiveRefCnt<clang::FileManager>(
176268
clang::FileSystemOptions{"." /* WorkingDir */}, FS);
177269

178-
Action A{FEAction};
179-
ToolInvocation TI{std::move(CommandLine), &A, Files.get(),
180-
std::make_shared<PCHContainerOperations>()};
270+
Action Normal{FEAction};
271+
272+
// User compilation options must be part of the key in the preambles map. We
273+
// can either use "raw" user options or the "processed" from
274+
// `createCommandLine` as long as we're consistent in what we're using.
275+
// Current internal APIs pass `InputArgList` around instead of a single
276+
// `std::string`, so it's easier to use `CommandLine`. Just make sure to
277+
// drop `rtc_N.cpp` that is always different:
278+
ActionWithPCHPreamble WithPreamble{FEAction,
279+
join(drop_end(CommandLine, 1), " ")};
280+
ToolInvocation TI{std::move(CommandLine),
281+
UseAutoPCH ? static_cast<Action *>(&WithPreamble)
282+
: &Normal,
283+
Files.get(), std::make_shared<PCHContainerOperations>()};
284+
181285
TI.setDiagnosticConsumer(DiagConsumer ? DiagConsumer : &IgnoreDiag);
182286

183287
return TI.run();
@@ -217,6 +321,8 @@ class SYCLToolchain {
217321
std::string ClangXXExe = (Prefix + "/bin/clang++").str();
218322
llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> ToolchainFS =
219323
llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
324+
325+
PrecompiledPreambles Preambles;
220326
};
221327

222328
class ClangDiagnosticWrapper {
@@ -348,9 +454,11 @@ Expected<ModuleUPtr> jit_compiler::compileDeviceCode(
348454
DiagnosticOptions DiagOpts;
349455
ClangDiagnosticWrapper Wrapper(BuildLog, &DiagOpts);
350456

457+
bool AutoPCH = UserArgList.hasArg(OPT_auto_pch);
458+
351459
if (SYCLToolchain::instance().run(UserArgList, Format, SourceFile.Path, ELOA,
352460
getInMemoryFS(SourceFile, IncludeFiles),
353-
Wrapper.consumer())) {
461+
Wrapper.consumer(), AutoPCH)) {
354462
return ELOA.takeModule();
355463
} else {
356464
return createStringError(BuildLog);

sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1127,6 +1127,60 @@ build_options{{
11271127
Relax the requirement that parameter types for free-function kernels must be
11281128
forward-declarable.
11291129

1130+
===== `--auto-pch`
1131+
1132+
The first time this option is passed, the compiler finds the initial set of
1133+
preprocessor directives (e.g., `#define`/`#include`) and comments in the
1134+
compiled source string (the preamble) and pre-compiles it. Essentialy, it
1135+
behaves like a precompiled header containing that preamble. On subsequent
1136+
compilations, if the compiled source string has the same preamble and the same
1137+
compilation options are used, the precompiled preamble is used, which speeds up
1138+
compilation.
1139+
1140+
If the compiled source string has a different preamble or compilation options
1141+
differ, a new precompiled preamble is generated, and that preamble can also be
1142+
used to speed up subsequent compilations. These precompiled preambles are stored
1143+
internally in memory, so they do not persist from one execution of the
1144+
application to the next.
1145+
1146+
The preamble ends with the first statement that is not a preprocessor directive
1147+
or a comment. For example, in the code below, the preamble ends immediately
1148+
before the namespace syclext = statement.
1149+
1150+
[source,c++]
1151+
----
1152+
#define SYCL_SIMPLE_SWIZZLES
1153+
#include <sycl/sycl.hpp>
1154+
1155+
// Auto-detected preamble ends before next line:
1156+
namespace syclext = sycl::ext::oneapi;
1157+
namespace syclexp = sycl::ext::oneapi::experimental;
1158+
1159+
extern "C"
1160+
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>))
1161+
void iota(sycl::vec<int, 2> *p) {
1162+
size_t id = syclext::this_work_item::get_nd_item<1>().get_global_linear_id();
1163+
p[id] = p[id].xx();
1164+
}
1165+
----
1166+
1167+
The compiler uses the following factors when deciding whether a previously
1168+
generated precompiled preamble can be used:
1169+
1170+
* The preamble must exactly match (including whitespace and comments).
1171+
* The compilation options must match (including the same order and the same spelling).
1172+
* There are also certain restrictions that the user must avoid:
1173+
1174+
- The content of each header file in the preamble must not change from one
1175+
compilation to another.
1176+
- It is not recommended to use the `+__DATE__+` or `+__TIME__+` macros in the
1177+
preamble header files. Depending on the circumstances, these macros may be
1178+
replaced with the date / time that corresponds to the time at which the
1179+
precompiled preamble was generated, rather than the time at which the source
1180+
string is compiled. See also the clang compiler options `-Wpch-date-time`
1181+
and `-Werror=pch-date-time`, which cause the compiler to diagnose a warning
1182+
or error in this scenario.
1183+
11301184
=== Known issues and limitations when the language is `sycl`
11311185

11321186
==== Changing the compiler action or output

0 commit comments

Comments
 (0)