[PGO][Offload] Allow PGO flags to be used on GPU targets #94268
Merged: EthanLuisMcDonough merged 5 commits into llvm:main from EthanLuisMcDonough:gpuprofdriver on Mar 20, 2025
3a2047c [PGO][Offload] Allow PGO flags to be used on GPU targets (EthanLuisMcDonough)
3fcaded Revert == nullptr check to ! (EthanLuisMcDonough)
298dafc Fix version extraction (EthanLuisMcDonough)
0dd32c3 Manually set Version instead of changing instprof macros (EthanLuisMcDonough)
afd16b0 Revert Version change in instrprofdata.inc (EthanLuisMcDonough)
llvm/test/tools/llvm-profdata/malformed-not-space-for-another-header.test (2 changes: 1 addition & 1 deletion)
llvm/test/tools/llvm-profdata/malformed-ptr-to-counter-array.test (2 changes: 1 addition & 1 deletion)
@@ -16,6 +16,7 @@

 #include "Shared/Utils.h"

+#include "llvm/ProfileData/InstrProfData.inc"
 #include "llvm/Support/Error.h"

 #include <cstring>
@@ -214,6 +215,13 @@ GenericGlobalHandlerTy::readProfilingGlobals(GenericDeviceTy &Device,
       if (auto Err = readGlobalFromDevice(Device, Image, DataGlobal))
         return Err;
       DeviceProfileData.Data.push_back(std::move(Data));
+    } else if (*NameOrErr == INSTR_PROF_QUOTE(INSTR_PROF_RAW_VERSION_VAR)) {
+      uint64_t RawVersionData;
+      GlobalTy RawVersionGlobal(NameOrErr->str(), Sym.getSize(),
+                                &RawVersionData);
+      if (auto Err = readGlobalFromDevice(Device, Image, RawVersionGlobal))
+        return Err;
+      DeviceProfileData.Version = RawVersionData;
     }
   }
   return DeviceProfileData;
@@ -265,7 +273,7 @@ void GPUProfGlobals::dump() const {
 }

 Error GPUProfGlobals::write() const {
-  if (!__llvm_write_custom_profile)
+  if (__llvm_write_custom_profile == nullptr)

Suggested change:
-  if (__llvm_write_custom_profile == nullptr)
+  if (!__llvm_write_custom_profile)
nit. this is idiomatic and shouldn't have been changed.
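For readers unfamiliar with the pattern, the two spellings of the null check behave identically; the nit is purely about style. The sketch below uses a hypothetical hook type, `WriteHookTy`, and hypothetical helpers `writeProfile` and `alwaysSucceeds` as stand-ins for an optional symbol such as `__llvm_write_custom_profile`, whose real signature is not shown in this diff.

```cpp
#include <cassert>

// Hypothetical hook type standing in for an optional symbol such as
// __llvm_write_custom_profile; the real signature is not shown in this diff.
using WriteHookTy = int (*)(const char *Target);

// Returns -1 when no hook is available, otherwise forwards the hook's result.
int writeProfile(WriteHookTy Hook, const char *Target) {
  if (!Hook) // Same meaning as `Hook == nullptr`.
    return -1;
  return Hook(Target);
}

// Illustrative hook that always reports success.
int alwaysSucceeds(const char *) { return 0; }

// Small self-check exercising both the missing-hook and present-hook paths.
void checkWriteProfile() {
  assert(writeProfile(nullptr, "gpu") == -1);
  assert(writeProfile(alwaysSucceeds, "gpu") == 0);
}
```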
jhuber6 marked a conversation on the first RUN command as resolved.

@@ -0,0 +1,84 @@
+// RUN: %libomptarget-compile-generic -fcreate-profile \
+// RUN:     -Xarch_device -fprofile-generate
+// RUN: env LLVM_PROFILE_FILE=%basename_t.llvm.profraw \
+// RUN:     %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN:     %target_triple.%basename_t.llvm.profraw | \
+// RUN:     %fcheck-generic --check-prefix="LLVM-PGO"
+
+// RUN: %libomptarget-compile-generic -fcreate-profile \
+// RUN:     -Xarch_device -fprofile-instr-generate
+// RUN: env LLVM_PROFILE_FILE=%basename_t.clang.profraw \
+// RUN:     %libomptarget-run-generic 2>&1
+// RUN: llvm-profdata show --all-functions --counts \
+// RUN:     %target_triple.%basename_t.clang.profraw | \
+// RUN:     %fcheck-generic --check-prefix="CLANG-PGO"
+
+// REQUIRES: gpu
+// REQUIRES: pgo
+
+int test1(int a) { return a / 2; }
+int test2(int a) { return a * 2; }
+
+int main() {
+  int m = 2;
+#pragma omp target
+  for (int i = 0; i < 10; i++) {
+    m = test1(m);
+    for (int j = 0; j < 2; j++) {
+      m = test2(m);
+    }
+  }
+}
+
+// LLVM-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 4
+// LLVM-PGO: Block counts: [20, 10, 2, 1]
+
+// LLVM-PGO-LABEL: test1:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Block counts: [10]
+
+// LLVM-PGO-LABEL: test2:
+// LLVM-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// LLVM-PGO: Counters: 1
+// LLVM-PGO: Block counts: [20]
+
+// LLVM-PGO-LABEL: Instrumentation level:
+// LLVM-PGO-SAME: IR
+// LLVM-PGO-SAME: entry_first = 0
+// LLVM-PGO-LABEL: Functions shown:
+// LLVM-PGO-SAME: 3
+// LLVM-PGO-LABEL: Maximum function count:
+// LLVM-PGO-SAME: 20
+// LLVM-PGO-LABEL: Maximum internal block count:
+// LLVM-PGO-SAME: 10
+
+// CLANG-PGO-LABEL: __omp_offloading_{{[_0-9a-zA-Z]*}}_main_{{[_0-9a-zA-Z]*}}:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 3
+// CLANG-PGO: Function count: 0
+// CLANG-PGO: Block counts: [11, 20]
+
+// CLANG-PGO-LABEL: test1:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 1
+// CLANG-PGO: Function count: 10
+// CLANG-PGO: Block counts: []
+
+// CLANG-PGO-LABEL: test2:
+// CLANG-PGO: Hash: {{0[xX][0-9a-fA-F]+}}
+// CLANG-PGO: Counters: 1
+// CLANG-PGO: Function count: 20
+// CLANG-PGO: Block counts: []
+
+// CLANG-PGO-LABEL: Instrumentation level:
+// CLANG-PGO-SAME: Front-end
+// CLANG-PGO-LABEL: Functions shown:
+// CLANG-PGO-SAME: 3
+// CLANG-PGO-LABEL: Maximum function count:
+// CLANG-PGO-SAME: 20
+// CLANG-PGO-LABEL: Maximum internal block count:
+// CLANG-PGO-SAME: 20
Why did this change?
This ensures that a GPU profile is written with the version pulled from the GPU. Previously, if the host used LLVM-level instrumentation and the device used clang-level instrumentation, the GPU profile would be written in the host's format. This change ensures that is no longer the case.
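To make the host/device distinction concrete, here is a minimal sketch of how a raw profile version word can carry both a format version and an instrumentation-level flag. The mask values are assumed to mirror `VARIANT_MASKS_ALL` and `VARIANT_MASK_IR_PROF` in `InstrProfData.inc` and should be checked against that header; the helper names (`formatVersion`, `isIRLevel`, `describeRawVersion`) and the version number 10 are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed to mirror the layout used by InstrProfData.inc: the upper bits hold
// variant flags and the lower bits hold the format version number.
constexpr uint64_t VariantMasksAll = 0xffffffff00000000ULL;
constexpr uint64_t VariantMaskIRProf = 0x1ULL << 56;

// Strips the variant flags, leaving only the format version number.
constexpr uint64_t formatVersion(uint64_t Raw) { return Raw & ~VariantMasksAll; }

// True when the IR-level (LLVM) instrumentation flag is set.
constexpr bool isIRLevel(uint64_t Raw) { return (Raw & VariantMaskIRProf) != 0; }

// Human-readable summary of a raw version word.
std::string describeRawVersion(uint64_t Raw) {
  return std::string(isIRLevel(Raw) ? "IR" : "Front-end") + " v" +
         std::to_string(formatVersion(Raw));
}

// Self-check: a host built with IR-level instrumentation and a device built
// with front-end instrumentation share a version number but differ in flags.
void checkDescribeRawVersion() {
  const uint64_t HostRaw = 10 | VariantMaskIRProf; // hypothetical version 10
  const uint64_t DeviceRaw = 10;
  assert(describeRawVersion(HostRaw) == "IR v10");
  assert(describeRawVersion(DeviceRaw) == "Front-end v10");
}
```

Writing the device profile with the host's raw version would mislabel the second case as IR-level, which is the mismatch described above.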
This file is part of the public API and is used by a number of users who have their own runtime implementations (for example, baremetal users and OS kernels), and this change is going to break all of them. Is there another way to handle this case that wouldn't require this change and would avoid all that churn?
I think the best solution would be to forbid the user from using different instrumentation levels (LLVM IR vs clang) on the host and device. I can replace the version replacement with a check that ensures the device version matches the host version.
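A check of this kind could look roughly like the sketch below. It is not code from the PR: `checkInstrLevelsMatch` and `levelName` are hypothetical helpers, and the IR-level flag bit is assumed to match `VARIANT_MASK_IR_PROF` in `InstrProfData.inc`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed to match VARIANT_MASK_IR_PROF in InstrProfData.inc.
constexpr uint64_t VariantMaskIRProf = 0x1ULL << 56;

// Names the instrumentation level encoded in a raw version word.
const char *levelName(uint64_t Raw) {
  return (Raw & VariantMaskIRProf) ? "IR" : "front-end";
}

// Returns an empty string when host and device agree on the instrumentation
// level, otherwise a diagnostic describing the mismatch.
std::string checkInstrLevelsMatch(uint64_t HostRaw, uint64_t DeviceRaw) {
  const bool HostIR = (HostRaw & VariantMaskIRProf) != 0;
  const bool DeviceIR = (DeviceRaw & VariantMaskIRProf) != 0;
  if (HostIR == DeviceIR)
    return std::string();
  return std::string("instrumentation level mismatch: host is ") +
         levelName(HostRaw) + ", device is " + levelName(DeviceRaw);
}

// Self-check covering the matching and mismatching cases.
void checkInstrLevels() {
  const uint64_t IRRaw = 10 | VariantMaskIRProf; // hypothetical version 10
  const uint64_t FrontEndRaw = 10;
  assert(checkInstrLevelsMatch(IRRaw, IRRaw).empty());
  assert(!checkInstrLevelsMatch(IRRaw, FrontEndRaw).empty());
}
```

Returning a message rather than aborting leaves the caller free to decide whether a mismatch is a hard error or only a warning.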
I fixed the version detection and made sure to set the `Version` property to the custom value instead of changing the default value.