framework: half-precision presets #566
Merged
Treece-Burgess merged 3 commits into icl-utk-edu:master on Feb 19, 2026
This was referenced Feb 18, 2026
18d06d3 to 193bbd7
Updates to framework to accommodate half-precision preset events. These changes have been tested on the ARM Neoverse V2 architecture.
Add half-precision presets for Intel Sapphire Rapids and Emerald Rapids. Also modify the existing presets that monitor all precisions (PAPI_FP_OPS, PAPI_FP_INS, and PAPI_VEC_INS) if the native half-precision events cannot be added to the same event set as the other native events in the existing preset definition. These changes have been tested on the Intel Sapphire Rapids and Emerald Rapids architectures.
The header file 'papi.h' states that PAPI_MAX_INFO_TERMS "should match PAPI_EVENTS_IN_DERIVED_EVENT defined in papi_internal.h". However, PAPI_MAX_INFO_TERMS is defined as 12, whereas PAPI_EVENTS_IN_DERIVED_EVENT is defined as 8. This commit changes the 8 to 12 so that the two values agree. These changes have been tested on the Intel Sapphire Rapids architecture.
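One way to keep two such limits from drifting apart again is a compile-time check. The following is a minimal sketch, not part of this PR: the two macros are local stand-ins that reuse the values quoted in the commit message (in PAPI they are defined in papi.h and papi_internal.h), and the helper `limits_match` is a hypothetical name used only for illustration.

```c
/* Stand-ins for the real definitions, using the values quoted in the
 * commit message. In PAPI these live in papi.h and papi_internal.h. */
#define PAPI_MAX_INFO_TERMS 12
#define PAPI_EVENTS_IN_DERIVED_EVENT 12 /* was 8 before this commit */

/* Hypothetical guard (an assumption, not code from the PR): compilation
 * fails if the two limits ever diverge. Requires C11 or later. */
_Static_assert(PAPI_MAX_INFO_TERMS == PAPI_EVENTS_IN_DERIVED_EVENT,
               "PAPI_MAX_INFO_TERMS must match PAPI_EVENTS_IN_DERIVED_EVENT");

/* Hypothetical runtime helper: returns 1 when the limits agree, else 0. */
int limits_match(void)
{
    return PAPI_MAX_INFO_TERMS == PAPI_EVENTS_IN_DERIVED_EVENT;
}
```

With the pre-commit value of 8 substituted for the second macro, the `_Static_assert` would stop the build instead of letting the mismatch go unnoticed.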
b6e66e3 to 179d932
Treece-Burgess (Contributor) approved these changes on Feb 19, 2026 and left a comment:
I have tested this PR on an Intel Xeon Gold 6430 (SPR) on Picard at Oregon.
papi_avail shows the new PAPI_HP_OPS and PAPI_VEC_HP presets along with the updated PAPI_VEC_INS and PAPI_FP_INS.
papi_command_line worked with the aforementioned presets.
Lastly, I tested with a custom application code to verify the counts. Looks good.
Pull Request Description
These changes have been tested on the ARM Neoverse V2, Intel Sapphire Rapids, and Intel Emerald Rapids architectures.
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form: module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests