Add MX FP4 device conversion tests #1889
The patch llvm/llvm-project#127464 was merged, so the repro is no longer needed.
-    return __builtin_amdgcn_cvt_scalef32_pk_f32_fp4(value.bitwise, type_convert<float>(scale), 0);
+    float2_t tmp =
+        __builtin_amdgcn_cvt_scalef32_pk_f32_fp4(value.bitwise, type_convert<float>(scale), 0);
+    // permute high bits and low bits to match the order of the original vector
Do we need these changes because we modified packing order in the fp4 storage?
__host__ __device__ inline type pack(const type x0, const type x1)
{
    return (x0 << 4) | (x1 & 0b00001111);
}
I'll try to provide more context for both comments here. In CK we have several ways of packing elements into vectors: LLVM/Clang vectors, our custom non_native_vector_base, and custom types that we pack manually. In LLVM/Clang vectors the 0th element is stored in the highest bits and the Nth element in the lowest bits. The same layout is used in non_native_vector_base, which makes sense because it uses an LLVM/Clang vector under the hood. So I decided to update the f4x2_pk_t type to have a layout consistent with the other vectors. I believe the issue with the native conversion instructions is that they swap the high and low bits, so we have to swap the elements of either the input or the output vector. I believe keeping the old f4x2_pk_t layout would help with this issue, but it would have to be well documented and accounted for in the tests. @andriy-ca what is your perspective on it?
Having elements in bytes aligned consistently with the other data types makes sense.
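To make the two layouts in this thread concrete, here is a minimal host-side sketch. It is not CK code: `pack_hi_first`, `pack_lo_first`, and `unpack_hi_first` are hypothetical helpers, and a plain `std::uint8_t` stands in for `f4x2_pk_t`. It assumes, as described above, that the new layout keeps element 0 in the high nibble and that the old layout kept it in the low nibble.

```cpp
#include <cstdint>
#include <utility>

// New layout (per this PR): element 0 in the high nibble, element 1 in the low nibble.
constexpr std::uint8_t pack_hi_first(std::uint8_t x0, std::uint8_t x1)
{
    return static_cast<std::uint8_t>(((x0 & 0x0F) << 4) | (x1 & 0x0F));
}

// Previous layout: element 0 in the low nibble, element 1 in the high nibble.
constexpr std::uint8_t pack_lo_first(std::uint8_t x0, std::uint8_t x1)
{
    return static_cast<std::uint8_t>(((x1 & 0x0F) << 4) | (x0 & 0x0F));
}

// Inverse of pack_hi_first: returns {element 0, element 1}.
constexpr std::pair<std::uint8_t, std::uint8_t> unpack_hi_first(std::uint8_t packed)
{
    return {static_cast<std::uint8_t>(packed >> 4),
            static_cast<std::uint8_t>(packed & 0x0F)};
}

// The two layouts differ only by a swap of the elements, which is why a
// consumer expecting one layout needs its input or output swapped for the other.
static_assert(pack_hi_first(0x3, 0xA) == 0x3A, "element 0 lands in the high nibble");
static_assert(pack_lo_first(0x3, 0xA) == 0xA3, "element 0 lands in the low nibble");
static_assert(pack_hi_first(0x3, 0xA) == pack_lo_first(0xA, 0x3), "layouts differ by a swap");
```

Whichever layout is chosen, the swap has to happen exactly once on the path between storage and the conversion instruction, so tests should check element order as well as values.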
 __host__ __device__ inline type pack(const type x0, const type x1)
 {
-    return (x1 << 4) | (x0 & 0b00001111);
+    return (x0 << 4) | (x1 & 0b00001111);
Was the original order incorrect?
andriy-ca left a comment
LGTM!
Proposed changes
This PR adds MX FP4 tests and fixes MX FP4 functionality.
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
clang-format on all changed files
Discussion
This PR updates the layout of FP4 elements in a two-element vector to be consistent with other vector types; see the review discussion.
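For conversion tests of this kind, a host-side reference decoder can be handy for checking device results. The following is a minimal sketch, not the test code from this PR. It assumes the OCP MX E2M1 encoding for FP4 (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), a scale that has already been converted to `float`, and the high-nibble-first layout described in the discussion. `decode_e2m1` and `decode_pair` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Magnitudes representable by the 3 low bits of an E2M1 code
// (2 exponent bits with bias 1, 1 mantissa bit; code 0b001 is the subnormal 0.5).
constexpr std::array<float, 8> kE2M1Magnitudes = {0.0f, 0.5f, 1.0f, 1.5f,
                                                  2.0f, 3.0f, 4.0f, 6.0f};

// Decode one 4-bit E2M1 value and apply the (already float) scale.
constexpr float decode_e2m1(std::uint8_t nibble, float scale)
{
    const float magnitude = kE2M1Magnitudes[nibble & 0x7];
    const float sign      = (nibble & 0x8) ? -1.0f : 1.0f;
    return sign * magnitude * scale;
}

// Decode a packed byte, assuming element 0 is stored in the high nibble.
constexpr std::array<float, 2> decode_pair(std::uint8_t packed, float scale)
{
    return {decode_e2m1(static_cast<std::uint8_t>(packed >> 4), scale),
            decode_e2m1(static_cast<std::uint8_t>(packed & 0x0F), scale)};
}
```

A device test could then compare the output of the conversion under test against `decode_pair` element by element, which catches a swapped element order as well as a wrong value.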