-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[DataLayout] Add byte specification #106536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks reasonable to me. Please start a fresh RFC thread for it.
There is a thread I started two years ago specifically for discussing the support in data layout: |
I don't think this is going to be fruitful. To move forward with any part of this, there needs to be a strong, high-level consensus to move forward with middle-end support for non-8-bit bytes. Your RFC should outline, at least in broad strokes, all the changes that will be needed to support this (and what will / won't be supported and why) and not focus on just details of DataLayout. |
OK, sure. I'll try to have it prepared by the end of the next week. |
|
(Just some comments for now, haven't really reviewed the code.) Thanks! This patch may be valuable for people who wants to support other byte-sizes downstream. Even if it never gets merged. My idea here https://discourse.llvm.org/t/rfc-on-non-8-bit-bytes-and-the-target-for-it/53455/44?u=bjope was to try to add patches "bottom-up" rather than "top-down". I kind of fear that we would hit dead-end again (as any previous attempt), if we open up pandoras box like this. People will say things like "Hey, now when there is a way to specify byte width in the DataLayout, then we need lots of tests to guarantee that it works all over the place. Or how do we describe what the limitations are? And what is a byte really? And do I need to think about this whenever I add a new pass or optimization?". Maybe one could need a top-level I don't know if that makes sense. Nor if it actually would be more feasible doing it in such a "bottom-up" way, compared to starting "top-down" with adding support to set the byte-size without really supporting it throughout LLVM. If you read this far you probably wonder how we would test that any refactoring in SimplifyLibCalls, or isBytewiseValue, or DAGCombiner, etc that would aim at making life simpler for non-8-bit-bytes. I don't know really. Obviously we rely on this being tested downstream today. So it wouldn't be much different? Although in for example SimplifyLibCalls lots of things should map to C and be described as char rather than i8. So there should really be a "char size" member somewhere (just like one can change CharUnit in clang). And then maybe lit test might use some option to set that when testing SimplifyLibCalls pass? I don't know really, but maybe we can find ways around it without having official support for changing byte size in DataLayout. (With all that said, it wouldn't mind getting support in-tree for 16-bit bytes. Just being afraid that if we aim too high, then we end up with nothing, just like all previous RFC-discussion about byte size....) |
|
@bjope FWIW, my perspective here is kind of diametrically opposed: I am happy to accept non-8-bit-byte type support upstream, as long as it is actually a properly supported, testable configuration. But I would not accept patches to "make support easier" for a configuration that is not officially supported/tested. (A corollary of this is that I am also only open to introducing this support in the middle-end, where we can cleanly test this using non-standard data layouts.) |
|
@bjope |
arichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks fine to me but I'll defer to others about whether this is the correct strategy towards non-8-bit bytes.
Personally, I much prefer having code upstream that is testable rather than just a fixes upstream that can only be tested downstream (this has been a notable issue for CHERI where each upstream merge brings in new test failures), so I like having this upstream.
| bool isBigEndian() const { return BigEndian; } | ||
|
|
||
| /// Returns the size of a byte, in bits. | ||
| unsigned getByteWidth() const { return ByteWidth; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| unsigned getByteWidth() const { return ByteWidth; } | |
| unsigned getByteWidth() const { return 1 << ByteShift; } |
Since it is a power-of-two, this will allow more optimized code (https://godbolt.org/z/5T86s84Ez)
| class StructLayout final : public TrailingObjects<StructLayout, TypeSize> { | ||
| TypeSize StructSize; | ||
| Align StructAlignment; | ||
| unsigned ByteWidth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less critical here since we are multiplying not dividing, but maybe always use a shift? In that case we could even use uint8_t to fit it into the padding.
llvm/lib/IR/DataLayout.cpp
Outdated
| // less conservative, they should have specified it explicitly in the data | ||
| // layout. | ||
| return Align(PowerOf2Ceil(BitWidth / 8)); | ||
| return Align(PowerOf2Ceil(getTypeStoreSize(Ty).getFixedValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually a change? Isn't the store size for fp80 128 bits (or maybe 96)? That is not the same as the type size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The store size is 80 bits as long as 80 is a multiple of the byte width. If it is not, the division must be ceiling, and getTypeStoreSize does exactly that. This is one of the places where the calculation is wrong, but produces the correct result on targets with 8-bit bytes.
I'll revert this change as it is not strictly related. Thanks for spotting!
| bool BigEndian = false; | ||
|
|
||
| /// The size of a byte in bits. | ||
| unsigned ByteWidth = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should store this as a shift value instead of a width since it will allow Clang/GCC to elide divisions (see https://godbolt.org/z/5T86s84Ez case 3 optimizes to a right shift).
And it looks like this also works for divideCeil: https://godbolt.org/z/1oa18Gns7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it too, but decided to leave it as is until there is an evidence that it affects compilation time. I didn't measure yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forcing everyone to deal with a shift value is not very ergonomic, so to be clear I'm not suggesting using getByteShift(), but rather the approach of storing a shift value and then modifying getByteSize() to return 1u << shift.
This will allow using uint8_t to fit it into existing padding and more importantly allow the compiler to elide divisions after inlining.
Great work. Just a minor question: why not use character 'B' instead of 'b' for byte? |
Most of the specifiers are lowercase letters, no other reason. |
98c3676 to
31ee84f
Compare
31ee84f to
b46d554
Compare
b46d554 to
4048990
Compare
4048990 to
2488060
Compare
2488060 to
3539d10
Compare
This patch adds byte specification to data layout string. The specification is `b:<size>`, where `<size>` is the size of a byte in bits (later referred to as "byte width"). Limitations: * The only values allowed for byte width are 8, 16, and 32. 16-bit bytes are popular, and my downstream target has 32-bit bytes. These are the widths I'm going to add tests for in follow-up patches, so this restriction only exists because other widths are untested. * It is assumed that bytes are the same in all address spaces. Supporting different byte widths in different address spaces would require adding an address space argument to all DataLayout methods that query ABI / preferred alignments because they return *byte* alignments, and those will be different for different address spaces. This is too much effort, but it can be done in the future if the need arises, the specification reserves address space number before ':'.
3539d10 to
670e94d
Compare

This patch adds byte specification to data layout string.
The specification is
b:<size>, where<size>is the size of a bytein bits (later referred to as "byte width").
Limitations:
16-bit bytes are popular, and my downstream target has 32-bit bytes.
These are the widths I'm going to add tests for in follow-up patches,
so this restriction only exists because other widths are untested.
Supporting different byte widths in different address spaces would
require adding an address space argument to all DataLayout methods
that query ABI / preferred alignments because they return byte
alignments, and those will be different for different address spaces.
This is too much effort, but it can be done in the future if the need
arises, the specification reserves address space number before ':'.
There were many RFCs for supporting unusual byte widths,
but I'll start one more discourse thread as suggested by @nikic below.