Skip to content

Conversation

@s-barannikov
Copy link
Contributor

@s-barannikov s-barannikov commented Aug 29, 2024

This patch adds byte specification to data layout string.
The specification is b:<size>, where <size> is the size of a byte
in bits (later referred to as "byte width").

Limitations:

  • The only values allowed for byte width are 8, 16, and 32.
    16-bit bytes are popular, and my downstream target has 32-bit bytes.
    These are the widths I'm going to add tests for in follow-up patches,
    so this restriction only exists because other widths are untested.
  • It is assumed that bytes are the same in all address spaces.
    Supporting different byte widths in different address spaces would
    require adding an address space argument to all DataLayout methods
    that query ABI / preferred alignments because they return byte
    alignments, and those will be different for different address spaces.
    This is too much effort, but it can be done in the future if the need
    arises, the specification reserves address space number before ':'.

There were many RFCs for supporting unusual byte widths,
but I'll start one more discourse thread as suggested by @nikic below.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks reasonable to me. Please start a fresh RFC thread for it.

@s-barannikov
Copy link
Contributor Author

This change looks reasonable to me. Please start a fresh RFC thread for it.

There is a thread I started two years ago specifically for discussing the support in data layout:
https://discourse.llvm.org/t/byte-width-specification-in-data-layout-string/62202
It provides more context and is not overloaded with comments. I'll try to revive it instead of cloning.

@nikic
Copy link
Contributor

nikic commented Aug 29, 2024

This change looks reasonable to me. Please start a fresh RFC thread for it.

There is a thread I started two years ago specifically for discussing the support in data layout: https://discourse.llvm.org/t/byte-width-specification-in-data-layout-string/62202 It provides more context and is not overloaded with comments. I'll try to revive it instead of cloning.

I don't think this is going to be fruitful. To move forward with any part of this, there needs to be a strong, high-level consensus to move forward with middle-end support for non-8-bit bytes. Your RFC should outline, at least in broad strokes, all the changes that will be needed to support this (and what will / won't be supported and why) and not focus on just details of DataLayout.

@s-barannikov
Copy link
Contributor Author

This change looks reasonable to me. Please start a fresh RFC thread for it.

There is a thread I started two years ago specifically for discussing the support in data layout: https://discourse.llvm.org/t/byte-width-specification-in-data-layout-string/62202 It provides more context and is not overloaded with comments. I'll try to revive it instead of cloning.

I don't think this is going to be fruitful. To move forward with any part of this, there needs to be a strong, high-level consensus to move forward with middle-end support for non-8-bit bytes. Your RFC should outline, at least in broad strokes, all the changes that will be needed to support this (and what will / won't be supported and why) and not focus on just details of DataLayout.

OK, sure. I'll try to have it prepared by the end of the next week.

@bjope
Copy link
Collaborator

bjope commented Aug 29, 2024

(Just some comments for now, haven't really reviewed the code.)

Thanks! This patch may be valuable for people who wants to support other byte-sizes downstream. Even if it never gets merged.

My idea here https://discourse.llvm.org/t/rfc-on-non-8-bit-bytes-and-the-target-for-it/53455/44?u=bjope was to try to add patches "bottom-up" rather than "top-down".
Such as trying to refactor/fix code in a way that it either would magically work for other byte sizes (sometimes with just a little bit of adaption), or make it easy to spot code that really assume 8-bit bytes. This to reduce amount of places that would need to be touch for people who use different byte sizes down stream.
But the idea was not to support changing the byte size upstream. Officially LLVM would still only support 8 bit bytes (at least for now).

I kind of fear that we would hit dead-end again (as any previous attempt), if we open up pandoras box like this. People will say things like "Hey, now when there is a way to specify byte width in the DataLayout, then we need lots of tests to guarantee that it works all over the place. Or how do we describe what the limitations are? And what is a byte really? And do I need to think about this whenever I add a new pass or optimization?".

Maybe one could need a top-level DataLaytout::getByteWidth() { return 8; }, just to let us downstream maintainers annotate/assert (in upstream code) that a function has been identified as not supporting other byte sizes.

I don't know if that makes sense. Nor if it actually would be more feasible doing it in such a "bottom-up" way, compared to starting "top-down" with adding support to set the byte-size without really supporting it throughout LLVM.

If you read this far you probably wonder how we would test that any refactoring in SimplifyLibCalls, or isBytewiseValue, or DAGCombiner, etc that would aim at making life simpler for non-8-bit-bytes. I don't know really. Obviously we rely on this being tested downstream today. So it wouldn't be much different? Although in for example SimplifyLibCalls lots of things should map to C and be described as char rather than i8. So there should really be a "char size" member somewhere (just like one can change CharUnit in clang). And then maybe lit test might use some option to set that when testing SimplifyLibCalls pass? I don't know really, but maybe we can find ways around it without having official support for changing byte size in DataLayout.

(With all that said, it wouldn't mind getting support in-tree for 16-bit bytes. Just being afraid that if we aim too high, then we end up with nothing, just like all previous RFC-discussion about byte size....)

@nikic
Copy link
Contributor

nikic commented Aug 29, 2024

@bjope FWIW, my perspective here is kind of diametrically opposed: I am happy to accept non-8-bit-byte type support upstream, as long as it is actually a properly supported, testable configuration. But I would not accept patches to "make support easier" for a configuration that is not officially supported/tested.

(A corollary of this is that I am also only open to introducing this support in the middle-end, where we can cleanly test this using non-standard data layouts.)

@s-barannikov
Copy link
Contributor Author

s-barannikov commented Aug 29, 2024

@bjope
Thank you for your comment, it raises some important questions. If you don't mind, I'll quote it and reply to it in the future RFC so that we have the discussion going in one place.

Copy link
Member

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks fine to me but I'll defer to others about whether this is the correct strategy towards non-8-bit bytes.

Personally, I much prefer having code upstream that is testable rather than just a fixes upstream that can only be tested downstream (this has been a notable issue for CHERI where each upstream merge brings in new test failures), so I like having this upstream.

bool isBigEndian() const { return BigEndian; }

/// Returns the size of a byte, in bits.
unsigned getByteWidth() const { return ByteWidth; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsigned getByteWidth() const { return ByteWidth; }
unsigned getByteWidth() const { return 1 << ByteShift; }

Since it is a power-of-two, this will allow more optimized code (https://godbolt.org/z/5T86s84Ez)

class StructLayout final : public TrailingObjects<StructLayout, TypeSize> {
TypeSize StructSize;
Align StructAlignment;
unsigned ByteWidth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less critical here since we are multiplying not dividing, but maybe always use a shift? In that case we could even use uint8_t to fit it into the padding.

// less conservative, they should have specified it explicitly in the data
// layout.
return Align(PowerOf2Ceil(BitWidth / 8));
return Align(PowerOf2Ceil(getTypeStoreSize(Ty).getFixedValue()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually a change? Isn't the store size for fp80 128 bits (or maybe 96)? That is not the same as the type size?

Copy link
Contributor Author

@s-barannikov s-barannikov Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The store size is 80 bits as long as 80 is a multiple of the byte width. If it is not, the division must be ceiling, and getTypeStoreSize does exactly that. This is one of the places where the calculation is wrong, but produces the correct result on targets with 8-bit bytes.

I'll revert this change as it is not strictly related. Thanks for spotting!

bool BigEndian = false;

/// The size of a byte in bits.
unsigned ByteWidth = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should store this as a shift value instead of a width since it will allow Clang/GCC to elide divisions (see https://godbolt.org/z/5T86s84Ez case 3 optimizes to a right shift).
And it looks like this also works for divideCeil: https://godbolt.org/z/1oa18Gns7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it too, but decided to leave it as is until there is an evidence that it affects compilation time. I didn't measure yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forcing everyone to deal with a shift value is not very ergonomic, so to be clear I'm not suggesting using getByteShift(), but rather the approach of storing a shift value and then modifying getByteSize() to return 1u << shift.

This will allow using uint8_t to fit it into existing padding and more importantly allow the compiler to elide divisions after inlining.

@hongjia
Copy link

hongjia commented Aug 30, 2024

This patch adds byte specification to data layout string. The specification is b:<size>, where <size> is the size of a byte in bits (later referred to as "byte width").

Great work. Just a minor question: why not use character 'B' instead of 'b' for byte?

@s-barannikov
Copy link
Contributor Author

Just a minor question: why not use character 'B' instead of 'b' for byte?

Most of the specifiers are lowercase letters, no other reason.

@s-barannikov s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 98c3676 to 31ee84f Compare October 27, 2024 00:56
@s-barannikov s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 31ee84f to b46d554 Compare February 4, 2025 13:59
@s-barannikov s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from b46d554 to 4048990 Compare April 6, 2025 17:22
@s-barannikov s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 4048990 to 2488060 Compare April 19, 2025 08:18
@s-barannikov s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 2488060 to 3539d10 Compare July 11, 2025 16:25
This patch adds byte specification to data layout string.
The specification is `b:<size>`, where `<size>` is the size of a byte
in bits (later referred to as "byte width").

Limitations:
* The only values allowed for byte width are 8, 16, and 32.
16-bit bytes are popular, and my downstream target has 32-bit bytes.
These are the widths I'm going to add tests for in follow-up patches,
so this restriction only exists because other widths are untested.
* It is assumed that bytes are the same in all address spaces.
Supporting different byte widths in different address spaces would
require adding an address space argument to all DataLayout methods
that query ABI / preferred alignments because they return *byte*
alignments, and those will be different for different address spaces.
This is too much effort, but it can be done in the future if the need
arises, the specification reserves address space number before ':'.
@s-barannikov s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 3539d10 to 670e94d Compare November 8, 2025 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants