[DataLayout] Add byte specification #106536

s-barannikov · 2024-08-29T11:50:59Z

This patch adds byte specification to data layout string.
The specification is b:<size>, where <size> is the size of a byte
in bits (later referred to as "byte width").

Limitations:

The only values allowed for byte width are 8, 16, and 32.
16-bit bytes are popular, and my downstream target has 32-bit bytes.
These are the widths I'm going to add tests for in follow-up patches,
so this restriction only exists because other widths are untested.
It is assumed that bytes are the same in all address spaces.
Supporting different byte widths in different address spaces would
require adding an address space argument to all DataLayout methods
that query ABI / preferred alignments because they return byte
alignments, and those will be different for different address spaces.
This is too much effort, but it can be done in the future if the need
arises, the specification reserves address space number before ':'.

There were many RFCs for supporting unusual byte widths,
but I'll start one more discourse thread as suggested by @nikic below.

s-barannikov · 2024-08-29T11:51:15Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

nikic

This change looks reasonable to me. Please start a fresh RFC thread for it.

s-barannikov · 2024-08-29T12:33:07Z

This change looks reasonable to me. Please start a fresh RFC thread for it.

There is a thread I started two years ago specifically for discussing the support in data layout:
https://discourse.llvm.org/t/byte-width-specification-in-data-layout-string/62202
It provides more context and is not overloaded with comments. I'll try to revive it instead of cloning.

nikic · 2024-08-29T12:40:40Z

This change looks reasonable to me. Please start a fresh RFC thread for it.

There is a thread I started two years ago specifically for discussing the support in data layout: https://discourse.llvm.org/t/byte-width-specification-in-data-layout-string/62202 It provides more context and is not overloaded with comments. I'll try to revive it instead of cloning.

I don't think this is going to be fruitful. To move forward with any part of this, there needs to be a strong, high-level consensus to move forward with middle-end support for non-8-bit bytes. Your RFC should outline, at least in broad strokes, all the changes that will be needed to support this (and what will / won't be supported and why) and not focus on just details of DataLayout.

s-barannikov · 2024-08-29T12:52:14Z

This change looks reasonable to me. Please start a fresh RFC thread for it.

There is a thread I started two years ago specifically for discussing the support in data layout: https://discourse.llvm.org/t/byte-width-specification-in-data-layout-string/62202 It provides more context and is not overloaded with comments. I'll try to revive it instead of cloning.

I don't think this is going to be fruitful. To move forward with any part of this, there needs to be a strong, high-level consensus to move forward with middle-end support for non-8-bit bytes. Your RFC should outline, at least in broad strokes, all the changes that will be needed to support this (and what will / won't be supported and why) and not focus on just details of DataLayout.

OK, sure. I'll try to have it prepared by the end of the next week.

bjope · 2024-08-29T14:09:30Z

(Just some comments for now, haven't really reviewed the code.)

Thanks! This patch may be valuable for people who wants to support other byte-sizes downstream. Even if it never gets merged.

My idea here https://discourse.llvm.org/t/rfc-on-non-8-bit-bytes-and-the-target-for-it/53455/44?u=bjope was to try to add patches "bottom-up" rather than "top-down".
Such as trying to refactor/fix code in a way that it either would magically work for other byte sizes (sometimes with just a little bit of adaption), or make it easy to spot code that really assume 8-bit bytes. This to reduce amount of places that would need to be touch for people who use different byte sizes down stream.
But the idea was not to support changing the byte size upstream. Officially LLVM would still only support 8 bit bytes (at least for now).

I kind of fear that we would hit dead-end again (as any previous attempt), if we open up pandoras box like this. People will say things like "Hey, now when there is a way to specify byte width in the DataLayout, then we need lots of tests to guarantee that it works all over the place. Or how do we describe what the limitations are? And what is a byte really? And do I need to think about this whenever I add a new pass or optimization?".

Maybe one could need a top-level DataLaytout::getByteWidth() { return 8; }, just to let us downstream maintainers annotate/assert (in upstream code) that a function has been identified as not supporting other byte sizes.

I don't know if that makes sense. Nor if it actually would be more feasible doing it in such a "bottom-up" way, compared to starting "top-down" with adding support to set the byte-size without really supporting it throughout LLVM.

If you read this far you probably wonder how we would test that any refactoring in SimplifyLibCalls, or isBytewiseValue, or DAGCombiner, etc that would aim at making life simpler for non-8-bit-bytes. I don't know really. Obviously we rely on this being tested downstream today. So it wouldn't be much different? Although in for example SimplifyLibCalls lots of things should map to C and be described as char rather than i8. So there should really be a "char size" member somewhere (just like one can change CharUnit in clang). And then maybe lit test might use some option to set that when testing SimplifyLibCalls pass? I don't know really, but maybe we can find ways around it without having official support for changing byte size in DataLayout.

(With all that said, it wouldn't mind getting support in-tree for 16-bit bytes. Just being afraid that if we aim too high, then we end up with nothing, just like all previous RFC-discussion about byte size....)

nikic · 2024-08-29T14:31:24Z

@bjope FWIW, my perspective here is kind of diametrically opposed: I am happy to accept non-8-bit-byte type support upstream, as long as it is actually a properly supported, testable configuration. But I would not accept patches to "make support easier" for a configuration that is not officially supported/tested.

(A corollary of this is that I am also only open to introducing this support in the middle-end, where we can cleanly test this using non-standard data layouts.)

s-barannikov · 2024-08-29T15:06:24Z

@bjope
Thank you for your comment, it raises some important questions. If you don't mind, I'll quote it and reply to it in the future RFC so that we have the discussion going in one place.

arichardson

In general this looks fine to me but I'll defer to others about whether this is the correct strategy towards non-8-bit bytes.

Personally, I much prefer having code upstream that is testable rather than just a fixes upstream that can only be tested downstream (this has been a notable issue for CHERI where each upstream merge brings in new test failures), so I like having this upstream.

arichardson · 2024-08-29T17:53:30Z

llvm/include/llvm/IR/DataLayout.h

  bool isBigEndian() const { return BigEndian; }

+  /// Returns the size of a byte, in bits.
+  unsigned getByteWidth() const { return ByteWidth; }


Suggested change

unsigned getByteWidth() const { return ByteWidth; }

unsigned getByteWidth() const { return 1 << ByteShift; }

Since it is a power-of-two, this will allow more optimized code (https://godbolt.org/z/5T86s84Ez)

arichardson · 2024-08-29T17:56:29Z

llvm/include/llvm/IR/DataLayout.h

 class StructLayout final : public TrailingObjects<StructLayout, TypeSize> {
  TypeSize StructSize;
  Align StructAlignment;
+  unsigned ByteWidth;


Less critical here since we are multiplying not dividing, but maybe always use a shift? In that case we could even use uint8_t to fit it into the padding.

arichardson · 2024-08-29T17:58:57Z

llvm/lib/IR/DataLayout.cpp

    // less conservative, they should have specified it explicitly in the data
    // layout.
-    return Align(PowerOf2Ceil(BitWidth / 8));
+    return Align(PowerOf2Ceil(getTypeStoreSize(Ty).getFixedValue()));


I think this is actually a change? Isn't the store size for fp80 128 bits (or maybe 96)? That is not the same as the type size?

The store size is 80 bits as long as 80 is a multiple of the byte width. If it is not, the division must be ceiling, and getTypeStoreSize does exactly that. This is one of the places where the calculation is wrong, but produces the correct result on targets with 8-bit bytes.

I'll revert this change as it is not strictly related. Thanks for spotting!

arichardson · 2024-08-29T18:11:07Z

llvm/include/llvm/IR/DataLayout.h

  bool BigEndian = false;

+  /// The size of a byte in bits.
+  unsigned ByteWidth = 8;


I think we should store this as a shift value instead of a width since it will allow Clang/GCC to elide divisions (see https://godbolt.org/z/5T86s84Ez case 3 optimizes to a right shift).
And it looks like this also works for divideCeil: https://godbolt.org/z/1oa18Gns7

I thought about it too, but decided to leave it as is until there is an evidence that it affects compilation time. I didn't measure yet.

Forcing everyone to deal with a shift value is not very ergonomic, so to be clear I'm not suggesting using getByteShift(), but rather the approach of storing a shift value and then modifying getByteSize() to return 1u << shift.

This will allow using uint8_t to fit it into existing padding and more importantly allow the compiler to elide divisions after inlining.

llvm/include/llvm/IR/DataLayout.h

hongjia · 2024-08-30T02:17:07Z

This patch adds byte specification to data layout string. The specification is b:<size>, where <size> is the size of a byte in bits (later referred to as "byte width").

Great work. Just a minor question: why not use character 'B' instead of 'b' for byte?

s-barannikov · 2024-08-30T08:27:42Z

Just a minor question: why not use character 'B' instead of 'b' for byte?

Most of the specifiers are lowercase letters, no other reason.

This patch adds byte specification to data layout string. The specification is `b:<size>`, where `<size>` is the size of a byte in bits (later referred to as "byte width"). Limitations: * The only values allowed for byte width are 8, 16, and 32. 16-bit bytes are popular, and my downstream target has 32-bit bytes. These are the widths I'm going to add tests for in follow-up patches, so this restriction only exists because other widths are untested. * It is assumed that bytes are the same in all address spaces. Supporting different byte widths in different address spaces would require adding an address space argument to all DataLayout methods that query ABI / preferred alignments because they return *byte* alignments, and those will be different for different address spaces. This is too much effort, but it can be done in the future if the need arises, the specification reserves address space number before ':'.

s-barannikov requested review from arichardson, arsenm, bjope and nikic August 29, 2024 11:54

nikic reviewed Aug 29, 2024

View reviewed changes

arichardson reviewed Aug 29, 2024

View reviewed changes

s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 98c3676 to 31ee84f Compare October 27, 2024 00:56

s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 31ee84f to b46d554 Compare February 4, 2025 13:59

s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from b46d554 to 4048990 Compare April 6, 2025 17:22

s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 4048990 to 2488060 Compare April 19, 2025 08:18

s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 2488060 to 3539d10 Compare July 11, 2025 16:25

s-barannikov force-pushed the users/s-barannikov/byte/1-datalayout branch from 3539d10 to 670e94d Compare November 8, 2025 09:11

	unsigned getByteWidth() const { return ByteWidth; }
	unsigned getByteWidth() const { return 1 << ByteShift; }

[DataLayout] Add byte specification #106536

Are you sure you want to change the base?

[DataLayout] Add byte specification #106536

Uh oh!

Conversation

s-barannikov commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

s-barannikov commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

s-barannikov commented Aug 29, 2024

Uh oh!

nikic commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

s-barannikov commented Aug 29, 2024

Uh oh!

bjope commented Aug 29, 2024

Uh oh!

nikic commented Aug 29, 2024

Uh oh!

s-barannikov commented Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arichardson left a comment

Choose a reason for hiding this comment

Uh oh!

arichardson Aug 29, 2024

Choose a reason for hiding this comment

Uh oh!

arichardson Aug 29, 2024

Choose a reason for hiding this comment

Uh oh!

arichardson Aug 29, 2024

Choose a reason for hiding this comment

Uh oh!

s-barannikov Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arichardson Aug 29, 2024

Choose a reason for hiding this comment

Uh oh!

s-barannikov Aug 29, 2024

Choose a reason for hiding this comment

Uh oh!

arichardson Aug 29, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hongjia commented Aug 30, 2024

Uh oh!

s-barannikov commented Aug 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

s-barannikov commented Aug 29, 2024 •

edited

Loading

s-barannikov commented Aug 29, 2024 •

edited

Loading

nikic commented Aug 29, 2024 •

edited

Loading

s-barannikov commented Aug 29, 2024 •

edited

Loading

s-barannikov Aug 29, 2024 •

edited

Loading