spirv: fix binary encoding/decoding on big-endian hosts #3658

Open
amaanq wants to merge 1 commit into KhronosGroup:main from amaanq:fix-big-endian-spirv

@amaanq amaanq commented Mar 17, 2026

Problem

The SPIR-V spec (Section 3.1) defines the binary format as little-endian,
but the word encoder wrote `uint32_t` values in native byte order via
`ostream::write()`, producing big-endian SPIR-V on ppc64/s390x. Downstream
tools (SPIRV-Tools, mesa's `mesa_clc`) then failed to parse strings
correctly. For example, "OpenCL.std" read back as "nepOs.LC" because
`MakeString` extracts bytes assuming a little-endian word layout.
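
To make the garbling concrete, here is a minimal sketch that packs a string into 32-bit words, serializes the words in either byte order, and reads the bytes back as a little-endian consumer would. The names `packString`, `serialize`, and `readBack` are hypothetical and are not taken from the translator; the big-endian branch only simulates what a native write on ppc64/s390x would produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack a string into 32-bit words with the first character in the lowest
// byte, NUL-terminated and zero-padded (the SPIR-V literal-string layout).
std::vector<uint32_t> packString(const std::string &S) {
  std::vector<uint32_t> Words(S.size() / 4 + 1, 0);
  for (std::size_t I = 0; I < S.size(); ++I)
    Words[I / 4] |= static_cast<uint32_t>(static_cast<unsigned char>(S[I]))
                    << (8 * (I % 4));
  return Words;
}

// Serialize words to bytes. BigEndian=true mimics a native-order write on a
// big-endian host; BigEndian=false is the layout the spec requires.
std::string serialize(const std::vector<uint32_t> &Words, bool BigEndian) {
  std::string Out;
  for (uint32_t W : Words)
    for (int B = 0; B < 4; ++B) {
      const int Shift = BigEndian ? 8 * (3 - B) : 8 * B;
      Out.push_back(static_cast<char>((W >> Shift) & 0xFF));
    }
  return Out;
}

// Read the byte stream as a C string, stopping at the first NUL byte.
std::string readBack(const std::string &Bytes) {
  return std::string(Bytes.c_str());
}
```

With these helpers, `readBack(serialize(packString("OpenCL.std"), false))` yields "OpenCL.std", while passing `true` yields "nepOs.LC": each group of four characters is reversed, and the third word now starts with a NUL byte that ends the string early.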

Solution

Byte-swap words in the encoder and all decoder paths on big-endian hosts
so the binary on disk is always little-endian per spec. A new
`readSPIRVWord()` helper centralizes this for raw word reads in
`parseSPIRV`, `isSpirvBinary`, and the stream decoder.
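
For readers unfamiliar with the technique, the sketch below shows one host-independent way to read and write a little-endian word. It is not the PR's implementation: it assembles bytes with shifts instead of swapping on big-endian hosts, and the names `writeWordLE`, `readWordLE`, and `roundTrips` are hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: emit one word as four little-endian bytes,
// regardless of the host byte order.
inline void writeWordLE(std::ostream &OS, uint32_t W) {
  const char Bytes[4] = {static_cast<char>(W & 0xFF),
                         static_cast<char>((W >> 8) & 0xFF),
                         static_cast<char>((W >> 16) & 0xFF),
                         static_cast<char>((W >> 24) & 0xFF)};
  OS.write(Bytes, 4);
}

// Hypothetical helper: read four little-endian bytes into one word.
// Returns 0 if the stream runs short; callers should check IS afterwards.
inline uint32_t readWordLE(std::istream &IS) {
  unsigned char Bytes[4] = {0, 0, 0, 0};
  IS.read(reinterpret_cast<char *>(Bytes), 4);
  if (!IS)
    return 0;
  return uint32_t(Bytes[0]) | uint32_t(Bytes[1]) << 8 |
         uint32_t(Bytes[2]) << 16 | uint32_t(Bytes[3]) << 24;
}

// Write a word to an in-memory stream and read it back.
inline bool roundTrips(uint32_t W) {
  std::stringstream SS;
  writeWordLE(SS, W);
  return readWordLE(SS) == W;
}
```

Because the byte order is fixed by the shifts, the same source produces identical files on little-endian and big-endian hosts. A swap-on-big-endian approach, as described above, reaches the same on-disk layout while keeping bulk `write()` calls on little-endian hosts.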

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@MrSidims MrSidims left a comment

Thanks for the fix! As far as I understand, some reads do not go through decodeBinary. For example, SPIRVModuleImpl::parseSPIRV parses the header via I.read, which, as far as I can tell, does not account for the host's endianness. The byte swap should be inserted there as well; otherwise, on a big-endian host, Header[0] == MagicNumber would compare 0x03022307 == 0x07230203, which fails. There may be other such places.

Please also note the Windows build failure; it must be addressed before merging.

Thinking out loud: it may also be a good idea to introduce a runtime flag in the translator to control the endianness of encoding/decoding and so allow cross-compilation, but I'm not sure anybody would find it useful.
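
The header mismatch described in this comment can be checked at compile time. The sketch below assumes only the magic value quoted in the thread; `byteSwap32` and `RawOnBigEndian` are hypothetical names used for illustration.

```cpp
#include <cstdint>

constexpr uint32_t MagicNumber = 0x07230203; // SPIR-V magic number

// Reverse the four bytes of a 32-bit word.
constexpr uint32_t byteSwap32(uint32_t W) {
  return (W >> 24) | ((W >> 8) & 0x0000FF00u) | ((W << 8) & 0x00FF0000u) |
         (W << 24);
}

// What a big-endian host sees when it loads the little-endian header word
// in native byte order, without swapping.
constexpr uint32_t RawOnBigEndian = byteSwap32(MagicNumber);

static_assert(RawOnBigEndian == 0x03022307u,
              "an unswapped read does not match the magic number");
static_assert(byteSwap32(RawOnBigEndian) == MagicNumber,
              "swapping the raw word restores the magic number");
```

The first assertion reproduces the failing comparison from the comment; the second shows that a single swap on the read path is enough to make the header check pass.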

@MrSidims MrSidims requested review from svenvh and vmaksimo March 17, 2026 13:42
@amaanq amaanq force-pushed the fix-big-endian-spirv branch from 2a39cfa to 380877a Compare March 18, 2026 01:24
Author

amaanq commented Mar 18, 2026

> Thanks for the fix!

And thanks for the quick feedback!

> As far as I understand, some reads do not go through decodeBinary. For example, SPIRVModuleImpl::parseSPIRV parses the header via I.read, which, as far as I can tell, does not account for the host's endianness. The byte swap should be inserted there as well; otherwise, on a big-endian host, Header[0] == MagicNumber would compare 0x03022307 == 0x07230203, which fails. There may be other such places.

You're right about that. I've introduced a readSPIRVWord() helper in SPIRVStream.h that centralizes the byte swap for all raw word reads, and replaced the former I.read calls with it.

> Please also note the Windows build failure; it must be addressed before merging.

Fixed by adding a check that __BYTE_ORDER__ is defined; there don't appear to be any big-endian Windows targets.

Contributor

@MrSidims MrSidims left a comment

Please try to avoid force-pushing to the PR; it can complicate the review process.

Overall the patch looks reasonable. I have one performance concern and one security concern. Either way, I'd like @svenvh and @vmaksimo to take a look at the PR from a maintainer's perspective.

/// Read a single SPIR-V word from a binary stream, byte-swapping from
/// little-endian on big-endian hosts.
inline uint32_t readSPIRVWord(std::istream &IS) {
uint32_t W;

Let's fix this while we are here. With this patch it could become a critical parser vulnerability for attacker-controlled input, since we start using decodeBinary/readSPIRVWord to parse the header: if the read fails or comes up short, W would otherwise be left uninitialized.

Suggested change
- uint32_t W;
+ uint32_t W = 0;
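
Zero-initializing `W` removes the indeterminate value. A stricter option, sketched below under the assumption that callers can handle a failure result, is to report a short read explicitly. `tryReadWord` is a hypothetical name and is not part of the patch.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical checked read: returns false on a short or failed read and
// leaves Out untouched, so no indeterminate value can reach the parser.
inline bool tryReadWord(std::istream &IS, uint32_t &Out) {
  unsigned char Bytes[4] = {0, 0, 0, 0};
  if (!IS.read(reinterpret_cast<char *>(Bytes), sizeof(Bytes)))
    return false;
  // Assemble the word from little-endian bytes, independent of host order.
  Out = uint32_t(Bytes[0]) | uint32_t(Bytes[1]) << 8 |
        uint32_t(Bytes[2]) << 16 | uint32_t(Bytes[3]) << 24;
  return true;
}

// Example: check whether a buffer starts with a complete word.
inline bool startsWithCompleteWord(const std::string &Buf) {
  std::istringstream IS(Buf);
  uint32_t W = 0;
  return tryReadWord(IS, W);
}
```

A truncated two-byte input makes `tryReadWord` return false instead of yielding whatever happened to be on the stack, which is the failure mode the suggested change guards against.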

Comment on lines 2784 to 2789
  bool isSpirvBinary(const std::string &Img) {
    if (Img.size() < sizeof(unsigned))
      return false;
-   const auto *Magic = reinterpret_cast<const unsigned *>(Img.data());
-   return *Magic == MagicNumber;
+   std::istringstream IS(Img);
+   return readSPIRVWord(IS) == MagicNumber;
  }
If I understand the code correctly, isSpirvBinary is a public API, so Img can be the whole SPIR-V module passed in a library call. If so, I'd like to avoid the unnecessary copy of Img that std::istringstream makes, either by memcpy-ing only the first 4 bytes or by checking the first 4 bytes one by one.
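
One way to follow this suggestion is to assemble the first word directly from the buffer, which avoids both the copy and any dependence on host byte order. This is a sketch only; `hasSpirvMagic` is a hypothetical name chosen to avoid implying it is the final `isSpirvBinary`.

```cpp
#include <cstdint>
#include <string>

constexpr uint32_t MagicNumber = 0x07230203; // SPIR-V magic number

// Check the first four bytes in place, interpreting them as little-endian.
// No copy of Img is made, however large the module is.
inline bool hasSpirvMagic(const std::string &Img) {
  if (Img.size() < sizeof(uint32_t))
    return false;
  const auto *P = reinterpret_cast<const unsigned char *>(Img.data());
  const uint32_t Word = uint32_t(P[0]) | uint32_t(P[1]) << 8 |
                        uint32_t(P[2]) << 16 | uint32_t(P[3]) << 24;
  return Word == MagicNumber;
}
```

Reading through `unsigned char` also sidesteps the alignment and aliasing questions raised by the original `reinterpret_cast<const unsigned *>` on string data.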
