spirv: fix binary encoding/decoding on big-endian hosts#3658
spirv: fix binary encoding/decoding on big-endian hosts#3658amaanq wants to merge 1 commit intoKhronosGroup:mainfrom
Conversation
There was a problem hiding this comment.
Thanks for the fix! Per my understanding, there should be some other reads that go not through decodeBinary, for example SPIRVModuleImpl::parseSPIRV parses header via I.read (per my understanding it doesn't consider endianness of the environment). So there the bit swap should be inserted as well, otherwise on a big-endian host, Header[0] == MagicNumber would compare 0x03022307 == 0x07230203, which fails. May be there are some other places.
Please also note windows build failure - it must be addressed before merging.
Some thinking out loud, may be it also a good idea to introduce a runtime flag to the translator to control endianness encoding/decoding to allow cross compilation, but not sure if anybody will find it useful.
The SPIR-V spec (Section 3.1) defines the binary format as little-endian, but the word encoder wrote `uint32_t` values in native byte order via `ostream::write()`, producing big-endian SPIR-V on ppc64/s390x. Downstream tools (SPIRV-Tools, mesa's `mesa_clc`) then failed to parse strings correctly, for example, "OpenCL.std" read back as "nepOs.LC" because `MakeString` extracts bytes assuming little-endian word layout. Byte-swap words in the encoder and all decoder paths on big-endian hosts so the binary on disk is always little-endian per spec. A new `readSPIRVWord()` helper centralizes this for raw word reads in `parseSPIRV`, `isSpirvBinary`, and the stream decoder.
2a39cfa to
380877a
Compare
And thanks for the quick feedback!
You would be right about that, I've gone ahead and introduced a
Fixed by adding a check for |
| /// Read a single SPIR-V word from a binary stream, byte-swapping from | ||
| /// little-endian on big-endian hosts. | ||
| inline uint32_t readSPIRVWord(std::istream &IS) { | ||
| uint32_t W; |
There was a problem hiding this comment.
Lets fix this while we are here as with this patch this may become critical parser vulnerability for attacker-controlled input as we start to use decodeBinary/readSPIRVWord to parse the header
| uint32_t W; | |
| uint32_t W = 0; |
| bool isSpirvBinary(const std::string &Img) { | ||
| if (Img.size() < sizeof(unsigned)) | ||
| return false; | ||
| const auto *Magic = reinterpret_cast<const unsigned *>(Img.data()); | ||
| return *Magic == MagicNumber; | ||
| std::istringstream IS(Img); | ||
| return readSPIRVWord(IS) == MagicNumber; | ||
| } |
There was a problem hiding this comment.
If I understand the code correctly, as isSpirvBinary is public API - Img can be the whole SPIR-V module passed in a library call. If I'm right, I'd like to avoid unnecessary copy of Img. I'd suggest to avoid it either by memcopying only 4 bytes or byte checking first 4 bytes byte-by-byte.
Problem
The SPIR-V spec (Section 3.1) defines the binary format as little-endian,
but the word encoder wrote
uint32_tvalues in native byte order viaostream::write(), producing big-endian SPIR-V on ppc64/s390x. Downstreamtools (SPIRV-Tools, mesa's
mesa_clc) then failed to parse stringscorrectly, for example, "OpenCL.std" read back as "nepOs.LC" because
MakeStringextracts bytes assuming little-endian word layout.
Solution
Byte-swap words in the encoder and all decoder paths on big-endian hosts
so the binary on disk is always little-endian per spec. A new
readSPIRVWord()helper centralizes this for raw word reads inparseSPIRV,isSpirvBinary, and the stream decoder.