The Zig compiler produces somewhat unusually shaped LLVM bitcode as compared to Clang. This issue aims to document what steps we would need to perform in order to support Zig-generated bitcode properly.
Throughout this issue, I will be using the bitcode that Zig generated from this program:
// test.zig
export fn add(a: i32, b: i32) i32 {
    return a + b;
}
Compiled like so:
$ zig version
0.16.0-dev.27+83f773fc6
$ zig build-lib -femit-llvm-bc -OReleaseFast test.zig
This produces a test.bc bitcode file. Here are the issues (in order) that I encountered when loading this bitcode file into llvm-pretty-bc-parser:
match failed [...] TYPE_BLOCK
The first issue I ran into is:
> parseBitCodeFromFileWithWarnings "test.bc" >>= \x -> case x of Left err -> putStrLn (formatError err); Right _ -> pure ()
match failed
from:
TYPE_BLOCK
type symbol table
MODULE_BLOCK
Bitstream
This ultimately arises from how the type table is parsed here:
llvm-pretty-bc-parser/src/Data/LLVM/BitCode/IR/Types.hs
Lines 50 to 51 in 122aa18
  -- drop everything until we hit TYPE_CODE_NUMENTRY
  (r,ents) <- match (dropUntil numEntry) es
Where numEntry is defined here:
llvm-pretty-bc-parser/src/Data/LLVM/BitCode/IR/Types.hs
Lines 25 to 27 in 122aa18
-- | Pattern match the TYPE_CODE_NUMENTRY unabbreviated record.
numEntry :: Match Entry Record
numEntry = hasRecordCode 1 <=< fromUnabbrev <=< unabbrev
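To make the behaviour of this matcher chain concrete, here is a minimal sketch built on hypothetical, simplified stand-ins for the parser's Entry, Record, and Match types (the real definitions in llvm-pretty-bc-parser differ). It assumes that the unabbreviated matcher only succeeds on unabbreviated entries, as its name suggests, and contrasts it with a lenient matcher that accepts either kind of entry.

```haskell
import Control.Monad ((<=<))

-- Hypothetical, simplified stand-ins for the parser's types.
data Record = Record { recordCode :: Int, recordFields :: [Int] }
  deriving (Eq, Show)

data Entry
  = EntryUnabbrev Record  -- record written without an abbreviation
  | EntryAbbrev Record    -- record written through an abbreviation
  deriving (Eq, Show)

type Match i a = i -> Maybe a

-- Succeeds only when the record carries the given code.
hasRecordCode :: Int -> Match Record Record
hasRecordCode c r
  | recordCode r == c = Just r
  | otherwise = Nothing

-- Assumed behaviour of the strict path: unabbreviated entries only.
unabbrevOnly :: Match Entry Record
unabbrevOnly (EntryUnabbrev r) = Just r
unabbrevOnly _ = Nothing

-- Lenient path: either kind of entry yields its record.
anyRecord :: Match Entry Record
anyRecord (EntryUnabbrev r) = Just r
anyRecord (EntryAbbrev r) = Just r

strictNumEntry, lenientNumEntry :: Match Entry Record
strictNumEntry = hasRecordCode 1 <=< unabbrevOnly
lenientNumEntry = hasRecordCode 1 <=< anyRecord
```

In this model, `strictNumEntry (EntryAbbrev (Record 1 [17]))` evaluates to `Nothing`, while `lenientNumEntry` applied to the same entry returns `Just` the record.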
llvm-pretty-bc-parser expects TYPE_CODE_NUMENTRY to live in an unabbreviated record, but Zig's compiler happens to put TYPE_CODE_NUMENTRY in an abbreviated record instead. Fair enough, I suppose—I'm not sure why llvm-pretty-bc-parser is so picky here. The following (untested) patch appears to fix that issue:
diff --git a/src/Data/LLVM/BitCode/IR/Types.hs b/src/Data/LLVM/BitCode/IR/Types.hs
index ef564bd..a5be591 100644
--- a/src/Data/LLVM/BitCode/IR/Types.hs
+++ b/src/Data/LLVM/BitCode/IR/Types.hs
@@ -24,7 +24,7 @@ import Data.Ord (comparing)
-- | Pattern match the TYPE_CODE_NUMENTRY unabbreviated record.
numEntry :: Match Entry Record
-numEntry = hasRecordCode 1 <=< fromUnabbrev <=< unabbrev
+numEntry = hasRecordCode 1 <=< fromEntry
resolveTypeDecls :: Parse [TypeDecl]
resolveTypeDecls = do
Unimplemented types
After applying the patch above, the next stumbling point is:
> parseBitCodeFromFileWithWarnings "test.bc" >>= \x -> case x of Left err -> putStrLn (formatError err); Right _ -> pure ()
not implemented
from:
TYPE_CODE_BFLOAT
TYPE_BLOCK
type symbol table
MODULE_BLOCK
Bitstream
This happens because the bitcode file's type table contains an entry for bfloats, even though the program itself never uses bfloats directly. Quite odd.
In any case, this has been reported previously as #214. Fixing that issue properly would require some API changes downstream in llvm-pretty first. In the pursuit of making progress, I applied a quick hack here:
@@ -194,7 +194,7 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
notImplemented
23 -> label "TYPE_CODE_BFLOAT" $ do
- notImplemented
+ noType
24 -> label "TYPE_CODE_X86_AMX" $ do
notImplemented
I also had to apply similar hacks to work around other unimplemented types, which have been reported in #213 and #215:
@@ -191,13 +191,13 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
[] -> fail "function expects a return type"
22 -> label "TYPE_CODE_TOKEN" $ do
- notImplemented
+ noType
23 -> label "TYPE_CODE_BFLOAT" $ do
- notImplemented
+ noType
24 -> label "TYPE_CODE_X86_AMX" $ do
- notImplemented
+ noType
25 -> label "TYPE_CODE_OPAQUE_POINTER" $ do
let field = parseField r
parseField: unable to parse record field 1 of record [...] (TYPE_CODE_FUNCTION)
The next stumbling block is:
> parseBitCodeFromFileWithWarnings "test.bc" >>= \x -> case x of Left err -> putStrLn (formatError err); Right _ -> pure ()
parseField: unable to parse record field 1 of record Record {recordCode = 21, recordFields = [FieldFixed (BitString {bsLength = NumBits 1, bsData = 0}),FieldFixed (BitString {bsLength = NumBits 5, bsData = 17}),FieldArray [FieldFixed (BitString {bsLength = NumBits 5, bsData = 17}),FieldFixed (BitString {bsLength = NumBits 5, bsData = 17})]]}
from:
parameters
TYPE_CODE_FUNCTION
TYPE_BLOCK
type symbol table
MODULE_BLOCK
Bitstream
What is going on here? This ultimately arises from how llvm-pretty-bc-parser parses TYPE_CODE_FUNCTION records (i.e., function types):
llvm-pretty-bc-parser/src/Data/LLVM/BitCode/IR/Types.hs
Lines 184 to 191 in 122aa18
  -- [vararg, [retty, paramty x N]]
  21 -> label "TYPE_CODE_FUNCTION" $ do
    let field = parseField r
    vararg <- label "vararg" (field 0 boolean)
    tys <- label "parameters" (field 1 (fieldArray typeRef))
    case tys of
      rty:ptys -> addType (FunTy rty ptys vararg)
      [] -> fail "function expects a return type"
Specifically, llvm-pretty-bc-parser expects the convention that the record will have two fields:
- A FieldFixed at index 0 containing the vararg information
- A FieldArray at index 1 containing the function's result and argument types (rty and ptys, respectively)
Zig, on the other hand, does it slightly differently. It has a TYPE_CODE_FUNCTION record with the following fields:
- A FieldFixed at index 0 containing the vararg information
- A FieldFixed at index 1 containing the return type (what is called rty in the code above)
- A FieldArray at index 2 containing the argument types (what is called ptys in the code above)
This is just different enough to confuse llvm-pretty-bc-parser. Interestingly, the official LLVM bitcode specification's documentation for the TYPE_CODE_FUNCTION record suggests that the latter convention is closer to how it is supposed to work, although in practice LLVM appears to accept either convention. (For whatever reason, Clang itself always uses the former convention, which is most likely why llvm-pretty-bc-parser's code was designed with the former convention in mind.)
Given that LLVM accepts either convention, we should make llvm-pretty-bc-parser follow suit. I am unclear how much work this would require, however. Here is a very quick-and-dirty hack to make things work with Zig-generated bitcode (but not with Clang-generated bitcode):
@@ -185,19 +185,18 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
21 -> label "TYPE_CODE_FUNCTION" $ do
let field = parseField r
vararg <- label "vararg" (field 0 boolean)
- tys <- label "parameters" (field 1 (fieldArray typeRef))
- case tys of
- rty:ptys -> addType (FunTy rty ptys vararg)
- [] -> fail "function expects a return type"
+ rty <- label "result" (field 1 typeRef)
+ ptys <- label "parameters" (field 2 (fieldArray typeRef))
+ addType (FunTy rty ptys vararg)
22 -> label "TYPE_CODE_TOKEN" $ do
(I have also opened #304 about the subject of how best to handle FieldArrays.)
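One possible way to accept both layouts at once is to flatten a record's fields into a plain operand sequence before interpreting them, so that it no longer matters where the array boundary falls. The following is a minimal sketch of that idea, not a patch against the real parser: Field, FunTy, flattenFields, and parseFunctionTy are hypothetical, simplified stand-ins, and type references are modeled as plain Ints. The two sample records mirror the two conventions described above, using the type index 17 from the failing record.

```haskell
-- Hypothetical, simplified stand-in for the parser's field type.
data Field
  = FieldFixed Int      -- a single scalar operand
  | FieldArray [Field]  -- an array operand
  deriving (Eq, Show)

-- Hypothetical function type: return type, parameter types, vararg flag.
data FunTy = FunTy { funRet :: Int, funParams :: [Int], funVarArg :: Bool }
  deriving (Eq, Show)

-- Flatten scalar and array fields into one operand sequence.
flattenFields :: [Field] -> [Int]
flattenFields = concatMap go
  where
    go (FieldFixed n) = [n]
    go (FieldArray fs) = flattenFields fs

-- Interpret the flattened operands as [vararg, retty, paramty x N].
parseFunctionTy :: [Field] -> Either String FunTy
parseFunctionTy fields = case flattenFields fields of
  vararg : rty : ptys -> Right (FunTy rty ptys (vararg /= 0))
  [_] -> Left "function expects a return type"
  [] -> Left "function expects a vararg flag"

-- Former convention: vararg, then one array holding retty and paramtys.
clangLayout :: [Field]
clangLayout = [FieldFixed 0, FieldArray [FieldFixed 17, FieldFixed 17, FieldFixed 17]]

-- Latter convention: vararg, retty, then an array holding the paramtys.
zigLayout :: [Field]
zigLayout = [FieldFixed 0, FieldFixed 17, FieldArray [FieldFixed 17, FieldFixed 17]]
```

Under this model, `parseFunctionTy clangLayout` and `parseFunctionTy zigLayout` produce the same result. Whether flattening is appropriate for every record kind in the real parser is a separate question.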
parseSlice: unable to parse record field 1 of record [...] (value symbol table)
The next stumbling block is:
parseSlice: unable to parse record field 1 of record Record {recordCode = 2, recordFields = [FieldArray [FieldFixed (BitString {bsLength = NumBits 8, bsData = 120}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 56}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 54}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 95}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 54}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 52}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 45}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 117}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 107}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 111}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 119}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 45}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 108}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 105}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 117}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 120}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 54}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 56}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 48}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 45}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 103}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 117}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 50}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, 
bsData = 51}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 57}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 48})]]}
from:
value symbol table
MODULE_BLOCK
Bitstream
I haven't gotten to the bottom of this yet, but I wonder if this is due to yet another place in the code that assumes that FieldArrays can't happen in places where they actually can.
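As a debugging aid, the 8-bit bsData values in the failing record can be decoded as ASCII to see what the record actually contains. This is a small standalone sketch; recordBytes and decoded are names made up for illustration, and the list is copied by hand from the error message above.

```haskell
import Data.Char (chr)

-- The bsData values of the 8-bit fields in the failing record, in order.
recordBytes :: [Int]
recordBytes =
  [ 120, 56, 54, 95, 54, 52, 45, 117, 110, 107, 110, 111, 119, 110, 45
  , 108, 105, 110, 117, 120, 54, 46, 56, 46, 48, 45
  , 103, 110, 117, 50, 46, 51, 57, 46, 48
  ]

-- Interpret each value as one ASCII character.
decoded :: String
decoded = map chr recordBytes
```

Evaluating `decoded` gives "x86_64-unknown-linux6.8.0-gnu2.39.0", which looks like a target triple rather than a symbol name. Since record code 2 in the module block is MODULE_CODE_TRIPLE in LLVM's bitcode format, this may suggest that the failing record is the module's triple, stored as a single FieldArray, rather than a value symbol table entry. That reading is a guess and has not been checked against the parser's code.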
For the sake of completeness, here is a full diff for all the hacks that I have used up to this point:
diff --git a/src/Data/LLVM/BitCode/IR/Types.hs b/src/Data/LLVM/BitCode/IR/Types.hs
index ef564bd..916b180 100644
--- a/src/Data/LLVM/BitCode/IR/Types.hs
+++ b/src/Data/LLVM/BitCode/IR/Types.hs
@@ -24,7 +24,7 @@ import Data.Ord (comparing)
-- | Pattern match the TYPE_CODE_NUMENTRY unabbreviated record.
numEntry :: Match Entry Record
-numEntry = hasRecordCode 1 <=< fromUnabbrev <=< unabbrev
+numEntry = hasRecordCode 1 <=< fromEntry
resolveTypeDecls :: Parse [TypeDecl]
resolveTypeDecls = do
@@ -185,19 +185,18 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
21 -> label "TYPE_CODE_FUNCTION" $ do
let field = parseField r
vararg <- label "vararg" (field 0 boolean)
- tys <- label "parameters" (field 1 (fieldArray typeRef))
- case tys of
- rty:ptys -> addType (FunTy rty ptys vararg)
- [] -> fail "function expects a return type"
+ rty <- label "result" (field 1 typeRef)
+ ptys <- label "parameters" (field 2 (fieldArray typeRef))
+ addType (FunTy rty ptys vararg)
22 -> label "TYPE_CODE_TOKEN" $ do
- notImplemented
+ noType
23 -> label "TYPE_CODE_BFLOAT" $ do
- notImplemented
+ noType
24 -> label "TYPE_CODE_X86_AMX" $ do
- notImplemented
+ noType
25 -> label "TYPE_CODE_OPAQUE_POINTER" $ do
let field = parseField r