Support parsing bitcode produced by Zig #302

The [Zig](https://ziglang.org/) compiler produces LLVM bitcode that is shaped somewhat differently from what Clang produces. This issue documents the steps we would need to take in order to support Zig-generated bitcode properly.

Throughout this issue, I will be using the bitcode that Zig generated from this program:

```zig
// test.zig
export fn add(a: i32, b: i32) i32 {
    return a + b;
}
```

Compiled like so:

```
$ zig version
0.16.0-dev.27+83f773fc6
$ zig build-lib -femit-llvm-bc -OReleaseFast test.zig
```

This produces a `test.bc` bitcode file. Here are the issues (in order) that I encountered when loading this bitcode file into `llvm-pretty-bc-parser`:

## `match failed` [...] `TYPE_BLOCK`

The first issue I ran into is:

```
> parseBitCodeFromFileWithWarnings "test.bc" >>= \x -> case x of Left err -> putStrLn (formatError err); Right _ -> pure ()
match failed
from:
	TYPE_BLOCK
	type symbol table
	MODULE_BLOCK
	Bitstream
```

This ultimately arises from how the type table is parsed here:

https://github.com/GaloisInc/llvm-pretty-bc-parser/blob/122aa18fc052db1adfdb5456d67223d4e4590499/src/Data/LLVM/BitCode/IR/Types.hs#L50-L51

Where `numEntry` is defined here:

https://github.com/GaloisInc/llvm-pretty-bc-parser/blob/122aa18fc052db1adfdb5456d67223d4e4590499/src/Data/LLVM/BitCode/IR/Types.hs#L25-L27

`llvm-pretty-bc-parser` expects `TYPE_CODE_NUMENTRY` to live in an unabbreviated record, but Zig's compiler happens to put `TYPE_CODE_NUMENTRY` in an abbreviated record instead. Fair enough, I suppose; I'm not sure why `llvm-pretty-bc-parser` is so picky here. The following (untested) patch appears to fix that issue:

```diff
diff --git a/src/Data/LLVM/BitCode/IR/Types.hs b/src/Data/LLVM/BitCode/IR/Types.hs
index ef564bd..a5be591 100644
--- a/src/Data/LLVM/BitCode/IR/Types.hs
+++ b/src/Data/LLVM/BitCode/IR/Types.hs
@@ -24,7 +24,7 @@ import           Data.Ord (comparing)

 -- | Pattern match the TYPE_CODE_NUMENTRY unabbreviated record.
 numEntry :: Match Entry Record
-numEntry  = hasRecordCode 1 <=< fromUnabbrev <=< unabbrev
+numEntry  = hasRecordCode 1 <=< fromEntry

 resolveTypeDecls :: Parse [TypeDecl]
 resolveTypeDecls  = do
```
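To illustrate why the original matcher rejects Zig's record, here is a minimal, self-contained sketch of the same Kleisli-composition pattern over `Maybe`. The `Entry`, `Record`, and matcher names below are toy stand-ins that I made up for illustration; they are not the actual definitions in `llvm-pretty-bc-parser`, which are richer than this.

```haskell
import Control.Monad ((<=<))

-- Toy stand-ins for the parser's types (assumptions, not the real API).
data Record = Record { recordCode :: Int, recordFields :: [Int] }
  deriving (Eq, Show)

data Entry
  = EntryUnabbrevRecord Record  -- record written without an abbreviation
  | EntryAbbrevRecord Record    -- record written using an abbreviation
  | EntryOther                  -- blocks, abbreviation definitions, etc.
  deriving (Eq, Show)

-- A matcher either extracts something or fails.
type Match a b = a -> Maybe b

hasCode :: Int -> Match Record Record
hasCode c r
  | recordCode r == c = Just r
  | otherwise         = Nothing

-- Accepts only unabbreviated records (the strict behaviour).
unabbrevOnly :: Match Entry Record
unabbrevOnly (EntryUnabbrevRecord r) = Just r
unabbrevOnly _                       = Nothing

-- Accepts a record regardless of how it was encoded.
anyRecord :: Match Entry Record
anyRecord (EntryUnabbrevRecord r) = Just r
anyRecord (EntryAbbrevRecord r)   = Just r
anyRecord EntryOther              = Nothing

numEntryStrict, numEntryLenient :: Match Entry Record
numEntryStrict  = hasCode 1 <=< unabbrevOnly
numEntryLenient = hasCode 1 <=< anyRecord

main :: IO ()
main = do
  -- A NUMENTRY-like record (code 1) that happens to be abbreviated.
  let zigStyle = EntryAbbrevRecord (Record 1 [20])
  print (numEntryStrict zigStyle)   -- Nothing
  print (numEntryLenient zigStyle)
```

In this toy model, the strict matcher fails before the record code is ever inspected, which mirrors the `match failed` error above.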

## Unimplemented types

After applying the patch above, the next stumbling point is:

```
> parseBitCodeFromFileWithWarnings "test.bc" >>= \x -> case x of Left err -> putStrLn (formatError err); Right _ -> pure ()
not implemented
from:
	TYPE_CODE_BFLOAT
	TYPE_BLOCK
	type symbol table
	MODULE_BLOCK
	Bitstream
```

This happens because the bitcode file's type table contains an entry for `bfloat`s, even though the program itself never uses `bfloat`s directly. Quite odd.

In any case, this has been reported previously as https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/214. Fixing that issue properly would require some API changes downstream in `llvm-pretty` first. In the pursuit of making progress, I applied a quick hack here:

```diff
@@ -194,7 +194,7 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
     notImplemented

   23 -> label "TYPE_CODE_BFLOAT" $ do
-    notImplemented
+    noType

   24 -> label "TYPE_CODE_X86_AMX" $ do
     notImplemented
```

I also had to apply similar hacks to work around other unimplemented types, which have been reported in https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/213 and https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/215:

```diff
@@ -191,13 +191,13 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
       []       -> fail "function expects a return type"

   22 -> label "TYPE_CODE_TOKEN" $ do
-    notImplemented
+    noType

   23 -> label "TYPE_CODE_BFLOAT" $ do
-    notImplemented
+    noType

   24 -> label "TYPE_CODE_X86_AMX" $ do
-    notImplemented
+    noType

   25 -> label "TYPE_CODE_OPAQUE_POINTER" $ do
     let field = parseField r
```

## `parseField: unable to parse record field 1 of record [...]` (`TYPE_CODE_FUNCTION`)

The next stumbling block is:

```
> parseBitCodeFromFileWithWarnings "test.bc" >>= \x -> case x of Left err -> putStrLn (formatError err); Right _ -> pure ()
parseField: unable to parse record field 1 of record Record {recordCode = 21, recordFields = [FieldFixed (BitString {bsLength = NumBits 1, bsData = 0}),FieldFixed (BitString {bsLength = NumBits 5, bsData = 17}),FieldArray [FieldFixed (BitString {bsLength = NumBits 5, bsData = 17}),FieldFixed (BitString {bsLength = NumBits 5, bsData = 17})]]}
from:
	parameters
	TYPE_CODE_FUNCTION
	TYPE_BLOCK
	type symbol table
	MODULE_BLOCK
	Bitstream
```

What is going on here? This ultimately arises from how `llvm-pretty-bc-parser` parses `TYPE_CODE_FUNCTION` records (i.e., function types):

https://github.com/GaloisInc/llvm-pretty-bc-parser/blob/122aa18fc052db1adfdb5456d67223d4e4590499/src/Data/LLVM/BitCode/IR/Types.hs#L184-L191

Specifically, `llvm-pretty-bc-parser` expects the record to have exactly two fields:

* A `FieldFixed` at index `0` containing the `vararg` information
* A `FieldArray` at index `1` containing the function's result and argument types (`rty` and `ptys`, respectively)

Zig, on the other hand, does it slightly differently. It has a `TYPE_CODE_FUNCTION` record with the following fields:

* A `FieldFixed` at index `0` containing the `vararg` information
* A `FieldFixed` at index `1` containing the return type (what is called `rty` in the code above)
* A `FieldArray` at index `2` containing the argument types (what is called `ptys` in the code above)

This is just different enough to confuse `llvm-pretty-bc-parser`. Interestingly, the [official LLVM bitcode specification's documentation for `TYPE_CODE_FUNCTION`](https://llvm.org/docs/BitCodeFormat.html#type-code-function-record) suggests that the latter convention is closer to how it is supposed to work, although in practice LLVM appears to accept either convention. (For whatever reason, Clang itself always uses the former convention, which is most likely why `llvm-pretty-bc-parser`'s code was designed with the former convention in mind.)

Given that LLVM accepts either convention, we should make `llvm-pretty-bc-parser` follow suit. I am unclear how much work this would require, however. Here is a very quick-and-dirty hack to make things work with Zig-generated bitcode (but not with Clang-generated bitcode):

```diff
@@ -185,19 +185,18 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
   21 -> label "TYPE_CODE_FUNCTION" $ do
     let field = parseField r
     vararg <- label "vararg"     (field 0 boolean)
-    tys    <- label "parameters" (field 1 (fieldArray typeRef))
-    case tys of
-      rty:ptys -> addType (FunTy rty ptys vararg)
-      []       -> fail "function expects a return type"
+    rty    <- label "result"     (field 1 typeRef)
+    ptys   <- label "parameters" (field 2 (fieldArray typeRef))
+    addType (FunTy rty ptys vararg)

   22 -> label "TYPE_CODE_TOKEN" $ do
```

(I have also opened https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/304 about the subject of how best to handle `FieldArray`s.)
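One possible direction for accepting both conventions is to flatten array fields into a single operand list before interpreting the record, so that `[vararg, [retty, paramty...]]` and `[vararg, retty, [paramty...]]` look identical. The sketch below is only a toy model: `Field`, `FunTy`, `flatten`, and `parseFunTy` are hypothetical names over plain `Int`s, not the parser's real types, and I am assuming (without having checked) that flattening is roughly why LLVM itself accepts either layout.

```haskell
-- Toy model of record fields (an assumption; the real Field type has
-- more constructors and carries bit strings rather than Ints).
data Field
  = FieldFixed Int
  | FieldArray [Field]
  deriving (Eq, Show)

-- Hypothetical result: vararg flag, return type index, parameter type indices.
data FunTy = FunTy { isVarArg :: Bool, retTy :: Int, paramTys :: [Int] }
  deriving (Eq, Show)

-- Flatten one level of arrays so both layouts become the same flat
-- list of scalar operands.
flatten :: [Field] -> Maybe [Int]
flatten = fmap concat . traverse go
  where
    go (FieldFixed n)  = Just [n]
    go (FieldArray fs) = traverse scalar fs
    scalar (FieldFixed n) = Just n
    scalar (FieldArray _) = Nothing  -- nested arrays are not expected here

parseFunTy :: [Field] -> Either String FunTy
parseFunTy fields =
  case flatten fields of
    Nothing               -> Left "unexpected nested array"
    Just (va : rty : pts) -> Right (FunTy (va /= 0) rty pts)
    Just _                -> Left "function expects a return type"

main :: IO ()
main = do
  let clangStyle = [FieldFixed 0, FieldArray [FieldFixed 17, FieldFixed 17, FieldFixed 17]]
      zigStyle   = [FieldFixed 0, FieldFixed 17, FieldArray [FieldFixed 17, FieldFixed 17]]
  -- Both layouts should parse to the same function type.
  print (parseFunTy clangStyle == parseFunTy zigStyle)  -- True
```

The `zigStyle` value mirrors the shape of the record in the error message above (a 1-bit flag, a fixed field `17`, then an array of two `17`s). Whether this approach fits the real `parseField` machinery is a separate question.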

## `parseSlice: unable to parse record field 1 of record [...]` (`value symbol table`)

The next stumbling block is:

```
parseSlice: unable to parse record field 1 of record Record {recordCode = 2, recordFields = [FieldArray [FieldFixed (BitString {bsLength = NumBits 8, bsData = 120}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 56}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 54}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 95}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 54}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 52}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 45}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 117}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 107}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 111}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 119}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 45}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 108}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 105}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 117}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 120}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 54}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 56}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 48}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 45}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 103}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 110}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 117}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 50}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 51}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 57}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 46}),FieldFixed (BitString {bsLength = NumBits 8, bsData = 48})]]}
from:
	value symbol table
	MODULE_BLOCK
	Bitstream
```

I haven't gotten to the bottom of this yet, but I wonder if this is due to yet another place in the code that assumes that `FieldArray`s can't happen in places where they actually can.
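One observation that may help narrow this down: the `bsData` values in the `FieldArray` are all 8-bit, and decoding them as ASCII yields something that looks like a target triple. The snippet below just performs that decoding; the byte list is copied from the error message above, and the names `tripleBytes` and `decoded` are mine.

```haskell
import Data.Char (chr)

-- Byte values copied from the bsData fields of the FieldArray in the
-- error message above, in order.
tripleBytes :: [Int]
tripleBytes =
  [ 120, 56, 54, 95, 54, 52, 45, 117, 110, 107, 110, 111, 119, 110, 45
  , 108, 105, 110, 117, 120, 54, 46, 56, 46, 48, 45, 103, 110, 117
  , 50, 46, 51, 57, 46, 48
  ]

-- Interpret each byte as an ASCII character.
decoded :: String
decoded = map chr tripleBytes

main :: IO ()
main = putStrLn decoded  -- x86_64-unknown-linux6.8.0-gnu2.39.0
```

A module-level record with code 2 and a string payload would be consistent with `MODULE_CODE_TRIPLE` in the LLVM bitcode documentation, so the `value symbol table` label in the error may be misleading. I have not confirmed that this is what the parser is actually tripping over.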

-----

For the sake of completeness, here is a full diff for all the hacks that I have used up to this point:

```diff
diff --git a/src/Data/LLVM/BitCode/IR/Types.hs b/src/Data/LLVM/BitCode/IR/Types.hs
index ef564bd..916b180 100644
--- a/src/Data/LLVM/BitCode/IR/Types.hs
+++ b/src/Data/LLVM/BitCode/IR/Types.hs
@@ -24,7 +24,7 @@ import           Data.Ord (comparing)

 -- | Pattern match the TYPE_CODE_NUMENTRY unabbreviated record.
 numEntry :: Match Entry Record
-numEntry  = hasRecordCode 1 <=< fromUnabbrev <=< unabbrev
+numEntry  = hasRecordCode 1 <=< fromEntry

 resolveTypeDecls :: Parse [TypeDecl]
 resolveTypeDecls  = do
@@ -185,19 +185,18 @@ parseTypeBlockEntry (fromEntry -> Just r) = case recordCode r of
   21 -> label "TYPE_CODE_FUNCTION" $ do
     let field = parseField r
     vararg <- label "vararg"     (field 0 boolean)
-    tys    <- label "parameters" (field 1 (fieldArray typeRef))
-    case tys of
-      rty:ptys -> addType (FunTy rty ptys vararg)
-      []       -> fail "function expects a return type"
+    rty    <- label "result"     (field 1 typeRef)
+    ptys   <- label "parameters" (field 2 (fieldArray typeRef))
+    addType (FunTy rty ptys vararg)

   22 -> label "TYPE_CODE_TOKEN" $ do
-    notImplemented
+    noType

   23 -> label "TYPE_CODE_BFLOAT" $ do
-    notImplemented
+    noType

   24 -> label "TYPE_CODE_X86_AMX" $ do
-    notImplemented
+    noType

   25 -> label "TYPE_CODE_OPAQUE_POINTER" $ do
     let field = parseField r
```

For reference, these are the snippets at the three `Types.hs` permalinks above, in order:

```haskell
-- drop everything until we hit TYPE_CODE_NUMENTRY
(r,ents) <- match (dropUntil numEntry) es
```

```haskell
-- | Pattern match the TYPE_CODE_NUMENTRY unabbreviated record.
numEntry :: Match Entry Record
numEntry  = hasRecordCode 1 <=< fromUnabbrev <=< unabbrev
```

```haskell
  -- [vararg, [retty, paramty x N]]
  21 -> label "TYPE_CODE_FUNCTION" $ do
    let field = parseField r
    vararg <- label "vararg"     (field 0 boolean)
    tys    <- label "parameters" (field 1 (fieldArray typeRef))
    case tys of
      rty:ptys -> addType (FunTy rty ptys vararg)
      []       -> fail "function expects a return type"
```
