Abstract away `FieldArray`s when parsing

In multiple instances, we have encountered parsing bugs related to the fact that LLVM bitcode can encode certain constructs' operands in multiple ways:

* https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/303 is an instance where `CST_CODE_STRING` can encode its operands using either a single `FieldArray` field or multiple non-`FieldArray` fields.
* https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/302 (in particular, the `parseField: unable to parse record field 1 of record [...] (TYPE_CODE_FUNCTION)` bit) is an instance where `TYPE_CODE_FUNCTION` can encode its operands using either a single `FieldArray` field or multiple non-`FieldArray` fields, depending on whether the bitcode is produced by Clang or Zig.

Both of these issues ultimately stem from the way `llvm-pretty-bc-parser` handles `FieldArray`s. The upstream LLVM tooling abstracts over `FieldArray`s for the most part, and the [LLVM code which parses bitcode](https://github.com/llvm/llvm-project/blob/0b6df5485ef77e76fcb09a349b5e1c39d926de5f/llvm/lib/Bitcode/Reader/BitcodeReader.cpp) generally does not have to distinguish `FieldArray` operands from non-`FieldArray` operands. `llvm-pretty-bc-parser`, on the other hand, _does_ distinguish between them, which means that the low-level parsing code must explicitly check for the presence of `FieldArray`s in any construct that might possibly make use of them. This feels like a poor separation of concerns.

I propose that we reconsider our approach to parsing `FieldArray`s. In particular, I propose that we "flatten" records (in the style of [`flattenRecord`](https://github.com/GaloisInc/llvm-pretty-bc-parser/blob/122aa18fc052db1adfdb5456d67223d4e4590499/src/Data/LLVM/BitCode/Record.hs#L89-L94)) before parsing operands, so that the representation of records is normalized and the operand parsers no longer need to know whether operands were encoded using arrays or not. (We might even consider having a separate data type from `Field` that does not contain `FieldArray`, so that an un-flattened array is unrepresentable after normalization.) As far as I can tell, LLVM's tooling is doing something similar.
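
To make the idea concrete, here is a rough sketch of what such a normalization pass could look like. It is not the current code: `Field` and `Record` below are deliberately simplified stand-ins for the real types, and `Operand`, `FlatRecord`, `flattenField`, `normalizeRecord`, and `exampleCode` are hypothetical names invented for illustration.

```haskell
import Data.Word (Word64)

-- Simplified stand-in for the parser's Field type (assumption: the real
-- type has more scalar constructors, which are collapsed into one here).
data Field
  = FieldScalar Word64
  | FieldArray [Field]
  deriving (Eq, Show)

-- Simplified stand-in for the parser's Record type.
data Record = Record
  { recordCode   :: Int
  , recordFields :: [Field]
  } deriving (Eq, Show)

-- Hypothetical array-free operand type: there is no array constructor,
-- so an un-flattened array cannot reach the operand parsers.
newtype Operand = Operand Word64
  deriving (Eq, Show)

-- Hypothetical normalized record handed to the operand parsers.
data FlatRecord = FlatRecord
  { flatCode     :: Int
  , flatOperands :: [Operand]
  } deriving (Eq, Show)

-- Splice the elements of an array into the surrounding operand list.
-- This sketch recurses into nested arrays; whether that is needed in
-- practice is an open question.
flattenField :: Field -> [Operand]
flattenField (FieldScalar w) = [Operand w]
flattenField (FieldArray fs) = concatMap flattenField fs

-- Normalize a record so that both encodings look identical downstream.
normalizeRecord :: Record -> FlatRecord
normalizeRecord (Record code fields) =
  FlatRecord code (concatMap flattenField fields)

-- Placeholder record code, not a real bitcode record code.
exampleCode :: Int
exampleCode = 0

-- The same two operands, encoded as one array field ...
viaArray :: Record
viaArray = Record exampleCode [FieldArray [FieldScalar 104, FieldScalar 105]]

-- ... and as two separate scalar fields.
viaScalars :: Record
viaScalars = Record exampleCode [FieldScalar 104, FieldScalar 105]
```

With something along these lines, `normalizeRecord viaArray` and `normalizeRecord viaScalars` produce the same `FlatRecord`, so a parser for a construct like `CST_CODE_STRING` or `TYPE_CODE_FUNCTION` would only ever have to handle one shape.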
