-
Notifications
You must be signed in to change notification settings - Fork 7
Description
In multiple instances, we have encountered parsing bugs related to the fact that LLVM bitcode can encode certain constructs' operands in multiple ways:
- Update CST_CODE_STRING parsing for newer alternative format. #303 is an instance where
CST_CODE_STRINGcan encode its operands using a singleFieldArrayfield or multiple, non-FieldArrays fields. - Support parsing bitcode produced by Zig #302 (in particular, the
parseField: unable to parse record field 1 of record [...] (TYPE_CODE_FUNCTION)bit) is an instance whereTYPE_CODE_FUNCTIONcan encode its operands using a singleFieldArrayfield or multiple, non-FieldArrays fields depending on whether bitcode is produced from Clang or Zig.
Both of these issues ultimately stem from the way llvm-pretty-bc-parser handles FieldArrays. The upstream LLVM tooling abstracts over FieldArrays for the most part, and the LLVM code which parses bitcode generally does not have to distinguish FieldArray operands from non-FieldArray operands. llvm-pretty-bc-parser, on the other hand, does distinguish between them, which means that the low-level parsing code must explicitly check for the presence of FieldArrays in any construct that might possibly make use of them. This feels like a poor separation of concerns.
I propose that we reconsider our approach to parsing FieldArrays. In particular, I propose that we "flatten" records (in the style of flattenRecord) before parsing operands such that we normalize the representation of records to abstract over whether operands are encoded using arrays or not. (We might even consider having a separate data type from Field that does not contain FieldArray to avoid non-representable states.) As far as I can tell, LLVM's tooling is doing something similar.