Description
Macros should be supported in both AT&T and Intel (NASM-like) syntax.
Syntax
There are two types of macros: multiline and inline (inline macros are supported only in Intel syntax, via the %define directive). A multiline macro is opened with the .macro directive in AT&T syntax or %macro in Intel syntax, and closed with .endm or %endmacro, respectively.
All macros may take a list of parameters, each of which receives an array of tokens as its argument. How parameters are referenced differs across the three cases:
- Multiline in AT&T: Parameters are given single-token names and, within the macro definition, are referenced with the prefix \ (backslash). For example:
.macro genByte nibble1, nibble2
.byte \nibble1 | \nibble2 << 4
.endm
genByte 4, 5 # 0x54
- Multiline in Intel: Parameters are identified by their 1-based index and referenced with the prefix % (note that the next function may need some context to disambiguate, e.g., 5%3 from 5 (%3)). For example:
%macro genByte 2
db %1 | %2 << 4
%endmacro
genByte 4, 5 # 0x54
- Inline in Intel: Parameters are given single-token names and are referenced by name alone, with no prefix. For example:
%define genByte(nibble1, nibble2) db nibble1 | nibble2 << 4
genByte(4, 5) # 0x54
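To make the three referencing schemes concrete, here is a minimal Python sketch of parameter substitution on an already-tokenized macro body. It assumes (hypothetically) that the tokenizer has already emitted each reference as a single token such as \nibble1 or %1, so the %-versus-modulo disambiguation mentioned above is out of scope. MacroDef, intel_params and expand are illustrative names, not part of the assembler. The only difference between the three cases is the reference prefix: \ for AT&T, % (with numeric names) for Intel multiline, and nothing for inline.

```python
from dataclasses import dataclass
from typing import Dict, List

Tokens = List[str]

# Reference prefix per macro flavour (the style labels are hypothetical).
PREFIX = {"att": "\\", "intel": "%", "inline": ""}


@dataclass
class MacroDef:
    style: str           # "att", "intel" or "inline"
    params: List[str]    # parameter names; "1".."n" for Intel %macro
    body: Tokens


def intel_params(count: int) -> List[str]:
    """'%macro name N' declares the parameters %1 .. %N."""
    return [str(i) for i in range(1, count + 1)]


def expand(macro: MacroDef, args: List[Tokens]) -> Tokens:
    """Replace every parameter reference in the body with its argument tokens."""
    if len(args) != len(macro.params):
        raise ValueError(f"expected {len(macro.params)} arguments, got {len(args)}")
    prefix = PREFIX[macro.style]
    bindings: Dict[str, Tokens] = {
        prefix + name: arg for name, arg in zip(macro.params, args)
    }
    out: Tokens = []
    for tok in macro.body:
        # Non-parameter tokens are copied through unchanged.
        out.extend(bindings.get(tok, [tok]))
    return out


att = MacroDef("att", ["nibble1", "nibble2"],
               [".byte", "\\nibble1", "|", "\\nibble2", "<<", "4"])
intel = MacroDef("intel", intel_params(2), ["db", "%1", "|", "%2", "<<", "4"])
inline = MacroDef("inline", ["nibble1", "nibble2"],
                  ["db", "nibble1", "|", "nibble2", "<<", "4"])

for m in (att, intel, inline):
    print(" ".join(expand(m, [["4"], ["5"]])))
# .byte 4 | 5 << 4
# db 4 | 5 << 4
# db 4 | 5 << 4
```

Because each argument is an array of tokens, a call such as genByte 1+2, 5 splices 1 + 2 in verbatim; the sketch adds no implicit parentheses, and whether the assembler should is left open.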
Behavior
Definition
When a macro is defined, the assembler should record all tokens up to the closing directive (.endm or %endmacro, or the end of the line for inline macros) and store them in a map. This map should behave like the symbols map: every token the parser reads should be recorded as a reference to the macro of the same name, even if no such macro exists yet. If the macro is later defined or redefined, every line that references it should be recompiled entirely (i.e., from source). All newlines stored in the macro body should be replaced with semicolons, so that a macro's byte output corresponds to a single line.
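A minimal sketch of the definition step, under stated assumptions: tokens are plain strings, a newline is the token "\n", the body tokens start after the macro header line, and read_body, MacroTable and note_token are hypothetical names. Following the description, every token read is logged as a potential macro reference, so a later definition or redefinition can report which lines need recompiling from source.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set

Tokens = List[str]
CLOSERS = {"att": ".endm", "intel": "%endmacro", "inline": "\n"}


def read_body(tokens: Iterator[str], style: str) -> Tokens:
    """Collect a macro body up to its closing directive (or newline)."""
    closer = CLOSERS[style]
    body: Tokens = []
    for tok in tokens:
        if tok == closer:
            # Newlines become ';' so the expansion behaves as a single line.
            return [";" if t == "\n" else t for t in body]
        body.append(tok)
    raise SyntaxError(f"missing closing {closer!r}")


class MacroTable:
    """Macro map that tracks references the way the symbols map does."""

    def __init__(self) -> None:
        self.bodies: Dict[str, Tokens] = {}
        # token text -> numbers of the lines in which it was read
        self.references: Dict[str, Set[int]] = defaultdict(set)

    def note_token(self, token: str, line: int) -> None:
        """Called for every token the parser reads, macro or not."""
        self.references[token].add(line)

    def define(self, name: str, body: Tokens) -> Set[int]:
        """Store a (re)definition; return the lines to recompile from source."""
        self.bodies[name] = body
        return set(self.references.get(name, ()))


table = MacroTable()
table.note_token("genByte", 7)  # line 7 used genByte before any definition
body = read_body(iter([".byte", "1", "\n", ".byte", "2", "\n", ".endm"]), "att")
print(body)                           # ['.byte', '1', ';', '.byte', '2', ';']
print(table.define("genByte", body))  # {7}
```

Recording every token is the simplest way to match the symbols-map behaviour described above; a real implementation might restrict this to identifier tokens to save memory.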
Macro frames
When a macro is used, the parser should enter a "macro frame" state, in which it reads tokens from the corresponding macro body and replaces parameter references with the given arguments (each macro frame has its own dictionary of parameters). While in this state, the parser does not update the currRange variable; it stays fixed on the macro reference itself. Any parser error thrown within a macro should immediately destroy all macro frames, and the parser should continue from the non-macro source (this prevents multiple errors from piling up on the same macro).
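The frame mechanism could look roughly like the hypothetical Reader below, which pulls tokens from the innermost frame before falling back to the ordinary source. Assumptions: parameter dictionaries are keyed by the reference token as it appears in the body (e.g. %1), each source token carries an integer offset standing in for currRange, and abort_macros is what the parser's error handler would call. Reader, MacroFrame and curr_range are illustrative names.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Tokens = List[str]


class MacroFrame:
    """One active macro expansion with its own parameter dictionary."""

    def __init__(self, body: Tokens, params: Dict[str, Tokens]) -> None:
        self.params = params
        self._tokens: Iterator[str] = self._expand(body)

    def _expand(self, body: Tokens) -> Iterator[str]:
        for tok in body:
            # A parameter reference yields its argument tokens instead.
            yield from self.params.get(tok, [tok])

    def next(self) -> Optional[str]:
        return next(self._tokens, None)


class Reader:
    """Reads from the innermost macro frame first, then from the source."""

    def __init__(self, source: List[Tuple[str, int]]) -> None:
        self.source = source              # (token, source offset) pairs
        self.pos = 0
        self.frames: List[MacroFrame] = []
        self.curr_range: Optional[int] = None

    def push_macro(self, body: Tokens, params: Dict[str, Tokens]) -> None:
        self.frames.append(MacroFrame(body, params))

    def next(self) -> Optional[str]:
        while self.frames:
            tok = self.frames[-1].next()
            if tok is not None:
                return tok                # curr_range stays on the macro call
            self.frames.pop()             # frame exhausted: resume the outer one
        if self.pos >= len(self.source):
            return None
        tok, offset = self.source[self.pos]
        self.pos += 1
        self.curr_range = offset          # only source tokens move curr_range
        return tok

    def abort_macros(self) -> None:
        """On an error inside a macro, drop every frame at once so reading
        resumes in the non-macro source."""
        self.frames.clear()


r = Reader([("genByte", 0), ("nop", 14)])
r.next()  # "genByte" comes from the source, so curr_range becomes 0
r.push_macro(["db", "%1", "|", "%2", "<<", "4"], {"%1": ["4"], "%2": ["5"]})
print([r.next() for _ in range(6)], r.curr_range)  # ['db', '4', '|', '5', '<<', '4'] 0
print(r.next(), r.curr_range)                       # nop 14
```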
Recursion
Recursion should also be supported: if a macro references another macro (or itself), a new macro frame, with its own parameter dictionary, should be created for that macro. .if/%if and .endif/%endif directives would be helpful here (note that they should take a constant, non-IP-relative expression to prevent paradoxes). There should be a limit on how many macro frames may exist at any given moment (perhaps 100?); if this limit is reached, an error is thrown and all macro frames are destroyed.
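A rough sketch of recursion with a frame limit, using an explicit stack rather than host-language recursion so the limit is easy to enforce. To keep it short it assumes, beyond the proposal, parameterless macros, an .if that takes a single integer literal as its constant expression (only the AT&T spelling is handled), and non-nested .if blocks. Expander, MacroError, constant and MAX_FRAMES are hypothetical names; 100 is simply the value floated above.

```python
from typing import Dict, Iterator, List, NoReturn, Optional

Tokens = List[str]
MAX_FRAMES = 100  # the "perhaps 100?" limit from the text; not settled


class MacroError(Exception):
    """Expansion error, raised only after every macro frame is destroyed."""


def constant(tok: Optional[str]) -> Optional[int]:
    """Stand-in for evaluating a constant, non-IP-relative expression."""
    if tok is None:
        return None
    try:
        return int(tok)
    except ValueError:
        return None


class Expander:
    """Recursive macro expansion driven by an explicit stack of frames."""

    def __init__(self, source: Tokens, macros: Dict[str, Tokens]) -> None:
        self.src: Iterator[str] = iter(source)
        self.macros = macros
        self.frames: List[Iterator[str]] = []

    def _fail(self, message: str) -> NoReturn:
        self.frames.clear()  # destroy all frames; reading resumes in the source
        raise MacroError(message)

    def run(self) -> Tokens:
        out: Tokens = []
        while True:
            reader = self.frames[-1] if self.frames else self.src
            tok = next(reader, None)
            if tok is None:
                if not self.frames:
                    return out
                self.frames.pop()  # macro finished: resume its caller
            elif tok == ".if":
                value = constant(next(reader, None))
                if value is None:
                    self._fail(".if needs a constant expression")
                if value == 0:
                    for skipped in reader:  # skip ahead to the .endif
                        if skipped == ".endif":
                            break
            elif tok == ".endif":
                pass
            elif tok in self.macros:
                if len(self.frames) >= MAX_FRAMES:
                    self._fail(f"macro frame limit ({MAX_FRAMES}) reached")
                self.frames.append(iter(self.macros[tok]))
            else:
                out.append(tok)


macros = {
    "outer": ["a", "inner", "b"],
    "inner": ["x", ".if", "0", "inner", ".endif", "y"],
    "forever": ["forever"],
}
print(Expander(["outer", "z"], macros).run())  # ['a', 'x', 'y', 'b', 'z']

e = Expander(["forever", "nop"], macros)
try:
    e.run()
except MacroError as err:
    print(err, e.frames)  # macro frame limit (100) reached []
print(e.run())            # ['nop']
```

After the error every frame is gone, so a second run() picks up the non-macro source where the failing macro call left it, matching the error-recovery rule for macro frames.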