|
| 1 | +# Syntax of a Binary Pattern |
| 2 | + |
| 3 | +The syntax for this crate's binary patterns is primarily inspired by the [pelite](https://docs.rs/pelite/latest/pelite/pattern/fn.parse.html) crate's pattern system, aligning with existing de facto standards to simplify migration. Additionally, numerous enhancements have been introduced to facilitate matching against generated assembly instructions for function or code signatures. |
| 4 | + |
| 5 | +Below, all available operators are defined and explained. |
| 6 | + |
| 7 | +# Available Operators |
| 8 | + |
| 9 | +## Binary Data (`<hex>`) |
| 10 | + |
| 11 | +The **Match Binary Data** operator is the most fundamental operator. It performs a byte-by-byte comparison of the input with the specified hexadecimal values. Each byte must be written in hexadecimal format and padded to two digits. |
| 12 | + |
| 13 | +The following example searches for the hexadecimal sequence `0xFF 0xDE 0x01 0x23` in the target: |
| 14 | + |
| 15 | +```pattern |
| 16 | +FF DE 01 23 |
| 17 | +``` |
| 18 | + |
| 19 | +Spaces between the hexadecimal values are optional. The following example is equivalent to the one above: |
| 20 | + |
| 21 | +```pattern |
| 22 | +FFDE0123 |
| 23 | +``` |
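
As an illustration of why the two spellings above are equivalent, here is a minimal sketch of a hex-run parser that drops whitespace before pairing digits. The function name `parse_hex_bytes` is hypothetical and not part of this crate's API. The sketch is also more lenient than a real parser may be, since it would accept whitespace between the two digits of one byte.

```rust
/// Hypothetical helper: parse a run of two-digit hex bytes, ignoring ASCII whitespace.
/// Returns `None` on a non-hex character or an odd number of hex digits.
fn parse_hex_bytes(pattern: &str) -> Option<Vec<u8>> {
    // Drop whitespace, then convert every remaining character to its hex value.
    let digits: Vec<u32> = pattern
        .chars()
        .filter(|c| !c.is_ascii_whitespace())
        .map(|c| c.to_digit(16))
        .collect::<Option<Vec<u32>>>()?;

    // Each byte must be padded to exactly two digits.
    if digits.len() % 2 != 0 {
        return None;
    }

    // Combine each pair of digits (high nibble, low nibble) into one byte.
    Some(
        digits
            .chunks(2)
            .map(|pair| (pair[0] * 16 + pair[1]) as u8)
            .collect(),
    )
}
```

Under these assumptions, `parse_hex_bytes("FF DE 01 23")` and `parse_hex_bytes("FFDE0123")` produce the same four bytes.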
| 24 | + |
| 25 | +## Byte Wildcard (`?`) |
| 26 | + |
| 27 | +The **Byte Wildcard** operator (`?`) matches exactly one byte, regardless of its value. |
| 28 | +For example, the following pattern matches any 32-bit relative call instruction (`E8 rel32`) followed by a return (`C3`) in x86 assembly: |
| 29 | + |
| 30 | +```pattern |
| 31 | +E8 ? ? ? ? C3 |
| 32 | +``` |
| 33 | + |
| 34 | +Note that a single question mark matches a whole byte. |
| 35 | + |
| 36 | +## Range Wildcard (`[<min>-<max>]` / `[<count>]`) |
| 37 | + |
| 38 | +The **Range Wildcard** operator (`[<min>-<max>]` / `[<count>]`) extends the capabilities of the byte wildcard operator by allowing you to match a specific range or a fixed count of bytes with any value. |
| 39 | + |
| 40 | +- Fixed Count Wildcard (`[<count>]`) |
| 41 | + Matches an exact number of bytes. For example, the following matches a 32-bit relative call instruction (`E8`), skips four bytes, and then matches a return instruction (`C3`): |
| 42 | + |
| 43 | + ```pattern |
| 44 | + E8 [4] C3 |
| 45 | + ``` |
| 46 | + |
| 47 | +- Variable Range Wildcard (`[<min>-<max>]`) |
| 48 | + Matches a variable number of bytes with any value. |
| 49 | + The rest of the pattern may match at any offset within the range. |
| 50 | + For instance, the following matches a sequence starting with `0xFF`, followed by four to eight arbitrary bytes, and ending with `0xFF`: |
| 51 | + |
| 52 | + ```pattern |
| 53 | + FF [4-8] FF |
| 54 | + ``` |
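
The range semantics can be sketched as a loop over candidate gap lengths. This is only an illustration: `match_with_gap` is a hypothetical function, not this crate's API, and both the inclusive upper bound and the shortest-gap-first order are assumptions. The sketch anchors the match at the start of `data` to keep it short.

```rust
/// Hypothetical helper: match `head`, then between `min` and `max` arbitrary
/// bytes (inclusive, an assumption), then `tail`, anchored at the start of `data`.
/// Returns the number of skipped bytes for the first gap length that fits.
fn match_with_gap(
    data: &[u8],
    head: &[u8],
    min: usize,
    max: usize,
    tail: &[u8],
) -> Option<usize> {
    if !data.starts_with(head) {
        return None;
    }
    let rest = &data[head.len()..];

    // Try every gap length in the range, shortest first (an assumption).
    // `get` returns `None` when the gap runs past the end of the input.
    (min..=max).find(|&gap| rest.get(gap..).map_or(false, |r| r.starts_with(tail)))
}
```

A fixed count such as `[4]` corresponds to `min == max == 4` in this sketch.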
| 55 | + |
| 56 | +## Save Cursor (`'`) |
| 57 | + |
| 58 | +The **Save Cursor** operator (`'`) acts as a bookmark to save the current cursor's relative virtual address (RVA) in the save array returned by the matcher. |
| 59 | +The following example saves the RVA of the start of the counting sequence (`01 02 03 04`) in the result array at index 1: |
| 60 | + |
| 61 | +```pattern |
| 62 | +FF ' 01 02 03 04 |
| 63 | +``` |
| 64 | + |
| 65 | +Note: |
| 66 | +The first index (index 0) in the returned array from the matcher always contains the start address of the matched pattern. |
| 67 | + |
| 68 | +## Rel/Abs Jump (`%` / `$` / `@`) |
| 69 | + |
| 70 | +The **Jump** operator follows either a relative or absolute jump, allowing the pattern to continue matching at the resolved jump target. The following jump modes are supported: |
| 71 | + |
| 72 | +- **1-byte relative jump**: `%` |
| 73 | +- **4-byte relative jump**: `$` |
| 74 | +- **8-byte absolute jump**: `@` |
| 75 | + |
| 76 | +When using a jump operator, subsequent operations will be performed at the resolved jump location. |
| 77 | + |
| 78 | +Example: |
| 79 | +The following pattern matches a function call (`E8`), resolves a 4-byte relative jump (`$`), saves the function's start address to the save array, and confirms the function begins with `push rsp` (`54`): |
| 80 | + |
| 81 | +```pattern |
| 82 | +E8 $ ' 54 |
| 83 | +``` |
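
To make the 4-byte relative case concrete, the sketch below resolves a target the way an x86 `call rel32` does: the signed little-endian displacement is added to the offset just past the operand. Whether this crate resolves `$` in exactly this way is an assumption, and `resolve_rel32` is a hypothetical name. The sketch works on buffer offsets rather than RVAs.

```rust
/// Hypothetical helper: resolve a 4-byte relative jump whose operand starts at
/// `operand_at`. The displacement is read as a signed little-endian value and is
/// relative to the end of the operand, as with x86 `call rel32` (an assumption
/// about how the `$` operator behaves).
fn resolve_rel32(data: &[u8], operand_at: usize) -> Option<usize> {
    let end = operand_at.checked_add(4)?;
    let s = data.get(operand_at..end)?;
    let disp = i32::from_le_bytes([s[0], s[1], s[2], s[3]]) as i64;

    // The target must land inside the buffer to be usable by a matcher.
    let target = end as i64 + disp;
    if target >= 0 && (target as usize) < data.len() {
        Some(target as usize)
    } else {
        None
    }
}
```

For the bytes `E8 02 00 00 00 90 90 54`, calling `resolve_rel32(&data, 1)` yields offset 7, where the `54` byte sits.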
| 84 | + |
| 85 | +## Rel/Abs Jump with Sub-Pattern (`%` / `$` / `@` with `{}`) |
| 86 | + |
| 87 | +The **Jump** operator can also match a sub-pattern at the resolved jump destination while returning the cursor to its original location after the jump. This is achieved by enclosing the sub-pattern in curly braces (`{}`) immediately following the jump symbol. |
| 88 | + |
| 89 | +Behavior: |
| 90 | + |
| 91 | +- The sub-pattern within the curly braces is matched at the resolved jump destination. |
| 92 | +- After the sub-pattern is matched successfully, the cursor returns to the original location before the jump. |
| 93 | +- The bytes defining the jump are skipped, and matching continues from that point. |
| 94 | + |
| 95 | +Example: |
| 96 | + |
| 97 | +The following pattern matches a function call (`E8`), resolves the 4-byte relative jump (`$`), saves the target address, confirms the target begins with `push rsp` (`54`), and then continues matching after the bytes that define the jump: |
| 98 | + |
| 99 | +```pattern |
| 100 | +E8 $ { ' 54 } |
| 101 | +``` |
| 102 | + |
| 103 | +## OR / Branch (`(<pattern a> | <pattern b> [ | <pattern n> ])`) |
| 104 | + |
| 105 | +The **Branch** operator enables matching against one of multiple specified patterns. It allows for flexibility in matching sequences where alternatives are valid. This operator is especially useful when dealing with multiple valid opcode variations or alternative byte sequences. |
| 106 | + |
| 107 | +Example: |
| 108 | +The following pattern matches any of these sequences: `0xFF 0x01 0xFF`, `0xFF 0x03 0xFF`, or `0xFF 0xFF 0xFF`: |
| 109 | + |
| 110 | +```pattern |
| 111 | +FF ( 01 | 03 | FF ) FF |
| 112 | +``` |
| 113 | + |
| 114 | +## Read Value (`r1` / `r2` / `r4`) |
| 115 | + |
| 116 | +The **Read Value** operator reads and saves a value from the matched bytes. It supports reading 1, 2, or 4 bytes and stores the result in the matched stack. This operator is particularly useful for extracting values like offsets, addresses, or immediate data from matched byte sequences. |
| 117 | + |
| 118 | +Example: |
| 119 | +The following pattern matches a 32-bit relative call instruction (`E8`) and saves the 4-byte value that follows the opcode (the call's relative displacement) into the matched stack at index 1: |
| 120 | + |
| 121 | +```pattern |
| 122 | +E8 r4 |
| 123 | +``` |
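
As a rough illustration of reading a 1-, 2-, or 4-byte value at the cursor, the sketch below decodes the bytes as an unsigned little-endian integer. The function `read_value` is hypothetical, and the byte order and signedness used by this crate are assumptions here.

```rust
/// Hypothetical helper: read a `width`-byte unsigned little-endian value at
/// `cursor`. Only widths 1, 2, and 4 are accepted, mirroring `r1` / `r2` / `r4`.
/// Byte order and signedness are assumptions, not confirmed crate behavior.
fn read_value(data: &[u8], cursor: usize, width: usize) -> Option<u32> {
    if !matches!(width, 1 | 2 | 4) {
        return None;
    }
    let bytes = data.get(cursor..cursor.checked_add(width)?)?;

    // Little-endian: the last byte is the most significant, so fold in reverse.
    Some(
        bytes
            .iter()
            .rev()
            .fold(0u32, |acc, &b| (acc << 8) | u32::from(b)),
    )
}
```

With the bytes `E8 78 56 34 12`, `read_value(&data, 1, 4)` gives `0x12345678` under these assumptions.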
| 124 | + |
| 125 | +# Formal Syntax Specification |
| 126 | + |
| 127 | +The following ABNF specifies the general syntax: |
| 128 | + |
| 129 | +```abnf |
| 130 | +match_string := *(operand " ") |
| 131 | +
|
| 132 | +operand := operand_bin / operand_wildcard_byte / operand_wildcard_range / operand_jump / operand_read / operand_cursor_save / operand_branch |
| 133 | +
|
| 134 | +operand_bin := 1*(2HEXDIG) |
| 135 | +operand_wildcard_byte := "?" |
| 136 | +operand_wildcard_range := "[" (wildcard_fixed / wildcard_range) "]" |
| 137 | +operand_jump := ("%" / "$" / "@") [jump_target_matcher] |
| 138 | +operand_read := "r" ("1" / "2" / "4") |
| 139 | +operand_cursor_save := "'" |
| 140 | +operand_branch := "(" match_string *("|" match_string) ")" |
| 141 | +
|
| 142 | +wildcard_range := 1*DIGIT "-" 1*DIGIT |
| 143 | +wildcard_fixed := 1*DIGIT |
| 144 | +
|
| 145 | +jump_target_matcher := "{" *(match_string) "}" |
| 146 | +``` |