Skip to content

Improve instruction relocation function #324

@disinvite

Description

@disinvite

Background

The first requirement for an almost-matching-function to be promoted to an effective match is that the lists of instructions in orig and recomp have the same length. Call these lists A for orig and B for recomp. We also have a list of "opcodes" provided by difflib.SequenceMatcher that describe how to turn A into B.

With this information, relocate_instructions() from fixes.py looks for situations where an instruction deleted from A and inserted to some position in B does not alter the behavior of the function. This function takes a naive approach that is not without problems: #322

How it works now

  1. Read the list of inserted lines in B. Discard special lines that are not instructions such as entries to jump tables or data tables (both used in switch statements).
  2. Read deleted indices in A and find a line X that matches the one from B exactly. A set() is used here so ordering is not guaranteed.
  3. Get the set of registers R that are modified or used by X.
  4. Find the deletion index and the insertion index in A. To allow for swaps where the deleted line is before or after the inserted line, put these indices in order i, j. (Recent bugfix: Fix: Did not check all intermediate instructions when relocating #323)
  5. Check whether any instructions in A[i : j] modify the registers in R. If not, this is an allowed swap.
  6. Remove the deleted line from consideration for future swaps. Note that the starting list A is not modified, just the list of deletions.

Limitations of the current design

  • Considers blocks of deleted or inserted lines as single lines without regard for context.
  • Matches instruction lines exactly without considering stack pointer adjustment.
  • When considering a potential swap, we read from a set of deleted lines in no specific order. The line we get may from a completely irrelevant index.
  • Does not consider alterations to the ebp register for push or pop instructions.
  • Does not consider FPU registers when checking intermediate lines.
  • Does not consider local variables or memory locations when checking intermediate lines.
  • Does not consider replace diff opcodes where a block of instructions in A is replaced with one from B (not necessarily equal size) in the same position.

What we can do about it

I think the first step is to come up with a set of counterexamples that demonstrate the acknowledged limitations. We can add these as xfail tests before attempting to refactor this code.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions