overhauls the target/architecture abstraction (2/n) (#1226)
* enables interworking in the disassembler driver
What is interworking
--------------------
Interworking is a feature of some architectures that enables mixing
several instruction sets in the same compilation unit. An example is
ARM and Thumb interworking, which this branch is trying to add.
What is done
-------------
1. We add the switch primitive to the basic interface; it changes the
disassembler in the current disassembling state. It is a bold move
and can have consequences, so it should be carefully reviewed.
2. We attribute each destination in the disassembler driver state with
the architecture and call switch every time we are going to
disassemble the next chunk of memory.
3. The default rule that extends the unit architecture to all
instructions in that unit is disabled for ARM/Thumb and is overridden
in the arm plugin with the following behavior: if an arm unit has a file
and that file has a symbol table, then we provide information based on
the last bit of the symbol values in that table (todo: we should also
check for abi); otherwise we propagate the unit arch to instructions.
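
The rule in step 3 can be sketched in a few lines of plain OCaml. This is only an illustration under stated assumptions: the type `encoding` and the functions `encoding_of_symbol` and `encoding_of_destination` are hypothetical names, not part of the arm plugin, and the symbol table is modeled as a bare list of symbol values. The only fact relied on is the ARM ELF convention that a function symbol with the least significant bit set refers to Thumb code.

```ocaml
(* Hypothetical types and names, for illustration only; none of them
   are taken from BAP or from the arm plugin. *)
type encoding = Arm | Thumb

(* By ARM ELF convention, a function symbol whose least significant
   bit is set denotes Thumb code; the entry point is the symbol value
   with that bit cleared. *)
let encoding_of_symbol (value : int) : encoding * int =
  if value land 1 = 1
  then (Thumb, value land (lnot 1))
  else (Arm, value)

(* Use the symbol table when the unit has one; otherwise (or when no
   symbol matches) fall back to the unit-wide default. *)
let encoding_of_destination ~symtab ~unit_default addr =
  match symtab with
  | None -> unit_default
  | Some symbols ->
    (match List.find_opt (fun v -> v land (lnot 1) = addr) symbols with
     | Some v -> fst (encoding_of_symbol v)
     | None -> unit_default)
```

With such a function, the driver from step 2 could pair every destination address with an encoding before queuing it, and switch the disassembler whenever the encoding of the next chunk differs from the current one.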
What is to be done
------------------
Next, the arm lifter shall provide a promise to compute
destinations (which itself will require destinations, because we don't
really want to compute them) and provide the destination architecture,
based on the source encoding. We can safely examine any representation
of the instruction since it will already be lifted by that moment.
* flattens the target interface, publishes the Enum module
also makes Enum more strict by checking that the element is indeed a
member of the set of elements and by preventing double declarations.
* adds an llvm decode for x86
* drops the dependency on arch from the disassembler driver
* overhauls the target/architecture abstraction (2/n)
In the second patch of this series (#1225) we completely got rid of
the Arch.t dependency in the disassembler engine, which finally opens
the path for seamless integration of targets that are not representable
with Arch.t.
To achieve this, we introduced a proper dependency injection into the
disassembler driver so that it is no longer responsible for creating
the llvm MC disassembler. Instead, a plugin that implements a target,
aka the target support package, has to create a disassembler and is
now in full control of all parameters: it can choose the backend, specify
the CPU, and set other details of the encoding. The encoding is a new
abstraction in our knowledge base that breaks the tight connection
between the target and the way the program for that target is
encoded. Unlike the target, which is a property of a unit of code, the
encoding is associated with a program itself, i.e., it is a property
of each instruction. That enables targets with context-dependent
binary encodings, such as ARM's Thumb mode and MIPS16e, and also
paves the road for non-binary encodings of the same
architecture, e.g., text assembly (which may also have several
encodings of its own, cf. att vs intel syntax). We base this branch on
enable-interworking (#1188); this branch fully supersedes and
includes it, since encodings make interworking much more natural. It is
still highly untested how it will work with real thumb binaries, but we
will get back to it when we merge #1178.
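
To make the dependency injection concrete, here is a minimal sketch in plain OCaml. It assumes nothing about the real interfaces: `providers`, `provide`, `decode_at`, `fixed_width`, and the encoding names `"arm-a32"` and `"arm-t32"` are all hypothetical. It only shows the shape of the idea, in which plugins register a decoder constructor per encoding and the driver looks the decoder up by the encoding of the instruction it is about to decode.

```ocaml
(* All names below are hypothetical; this is not BAP's API. *)
type encoding = string                      (* e.g. "arm-a32", "arm-t32" *)
type insn = { name : string; size : int }
type decoder = bytes -> int -> insn option  (* memory -> offset -> insn *)

(* Target support packages register a constructor for each encoding
   they support; the driver never builds a decoder on its own. *)
let providers : (encoding, unit -> decoder) Hashtbl.t = Hashtbl.create 8

let provide encoding create = Hashtbl.replace providers encoding create

(* The driver selects the decoder by the encoding of the instruction,
   not by the architecture of the whole unit. *)
let decode_at ~encoding mem off =
  match Hashtbl.find_opt providers encoding with
  | None -> None
  | Some create -> (create ()) mem off

(* A pretend plugin whose "instructions" are fixed-width chunks. *)
let fixed_width name size : unit -> decoder = fun () mem off ->
  if off >= 0 && off + size <= Bytes.length mem
  then Some { name; size }
  else None

let () =
  provide "arm-a32" (fixed_width "a32" 4);
  provide "arm-t32" (fixed_width "t32" 2)
```

Because the lookup key is the encoding, two instructions in the same unit can be decoded by different decoders, which is what interworking needs.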
Another big update is that the disassembler backend (which is
responsible for translating bits into machine instructions) is no
longer required to be implemented in C++ and it is now possible to
write your own backends/disassemblers in pure OCaml, e.g., to support
PIC microcontrollers. The Backend interface is pretty low-level and we
might provide higher-level interfaces later, see
`Disasm_expert.Backend` for the interface and detailed comments.
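
As a rough idea of what a backend written in pure OCaml has to do, the sketch below decodes a made-up two-instruction ISA. The `backend` record, `toy_decode`, and `sweep` are hypothetical simplifications and do not reflect the actual `Disasm_expert.Backend` signature, which is lower-level; consult that module for the real contract.

```ocaml
(* A made-up 8-bit ISA used only for illustration. The record types
   are hypothetical and much simpler than the real backend interface. *)
type insn = { mnemonic : string; operands : int list; length : int }

type backend = {
  name : string;
  decode : string -> int -> insn option;  (* memory -> offset -> insn *)
}

(* 0x00 is a one-byte nop; 0x01 is a two-byte load-immediate whose
   second byte is the operand; everything else is undecodable. *)
let toy_decode mem off =
  let byte i = Char.code mem.[i] in
  if off < 0 || off >= String.length mem then None
  else match byte off with
    | 0x00 -> Some { mnemonic = "nop"; operands = []; length = 1 }
    | 0x01 when off + 1 < String.length mem ->
      Some { mnemonic = "ldi"; operands = [byte (off + 1)]; length = 2 }
    | _ -> None

let toy = { name = "toy"; decode = toy_decode }

(* Linear sweep with the given backend; stops at the first offset
   that cannot be decoded. *)
let sweep backend mem =
  let rec go off acc =
    match backend.decode mem off with
    | None -> List.rev acc
    | Some insn -> go (off + insn.length) (insn :: acc)
  in
  go 0 []
```

The point of the sketch is only that translating bits into machine instructions needs no C++: a function from memory and an offset to an optional instruction is enough to drive a disassembly loop.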
Finally, we rectify the interface introduced in the previous PR and
flatten the hierarchy of the abstractions newly introduced to the Core
Theory, i.e., instead of `Theory.Target.Endianness` we now have
`Theory.Endianness` and so on. We also made the `Enum` module public,
which introduces enumerated types built on top of `Knowledge.Value`s.
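
The two strictness checks that this series adds to `Enum` (an element must be a declared member, and a name cannot be declared twice) can be illustrated with a small stand-in. The module below is a hypothetical sketch over a hash table, not the real `Enum` implementation, and its functions `declare`, `read`, and `name` are assumed names chosen for the example.

```ocaml
(* Hypothetical stand-in for an enumeration registry; it is not the
   interface or the implementation of BAP's Enum module. *)
module Enum : sig
  type t
  val declare : string -> t  (* raises Invalid_argument if declared twice *)
  val read : string -> t     (* raises Invalid_argument if not a member *)
  val name : t -> string
end = struct
  type t = string

  let registry : (string, unit) Hashtbl.t = Hashtbl.create 16

  (* Reject double declarations instead of silently overwriting. *)
  let declare name =
    if Hashtbl.mem registry name
    then invalid_arg ("Enum.declare: already declared: " ^ name);
    Hashtbl.add registry name ();
    name

  (* Accept only names that were declared as members of the set. *)
  let read name =
    if Hashtbl.mem registry name then name
    else invalid_arg ("Enum.read: not a member: " ^ name)

  let name x = x
end
```

Keeping `t` abstract means the only ways to obtain a value are `declare` and `read`, so every value in circulation is known to be a member.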
In the next episodes of this series we will gradually remove Arch.t
from other bap components and further clean up the newly introduced
interfaces.