Skip to content

Commit aae5c10

Browse files
authored
overhauls the target/architecture abstraction (2/n) (#1226)
* enables interworking in the disassembler driver What is interworking -------------------- Interworking is a feature of some architectures that enables mixing several instruction sets in the same compilation unit. Example, arm and thumb interworking that this branch is trying to add. What is done ------------- 1. We add the switch primitive to the basic interface that changes the dissassembler in the current disassembling state. It is a bold move and can have conseqeuences, should be carefully reviewed 2. Attributes each destination in the disassembler driver state with the architecture and calls switch every time we are going to disassemble the next chunk of memory. 3. The default rule that extends the unit architecture to all instructions in that unit is disabled for ARM/Thumb and is overriden in the arm plugin with the following behavior, if an arm unit has a file and that file has a symbol table then we provide information based on the last bit of that symbol table (todo: we should also check for abi), otherwise we propagate the unit arch to instructions. What is to be done ------------------ Next, the arm lifter shall provide a promise to compute destinations (which itself will require destinations, because we don't really want to compute them) and provide the destination architecture, based on the source encoding. We can safely examine any representation of the instruction since it is already will be lifted by that moment. * flattens the target interface, publishes the Enum module also makes Enum more strict by checking that the element is indeed a member of the set of elements and by preventing double declarations. * adds an llvm decode for x86 * drops the dependency on arch from the disassembler driver * overhauls the target/architecture abstraction (2/n) In the second patch of this series (#1225) we completely got rid of Arch.t dependency in the disassembler engine that finally opens the path for seamless integration of targets that are not representable with Arch.t. To achieve this, we introduced a proper dependency injection into the disassembler driver so that it is no longer responsible for creating the llvm MC disassembler. Instead a plugin that implements a target, aka the target support package, has to create a disassembler and is now in full control of all parameters and can choose backend, specify the CPU and other details of encoding. The encoding is a new abstraction in our knowledge base that breaks the tight connection between the target and the way how the program for that target is encoded. Unlike the target, which is a property of a unit of code, the encoding is associated with a program itself, i.e., it is a property of each instruction. That enables targets with context-dependent encodings such ARM's thumb mode and MIPS16e for binary encodings as well as paves the road for non-binary encodings for the same architecture, e.g., text assembly (which also may have several encodings on its own, cf. att vs intel syntax). We base this branch on the enable-interworking (#1188) and this branch fully superseeds and includes it, since encodings made it much more natural. It is still highlty untested how it will work with real thumb binaries but we will get back to it when we will merge #1178. Another big update, is that the disassembler backend (which is responsible for translating bits into machine instructions) is no longer required to be implemented in C++ and it is now possible to write your own backends/disassemblers in pure OCaml, e.g., to support PIC microcontrollers. The Backend interface is pretty low-level and we might provide higher-level interfaces later, see `Disasm_expert.Backend` for the interface and detailed comments. Finally, we rectify the interface introduced in the previous PR and flatten the hierarchy of newly introduced to the Core Theory abtractions, i.e., instead of `Theory.Target.Endiannes` we now have `Theory.Endianness` and so on. We also made the `Enum` module public which introduced enumerated types built on to of `Knowledge.Value`s. In the next episodes of this series we will gradually remove Arch.t from other bap components and further clean up the newly introduced interfaces.
1 parent 0095046 commit aae5c10

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

44 files changed

+1662
-929
lines changed

lib/arm/arm_target.ml

Lines changed: 159 additions & 77 deletions
Original file line numberDiff line numberDiff line change
@@ -1,34 +1,37 @@
1+
let package = "bap"
2+
13
open Core_kernel
24
open Bap_core_theory
35
open Bap.Std
6+
open KB.Syntax
47

5-
let package = "bap"
8+
module CT = Theory
69

710
type r128 and r80 and r64 and r32 and r16 and r8
811

9-
type 'a bitv = 'a Theory.Bitv.t Theory.Value.sort
12+
type 'a bitv = 'a CT.Bitv.t CT.Value.sort
1013

11-
let r128 : r128 bitv = Theory.Bitv.define 128
12-
let r80 : r80 bitv = Theory.Bitv.define 80
13-
let r64 : r64 bitv = Theory.Bitv.define 64
14-
let r32 : r32 bitv = Theory.Bitv.define 32
15-
let r16 : r16 bitv = Theory.Bitv.define 16
16-
let r8 : r8 bitv = Theory.Bitv.define 8
17-
let bool = Theory.Bool.t
14+
let r128 : r128 bitv = CT.Bitv.define 128
15+
let r80 : r80 bitv = CT.Bitv.define 80
16+
let r64 : r64 bitv = CT.Bitv.define 64
17+
let r32 : r32 bitv = CT.Bitv.define 32
18+
let r16 : r16 bitv = CT.Bitv.define 16
19+
let r8 : r8 bitv = CT.Bitv.define 8
20+
let bool = CT.Bool.t
1821

1922

20-
let reg t n = Theory.Var.define t n
21-
let untyped = List.map ~f:Theory.Var.forget
23+
let reg t n = CT.Var.define t n
24+
let untyped = List.map ~f:CT.Var.forget
2225
let (@<) xs ys = untyped xs @ untyped ys
2326

2427
let array ?(index=string_of_int) t pref size =
2528
List.init size ~f:(fun i -> reg t (pref ^ index i))
2629

27-
let mems = Theory.Mem.define r32 r8
28-
let data = Theory.Var.define mems (Var.name Arm_env.mem)
30+
let mems = CT.Mem.define r32 r8
31+
let data = CT.Var.define mems (Var.name Arm_env.mem)
2932

3033
let of_bil v =
31-
Theory.Var.define (Var.sort v) (Var.name v)
34+
CT.Var.define (Var.sort v) (Var.name v)
3235

3336
let vars32 = List.map ~f:of_bil Arm_env.[
3437
r0; r1; r2; r3; r4; r5; r6; r7; r8; r9;
@@ -43,8 +46,8 @@ let gp64 = array r64 "X" 30
4346
let fp64 = array r128 "Q" 32
4447
let sp64 = [reg r64 "SP"]
4548
let lr64 = [reg r64 "LR"]
46-
let mems64 = Theory.Mem.define r64 r8
47-
let data64 = Theory.Var.define mems64 "mem"
49+
let mems64 = CT.Mem.define r64 r8
50+
let data64 = CT.Var.define mems64 "mem"
4851
let flags64 = [
4952
reg bool "NF";
5053
reg bool "ZF";
@@ -54,59 +57,59 @@ let flags64 = [
5457

5558
let vars64 = gp64 @< fp64 @< sp64 @< lr64 @< flags64 @< [data64]
5659

57-
let parent = Theory.Target.declare "arm"
60+
let parent = CT.Target.declare "arm"
5861

5962
module type v4 = sig
6063
end
6164

6265

6366
module type ARM = sig
64-
val endianness : Theory.Target.endianness
65-
val parent : Theory.Target.t
66-
val v4 : Theory.Target.t
67-
val v4t : Theory.Target.t
68-
val v5 : Theory.Target.t
69-
val v5t : Theory.Target.t
70-
val v5te : Theory.Target.t
71-
val v5tej : Theory.Target.t
72-
val v6 : Theory.Target.t
73-
val v6t2 : Theory.Target.t
74-
val v6z : Theory.Target.t
75-
val v6k : Theory.Target.t
76-
val v6m : Theory.Target.t
77-
val v7 : Theory.Target.t
78-
val v7fp : Theory.Target.t
79-
val v7a : Theory.Target.t
80-
val v7afp : Theory.Target.t
81-
val v8a : Theory.Target.t
82-
val v81a : Theory.Target.t
83-
val v82a : Theory.Target.t
84-
val v83a : Theory.Target.t
85-
val v84a : Theory.Target.t
86-
val v85a : Theory.Target.t
87-
val v86a : Theory.Target.t
67+
val endianness : CT.endianness
68+
val parent : CT.Target.t
69+
val v4 : CT.Target.t
70+
val v4t : CT.Target.t
71+
val v5 : CT.Target.t
72+
val v5t : CT.Target.t
73+
val v5te : CT.Target.t
74+
val v5tej : CT.Target.t
75+
val v6 : CT.Target.t
76+
val v6t2 : CT.Target.t
77+
val v6z : CT.Target.t
78+
val v6k : CT.Target.t
79+
val v6m : CT.Target.t
80+
val v7 : CT.Target.t
81+
val v7fp : CT.Target.t
82+
val v7a : CT.Target.t
83+
val v7afp : CT.Target.t
84+
val v8a : CT.Target.t
85+
val v81a : CT.Target.t
86+
val v82a : CT.Target.t
87+
val v83a : CT.Target.t
88+
val v84a : CT.Target.t
89+
val v85a : CT.Target.t
90+
val v86a : CT.Target.t
8891
end
8992

90-
module type Endianness = sig val endianness : Theory.Target.endianness end
93+
module type Endianness = sig val endianness : CT.endianness end
9194
module Family (Order : Endianness) = struct
9295
include Order
9396

9497
let ordered name =
95-
let order = Theory.Target.Endianness.name endianness in
98+
let order = CT.Endianness.name endianness in
9699
name ^ "+" ^ KB.Name.unqualified order
97100

98101
let (<:) parent name =
99-
if Theory.Target.is_unknown parent
100-
then Theory.Target.unknown
101-
else Theory.Target.declare ~package (ordered name) ~parent
102+
if CT.Target.is_unknown parent
103+
then CT.Target.unknown
104+
else CT.Target.declare ~package (ordered name) ~parent
102105
~nicknames:[name]
103106

104-
let is_bi_endian = Theory.Target.Endianness.(equal bi) endianness
107+
let is_bi_endian = CT.Endianness.(equal bi) endianness
105108

106109
let v4 =
107110
if is_bi_endian
108-
then Theory.Target.unknown
109-
else Theory.Target.declare ~package (ordered "armv4")
111+
then CT.Target.unknown
112+
else CT.Target.declare ~package (ordered "armv4")
110113
~parent
111114
~nicknames:["armv4"]
112115
~bits:32
@@ -127,8 +130,8 @@ module Family (Order : Endianness) = struct
127130
let v6k = v6z <: "armv6k"
128131
let v6m = v6 <: "armv6-m"
129132

130-
let v7 = if not is_bi_endian then v6t2 <: "armv7"
131-
else Theory.Target.declare ~package (ordered "armv4")
133+
let v7 = if not is_bi_endian then v6t2 <: "armv7"
134+
else CT.Target.declare ~package (ordered "armv4")
132135
~parent
133136
~nicknames:["armv4"]
134137
~bits:32
@@ -138,18 +141,18 @@ module Family (Order : Endianness) = struct
138141
~data:data
139142
~vars:vars32
140143

141-
let v7fp = Theory.Target.declare ~package (ordered "armv7+fp") ~parent:v7
144+
let v7fp = CT.Target.declare ~package (ordered "armv7+fp") ~parent:v7
142145
~nicknames:["armv7+fp"]
143146
~vars:vars32_fp
144147

145148
let v7a = v7 <: "armv7-a"
146-
let v7afp = Theory.Target.declare ~package (ordered "armv7-a+fp")
149+
let v7afp = CT.Target.declare ~package (ordered "armv7-a+fp")
147150
~nicknames:["armv7-a+fp"]
148151
~parent:v7a
149152
~vars:vars32_fp
150153

151154
let v8a =
152-
Theory.Target.declare ~package (ordered "armv8-a") ~parent:v7
155+
CT.Target.declare ~package (ordered "armv8-a") ~parent:v7
153156
~nicknames:["armv8-a"]
154157
~bits:64
155158
~code:data64
@@ -166,19 +169,9 @@ module Family (Order : Endianness) = struct
166169
let parent = if is_bi_endian then v7 else v4
167170
end
168171

169-
module LE = Family(struct
170-
let endianness = Theory.Target.Endianness.le
171-
end)
172-
173-
174-
module Bi = Family(struct
175-
let endianness = Theory.Target.Endianness.bi
176-
end)
177-
178-
179-
module EB = Family(struct
180-
let endianness = Theory.Target.Endianness.eb
181-
end)
172+
module LE = Family(struct let endianness = CT.Endianness.le end)
173+
module Bi = Family(struct let endianness = CT.Endianness.bi end)
174+
module EB = Family(struct let endianness = CT.Endianness.eb end)
182175

183176
let family_of_endian is_little : (module ARM) = match is_little with
184177
| None -> (module Bi)
@@ -213,8 +206,8 @@ let normalize arch sub =
213206
let enable_loader () =
214207
let open Bap.Std in
215208
KB.Rule.(declare ~package "arm-target" |>
216-
require Project.specification_slot |>
217-
provide Theory.Unit.target |>
209+
require Image.Spec.slot |>
210+
provide CT.Unit.target |>
218211
comment "computes target from the OGRE specification");
219212
let open KB.Syntax in
220213
let request_info doc =
@@ -227,10 +220,10 @@ let enable_loader () =
227220
match Ogre.eval request doc with
228221
| Error _ -> None,None,None
229222
| Ok info -> info in
230-
KB.promise Theory.Unit.target @@ fun unit ->
231-
KB.collect Project.specification_slot unit >>|
223+
KB.promise CT.Unit.target @@ fun unit ->
224+
KB.collect Image.Spec.slot unit >>|
232225
request_info >>| fun (arch,sub,is_little) ->
233-
if not (in_family arch) then Theory.Target.unknown
226+
if not (in_family arch) then CT.Target.unknown
234227
else
235228
let module Family = (val family_of_endian is_little) in
236229
match normalize arch sub with
@@ -271,8 +264,8 @@ type arms = [
271264
| Arch.aarch64
272265
]
273266

274-
let arms : arms Map.M(Theory.Target).t =
275-
Map.of_alist_exn (module Theory.Target) [
267+
let arms : arms Map.M(CT.Target).t =
268+
Map.of_alist_exn (module CT.Target) [
276269
LE.v4, `armv4;
277270
LE.v4t, `armv4;
278271
LE.v5, `armv5;
@@ -321,16 +314,105 @@ let arms : arms Map.M(Theory.Target).t =
321314
let enable_arch () =
322315
let open KB.Syntax in
323316
KB.Rule.(declare ~package "arm-arch" |>
324-
require Theory.Unit.target |>
317+
require CT.Unit.target |>
325318
provide Arch.unit_slot |>
326319
comment "computes Arch.t from the unit's target");
327320
KB.promise Arch.unit_slot @@ fun unit ->
328-
KB.collect Theory.Unit.target unit >>| fun t ->
321+
KB.collect CT.Unit.target unit >>| fun t ->
329322
match Map.find arms t with
330323
| Some arch -> (arch :> Arch.t)
331324
| None -> `unknown
332325

333326

327+
let llvm_a32 = CT.Language.declare ~package "llvm-A32"
328+
let llvm_t32 = CT.Language.declare ~package "llvm-T32"
329+
let llvm_a64 = CT.Language.declare ~package "llvm-A64"
330+
331+
module Dis = Disasm_expert.Basic
332+
333+
let register encoding triple =
334+
Dis.register encoding @@ fun _ ->
335+
Dis.create ~backend:"llvm" triple
336+
337+
let symbol_values doc =
338+
let field = Ogre.Query.(select (from Image.Scheme.symbol_value)) in
339+
match Ogre.eval (Ogre.collect field) doc with
340+
| Ok syms -> syms
341+
| Error err ->
342+
failwithf "Arm_target: broken file specification: %s"
343+
(Error.to_string_hum err) ()
344+
345+
module Encodings = struct
346+
let empty = Map.empty (module Bitvec_order)
347+
348+
let symbols_encoding spec =
349+
symbol_values spec |>
350+
Seq.fold ~init:empty ~f:(fun symbols (addr,value) ->
351+
let addr = Bitvec.M32.int64 addr in
352+
Map.add_exn symbols ~key:addr
353+
~data:(match Int64.(value land 1L) with
354+
| 0L -> llvm_a32
355+
| _ -> llvm_a64))
356+
357+
let slot = KB.Class.property CT.Unit.cls
358+
~package "symbols-encodings" @@
359+
KB.Domain.flat "encodings"
360+
~empty
361+
~equal:(Map.equal CT.Language.equal)
362+
363+
let () =
364+
let open KB.Syntax in
365+
KB.promise slot @@ fun label ->
366+
KB.collect Image.Spec.slot label >>|
367+
symbols_encoding
368+
end
369+
370+
let (>>=?) x f = x >>= function
371+
| None -> !!CT.Language.unknown
372+
| Some x -> f x
373+
374+
let compute_encoding_from_symbol_table default label =
375+
KB.collect CT.Label.unit label >>=? fun unit ->
376+
KB.collect CT.Label.addr label >>=? fun addr ->
377+
KB.collect Encodings.slot unit >>= fun encodings ->
378+
KB.return @@ match Map.find encodings addr with
379+
| Some x -> x
380+
| None -> default
381+
382+
(* here less than means: was introduced before *)
383+
let (<) t p = not (CT.Target.belongs p t)
384+
let (<=) t p = CT.Target.equal t p || t < p
385+
let (>=) = CT.Target.belongs
386+
387+
let is_arm = CT.Target.belongs parent
388+
389+
let before_thumb2 t =
390+
t < LE.v6t2 ||
391+
t < EB.v6t2
392+
393+
let is_64bit t =
394+
t >= LE.v8a ||
395+
t >= EB.v8a ||
396+
t >= Bi.v8a
397+
398+
let guess_encoding label target =
399+
if is_arm target then
400+
if before_thumb2 target
401+
then compute_encoding_from_symbol_table llvm_a32 label
402+
else KB.return @@
403+
if is_64bit target then llvm_a64 else llvm_a32
404+
else KB.return CT.Language.unknown
405+
406+
407+
let enable_decoder () =
408+
let open KB.Syntax in
409+
register llvm_a32 "armv7";
410+
register llvm_t32 "thumbv7";
411+
register llvm_a64 "aarch64";
412+
KB.promise CT.Label.encoding @@ fun label ->
413+
CT.Label.target label >>= guess_encoding label
414+
334415
let load () =
335416
enable_loader ();
336-
enable_arch ()
417+
enable_arch ();
418+
enable_decoder ()

0 commit comments

Comments
 (0)