Skip to content

Commit 0095046

Browse files
committed
overhauls the target/architecture abstraction (1/n)
Introduces Theory.Target.t that superseeds Bap.Std.Arch.t. The old representation suffered from a few problems that we inherited from LLVM. The main issue is that Arch.t is not extensible and in order to add a new architecture the Bap.Std code shall be changed in a backward-compatibility-breaking manner. Arch.t is als unable to represent the whole variety of computing devices, which is especially relevant to micro-controllers (AVR, PIC) and IoT devices on which we are currently focusing. Finally, Arch.t is not precise enough to capture information that is necessary for code generation, the new venue that we are currently exploring. As the first attempt that didn't really work we introduced arch, sub, and other properties to the `core-theory:unit` class in #1119. The problem with that approach was the stringly typed interface as `arch` was represented as a simple string. In addition, the proposed properties werent' able to describe uncommon architectures. Finally, it was very awkward to use, all fields were optional with no good defaults. This is the second attempt and it will be split into several pull requests. The first PR, this one, introduce the Theory.Target.t but still keeps Arch.t alive, i.e., it is used by all internal and external components of BAP. This is to ensure that switching to Target.t doesn't break any existing code. The consequent pull requests will gradually deprecated functions that use Arch.t and switch Target.t everywhere. The most important switch will affect the disassembler/decoder framework, which is currently still stuck on Arch.t. Just to be clear, after this work is finished and until BAP 3.0 and maybe even thereafter Arch.t will still work as it used to work and no code will break or require updates. However, newly added architectures, such as AVR or PIC, i.e., those that could not be represented with Arch.t will not be available for the code that still relies on it. In addition to Theory.Target.t we add a few more abstractions and convenience functions, e.g., `Project.empty` and a completely new interface for Project.Input.t generation, which makes it easier to create projects from strings or other custom data, e.g., `Project.Input.from_string` . We also add Source, Language, and Compiler abstractions to the knowledge base Core Theory. These abstractions, together with Target, describe the full cycle of the program transformation using the compiler from source code in the given language to the program for the specified target (and the other way around). The Target abstraction itself comes with a few more data types that describe various aspects of the target system, including file formats, ABI, floating-point ABI (FABI), endianness, which is no longer limited to the binary choice of little and big endianness, and an extensible data type for storing target-specific options. Finally, all targets are formed into hierarchies and families, which helps in controlling the vast zoo of computer architectures and devices. The Target.t is an abstract data type and is self-describing and includes enough information that describes all the details of the architecture. We also provide four library modules, for arm, mips, powerpc, and x86 that exposes the currenlty declared targets. Our LLVM backend is not yet precise enough to recongize many of the supported targets and we don't have analyses right now that will infer the target from the binary, but we will add the `--target` option in the next PRs (when we will switch to Target.t) everywhere. As usual, comments, questions, reviews are very welcome.
1 parent 543e684 commit 0095046

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

45 files changed

+2528
-378
lines changed

lib/arm/arm_target.ml

Lines changed: 336 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,336 @@
1+
open Core_kernel
2+
open Bap_core_theory
3+
open Bap.Std
4+
5+
let package = "bap"
6+
7+
type r128 and r80 and r64 and r32 and r16 and r8
8+
9+
type 'a bitv = 'a Theory.Bitv.t Theory.Value.sort
10+
11+
let r128 : r128 bitv = Theory.Bitv.define 128
12+
let r80 : r80 bitv = Theory.Bitv.define 80
13+
let r64 : r64 bitv = Theory.Bitv.define 64
14+
let r32 : r32 bitv = Theory.Bitv.define 32
15+
let r16 : r16 bitv = Theory.Bitv.define 16
16+
let r8 : r8 bitv = Theory.Bitv.define 8
17+
let bool = Theory.Bool.t
18+
19+
20+
let reg t n = Theory.Var.define t n
21+
let untyped = List.map ~f:Theory.Var.forget
22+
let (@<) xs ys = untyped xs @ untyped ys
23+
24+
let array ?(index=string_of_int) t pref size =
25+
List.init size ~f:(fun i -> reg t (pref ^ index i))
26+
27+
let mems = Theory.Mem.define r32 r8
28+
let data = Theory.Var.define mems (Var.name Arm_env.mem)
29+
30+
let of_bil v =
31+
Theory.Var.define (Var.sort v) (Var.name v)
32+
33+
let vars32 = List.map ~f:of_bil Arm_env.[
34+
r0; r1; r2; r3; r4; r5; r6; r7; r8; r9;
35+
r10; r11; r12; sp; lr; mem;
36+
nf; zf; cf; vf; qf;
37+
]
38+
39+
let vars32_fp = vars32 @ untyped @@ array r64 "D" 16
40+
41+
42+
let gp64 = array r64 "X" 30
43+
let fp64 = array r128 "Q" 32
44+
let sp64 = [reg r64 "SP"]
45+
let lr64 = [reg r64 "LR"]
46+
let mems64 = Theory.Mem.define r64 r8
47+
let data64 = Theory.Var.define mems64 "mem"
48+
let flags64 = [
49+
reg bool "NF";
50+
reg bool "ZF";
51+
reg bool "CF";
52+
reg bool "VF";
53+
]
54+
55+
let vars64 = gp64 @< fp64 @< sp64 @< lr64 @< flags64 @< [data64]
56+
57+
let parent = Theory.Target.declare "arm"
58+
59+
module type v4 = sig
60+
end
61+
62+
63+
module type ARM = sig
64+
val endianness : Theory.Target.endianness
65+
val parent : Theory.Target.t
66+
val v4 : Theory.Target.t
67+
val v4t : Theory.Target.t
68+
val v5 : Theory.Target.t
69+
val v5t : Theory.Target.t
70+
val v5te : Theory.Target.t
71+
val v5tej : Theory.Target.t
72+
val v6 : Theory.Target.t
73+
val v6t2 : Theory.Target.t
74+
val v6z : Theory.Target.t
75+
val v6k : Theory.Target.t
76+
val v6m : Theory.Target.t
77+
val v7 : Theory.Target.t
78+
val v7fp : Theory.Target.t
79+
val v7a : Theory.Target.t
80+
val v7afp : Theory.Target.t
81+
val v8a : Theory.Target.t
82+
val v81a : Theory.Target.t
83+
val v82a : Theory.Target.t
84+
val v83a : Theory.Target.t
85+
val v84a : Theory.Target.t
86+
val v85a : Theory.Target.t
87+
val v86a : Theory.Target.t
88+
end
89+
90+
module type Endianness = sig val endianness : Theory.Target.endianness end
91+
module Family (Order : Endianness) = struct
92+
include Order
93+
94+
let ordered name =
95+
let order = Theory.Target.Endianness.name endianness in
96+
name ^ "+" ^ KB.Name.unqualified order
97+
98+
let (<:) parent name =
99+
if Theory.Target.is_unknown parent
100+
then Theory.Target.unknown
101+
else Theory.Target.declare ~package (ordered name) ~parent
102+
~nicknames:[name]
103+
104+
let is_bi_endian = Theory.Target.Endianness.(equal bi) endianness
105+
106+
let v4 =
107+
if is_bi_endian
108+
then Theory.Target.unknown
109+
else Theory.Target.declare ~package (ordered "armv4")
110+
~parent
111+
~nicknames:["armv4"]
112+
~bits:32
113+
~byte:8
114+
~endianness
115+
~code:data
116+
~data:data
117+
~vars:vars32
118+
119+
let v4t = v4 <: "armv4t"
120+
let v5 = v4 <: "armv5"
121+
let v5t = v5 <: "armv5t"
122+
let v5te = v5t <: "armv5te"
123+
let v5tej = v5te <: "armv5tej"
124+
let v6 = v5tej <: "armv6"
125+
let v6t2 = v6 <: "armv6t2"
126+
let v6z = v6 <: "armv6z"
127+
let v6k = v6z <: "armv6k"
128+
let v6m = v6 <: "armv6-m"
129+
130+
let v7 = if not is_bi_endian then v6t2 <: "armv7"
131+
else Theory.Target.declare ~package (ordered "armv4")
132+
~parent
133+
~nicknames:["armv4"]
134+
~bits:32
135+
~byte:8
136+
~endianness
137+
~code:data
138+
~data:data
139+
~vars:vars32
140+
141+
let v7fp = Theory.Target.declare ~package (ordered "armv7+fp") ~parent:v7
142+
~nicknames:["armv7+fp"]
143+
~vars:vars32_fp
144+
145+
let v7a = v7 <: "armv7-a"
146+
let v7afp = Theory.Target.declare ~package (ordered "armv7-a+fp")
147+
~nicknames:["armv7-a+fp"]
148+
~parent:v7a
149+
~vars:vars32_fp
150+
151+
let v8a =
152+
Theory.Target.declare ~package (ordered "armv8-a") ~parent:v7
153+
~nicknames:["armv8-a"]
154+
~bits:64
155+
~code:data64
156+
~data:data64
157+
~vars:vars64
158+
159+
let v81a = v8a <: "armv8.1-a"
160+
let v82a = v81a <: "armv8.2-a"
161+
let v83a = v82a <: "armv8.3-a"
162+
let v84a = v83a <: "armv8.4-a"
163+
let v85a = v84a <: "armv8.5-a"
164+
let v86a = v85a <: "armv8.6-a"
165+
166+
let parent = if is_bi_endian then v7 else v4
167+
end
168+
169+
module LE = Family(struct
170+
let endianness = Theory.Target.Endianness.le
171+
end)
172+
173+
174+
module Bi = Family(struct
175+
let endianness = Theory.Target.Endianness.bi
176+
end)
177+
178+
179+
module EB = Family(struct
180+
let endianness = Theory.Target.Endianness.eb
181+
end)
182+
183+
let family_of_endian is_little : (module ARM) = match is_little with
184+
| None -> (module Bi)
185+
| Some true -> (module LE)
186+
| Some false -> (module EB)
187+
188+
189+
let prefixes = ["arm"; "thumb"; "aarch64";]
190+
let suffixes = ["eb"; "_be"]
191+
192+
let in_family = function
193+
| None -> false
194+
| Some x -> List.exists prefixes ~f:(fun prefix ->
195+
String.is_prefix ~prefix x)
196+
197+
let drop_end s =
198+
Option.value ~default:s @@
199+
List.find_map suffixes ~f:(fun suffix ->
200+
String.chop_suffix ~suffix s)
201+
202+
let split s = List.find_map_exn prefixes ~f:(fun prefix ->
203+
match String.chop_prefix ~prefix s with
204+
| None -> None
205+
| Some r -> Some (prefix,drop_end r))
206+
207+
let normalize arch sub =
208+
match arch,sub with
209+
| None,_ -> assert false
210+
| Some arch,None -> split arch
211+
| Some arch, Some sub -> arch,sub
212+
213+
let enable_loader () =
214+
let open Bap.Std in
215+
KB.Rule.(declare ~package "arm-target" |>
216+
require Project.specification_slot |>
217+
provide Theory.Unit.target |>
218+
comment "computes target from the OGRE specification");
219+
let open KB.Syntax in
220+
let request_info doc =
221+
let open Ogre.Syntax in
222+
let request =
223+
Ogre.request Image.Scheme.arch >>= fun arch ->
224+
Ogre.request Image.Scheme.subarch >>= fun sub ->
225+
Ogre.request Image.Scheme.is_little_endian >>= fun little ->
226+
Ogre.return (arch,sub, little) in
227+
match Ogre.eval request doc with
228+
| Error _ -> None,None,None
229+
| Ok info -> info in
230+
KB.promise Theory.Unit.target @@ fun unit ->
231+
KB.collect Project.specification_slot unit >>|
232+
request_info >>| fun (arch,sub,is_little) ->
233+
if not (in_family arch) then Theory.Target.unknown
234+
else
235+
let module Family = (val family_of_endian is_little) in
236+
match normalize arch sub with
237+
| "arm","v4" -> Family.v4
238+
| "arm","v4t" -> Family.v4t
239+
| "arm","v5" -> Family.v5
240+
| "arm","v5t" -> Family.v5t
241+
| "arm","v5te" -> Family.v5te
242+
| "arm","v5tej" -> Family.v5tej
243+
| "arm","v6" -> Family.v6
244+
| "arm","v6z" -> Family.v6z
245+
| "arm","v6k" -> Family.v6k
246+
| "arm","v6m" -> Family.v6m
247+
| "arm","v6t2" -> Family.v6t2
248+
| "arm","v7" -> Family.v7
249+
| "arm","v7fp" -> Family.v7
250+
| "arm","v7a" -> Family.v7a
251+
| "arm","v7afp" -> Family.v7afp
252+
| "arm","v8" -> Family.v8a
253+
| "arm","v8a" -> Family.v8a
254+
| "arm","v81a" -> Family.v81a
255+
| "arm","v82a" -> Family.v82a
256+
| "arm","v83a" -> Family.v83a
257+
| "arm","v84a" -> Family.v84a
258+
| "arm","v85a" -> Family.v85a
259+
| "arm","v86a" -> Family.v86a
260+
| "thumb", "v4" -> Family.v4t
261+
| "thumb", "v5" -> Family.v5t
262+
| "aarch64",_ -> Family.v86a
263+
| _ -> Family.v7
264+
265+
266+
type arms = [
267+
| Arch.arm
268+
| Arch.armeb
269+
| Arch.thumb
270+
| Arch.thumbeb
271+
| Arch.aarch64
272+
]
273+
274+
let arms : arms Map.M(Theory.Target).t =
275+
Map.of_alist_exn (module Theory.Target) [
276+
LE.v4, `armv4;
277+
LE.v4t, `armv4;
278+
LE.v5, `armv5;
279+
LE.v5t, `armv5;
280+
LE.v5te, `armv5;
281+
LE.v5tej, `armv5;
282+
LE.v6, `armv6;
283+
LE.v6z, `armv6;
284+
LE.v6k, `armv6;
285+
LE.v6m, `armv6;
286+
LE.v6t2, `armv6;
287+
LE.v7, `armv7;
288+
LE.v7a, `armv7;
289+
LE.v7afp, `armv7;
290+
LE.v8a, `aarch64;
291+
LE.v81a, `aarch64;
292+
LE.v82a, `aarch64;
293+
LE.v83a, `aarch64;
294+
LE.v84a, `aarch64;
295+
LE.v85a, `aarch64;
296+
LE.v86a, `aarch64;
297+
EB.v4, `armv4eb;
298+
EB.v4t, `armv4eb;
299+
EB.v5, `armv5eb;
300+
EB.v5t, `armv5eb;
301+
EB.v5te, `armv5eb;
302+
EB.v5tej, `armv5eb;
303+
EB.v6, `armv6eb;
304+
EB.v6z, `armv6eb;
305+
EB.v6k, `armv6eb;
306+
EB.v6m, `armv6eb;
307+
EB.v6t2,`armv6eb;
308+
EB.v7, `armv7eb;
309+
EB.v7a, `armv7eb;
310+
EB.v7afp, `armv7eb;
311+
EB.v8a, `aarch64_be;
312+
EB.v81a, `aarch64_be;
313+
EB.v82a, `aarch64_be;
314+
EB.v83a, `aarch64_be;
315+
EB.v84a, `aarch64_be;
316+
EB.v85a, `aarch64_be;
317+
EB.v86a, `aarch64_be;
318+
]
319+
320+
321+
let enable_arch () =
322+
let open KB.Syntax in
323+
KB.Rule.(declare ~package "arm-arch" |>
324+
require Theory.Unit.target |>
325+
provide Arch.unit_slot |>
326+
comment "computes Arch.t from the unit's target");
327+
KB.promise Arch.unit_slot @@ fun unit ->
328+
KB.collect Theory.Unit.target unit >>| fun t ->
329+
match Map.find arms t with
330+
| Some arch -> (arch :> Arch.t)
331+
| None -> `unknown
332+
333+
334+
let load () =
335+
enable_loader ();
336+
enable_arch ()

0 commit comments

Comments
 (0)