Commit 1a655b1

BAP Public API
==============

Abstract
--------

This patch adds a JSON Public API to BAP. It also introduces some bug fixes and extensions. As an example of API usage, a small Python binding is provided.

Infrastructure Changes
----------------------

Since BAP now has more than 50 dependencies (most are optional), opam needs further guidance, otherwise it can't find a proper solution. This patch adds such guidance by properly specifying package versions. This patch also fixes warnings issued by the build system on a newer compiler version.

Core Lwt library
----------------

This patch introduces a Core Lwt library, a thin wrapper around the Lwt library that provides a concise Core-like interface. I'm going to remove this library from BAP sooner or later and push it to opam as a separate entity.

Consistent Constructor Names
----------------------------

In order to satisfy the requirements of the ADT data format, and for consistency, I've title-cased all constructor names. This is a breaking change.

Debugging Support
-----------------

Many types were extended to support the `sexp_of` or `sexp` protocols. Also, all regular types got a plethora of new primitives that can be used with different format specifiers.

Changes to Bitvector
--------------------

Functions `of_int32` and `of_int64` now accept an optional `width` parameter. Also, a new `string_of_value` function was added that converts only the vector's value to a string, dropping the information about the size.

Changes to Image
----------------

Switched to bap's size type instead of core's.

Python Bindings
---------------

This PR introduces a Python binding to BAP. This binding allows one to disassemble arbitrary strings, and to load and analyze binary files. The library is packaged with distutils, which means that in order to use it you can just `pip install` it. The binding is rather deep: instead of stringly typed dictionaries you get first-class Python values of the appropriate type. All ARM operands, registers and instructions known to us are lifted to Python classes. See the documentation for more info.

Public API
----------

BAP can now be called from any language using the JSON API. This API is implemented in a new program called `bap-server`. The server also takes care of storage and persistence problems, i.e., it will store data for you. Currently, the following set of protocols is implemented:

- http
- mmap
- file
- zmq

This set allows the following physical media to be used for interaction:

- tcp and udp sockets
- unix sockets
- regular files
- shared memory
0 parents  commit 1a655b1

7 files changed: +1052 -0 lines changed

__init__.py

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
r"""Python interface to BAP.

In a few keystrokes:

    >>> import bap
    >>> print '\n'.join(insn.asm for insn in bap.disasm("\x48\x83\xec\x08"))
    decl %eax
    subl $0x8, %esp

A more complex example:

    >>> img = bap.image('coreutils_O0_ls')
    >>> sym = img.get_symbol('main')
    >>> print '\n'.join(insn.asm for insn in bap.disasm(sym))
    push {r11, lr}
    add r11, sp, #0x4
    sub sp, sp, #0xc8
    ... <snip> ...

The bap package exposes two functions:

#. ``disasm`` returns a disassembly of the given object
#. ``image`` loads the given file

Disassembling things
====================

``disasm`` is a Swiss Army knife for disassembling things. It takes either a
string object or something returned by the ``image`` function, e.g.,
images, sections and symbols.

The ``disasm`` function returns a generator yielding instances of class
``Insn`` defined in module :mod:`asm`. Each instance has the following attributes:

* name - instruction name, as the underlying backend names it
* addr - address of the first byte of the instruction
* size - overall size of the instruction
* operands - list of instances of class ``Op``
* asm - assembler string, in native assembler
* kinds - instruction meta properties, see :mod:`asm`
* target - instruction lifted to a target platform, e.g., see :mod:`arm`
* bil - a list of BIL statements describing the instruction semantics

The ``disasm`` function also accepts a number of keyword arguments, to name a few:

* server - either a URL of a bap server or a dictionary containing a port
  and/or an executable name
* arch
* endian (instance of ``bil.Endian``)
* addr (should be an instance of type ``bil.Int``)
* backend
* stop_conditions

All arguments are, I hope, self-describing. ``stop_conditions`` is a list of
``Kind`` instances defined in :mod:`asm`. If the disassembler meets an instruction
that is an instance of one of these kinds, it will stop.

Reading files
=============

To read and analyze a file one should load it with the ``image``
function. This function returns an instance of class ``Image`` that
allows one to discover information about the file and perform different
queries. It has a ``get_symbol`` function to look up a symbol in the
file by name, and the following set of attributes (self-describing):

* arch
* entry_point
* addr_size
* endian
* file (file name)
* sections

``sections`` is a list of instances of the ``Section`` class, which also has a
``get_symbol`` function and the following attributes:

* name
* perm (a list of ['r', 'w', 'x'])
* addr
* size
* memory
* symbols

``symbols`` is a list of, you guessed it, ``Symbol`` instances, each having the
following attributes:

* name
* is_function
* is_debug
* addr
* chunks

Here ``chunks`` is a list of instances of the ``Memory`` class, each having the
following attributes:

* addr
* size
* data

Here ``data`` is the actual string of bytes.
"""
__all__ = ['disasm', 'image']

from .bap import disasm, image
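
The `Image` / `Section` / `Symbol` / `Memory` nesting described in the docstring can be illustrated with a short traversal. This is only a sketch, written for Python 3: the namedtuples are hypothetical stand-ins that mirror the documented attribute names (real objects would come from `bap.image(...)`), the helper `function_sizes` is not part of the binding, and the string used for `endian` is a placeholder for a `bil.Endian` instance.

```python
from collections import namedtuple

# Hypothetical stand-ins that copy only the attribute names documented above.
Memory = namedtuple("Memory", "addr size data")
Symbol = namedtuple("Symbol", "name is_function is_debug addr chunks")
Section = namedtuple("Section", "name perm addr size memory symbols")
Image = namedtuple("Image", "arch entry_point addr_size endian file sections")


def function_sizes(image):
    """Map each function symbol's name to the total size of its chunks."""
    sizes = {}
    for section in image.sections:
        for symbol in section.symbols:
            if symbol.is_function:
                sizes[symbol.name] = sum(chunk.size for chunk in symbol.chunks)
    return sizes


# A tiny fake image: one section holding a two-chunk function and a data symbol.
main = Symbol("main", True, False, 0x1000,
              [Memory(0x1000, 4, b"\x00" * 4), Memory(0x2000, 8, b"\x00" * 8)])
table = Symbol("table", False, False, 0x3000,
               [Memory(0x3000, 16, b"\x00" * 16)])
text = Section(".text", ["r", "x"], 0x1000, 0x2010, None, [main, table])
image = Image("arm", 0x1000, 32, "little", "demo", [text])

print(function_sizes(image))
```

Because the helper relies only on attribute access, it should work unchanged on any object that exposes `sections`, `symbols`, `is_function`, `chunks` and `size` as documented.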

adt.py

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
#!/usr/bin/env python
"""
Algebraic Data Types (ADTs) are used to represent two kinds of things:

1. A discriminated union of types, called a sum
2. A combination of some types, called a product.

# Sum types

Sum types represent the concept of generalization. For example,
on ARM, R0 and R1 are both general purpose registers (GPR). ARM also
has Condition Code registers (CCR):

    class Reg(ADT) : pass
    class GPR(Reg) : pass
    class CCR(Reg) : pass
    class R0(GPR) : pass
    class R1(GPR) : pass


This states that a register can be either R0 or R1, but not both.

# Product types

Product types represent a combination of other types. For example,
the mov instruction has two arguments, and the arguments are themselves
ADTs:

    class Insn(ADT) : pass
    class Mov(Insn) : pass

    Mov(R0(), R1())


# Comparison

ADT objects are compared structurally: if they have the same class
and their values are structurally the same, then they are equal, i.e.,

    assert(R0() == R0())
    assert(R1() != R0())

"""

# Python 2 import; Python 3 provides Iterable in collections.abc
from collections import Iterable

class ADT(object):
    """ Algebraic Data Type.

    This is the base class for all ADTs. An ADT is represented by a tuple of
    arguments, stored in the `val` field. Arguments should be instances of the
    ADT class, numbers, or strings. An empty set of arguments is permitted.
    A one-tuple is automatically untupled, i.e., `Int(12)` has value `12`, not `(12,)`.
    For convenience, the name of the constructor is provided in the `name` field.

    A structural comparison is provided.

    """
    def __init__(self, *args):
        self.name = self.__class__.__name__
        self.val = args if len(args) != 1 else args[0]

    def __cmp__(self,other):
        # Python 2 comparison protocol: compare name and val structurally
        return self.__dict__.__cmp__(other.__dict__)

    def __repr__(self):
        def qstr(x):
            if isinstance(x, int) or isinstance(x, ADT):
                return str(x)
            else:
                return '"{0}"'.format(x)
        def args():
            if isinstance(self.val, tuple):
                return ", ".join(qstr(x) for x in self.val)
            else:
                return qstr(self.val)

        return "{0}({1})".format(self.name, args())


class Visitor(object):
    """ ADT Visitor.
    This class helps to perform iterations over arbitrary ADTs.

    This visitor supports subtyping, i.e., you can match not only on
    leaf constructors, but also on their bases. For example, with
    the `Exp` hierarchy, provided below, you can visit all binary operators
    by overriding the `visit_BinOp` method. See the `run` method description for
    more information.
    """

    def visit_ADT(self, adt):
        """Default visitor.

        This method will be called for those data types that have
        no specific visitors. It will recursively descend into all
        ADT values.
        """
        if isinstance(adt.val, tuple):
            for e in adt.val:
                self.run(e)

    def run(self, adt):
        """Visitor.run(adt-or-iterable) -> None

        If adt is iterable, run is called recursively for each member
        of adt.

        Otherwise, for an ADT of type C the method `visit_C` is looked up in the
        visitor's method dictionary. If it doesn't exist, then `visit_B` is
        looked up, where `B` is the base class of `C`. The process continues
        until a method is found. This is guaranteed to terminate,
        since the visit_ADT method is defined.

        Note: Non-ADTs will be silently ignored.

        Once the method is found it is called. It is the method's responsibility
        to recurse into sub-elements, e.g., by calling the run method.

        For example, suppose that we want to count negative values in a given
        BIL expression:

        class CountNegatives(Visitor):
            def __init__(self):
                self.neg = False
                self.count = 0

            def visit_Int(self, int):
                if int.val < 0 and not self.neg \
                  or int.val > 0 and self.neg:
                    self.count += 1

            def visit_NEG(self, op):
                was = self.neg
                self.neg = not was
                self.run(op.val)
                self.neg = was

        We need to keep track of the unary negation operator, and, of
        course, we need to look for immediates, so we override two methods:
        visit_Int for the Int constructor and visit_NEG for counting unary minuses.
        (Actually we should also account for the bitwise NOT operation, since it
        changes the sign bit too, but let's forget about it for the sake of the
        exercise; it can be easily fixed just by matching visit_UnOp.)

        When we hit visit_NEG we toggle the current sign, storing its previous value,
        and recurse into the operand. After we return from the recursion, we restore
        the sign.
        """
        if isinstance(adt, Iterable):
            for s in adt:
                self.run(s)
        if isinstance(adt, ADT):
            for c in adt.__class__.mro():
                name = ("visit_%s" % c.__name__)
                fn = getattr(self, name, None)
                if fn is not None:
                    return fn(adt)


if __name__ == "__main__":
    class Fruit(ADT) : pass
    class Banana(Fruit) : pass
    class Apple(Fruit) : pass

    assert(Banana() == Banana())
    assert(Banana() != Apple())
    assert( Apple() < Banana())
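
The structural comparison and the MRO-based visitor dispatch described in adt.py can be tried out with the following self-contained sketch. It is an adaptation for Python 3, not the committed code: `__eq__` replaces the Python 2 `__cmp__`, iteration is restricted to lists and tuples, and the `Exp`, `Int`, `UnOp`, `NEG`, `BinOp` and `PLUS` classes are hypothetical stand-ins for a BIL-like hierarchy.

```python
class ADT(object):
    """Python 3 adaptation of the ADT base class (assumption: __eq__ instead of __cmp__)."""
    def __init__(self, *args):
        self.name = self.__class__.__name__
        self.val = args if len(args) != 1 else args[0]

    def __eq__(self, other):
        # Structural equality: same constructor name and same value.
        return isinstance(other, ADT) and self.__dict__ == other.__dict__

    def __ne__(self, other):
        return not self == other


class Visitor(object):
    def visit_ADT(self, adt):
        # Default visitor: descend only into tuple-valued ADTs.
        if isinstance(adt.val, tuple):
            for e in adt.val:
                self.run(e)

    def run(self, adt):
        if isinstance(adt, (list, tuple)):
            for s in adt:
                self.run(s)
        elif isinstance(adt, ADT):
            # Walk the class hierarchy from the most specific class upwards.
            for c in type(adt).mro():
                fn = getattr(self, "visit_%s" % c.__name__, None)
                if fn is not None:
                    return fn(adt)


# Hypothetical stand-ins for registers and a small expression hierarchy.
class Reg(ADT): pass
class R0(Reg): pass
class R1(Reg): pass
class Exp(ADT): pass
class Int(Exp): pass
class UnOp(Exp): pass
class NEG(UnOp): pass
class BinOp(Exp): pass
class PLUS(BinOp): pass


class CountNegatives(Visitor):
    def __init__(self):
        self.neg = False
        self.count = 0

    def visit_Int(self, n):
        if (n.val < 0 and not self.neg) or (n.val > 0 and self.neg):
            self.count += 1

    def visit_NEG(self, op):
        was = self.neg
        self.neg = not was
        self.run(op.val)
        self.neg = was


# -1 + -(2 + -3): the literal -1 and the negated 2 are counted.
expr = PLUS(Int(-1), NEG(PLUS(Int(2), Int(-3))))
counter = CountNegatives()
counter.run(expr)
print(counter.count)
```

`PLUS` has no dedicated method, so the lookup falls through `visit_BinOp` and `visit_Exp` to `visit_ADT`, which descends into the operand tuple. Note that `visit_ADT` does not descend into a single non-tuple argument, which is why `visit_NEG` calls `run` on its operand explicitly.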
