@@ -22,24 +22,22 @@ \section{Introduction}
2222This document describes the syntax and semantics of BAP Instruction
2323Language. The language is used to represent a semantics of machine
2424instructions. Each machine instruction is represented by a BIL program
25- that tries to capture all side effect of the instruction.
26-
27-
25+ that captures side effect of the instruction.
2826
2927\section {Syntax }
3028\label {sec:syntax }
3129
3230\subsection {Metavariables }
3331\label {sec:meta }
3432
35- We define a small set of metavariables that are used to denote string
36- and numeric literals and subscripts :
33+ We define a small set of metavariables that are used to denote
34+ subscripts, numerals and string literals :
3735
3836\ottmetavars
3937
4038\subsection {BIL syntax }
4139
42- BIL program is reperesented as a sequence of BIL statements. Each
40+ BIL program is reperesented as a sequence of statements. Each
4341statement performs some side-effectful computation.
4442
4543\ottgrammartabular {
@@ -57,11 +55,6 @@ \subsection{BIL syntax}
5755
5856\ottgrammartabular {
5957\ottexp\ottinterrule
60- \ottvar\ottinterrule
61- \ottbop\ottinterrule
62- \ottuop\ottinterrule
63- \ottendian\ottinterrule
64- \ottcast\ottinterrule
6558}
6659
6760\ottgrammartabular {
@@ -92,10 +85,10 @@ \subsection{Bitvector syntax}
9285value and size. Operations \verb |ext | and \verb |exts | performs
9386extract/extend operation. The former is unsigned (i.e., it extends
9487with zeros), the latter is signed. This operation extracts bits from a
95- bitvector starting from $ \mathit {hi}$ and ending $ \mathit {lo}$ bit
96- (both ends including ). If $ \mathit {hi}$ is greater than the bitwidth
97- of the bitvector, then it is extended with zeros (for \verb | ext |
98- operation) or with a sign bit (for \verb |exts |) operation.
88+ bitvector starting from $ \mathit {hi}$ and ending with $ \mathit {lo}$
89+ bit (both ends included ). If $ \mathit {hi}$ is greater than the
90+ bitwidth of the bitvector, then it is extended with zeros (for
91+ \verb | ext | operation) or with a sign bit (for \verb |exts |) operation.
9992
10093\ottgrammartabular {
10194\ottword\ottinterrule
@@ -108,8 +101,8 @@ \subsection{Value syntax}
108101expressions that are not reducible.
109102
110103We have three kinds of values --- immediates, represented as
111- bitvectors; unknown values and storages (aka memories) represented
112- symbolically as a list of assignments:
104+ bitvectors; unknown values and storages (memories in BIL parlance),
105+ represented symbolically as a list of assignments:
113106
114107\ottgrammartabular {
115108\ottval\ottinterrule
@@ -126,7 +119,7 @@ \subsection{Formula syntax}
126119$ \Delta $ context is represented as list of pairs. We also add a small
127120set of operations over natural numbers, like comparison and
128121arithmetics. Natural numbers are mostly used to reason about sizes of
129- bitvectors, that why they are often referred as $ \mathit {sz}$ .
122+ bitvectors, that's why they are often referred as $ \mathit {sz}$ .
130123
131124We also add syntax for equality comparison for values and variables.
132125
@@ -147,15 +140,16 @@ \subsection{Instruction syntax}
147140\label {sec:insn }
148141
149142To reason about the whole program we introduce a syntax for
150- instruction. An instruction is a binary string with length
151- $ \mathit {w_2}$ , that was read by a decoder from an address
152- $ \mathit {w_1}$ . The semantics of an instruction is described by a
153- $ \mathit {bil}$ program.
143+ instruction. An instruction is a binary sequence of $ \mathit {w_2}$
144+ bytes, that was read by a decoder from an address $ \mathit {w_1}$ . The
145+ semantics of an instruction is described by the $ \mathit {bil}$ program.
154146
155147\ottgrammartabular {
156148\ottinsn\ottinterrule
157149}
158150
151+ \clearpage
152+
159153\section {Typing }
160154\label {sec:typing }
161155
@@ -167,24 +161,89 @@ \section{Typing}
167161
168162\ottdefnstypingXXexp
169163
164+ \clearpage
165+
170166
171167\section {Operational semantics }
172168
173169\subsection {Model of a program }
174170
175171Program is coinductively defined as an infinite stream of program
176172states, produced by a step rule. Each state is represented with a
177- triplet $ \Delta , w, var$ , where $ \Delta $ is a mapping from variables
173+ triplet $ ( \Delta , w, var) $ , where $ \Delta $ is a mapping from variables
178174to values, $ w$ is a program counter, and $ var$ is a variable
179175denoting currently active memory.
180176
181177The \verb |step | rule defines how a machine instruction is
182- evaluated. We use `` magic'' function \verb |decode | that fetches
183- instruction from memory and decodes it to a BIL program.
178+ evaluated. We use `` magic'' rule \verb |decode | that fetches
179+ instructions from the memory and decodes them to a BIL program.
184180
185- A program counter is updated after each instruction.
181+ The BIL code is evaluated using reduction rules of statements (see
182+ section \ref {sec:sema:stmt }). Then the program counter is updated with
183+ the $ w_3 $ , that initially points to a byte following current instruction.
186184
187185\ottdefnsprogram
188186
187+ \section {Semantics of statements }
188+ \label {sec:sema:stmt }
189+
190+ The reduction rule defines transformation of a state for each
191+ statement. The state of the reduction rule consists of a pair
192+ $ (\Delta ,w)$ , where $ \Delta $ is a mapping from variables to values and
193+ $ w$ is an address of a next instruction.
194+
195+ Two statements affect the state: \verb |Move | statement introduces new
196+ $ var \leftarrow v$ binding in $ \Delta $ , and \verb |Jmp | affects
197+ program counter.
198+
199+ The \verb |if | and \verb |while | instructions introduce local control
200+ flow.
201+
202+ There is no special semantics associated with \verb |special | and
203+ \verb |cpuexn | statements.
204+
205+ \ottdefnsreduceXXstmt
206+
207+
208+ \section {Semantics of expressions}
209+ \label {sec:sema:exp }
210+
211+ This section describes a small step operational semantics for
212+ expressions. A symbolic formula $ \Delta \vdash e \rightarrow e' $
213+ defines a step of transformation from expression $ e$ to an expression
214+ $ e'$ under given context $ \Delta $ .
215+
216+ A well formed (well typed) expression evaluates to a value expression,
217+ that is syntactic subset of expression grammar (see
218+ section \ref {sec:values }).
219+
220+ A value can be either an immediate, represented by a bitvector, a
221+ unknown value, or a memory storage.
222+
223+ A memory storage is represented symbolically as a sequence of
224+ storages to the originally undefined memory. Each storage
225+ operation of size greater than 8 bits is desugared into a sequence of
226+ 8 bit storages in a big endian order.
227+
228+ A load operation will first reduce all sub expressions of a memory
229+ object to values and then recursively destruct the object until one of
230+ the following conditions is met:
231+
232+
233+ \begin {description }
234+ \item [load-byte:] if the memory object is a storage of a \verb |value |
235+ to an immediate (known) address that we're trying to load then the
236+ load expression is reduced to \verb |value |.
237+ \item [load-un-memory:] if the memory object is an \verb |unknown | value,
238+ then the load expression evaluates to \verb |unknown |.
239+ \item [load-un-addr:] if the memory object is a storage to
240+ \verb |unknown | value address then load expression evaluates to
241+ \verb |unknown |.
242+ \end {description }
243+
244+ \ottdefnsreduceXXexp
245+
246+ \ottdefnshelpers
247+
189248
190249\end {document }
0 commit comments