
Commit 168ca30

knowledge base improvements (#1217)
* knowledge base improvements

This PR brings a few improvements to BAP that are summarized in the following demo: https://asciinema.org/a/358996

Important highlights of the PR:
- a REPL for querying and modifying the knowledge base;
- a portable and efficient representation of the knowledge base.

REPL
----

The REPL uses linenoise. It features completion (hit TAB) and context-dependent hints, which print what the grammar expects as you type. It is also extensible, so you can implement your own commands and call them from the REPL. A script mode is supported, and commands can also be passed directly on the command line.

Efficient KB Representation
---------------------------

The KB representation is more efficient, with more than a 2x improvement in space. It is also portable across different versions of bap, and the representation itself is versioned. To achieve this, we changed the representation of Knowledge.Name to an interned form, using a hash function with a low probability of collisions. This is much like OCaml's polymorphic variants, except that we use 63 bits instead of 31. Hash collisions are detected and properly reported. The change also slightly improved the performance and memory footprint of bap in general, since names are used everywhere in BAP: in variables, in sorts, and so on.

Although the representation uses bin_prot, it is designed to allow interaction with other languages as well as extensibility. Each property is stored as `<ID> <LEN> <PAYLOAD>`:
- `<ID>` is the name of the property (interned);
- `<LEN>` is the length of the payload, so that a parser that does not support the property can skip it;
- `<PAYLOAD>` is the string of bytes in the format specific to the property serializer, which may itself include a version tag.

Optimized Loading And Storing
-----------------------------

Both loading and storing of the cache now use memory mapping, which means that the knowledge base must be a regular file.
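The Efficient KB Representation section combines two ideas. Names are interned into 63-bit integers by hashing, with collisions detected. Properties are framed as `<ID> <LEN> <PAYLOAD>` so that a reader can skip payloads it does not understand. The OCaml sketch below illustrates both ideas under explicit assumptions. It is not BAP's implementation, which uses bin_prot. The FNV-1a hash, the fixed 8-byte little-endian ID and LEN fields, and the names `hash63`, `intern`, `encode_property`, and `decode_properties` are all hypothetical. It assumes OCaml 4.08 or later on a 64-bit platform.

```ocaml
(* Sketch only, not BAP's code: 63-bit name interning plus
   <ID> <LEN> <PAYLOAD> property framing. All names are hypothetical.
   Assumes OCaml >= 4.08 on a 64-bit platform. *)

(* FNV-1a 64-bit hash; Int64.to_int drops the top bit, leaving 63 bits. *)
let hash63 (s : string) : int =
  let h = ref (Int64.of_string "0xcbf29ce484222325") in
  String.iter
    (fun c ->
      h := Int64.logxor !h (Int64.of_int (Char.code c));
      h := Int64.mul !h 0x100000001b3L)
    s;
  Int64.to_int !h

exception Collision of string * string

(* Interned id -> original name, kept so that collisions can be detected. *)
let names : (int, string) Hashtbl.t = Hashtbl.create 64

let intern (name : string) : int =
  let id = hash63 name in
  match Hashtbl.find_opt names id with
  | None -> Hashtbl.add names id name; id
  | Some prev when String.equal prev name -> id
  | Some prev -> raise (Collision (prev, name))

(* Append one property record: 8-byte id, 8-byte length, raw payload. *)
let encode_property buf ~name ~payload =
  Buffer.add_int64_le buf (Int64.of_int (intern name));
  Buffer.add_int64_le buf (Int64.of_int (String.length payload));
  Buffer.add_string buf payload

(* Read all records and keep only properties whose names are [known];
   unknown properties are skipped by jumping over LEN bytes. *)
let decode_properties ~known data =
  let b = Bytes.of_string data in
  let known_ids = List.map (fun n -> (intern n, n)) known in
  let rec loop pos acc =
    if pos >= Bytes.length b then List.rev acc
    else
      let id = Int64.to_int (Bytes.get_int64_le b pos) in
      let len = Int64.to_int (Bytes.get_int64_le b (pos + 8)) in
      let next = pos + 16 + len in
      if next > Bytes.length b then failwith "truncated property record";
      match List.assoc_opt id known_ids with
      | Some name -> loop next ((name, Bytes.sub_string b (pos + 16) len) :: acc)
      | None -> loop next acc
  in
  loop 0 []

let () =
  let buf = Buffer.create 64 in
  encode_property buf ~name:"demo:size" ~payload:"\x2a";
  encode_property buf ~name:"demo:unknown" ~payload:"not understood";
  decode_properties ~known:[ "demo:size" ] (Buffer.contents buf)
  |> List.iter (fun (n, p) ->
         Printf.printf "%s: %d byte payload\n" n (String.length p))
```

The explicit LEN field is what makes the format forward-compatible in this sketch. A reader built before `demo:unknown` existed can still walk past it without knowing how its payload is encoded.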
Since all the information is now stored in the knowledge base, loading it is enough to get the project. This makes loading a project 20 to 25 times faster than before, both when loading from the cache and when loading from a specified knowledge base.

Interaction With The Cache
--------------------------

As before, the cache stores one knowledge base per file, along with other data. Each knowledge base is indexed by the digest of the input file and of all parameters that affect the disassembly. The only change is that the result of disassembly is now also stored in the knowledge base; previously it was stored in a separate file. When no project is specified, or the project file doesn't exist, the file is loaded from the cache. This enables fast extraction of a file's KB from the cache, e.g.,

```
bap /bin/ls --project ls.proj --update
```

will load `/bin/ls` from the cache and immediately store it in `ls.proj`, provided that `ls.proj` didn't exist.

Lazy Project
------------

The project data structure includes a lot of bulky data representations, such as the whole-program CFG, the Symtab (which includes a CFG for each function), and the program data structure. This information takes a lot of space both on disk and in RAM, and it was computed even if it was never used. Moreover, it is easily computable from the KB, which uses a much more efficient representation. To address this, we made the above-mentioned data structures lazy: if you don't use the program IR, it will not be computed. This saves a lot of space and time.

New API
-------

The following APIs were added:
- [Project.State], which represents the disassembled binary;
- [Project.Analysis], for writing your own KB analyses.

Minor Tweaks
------------

Tweaks the pretty-printed representation of the knowledge, BIR, and BIL. It is now much more readable and concise, and it is properly indented.

Bug Fixes
---------

Fixes #1216
Fixes #1169
Fixes #1168

* enables subroutine search by address, not only by name, as not all subroutines have names.
* disables the BIL verification action

We have changed the binary representation, so the existing traces are no longer valid. For technical reasons we can't create new tests in the near future, so the only solution is to disable them temporarily.
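The Interaction With The Cache section says each cached knowledge base is indexed by the digest of the input file and of all parameters that affect the disassembly. The OCaml sketch below shows one way such a key could be derived with the standard `Digest` module. It is an assumption for illustration, not BAP's actual key derivation. The function name `cache_key` and the parameter names `arch` and `loader` are hypothetical.

```ocaml
(* Hypothetical sketch, not BAP's cache code: derive a cache key from the
   digest of the input bytes plus every parameter that affects disassembly.
   Parameter names below are made up for illustration. *)
let cache_key ~(contents : string) ~(params : (string * string) list) : string =
  (* Sort so that the order in which options were given does not matter. *)
  let params = List.sort compare params in
  let buf = Buffer.create 128 in
  Buffer.add_string buf (Digest.string contents);
  List.iter
    (fun (k, v) ->
      (* NUL separators keep ("ab", "c") distinct from ("a", "bc"). *)
      Buffer.add_string buf k;
      Buffer.add_char buf '\000';
      Buffer.add_string buf v;
      Buffer.add_char buf '\000')
    params;
  Digest.to_hex (Digest.string (Buffer.contents buf))

let () =
  let bin = "\x7fELF...pretend binary bytes" in
  let k1 = cache_key ~contents:bin ~params:[ ("arch", "x86_64"); ("loader", "llvm") ] in
  let k2 = cache_key ~contents:bin ~params:[ ("loader", "llvm"); ("arch", "x86_64") ] in
  let k3 = cache_key ~contents:bin ~params:[ ("arch", "arm"); ("loader", "llvm") ] in
  Printf.printf "same options, any order: %b; different arch: %b\n" (k1 = k2) (k1 = k3)
```

For a file on disk, `Digest.file` could replace `Digest.string contents`. The important property is that any change to the input or to a disassembly-relevant option yields a different key, so a stale knowledge base is never reused.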
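The Lazy Project section describes making the CFG, Symtab, and program IR lazy so that they are computed from the KB only when used. The OCaml sketch below shows this pattern with the standard `Lazy` module. The `project` record, the jump-fact list standing in for the KB, and `build_cfg` are all hypothetical simplifications, not BAP's `Project` type.

```ocaml
(* Hypothetical sketch, not BAP's Project type: expensive views are stored
   as [Lazy.t] values and built from the KB only when first forced. *)

(* Stand-in for KB facts: (source address, destination address) jumps. *)
type facts = (int * int) list

type project = {
  facts : facts;
  cfg : (int, int list) Hashtbl.t Lazy.t;  (* built on first use *)
}

(* Counts how many times the CFG was actually built. *)
let builds = ref 0

let build_cfg (facts : facts) =
  incr builds;
  let g = Hashtbl.create 16 in
  List.iter
    (fun (src, dst) ->
      let succs = Option.value (Hashtbl.find_opt g src) ~default:[] in
      Hashtbl.replace g src (dst :: succs))
    facts;
  g

let project_of_facts facts = { facts; cfg = lazy (build_cfg facts) }

let () =
  let p = project_of_facts [ (0x10, 0x20); (0x10, 0x30); (0x20, 0x40) ] in
  Printf.printf "loaded %d facts, CFG builds so far: %d\n"
    (List.length p.facts) !builds;
  let g = Lazy.force p.cfg in
  ignore (Lazy.force p.cfg);  (* second force reuses the memoized value *)
  Printf.printf "CFG builds: %d, successors of 0x10: %d\n"
    !builds (List.length (Hashtbl.find g 0x10))
```

Loading the project only records the facts. The CFG is built at most once, the first time it is forced, and a client that never asks for it pays nothing.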
1 parent c93b5d8 commit 168ca30

39 files changed (+2020, -419 lines)

.github/workflows/build-and-test.yml

Lines changed: 0 additions & 8 deletions
@@ -52,14 +52,6 @@ jobs:
       - name: Run Functional Tests
         run: opam exec -- make check
 
-      - name: Run BIL verification tool
-        run: |
-          opam install --fake bap
-          opam pin add bap-veri testsuite/veri/bap-veri/ -n
-          opam depext bap-veri
-          opam install bap-veri
-          opam exec -- make -C testsuite veri
-
       - uses: actions/upload-artifact@v2
         if: ${{ always() }}
         with:
