Skip to content
Merged
9 changes: 8 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ cl-binary-store works on 64-bit SBCL, ECL, CCL, ABCL, Allegro Common Lisp, and L
- Should work out of the box without any effort with an easy programmer / user interface (no need to write code for each class/struct you use!)
- Stable API and no breaking changes (this is a standard Common Lisp goal)
- Ability to limit amount of data written or read (safety rails)
- Safe from malicious input (some amount of fuzz testing and code reading done, but if you want to rely on safety, please contribute!)

## General features

Expand Down Expand Up @@ -101,6 +102,8 @@ The package :cl-binary-store-user exports all the user facing interfaces above.

If you keep files around long enough, eventually you find you have stored stuff you don't remember. It's nice if you don't get horrible errors while loading the files. cl-binary-store provides a good set of restarts for missing packages (create-package, rehome symbol) and for missing objects or structures (create them, use a different class) or for changes in slots (discard, change slot name). The deserialization is extensible enough that you can put in line upgrading of objects.

The two conditions signalled here are of type MAYBE-EXPECTED-ERROR and INVALID-INPUT-DATA and are MISSING-SLOT and OBJECT-TYPE-NOT-FOUND

## Extending object serialization

For serializing objects, the default behavior is probably good enough for 95% of users. There are four further methods of extension provided at with increasing degrees of complexity and control.
Expand Down Expand Up @@ -199,6 +202,10 @@ This can be used to override the restoration with a user provided codespace in c

Specify what codespace to use during writing. Use \*output-magic-number\* so the file records what was used during writing.

### Conditions and malicious input

cl-binary-store attempts to handle both malicious input and corrupted input reasonably. There is by default a \*max-to-read\* of 2GB which will prevent the equivalent of zip bombs, and I have done some fuzz testing so that in general one expects to see an INVALID-INPUT-DATA error signalled if there is bad input data as opposed to crashing. There are two types of errors one might expect, MISSING-SLOT and OBJECT-TYPE-NOT-FOUND which inherit from MAYBE-EXPECTED-ERROR which is of type INVALID-INPUT-DATA. This allows you to either catch all INVALID-INPUT-DATA (if you just want things to work) or all INVALID-INPUT-DATA except MAYBE-EXPECTED-ERRORs (if you want some interactive recovery). If you actually have corrupted input and wish to recover it, I suggest adding :debug-cbs to \*features\*, recompiling, and pulling the partial data out of the debugger where some of the data will be available on the stack. It is too complicated to support corrupted data recovery and maintain high performance.

### Extending the codespace

A codespace is a definition of the binary file format, they are identified with a magic / version number. At write time the codespace is identified by \*write-version\*. The codespace can optionally be written out to the output (\*write-magic-number\*). Currently we have baked in a notion of tag bytes between objects that identify the type of the next object --- you could presumably switch to whatever tagging scheme you want with a bit of work on the code generation side. We automatically build the storage time typecase dispatch, provide the basics of reference tracking, and some other niceties, and as well a dispatch case statement during restore. This code is specialized for each codespace and built at compile / load time. This can lead to some complexities debugging as the source code is not accessible. To alleviate this one may define-codespace with :debug t, in which case the store and restore functions that are built are dumped to a file "codespace-debug.lisp" and loaded so the usual nice Common Lisp debugging experience can occur. Usually you want to inline many of your functions for performance reasons (especially if you have regular data, the inlining, at least on sbcl, allows very nice performance as, for example, the first restore-object call from inside a wrapper function can be inlined --- so the list restore, for example, is not bouncing back and forth between functions).
Expand Down Expand Up @@ -279,7 +286,7 @@ I suggest just piping the output through gzip if you need the smallest possible

## Debugging

We generate the codespace code through a maze of macros and functions in [codespaces.lisp](src/codespaces.lisp), so if something isn't doing what you want, it is easiest to inspect cl-binary-store::\*codespaces\* and look at the codespace objects that are built and then look at the slots RESTORE-OBJECTS-SOURCE-CODE and STORE-OBJECTS-SOURCE-CODE (which are what are used to build the restore-objects and store-objects functions in the codespace). These can be compiled at the repl or put into a file and compiled so that you can get full debugging of store-objects / restore-objects.
We generate the codespace code through a maze of macros and functions in [codespaces.lisp](src/codespaces.lisp), so if something isn't doing what you want, it is easiest to inspect cl-binary-store::\*codespaces\* and look at the codespace objects that are built and then look at the slots RESTORE-OBJECTS-SOURCE-CODE and STORE-OBJECTS-SOURCE-CODE (which are what are used to build the restore-objects and store-objects functions in the codespace). These can be compiled at the repl or put into a file and compiled so that you can get full debugging of store-objects / restore-objects. To improve the debugging experience you can specify :debug t in [basic-codespace.lisp](src/basic-codespace.lisp) which will emit the code to a file for you so you get the full debugging experience. Pushing :debug-cbs to \*features\* will also help.

## Basic codespace and user codes

Expand Down
1 change: 1 addition & 0 deletions benchmarks.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@
(with-open-file (str "blarg.bin" :if-exists :supersede :if-does-not-exist :create
:direction :output :element-type '(unsigned-byte 8))
(cl-binary-store:store str data))))))
;;(assert (equalp (cl-binary-store:restore store) data))
(when read
(timed (" READ :" repeats output-size-MB)
(dotimes (x repeats) (cl-binary-store:restore store)))
Expand Down
2 changes: 1 addition & 1 deletion cl-binary-store.asd
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@
:description "Unit tests for CL-BINARY-STORE"
:author "Andrew J. Berkley <ajberkley@gmail.com>"
:license :BSD-3
:depends-on (#:parachute)
:depends-on (#:parachute #:cl-binary-store)
:pathname "test/"
:components ((:file "cl-binary-store-tests"))
:perform (test-op (o c) (uiop:symbol-call :parachute :test :cl-binary-store-tests)))
4 changes: 3 additions & 1 deletion src/actions.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,9 @@
value which is :ignore, :end, or nil if the object is to be
collected for the user. The second value only works if the
object is a top level object (that is one of the objects in
the call to store (store nil obj-a obj-b (make-instance 'end-action))"))
the call to store (store nil obj-a obj-b (make-instance 'end-action))")
(:method ((command t) (storage t) (references t) (restore-object t))
(unexpected-data "Expected an action command" command)))

(defgeneric store-action (action storage store-object)
(:documentation "Call during the serialization phase. You can
Expand Down
64 changes: 42 additions & 22 deletions src/array.lisp
Original file line number Diff line number Diff line change
@@ -1,31 +1,51 @@
(in-package :cl-binary-store)

(defun restore-array (storage restore-object)
;; This is somewhat complex because we cannot build the array
;; if it is displaced to another array until we restore what the array
;; is displaced to. So we need to use a fix-up scenario.
(declare (type function restore-object))
(declare (type function restore-object) (optimize (speed 3) (safety 1)))
(let* ((has-fill-pointer (funcall restore-object))
(fill-pointer (when has-fill-pointer (funcall restore-object)))
(fill-pointer (when has-fill-pointer (restore-tagged-unsigned-fixnum storage)))
(adjustable (funcall restore-object))
(array-rank (the (unsigned-byte 8) (restore-ub8 storage)))
;; restore tagged integers
(dimensions (loop repeat array-rank collect (funcall restore-object)))
(displaced (funcall restore-object)))
(if displaced
(let ((element-type (funcall restore-object))
(offset (funcall restore-object))
(displaced-to (funcall restore-object)))
(make-array dimensions :element-type element-type :adjustable adjustable
:fill-pointer fill-pointer :displaced-to displaced-to
:displaced-index-offset offset))
(let ((array
(let* ((element-type (funcall restore-object)))
(make-array dimensions :element-type element-type :adjustable adjustable
:fill-pointer fill-pointer))))
(loop for idx fixnum from 0 below (array-total-size array)
do (restore-object-to (row-major-aref array idx) restore-object))
array))))
(dimensions (loop repeat array-rank
collect (restore-tagged-unsigned-fixnum storage)))
(displaced (funcall restore-object))
(array-total-size (if dimensions (reduce #'* dimensions) 0)))
(unless (and (typep array-total-size 'fixnum) (>= array-total-size 0))
(unexpected-data "Array total size is too large"))
(check-if-too-much-data (read-storage-max-to-read storage) array-total-size)
(labels ((check-fill-pointer (dimensions)
(when has-fill-pointer
(unless (= array-rank 1)
(unexpected-data "found fill-pointer for a non-vector"))
(unless (<= fill-pointer (first dimensions))
(unexpected-data "fill-pointer > vector length")))
(values)))
(if displaced
(let ((element-type (funcall restore-object))
(offset (restore-tagged-unsigned-fixnum storage))
(displaced-to (funcall restore-object)))
(unless (typep displaced-to 'array)
(unexpected-data "displaced to a non array?!"))
(unless (typep (array-element-type displaced-to) element-type)
(unexpected-data "array displaced to array of different element-type"))
(unless (< offset (array-total-size displaced-to))
(unexpected-data "array displaced to too small array"))
(when has-fill-pointer (check-fill-pointer dimensions))
(make-array dimensions :element-type element-type :adjustable adjustable
:fill-pointer fill-pointer :displaced-to displaced-to
:displaced-index-offset offset))
(progn
(when has-fill-pointer (check-fill-pointer dimensions))
(let ((array
(let* ((element-type (funcall restore-object)))
(make-array dimensions :element-type element-type :adjustable adjustable
:fill-pointer fill-pointer))))
;; We need to make our array first in case any of the array elements refer to it!
;; If we are ever referred to, then there will already be a fixup in place for
;; our array handled by `restore-new-reference-indicator'.
(loop for idx fixnum from 0 below array-total-size
do (restore-object-to (row-major-aref array idx) restore-object))
array))))))

(defun store-array (array storage eq-refs store-object assign-new-reference-id)
(declare (optimize speed safety) (type array array) (type function store-object))
Expand Down
5 changes: 5 additions & 0 deletions src/cl-binary-store-user.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,11 @@
#:serializable-object-info
#:specialized-object-constructor
#:specialized-serializer/deserializer

;; More conditions
#:invalid-input-data
#:too-much-data
#:maybe-expected-error
))

(in-package #:cl-binary-store-user)
Expand Down
24 changes: 23 additions & 1 deletion src/cl-binary-store.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,29 @@
#:*allow-codespace-switching*
#:*max-to-write*
#:*max-to-read*
#:*output-magic-number*))
#:*output-magic-number*

;; Conditions
#:invalid-input-data
#:too-much-data
#:maybe-expected-error))


(in-package :cl-binary-store)

(define-condition invalid-input-data (simple-error)
())

(defun unexpected-data (expected &optional (data nil data-provided-p))
(error 'invalid-input-data
:format-control "Expected ~A~A"
:format-arguments (list expected
(if data-provided-p
;; be careful not to provide anything
;; that cannot be printed trivially here!
(format nil ", found ~A" data)
""))))

(define-condition maybe-expected-error (invalid-input-data)
()
(:documentation "Things like MISSING-PACKAGE-DURING-RESTORE, MISSING-SLOT"))
9 changes: 5 additions & 4 deletions src/codespaces.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -530,8 +530,8 @@
(t (case ,code-to-dispatch-on
,@numeric-dispatch-codes
(otherwise
(error 'simple-error :format-control "Unknown code ~A found in stream"
:format-arguments (list ,code-to-dispatch-on)))))))))
(error 'invalid-input-data :format-control "Unknown code ~A found in stream"
:format-arguments (list ,code-to-dispatch-on)))))))))

(defun store-objects (storage &rest stuff)
"Store all the objects in stuff to storage. Do not call this directly without let'ing
Expand All @@ -549,6 +549,7 @@
*read-version*."
(declare (type read-storage storage))
(let ((codespace *current-codespace*))
(assert codespace nil
"Unknown codespace to restore objects with... is *read-version* not correct?")
(unless codespace
(error 'invalid-input-data :format-control
"Unknown codespace to restore objects with... is *read-version* not correct?"))
(funcall (codespace-restore-objects codespace) storage)))
30 changes: 17 additions & 13 deletions src/cons.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -142,16 +142,20 @@
(defun restore-list/known-length (storage restore-object)
(declare (optimize (speed 3) (safety 0)))
(let* ((length (restore-tagged-unsigned-fixnum/interior storage)))
(check-if-too-much-data (read-storage-max-to-read storage)
(truly-the fixnum
(+ (read-storage-total-read storage)
(truly-the fixnum (* 16 length)))))
(let* ((head (make-list length))
(cons head))
(dotimes (count (1- length))
(restore-object-to (car cons) restore-object)
(setf cons (cdr cons)))
;; Support dotted end of list
(restore-object-to (car cons) restore-object)
(restore-object-to (cdr cons) restore-object)
head)))
(unless (and (<= 0 length (ash most-positive-fixnum -4))
(<=
(ash length 4)
(truly-the fixnum
(- (read-storage-max-to-read storage) (read-storage-total-read storage)))))
(error 'too-much-data :max-bytes (read-storage-max-to-read storage)
:bytes (+ (ash length 4) (read-storage-total-read storage))))
(when (> length 0)
(let* ((head (make-list length))
(cons head))
(dotimes (count (1- length))
(restore-object-to (car cons) restore-object)
(setf cons (cdr cons)))
;; Support dotted end of list
(restore-object-to (car cons) restore-object)
(restore-object-to (cdr cons) restore-object)
head))))
4 changes: 4 additions & 0 deletions src/hash-table.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@
#-sbcl (declare (ignore synchronized weakness))
;; weakness works as far as I can discern
;; because of how we do reference restoration
(unless (typep rehash-size '(or (integer 1 *) (float (1.0) *)))
(unexpected-data "rehash-size is not correct"))
(unless (< size (ash most-positive-fixnum -4))
(unexpected-data "hash table too large"))
(check-if-too-much-data (read-storage-max-to-read storage)
(* 16 size)) ;; an estimate
(make-hash-table :test test :size size
Expand Down
16 changes: 10 additions & 6 deletions src/magic-numbers.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,11 @@
(let ((magic-number (funcall restore-object)))
(let ((codespace (gethash magic-number *codespaces*)))
(unless codespace
(error "Unsupported codespace version #x~X, we have ~{~x~X~^ ~}~%"
magic-number (loop for key being the hash-keys of *codespaces*
collect key)))
(error 'invalid-input-data
:format-control "Unsupported codespace version #x~X, we have ~{~x~X~^ ~}~%"
:format-arguments (list
magic-number (loop for key being the hash-keys of *codespaces*
collect key))))
(cond
((not (eq *current-codespace* codespace))
(cond
Expand All @@ -38,9 +40,11 @@
(setf *version-being-read* magic-number)
(restore-objects storage))
(t
(error "Switching codespace away from #x~X (~A) is DISALLOWED"
(codespace-magic-number *current-codespace*)
(codespace-name *current-codespace*)))))
(error 'invalid-input-data
:format-control "Switching codespace away from #x~X (~A) is DISALLOWED"
:format-arguments (list
(codespace-magic-number *current-codespace*)
(codespace-name *current-codespace*))))))
(t
(setf *version-being-read* magic-number)
(format t "Deserializing from version #x~X (~A)~%"
Expand Down
Loading