-
Notifications
You must be signed in to change notification settings - Fork 177
Implementation Details
For maximizing the overall disk throughput, ForestDB writes all incoming mutations to a database file in a log-structured append-only manner. However, the index structures (i.e., ID index and Seq index) are not immediately updated in order to reduce the overall write amplification, and the batch of these pended mutations is eventually applied to the index structures later. For this, ForestDB maintains in-memory WAL ID index and WAL Seq index for indexing WAL documents that are not indexed by the main index structures. In this way, those WAL documents can be efficiently retrieved by looking up in-memory WAL indexes.

As shown in the above figure, these in-memory WAL indexes are organized as hash table that maps from the hash value of a key (ID) or its sequence number to the offset of a database file where (metadata, value) entry is located. Since these index structures reside only in-memory, there is no further disk I/O overhead when we lookup WAL entries.

There is a threshold on the total size of WAL entries, and we check the WAL size for every commit operation. The batch of main index updates for WAL entries are aggregated and reflected in both ID index and Seq index if the total size of WAL entries is larger than the configured threshold. Updated nodes in the main index structures are sequentially appended at the end of a database file, and all entries in in-memory WAL indexes are removed. The upcoming mutations from an application will be appended at the end of a database file and added to the in-memory WAL indexes again.
ForestDB supports an optional block align in writing a document (i.e., key's metadata and value) in a database file. If a block align is enabled, then when a document is written in a database file, ForestDB first checks the remaining space of the current block to avoid a document being written over several blocks. The document is simply appended if the remaining space is larger than the size of document. If not, the document is written at the first byte of next block skipping the space of the current block. When the size of a document is larger than a block size, we compare the total number of blocks to be occupied by this document when the document is written at the end of the current position with when the document is written at the beginning of the next block. Then, we choose the location that occupies less blocks. The following figure shows a file layout when the block align option is enabled. Note that the block align option for a document is disabled by default.

Unlike documents, each node in B+ tree is exactly fitted into a block, and they are always written at the beginning of a new block to avoid interleaving with documents in the same block. Since ForestDB does not allow in-place updates, a modified node is written at the end of file invalidating the old node. If an update occurs on a leaf node, its parent node should be modified and written in a new block because the location of the leaf node is moved. For the same reason, all the nodes along the path from the updated leaf node to the root node should be updated and appended to the end of the file recursively. After writing all index updates, we finally write a database header block at the end of the file.
