Skip to content

Latest commit

 

History

History
171 lines (121 loc) · 7.4 KB

File metadata and controls

171 lines (121 loc) · 7.4 KB

Key/Value Encoding

The key/value store uses binary Vec<u8> keys and values, so we need an encoding scheme to translate between in-memory Rust data structures and the on-disk binary data. This is provided by the encoding module, with separate schemes for key and value encoding.

Bincode Value Encoding

Values are encoded using Bincode, a third-party binary encoding scheme for Rust. Bincode is convenient because it can easily encode any arbitrary Rust data type. But we could also have chosen e.g. JSON, Protobuf, MessagePack, or any other encoding.

We won't dwell on the actual binary format here, see the Bincode specification for details.

To use a consistent configuration for all encoding and decoding, we provide helper functions in the encoding::bincode module which use bincode::config::standard().

/// Use the standard Bincode configuration.
const CONFIG: bincode::config::Configuration = bincode::config::standard();
/// Serializes a value using Bincode.
pub fn serialize<T: Serialize>(value: &T) -> Vec<u8> {
// Panic on failure, as this is a problem with the data structure.
bincode::serde::encode_to_vec(value, CONFIG).expect("value must be serializable")
}
/// Deserializes a value using Bincode.
pub fn deserialize<'de, T: Deserialize<'de>>(bytes: &'de [u8]) -> Result<T> {
Ok(bincode::serde::borrow_decode_from_slice(bytes, CONFIG)?.0)
}

Bincode uses the very common Serde framework for its API. toyDB also provides an encoding::Value helper trait for value types which adds automatic encode() and decode() methods:

/// Adds automatic Bincode encode/decode methods to value types. These are used
/// for values in key/value storage engines, and also for e.g. network protocol
/// messages and other values.
pub trait Value: Serialize + DeserializeOwned {
/// Decodes a value from a byte slice using Bincode.
fn decode(bytes: &[u8]) -> Result<Self> {
bincode::deserialize(bytes)
}
/// Decodes a value from a reader using Bincode.
fn decode_from<R: Read>(reader: R) -> Result<Self> {
bincode::deserialize_from(reader)
}
/// Decodes a value from a reader using Bincode, or returns None if the
/// reader is closed.
fn maybe_decode_from<R: Read>(reader: R) -> Result<Option<Self>> {
bincode::maybe_deserialize_from(reader)
}
/// Encodes a value to a byte vector using Bincode.
fn encode(&self) -> Vec<u8> {
bincode::serialize(self)
}
/// Encodes a value into a writer using Bincode.
fn encode_into<W: Write>(&self, writer: W) -> Result<()> {
bincode::serialize_into(writer, self)
}
}

Here's an example of how this can be used to encode and decode an arbitrary Dog data type:

#[derive(serde::Serialize, serde::Deserialize)]
struct Dog {
    name: String,
    age: u8,
    good_boy: bool,
}

impl encoding::Value for Dog {}

let pluto = Dog { name: "Pluto".into(), age: 4, good_boy: true };
let bytes = pluto.encode();
println!("{bytes:02x?}");

// Outputs [05, 50, 6c, 75, 74, 6f, 04, 01]:
//
// * Length of string "Pluto": 05.
// * String "Pluto": 50 6c 75 74 6f.
// * Age 4: 04.
// * Good boy: 01 (true).

let pluto = Dog::decode(&bytes)?; // gives us back Pluto

Keycode Key Encoding

Unlike values, keys can't just use any binary encoding like Bincode. As mentioned in the storage section, the storage engine sorts data by key to enable range scans. The key encoding must therefore preserve the lexicographical order of the encoded values: the binary byte slices must sort in the same order as the original values.

As an example of why we can't just use Bincode, consider the strings "house" and "key". These should be sorted in alphabetical order: "house" before "key". However, Bincode encodes strings prefixed by their length, so "key" would be sorted before "house" in binary form:

03 6b 65 79        ← 3 bytes: key
05 68 6f 75 73 65  ← 5 bytes: house

For similar reasons, we can't just encode numbers in their native binary form: the little-endian representation will order very large numbers before small numbers, and the sign bit will order positive numbers before negative numbers. This would violate the ordering of natural numbers.

We also have to be careful with value sequences, which should be ordered element-wise. For example, the pair ("a", "xyz") should be ordered before ("ab", "cd"), so we can't just encode the strings one after the other like "axyz" and "abcd" since that would sort ("ab", "cd") first.

toyDB provides an order-preserving encoding called "Keycode" in the encoding::keycode module. Like Bincode, the Keycode encoding is not self-describing: the binary data does not say what the data type is, the caller must provide a type to decode into. It only supports a handful of primitive data types, and only needs to order values of the same type.

Keycode is implemented as a Serde (de)serializer, which requires a lot of boilerplate code to satisfy the trait, but we'll just focus on the actual encoding. The encoding scheme is as follows:

  • bool: 00 for false and 01 for true.

    /// bool simply uses 1 for true and 0 for false.
    fn serialize_bool(self, v: bool) -> Result<()> {
    self.output.push(if v { 1 } else { 0 });
    Ok(())
    }

  • u64: the big-endian binary encoding.

    /// u64 simply uses the big-endian encoding.
    fn serialize_u64(self, v: u64) -> Result<()> {
    self.output.extend(v.to_be_bytes());
    Ok(())
    }

  • i64: the big-endian binary encoding, but with the sign bit flipped to order negative numbers before positive ones.

    /// i64 uses the big-endian two's complement encoding, but flips the
    /// left-most sign bit such that negative numbers are ordered before
    /// positive numbers.
    ///
    /// The relative ordering of the remaining bits is already correct: -1, the
    /// largest negative integer, is encoded as 01111111...11111111, ordered
    /// after all other negative integers but before positive integers.
    fn serialize_i64(self, v: i64) -> Result<()> {
    let mut bytes = v.to_be_bytes();
    bytes[0] ^= 1 << 7; // flip sign bit
    self.output.extend(bytes);
    Ok(())
    }

  • f64: the big-endian IEEE 754 binary encoding, but with the sign bit flipped, and all bits flipped for negative numbers, to order negative numbers correctly.

    /// f64 is encoded in big-endian IEEE 754 form, but it flips the sign bit to
    /// order positive numbers after negative numbers, and also flips all other
    /// bits for negative numbers to order them from smallest to largest. NaN is
    /// ordered at the end.
    fn serialize_f64(self, v: f64) -> Result<()> {
    let mut bytes = v.to_be_bytes();
    match v.is_sign_negative() {
    false => bytes[0] ^= 1 << 7, // positive, flip sign bit
    true => bytes.iter_mut().for_each(|b| *b = !*b), // negative, flip all bits
    }
    self.output.extend(bytes);
    Ok(())
    }

  • Vec<u8>: terminated by 00 00, with 00 escaped as 00 ff to disambiguate it.

    // Byte slices are terminated by 0x0000, escaping 0x00 as 0x00ff. This
    // ensures that we can detect the end, and that for two overlapping slices,
    // the shorter one orders before the longer one.
    //
    // We can't use e.g. length prefix encoding, since it doesn't sort correctly.
    fn serialize_bytes(self, v: &[u8]) -> Result<()> {
    let bytes = v
    .iter()
    .flat_map(|&byte| match byte {
    0x00 => Either::Left([0x00, 0xff].into_iter()),
    byte => Either::Right([byte].into_iter()),
    })
    .chain([0x00, 0x00]);
    self.output.extend(bytes);
    Ok(())
    }

  • String: like Vec<u8>.

    // Strings are encoded like bytes.
    fn serialize_str(self, v: &str) -> Result<()> {
    self.serialize_bytes(v.as_bytes())
    }

  • Vec<T>, [T], (T,): the concatenation of the inner values.

    /// Sequences simply concatenate the serialized elements, with no external structure.
    impl SerializeSeq for &mut Serializer {
    type Ok = ();
    type Error = Error;
    fn serialize_element<T: Serialize + ?Sized>(&mut self, value: &T) -> Result<()> {
    value.serialize(&mut **self)
    }
    fn end(self) -> Result<()> {
    Ok(())
    }
    }

  • enum: the variant's numerical index as a u8, then the inner values (if any).

    /// Enum variants are serialized using their index, as a single byte.
    fn serialize_unit_variant(self, _: &'static str, index: u32, _: &'static str) -> Result<()> {
    self.output.push(index.try_into()?);
    Ok(())
    }

Like encoding::Value, there is also an encoding::Key helper trait:

/// Adds automatic Keycode encode/decode methods to key enums. These are used
/// as keys in the key/value store.
pub trait Key<'de>: Serialize + Deserialize<'de> {
/// Decodes a key from a byte slice using Keycode.
fn decode(bytes: &'de [u8]) -> Result<Self> {
keycode::deserialize(bytes)
}
/// Encodes a key to a byte vector using Keycode.
///
/// In the common case, the encoded key is borrowed for a storage engine
/// call and then thrown away. We could avoid a bunch of allocations by
/// taking a reusable byte vector to encode into and return a reference to
/// it, but we keep it simple.
fn encode(&self) -> Vec<u8> {
keycode::serialize(self)
}
}

Different kinds of keys are usually represented as enums. For example, if we wanted to store cars and video games, we could use:

#[derive(serde::Serialize, serde::Deserialize)]
enum Key {
    Car(String, String, u64),    // make, model, year
    Game(String, u64, Platform), // name, year, platform
}

#[derive(serde::Serialize, serde::Deserialize)]
enum Platform {
    PC,
    PS5,
    Switch,
    Xbox,
}

impl encoding::Key for Key {}

let returnal = Key::Game("Returnal".into(), 2021, Platform::PS5);
let bytes = returnal.encode();
println!("{bytes:02x?}");

// Outputs [01, 52, 65, 74, 75, 72, 6e, 61, 6c, 00, 00, 00, 00, 00, 00, 00, 00, 07, e5, 01].
//
// * Key::Game: 01
// * Returnal: 52 65 74 75 72 6e 61 6c 00 00
// * 2021: 00 00 00 00 00 00 07 e5
// * Platform::PS5: 01

let returnal = Key::decode(&bytes)?;

Because the keys are sorted in element-wise order, this would allow us to e.g. perform a prefix scan to fetch all platforms which Returnal (2021) was released on, or perform a range scan to fetch all models of Nissan Altima released between 2010 and 2015.


Storage Engine   |   MVCC Transactions