| 1 | | -todo!() |
| 1 | +# 🧸 ShorterDB |
| 2 | + |
| 3 | +ShorterDB is a lightweight, embedded key-value store inspired by popular databases like RocksDB and LevelDB. It is designed to provide a simple and extensible architecture for learning and experimentation. While it may not match the performance of production-grade systems, it offers a clear and modular implementation of key-value store concepts, including Write-Ahead Logging (WAL), Memtables, and Sorted String Tables (SSTs). |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +## Table of Contents |
| 8 | + |
| 9 | +1. [Installation](#installation) |
| 10 | +2. [Introduction](#introduction) |
| 11 | +3. [Features](#features) |
| 12 | +4. [Examples](#examples) |
| 13 | + - [Embedded Database](#embedded-database) |
| 14 | + - [gRPC Server](#grpc-server) |
| 15 | + - [CSV Import with REPL](#csv-import-with-repl) |
| 16 | +5. [Code Walkthrough](#code-walkthrough) |
| 17 | + - [Error Handling](#error-handling) |
| 18 | + - [Database Core (`ShorterDB`)](#database-core-shorterdb) |
| 19 | +6. [Limitations](#limitations) |
| 20 | +7. [Future Work](#future-work) |
| 21 | +8. [Architecture Overview](#architecture-overview) |
| 22 | + - [Write-Ahead Log (WAL)](#write-ahead-log-wal) |
| 23 | + - [Memtable](#memtable) |
| 24 | + - [Sorted String Table (SST)](#sorted-string-table-sst) |
| 25 | +9. [Contributing](#contributing) |
| 26 | +10. [Conclusion](#conclusion) |
| 27 | + |
| 28 | +--- |
| 29 | + |
| 30 | +## Installation |
| 31 | + |
| 32 | +To use ShorterDB in your Rust project, add the following to your `Cargo.toml`: |
| 33 | + |
| 34 | +```toml |
| 35 | +[dependencies] |
| 36 | +shorterdb = "0.1.0" |
| 37 | +``` |
| 38 | + |
| 39 | +For building the project locally, ensure you have Rust installed. Clone the repository and run: |
| 40 | + |
| 41 | +```bash |
| 42 | +git clone https://github.com/your-repo/shorterdb.git |
| 43 | +cd shorterdb |
| 44 | +cargo build |
| 45 | +``` |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## Introduction |
| 50 | + |
| 51 | +ShorterDB is a simple key-value store built using a De-LSM architecture. It is designed for educational purposes and provides a modular implementation of database components. The project includes examples for embedded usage, gRPC-based remote access, and CSV imports. |
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +## Features |
| 56 | + |
| 57 | +- **Embedded Database**: Use ShorterDB as a lightweight, file-based key-value store. |
| 58 | +- **gRPC Server**: Access the database remotely using gRPC. |
| 59 | +- **REPL Interface**: Interact with the database in a command-line interface. |
| 60 | +- **Write-Ahead Logging (WAL)**: Ensure durability by logging all writes. |
| 61 | +- **Memtable**: An in-memory data structure for fast reads and writes. |
| 62 | +- **Sorted String Table (SST)**: Persistent storage for key-value pairs. |
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +## Examples |
| 67 | + |
| 68 | +### Embedded Database |
| 69 | + |
| 70 | +The [`embedded`](examples/embedded) example demonstrates how to use ShorterDB as an embedded database. |
| 71 | + |
| 72 | +```rust |
| 73 | +let mut db = ShorterDB::new(Path::new("./embedded_db")).unwrap(); |
| 74 | +db.set(b"hello", b"world").unwrap(); |
| 75 | +let value = db.get(b"hello").unwrap(); |
| 76 | +assert_eq!(value, Some(b"world".to_vec())); |
| 77 | +``` |
| 78 | + |
| 79 | +### gRPC Server |
| 80 | + |
| 81 | +The [`grpc`](examples/grpc) example provides a gRPC interface for remote database access. |
| 82 | + |
| 83 | +```rust |
| 84 | +#[tonic::async_trait] |
| 85 | +impl Basic for DbOperations { |
| 86 | + async fn get(&self, request: tonic::Request<GetRequest>) -> Result<tonic::Response<GetResponse>, tonic::Status> { |
| 87 | + let key = request.get_ref().key.clone(); |
| 88 | + let db = self.db.lock().await; |
| 89 | + match db.get(key.as_bytes()) { |
| 90 | + Ok(Some(value)) => Ok(tonic::Response::new(GetResponse { value: String::from_utf8_lossy(&value).into_owned() })), |
| 91 | + Ok(None) => Err(tonic::Status::not_found("Key not found")), |
| 92 | + Err(_) => Err(tonic::Status::internal("Error reading from the database")), |
| 93 | + } |
| 94 | + } |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +### CSV Import with REPL |
| 99 | + |
| 100 | +The [`repl_csv`](examples/repl_csv) example imports data from a CSV file and provides a REPL interface. |
| 101 | + |
| 102 | +```rust |
| 103 | +let mut rdr = ReaderBuilder::new().has_headers(false).from_reader(File::open("data.csv").unwrap()); |
| 104 | +for result in rdr.records() { |
| 105 | + let record = result.unwrap(); |
| 106 | + db.set(record.get(0).unwrap().as_bytes(), record.get(1).unwrap().as_bytes()).unwrap(); |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 153 | +## Code Walkthrough |
| 154 | + |
| 155 | +### Error Handling |
| 156 | + |
| 157 | +ShorterDB uses the `thiserror` crate for error handling. Custom error types are defined in `errors.rs`. |
| 158 | + |
| 159 | +```rust |
| 160 | +#[derive(Error, Debug)] |
| 161 | +pub enum ShortDBErrors { |
| 162 | + #[error("{0}")] |
| 163 | + Io(#[from] io::Error), |
| 164 | + #[error("Key not found")] |
| 165 | + KeyNotFound, |
| 166 | + #[error("Value not set")] |
| 167 | + ValueNotSet, |
| 168 | + #[error("Flush needed from Memtable")] |
| 169 | + FlushNeededFromMemTable, |
| 170 | +} |
| 171 | +``` |
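To make the role of `#[from]` concrete, here is a minimal std-only sketch of roughly what the derive provides for an `Io` variant: a `Display` message, an error `source`, and a `From<io::Error>` conversion so that `?` works. The `DbError` type, its reduced set of variants, and the `failing_io` and `run` functions are hypothetical names used only for illustration; they are not part of ShorterDB.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical, reduced stand-in for the error enum shown above.
#[derive(Debug)]
enum DbError {
    Io(io::Error),
    KeyNotFound,
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Mirrors #[error("{0}")]: print the wrapped error's own message.
            DbError::Io(e) => write!(f, "{}", e),
            DbError::KeyNotFound => write!(f, "Key not found"),
        }
    }
}

impl Error for DbError {
    // Expose the wrapped io::Error as the underlying cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            DbError::Io(e) => Some(e),
            DbError::KeyNotFound => None,
        }
    }
}

// The conversion that lets `?` turn an io::Error into a DbError.
impl From<io::Error> for DbError {
    fn from(e: io::Error) -> Self {
        DbError::Io(e)
    }
}

// Hypothetical I/O operation that always fails.
fn failing_io() -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::Other, "disk unavailable"))
}

// `?` converts the io::Error through the From impl above.
fn run() -> Result<(), DbError> {
    failing_io()?;
    Ok(())
}
```

With this in place, functions that perform file I/O can return the database error type and use `?` directly instead of mapping each `io::Error` by hand.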
| 172 | + |
| 173 | +### Database Core (`ShorterDB`) |
| 174 | + |
| 175 | +The `ShorterDB` struct ties together the WAL, Memtable, and SST components. |
| 176 | + |
| 177 | +```rust |
| 178 | +pub struct ShorterDB { |
| 179 | + pub(crate) memtable: Memtable, |
| 180 | + pub(crate) wal: WAL, |
| 181 | + pub(crate) sst: SST, |
| 182 | + pub(crate) data_dir: PathBuf, |
| 183 | +} |
| 184 | + |
| 185 | +impl ShorterDB { |
| 186 | + pub fn set(&mut self, key: &[u8], value: &[u8]) -> Result<()> { |
| 187 | + let entry = WALEntry { |
| 188 | + key: Bytes::copy_from_slice(key), |
| 189 | + value: Bytes::copy_from_slice(value), |
| 190 | + }; |
| 191 | + self.wal.write(&entry)?; |
| 192 | + self.memtable.set(key, value)?; |
| 193 | + Ok(()) |
| 194 | + } |
| 195 | +} |
| 196 | +``` |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 246 | +## Limitations |
| 247 | + |
| 248 | +- Performance is not optimized for production use. |
| 249 | +- Limited concurrency support. |
| 250 | +- No advanced features like compression or compaction. |
| 251 | + |
| 252 | +--- |
| 253 | + |
| 254 | +## Future Work |
| 255 | + |
| 256 | +- Add support for compression. |
| 257 | +- Implement advanced compaction strategies. |
| 258 | +- Improve concurrency and parallelism. |
| 259 | + |
| 260 | +--- |
| 261 | + |
| 262 | +## Architecture Overview |
| 263 | + |
| 264 | +ShorterDB is built using a modular architecture that separates concerns into distinct components: |
| 265 | + |
| 266 | +### Write-Ahead Log (WAL) |
| 267 | + |
| 268 | +The WAL provides durability by logging every write operation before it is applied to the in-memory `Memtable`. This allows writes that have not yet reached an SST to be replayed after a crash. |
| 269 | + |
| 270 | +```rust |
| 271 | +pub(crate) struct WAL { |
| 272 | + path: PathBuf, |
| 273 | + file: File, |
| 274 | +} |
| 275 | + |
| 276 | +impl WAL { |
| 277 | + pub(crate) fn write(&mut self, entry: &WALEntry) -> io::Result<()> { |
| 278 | + self.file.write_all(&entry.key.len().to_le_bytes())?; |
| 279 | + self.file.write_all(entry.key.as_ref())?; |
| 280 | + self.file.write_all(&entry.value.len().to_le_bytes())?; |
| 281 | + self.file.write_all(entry.value.as_ref())?; |
| 282 | + self.file.flush()?; |
| 283 | + Ok(()) |
| 284 | + } |
| 285 | +} |
| 286 | +``` |
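The record layout written above (key length, key bytes, value length, value bytes) can be read back sequentially during recovery. The following is a minimal sketch of that idea, not ShorterDB's actual recovery code. It assumes lengths are encoded as fixed-width little-endian `u64` (the snippet above writes `usize`, whose width depends on the platform), and the names `write_record`, `take_field`, and `replay` are hypothetical. A record that is cut off at the end of the log, as might happen when a crash interrupts a write, is ignored.

```rust
use std::io::{self, Write};

/// Hypothetical helper: append one record as [key_len][key][value_len][value].
/// Lengths are little-endian u64 (an assumption made for this sketch).
fn write_record<W: Write>(out: &mut W, key: &[u8], value: &[u8]) -> io::Result<()> {
    out.write_all(&(key.len() as u64).to_le_bytes())?;
    out.write_all(key)?;
    out.write_all(&(value.len() as u64).to_le_bytes())?;
    out.write_all(value)?;
    Ok(())
}

/// Split one length-prefixed field off the front of `buf`.
/// Returns None if the buffer ends before the field is complete.
fn take_field(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 8 {
        return None;
    }
    let (len_bytes, rest) = buf.split_at(8);
    let mut len_buf = [0u8; 8];
    len_buf.copy_from_slice(len_bytes);
    let len = u64::from_le_bytes(len_buf);
    if (rest.len() as u64) < len {
        return None;
    }
    // Safe to narrow: len is no larger than rest.len().
    Some(rest.split_at(len as usize))
}

/// Decode every complete record in the log, in write order.
/// An incomplete trailing record is dropped.
fn replay(mut buf: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut entries = Vec::new();
    loop {
        let (key, rest) = match take_field(buf) {
            Some(parts) => parts,
            None => break,
        };
        let (value, rest) = match take_field(rest) {
            Some(parts) => parts,
            None => break,
        };
        entries.push((key.to_vec(), value.to_vec()));
        buf = rest;
    }
    entries
}
```

On startup, each replayed pair could be applied to a fresh `Memtable` in order, so that later writes to the same key overwrite earlier ones.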
| 287 | + |
| 288 | +### Memtable |
| 289 | + |
| 290 | +The `Memtable` is an in-memory data structure that stores key-value pairs. It uses a `SkipMap` to keep keys ordered for efficient lookups, and it counts writes so that reaching 256 entries signals that the table should be flushed to an SST. |
| 291 | + |
| 292 | +```rust |
| 293 | +pub(crate) struct Memtable { |
| 294 | + pub(crate) memtable: Arc<SkipMap<Bytes, Bytes>>, |
| 295 | + pub(crate) size: u64, |
| 296 | +} |
| 297 | + |
| 298 | +impl Memtable { |
| 299 | + pub(crate) fn set(&mut self, key: &[u8], value: &[u8]) -> Result<()> { |
| 300 | + self.memtable.insert(Bytes::copy_from_slice(key), Bytes::copy_from_slice(value)); |
| 301 | + self.size += 1; |
| 302 | + if self.size >= 256 { |
| 303 | + return Err(ShortDBErrors::FlushNeededFromMemTable); |
| 304 | + } |
| 305 | + Ok(()) |
| 306 | + } |
| 307 | +} |
| 308 | +``` |
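In the snippet above, `set` stores the entry first and then reports that a flush is needed. One way a caller might react to that signal is to move the full table onto a queue of frozen tables and continue with an empty one. The sketch below illustrates this under several assumptions: a `BTreeMap` stands in for `SkipMap`, the threshold is 4 instead of 256 to keep the example small, a plain enum replaces the error value, and the `Db`, `frozen`, and `SetOutcome` names are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

// Small threshold for illustration; the snippet above uses 256.
const FLUSH_THRESHOLD: usize = 4;

#[derive(Debug, PartialEq)]
enum SetOutcome {
    Stored,
    FlushNeeded,
}

// Stand-in for the Memtable above, backed by a BTreeMap instead of a SkipMap.
#[derive(Default)]
struct Memtable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    writes: usize,
}

impl Memtable {
    // Store the entry, then report whether the table has reached its limit.
    fn set(&mut self, key: &[u8], value: &[u8]) -> SetOutcome {
        self.entries.insert(key.to_vec(), value.to_vec());
        self.writes += 1;
        if self.writes >= FLUSH_THRESHOLD {
            SetOutcome::FlushNeeded
        } else {
            SetOutcome::Stored
        }
    }
}

// Hypothetical owner of the active table and the tables waiting to be flushed.
#[derive(Default)]
struct Db {
    active: Memtable,
    frozen: VecDeque<Memtable>,
}

impl Db {
    fn set(&mut self, key: &[u8], value: &[u8]) {
        if self.active.set(key, value) == SetOutcome::FlushNeeded {
            // Swap in an empty table and queue the full one for flushing.
            let full = std::mem::take(&mut self.active);
            self.frozen.push_back(full);
        }
    }

    // Check the newest data first: the active table, then frozen tables
    // from most recently frozen to oldest.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        if let Some(v) = self.active.entries.get(key) {
            return Some(v.as_slice());
        }
        self.frozen
            .iter()
            .rev()
            .find_map(|m| m.entries.get(key).map(|v| v.as_slice()))
    }
}
```

Reading from newest to oldest keeps lookups correct while frozen tables are still waiting to be written to disk, because a newer value for a key always shadows an older one.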
| 309 | + |
| 310 | +### Sorted String Table (SST) |
| 311 | + |
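| 312 | +The SST is a persistent, sorted, and immutable data structure stored on disk. It is used for long-term storage of key-value pairs. In the current implementation, each flushed key is written to its own file under the `l0` directory. |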
| 313 | + |
| 314 | +```rust |
| 315 | +pub(crate) struct SST { |
| 316 | + pub(crate) dir: PathBuf, |
| 317 | + pub(crate) levels: Vec<PathBuf>, |
| 318 | + pub(crate) queue: VecDeque<Memtable>, |
| 319 | +} |
| 320 | + |
| 321 | +impl SST { |
| 322 | + pub(crate) fn set(&mut self) { |
| 323 | + let mem = self.queue.pop_front().unwrap(); |
| 324 | + for entry in mem.memtable.iter() { |
| 325 | + let key = entry.key(); |
| 326 | + let value = entry.value(); |
| 327 | + let mut path_of_kv_file = self.dir.clone(); |
| 328 | + path_of_kv_file.push("l0"); |
| 329 | + path_of_kv_file.push(bytes_to_string(key)); |
| 330 | + let mut file = File::create_new(&path_of_kv_file).unwrap(); |
| 331 | + file.write_all(value).unwrap(); |
| 332 | + } |
| 333 | + } |
| 334 | +} |
| 335 | +``` |
| 336 | + |
| 337 | +--- |
| 338 | + |
| 355 | +## Contributing |
| 356 | + |
| 357 | +Contributions are welcome! To contribute: |
| 358 | + |
| 359 | +1. Fork the repository. |
| 360 | +2. Create a new branch for your feature or bugfix. |
| 361 | +3. Submit a pull request with a clear description of your changes. |
| 362 | + |
| 363 | +--- |
| 364 | + |
| 365 | +## Conclusion |
| 366 | + |
| 367 | +ShorterDB is a simple and modular key-value store designed for learning and experimentation. While it may not match the performance of production-grade systems, it provides a clear and extensible implementation of database concepts. Explore the [examples](#examples) to get started! |