Commit be9ac7d

Add 'lsm' shell to work with lsm trees on the command-line
1 parent d045be1 commit be9ac7d

4 files changed: +2848 -0 lines changed

Cargo.toml

Lines changed: 19 additions & 0 deletions
@@ -16,29 +16,48 @@ categories = ["data-structures", "database-implementations", "algorithms"]
 name = "lsm_tree"
 path = "src/lib.rs"
 
+[[bin]]
+name = "lsm"
+path = "src/tool.rs"
+required-features = ["tool"]
+
 [features]
 default = []
 lz4 = ["dep:lz4_flex"]
 bytes_1 = ["dep:bytes"]
 metrics = []
+tool = ["dep:clap", "dep:shlex", "dep:rustyline", "dep:parse-size", "dep:humansize",
+"dep:tracing", "dep:tracing-log", "dep:tracing-panic", "dep:tracing-subscriber"]
 
 [dependencies]
 bytes = { version = "1", optional = true }
 byteorder = { package = "byteorder-lite", version = "0.1.0" }
 byteview = "~0.10.0"
+clap = { version = "4", features = ["derive"], optional = true }
 crossbeam-skiplist = "0.1.3"
 enum_dispatch = "0.3.13"
+humansize = { version = "2.1", optional = true }
 interval-heap = "0.0.5"
 log = "0.4.27"
 lz4_flex = { version = "0.11.5", optional = true, default-features = false }
+parse-size = { version = "1.0", optional = true }
 quick_cache = { version = "0.6.16", default-features = false, features = [] }
 rustc-hash = "2.1.1"
+rustyline = { version = "15", optional = true }
 self_cell = "1.2.0"
 sfa = "~1.0.0"
+shlex = { version = "1", optional = true }
 tempfile = "3.20.0"
+tracing = { version = "0.1", optional = true }
+tracing-log = { version = "0.2", optional = true }
+tracing-panic = { version = "0.1.2", optional = true }
+tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt", "registry"], optional = true }
 varint-rs = "2.2.0"
 xxhash-rust = { version = "0.8.15", features = ["xxh3"] }
 
+[target.'cfg(unix)'.dev-dependencies]
+rexpect = "0.5"
+
 [dev-dependencies]
 criterion = { version = "0.8.0", features = ["html_reports"] }
 fs_extra = "1.3.0"

README.md

Lines changed: 143 additions & 0 deletions
@@ -62,6 +62,149 @@ Uses [`bytes`](https://github.com/tokio-rs/bytes) as the underlying `Slice` type
 
 *Disabled by default.*
 
+### tool
+
+Enables the `lsm` CLI binary for interacting with LSM trees from the command line.
+
+*Disabled by default.*
+
+## CLI Tool
+
+The crate includes an optional CLI tool (`lsm`) for inspecting and manipulating LSM trees.
+
+### Installation
+
+```bash
+cargo install lsm-tree --features tool
+```
+
+Or build from source:
+
+```bash
+cargo build --release --features tool
+```
+
+### Usage
+
+The tool can be used either with direct commands or in interactive shell mode.
+
+#### Direct Commands
+
+```bash
+# Set a key-value pair
+lsm /path/to/db set mykey "my value"
+
+# Get a value
+lsm /path/to/db get mykey
+
+# Delete a key
+lsm /path/to/db del mykey
+
+# List all keys (aliases: list, ls)
+lsm /path/to/db scan
+
+# List keys with a prefix
+lsm /path/to/db scan "user:"
+
+# List keys in a range [start, end)
+lsm /path/to/db range a z
+
+# Count items
+lsm /path/to/db count
+
+# Show database info
+lsm /path/to/db info
+
+# Flush memtable to disk
+lsm /path/to/db flush
+
+# Run compaction
+lsm /path/to/db compact
+```
+
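The prefix form of `scan` and the half-open `[start, end)` form of `range` can be illustrated with an ordered map. This is a minimal sketch, not the tool's implementation: it uses a standard-library `BTreeMap` as a stand-in for the tree, assumes byte-wise key ordering, and the names `sample_db`, `scan_prefix`, and `scan_range` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Builds a small in-memory stand-in for the database (illustrative only).
fn sample_db() -> BTreeMap<String, String> {
    ["a", "user:1", "user:2", "z"]
        .iter()
        .map(|k| (k.to_string(), format!("value of {k}")))
        .collect()
}

/// Returns every key that starts with `prefix`, in sorted order.
fn scan_prefix<'a>(db: &'a BTreeMap<String, String>, prefix: &str) -> Vec<&'a str> {
    // Seek to the first key >= prefix, then stop at the first non-matching key.
    let bounds: (Bound<&str>, Bound<&str>) = (Bound::Included(prefix), Bound::Unbounded);
    db.range::<str, _>(bounds)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.as_str())
        .collect()
}

/// Returns every key in the half-open range [start, end).
fn scan_range<'a>(db: &'a BTreeMap<String, String>, start: &str, end: &str) -> Vec<&'a str> {
    // An empty or inverted range yields nothing (BTreeMap::range would panic on start > end).
    if start >= end {
        return Vec::new();
    }
    let bounds: (Bound<&str>, Bound<&str>) = (Bound::Included(start), Bound::Excluded(end));
    db.range::<str, _>(bounds).map(|(k, _)| k.as_str()).collect()
}

fn main() {
    let db = sample_db();
    println!("prefix user: -> {:?}", scan_prefix(&db, "user:"));
    println!("range a..z   -> {:?}", scan_range(&db, "a", "z"));
}
```

In this sketch the end bound is exclusive, so `range a z` returns `a` but not `z`, matching the `[start, end)` notation used in the command list.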
+#### Interactive Shell
+
+Start an interactive shell by running without a command:
+
+```bash
+lsm /path/to/db
+```
+
+The shell supports all the above commands plus:
+
+- `begin` - Start a batch/transaction
+- `commit` - Commit the current batch
+- `rollback` - Discard the current batch
+- `exit` / `quit` - Exit (flushes data first)
+- `abort` - Exit without flushing
+- `help` - Show available commands
+
+#### Batch Operations
+
+The shell supports batching multiple operations into an atomic unit:
+
+```
+lsm> begin
+OK (batch started)
+lsm> set key1 value1
+OK (batched, ready to commit)
+lsm> set key2 value2
+OK (batched, ready to commit)
+lsm> del key3
+OK (batched, ready to commit)
+lsm> commit
+OK (batch committed, ready to flush)
+```
+
+While a batch is active:
+- `get` reads from the batch first, then falls back to the tree
+- `scan` and `range` warn that they ignore uncommitted batch operations
+- `info` shows the pending batch operations
+- `rollback` discards all batched operations
+
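The read-through behaviour described above (batch first, then the tree) can be modelled roughly as follows. This is a sketch under stated assumptions, not the code in `src/tool.rs`: `Session` and `BatchOp` are hypothetical names, the tree is a plain `BTreeMap`, and atomicity across threads or crashes is not modelled.

```rust
use std::collections::BTreeMap;

/// A pending operation recorded while a batch is open (hypothetical type).
#[derive(Debug)]
enum BatchOp {
    Set(String),
    Del,
}

/// Hypothetical shell state: committed data plus an optional open batch.
#[derive(Default)]
struct Session {
    tree: BTreeMap<String, String>,
    batch: Option<BTreeMap<String, BatchOp>>,
}

impl Session {
    fn begin(&mut self) {
        self.batch = Some(BTreeMap::new());
    }

    fn set(&mut self, key: &str, value: &str) {
        match &mut self.batch {
            // With an open batch, only record the operation.
            Some(b) => {
                b.insert(key.to_string(), BatchOp::Set(value.to_string()));
            }
            None => {
                self.tree.insert(key.to_string(), value.to_string());
            }
        }
    }

    fn del(&mut self, key: &str) {
        match &mut self.batch {
            Some(b) => {
                b.insert(key.to_string(), BatchOp::Del);
            }
            None => {
                self.tree.remove(key);
            }
        }
    }

    /// Reads from the batch first, then falls back to the tree.
    fn get(&self, key: &str) -> Option<&str> {
        if let Some(op) = self.batch.as_ref().and_then(|b| b.get(key)) {
            return match op {
                BatchOp::Set(v) => Some(v.as_str()),
                BatchOp::Del => None, // a batched delete hides the committed value
            };
        }
        self.tree.get(key).map(String::as_str)
    }

    /// Applies every batched operation to the tree and closes the batch.
    fn commit(&mut self) {
        if let Some(b) = self.batch.take() {
            for (k, op) in b {
                match op {
                    BatchOp::Set(v) => {
                        self.tree.insert(k, v);
                    }
                    BatchOp::Del => {
                        self.tree.remove(&k);
                    }
                }
            }
        }
    }

    /// Discards all batched operations without touching the tree.
    fn rollback(&mut self) {
        self.batch = None;
    }
}

fn main() {
    let mut s = Session::default();
    s.set("key3", "old");
    s.begin();
    s.set("key1", "value1");
    s.del("key3");
    println!("in batch: key1={:?} key3={:?}", s.get("key1"), s.get("key3"));
    s.rollback();
    println!("after rollback: key1={:?} key3={:?}", s.get("key1"), s.get("key3"));
}
```

Keeping the batch as a separate overlay map is what makes `rollback` trivial in this model: the committed data is never modified until `commit` runs.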
+#### Long Scan
+
+Use `-l` / `--long` to show internal entry details including sequence numbers, value types, and tombstones:
+
+```
+lsm> scan -l
+=== Active Memtable ===
+key1 = value1 [seqno=0, type=Value]
+key2 [seqno=1, type=Tombstone]
+
+=== Persisted (on disk) ===
+key3 = value3 [seqno=2, type=Value]
+
+(3 total items, 2 in memtable, 1 persisted, 1 tombstones)
+```
+
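As a rough illustration of how such a listing could be produced, the sketch below formats entries and the summary line in the same shape as the sample output. The `Entry` and `ValueType` types here are hypothetical and are not the crate's internal types; only the output format is taken from the example above.

```rust
/// Hypothetical value kinds, named after the `type=` labels in the sample output.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValueType {
    Value,
    Tombstone,
}

/// Hypothetical internal entry: a key, an optional value, and a sequence number.
struct Entry {
    key: &'static str,
    value: Option<&'static str>,
    seqno: u64,
    kind: ValueType,
}

/// Formats one entry the way the long listing shows it.
fn format_entry(e: &Entry) -> String {
    match e.value {
        Some(v) => format!("{} = {} [seqno={}, type={:?}]", e.key, v, e.seqno, e.kind),
        // Tombstones carry no value, so only the key is printed.
        None => format!("{} [seqno={}, type={:?}]", e.key, e.seqno, e.kind),
    }
}

/// Builds the trailing summary line from both sources.
fn summary(memtable: &[Entry], persisted: &[Entry]) -> String {
    let tombstones = memtable
        .iter()
        .chain(persisted)
        .filter(|e| e.kind == ValueType::Tombstone)
        .count();
    format!(
        "({} total items, {} in memtable, {} persisted, {} tombstones)",
        memtable.len() + persisted.len(),
        memtable.len(),
        persisted.len(),
        tombstones
    )
}

/// Sample data mirroring the listing above.
fn sample() -> (Vec<Entry>, Vec<Entry>) {
    let memtable = vec![
        Entry { key: "key1", value: Some("value1"), seqno: 0, kind: ValueType::Value },
        Entry { key: "key2", value: None, seqno: 1, kind: ValueType::Tombstone },
    ];
    let persisted = vec![
        Entry { key: "key3", value: Some("value3"), seqno: 2, kind: ValueType::Value },
    ];
    (memtable, persisted)
}

fn main() {
    let (memtable, persisted) = sample();
    println!("=== Active Memtable ===");
    for e in &memtable {
        println!("{}", format_entry(e));
    }
    println!("\n=== Persisted (on disk) ===");
    for e in &persisted {
        println!("{}", format_entry(e));
    }
    println!("\n{}", summary(&memtable, &persisted));
}
```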
+#### Blob Trees with Indirect Items
+
+A blob tree uses key-value separation, storing large values in separate blob files and keeping indirect references (indirections) in the main LSM-tree. This improves performance for large values by reducing write amplification and improving compaction efficiency.
+
+To create a blob tree, use the `--blob-tree` flag along with `--separation-threshold` (or `-t`) to specify the size threshold, either in bytes or with a unit suffix such as `1KiB`. Values larger than this threshold will be stored as indirect items:
+
+```bash
+# Create a blob tree with 1 KiB separation threshold
+lsm --blob-tree --separation-threshold 1024 /path/to/db set largekey "very large value..."
+
+# Or using the short form
+lsm -b -t 1KiB /path/to/db set largekey "very large value..."
+
+# In interactive mode
+lsm --blob-tree -t 1024 /path/to/db
+lsm> set largekey "very large value..."
+OK (set)
+lsm> flush
+OK (flushed)
+lsm> scan -l
+=== Active Memtable ===
+=== Persisted (on disk) ===
+largekey = very large value... [seqno=0, type=Indirection]
+```
+
+After flushing, values that exceed the separation threshold will appear as `type=Indirection` in long scan output (`scan -l`), indicating they are stored in separate blob files rather than inline in the table.
+
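The idea of key-value separation can be sketched in a few lines. The code below is an illustration only and makes several assumptions: `BlobTreeSketch` and `Stored` are hypothetical names, the "blob file" is an in-memory byte buffer, separation happens at write time rather than at flush, and the comparison is strictly "larger than" as the text above states (the crate's exact boundary behaviour is not confirmed here).

```rust
/// How a value ends up stored in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum Stored {
    /// Small values stay inline in the tree.
    Inline(Vec<u8>),
    /// Large values are replaced by a pointer into the blob buffer.
    Indirection { offset: usize, len: usize },
}

/// Hypothetical stand-in for a blob tree: a threshold plus one blob buffer.
struct BlobTreeSketch {
    separation_threshold: usize,
    blob_file: Vec<u8>,
}

impl BlobTreeSketch {
    fn new(separation_threshold: usize) -> Self {
        Self { separation_threshold, blob_file: Vec::new() }
    }

    /// Stores a value inline or as an indirection, depending on its size.
    fn store(&mut self, value: &[u8]) -> Stored {
        if value.len() > self.separation_threshold {
            // Append to the blob buffer and keep only the reference.
            let offset = self.blob_file.len();
            self.blob_file.extend_from_slice(value);
            Stored::Indirection { offset, len: value.len() }
        } else {
            Stored::Inline(value.to_vec())
        }
    }

    /// Resolves a stored item back to its bytes.
    fn load<'a>(&'a self, stored: &'a Stored) -> &'a [u8] {
        match stored {
            Stored::Inline(v) => v.as_slice(),
            Stored::Indirection { offset, len } => &self.blob_file[*offset..*offset + *len],
        }
    }
}

fn main() {
    let mut tree = BlobTreeSketch::new(1024); // 1 KiB threshold, as in the example above
    let small = tree.store(b"small value");
    let large = tree.store(&vec![b'x'; 4096]);
    println!("small -> {:?}", small);
    println!("large -> {:?}", large);
    println!("large resolves to {} bytes", tree.load(&large).len());
}
```

Because only the small indirection lives in the tree, rewriting the tree during compaction does not have to copy the large value itself, which is the source of the reduced write amplification described above.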
 ## Run unit benchmarks
 
 ```bash
