Commits (36)
480edca
oplog compression
Wulf0x67E7 Sep 14, 2021
393255e
forgot about associated const
Wulf0x67E7 Sep 14, 2021
e238b4f
"fixed" loom test
Wulf0x67E7 Sep 15, 2021
0c430ec
test limited compress range
Wulf0x67E7 Sep 15, 2021
04adefa
more tests, found/fixed none removal bug
Wulf0x67E7 Sep 15, 2021
abcf063
optimize none removal when few nones
Wulf0x67E7 Sep 15, 2021
1ca8a2a
preempted possible future bug with none_back_count
Wulf0x67E7 Sep 15, 2021
4f5d9fe
4. optimization makes prev bug fix redundant
Wulf0x67E7 Sep 15, 2021
5bc3699
comments and minor tweaks
Wulf0x67E7 Sep 16, 2021
0aa5900
...reversed tweak to bring rust version back down
Wulf0x67E7 Sep 16, 2021
722c318
fixed missing optimization while reversing tweak
Wulf0x67E7 Sep 19, 2021
f22dc65
try_compress(&mut prev, next): better fits model
Wulf0x67E7 Sep 19, 2021
6c7bfd7
rangeify and cleanup
Wulf0x67E7 Sep 19, 2021
3a37d0d
more cleanup
Wulf0x67E7 Sep 19, 2021
c6aa598
One final cleanup and test
Wulf0x67E7 Sep 20, 2021
f0c9966
Apply suggestions from code review
Wulf0x67E7 Sep 26, 2021
cde672f
suggested doc changes
Wulf0x67E7 Sep 26, 2021
f74cadd
unwrap_or_else to expect
Wulf0x67E7 Sep 26, 2021
000d512
inverted do-compression condition
Wulf0x67E7 Sep 26, 2021
cec99e0
factored out compress_insert_op
Wulf0x67E7 Sep 26, 2021
0382332
cleaner sanity checks
Wulf0x67E7 Sep 26, 2021
f49adca
test tweaks
Wulf0x67E7 Sep 26, 2021
13c380e
quickcheck correctness
Wulf0x67E7 Sep 26, 2021
5f58376
heavily rigged criterion benchmark
Wulf0x67E7 Sep 26, 2021
4b61b26
Apply documentation suggestions from code review
Wulf0x67E7 Oct 4, 2021
1ca7bbc
tweaking extend variant branch layout
Wulf0x67E7 Oct 4, 2021
e3ff872
Merge branch 'master' of https://github.com/Wulf0x67E7/left-right
Wulf0x67E7 Oct 4, 2021
5138829
factored out custom retain
Wulf0x67E7 Oct 4, 2021
b852b9c
refined quicktest
Wulf0x67E7 Oct 4, 2021
6ca3be6
make test use factored out retain instead
Wulf0x67E7 Oct 4, 2021
b6cbabe
correcting/tweaking outdated doc
Wulf0x67E7 Oct 4, 2021
14c2c9c
benchmark revamp
Wulf0x67E7 Oct 8, 2021
0016728
changed quickcheck to not blow up miri
Wulf0x67E7 Oct 9, 2021
e2a5b82
fixed bug from false assumptions
Wulf0x67E7 Oct 9, 2021
1524b18
more benches
Wulf0x67E7 Oct 9, 2021
f512288
un-constify benchmarks
Wulf0x67E7 Oct 12, 2021
10 changes: 10 additions & 0 deletions Cargo.toml
@@ -14,5 +14,15 @@ categories = ["concurrency"]
[dependencies]
slab = "0.4"

[dev-dependencies]
rand = "0.8.4"
quickcheck = "1.0.3"
quickcheck_macros = "1.0.0"
criterion = "0.3"

[[bench]]
name = "benchmark"
harness = false

[target.'cfg(loom)'.dependencies]
loom = "0.4.0"
207 changes: 207 additions & 0 deletions benches/benchmark.rs
@@ -0,0 +1,207 @@
use std::collections::{BTreeMap, HashMap};
Owner:

Could you post the results from these once you feel like they're in a decent place?

Author:

Will do. Done for today, though. In their current state, compression with unlimited range is about 20% faster than no compression for btreemap and about 35% faster for hashmap, with limited compression falling somewhere in between. Considering that btreemap, with its higher lookup cost, should be the one benefiting more, something fishy is definitely going on.

use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
mod utilities;
use left_right::*;
use utilities::*;

// Number of ops to insert/publish in total
const LEN: usize = 1 << 16;
// Number of ops per extend
const CHUNK_LEN: usize = 1 << 6;
// Number of ops between publishes
const FREQ: usize = 1 << 10;
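// With these values, each benchmark iteration performs LEN / CHUNK_LEN = 1024
// extend calls and LEN / FREQ = 64 publishes, i.e. one publish per 16 chunks.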

fn hash_max(c: &mut Criterion) {
    c.bench_function("hash_max", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<HashMap<_, _>, MapOp<{ usize::MAX }>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn btree_max(c: &mut Criterion) {
    c.bench_function("btree_max", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<BTreeMap<_, _>, MapOp<{ usize::MAX }>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn hash_1(c: &mut Criterion) {
    c.bench_function("hash_1", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<HashMap<_, _>, MapOp<1>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn btree_1(c: &mut Criterion) {
    c.bench_function("btree_1", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<BTreeMap<_, _>, MapOp<1>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn hash_16(c: &mut Criterion) {
    c.bench_function("hash_16", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<HashMap<_, _>, MapOp<16>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn btree_16(c: &mut Criterion) {
    c.bench_function("btree_16", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<BTreeMap<_, _>, MapOp<16>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn hash_none(c: &mut Criterion) {
    c.bench_function("hash_none", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<HashMap<_, _>, MapOp<0>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

fn btree_none(c: &mut Criterion) {
    c.bench_function("btree_none", |b| {
        b.iter_batched(
            || {
                let ops = random_ops(LEN);
                let (w, _) = new::<BTreeMap<_, _>, MapOp<0>>();
                (ops, w)
            },
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

criterion_group!(
    benches, btree_max, btree_16, btree_1, btree_none, hash_max, hash_16, hash_1, hash_none,
);
criterion_main!(benches);
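The eight benchmark functions above differ only in the map type and the const RANGE, so the shared drive loop could be factored into a generic helper. A hedged sketch, assuming it sits in this same file (so LEN, CHUNK_LEN, FREQ, MapOp, and random_ops are in scope) and that left_right::new's bounds are T: Absorb<O> + Default; run_bench and hash_16_alt are made-up names, not part of the PR:

// Hypothetical generic driver shared by all eight benchmarks.
fn run_bench<T, const RANGE: usize>(c: &mut Criterion, name: &str)
where
    T: Absorb<MapOp<RANGE>> + Default,
{
    c.bench_function(name, |b| {
        b.iter_batched(
            // Setup: fresh ops and a write handle (the read handle is dropped,
            // exactly as in the hand-written benchmarks above).
            || (random_ops::<RANGE>(LEN), new::<T, MapOp<RANGE>>().0),
            |(mut ops, mut w)| {
                let mut log_len = 0;
                while !ops.is_empty() {
                    w.extend(ops.drain(0..CHUNK_LEN));
                    log_len += CHUNK_LEN;
                    if log_len >= FREQ {
                        log_len -= FREQ;
                        w.publish();
                    }
                }
            },
            BatchSize::LargeInput,
        )
    });
}

// Each named benchmark then becomes a one-liner, e.g.:
fn hash_16_alt(c: &mut Criterion) {
    run_bench::<HashMap<usize, String>, 16>(c, "hash_16");
}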
114 changes: 114 additions & 0 deletions benches/utilities.rs
@@ -0,0 +1,114 @@
#![allow(dead_code)]
use left_right::*;
use rand::{distributions::Uniform, Rng};
use std::collections::{BTreeMap, HashMap, VecDeque};

pub(crate) fn random_ops<const RANGE: usize>(len: usize) -> VecDeque<MapOp<RANGE>> {
    let rng = rand::thread_rng();
    let dist = Uniform::new(0, usize::MAX);
    rng.sample_iter(&dist)
        .take(len)
        .map(|x| {
            // Keep the low 6 bits as the key, giving 64 distinct keys. Few keys
            // (as here) favor heavy compression; more keys favor low/no compression.
            let key = x & ((1 << 6) - 1);
            // The remaining bits become the value.
            let value = x >> 6;
            // Roughly one op in 1024 is a MapOp::Clear. Frequent clears (as here)
            // favor heavy compression; rarer clears favor low/no compression.
            if x & ((1 << 10) - 1) == 0 {
                MapOp::Clear
            } else {
                // We are using a map of Strings to make the operations non-trivial.
                MapOp::Set(key, format!("value of {:?} is: {:?}", key, value))
            }
        })
        .collect()
}

pub(crate) enum MapOp<const RANGE: usize> {
    Set(usize, String),
    Clear,
}
impl<const RANGE: usize> Absorb<MapOp<RANGE>> for HashMap<usize, String> {
    fn absorb_first(&mut self, operation: &mut MapOp<RANGE>, _: &Self) {
        match operation {
            MapOp::Set(key, value) => {
                if let Some(loc) = self.get_mut(key) {
                    *loc = value.clone();
                } else {
                    self.insert(*key, value.clone());
                }
            }
            MapOp::Clear => {
                self.clear();
            }
        }
    }

    fn sync_with(&mut self, first: &Self) {
        *self = first.clone();
    }

    const MAX_COMPRESS_RANGE: usize = RANGE;
    fn try_compress(
        mut prev: &mut MapOp<RANGE>,
        next: MapOp<RANGE>,
    ) -> TryCompressResult<MapOp<RANGE>> {
        match (&mut prev, next) {
            (MapOp::Set(prev_key, prev_value), MapOp::Set(key, value)) => {
                if *prev_key == key {
                    *prev_value = value;
                    TryCompressResult::Compressed
                } else {
                    TryCompressResult::Independent(MapOp::Set(key, value))
                }
            }
            (_, MapOp::Clear) => {
                *prev = MapOp::Clear;
                TryCompressResult::Compressed
            }
            (MapOp::Clear, next @ MapOp::Set(_, _)) => TryCompressResult::Dependent(next),
        }
    }
}
impl<const RANGE: usize> Absorb<MapOp<RANGE>> for BTreeMap<usize, String> {
    fn absorb_first(&mut self, operation: &mut MapOp<RANGE>, _: &Self) {
        match operation {
            MapOp::Set(key, value) => {
                if let Some(loc) = self.get_mut(key) {
                    *loc = value.clone();
                } else {
                    self.insert(*key, value.clone());
                }
            }
            MapOp::Clear => {
                self.clear();
            }
        }
    }

    fn sync_with(&mut self, first: &Self) {
        *self = first.clone();
    }

    const MAX_COMPRESS_RANGE: usize = RANGE;
    fn try_compress(
        mut prev: &mut MapOp<RANGE>,
        next: MapOp<RANGE>,
    ) -> TryCompressResult<MapOp<RANGE>> {
        match (&mut prev, next) {
            (MapOp::Set(prev_key, prev_value), MapOp::Set(key, value)) => {
                if *prev_key == key {
                    *prev_value = value;
                    TryCompressResult::Compressed
                } else {
                    TryCompressResult::Independent(MapOp::Set(key, value))
                }
            }
            (_, MapOp::Clear) => {
                *prev = MapOp::Clear;
                TryCompressResult::Compressed
            }
            (MapOp::Clear, next @ MapOp::Set(_, _)) => TryCompressResult::Dependent(next),
        }
    }
}
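The two Absorb impls above are character-for-character identical, so a declarative macro could generate both. A hedged sketch that could replace them, assuming the same imports as this file (the macro name is made up; absorb_first is condensed to a plain insert, which overwrites exactly like the get_mut-then-insert above, and try_compress binds prev per arm instead of using the `mut prev: &mut` trick):

// Hypothetical: generate the identical Absorb impls for both map types.
macro_rules! impl_absorb_map {
    ($($map:ty),* $(,)?) => {$(
        impl<const RANGE: usize> Absorb<MapOp<RANGE>> for $map {
            fn absorb_first(&mut self, operation: &mut MapOp<RANGE>, _: &Self) {
                match operation {
                    // insert() overwrites any existing value, matching the
                    // get_mut-then-insert logic of the hand-written impls.
                    MapOp::Set(key, value) => {
                        self.insert(*key, value.clone());
                    }
                    MapOp::Clear => self.clear(),
                }
            }

            fn sync_with(&mut self, first: &Self) {
                *self = first.clone();
            }

            const MAX_COMPRESS_RANGE: usize = RANGE;
            fn try_compress(
                prev: &mut MapOp<RANGE>,
                next: MapOp<RANGE>,
            ) -> TryCompressResult<MapOp<RANGE>> {
                match (prev, next) {
                    // Two sets to the same key collapse into the later one.
                    (MapOp::Set(prev_key, prev_value), MapOp::Set(key, value)) => {
                        if *prev_key == key {
                            *prev_value = value;
                            TryCompressResult::Compressed
                        } else {
                            TryCompressResult::Independent(MapOp::Set(key, value))
                        }
                    }
                    // A clear supersedes whatever came before it.
                    (prev, MapOp::Clear) => {
                        *prev = MapOp::Clear;
                        TryCompressResult::Compressed
                    }
                    // A set after a clear must stay after it.
                    (MapOp::Clear, next @ MapOp::Set(_, _)) => TryCompressResult::Dependent(next),
                }
            }
        }
    )*};
}

impl_absorb_map!(HashMap<usize, String>, BTreeMap<usize, String>);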
41 changes: 41 additions & 0 deletions src/lib.rs
@@ -189,6 +189,22 @@ pub use crate::read::{ReadGuard, ReadHandle, ReadHandleFactory};

pub mod aliasing;

/// The result of calling [`Absorb::try_compress`](Absorb::try_compress).
#[derive(Debug)]
pub enum TryCompressResult<O> {
    /// Returned when [`try_compress`](Absorb::try_compress) succeeded: `prev` has consumed
    /// `next` and is now the combined operation, which can be used as the new `next` to
    /// continue the current compression attempt with the next `prev`.
    Compressed,
    /// Returned when [`try_compress`](Absorb::try_compress) failed because `prev` and `next`
    /// are independent of each other and can't be compressed together. Since `next` may
    /// safely precede `prev` in the oplog, the compression attempt can resume with the next
    /// `prev`. Contains `next`.
    Independent(O),
    /// Returned when [`try_compress`](Absorb::try_compress) failed because `prev` must
    /// precede `next`, halting any further attempt to compress `next` before its insertion.
    /// Contains `next`.
    Dependent(O),
}

/// Types that can incorporate operations of type `O`.
///
/// This trait allows `left-right` to keep the two copies of the underlying data structure (see the
Expand Down Expand Up @@ -261,6 +277,31 @@ pub trait Absorb<O> {
    /// subtly affect results like the `RandomState` of a `HashMap` which can change iteration
    /// order.
    fn sync_with(&mut self, first: &Self);

    /// Range within which [`WriteHandle`] tries to compress the oplog, reset each time a compression succeeds.
    ///
    /// Can be used to keep insertion into the oplog from becoming O(oplog.len * ops.len) when the log consists mostly of independent ops.
    ///
    /// Defaults to `0`, which disables compression and allows the use of an efficient fallback.
    const MAX_COMPRESS_RANGE: usize = 0;

    /// Try to compress two ops into a single op to optimize the oplog.
    ///
    /// `prev` is the target of the compression and is temporarily removed from the oplog; `next` is the op to be inserted.
    ///
    /// A return value of [`TryCompressResult::Compressed`] means the ops were successfully compressed,
    /// [`TryCompressResult::Independent`] that while the ops can't be compressed, `next` can safely precede `prev`,
    /// and [`TryCompressResult::Dependent`] that they can not be compressed and `prev` must precede `next`.
    ///
    /// Defaults to [`TryCompressResult::Dependent`], which sub-optimally disables compression.
    /// Setting [`Self::MAX_COMPRESS_RANGE`](Absorb::MAX_COMPRESS_RANGE) to its default of `0`, or leaving it there, is vastly more efficient for that.
Owner:

I think rustfmt may complain about the line wrapping of this comment.

Author:

I have VS Code configured to automatically run cargo fmt on save, and cargo doc doesn't complain about it, so I think it's fine? The single break between 'Defaults...' and 'Setting...' is purely meant to make reading it in an editor easier and should be turned into a simple space on docs.rs.
    fn try_compress(prev: &mut O, next: O) -> TryCompressResult<O> {
        // Yes, this match is unnecessary, but it keeps `prev` from being an
        // unused variable and matches the mental model of 'all ops are dependent'.
        match prev {
            _ => TryCompressResult::Dependent(next),
        }
    }
}
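To make the contract above concrete, here is an illustrative sketch of the kind of insertion loop these pieces are designed to drive. It is not the crate's actual `WriteHandle` internals: the free-function form, the `Vec<O>` oplog representation, and the name `compress_insert` are assumptions for illustration, with `Absorb` and `TryCompressResult` from this diff in scope.

// Illustrative only: how the three TryCompressResult variants could drive
// insertion of a new op into a Vec-backed oplog.
fn compress_insert<T: Absorb<O>, O>(oplog: &mut Vec<O>, mut next: O) {
    // Look back over at most MAX_COMPRESS_RANGE existing ops. (The doc above
    // says the range is additionally reset each time a compression succeeds;
    // that refinement is omitted here for brevity.)
    let start = oplog.len().saturating_sub(T::MAX_COMPRESS_RANGE);
    // Where `next` ends up if no op forces it to stay later in the log.
    let mut insert_at = start;
    for i in (start..oplog.len()).rev() {
        // Temporarily remove `prev` so it can be consumed on success.
        let mut prev = oplog.remove(i);
        match T::try_compress(&mut prev, next) {
            // `prev` consumed `next`; the combined op becomes the new `next`
            // and the attempt continues against earlier ops.
            TryCompressResult::Compressed => next = prev,
            // The ops commute: put `prev` back and keep scanning backwards.
            TryCompressResult::Independent(n) => {
                oplog.insert(i, prev);
                next = n;
            }
            // `prev` must precede `next`: put it back and stop right here.
            TryCompressResult::Dependent(n) => {
                oplog.insert(i, prev);
                next = n;
                insert_at = i + 1;
                break;
            }
        }
    }
    oplog.insert(insert_at, next);
}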

/// Construct a new write and read handle pair from an empty data structure.