fix: fallback to normal delete when deleting small ranges #644
Conversation
| "k8s.io/utils/pointer" | ||
| ) | ||
|
|
||
| func BenchmarkGenerate100(b *testing.B) { |
This is the benchmark test. If you want more details, you can check out commit 8996085 and look at tests/deleterange_test.go, which uses the actual data.
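For illustration, a minimal sketch of what such a benchmark could look like against an in-memory Pebble store (a sketch only; the function name and setup are assumptions, and the actual test in tests/deleterange_test.go uses real data):

```go
// Hypothetical sketch: issue many small DeleteRange calls against an
// in-memory Pebble DB. The real oxia benchmark differs and uses actual data.
package kv

import (
	"fmt"
	"testing"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

func BenchmarkDeleteRangeSketch(b *testing.B) {
	db, err := pebble.Open("bench", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	// Seed the memtable with a batch of keys.
	for i := 0; i < 1000; i++ {
		key := []byte(fmt.Sprintf("key-%06d", i))
		if err := db.Set(key, []byte("value"), pebble.NoSync); err != nil {
			b.Fatal(err)
		}
	}

	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		// Many small ranges produce many fragmented range-deletion tombstones,
		// which is what degrades later iterator-based operations.
		i := n % 999
		start := []byte(fmt.Sprintf("key-%06d", i))
		end := []byte(fmt.Sprintf("key-%06d", i+1))
		if err := db.DeleteRange(start, end, pebble.NoSync); err != nil {
			b.Fatal(err)
		}
	}
}
```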
Do you have the before vs after results?
From the benchmark test:
main branch
goos: darwin
goarch: arm64
pkg: github.com/streamnative/oxia/server/kv
cpu: Apple M1 Pro
BenchmarkDeleteRange
BenchmarkDeleteRange-10 8863 34814860 ns/op
PASS
this branch
goos: darwin
goarch: arm64
pkg: github.com/streamnative/oxia/server/kv
cpu: Apple M1 Pro
BenchmarkDeleteRange
BenchmarkDeleteRange-10 10000 848611 ns/op
PASS
@mattisonchao In BookKeeper, we also use deleteRange to delete keys, but we found that the RocksDB background compaction job will skip compacting SST files whose keys were deleted by a deleteRange operation. apache/bookkeeper#4555 Would you please double-check whether the SST files are compacted?
You are mentioning the disk-size issue; the current problem is a read-performance issue.
@mattisonchao It will also impact the read performance.
We are using the in-memory implementation for benchmarking, and the data didn't exceed the memtable size, so it has nothing to do with SSTs.
But I would like to understand why it will also impact the read performance.
In an LSM you start from the top and go down the tree of SST files. With point lookups this is straightforward; the problem is typically around handling iterators, because you have to remember the tombstones in the levels above the one you're currently exploring.
Yes.
Yes, and it will construct a skyline to improve read performance, but that is expensive to construct (maybe Pebble caches it for SSTs). It also seems that Pebble (RocksDB) will improve this by merging ranges when flushing the memtable to an SST, so there will be fewer range deletions in the SSTs. But in the current case, the root cause is that we have many small ranges.
if err := batch.DeleteRange(delReq.StartInclusive, delReq.EndExclusive); err != nil {
    return nil, errors.Wrap(err, "oxia db: failed to delete range")
var validKeys []string
I'd try to find a way to avoid building this key array, since it could be very big.
One strategy could be (a rough sketch of this approach follows below):
- start doing point delete operations
- when we cross the threshold, continue with the delete range
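A rough sketch of that strategy, assuming a Pebble-style batch and iterator API (the function and parameter names are illustrative, not the actual oxia code):

```go
// Illustrative sketch: point-delete keys in [start, end) until we cross the
// threshold, then fall back to a single DeleteRange, so we never have to
// collect all keys into a slice first. Names are assumptions, not oxia code.
func deleteRangeWithFallback(db *pebble.DB, batch *pebble.Batch, start, end []byte, threshold int) error {
	it, err := db.NewIter(&pebble.IterOptions{LowerBound: start, UpperBound: end})
	if err != nil {
		return err
	}
	defer it.Close()

	deleted := 0
	for it.First(); it.Valid(); it.Next() {
		if deleted >= threshold {
			// Too many keys: one range tombstone also covers the keys we
			// already point-deleted, so it is safe to switch over here.
			return batch.DeleteRange(start, end, nil)
		}
		// Copy the key because the iterator may reuse its buffer.
		key := append([]byte(nil), it.Key()...)
		if err := batch.Delete(key, nil); err != nil {
			return err
		}
		deleted++
	}
	return it.Error()
}
```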
Motivation
In the current implementation, we get terrible performance for iterator-related operations when we have many small DeleteRange ranges.
FYI: https://rocksdb.org/blog/2018/11/21/delete-range.html
It makes our putWithSequence method execute very slowly (with the current memtable size). The reason is that Pebble (RocksDB) creates fragmented range-deletion tombstones for range deletions. Even though Pebble caches the tombstones, the cache is invalidated whenever a new delete-range operation arrives, so the deletion iterator has to be rebuilt after every new DeleteRange operation, and it gets worse as more DeleteRange operations accumulate.
FYI: https://github.com/cockroachdb/pebble/blob/dbc1c128682f7efcdb76352432249780b12447f7/mem_table.go#L248-L251
The RocksDB article also mentions this in its Future Work section.
Modification
Fall back from the DeleteRange operation to normal deletes when the number of actual keys is lower than DeleteRangeThreshold (default 100).
Next
Make DeleteRangeThreshold configurable.
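For illustration, a minimal sketch of the fallback described in the Modification section, assuming a Pebble batch and iterator (only DeleteRangeThreshold and the general idea come from this PR; the function name and surrounding API are assumptions):

```go
// Sketch of the fallback described above: if the range contains fewer than
// DeleteRangeThreshold keys, issue point deletes; otherwise keep using
// DeleteRange. The surrounding API is an assumption, not the exact oxia code.
const DeleteRangeThreshold = 100

func applyDeleteRange(db *pebble.DB, batch *pebble.Batch, start, end []byte) error {
	it, err := db.NewIter(&pebble.IterOptions{LowerBound: start, UpperBound: end})
	if err != nil {
		return err
	}
	defer it.Close()

	var validKeys [][]byte
	for it.First(); it.Valid(); it.Next() {
		if len(validKeys) >= DeleteRangeThreshold {
			// Large range: a single range tombstone is cheaper than many point deletes.
			return batch.DeleteRange(start, end, nil)
		}
		validKeys = append(validKeys, append([]byte(nil), it.Key()...))
	}
	if err := it.Error(); err != nil {
		return err
	}
	// Small range: point deletes avoid adding fragmented range-deletion tombstones.
	for _, key := range validKeys {
		if err := batch.Delete(key, nil); err != nil {
			return err
		}
	}
	return nil
}
```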