-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[WIP] feat: implement DenseSet shrink with iterator safety #6094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| // Process from high to low. This mirrors Grow's iteration order. | ||
| // Elements move to lower buckets (not yet processed) based on their hash. | ||
| for (long i = prev_size - 1; i >= 0; --i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my mind, it should be vice versa from 0 to size, because you move them into lower positions, to avoid double processing
| EXPECT_TRUE(ss_->Contains(str)) << "Missing: " << str; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need sscan test during shrink
| } | ||
|
|
||
| void DenseSet::Reserve(size_t sz) { | ||
| void DenseSet::Resize(size_t sz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is controversial because it describes some changes to the internal representation. Consider using AdjustCapacity()
| if (num_iterators_ != 0) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks strange to me that shrinking depends on the iterators existing
dranikpg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that all those internal calls to Delete() are now not safe anymore. The safeguards are not a substitute for actually reviewing the code around it and its usage
| IteratorBase(const IteratorBase& other) | ||
| : owner_(other.owner_), curr_list_(other.curr_list_), curr_entry_(other.curr_entry_) { | ||
| if (owner_) { | ||
| ++owner_->num_iterators_; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call upper base constructor
| // Check num_iterators_ to prevent shrink during iteration (would invalidate iterators) | ||
| if (num_iterators_ != 0) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually like higher level safeguards. On the other hand, having one like this, assumes that this is valid or possible behaviour, and even worse, if it occurs, it will happen unnoticed, because the function just silently returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, external ExpireIfNeeded calling Delete and then calling this function looks really dangerous. It holds references, the bucket id, etc. It also never holds an iterator directly, so none of your safeguards will kick in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover the safeguards, can be hurting as well, consider:
for (auto it: set) {
set->Delete(it)
}Shrink won't be called even once even if the usage drops to 0%
| TEST_F(StringSetTest, ShrinkWithTTL) { | ||
| // Add elements with TTL | ||
| constexpr size_t num_strs = 16; | ||
| vector<string> strs; | ||
| for (size_t i = 0; i < num_strs; ++i) { | ||
| strs.push_back(random_string(generator_, 10)); | ||
| EXPECT_TRUE(ss_->Add(strs.back(), 100)); // TTL of 100 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a test where items are actively expiring while you call set operations, even most basic ones like Find(), causing the set to shrink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I grep ExpireIfNeeded there are like 20 matches in dense_set.cc, and every of them might horribly crash if it calls Delete() and Shink() inside. The test should have at least thousands of elements and calling most basic code paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a really dangerous change
|
@dranikpg I've already discussed with @vyavdoshenko all these problems; he rewrites all this logic. |
Closes: #6044
Implements automatic and manual shrinking for
DenseSetto reclaim memory when utilization drops, with safeguards to prevent iterator invalidation.Added
Shrink(size_t new_size)that reduces table capacity by rehashing elements:shrinking_flag to prevent recursive callsAdded tracking to prevent shrink during active iteration:
IteratorBaseincrements/decrementsnum_iterators_counter on construction/destructionnum_iterators_ == 0Automatically shrinks in
Delete()when utilization drops below 25%.