Skip to content

Conversation

@vyavdoshenko
Copy link
Contributor

Closes: #6044

Implements automatic and manual shrinking for DenseSet to reclaim memory when utilization drops, with safeguards to prevent iterator invalidation.

Added Shrink(size_t new_size) that reduces table capacity by rehashing elements:

  • Processes buckets backwards (high → low) to correctly merge elements into lower buckets
  • Handles chain reorganization and preserves TTL during element movement
  • Protected by shrinking_ flag to prevent recursive calls

Added tracking to prevent shrink during active iteration:

  • IteratorBase increments/decrements num_iterators_ counter on construction/destruction
  • Shrink only works when num_iterators_ == 0

Automatically shrinks in Delete() when utilization drops below 25%.


// Process from high to low. This mirrors Grow's iteration order.
// Elements move to lower buckets (not yet processed) based on their hash.
for (long i = prev_size - 1; i >= 0; --i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my mind, it should be vice versa from 0 to size, because you move them into lower positions, to avoid double processing

EXPECT_TRUE(ss_->Contains(str)) << "Missing: " << str;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need sscan test during shrink

}

void DenseSet::Reserve(size_t sz) {
void DenseSet::Resize(size_t sz) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is controversial because it describes some changes to the internal representation. Consider using AdjustCapacity()

Comment on lines +404 to +406
if (num_iterators_ != 0) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks strange to me that shrinking depends on the iterators existing

Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that all those internal calls to Delete() are now not safe anymore. The safeguards are not a substitute for actually reviewing the code around it and its usage

Comment on lines +196 to +201
IteratorBase(const IteratorBase& other)
: owner_(other.owner_), curr_list_(other.curr_list_), curr_entry_(other.curr_entry_) {
if (owner_) {
++owner_->num_iterators_;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call upper base constructor

Comment on lines +403 to +406
// Check num_iterators_ to prevent shrink during iteration (would invalidate iterators)
if (num_iterators_ != 0) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually like higher level safeguards. On the other hand, having one like this, assumes that this is valid or possible behaviour, and even worse, if it occurs, it will happen unnoticed, because the function just silently returns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, external ExpireIfNeeded calling Delete and then calling this function looks really dangerous. It holds references, the bucket id, etc. It also never holds an iterator directly, so none of your safeguards will kick in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover the safeguards, can be hurting as well, consider:

for (auto it: set) {
  set->Delete(it)
}

Shrink won't be called even once even if the usage drops to 0%

Comment on lines +861 to +868
TEST_F(StringSetTest, ShrinkWithTTL) {
// Add elements with TTL
constexpr size_t num_strs = 16;
vector<string> strs;
for (size_t i = 0; i < num_strs; ++i) {
strs.push_back(random_string(generator_, 10));
EXPECT_TRUE(ss_->Add(strs.back(), 100)); // TTL of 100
}
Copy link
Contributor

@dranikpg dranikpg Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a test where items are actively expiring while you call set operations, even most basic ones like Find(), causing the set to shrink

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I grep ExpireIfNeeded there are like 20 matches in dense_set.cc, and every of them might horribly crash if it calls Delete() and Shink() inside. The test should have at least thousands of elements and calling most basic code paths

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a really dangerous change

@BorysTheDev
Copy link
Contributor

@dranikpg I've already discussed with @vyavdoshenko all these problems; he rewrites all this logic.

@vyavdoshenko vyavdoshenko marked this pull request as draft November 21, 2025 09:12
@vyavdoshenko vyavdoshenko changed the title feat: implement DenseSet shrink with iterator safety [WIP] feat: implement DenseSet shrink with iterator safety Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Shrink() to DenseSet

4 participants