
Refresh PG statistics after log replay on restart #398

Open
JacksonYao287 wants to merge 2 commits into eBay:main from JacksonYao287:refresh-pg-metrics-when-restart

Conversation

@JacksonYao287
Collaborator

Add refresh_pg_statistics() to recalculate and update PG statistics (active_blob_count, tombstone_blob_count, total_occupied_blk_count) after log replay completes. This ensures statistics accuracy after system crashes or restarts.

The function is called in on_log_replay_done() before the raft group joins, scanning the index table and chunks to recompute all three statistics from actual data. Statistics are persisted at the next periodic checkpoint.

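The recompute-then-publish pattern described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: `BlobRecord`, `PGDurableEntities`, and the flat vector standing in for the index-table scan are all assumptions; the relaxed atomic stores mirror the `store(..., std::memory_order_relaxed)` calls visible in the diff.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Simplified stand-ins for the real HSHomeObject types (assumptions).
struct BlobRecord {
    bool tombstone;          // true if the blob has been deleted
    uint64_t occupied_blks;  // blocks the blob occupies on its chunk
};

struct PGDurableEntities {
    std::atomic<uint64_t> active_blob_count{0};
    std::atomic<uint64_t> tombstone_blob_count{0};
    std::atomic<uint64_t> total_occupied_blk_count{0};
};

// Recompute all three statistics from the actual records, then publish them.
// Relaxed ordering suffices here: readers only need the eventual values.
void refresh_pg_statistics(const std::vector<BlobRecord>& records, PGDurableEntities& de) {
    uint64_t active = 0, tombstone = 0, occupied = 0;
    for (const auto& r : records) {
        if (r.tombstone) {
            tombstone++;
        } else {
            active++;
        }
        occupied += r.occupied_blks;
    }
    de.active_blob_count.store(active, std::memory_order_relaxed);
    de.tombstone_blob_count.store(tombstone, std::memory_order_relaxed);
    de.total_occupied_blk_count.store(occupied, std::memory_order_relaxed);
}
```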
@JacksonYao287
Collaborator Author

Per the suggestion from @zhiteng, I made this change to force a refresh of the key PG metrics on restart.

Collaborator

@xiaoxichen left a comment


Can we UT this?

de.total_occupied_blk_count.store(total_occupied, std::memory_order_relaxed);
});

LOGI("Refreshed statistics for pg={}: active_blobs={}, tombstone_blobs={}, occupied_blocks={}", pg_id, active_count,
Collaborator


Better to log the original value as well, along with a keyword like "corrected", for debugging.

Collaborator Author


sure

@codecov-commenter commented Mar 13, 2026


Codecov Report

❌ Patch coverage is 75.60976% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.86%. Comparing base (1746bcc) to head (4d0c98b).
⚠️ Report is 164 commits behind head on main.

Files with missing lines Patch % Lines
src/lib/homestore_backend/hs_pg_manager.cpp 75.00% 9 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #398       +/-   ##
===========================================
- Coverage   63.15%   52.86%   -10.30%     
===========================================
  Files          32       36        +4     
  Lines        1900     5272     +3372     
  Branches      204      656      +452     
===========================================
+ Hits         1200     2787     +1587     
- Misses        600     2194     +1594     
- Partials      100      291      +191     


EXPECT_GT(pg_stats.used_bytes, 0) << "Used bytes should be greater than 0";

uint64_t used_bytes_after = pg_stats.used_bytes;

Collaborator


Better to corrupt the statistics one more time here.

auto hs_pg = dynamic_cast< HSHomeObject::HS_PG* >(_obj_inst->_pg_map[pg_id].get());
ASSERT_NE(hs_pg, nullptr);

// Manually corrupt statistics to simulate desync
Collaborator


s/desync/inconsistency


LOGI("Refreshed statistics for pg={}: active_blobs={}, tombstone_blobs={}, occupied_blocks={}", pg_id, active_count,
tombstone_count, total_occupied);
LOGI("[corrected] Refreshed statistics for pg={}: active_blobs={} (original={}), tombstone_blobs={} (original={}), "
Collaborator


The "[corrected]" tag should not be printed if we didn't correct anything.
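One way to honor this, as an illustrative sketch rather than the PR's final diff: compare the recomputed values against the originals and only prepend the "[corrected]" keyword when something actually changed. The helper name and message layout here are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: build the refresh log line, tagging it "[corrected]"
// only when a recomputed value differs from the originally stored one.
std::string format_refresh_log(uint64_t pg_id,
                               uint64_t orig_active, uint64_t new_active,
                               uint64_t orig_tombstone, uint64_t new_tombstone) {
    bool corrected = (orig_active != new_active) || (orig_tombstone != new_tombstone);
    std::ostringstream oss;
    if (corrected) { oss << "[corrected] "; }
    oss << "Refreshed statistics for pg=" << pg_id
        << ": active_blobs=" << new_active << " (original=" << orig_active << ")"
        << ", tombstone_blobs=" << new_tombstone << " (original=" << orig_tombstone << ")";
    return oss.str();
}
```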

} else {
active_count++;
}
return false; // Continue scanning
Collaborator


It can hit OOM here.

Each blob entry takes 24+ bytes; assuming 10 GB of memory, that is up to ~400M blobs, and with an 8 KB minimal blob size it can hit OOM.

Can we clean up the dummy_output during pagination?
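A bounded-memory scan along the lines of this suggestion could look like the sketch below. This is written against assumed shapes, not the real homestore query interface: the batch size, the callback signature, and the per-page reuse of the scratch vector are all illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Visit records one fixed-size page at a time, clearing the scratch buffer
// between pages so peak memory stays O(batch_size) instead of O(total blobs).
template <typename Record>
uint64_t scan_in_batches(const std::vector<Record>& all,  // stand-in for the index table
                         size_t batch_size,
                         const std::function<void(const Record&)>& visit) {
    std::vector<Record> dummy_output;  // per-page scratch, reused across pages
    uint64_t visited = 0;
    for (size_t start = 0; start < all.size(); start += batch_size) {
        size_t end = std::min(all.size(), start + batch_size);
        dummy_output.assign(all.begin() + start, all.begin() + end);  // fetch one page
        for (const auto& r : dummy_output) {
            visit(r);
            visited++;
        }
        dummy_output.clear();  // release the page before the next query
    }
    return visited;
}
```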

LOGINFO("Created shard {}", shard_info.id);

// Put some blobs to populate statistics using put_blobs
const uint32_t num_active_blobs = 10;
Collaborator


If we have to deal with pagination ourselves, the number should be large enough to ensure the B-tree really has multiple pages.

